I'm trying to evaluate/test how well my data fits a particular distribution.
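There are several questions about this, and I've been told to use either scipy.stats.kstest or scipy.stats.ks_2samp. It seems straightforward: give it (1) the data, (2) the distribution, and (3) the fit parameters. The only problem is that my results don't make sense when I try different distributions, and from the output of kstest I can't tell whether I can do this at all.
"[SciPy] contains K-S"
"The first value is the test statistic, and the second value is the p-value. If the p-value is less than 95 (for a level of significance of 5%), this means that you cannot reject the null hypothesis that the two sample distributions are identical."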
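For context, here is a minimal sketch of how I understand the two functions to differ: kstest compares one sample against a reference distribution, while ks_2samp compares two samples with each other. The sample names, sizes, and seed are my own choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
sample_a = rng.normal(loc=0.0, scale=1.0, size=200)
sample_b = rng.normal(loc=0.0, scale=1.0, size=200)

# One-sample test: one sample against a named reference distribution
# (here a standard normal, passed as loc=0, scale=1).
one_sample = stats.kstest(sample_a, "norm", args=(0.0, 1.0))

# Two-sample test: two empirical samples against each other.
two_sample = stats.ks_2samp(sample_a, sample_b)

print(one_sample.statistic, one_sample.pvalue)
print(two_sample.statistic, two_sample.pvalue)
```

Since I have a single data set and a candidate distribution, kstest looks like the relevant one here.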
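To make the quoted description concrete, here is a small sketch of reading the two return values. The significance level alpha = 0.05 and the variable names are assumptions of mine. Note that the textbook convention, as far as I know, compares the p-value against alpha (reject only when p < alpha), which is not how the quote phrases it.

```python
import numpy as np
from scipy import stats

alpha = 0.05  # assumed significance level
rng = np.random.default_rng(1)
data = rng.normal(size=100)

# Test the sample against a standard normal distribution.
result = stats.kstest(data, "norm")
statistic = result.statistic  # first value: the K-S test statistic
pvalue = result.pvalue        # second value: the p-value

# Textbook convention: reject the null hypothesis only when pvalue < alpha.
reject_null = pvalue < alpha
print(statistic, pvalue, reject_null)
```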
import numpy as np
from scipy import stats

np.random.seed(2)
# Sample from a normal distribution w/ mu: -50 and sigma=1
x = np.random.normal(loc=-50, scale=1, size=100)
x
#array([-50.41675785, -50.05626683, -52.1361961 , -48.35972919,
# -51.79343559, -50.84174737, -49.49711858, -51.24528809,
# -51.05795222, -50.90900761, -49.44854596, -47.70779199,
# ...
# -50.4635, -49.64911151, -49.61813377, -49.43372456,
# -49.79579202, -48.59330376, -51.7379595 , -48.95917605,
# -49.61952803, -50.21713527, -48.8264685 , -52.34360319])
# Try against a Gamma Distribution
distribution = "gamma"
distr = getattr(stats, distribution)
params = distr.fit(x)
stats.kstest(x,distribution,args=params)
KstestResult(statistic=0.078494356486987549, pvalue=0.55408436218441004)
Is a p-value of 0.55408436218441004 saying that the normal and gamma samples come from the same distribution?
Now against a normal distribution:
# Try against a Normal Distribution
distribution = "norm"
distr = getattr(stats, distribution)
params = distr.fit(x)
stats.kstest(x,distribution,args=params)
KstestResult(statistic=0.070447707170256002, pvalue=0.70801104133244541)
According to this, if I took the lowest p-value, then I would conclude my data came from a gamma distribution even though the values are all negative?
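One thing that may explain why a gamma fit to negative data is possible at all: as far as I can tell, gamma.fit estimates a location parameter in addition to the shape and scale, so the fitted distribution can be shifted onto negative values. A minimal sketch for inspecting the fitted parameters (I am not quoting the resulting numbers here, since I have not checked them across SciPy versions):

```python
import numpy as np
from scipy import stats

np.random.seed(2)
x = np.random.normal(loc=-50, scale=1, size=100)

# gamma.fit returns (shape a, loc, scale); loc is estimated too
# unless it is fixed explicitly, e.g. with floc=0.
a, loc, scale = stats.gamma.fit(x)
print("shape:", a, "loc:", loc, "scale:", scale)
print("min(x):", x.min())
```

If loc comes out far below zero, the "gamma" being tested is a shifted gamma, not one supported on positive values only.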
np.random.seed(0)
distr = getattr(stats, "norm")
x = distr.rvs(loc=0, scale=1, size=50)
params = distr.fit(x)
stats.kstest(x,"norm",args=params, N=1000)
KstestResult(statistic=0.058435890774587329, pvalue=0.99558592119926814)
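This means that at a 5% level of significance, I can reject the null hypothesis that the distributions are identical. So I conclude they are different, but they clearly aren't? Am I interpreting this incorrectly? If I make the test one-tailed, would that make it so that the larger the value, the more likely it is that they come from the same distribution?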
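On the one-tailed part: kstest takes an alternative argument with the values "two-sided", "less", and "greater". A minimal sketch that runs all three on the same data as above, so the statistics and p-values can be compared side by side (the results dictionary is just my own bookkeeping):

```python
import numpy as np
from scipy import stats

np.random.seed(0)
x = stats.norm.rvs(loc=0, scale=1, size=50)
params = stats.norm.fit(x)

# Run the test once per alternative hypothesis.
results = {}
for alternative in ("two-sided", "less", "greater"):
    results[alternative] = stats.kstest(
        x, "norm", args=params, alternative=alternative
    )
    print(alternative, results[alternative].statistic, results[alternative].pvalue)
```

The two-sided statistic is the larger of the two one-sided statistics, so switching to one-tailed changes which deviation is measured, not the direction in which the p-value is read.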