====== Descriptive Statistics and Hypothesis Testing ======

>Run R online: [[https://rdrr.io/snippets/]] [[https://learningstatisticswithr.com/|Introductory R tutorial]]
>[[https://repl.it/languages/rlang]]
>R basics: [[stat:3descntests:rsnipp]]
>[[https://rstudio.cloud/projects]]
>Scipy.stats: [[https://docs.scipy.org/doc/scipy/reference/stats.html]]
>statsmodels: [[http://www.statsmodels.org/stable/py-modindex.html]]
>DataFrame: [[http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame]]
>sklearn: [[http://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model]]
>Matplotlib subplot ax: [[https://matplotlib.org/api/axes_api.html]]
>Seaborn [[https://seaborn.pydata.org/api.html]]
>py-Bokeh plotting package: [[http://bokeh.pydata.org/en/latest/docs/user_guide.html]]
>CheatSheet: [[https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf]]

==== CSV operations ====

[[http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html|Pandas.read_csv]] [[http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html|pandas.to_csv]]

  import pandas as pd
  df = pd.read_csv('data.csv', encoding='utf-8')  # load a CSV file into a DataFrame
  df.to_csv('data.csv')                           # write the DataFrame back to CSV

R:

  df <- read.table("save.dat")   # whitespace-delimited text file
  df <- read.csv("file.csv")     # comma-separated file
  write.table(df, "save.dat")    # write the data frame back out

==== groupby operations ====

[[http://blog.csdn.net/shenshaoqiu/article/details/78888602]]

===== Preliminary data processing =====

^ Name ^ Python ^ R ^ Notes ^
^ Descriptive statistics · describe | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.describe.html#scipy.stats.describe|stats.describe]] | summary(); describe() in the Hmisc package; describe() in the psych package; stat.desc() in the pastecs package | |
^ Arithmetic mean · tmean | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.tmean.html#scipy.stats.tmean|stats.tmean]] | | |
^ Variance · variance | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.tvar.html#scipy.stats.tvar|stats.tvar]] | | |
^ Standard deviation · standard deviation | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.tstd.html#scipy.stats.tstd|stats.tstd]] | | |
^ Standard error of the mean · sem | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.tsem.html#scipy.stats.tsem|stats.tsem]] | | |
^ Coefficient of variation · variation | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.variation.html#scipy.stats.variation|stats.variation]] | | |
^ Geometric mean · gmean | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gmean.html#scipy.stats.gmean|stats.gmean]] | | |
^ Bayesian mean | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.bayes_mvs.html#scipy.stats.bayes_mvs|stats.bayes_mvs]] | | |
^ Harmonic mean · hmean | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.hmean.html#scipy.stats.hmean|stats.hmean]] | | |
^ Trimmed mean · trim_mean | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.trim_mean.html#scipy.stats.trim_mean|stats.trim_mean]] | | |
^ Kurtosis · kurtosis | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html#scipy.stats.kurtosis|stats.kurtosis]] | | |
^ Skewness · skewness | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html#scipy.stats.skew|stats.skew]] | | |
^ Find repeated values | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.find_repeats.html#scipy.stats.find_repeats|stats.find_repeats]] | | |
^ Trim both tails | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.trimboth.html#scipy.stats.trimboth|stats.trimboth]] | | |
^ Trim one tail | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.trim1.html#scipy.stats.trim1|stats.trim1]] | | |

=== Distribution tests ===

^ Name ^ Python ^ R ^ Notes ^
^ normaltest | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.normaltest.html#scipy.stats.normaltest|stats.normaltest]] | | |
^ Shapiro-Wilk test for normality | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html#scipy.stats.shapiro|stats.shapiro]] | | Normality |
^ Kolmogorov-Smirnov test · KS test | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html#scipy.stats.kstest|stats.kstest]] | | Distribution test; requires data from a continuous distribution |
| ::: | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html#scipy.stats.ks_2samp|stats.ks_2samp]] | | Compares two distributions |
^ Anderson-Darling test | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson.html#scipy.stats.anderson|stats.anderson]] | | Modified version of the KS test |
| ::: | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson_ksamp.html#scipy.stats.anderson_ksamp|stats.anderson_ksamp]] | | |
^ kurtosistest | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosistest.html#scipy.stats.kurtosistest|stats.kurtosistest]] | | |
^ skewtest | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skewtest.html#scipy.stats.skewtest|stats.skewtest]] | | |
| ||||
^ O’Brien transform · homogeneity of variance | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.obrientransform.html#scipy.stats.obrientransform|stats.obrientransform]] | | |
^ Bartlett’s test for equal variances | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.bartlett.html#scipy.stats.bartlett|stats.bartlett]] | | |
^ Levene test for equal variances | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.levene.html#scipy.stats.levene|stats.levene]] | | When the data are markedly non-normal |
^ Jarque-Bera | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.jarque_bera.html#scipy.stats.jarque_bera|stats.jarque_bera]] | | |
^ Fligner-Killeen test for equality of variance | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.fligner.html#scipy.stats.fligner|stats.fligner]] | | |

===== Hypothesis tests =====

^ Name ^ Description ^ Python ^ R ^ Notes ^
^ Student's t-test | t-test | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_1samp.html#scipy.stats.ttest_1samp|stats.ttest_1samp]] | | |
| ::: | ::: | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html#scipy.stats.ttest_ind|stats.ttest_ind]] | | |
| ::: | ::: | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind_from_stats.html#scipy.stats.ttest_ind_from_stats|stats.ttest_ind_from_stats]] | | |
| ::: | ::: | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html#scipy.stats.ttest_rel|stats.ttest_rel]] | | |
^ ANOVA | Analysis of variance | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html#scipy.stats.f_oneway|stats.f_oneway]] | | Independence, normality, homogeneity of variance |
| | Multi-group comparison with unequal variances | [[https://stats.stackexchange.com/questions/91872/alternatives-to-one-way-anova-for-heteroskedastic-data|heteroskedastic]][[https://stats.stackexchange.com/questions/56971/alternative-to-one-way-anova-unequal-variance|alter]] | | |
^ chisquare | Chi-square test | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html#scipy.stats.chisquare|stats.chisquare]] | | |
^ chi2_contingency | Contingency table | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html#scipy.stats.chi2_contingency|stats.chi2_contingency]] | | |
^ Fisher exact test 2x2 | | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.fisher_exact.html#scipy.stats.fisher_exact|stats.fisher_exact]] | | |
^ friedmanchisquare | | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.friedmanchisquare.html#scipy.stats.friedmanchisquare|stats.friedmanchisquare]] | | |
^ combine_pvalues | Combine p-values | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.combine_pvalues.html#scipy.stats.combine_pvalues|stats.combine_pvalues]] | | |
^ pearsonr | Pearson correlation coefficient | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html#scipy.stats.pearsonr|stats.pearsonr]] | | |
^ pointbiserialr | | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pointbiserialr.html#scipy.stats.pointbiserialr|stats.pointbiserialr]] | | |
^ linregress | Linear least-squares regression | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html#scipy.stats.linregress|stats.linregress]] | | |
| |||||
^ rankdata | | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rankdata.html#scipy.stats.rankdata|stats.rankdata]] | | |
^ Wilcoxon signed-rank test | | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html#scipy.stats.wilcoxon|stats.wilcoxon]] | | |
^ Wilcoxon two samples | | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ranksums.html#scipy.stats.ranksums|stats.ranksums]] | | Continuous distribution |
^ mannwhitneyu | | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html#scipy.stats.mannwhitneyu|stats.mannwhitneyu]] | | Continuous distribution |
^ spearmanr | Spearman rank correlation | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html#scipy.stats.spearmanr|stats.spearmanr]] | | |
^ kendalltau | | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kendalltau.html#scipy.stats.kendalltau|stats.kendalltau]] | | |
^ weightedtau | | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.weightedtau.html#scipy.stats.weightedtau|stats.weightedtau]] | | |
^ tiecorrect | Tie correction for the U/KW tests | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.tiecorrect.html#scipy.stats.tiecorrect|stats.tiecorrect]] | | |
^ Kruskal-Wallis H-test | | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html#scipy.stats.kruskal|stats.kruskal]] | | Nonparametric counterpart of ANOVA |
^ Bernoulli experiment | Bernoulli trial | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom_test.html#scipy.stats.binom_test|stats.binom_test]] | | |
^ Mood’s median test | Median test | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.median_test.html#scipy.stats.median_test|stats.median_test]] | | |
^ Mood’s test for equal scale parameters | | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mood.html#scipy.stats.mood|stats.mood]] | | |
| |||||
^ Box-Cox | | | | |
^ Wasserstein distance | | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wasserstein_distance.html#scipy.stats.wasserstein_distance|stats.wasserstein_distance]] | | |
^ energy distance | | [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.energy_distance.html#scipy.stats.energy_distance|stats.energy_distance]] | | |

==== Multiple comparisons ====

[[http://www.statsmodels.org/stable/stats.html#module-statsmodels.stats.multitest]]\\
[[http://www.statsmodels.org/stable/generated/statsmodels.sandbox.stats.multicomp.multipletests.html#statsmodels.sandbox.stats.multicomp.multipletests|statsmodels.sandbox.stats.multicomp.multipletests]]\\
[[http://www.statsmodels.org/stable/generated/statsmodels.sandbox.stats.multicomp.MultiComparison.html#statsmodels.sandbox.stats.multicomp.MultiComparison|statsmodels.sandbox.stats.multicomp.MultiComparison]]\\
Comparison of methods: [[http://www.medsci.cn/article/show_article.do?id=73743429933]]\\
Python post hoc package: [[https://github.com/maximtrp/scikit-posthocs]]\\

| Pairwise comparisons | LSD test, Bonferroni method, Tukey method, Scheffé method, SNK method |

==== Choosing a method ====

1. For continuous, normally distributed data with a linear relationship, the Pearson correlation coefficient is the most appropriate choice. The Spearman correlation coefficient can also be used, but it is less statistically efficient than Pearson.

2. If any of the conditions above is not met, use the Spearman correlation coefficient rather than Pearson.

3. For two ordinal variables, also use the Spearman correlation coefficient rather than Pearson.

Data analysed with Pearson must satisfy the following conditions: the observations are paired, continuous, and drawn from a normally distributed population.

In fact, the Spearman and Pearson coefficients are computed by exactly the same formula. Pearson computes the product-moment correlation on the original values, whereas Spearman computes it on the ranks of those values.

==== Criteria for selecting a method ====

| Are the data linear? | |
| Are the variances equal? | Satterthwaite; Wilcoxon |
| Is the distribution normal? | Bonferroni-corrected p-values; Wilcoxon test; Friedman |
| Multiple groups, completely randomized | Kruskal-Wallis |

Fisher's least significant difference (LSD) test\\
Student's t-test\\
Mann-Whitney U test\\
Regression analysis\\
Correlation\\
Pearson product-moment correlation coefficient\\
Spearman's rank correlation coefficient\\
Chi-square distribution

==== Writing ====

[[stat:tests:chiq]]
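
==== Example: Spearman as Pearson on ranks ====

The method-selection notes above state that Spearman's coefficient is the Pearson product-moment correlation computed on ranks. The following is a minimal sketch of that relationship, assuming NumPy and SciPy are installed. The sample data, the random seed, and the variable names are made up for illustration and are not part of the original notes.

```python
import numpy as np
from scipy import stats

# Hypothetical paired sample: y grows monotonically with x, but not linearly.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = np.exp(x) + rng.normal(scale=0.1, size=50)

# Pearson correlation on the original values.
r_pearson, p_pearson = stats.pearsonr(x, y)

# Spearman correlation as provided by SciPy.
rho, p_rho = stats.spearmanr(x, y)

# Spearman "by hand": rank each variable, then apply Pearson to the ranks.
rank_x = stats.rankdata(x)
rank_y = stats.rankdata(y)
rho_by_hand, _ = stats.pearsonr(rank_x, rank_y)

print(f"Pearson r on raw values: {r_pearson:.4f}")
print(f"Spearman rho (scipy):    {rho:.4f}")
print(f"Pearson r on ranks:      {rho_by_hand:.4f}")
```

The last two printed values should agree up to floating-point rounding, while Pearson on the raw values will generally differ because the relationship in this made-up sample is not linear.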