ggplot的boxplot/violin plot添加显著性 | Add P-values and Significance Levels to ggplots | 定量分析
2022年10月11日
显著性符号的意义
Symbol | Meaning |
ns | P > 0.05 |
* | P ≤ 0.05 |
** | P ≤ 0.01 |
*** | P ≤ 0.001 |
**** | P ≤ 0.0001 |
参考:What is the meaning of * or ** or *** in reports of statistical significance from Prism or InStat?
文章里的描述:Expression of regional markers TOX3 (r5) and PRDX6 (r6) between HCO and HBSO. Wilcoxon rank sum test was applied. P-values < 0.0001 is marked by ****.
2022年10月07日
越高分的文章,对数据的严格的统计处理就越重要,存在一些经典必用的图:
- boxplot,自带四分位信息,最好加上jitter让人看到你的数据点
- violin plot,在单细胞里很火,可以直接看到数据的分布,可以叠加boxplot使用
- 线性拟合回归,lm,我们目前绝对无法handle非线性的回归
这些经典分析必须搭配显著性测试,必须在图里显示P-value,或者P-value对应的符号(*、**、***、NS)。
目前在ggplot里添加显著性的主要方法:
- ggpubr的stat_compare_means,也是最为主流的方法。
- 优点:可以应对多组比对,理论上可以无限标注分组,快速
- 缺点:数据处理整合在ggplot函数里,无法自定义处理数据
- ggpubr的第二个函数stat_pvalue_manual,适合有一个固定的ref
- 优点:可以自定义处理数据
- 缺点:没有comparisons参数,多组比较不够自如
- 自己检验,然后用geom_text手动标注
- 优点:完全自定义,随心所欲,超批量作图
- 缺点:不适合多组比对,适合两组对比
难点:
- 在数据零值过多时,所有scale都失效了,此时只能用geom_jitter来显示差异【2022-10-08】
- 两个control,HSCR paper里,必须后期用inkscape来修改,也算方便。
- facet_wrap可以做到真正的free_y,但stat_compare_means跟不上,所以得用stat_pvalue_manual(参考:Data_center/analysis/HBSO_7Ala_Elly_2022_Jul/neuronal_subset.ipynb)
- facet_grid里无法做到真正的free_y,可以用facet_wrap和cowplot的拼图来实现
- 也可以将数据scale到0-1,这样label时就可以使用固定参数(参考:EllyLab/mouse/singleCell/case/Vcl_ENCC/Vcl_ENCCs_aggregate_analysis.ipynb)
方法一的方法参考下面的文章
方法二:
- Add Manually P-values to a ggplot
- How to Add p-values onto ggplot
- T-test() does not add adjusted p-values
参考:Data_center/analysis/HBSO_7Ala_Elly_2022_Jul/Seurat_integration_all.ipynb
方法三:
现实案例代码:EllyLab/mouse/singleCell/case/Vcl_ENCC/Vcl_ENCCs_aggregate_analysis.ipynb【会封装为函数】
参考:Add P-values and Significance Levels to ggplots
ggpubr的包比较局限,能用的test也比较局限,但是做起来快速简单。
当情况特殊时ggpubr就不能用了,可以自己做了显著性test之后再显示在图上。
# show lable in facet grid plot dat_text <- data.frame() for (i in names(paired_list)) { # Compute t-test res <- t.test(value ~ group, data = paired_list[[i]], paired = TRUE) dat_text <- rbind(dat_text, data.frame(variable=i, pvalue=res$p.value)) } dat_text$label <- paste("P", round(dat_text$pvalue, 3), sep="=") dat_text[dat_text$pvalue<0.05 & dat_text$pvalue>0.01,]$label <- paste("*", dat_text[dat_text$pvalue<0.05 & dat_text$pvalue>0.01,]$label, sep=" ") dat_text[dat_text$pvalue<0.01,]$label <- paste("**", "P<0.01", sep=" ") library(ggplot2) options(repr.plot.width=8, repr.plot.height=12) # 8x8 g2 <- ggplot(data=genes_expr_melt, aes(x=pseudotime, y=value, fill=group, color=group)) + geom_point(size=0.01, alpha=0.5, aes(color=group, fill=group)) + labs(x = "Pseudotime", y = "Relative expression", title = "Neuronal lineage") + geom_smooth(method = 'loess',se=F,size=0.15,span = 0.7) + # ,alpha=0.05, weight=0.1, facet_wrap( ~ variable, ncol=3, labeller = label_context, scales = "free_y") + # geom_text(size = 5, data = dat_text, mapping = aes(x = Inf, y = Inf, label = label), hjust = 1.05,vjust = 1.5, color=ifelse(dat_text$pvalue < 0.05,'red','black')) + # themes theme(strip.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.spacing=unit(.4, "lines"),panel.border = element_rect(color = "black", fill = NA, size = 0.5))+ theme(axis.text.x = element_text(face="plain", angle=0, size = 10, color = "black", vjust=0.5), axis.text.y = element_text(face="plain", size = 10, color = "black"), axis.title =element_text(size = 15)) + theme(strip.background = element_rect(fill = "gray90", color = NA))+ # theme(legend.position = "none") + # must remove legend theme(strip.placement = "outside", strip.text.x = element_text(face="plain", size = 13), strip.text.y = element_text(face="plain", size = 11)) + theme(strip.text.x = element_text(margin = margin(1,0,1,0, "mm"))) + scale_color_manual(values=c("deepskyblue","red","gray50")) + scale_fill_manual(values=c("deepskyblue","red","gray50")) plot(g2) # }
多组比较,挑选感兴趣的显示显著性。
data("ToothGrowth") head(ToothGrowth)
library(ggpubr) my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") ) options(repr.plot.width=4, repr.plot.height=4) ggplot(ToothGrowth, aes(x=as.character(dose), y=len, fill=dose)) + geom_boxplot(outlier.size=NA, size=0.01, outlier.shape = NA) + geom_jitter(width = 0.3, size=0.01) +# , aes(color=supp) + stat_compare_means(comparisons = my_comparisons)+ # Add pairwise comparisons p-value stat_compare_means(label.y = 50, label.x = 1.5) # Add global p-value
还可以设定一个ref group来显示显著性差异,只需要改一下设定。
stat_compare_means(method = "anova", label.y = 1.3, label.x = 3)+ # Add pairwise comparisons p-value # # Add global p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = "hNP-D20", label.y = 1.1) +
生物学的强烈推荐看看Y叔的公众号里的统计相关的文章,非常的基础和实用。
统计
- Five things biologists should know about statistics
- 什么是T检验
- 富集基因之注释缺失
- 落入窠臼
- 你昨天才做的分析,可能是几年前的结果!
- 掐架的额外收获
- boxplot
- 如何告别单身
- 主成分分析
- 一文解决RT-PCR的统计分析
代码例子:
options(repr.plot.width=7, repr.plot.height=6) # facet boxplot bp <- ggplot(expr_data2, aes(x=group, y=expression, fill=NA)) + geom_boxplot(outlier.size=NA, size=0.01, outlier.shape = NA) + geom_jitter(width = 0.3, size=0.01, aes(color=cluster)) + # + geom_boxplot( + facet_grid( cluster ~ gene, switch="y") + # , scales = "free" theme_bw() + stat_compare_means(aes(group = group, label = ..p.signif..), label.x = 1.3,label.y = 1.3, method = "wilcox.test", hide.ns = T) + # label = "p.format", theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + labs(x = "", y = "", title = "") + theme(panel.spacing=unit(.3, "lines"),panel.border = element_rect(color = "black", fill = NA, size = 0.2)) + theme(axis.ticks.x = element_blank(), axis.ticks = element_line(size = 0.1), axis.text.x = element_text(face="plain", angle=90, size = 8, color = "black", vjust=0.5), axis.text.y = element_text(face="plain", size = 4, color = "black"), axis.title =element_text(size = 12)) + theme(strip.background = element_rect(fill = "gray97", color = NA))+ theme(legend.position = "none") + theme(strip.placement = "outside", strip.text.x = element_text(face="italic", size = 11), strip.text.y = element_text(face="plain", size = 11)) + scale_y_continuous(position="right", limits = c(-0.5,1.5)) + scale_fill_manual(values=brewer.pal(8,"Set2")[c(2,3,7,1,5,6)]) + scale_color_manual(values=brewer.pal(8,"Set2")[c(2,3,7,1,5,6)]) bp