ggplot的boxplot/violin plot添加显著性 | Add P-values and Significance Levels to ggplots | 定量分析
2022年10月11日
显著性符号的意义
Symbol | Meaning |
ns | P > 0.05 |
* | P ≤ 0.05 |
** | P ≤ 0.01 |
*** | P ≤ 0.001 |
**** | P ≤ 0.0001 |
参考:What is the meaning of * or ** or *** in reports of statistical significance from Prism or InStat?
文章里的描述:Expression of regional markers TOX3 (r5) and PRDX6 (r6) between HCO and HBSO. Wilcoxon rank sum test was applied. P-values < 0.0001 is marked by ****.
2022年10月07日
越高分的文章,对数据的严格的统计处理就越重要,存在一些经典必用的图:
- boxplot,自带四分位信息,最好加上jitter让人看到你的数据点
- violin plot,在单细胞里很火,可以直接看到数据的分布,可以叠加boxplot使用
- 线性拟合回归,lm,我们目前绝对无法handle非线性的回归
这些经典分析必须搭配显著性测试,必须在图里显示P-value,或者P-value对应的符号(*、**、***、NS)。
目前在ggplot里添加显著性的主要方法:
- ggpubr的stat_compare_means,也是最为主流的方法。
- 优点:可以应对多组比对,理论上可以无限标注分组,快速
- 缺点:数据处理整合在ggplot函数里,无法自定义处理数据
- ggpubr的第二个函数stat_pvalue_manual,适合有一个固定的ref
- 优点:可以自定义处理数据
- 缺点:没有comparisons参数,多组比较不够自如
- 自己检验,然后用geom_text手动标注
- 优点:完全自定义,随心所欲,超批量作图
- 缺点:不适合多组比对,适合两组对比
难点:
- 在数据零值过多时,所有scale都失效了,此时只能用geom_jitter来显示差异【2022-10-08】
- 两个control,HSCR paper里,必须后期用inkscape来修改,也算方便。
- facet_wrap可以做到真正的free_y,但stat_compare_means跟不上,所以得用stat_pvalue_manual(参考:Data_center/analysis/HBSO_7Ala_Elly_2022_Jul/neuronal_subset.ipynb)
- facet_grid里无法做到真正的free_y,可以用facet_wrap和cowplot的拼图来实现
- 也可以将数据scale到0-1,这样label时就可以使用固定参数(参考:EllyLab/mouse/singleCell/case/Vcl_ENCC/Vcl_ENCCs_aggregate_analysis.ipynb)
方法一的方法参考下面的文章
方法二:
- Add Manually P-values to a ggplot
- How to Add p-values onto ggplot
- T-test() does not add adjusted p-values
参考:Data_center/analysis/HBSO_7Ala_Elly_2022_Jul/Seurat_integration_all.ipynb
方法三:
现实案例代码:EllyLab/mouse/singleCell/case/Vcl_ENCC/Vcl_ENCCs_aggregate_analysis.ipynb【会封装为函数】
参考:Add P-values and Significance Levels to ggplots
ggpubr的包比较局限,能用的test也比较局限,但是做起来快速简单。
当情况特殊时ggpubr就不能用了,可以自己做了显著性test之后再显示在图上。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | # show lable in facet grid plot dat_text <- data.frame () for (i in names (paired_list)) { # Compute t-test res <- t.test (value ~ group, data = paired_list[[i]], paired = TRUE ) dat_text <- rbind (dat_text, data.frame (variable=i, pvalue=res$p.value)) } dat_text$label <- paste ( "P" , round (dat_text$pvalue, 3), sep= "=" ) dat_text[dat_text$pvalue<0.05 & dat_text$pvalue>0.01,]$label <- paste ( "*" , dat_text[dat_text$pvalue<0.05 & dat_text$pvalue>0.01,]$label, sep= " " ) dat_text[dat_text$pvalue<0.01,]$label <- paste ( "**" , "P<0.01" , sep= " " ) library (ggplot2) options (repr.plot.width=8, repr.plot.height=12) # 8x8 g2 <- ggplot (data=genes_expr_melt, aes (x=pseudotime, y=value, fill=group, color=group)) + geom_point (size=0.01, alpha=0.5, aes (color=group, fill=group)) + labs (x = "Pseudotime" , y = "Relative expression" , title = "Neuronal lineage" ) + geom_smooth (method = 'loess' ,se=F,size=0.15,span = 0.7) + # ,alpha=0.05, weight=0.1, facet_wrap ( ~ variable, ncol=3, labeller = label_context, scales = "free_y" ) + # geom_text (size = 5, data = dat_text, mapping = aes (x = Inf , y = Inf , label = label), hjust = 1.05,vjust = 1.5, color= ifelse (dat_text$pvalue < 0.05, 'red' , 'black' )) + # themes theme (strip.background = element_blank (), panel.grid.major = element_blank (), panel.grid.minor = element_blank (), panel.spacing= unit (.4, "lines" ),panel.border = element_rect (color = "black" , fill = NA , size = 0.5))+ theme (axis.text.x = element_text (face= "plain" , angle=0, size = 10, color = "black" , vjust=0.5), axis.text.y = element_text (face= "plain" , size = 10, color = "black" ), axis.title = element_text (size = 15)) + theme (strip.background = element_rect (fill = "gray90" , color = NA ))+ # theme(legend.position = "none") + # must remove legend theme (strip.placement = "outside" , strip.text.x = element_text (face= "plain" , size = 13), strip.text.y = element_text (face= "plain" , size = 11)) + theme (strip.text.x = element_text (margin = margin (1,0,1,0, "mm" ))) + scale_color_manual (values= c ( "deepskyblue" , "red" , "gray50" )) + scale_fill_manual (values= c ( "deepskyblue" , "red" , "gray50" )) plot (g2) # } |
多组比较,挑选感兴趣的显示显著性。
1 2 | data ( "ToothGrowth" ) head (ToothGrowth) |
1 2 3 4 5 6 7 8 | library (ggpubr) my_comparisons <- list ( c ( "0.5" , "1" ), c ( "1" , "2" ), c ( "0.5" , "2" ) ) options (repr.plot.width=4, repr.plot.height=4) ggplot (ToothGrowth, aes (x= as.character (dose), y=len, fill=dose)) + geom_boxplot (outlier.size= NA , size=0.01, outlier.shape = NA ) + geom_jitter (width = 0.3, size=0.01) + # , aes(color=supp) + stat_compare_means (comparisons = my_comparisons)+ # Add pairwise comparisons p-value stat_compare_means (label.y = 50, label.x = 1.5) # Add global p-value |
还可以设定一个ref group来显示显著性差异,只需要改一下设定。
1 2 3 | stat_compare_means (method = "anova" , label.y = 1.3, label.x = 3)+ # Add pairwise comparisons p-value # # Add global p-value stat_compare_means (label = "p.signif" , method = "t.test" , ref.group = "hNP-D20" , label.y = 1.1) + |
生物学的强烈推荐看看Y叔的公众号里的统计相关的文章,非常的基础和实用。
统计
- Five things biologists should know about statistics
- 什么是T检验
- 富集基因之注释缺失
- 落入窠臼
- 你昨天才做的分析,可能是几年前的结果!
- 掐架的额外收获
- boxplot
- 如何告别单身
- 主成分分析
- 一文解决RT-PCR的统计分析
代码例子:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | options (repr.plot.width=7, repr.plot.height=6) # facet boxplot bp <- ggplot (expr_data2, aes (x=group, y=expression, fill= NA )) + geom_boxplot (outlier.size= NA , size=0.01, outlier.shape = NA ) + geom_jitter (width = 0.3, size=0.01, aes (color=cluster)) + # + geom_boxplot( + facet_grid ( cluster ~ gene, switch= "y" ) + # , scales = "free" theme_bw () + stat_compare_means ( aes (group = group, label = ..p.signif..), label.x = 1.3,label.y = 1.3, method = "wilcox.test" , hide.ns = T) + # label = "p.format", theme (panel.grid.major = element_blank (), panel.grid.minor = element_blank ()) + labs (x = "" , y = "" , title = "" ) + theme (panel.spacing= unit (.3, "lines" ),panel.border = element_rect (color = "black" , fill = NA , size = 0.2)) + theme (axis.ticks.x = element_blank (), axis.ticks = element_line (size = 0.1), axis.text.x = element_text (face= "plain" , angle=90, size = 8, color = "black" , vjust=0.5), axis.text.y = element_text (face= "plain" , size = 4, color = "black" ), axis.title = element_text (size = 12)) + theme (strip.background = element_rect (fill = "gray97" , color = NA ))+ theme (legend.position = "none" ) + theme (strip.placement = "outside" , strip.text.x = element_text (face= "italic" , size = 11), strip.text.y = element_text (face= "plain" , size = 11)) + scale_y_continuous (position= "right" , limits = c (-0.5,1.5)) + scale_fill_manual (values= brewer.pal (8, "Set2" )[ c (2,3,7,1,5,6)]) + scale_color_manual (values= brewer.pal (8, "Set2" )[ c (2,3,7,1,5,6)]) bp |
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· AI与.NET技术实操系列(二):开始使用ML.NET
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
· 理解Rust引用及其生命周期标识(上)
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
· 全程不用写代码,我用AI程序员写了一个飞机大战
· DeepSeek 开源周回顾「GitHub 热点速览」
· 记一次.NET内存居高不下排查解决与启示
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· .NET10 - 预览版1新功能体验(一)
2018-05-31 R quantile函数 | cut函数 | sample函数 | all函数 | scale函数 | do.call函数
2018-05-31 Hypergeometric distribution