Adding significance to ggplot boxplots/violin plots | Add P-values and Significance Levels to ggplots | Quantitative analysis

October 11, 2022

Meaning of the significance symbols

Symbol	Meaning
ns	P > 0.05
*	P ≤ 0.05
**	P ≤ 0.01
***	P ≤ 0.001
****	P ≤ 0.0001

Reference: What is the meaning of * or ** or *** in reports of statistical significance from Prism or InStat?

Example wording from a paper: Expression of regional markers TOX3 (r5) and PRDX6 (r6) between HCO and HBSO. Wilcoxon rank sum test was applied. P-values < 0.0001 are marked by ****.
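If you run the tests yourself, the thresholds in this table can be turned into a small helper. A minimal base-R sketch; the name `p_to_signif` is hypothetical, and the cut points simply encode the table above, so each P-value gets the most stringent symbol it qualifies for:

```r
# Map P-values to the significance symbols in the table (hypothetical helper).
# Intervals are right-closed: P = 0.05 gets "*", P = 0.01 gets "**",
# P = 0.0001 gets "****"; NA stays NA.
p_to_signif <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 1e-4, 1e-3, 0.01, 0.05, 1),
                   labels = c("****", "***", "**", "*", "ns"),
                   right = TRUE, include.lowest = TRUE))
}

p_to_signif(c(0.2, 0.05, 0.004, 0.0005, 1e-6))
```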

October 7, 2022

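The higher the impact of the journal, the more rigorous the statistical treatment of the data needs to be. A few classic plots come up again and again:

boxplot: shows the quartiles out of the box; ideally add jitter so readers can see your individual data points
violin plot: very popular in single-cell work; shows the distribution of the data directly, and can be overlaid with a boxplot
linear regression (lm): for now we simply cannot handle non-linear regression

These classic analyses must be paired with a significance test, and the figure must show either the P-value or its corresponding symbol (*, **, ***, ns).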
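For the lm case, the P-value to show is usually the one for the slope. A minimal sketch, using the built-in ToothGrowth data purely as example data; the model `len ~ dose` and the label format are illustrative choices, not taken from the original analysis:

```r
# Fit a simple linear regression and extract the slope's P-value and R^2
# to build a text label for the plot. Example data only.
tg <- datasets::ToothGrowth                 # dose is numeric here
fit <- lm(len ~ dose, data = tg)
fit_summary <- summary(fit)

p_slope <- fit_summary$coefficients["dose", "Pr(>|t|)"]
r2 <- fit_summary$r.squared

label <- sprintf("R^2 = %.2f, P = %.2g", r2, p_slope)
label
```

The string could then be placed on the plot with ggplot2's `annotate("text", x = ..., y = ..., label = label)`.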

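Current main ways to add significance to a ggplot:

stat_compare_means from ggpubr, the most widely used approach.
- Pros: handles multi-group comparisons, can in principle annotate any number of group pairs, fast
- Cons: the data processing is built into the ggplot call, so you cannot customize it
stat_pvalue_manual, the second ggpubr function, suited to comparisons against a fixed reference group
- Pros: you can process the data however you like
- Cons: no comparisons argument, so multi-group comparisons are less flexible
Run the tests yourself, then annotate manually with geom_text
- Pros: fully customizable, anything goes, scales to very large batches of plots
- Cons: not suited to multi-group comparisons; best for two-group comparisons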

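Difficulties:

When the data contain too many zeros, all the scales stop working; the only option then is geom_jitter to show the difference [2022-10-08]
Two controls (as in the HSCR paper): the figure has to be edited afterwards in Inkscape, which is reasonably convenient.
facet_wrap supports a truly free y axis (scales = "free_y"), but stat_compare_means cannot keep up with the different y ranges of the panels, so stat_pvalue_manual is needed (reference: Data_center/analysis/HBSO_7Ala_Elly_2022_Jul/neuronal_subset.ipynb)
facet_grid cannot give each panel a truly free y axis (free_y only frees it per row); use facet_wrap, or stitch panels together with cowplot, instead
Alternatively, scale the data to 0-1 so that fixed parameters can be used for the labels (reference: EllyLab/mouse/singleCell/case/Vcl_ENCC/Vcl_ENCCs_aggregate_analysis.ipynb)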
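A minimal sketch of the 0-1 scaling idea, assuming long-format data where `variable` is the facet column; `rescale01` and the toy data are hypothetical. Each facet is scaled on its own, which is what lets one fixed label height work in every panel:

```r
# Min-max rescale a vector to 0-1; a constant vector maps to 0 (NAs kept).
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) return(replace(x, !is.na(x), 0))  # avoid 0/0
  (x - rng[1]) / diff(rng)
}

# Hypothetical long-format data: 'variable' is the facet, 'value' the measurement
long_df <- data.frame(
  variable = rep(c("geneA", "geneB"), each = 4),
  value    = c(0, 2, 4, 8, 100, 150, 200, 300)
)

# ave() applies rescale01 within each facet and returns values in the original row order
long_df$value01 <- ave(long_df$value, long_df$variable, FUN = rescale01)
long_df
```

You would then plot `value01` instead of `value` and place labels at a fixed y such as 1.05; keep in mind the y axis now shows relative rather than original units.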

For method 1, see the article referenced below.

Method 2:

Reference: Data_center/analysis/HBSO_7Ala_Elly_2022_Jul/Seurat_integration_all.ipynb
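Because stat_pvalue_manual has no comparisons argument, one workaround is to build the comparison table yourself. A sketch in base R, assuming the table needs `group1`, `group2`, a label column and `y.position` (check the ggpubr documentation for your version); `pvalue_table` is a hypothetical helper, and the Wilcoxon test and bracket spacing are illustrative choices:

```r
# Build a stat_pvalue_manual()-style table from an explicit list of comparisons.
pvalue_table <- function(df, value, group, comparisons, step = 0.08) {
  grp  <- as.character(df[[group]])
  vals <- df[[value]]
  # One Wilcoxon rank sum test per requested pair
  p <- vapply(comparisons, function(cmp) {
    wilcox.test(vals[grp == cmp[1]], vals[grp == cmp[2]], exact = FALSE)$p.value
  }, numeric(1))
  top  <- max(vals, na.rm = TRUE)
  span <- diff(range(vals, na.rm = TRUE))
  data.frame(
    group1     = vapply(comparisons, `[`, character(1), 1),
    group2     = vapply(comparisons, `[`, character(1), 2),
    p          = p,
    p.signif   = as.character(cut(p, breaks = c(0, 1e-4, 1e-3, 0.01, 0.05, 1),
                                  labels = c("****", "***", "**", "*", "ns"),
                                  include.lowest = TRUE)),
    # Stack the brackets above the tallest point, one step per comparison
    y.position = top + span * step * seq_along(comparisons),
    stringsAsFactors = FALSE
  )
}

tg <- datasets::ToothGrowth
stat_tab <- pvalue_table(tg, "len", "dose",
                         list(c("0.5", "1"), c("1", "2"), c("0.5", "2")))
stat_tab
```

The table could then be passed to something like `ggboxplot(tg, x = "dose", y = "len") + stat_pvalue_manual(stat_tab, label = "p.signif")`, with `dose` converted to a factor first so the group names match the x axis.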

Method 3:

Real-world example code: EllyLab/mouse/singleCell/case/Vcl_ENCC/Vcl_ENCCs_aggregate_analysis.ipynb [to be wrapped into a function]

Reference: Add P-values and Significance Levels to ggplots

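ggpubr is fairly limited, and so is the set of tests it offers (stat_compare_means supports t.test, wilcox.test, anova and kruskal.test), but it is quick and easy to use.

When your situation is unusual and ggpubr cannot be used, run the significance test yourself and then display the result on the plot.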

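# Show labels in a faceted plot: one paired t-test per facet.
# paired_list: named list of long-format data frames (columns value, group), one per facet,
# with rows ordered so that the i-th values of the two groups form a pair.
dat_text <- data.frame()
for (i in names(paired_list)) {
    # Compute a paired t-test; recent R releases reject paired = TRUE in the
    # formula interface, so pass the two groups as separate vectors instead
    vals <- split(paired_list[[i]]$value, paired_list[[i]]$group, drop = TRUE)
    res <- t.test(vals[[1]], vals[[2]], paired = TRUE)
    dat_text <- rbind(dat_text, data.frame(variable=i, pvalue=res$p.value))
}

# Label: "P=..." by default, "* P=..." below 0.05, "** P<0.01" below 0.01.
# ifelse() also works when no facet falls into a given P-value range.
dat_text$label <- ifelse(dat_text$pvalue < 0.01, "** P<0.01",
                  ifelse(dat_text$pvalue < 0.05,
                         paste("*", paste0("P=", round(dat_text$pvalue, 3))),
                         paste0("P=", round(dat_text$pvalue, 3))))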

library(ggplot2)
options(repr.plot.width=8, repr.plot.height=12) # plot size in Jupyter: 8 x 12
g2 <- ggplot(data=genes_expr_melt, aes(x=pseudotime, y=value, fill=group, color=group)) +
    geom_point(size=0.01, alpha=0.5) +
    labs(x = "Pseudotime", y = "Relative expression", title = "Neuronal lineage") +
    geom_smooth(method = 'loess', se=FALSE, size=0.15, span = 0.7) +
    facet_wrap( ~ variable, ncol=3, labeller = label_context, scales = "free_y") +
    # One label per facet in the top-right corner; inherit.aes = FALSE keeps this layer
    # from looking for the pseudotime/group columns, which dat_text does not have
    geom_text(data = dat_text, mapping = aes(x = Inf, y = Inf, label = label),
              inherit.aes = FALSE, size = 5, hjust = 1.05, vjust = 1.5,
              color = ifelse(dat_text$pvalue < 0.05, 'red', 'black')) +
    # themes
    theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
          panel.spacing = unit(.4, "lines"),
          panel.border = element_rect(color = "black", fill = NA, size = 0.5)) +
    theme(axis.text.x = element_text(face="plain", angle=0, size = 10, color = "black", vjust=0.5),
          axis.text.y = element_text(face="plain", size = 10, color = "black"),
          axis.title = element_text(size = 15)) +
    theme(strip.background = element_rect(fill = "gray90", color = NA)) +
    # theme(legend.position = "none") + # uncomment to hide the legend
    theme(strip.placement = "outside",
          strip.text.x = element_text(face="plain", size = 13, margin = margin(1,0,1,0, "mm")),
          strip.text.y = element_text(face="plain", size = 11)) +
    scale_color_manual(values=c("deepskyblue","red","gray50")) +
    scale_fill_manual(values=c("deepskyblue","red","gray50"))
plot(g2)
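The per-facet loop is the part slated to be wrapped into a function. One possible shape, as a sketch: `facet_pvalues` is a hypothetical name, it assumes exactly two groups per facet, and for paired tests it assumes rows are already ordered so the i-th values of the two groups belong together. It uses the default `t.test(x, y, paired = TRUE)` interface rather than a formula:

```r
# Hypothetical wrapper: one t-test per facet, returning the columns geom_text reads.
facet_pvalues <- function(df, facet = "variable", value = "value",
                          group = "group", paired = TRUE) {
  rows <- lapply(split(df, df[[facet]]), function(d) {
    vals <- split(d[[value]], d[[group]], drop = TRUE)  # one vector per group
    if (length(vals) != 2) stop("each facet needs exactly two groups")
    p <- t.test(vals[[1]], vals[[2]], paired = paired)$p.value
    setNames(data.frame(d[[facet]][1], p), c(facet, "pvalue"))
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  # Same labelling rule as the loop: ** below 0.01, * below 0.05
  out$label <- ifelse(out$pvalue < 0.01, "** P<0.01",
               ifelse(out$pvalue < 0.05, paste0("* P=", round(out$pvalue, 3)),
                                         paste0("P=", round(out$pvalue, 3))))
  out
}

# Example: built-in 'sleep' data (10 subjects, 2 paired conditions),
# duplicated into two fake facets "A" and "B"
two_facets <- rbind(transform(datasets::sleep, variable = "A"),
                    transform(datasets::sleep, variable = "B"))
dat_text <- facet_pvalues(two_facets, value = "extra")
dat_text
```

The returned data frame has the `variable`, `pvalue` and `label` columns that the geom_text layer reads.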

Multi-group comparisons: pick the pairs of interest and show their significance.

library(ggpubr)   # also attaches ggplot2

data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)   # dose levels "0.5", "1", "2"
head(ToothGrowth)

my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )
options(repr.plot.width=4, repr.plot.height=4)
ggplot(ToothGrowth, aes(x=dose, y=len, fill=dose)) +
  geom_boxplot(size=0.01, outlier.shape = NA) +   # hide outliers; the jitter shows every point
  geom_jitter(width = 0.3, size=0.01) +
  stat_compare_means(comparisons = my_comparisons) +  # Add pairwise comparisons p-value
  stat_compare_means(label.y = 50, label.x = 1.5)     # Add global p-value (Kruskal-Wallis by default)
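To report the exact numbers (for example in the figure legend), you can compute unadjusted pairwise Wilcoxon P-values for the three dose pairs directly. A sketch with base R's pairwise.wilcox.test; the values should be close to what the brackets show, though small differences can come from exact versus normal-approximation settings:

```r
# Unadjusted pairwise Wilcoxon rank sum tests between all dose levels.
# exact = FALSE is passed on to wilcox.test() (normal approximation, avoids tie warnings).
tg <- datasets::ToothGrowth
res <- pairwise.wilcox.test(tg$len, tg$dose,
                            p.adjust.method = "none", exact = FALSE)
res$p.value   # lower-triangular matrix: rows "1", "2"; columns "0.5", "1"
```

With three comparisons, setting `p.adjust.method` to `"holm"` or `"BH"` applies a multiple-testing correction instead.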

You can also set a reference group and show the significance of each group against it; only the settings need to change:

# p: an existing boxplot (hypothetical name) whose x groups include the reference "hNP-D20"
p +
  stat_compare_means(method = "anova", label.y = 1.3, label.x = 3) +   # Add global p-value (ANOVA)
  stat_compare_means(label = "p.signif", method = "t.test",
                     ref.group = "hNP-D20", label.y = 1.1)              # Each group vs. the reference group
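Base R can produce the corresponding numbers for a reference-group comparison. A sketch using ToothGrowth with dose 0.5 as the reference (the original uses its own data with `hNP-D20`); `pool.sd = FALSE` gives separate Welch t-tests, in the spirit of `method = "t.test"`:

```r
# Welch t-tests of every group against the reference group "0.5", unadjusted.
tg <- datasets::ToothGrowth
res_t <- pairwise.t.test(tg$len, tg$dose,
                         pool.sd = FALSE, p.adjust.method = "none")

# The first column holds every group vs. the first factor level;
# relevel() the grouping factor first if your reference is not the first level.
vs_ref <- res_t$p.value[, "0.5"]
vs_ref
```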

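For biologists, I strongly recommend the statistics articles on Y叔's (Guangchuang Yu's) WeChat official account; they are very basic and practical.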

Statistics

Code example:

library(ggplot2)
library(ggpubr)        # stat_compare_means()
library(RColorBrewer)  # brewer.pal()

options(repr.plot.width=7, repr.plot.height=6)
# facet boxplot: one panel per cluster x gene, Wilcoxon test between groups within each panel
bp <- ggplot(expr_data2, aes(x=group, y=expression, fill=NA)) + 
  geom_boxplot(size=0.01, outlier.shape = NA) + 
  geom_jitter(width = 0.3, size=0.01, aes(color=cluster)) +
  facet_grid( cluster ~ gene, switch="y") + # , scales = "free"
  theme_bw() + 
  stat_compare_means(aes(group = group, label = ..p.signif..), label.x = 1.3, label.y = 1.3, 
                     method = "wilcox.test", hide.ns = TRUE) + # label = "p.format" shows the P-value instead
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  labs(x = "", y = "", title = "") +
  theme(panel.spacing=unit(.3, "lines"),panel.border = element_rect(color = "black", fill = NA, size = 0.2)) +
  theme(axis.ticks.x = element_blank(), axis.ticks = element_line(size = 0.1), 
        axis.text.x  = element_text(face="plain", angle=90, size = 8, color = "black", vjust=0.5),
        axis.text.y  = element_text(face="plain", size = 4, color = "black"),
        axis.title =element_text(size = 12)) +
  theme(strip.background = element_rect(fill = "gray97", color = NA))+
  theme(legend.position = "none") +
  theme(strip.placement = "outside", strip.text.x = element_text(face="italic", size = 11),
          strip.text.y = element_text(face="plain", size = 11)) +
  scale_y_continuous(position="right", limits = c(-0.5,1.5)) +
  scale_fill_manual(values=brewer.pal(8,"Set2")[c(2,3,7,1,5,6)]) +
  scale_color_manual(values=brewer.pal(8,"Set2")[c(2,3,7,1,5,6)])
bp

posted @ 2019-05-31 15:51 Life·Intelligence


Digital-LI
