Highly variable gene selection | feature selection
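In single-cell analysis, many genes are noise: their expression either varies without any pattern or does not change significantly. These genes need to be removed before downstream analysis.
The following is one method for identifying highly variable genes: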
The feature selection procedure is based on the largest difference between the observed coefficient of variation (CV) and the predicted CV (estimated by a non-linear noise model learned from the data); see Figure S1C. In particular, Support Vector Regression (SVR, Smola and Vapnik, 1997) was used for this purpose (scikit-learn Python implementation, default parameters with gamma = 0.06; Pedregosa et al., 2011).
```python
import numpy as np
import matplotlib.pyplot as plt

# Pre-filtering: keep genes (rows) detected at sufficient levels in enough cells
df_f = df_merge.copy()
df_f = df_f.loc[(df_f >= 1).sum(axis=1) >= 5, :]  # value >= 1 in at least 5 cells
df_f = df_f.loc[(df_f >= 2).sum(axis=1) >= 2, :]  # value >= 2 in at least 2 cells
df_f = df_f.loc[(df_f >= 3).sum(axis=1) >= 1, :]  # value >= 3 in at least 1 cell

# Fitting: per-gene mean, standard deviation and CV across cells
mu = df_f.mean(axis=1).values
sigma = df_f.std(axis=1, ddof=1).values
cv = sigma / mu
# fit_CV is not defined in this post; it must be provided separately.
# Note: svr_gamma is 0.005 here, while the quoted methods text reports gamma = 0.06.
score, mu_linspace, cv_fit, params = fit_CV(mu, cv, 'SVR', svr_gamma=0.005)

# Plotting
def plot_cvmean():
    # thrs (number of top-scoring genes to highlight) must be defined beforehand
    plt.figure()
    plt.scatter(np.log2(mu), np.log2(cv), marker='o', edgecolor='none', alpha=0.1, s=5)
    order = np.argsort(score)[::-1]  # gene positions, highest score first
    mu_sorted = mu[order]
    cv_sorted = cv[order]
    plt.scatter(np.log2(mu_sorted[:thrs]), np.log2(cv_sorted[:thrs]),
                marker='o', edgecolor='none', alpha=0.15, s=8, c='r')
    plt.plot(mu_linspace, cv_fit, '-k', linewidth=1, label='$Fit$')
    # Poisson noise: CV = mean ** -0.5, so log2 CV = -0.5 * log2 mean
    plt.plot(np.linspace(-9, 7), -0.5 * np.linspace(-9, 7), '-r', label='$Poisson$')
    plt.ylabel('log2 CV')
    plt.xlabel('log2 mean')
    plt.grid(alpha=0.3)
    plt.xlim(-8.6, 6.5)
    plt.ylim(-2, 6.5)
    plt.legend(loc=1, fontsize='small')
    plt.gca().set_aspect(1.2)

plot_cvmean()

# Adjusting plot
```
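For each gene, the mean and the CV of its expression across cells are drawn as a scatter plot, and the noise curve is fitted to these points with SVR.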
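Since `fit_CV` itself is not shown in this post, here is a minimal sketch of what such a fit could look like. It is an assumption, not the original helper: the name `fit_cv_svr`, the choice to fit in log2 space, the definition of the score as observed minus predicted log2 CV, and the toy data are all made up for illustration. Only `sklearn.svm.SVR` with a `gamma` setting comes from the quoted method.

```python
import numpy as np
from sklearn.svm import SVR


def fit_cv_svr(mu, cv, svr_gamma=0.06, n_points=100):
    """Hypothetical stand-in for fit_CV: fit log2(CV) vs log2(mean) with an RBF SVR.

    Returns (score, mu_linspace, cv_fit), all in log2 space, where score is
    the observed log2 CV minus the log2 CV predicted by the fitted noise model.
    """
    log_mu = np.log2(mu)
    log_cv = np.log2(cv)
    model = SVR(gamma=svr_gamma)  # other parameters left at scikit-learn defaults
    model.fit(log_mu[:, np.newaxis], log_cv)
    predicted = model.predict(log_mu[:, np.newaxis])
    score = log_cv - predicted  # large positive score = more variable than expected
    mu_linspace = np.linspace(log_mu.min(), log_mu.max(), n_points)
    cv_fit = model.predict(mu_linspace[:, np.newaxis])
    return score, mu_linspace, cv_fit


# Toy data (assumed): Poisson-like noise (CV = mu ** -0.5) plus a few inflated genes.
rng = np.random.default_rng(0)
mu = 2.0 ** rng.uniform(-4, 4, size=500)
cv = mu ** -0.5 * 2.0 ** rng.normal(0, 0.1, size=500)
cv[:10] *= 4.0  # pretend the first 10 genes are highly variable

score, mu_linspace, cv_fit = fit_cv_svr(mu, cv)
```

On this toy input the inflated genes should end up with clearly higher scores than the rest, because the SVR tracks the bulk of the points and leaves the outliers well above the fitted curve.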
The highly variable genes are then the ones with the largest difference between the observed coefficient of variation (CV) and the predicted CV (estimated by a non-linear noise model learned from the data).
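The selection step itself can be sketched as ranking genes by score and keeping the top ones. The gene names, the scores, and the cutoff `n_top` below are made-up illustrative values; in the snippet above the cutoff plays the role of `thrs`.

```python
import numpy as np
import pandas as pd

# Toy stand-ins: 6 genes with made-up scores (observed minus predicted CV).
genes = pd.Index(["g1", "g2", "g3", "g4", "g5", "g6"])
score = np.array([0.1, 1.5, -0.3, 0.9, 0.0, 2.2])

n_top = 3  # hypothetical cutoff, analogous to `thrs` in the plotting code
order = np.argsort(score)[::-1]        # gene positions, highest score first
selected_genes = genes[order[:n_top]]  # labels of the top-scoring genes

print(list(selected_genes))  # ['g6', 'g2', 'g4']
```

With real data, the same labels could be used to subset the filtered expression matrix, for example `df_f.loc[selected_genes]`, assuming genes are the rows as in the pre-filtering step.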