causal SNPs | causal variants | TensorFlow | Neural Networks in Practice | Data Simulation

Start by reading a few papers:

Interpretation of Association Signals and Identification of Causal Variants from Genome-wide Association Studies

GWAS have been successful in identifying disease susceptibility loci, but it remains a challenge to pinpoint the causal variants in subsequent fine-mapping studies. A conventional fine-mapping effort starts by sequencing dozens of randomly selected samples at susceptibility loci to discover candidate variants, which are then placed on custom arrays or used in imputation algorithms to find the causal variants. We propose that one or several rare or low-frequency causal variants can hitchhike the same common tag SNP, so causal variants may not be easily unveiled by conventional efforts. Here, we first demonstrate that the true effect size and proportion of variance explained by a collection of rare causal variants can be underestimated by a common tag SNP, thereby accounting for some of the “missing heritability” in GWAS. We then describe a case-selection approach based on phasing long-range haplotypes and sequencing cases predicted to harbor causal variants. We compare this approach with conventional strategies on a simulated data set, and we demonstrate its advantages when multiple causal variants are present. We also evaluate this approach in a GWAS on hearing loss, where the most common causal variant has a minor allele frequency (MAF) of 1.3% in the general population and 8.2% in 329 cases. With our case-selection approach, it is present in 88% of the 32 selected cases (MAF = 66%), so sequencing a subset of these cases can readily reveal the causal allele. Our results suggest that thinking beyond common variants is essential in interpreting GWAS signals and identifying causal variants.
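
To make the abstract's "hitchhiking" argument concrete, here is a minimal simulation sketch. It is not the paper's method: the allele frequencies, the effect size, and the rule that the rare causal allele only ever occurs on haplotypes carrying the common tag allele are all assumptions chosen for illustration. It compares the regression slope and the variance explained at the tag SNP with those at the causal variant itself.

```python
import random


def slope_and_r2(x, y):
    """Simple linear regression of y on x: return (slope, r squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


def simulate(n=20000, tag_freq=0.3, causal_given_tag=0.05, effect=1.0, seed=1):
    """Simulate tag genotypes, causal genotypes and a quantitative phenotype.

    Assumption (illustrative only): the rare causal allele arises solely on
    haplotypes that carry the common tag allele.
    """
    rng = random.Random(seed)
    tag, causal, pheno = [], [], []
    for _ in range(n):
        t = 0
        c = 0
        for _haplotype in range(2):
            if rng.random() < tag_freq:
                t += 1
                if rng.random() < causal_given_tag:
                    c += 1
        tag.append(t)
        causal.append(c)
        # Only the causal allele affects the phenotype; the rest is noise.
        pheno.append(effect * c + rng.gauss(0.0, 1.0))
    return tag, causal, pheno


tag, causal, pheno = simulate()
tag_slope, tag_r2 = slope_and_r2(tag, pheno)
causal_slope, causal_r2 = slope_and_r2(causal, pheno)
print("tag SNP:        slope = %.3f, r2 = %.5f" % (tag_slope, tag_r2))
print("causal variant: slope = %.3f, r2 = %.5f" % (causal_slope, causal_r2))
```

Under these assumptions the slope at the tag SNP is diluted to roughly `effect * causal_given_tag`, because most tag-allele haplotypes do not carry the causal allele, and the variance explained shrinks accordingly.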

Where is the causal variant? On the advantage of the family design over the case-control design in genetic association studies.

Identification of causal genes for complex traits

Pure and Confounded Effects of Causal SNPs on Longevity: Insights for Proper Interpretation of Research Findings in GWAS of Populations with Different Genetic Structures

 

A first look at some basic TensorFlow concepts

Morvan's tutorials on YouTube, with the code on GitHub

# View more python tutorial on my Youtube and Youku channel!!!
 
# Youtube video tutorial: https://www.youtube.com/channel/UCdyjiB5H8Pu7aDTNVXTTpcg
# Youku video tutorial: http://i.youku.com/pythontutorial
 
"""
Please note, this code is only for python 3+. If you are using python 2+, please modify the code accordingly.
"""
from __future__ import print_function
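# Written for the TensorFlow 1.x graph API; under TensorFlow 2.x, tf.Session and
# tf.train.GradientDescentOptimizer are only available through tf.compat.v1.
import tensorflow as tf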
import numpy as np
 
# create data
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data*0.1 + 0.3
 
### create tensorflow structure start ###
Weights = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
biases = tf.Variable(tf.zeros([1]))
 
y = Weights*x_data + biases
 
loss = tf.reduce_mean(tf.square(y-y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
### create tensorflow structure end ###
 
sess = tf.Session()
# tf.initialize_all_variables() is no longer valid as of
# 2017-03-02 when using tensorflow >= 0.12
if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
    init = tf.initialize_all_variables()
else:
    init = tf.global_variables_initializer()
sess.run(init)
 
for step in range(201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(Weights), sess.run(biases))
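
To see what `optimizer.minimize(loss)` is doing in the script above, the same fit can be sketched without TensorFlow. This is only an illustration of plain gradient descent on the mean squared error, not TensorFlow's actual implementation; the names `weight`, `bias` and `learning_rate` are my own, and the gradients are written out by hand.

```python
import random

rng = random.Random(0)

# Same toy data as the TensorFlow script: y = 0.1 * x + 0.3
x_data = [rng.random() for _ in range(100)]
y_data = [0.1 * x + 0.3 for x in x_data]

weight = rng.uniform(-1.0, 1.0)  # plays the role of Weights
bias = 0.0                       # plays the role of biases
learning_rate = 0.5
n = len(x_data)

for step in range(201):
    # Residuals of the current linear model
    errors = [weight * x + bias - y for x, y in zip(x_data, y_data)]
    # Gradients of mean((w * x + b - y)^2) with respect to w and b
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, x_data)) / n
    grad_b = 2.0 * sum(errors) / n
    # One gradient descent update
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b
    if step % 20 == 0:
        print(step, weight, bias)
```

As in the TensorFlow version, the printed values should approach 0.1 and 0.3, the slope and intercept used to create the data.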

  

How to generate simulated data

Data Simulation Software for Whole-Genome Association and Other Studies in Human Genetics 

A comparison of tools for the simulation of genomic next-generation sequencing data 

num_cau_SNP <- 20
num_SNP <- 500
samplesize <- 20
h_squared <- 0.5
 
# generate genotypes from a Binomial(2, pj) distribution, where pj is the allele frequency of SNP j
pj <- runif(num_SNP, 0.01, 0.5)
xij_star <- matrix(0, samplesize, num_SNP)
#for every SNP
for (j in 1: num_SNP)
{
  xij_star[,j] <- rbinom(samplesize, 2, pj[j])
}
 
#position of causal SNPs
CauSNP <- sample(1:num_SNP, num_cau_SNP, replace = F)
Ord_CauSNP <- sort(CauSNP, decreasing = F)
 
# generate beta, the vector of true effect sizes (the best predictor)
beta <- rep(0,num_SNP)
dim(beta) <- c(num_SNP,1)
# non-null betas follow standard normal distribution
beta[Ord_CauSNP] <- rnorm(num_cau_SNP,0,1)
 
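# epsilon: residual variance chosen so that the genetic component
# explains a fraction h_squared of the phenotypic variance, i.e.
# t(beta) %*% t(xij_star) %*% xij_star %*% beta / samplesize * (1 - h_squared) / h_squared
var_e <- sum((xij_star %*% beta)^2) / samplesize * (1 - h_squared) / h_squared
e <- rnorm(samplesize, 0, sqrt(var_e))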
dim(e) <- c(samplesize, 1)
 
# generate phenotype
pheno <- xij_star %*% beta + e
 
# scale(genotype matrix)
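
As a cross-check on the role of `h_squared`, here is a rough Python sketch of the same simulation. It is an illustrative port, not a line-by-line translation of the R script: it uses a larger sample size (2000 rather than 20) so that the realized heritability is stable, and it uses the centred sample variance of the genetic values instead of the uncentred sum of squares. The function and variable names are my own.

```python
import math
import random


def variance(values):
    """Population variance (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def simulate_phenotypes(num_snp=500, num_causal=20, sample_size=2000,
                        h_squared=0.5, seed=42):
    rng = random.Random(seed)
    # Allele frequency of each SNP, uniform on [0.01, 0.5]
    freqs = [rng.uniform(0.01, 0.5) for _ in range(num_snp)]
    # Positions of the causal SNPs, sampled without replacement
    causal = sorted(rng.sample(range(num_snp), num_causal))
    # Non-null effects follow a standard normal; all other betas are zero
    beta = {j: rng.gauss(0.0, 1.0) for j in causal}

    genotypes = []
    genetic = []
    for _ in range(sample_size):
        # Binomial(2, p): sum of two independent Bernoulli(p) allele draws
        row = [int(rng.random() < p) + int(rng.random() < p) for p in freqs]
        genotypes.append(row)
        genetic.append(sum(beta[j] * row[j] for j in causal))

    # Residual variance so that var(g) / (var(g) + var(e)) equals h_squared
    var_g = variance(genetic)
    var_e = var_g * (1.0 - h_squared) / h_squared
    pheno = [g + rng.gauss(0.0, math.sqrt(var_e)) for g in genetic]
    return genotypes, causal, genetic, pheno


genotypes, causal, genetic, pheno = simulate_phenotypes()
realized_h2 = variance(genetic) / variance(pheno)
print("realized heritability: %.3f" % realized_h2)
```

The printed ratio should be close to the target `h_squared`; with only 20 samples, as in the R script, it would fluctuate far more from run to run.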

  

 

 

 

To be continued~

posted @ Life·Intelligence