uname|mv|tar -xzvf|
Summary: $ ls CAFE-4.2.1.tar.gz mcl-latest.tar.gz mysql-5.4.3-beta-linux-i686-glibc23.tar.gz.1 orthomclSoftware-v2.0.9.tar.gz r8s1.81.tar.gz $ uname -a Linux software-install.cngb.sz.hpc 2.6.32-696.30.1....
Hypothesis Tests for One Population Mean When σ Is Unknown|other
Summary: 9.5 Hypothesis Tests for One Population Mean When σ Is Unknown. Use the t-distribution. What If the Assumptions Are Not Satisfied? For small samples that are not normally distributed: use a nonparam
Hypothesis Tests for One Population Mean When σ Is Known
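Summary: 9.4 Hypothesis Tests for One Population Mean When σ Is Known. Conditions for using the z-test (the same considerations as for using the sampling distribution of the sample mean). The significance level must be stated together with H0. Statistical Significa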
P-Value
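Summary: 9.3 P-Value Approach to Hypothesis Testing. The first steps are the same as for the one-mean test; then compute the probability corresponding to H0, which is the P-value, and compare it with the significance level. −(2.6 + 0.05) = −2.65. The P-value is the probability, assuming H0 is true, of obtaining a value at least as extreme as the one observed; if it is very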
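As a rough illustration of the P-value step, the sketch below computes a P-value for the test statistic z = −2.65 quoted in the summary and compares it with a significance level. It assumes a left-tailed test and a 5% significance level; neither is stated in the summary, and the helper names are hypothetical.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def left_tailed_p_value(z):
    """P-value of a left-tailed z-test: P(Z <= z), computed assuming H0 is true."""
    return standard_normal_cdf(z)

z = -2.65      # test statistic quoted in the summary
alpha = 0.05   # assumed significance level (not given in the source)

p = left_tailed_p_value(z)
reject_h0 = p <= alpha   # reject H0 when the P-value does not exceed alpha
print(round(p, 4), reject_h0)
```

For a two-tailed test the same area would be doubled before comparing with the significance level.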
Critical-Value|Critical-Value Approach to Hypothesis Testing
Summary: 9.2 Critical-Value Approach to Hypothesis Testing. Example: for the hypothesis that the mean is 275, consider the distribution of the sample mean. Given the population standard deviation and a sample size of 25, the standardized variable z is
Null Hypotheses| Alternative Hypotheses|Hypothesis Test|Significance Level|two tailed |one tailed|
Summary: 9.1 The Nature of Hypothesis Testing. Over the years, however, null hypothesis has come to mean simply a hypothesis to be tested. Null Hypothesis: H0:
nonparametric method|One-Mean t-Interval Procedure|
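Summary: 8.4 Confidence Intervals for One Population Mean When σ Is Unknown. Previously we used the standardized version of x̄. When the population standard deviation is not given, use s (the sample standard deviation in place of the population standard deviation)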
Margin of Error|sample size and E
Summary: 8.3 Margin of Error. From the formula: To improve the precision of the estimate, we need to decrease the margin of error, E. Because the sample size, n, occurs i
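The relationship between sample size and margin of error can be sketched as follows, assuming the one-mean z-interval setting (σ known), where E = z_{α/2} · σ / √n. The function name and the example values of σ and n are illustrative assumptions, not taken from the post.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error E = z_{alpha/2} * sigma / sqrt(n) for a one-mean z-interval."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value z_{alpha/2}
    return z * sigma / math.sqrt(n)

# Hypothetical values: sigma = 10, sample sizes 25 and 100.
e_small = margin_of_error(sigma=10.0, n=25)
e_large = margin_of_error(sigma=10.0, n=100)
print(e_small, e_large)
```

Because n sits under a square root in the denominator, quadrupling the sample size halves E.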
confidence intervals and precision|The One-Mean z-Interval Procedure|When to Use the One-Mean z-Interval Procedure
Summary: Confidence Intervals for One Population Mean When σ Is Known. Obtaining Confidence Intervals for a Population Mean When σ Is Known. The z-interval proce
Point Estimate|unbiased estimator|Confidence-Interval Estimate
Summary: 8.1 Estimating a Population Mean. Point Estimate: estimate a single number, or point. Because the mean of the sample mean equals the population mean (μ_x̄ = μ
Sampling Distribution of the Sample Mean|Central Limit Theorem
Summary: 7.3 The Sampling Distribution of the Sample Mean. Population: 1000; Scale are normally distributed with mean 100 and standard deviation 16. Sample: 4; we can obtain the sample
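A small simulation can make the sampling distribution concrete. The sketch below takes the numbers from the summary (mean 100, standard deviation 16) and assumes that "sample: 4" means a sample size of n = 4; the function names, the seed, and the number of trials are illustrative choices.

```python
import math
import random
import statistics

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def simulate_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` samples of size n from N(mu, sigma) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]

theoretical_sd = sd_of_sample_mean(sigma=16.0, n=4)
means = simulate_sample_means(mu=100.0, sigma=16.0, n=4, trials=20000)
print(theoretical_sd, statistics.fmean(means), statistics.stdev(means))
```

The simulated means should cluster around 100 with a spread close to the theoretical value 16 / √4.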
The Mean of the Sample Mean|Standard Deviation of the Sample Mean|SE
Summary: 7.2 The Mean and Standard Deviation of the Sample Mean. Recall that the mean of a variable is denoted μ, subscripted if necessary with the letter repre
Sampling Error|Sampling mean|population mean
Summary: 7.1 Sampling Error; the Need for Sampling Distributions. Three equivalent names for the distribution of the sample mean: Sampling distribution of the sample mean; Distribution of the variable x̄; Distributi
ORs-6-Olfactory Bulb Ratio, ORs Gene Repertoire, and Olfactory Ability
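Summary: Olfactory Bulb Ratio, ORs Gene Repertoire, and Olfactory Ability. 1. Biological significance of the olfactory bulb: a. survival; b. olfactory ability. 2. The olfactory bulb ratio is proportional to olfactory ability. A linear correlation plot derived from the table above: a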
ORs-5-OR Subgenomes Variation among Birds, Sea Turtle and Alligator
Summary: OR Subgenomes Variation among Birds, Sea Turtle and Alligator. The relative percentage for each bird species is computed from the data in the figure, giving the next figure. This yields: (1) overall: 1. The 48 avian genomes lacked OR1
ORs-4-Enhanced Role of OR Gene Loss (Pseudogenization) in Birds
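Summary: Enhanced Role of OR Gene Loss (Pseudogenization) in Birds. 1. The literature has already shown that (a) gene loss and gain affect evolution, and (b) large gene families have a greater effect on evolution (because of the broader range of evolutionary pr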
ORs-3-OR Gene Family Phylogeny
Summary: OR Gene Family Phylogeny. 1. A shortcoming of earlier studies that built phylogenetic trees of OR genes: bootstrap support values were high in some families and low in others. In the present study, bootstrap support
ORs-2-Genome Coverage and the OR Subgenome
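Summary: Genome Coverage and the OR Subgenome. Because: gene numbers in reptiles are relatively large, whereas gene numbers in birds range from 182 to 688. Among the factors: (1) gene expansion and contraction (discussed in detail later); (2) chicken and zebra finc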
ORs-1-introduction
Summary: Introduction: 1. Olfactory receptors (ORs) are important. 2. The molecular structure of ORs is known, but open questions remain: Though the relationship between odors and ORs is not clear, it has been hypothe
Normal Probability Plots|outlier
Summary: 6.4 Assessing Normality; Normal Probability Plots. The normal probability plot is a graphical technique to identify substantive departures from normali
workflow
Summary: Step 1: grep data. Step 2: build index. Picard CreateSequenceDictionary creates a .dict file and samtools faidx creates a .fai file. Both are needed for GATK
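SAM format
Summary: Reference: https://davetang.org/wiki/tiki-index.php?page=SAM. @SQ SN:contig1 LN:9401 (sequence ID and length). Reference sequence names: these reference sequences determine the sort order of the alignments. SN is the reference sequence name; LN is the reference sequence length; each reference sequence takes one line.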
samtools faidx
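Summary: Column 1, NAME: the sequence name, keeping only the text after ">" and before the first whitespace. Column 2, LENGTH: the sequence length in bp. Column 3, OFFSET: the offset of the first base, counted from 0 and including newline characters; the value of the mRNA start column in the gff file. Column 4, LINEBASES: except for the last line,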
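The column layout described above can be sketched as a small parser. The summary is cut off after the fourth column; the fifth column used below (LINEWIDTH, the number of bytes per line including the newline) is added from my understanding of the .fai format, and the record name, the sample values, and the helper functions are hypothetical.

```python
from collections import namedtuple

# One .fai record: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH (tab-separated).
FaiRecord = namedtuple("FaiRecord", "name length offset linebases linewidth")

def parse_fai_line(line):
    """Parse one tab-separated line of a .fai index into a FaiRecord."""
    fields = line.rstrip("\n").split("\t")
    return FaiRecord(fields[0], int(fields[1]), int(fields[2]),
                     int(fields[3]), int(fields[4]))

def byte_offset(record, pos0):
    """Byte offset in the FASTA file of the base at 0-based position pos0."""
    if not 0 <= pos0 < record.length:
        raise ValueError("position outside the sequence")
    full_lines, within_line = divmod(pos0, record.linebases)
    # Each complete line before the target costs `linewidth` bytes (bases + newline).
    return record.offset + full_lines * record.linewidth + within_line

# Hypothetical record: header ">contig1\n" occupies 9 bytes, 60 bases per line.
rec = parse_fai_line("contig1\t9401\t9\t60\t61\n")
print(rec, byte_offset(rec, 0), byte_offset(rec, 125))
```

This offset arithmetic is what makes random access into a large FASTA file possible without reading it from the start.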
bwa index|amb|ann|bwt|pac|sa
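Summary: Where: the -a option specifies the index construction algorithm: bwtsw is suited to references larger than 10M; is is suited to references smaller than 2G (default: -a is). -a can be omitted, in which case bwa index chooses a suitable algorithm automatically based on genome size. .amb is a text file that records occurrences of N (or ot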
68.26-95.44-99.74 rule|empirical rule
Summary: 6.3 Working with Normally Distributed Variables. As illustrated in the previous example, the 68.26-95.44-99.74 rule allows us to obtain useful informat
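The percentages in the rule can be checked numerically. This is a minimal sketch using the error function from the standard library; the function name is an illustrative choice.

```python
import math

def area_within(k):
    """Area under the standard normal curve between -k and k."""
    return math.erf(k / math.sqrt(2.0))

# Percentage of observations within 1, 2, and 3 standard deviations of the mean.
areas = [area_within(k) for k in (1, 2, 3)]
print([round(100 * a, 2) for a in areas])
```

The computed areas differ from 68.26, 95.44, and 99.74 in the last digit, because the rule's figures come from subtracting table values that are already rounded to four decimal places.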
z-scores|zα
Summary: 6.2 Areas Under the Standard Normal Curve. Property 4: Almost all the area under the standard normal curve lies between −3 and 3. Because the standard n
Normally Distributed|
Summary: 6.1 Introducing Normally Distributed Variables. Why the word “normal”? Because, in the last half of the nineteenth century, researchers discovered that i
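The relationship between mathematical statistics and probability theory
Summary: The relationship between mathematical statistics and probability theory: Statistics: given what is in your hand, guess what is in the bucket (generalize from the sample to the population; the sample estimates a population that already exists). Probability theory: given what is in the bucket, guess what will be in your hand (the population predicts the sample; the sample is a future one). Author: 猴子. Link: https://www.zhihu.com/questio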
The General Addition Rule|complementation rule|special addition rule|
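Summary: 5.3 Some Rules of Probability. As the figure shows, (A or B) is the entire blue region, so P(A or B) = P(A) + P(B); but if the events are not mutually exclusive, the probabilities cannot simply be added: If you think of the regions as probabilities, the entire region enclo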
Events|sample space|mutually exclusive events
Summary: 5.2 Events. The collection of all 52 cards—the possible outcomes—is called the sample space for this experiment. Relationships Among Events. More generall
The equal-likelihood model|event|experiment|probability model
Summary: 5.1 Probability Basics. Uncertainty is inherent in inferential statistics, because a sample is always used to estimate the population. The science of uncertainty is called probability theory. Studying probability distributions helps us
linear correlation coefficient|Correlation and Causation|lurking variables
Summary: 4.4 Linear Correlation. Defined in terms of Sxx, Syy, and Sxy, it is: So, for ease of computation: Note that Sxx and Sx are not the same! So r is independent of the choice of units and always lies between −1 and 1
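The definition of r in terms of Sxx, Syy, and Sxy can be sketched as below, using r = Sxy / √(Sxx · Syy). The data values and function names are made up for illustration.

```python
import math

def sums_of_squares(x, y):
    """Return (Sxx, Syy, Sxy) for paired observations x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxx, syy, sxy

def linear_correlation(x, y):
    """Linear correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    sxx, syy, sxy = sums_of_squares(x, y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 4, 5, 4, 5]
r = linear_correlation(x, y)
# Changing the units of x (for example metres to centimetres) leaves r unchanged.
r_rescaled = linear_correlation([100 * xi for xi in x], y)
print(r, r_rescaled)
```

The second call illustrates the point made in the summary that r does not depend on the choice of units.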
SST|SSR|SSE|r²|relationship with Sxx, Sxy, and Syy|
Summary: 4.3 The Coefficient of Determination. To evaluate the model, we can use the following: (1) the total variation in the observed values of the response variable (the observed y values); (2) the amount
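The three sums of squares and their relationship can be sketched as follows, assuming the usual definitions SST = Σ(y − ȳ)², SSR = Σ(ŷ − ȳ)², SSE = Σ(y − ŷ)², and r² = SSR / SST. The data and the function name are hypothetical.

```python
def variation_breakdown(x, y):
    """Return (SST, SSR, SSE, r2) for the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained variation
    return sst, ssr, sse, ssr / sst

sst, ssr, sse, r2 = variation_breakdown([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(sst, ssr, sse, r2)
```

For a least-squares line the regression identity SST = SSR + SSE holds, so r² can equivalently be written as 1 − SSE / SST.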
the least-squares criterion|Sxx|Sxy|Syy|Regression Equation|Outliers|Influential Observations|curvilinear regression|linear regression
Summary: 4.2 The Regression Equation. Because we could draw many different lines through the cluster of data points, we need a method to choose the “best” line.
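One way to see the least-squares criterion at work is to compare the sum of squared errors of the fitted line with that of some other line. The sketch assumes the standard formulas b1 = Sxy / Sxx and b0 = ȳ − b1·x̄; the data, the competing line, and the function names are illustrative.

```python
def least_squares_line(x, y):
    """Return (b0, b1) of the regression line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def sum_of_squared_errors(x, y, b0, b1):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)
sse_best = sum_of_squared_errors(x, y, b0, b1)
sse_other = sum_of_squared_errors(x, y, 2.0, 0.7)   # an arbitrary competing line
print(b0, b1, sse_best, sse_other)
```

Any line other than the fitted one gives a larger sum of squared errors, which is the sense in which the regression line is the "best" line.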
Linear Equations
Summary: 4.1 Linear Equations with One Independent Variable
Descriptive Measures for Populations|Parameter|Statistic|standardized variable|z-score
Summary: 3.4 Descriptive Measures for Populations; Use of Samples. For a particular variable on a particular population: 1. There is only one population mean—nam
The Five-Number Summary|Boxplots
Summary: 3.3 The Five-Number Summary; Boxplots. The deciles divide a data set into tenths (10 equal parts), the quintiles divide a data set into fifths (5 equ
The sequence and de novo assembly of the giant panda genome.ppt
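Summary: Sequencing: reasons for using second-generation sequencing: high throughput, short reads. Reasons for not using long reads: 1. high algorithmic error rate; 2. long-read sequencing accumulates chimeric-gene errors. Chimeric gene: a hybrid gene formed by recombination that splices together gene sequences of different origin and function. Sequencing: increased total length>N>gap>missing in genome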
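range|Sample Standard Deviation|geometric meaning of the standard deviation
Summary: Measures of Variation. Variance: measures of variation or measures of spread. It arises from the range: the range is not enough to assess the whole data set (because it uses only the largest and smallest values), hence the variance. The Sample Stand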
mean|mode|median|notation for samples
Summary: Measures of Center. Measures of central tendency: the center or most typical value: average. Mean: the arithmetic average; affected by extreme values, whose influence can be reduced by removing them. Median: the
smooth curve|population|sample
Summary: Distribution Shapes. From histogram to smooth curve: 1. This distribution of heights is bell shaped (or mound shaped), but the smooth curve makes seeing the shape a l
single-value grouping |limit grouping|cutpoint grouping|Lower class limit|Upper class limit|Class width|Class mark|rounding error or roundoff error|Histograms|Dotplots|Stem-and-Leaf
Summary: 2.3 Organizing Quantitative Data. Group quantitative data: To organize quantitative data, we first group the observations into classes (also known as c
Relative-Frequency|frequency|pie chart |bar chart
Summary: 2.2 Organizing Qualitative Data. The number of times a particular distinct value occurs is called its frequency (or count). Relative-Frequency Distributi
Variable|quantitative variables|continuous variable|discrete variable|qualitative variables| observation|data set
Summary: 2.1 Variables and Data. Variable: a characteristic of a person or thing that differs from one individual to another. Quantitative variables: either discrete (can be counted) or continuous. (A continuous variable is a variable
Simple Random Sampling|representative sample|probability sampling|simple random sampling with replacement| simple random sampling without replacement|Random-Number Tables
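Summary: 1.2 Simple Random Sampling. Census: complete information. Sampling methods: representative sample: biased, since the researcher chooses a sample they consider representative. Probability sampling: uses a random-number table rather than the researcher's judgment to draw the sample, so it is more objective (the researcher can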
descriptive statistics|inferential statistics|Observational Studies| Designed Experiments
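Summary: Descriptive statistics: organizing and summarizing information, examining and exploring the data themselves (which may be a population or a sample). Inferential statistics: drawing conclusions about the population from a sample and assessing the reliability of those conclusions. A sample is carefully selected from the population. Observ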
replace|simultaneous replacement
Summary: a= 'eeekkksksksk' print(a.replace('e','s').replace('s','k')) #kkkkkkkkkkkk change={"e":"s","k":'@',"s":"!"} a_new=''.join(change[i] for i in a) print(a_new) ''' sss@@@!@!@!@ ''
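An alternative to the dictionary-and-join approach above is the built-in `str.translate`, which also applies all single-character substitutions in one pass. This is a sketch of that alternative, not the post's own method; the function name is hypothetical.

```python
def replace_simultaneously(text, mapping):
    """Replace every character of `text` found in `mapping` in a single pass.

    Unlike chained .replace() calls, earlier substitutions cannot be
    overwritten by later ones, and unmapped characters pass through unchanged.
    """
    table = str.maketrans(mapping)   # keys must be single characters
    return text.translate(table)

a = 'eeekkksksksk'
change = {"e": "s", "k": "@", "s": "!"}

chained = a.replace('e', 's').replace('s', 'k')   # every 'e' ends up as 'k'
a_new = replace_simultaneously(a, change)
print(chained)
print(a_new)
```

Compared with `''.join(change[i] for i in a)`, this version does not raise `KeyError` when the text contains a character that is missing from the mapping.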
Extract sequences from a FASTA file according to a GFF and translate them into protein
Summary: #!/usr/bin/python import re import sys import gzip change={'A':'T','T':'A','C':'G','G':'C','N':'N'} CODE = { 'GCA' : 'A', 'GCC' : 'A', 'GCG' : 'A', 'GCT' : 'A'...
set|lambda|reduce
Summary: #!/usr/bin/python a=set([i for i in range(4,8)]) b=set([i for i in range(5,12)]) c= sorted(a & b) print c print reduce(lambda x,y:(x-1)*y,c) def func_1(x,y): return (x-1)*y print reduce(func_...
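The snippet above is written for Python 2, where `reduce` is a built-in and `print` is a statement. A minimal Python 3 sketch of the same steps follows; the variable names `result_lambda` and `result_func` are added for illustration.

```python
from functools import reduce   # reduce is no longer a built-in in Python 3

a = set(range(4, 8))     # {4, 5, 6, 7}
b = set(range(5, 12))    # {5, 6, ..., 11}
c = sorted(a & b)        # intersection, as a sorted list
print(c)

def func_1(x, y):
    return (x - 1) * y

# reduce folds the list from the left: ((5 - 1) * 6 - 1) * 7
result_lambda = reduce(lambda x, y: (x - 1) * y, c)
result_func = reduce(func_1, c)
print(result_lambda, result_func)
```

The lambda and the named function are interchangeable here; a named function is usually easier to read once the expression grows.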
zcat|subprocess.check_call|subprocess.Popen|gzip|readline()
Summary: #!/usr/bin/python from subprocess import check_call import subprocess import gzip ''' $ zcat 160121_I133_FCH5LL5BBXX_L8_RSZADPI007179-107_2.fq.gz |head -4 >o3 $ cat o3 @K00133:143:H5LL5BBXX:8:1101:1...
getopt|sys|open|print文件|main()|if __name__ == "__main__"|getline()
Summary: #!/usr/bin/python import sys import getopt import re def compare(f1,f2,o1,o2,si_line): lines_count=0; in1 = open(f1,"r") in2 = open(f2,"r") ou1 = open(o1,"w") ou2 = open(o2,"w"...
qsub|pasta|
Summary: cd /xxx/genome_stat/Annotation ln -s /xxx/02.annotation/gff_v2/*.homolog.v2.gff /xxx/genome_stat/Annotation ls *.gff | while read l;do echo "perl /xxx/02.annotation/build_pipeline/bin/stat_gff.pl $l"...
using pipes with open|Getopt::Long
Summary: #!/usr/bin/perl use strict; use warnings; use Getopt::Long; my ($number,$in,$out); GetOptions( "number:i"=>\$number, "in:s"=>\$in, "out:s"=>\$out ); #my $all = $number +1;print $all; ...
Estimating Gene Frequencies| method of maximum likelihood|point estimate
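Summary: I.11 Estimating Gene Frequencies. Computing the frequency PA of allele A from a small sample, for example: As the sample grows, the value obtained from the observations approaches the true value, so the problem becomes the statistical one of estimating the true value from many observations, which is done by maximum likelihood estimation. To understand the multinomial distribution, start with the binomial distribution as an example: This binomial distri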
defining variables|dirname|basename|printf
Summary: $ basename /xxxx/test test $ dirname /xxxx/test /xxx $ dirname /xxx/test|while read p;do sp=$p"222";echo $sp;done /xxx222 $ ls /xxx/*.txt|while read p;do sp=$p"22222";printf "%s\n%s\n" \ $p\ $sp...
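Linkage Disequilibrium|D'|r²
Summary: I.10 Other Measures of Linkage Disequilibrium. Because the value of D depends strongly on the allele frequencies (PA and PB), it is poorly suited to comparing degrees of LD. The normalized disequilibrium coefficient D' avoids this dependence on allele frequencies. D' is computed as D' = D/Dmax, so that −1 ≤ D' ≤ 1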
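The measures named in the title can be sketched for two biallelic loci as below. The sketch assumes the common definitions D = P_AB − PA·PB, with Dmax chosen according to the sign of D, and r² = D² / (PA(1 − PA)PB(1 − PB)); the haplotype and allele frequencies are made-up values, and both loci are assumed polymorphic so that no denominator is zero.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r2) from haplotype frequency p_ab and allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0   # keeps the sign of D
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical frequencies: P(AB) = 0.5, P(A) = 0.6, P(B) = 0.7.
d, d_prime, r2 = ld_measures(p_ab=0.5, p_a=0.6, p_b=0.7)
print(d, d_prime, r2)
```

Some texts report |D'| instead of a signed value; the sketch keeps the sign so that the result stays within the range given in the summary.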
linkage disequilibrium|linkage equilibrium
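Summary: I.9 Linkage. INDEPENDENCE OF GENOTYPES AT TWO LOCI: if A and B are two independent loci, PA is the frequency of allele A and PB is the frequency of allele B. Because the two loci are independent, the frequency of genotype AABB is P_AABB = (PA)^2 * (PB)^2. A RETROSPECTIVE D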
Sex linkage
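Summary: I.8 Sex linkage. Haploids: the sex-determining gene (S/s) and the alleles linked to it (A/a) lie on the same set of genetic material; gamete fusion and meiosis are illustrated as follows: If sex is determined by a chromosomal region, natural selection will avoid recombination in that sex-chromosome region (because recombination would produce sexes that did not previously exist), so natural selection makes the form of this region essentially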
Different Gene Frequencies in the Two Sexes
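Summary: I.7 Different Gene Frequencies in the Two Sexes. Suppose a gene whose frequency differs between the sexes only in the first parental generation: for example, A has frequency Pm in males and a has frequency (1 − Pm); A has frequency Pf in females and a has frequency (1 − Pf). The situation across the first parental generation, the gametes, and the first offspring generation is as follows: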
Overlapping generations model
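Summary: I.6 Overlapping generations. Separated, non-overlapping generations satisfy the conditions of the Hardy-Weinberg law, but reality is far less simple: generations overlap, meaning that parents die at different times and a single parent may produce offspring at different times, whereas the Hardy-Weinberg law requires non-overlapping generations in which parents reproduce only once. We therefore need to construct another model to analyze overlapping generations. Because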
Multiple alleles|an intuitive argument|
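Summary: I.5 Multiple alleles. Extending from two alleles to multiple alleles gives more diploid genotypes: So the frequency of a single allele (i denotes an allele and pi* its frequency), expressed by counting, is: So the gamete frequency after meiosis is pi', which gives: And, under the Hardy-Weinberg law: So, given the genotypes and the alleles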
rare alleles
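Summary: I.4 Where the rare alleles are found. p is the frequency of allele A and N is the number of individuals (that is, the number of genotypes, so the number of alleles is 2N and the total number of A alleles is 2Np). p² is P_AA, and Np² is the number of individuals with genotype AA, so the number of A alleles carried by AA individuals is 2Np². This gives: The ratio is the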
Hardy-Weinberg laws
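Summary: I.3 Diploids with two alleles: Hardy-Weinberg laws. Suppose the offspring genotypes Aa, AA, and aa have frequencies PAa, PAA, and Paa, allele A has frequency P1, and allele a has frequency P2 (this can be understood by counting alleles). (P1 is also the frequency of gamete A and P2 the frequency of gamete a, because in the haploid stage it amounts to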
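The Hardy-Weinberg proportions can be sketched as follows, using the notation from the summary (P1 for allele A, P2 for allele a) and assuming random union of gametes, so that PAA = P1², PAa = 2·P1·P2, and Paa = P2². The function names and the example frequency are illustrative.

```python
def hardy_weinberg(p1):
    """Genotype frequencies under random union of gametes, with P2 = 1 - P1."""
    p2 = 1.0 - p1
    return {"AA": p1 * p1, "Aa": 2.0 * p1 * p2, "aa": p2 * p2}

def allele_frequency_of_A(genotypes):
    """Recover P1 by counting alleles: every AA carries two A, every Aa carries one."""
    return genotypes["AA"] + 0.5 * genotypes["Aa"]

genotypes = hardy_weinberg(0.3)   # hypothetical P1 = 0.3
print(genotypes, allele_frequency_of_A(genotypes))
```

Counting alleles in the offspring returns the same P1, which is the sense in which allele frequencies stay constant from one generation to the next under these assumptions.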
Haploid inheritance|Hardy-Weinberg proportions|
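Summary: I.2 Haploid inheritance. Haploids also have a brief diploid phase. Based on the diagram, with genotype A at frequency p and genotype a at frequency (1 − p), we establish the Hardy-Weinberg proportions (the probabilities of forming each diploid). After the brief diploid phase, meiosis turns them into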
Asexual inheritance
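Summary: Asexual inheritance. 1 and 2 are two genotypes; N1 and N2 are the numbers of parents of each genotype; Wt is the number of descendants of each genotype after t generations; N1' and N2' are the numbers of individuals of genotypes 1 and 2 after t generations. The proportions and ratios of different genotypes are not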
random mating
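Summary: Random-mating populations. Mendelian segregation (based on diploid, sexual organisms) and random mating (1. a regularity not changed by mutation; 2. computable) are the foundations of population genetics. Random mating means that every member of the population has an equal chance of mating with any member of the opposite sex. Random mating assumes there are no genetic or behavioral restrictions on mating, and all individuals are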
Extracting FASTA content according to a GFF
Summary: #!/usr/bin/python import re def readfa(l): col={} arr =[] sca ='' li = open(l) for line in li: if re.match(r'>(\w*)',line): match = re.match(r'>(\w*)',line) ...
parameters and arguments|default values|optional arguments|tuple|*tuple|args|*args | **kwargs|args[:]|
Summary: #!/usr/bin/python def hello(i,greet='long time to see!'): out = "hello "+i+" "+greet nobody = {'as':'123','ad':'1234','om':'ssss'} if i == '2':return nobody ret...
raw_input|active:|continue|break|
Summary: a = "please" b = "say something:" c =a+b m = 0 a = True while a: m = int(raw_input(c)) print m+1 if m>11: a = False #please say something:wwww...
two ways of comparing case|elif|
Summary: name = ['alle','mike','tom','jerry','alice','hebe'] for i in name: if i == 'tom': print 'get!' #get! if 'ale' not in name: print 'get!' #get! a = 'tom' ...
Gene family|
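Summary: 6.1 Introduction. As sequencing technology improves, the species that can be sequenced become increasingly complex (higher organisms have large, complex genomes: 1. their gene structure is itself complex; 2. the degree of complexity is not related to taxonomic relationship), so the number of gene families may be a better measure of a species' complexity, and more information can be obtained through comparative genomics. Gene family (English: Gene fami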
push: pushing an empty value into an array still occupies a slot
Summary: #!/usr/bin/perl use strict; use warnings; my $new = (0 ==1)?233:'';my @arr; my $new_1 = (1==1)?233:''; push @arr,$new_1;push @arr,$new;push @arr,'123456'; my $line = join...
Find the entries of a large file that remain after removing those matching a small file, and output them
Summary: #!/usr/bin/perl use strict; use warnings; #####input##### my $all = $ARGV[0]; my $part_file = $ARGV[1]; #####main##### my $all_in = &store($all); my $p...
lower()|upper()|Traceback|title()|string concatenation|rstrip|lstrip|str()|
Summary: print ("hello,world!") sentence = "yyyy" print (sentence.lower()) print (sentence.upper()) #print (sentenc) #hello,world! #yyyy #YYYY #Traceback (most recent call l...
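storing arrays in a hash|getting a reference|dereferencing
Summary: key: the standard form of a hash is $hash{$key}=value; taking a reference: $hash_ad = \%hash; dereferencing: the whole hash: %new_hash = %{$hash_ad}; (the sigil matches the variable type); a single hash element: $single = $hash_ad->{$key}; this comes from $hash{$k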
Macroevolution|Silent changes|CNEs|Transposable elements|Neutral sites
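Summary: Interspecies genomic comparison. Because vertebrates arose early, their evolution includes macroevolution (sustained mutation driven by natural selection or genetic drift, accompanied by phenotypic change). However, some apparent gene loss results from silent changes (occurring at neutral sites), so it is hard to determine whether the gene is functional. Comparative genomics can be used here to study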
Pooled genome sequence strategies |representative genome assembly approaches|Domestication|GERP|selective sweep|Hybridization|Introgression|iHS|SNP genotyping arrays|haplotype
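Summary: Design based on biology. Comparative genomics applies vertebrate genome data to questions across biology. New regulatory annotations (which arose during vertebrate evolution) can enrich the species tree (for example, differences in evolutionary rate among proteins with different functions, following the discovery of protein-coding genes and early-evolving genes). Sequencing requires the following two strategies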
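[Repost] Fst index
Summary: [Repost] Fst index. Reposted from http://blog.csdn.net/zhu_si_tao/article/details/71513099 and http://blog.sina.com.cn/s/blog_4ab0b3390102viol.html. Population genetics: the Fst index, that is, the index of differentiation between populations, is used to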