cat /xxx/DU-030-17.gapcloser.fa | head -1000 > t1.fa
bwa index -a bwtsw -p t1 t1.fa 1>t1.bwa_index.log 2>&1
#$ ll
#total 292K
#-rw-r--r-- 1 XXX 638 Jul 23 10:55 t1.amb
#-rw-r--r-- 1 XXX 183 Jul 23 10:55 t1.ann
#-rw-r--r-- 1 XXX 98K Jul 23 10:55 t1.bwt
#-rw-r--r-- 1 XXX 99K Jul 23 10:54 t1.fa
#-rw-r--r-- 1 XXX 25K Jul 23 10:55 t1.pac
#-rw-r--r-- 1 XXX 49K Jul 23 10:55 t1.sa
#-rw-r--r-- 1 XXX 70 Jul 23 10:57 t1.bwa_index.log
#-rw-r--r-- 1 XXX 0 Jul 23 10:56 w.sh
[bwa_idx_build] fail to open file 't2.fa' : No such file or directory
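Where:
The -a option specifies the algorithm used to build the index:
- bwtsw: suitable for reference sequences larger than 10 MB
- is: suitable for reference sequences smaller than 2 GB (the default, -a is)
The -a option can be omitted; bwa index then chooses a suitable indexing method automatically based on the genome size.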
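The size-based rule above can be sketched as a small shell helper. This is only an illustration under stated assumptions: choose_bwa_algo is a hypothetical name, and the 10 MB cut-off is the figure quoted in this note, not the threshold bwa itself applies when it selects the algorithm automatically.

```shell
# Hypothetical helper (not part of bwa): pick a value for `bwa index -a`
# from the size of the reference in bytes.
# Assumption: the 10 MB cut-off is taken from the note above.
choose_bwa_algo() {
    size_bytes=$(($1 + 0))              # normalise the argument to an integer
    threshold=$((10 * 1024 * 1024))     # 10 MB
    if [ "$size_bytes" -gt "$threshold" ]; then
        echo bwtsw
    else
        echo is
    fi
}

# Possible usage (commented out because it needs bwa and t1.fa):
# algo=$(choose_bwa_algo $(wc -c < t1.fa))
# bwa index -a "$algo" -p t1 t1.fa 1>t1.bwa_index.log 2>&1
```

In practice, leaving out -a and letting bwa index decide is simpler. The helper is only useful when the choice has to be made explicit, for example in a logged pipeline.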
.amb is a text file that records the occurrences of N (or other non-ACGT characters) in the reference FASTA.
.ann is a text file that records the reference sequences: name, length, etc.
.bwt is a binary file containing the Burrows-Wheeler transformed sequence.
.pac is a binary file containing the packed sequence (four bases are encoded in one byte).
.sa is a binary file containing the suffix array index.
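Since a failed run may leave some of these files missing, a quick check that all five exist for a given prefix can be useful before mapping. The following is a minimal sketch. check_bwa_index is a hypothetical helper name, and it tests only for file existence, not for a valid index.

```shell
# Hypothetical helper (not part of bwa): succeed only if all five index
# files listed above exist for the given prefix (the value passed to -p).
check_bwa_index() {
    prefix=$1
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing index file: $prefix.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage:
# check_bwa_index t1 && echo "index for t1 looks complete"
```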