Rosetta scoring
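Reference: https://www.rosettacommons.org/demos/latest/tutorials/scoring/scoring
Introduction
Rosetta has an optimized energy function, or score function, called ref2015 (the default score function). It computes the energy of all atomic interactions in globular proteins made of L-amino acids. Several other all-atom score functions exist for specialized applications to other biomolecules, and you can also define a custom score function.
In Rosetta, a score function is a weighted sum of energy terms. Some terms represent physical forces, such as electrostatic and van der Waals interactions; others are statistical, such as the probability of finding given torsion angles in Ramachandran space.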
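The weighted-sum idea can be sketched in a few lines of Python. This is an illustration only, not Rosetta's implementation: the helper name `weighted_total` and the raw (unweighted) term values are made up, and only the three weights are copied from the ref2015 table in this document.

```python
# Weights for three terms, copied from this document's ref2015 table.
WEIGHTS = {"fa_atr": 1.0, "fa_rep": 0.55, "fa_dun": 0.7}


def weighted_total(raw_terms, weights):
    """Return the sum of weight * raw value over all terms in raw_terms.

    A term that has no entry in `weights` gets weight 0, i.e. it is switched off.
    """
    return sum(weights.get(name, 0.0) * value for name, value in raw_terms.items())


# Hypothetical unweighted term values for some structure (illustration only).
raw = {"fa_atr": -10.0, "fa_rep": 2.0, "fa_dun": 4.0}

total = weighted_total(raw, WEIGHTS)
print(f"total = {total:.2f}")
```

Changing a weight rescales how much that term contributes to the total, which is why scores from differently weighted score functions are not directly comparable.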
Energy terms and weights of the ref2015 score function
Energy term | Description | Weight |
fa_atr | Lennard-Jones attractive between atoms in different residues | 1 |
fa_rep | Lennard-Jones repulsive between atoms in different residues | 0.55 |
fa_sol | Lazaridis-Karplus solvation energy | 0.9375 |
fa_intra_sol_xover4 | Intra-residue Lazaridis-Karplus solvation energy | |
lk_ball_wtd | Asymmetric solvation energy | |
fa_intra_rep | Lennard-Jones repulsive between atoms in the same residue | 0.005 |
fa_elec | Coulombic electrostatic potential with a distance-dependent dielectric | 0.875 |
pro_close | Proline ring closure energy and energy of psi angle of preceding residue | 1.25 |
hbond_sr_bb | Backbone-backbone hbonds close in primary sequence | 1.17 |
hbond_lr_bb | Backbone-backbone hbonds distant in primary sequence | 1.17 |
hbond_bb_sc | Sidechain-backbone hydrogen bond energy | 1.17 |
hbond_sc | Sidechain-sidechain hydrogen bond energy | 1.1 |
dslf_fa13 | Disulfide geometry potential | 1.25 |
rama_prepro | Ramachandran preferences (with separate lookup tables for pre-proline positions and other positions) | 0.25 |
omega | Omega dihedral in the backbone. A harmonic constraint on planarity with standard deviation of ~6 deg. | 0.625 |
p_aa_pp | Probability of amino acid, given torsion values for phi and psi | 0.4 |
fa_dun | Internal energy of sidechain rotamers as derived from Dunbrack's statistics | 0.7 |
yhh_planarity | A special torsional potential to keep the tyrosine hydroxyl in the plane of the aromatic ring | 0.625 |
ref | Reference energy for each amino acid. Balances internal energy of amino acid terms. Plays role in design. | 1 |
METHOD_WEIGHTS | Not an energy term itself, but the parameters for each amino acid used by the ref energy term. | |
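Rosetta scores versus physical energies
Although lower-scoring structures are more likely to be close to the native structure, Rosetta scores cannot be converted directly into physical energy units such as kcal/mol. They are expressed in Rosetta Energy Units (REU) instead. In addition, because a score depends on the score function used, comparing scores obtained with different score functions is meaningless.
Score function options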
-score:weights Name of weights file (without extension .wts)
Default="ref2015". [String]
-score:patch Name of patch file (without extension)
Default="". [String]
-score:set_weights Modification to weights via the command line.
List of paired strings: -score:set_weights <score_type1> <setting1>
<score_type2> <setting2> ...
-score:empty Make an empty score - i.e. NO scoring. [Boolean]
-score:fa_max_dis How far does the FA pair potential go out to.
Default = '6.0'. [Real]
-score:fa_Hatr Turn on Lennard Jones attractive term for hydrogen
atoms. [Boolean]
-score:no_smooth_etables Revert to old style etables. [Boolean]
-score:etable_lr Lowers energy well at 6.5A. [Real]
-score:input_etables Read etables from files with given prefix. [String]
-score:output_etables Write out etables to files with given prefix. [String]
-score:rms_target Target of RMS optimization for RMS_Energy EnergyMethod.
Default='0.0' [Real]
-score:ramaneighbors Uses neighbor-dependent ramachandran maps
Default='false' [Boolean]
-score:symmetric_gly_tables Use a symmetric version of the Ramachandran and p_aa_pp tables for glycine
when sampling or scoring. Useful for sampling or scoring glycine in the
context of a mixed D/L amino acid peptide. As of 23 February 2016, this
flag also symmetrizes the RamaPrePro tables for glycine. Default='false'
[Boolean]
-score:optH_weights Name of weights file (without extension .wts) to use
during optH. [String]
-score:optH_patch Name of patch file (without extension) to use
during optH. [String]
-score:hbond_bb_per_residue_energy In score tables, separate backbone hydrogen bond energies per residue.
(By default, bb hbonds are included in the total energy, but not per residue
energies. Note that this may lead to a slowdown in packing) [Boolean]
Demo
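Preparing PDB files for scoring
1) Example 1
In Rosetta, scoring is done with the score_jd2 application. A PDB file downloaded directly from the PDB can be used without extra processing. Run: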
score_jd2.mpi.linuxgccrelease -in:file:s input_files/from_rcsb/3tdm.pdb
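Note: the official tutorial says that PDB files downloaded directly from the PDB may be incompatible with score_jd2 and cause the run to fail. Running it directly produced no error for me (my guess is that a newer Rosetta release resolved the incompatibility). If you do hit a PDB format incompatibility, append the option -ignore_unrecognized_res to the command, which makes Rosetta skip residues it does not recognize, here the phosphate groups in the PDB file.
When the run finishes, a score.sc file is created in the directory where the command was run. If you run the command several times, the results are appended to the same score.sc file.
2) Example 2
In addition, if the input PDB is missing heavy atoms or contains unusual residues that Rosetta recognizes by default (unlike phosphate), Rosetta adds or changes atoms as needed. Appending the option -out:pdb to the command writes out the structure that Rosetta actually scored. The command is: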
score_jd2.mpi.linuxgccrelease -in:file:s input_files/from_rcsb/1qys.pdb -out:pdb
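Running the command above prints messages to the screen that show two things:
1) Rosetta converted residue MSE to MET.
2) Rosetta found that residue 13 is missing its Cγ atom and rebuilt the side chain of residue 13.
When the run finishes, a score.sc file and a new 1qys_0001.pdb file are created in the directory where the command was run.
Note: because Rosetta rebuilds missing side chains non-deterministically, every run of this example produces different results, in both the PDB structure and the score file. Also, score.sc will show a large positive total_score, which indicates an unfavorable structure. This does not mean the structure is unstable; it only means that Rosetta sees some small steric clashes in this PDB. Refining the PDB with the relax protocol under the same score function is recommended to avoid these problems.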
Basic Scoring
Use a flag file to score the refined 1QYS structure (/demos/tutorials/scoring/input_files). The flag file contains:
-in:file:s input_files/1qys.pdb
-out:file:scorefile output_files/score.sc
Run:
score_jd2.mpi.linuxgccrelease @flag
When the run finishes, a score file named score.sc is created in the output_files directory, with the following content:
SEQUENCE:
SCORE: total_score score dslf_fa13 fa_atr fa_dun fa_elec fa_intra_rep fa_intra_sol_xover4 fa_rep fa_sol hbond_bb_sc hbond_lr_bb hbond_sc hbond_sr_bb linear_chainbreak lk_ball_wtd omega overlap_chainbreak p_aa_pp pro_close rama_prepro ref time yhh_planarity description
SCORE: -224.872 -224.872 0.000 -501.794 104.484 -146.069 1.118 18.786 71.043 304.630 -5.915 -35.139 -18.005 -33.177 0.000 -6.481 3.494 0.000 -13.948 0.000 7.351 24.518 0.000 0.230 1qys_0001
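total_score: the total weighted score of the structure. A lower score indicates a more stable structure. Rule of thumb: when a refined structure is scored with ref2015, the score is typically -1 to -3 REU per residue.
fa_atr: the weighted score of the fa_atr term. The per-term scores help show which energy terms contribute the most.
Note: a large weighted fa_rep score (for example, much larger in magnitude than the stabilizing contribution of fa_atr) indicates clashes in the structure.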
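As a rough sketch of how such a score line can be inspected, the Python below parses the header and data row shown above, checks that the weighted term columns add up to total_score, and compares fa_rep with fa_atr. The helper name `parse_score_line` and the choice to treat `time` as a non-energy column are assumptions made for this illustration; the numbers are the ones from the score.sc listing above.

```python
HEADER = (
    "SCORE: total_score score dslf_fa13 fa_atr fa_dun fa_elec fa_intra_rep "
    "fa_intra_sol_xover4 fa_rep fa_sol hbond_bb_sc hbond_lr_bb hbond_sc "
    "hbond_sr_bb linear_chainbreak lk_ball_wtd omega overlap_chainbreak "
    "p_aa_pp pro_close rama_prepro ref time yhh_planarity description"
)
ROW = (
    "SCORE: -224.872 -224.872 0.000 -501.794 104.484 -146.069 1.118 18.786 "
    "71.043 304.630 -5.915 -35.139 -18.005 -33.177 0.000 -6.481 3.494 0.000 "
    "-13.948 0.000 7.351 24.518 0.000 0.230 1qys_0001"
)

# Columns that are not individual energy terms (assumption: `time` is a timing).
NON_TERM_COLUMNS = {"total_score", "score", "time", "description"}


def parse_score_line(header, row):
    """Pair column names with values; every column except description is a float."""
    names = header.split()[1:]   # drop the leading "SCORE:" tag
    values = row.split()[1:]
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return {
        name: value if name == "description" else float(value)
        for name, value in zip(names, values)
    }


record = parse_score_line(HEADER, ROW)
terms = {k: v for k, v in record.items() if k not in NON_TERM_COLUMNS}

term_sum = sum(terms.values())
rep_to_atr = record["fa_rep"] / abs(record["fa_atr"])

print(f"sum of terms = {term_sum:.3f}, total_score = {record['total_score']:.3f}")
print(f"fa_rep / |fa_atr| = {rep_to_atr:.3f}")
```

For this row the term columns reproduce total_score up to rounding, and fa_rep is small compared with the magnitude of fa_atr, which is consistent with a structure free of severe clashes.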
More scoring options
Changing the score function
Score 1qys and 1ubq with the score function given in docking.wts (/main/database/scoring/weights), and name the score file score_docking.sc.
The flags file contains:
-in:file:l input_files/pdblist
-score:weights docking
-out:file:scorefile output_files/score_docking.sc
The pdblist file contains:
input_files/1qys.pdb
input_files/1ubq.pdb
docking.wts contains:
fa_atr 0.338
fa_rep 0.044
fa_dun 0.036
fa_sol 0.242
fa_pair 0.164
hbond_lr_bb 0.245
hbond_sr_bb 0.245
hbond_bb_sc 0.245
hbond_sc 0.245
fa_elec 0.026
dslf_ss_dst 0.5
dslf_cs_ang 2
dslf_ss_dih 5
dslf_ca_dih 5
Run:
score_jd2.mpi.linuxgccrelease @flag_docking
When the run finishes, a score file named score_docking.sc is created in the output_files directory, with the following content:
SEQUENCE:
SCORE: total_score score dslf_ca_dih dslf_cs_ang dslf_ss_dih dslf_ss_dst fa_atr fa_dun fa_elec fa_pair fa_rep fa_sol hbond_bb_sc hbond_lr_bb hbond_sc hbond_sr_bb linear_chainbreak overlap_chainbreak time description
SCORE: -111.103 -111.103 0.000 0.000 0.000 0.000 -169.606 5.373 -3.798 -2.237 5.683 73.720 -0.822 -8.609 -2.680 -8.128 0.000 0.000 0.000 1qys_0001
SCORE: -83.850 -83.850 0.000 0.000 0.000 0.000 -131.980 4.479 -3.295 -2.059 3.845 58.363 -1.663 -5.786 -1.092 -4.661 0.000 0.000 0.000 1ubq_0001
Note: across different proteins, the total score does not correlate well with structural stability.
Patch Files and Changing Term Weights
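There are three ways to change the weights of energy terms:
- Create a custom weights file and pass its path to -score:weights
- Use a patch file (-score:patch) to modify existing weights
- Set the weights of specific terms on the command line (-score:set_weights)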
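The first route, a custom weights file, can be sketched as follows. The Python below writes a file in the simple `term weight` layout seen in the docking.wts listing above and reads it back. The file name `my_docking.wts`, the helper names, and the changed fa_rep value are hypothetical, and the code assumes that one `term weight` pair per line is sufficient; real weights files may contain other line types (such as METHOD_WEIGHTS), which the reader below simply skips.

```python
import tempfile
from pathlib import Path


def write_weights_file(path, weights):
    """Write one 'term weight' pair per line, as in the docking.wts listing."""
    lines = [f"{term} {weight:g}" for term, weight in weights.items()]
    Path(path).write_text("\n".join(lines) + "\n")
    return Path(path)


def read_weights_file(path):
    """Read 'term weight' lines back into a dict, skipping other line types."""
    weights = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("METHOD_WEIGHTS"):
            continue
        term, value = line.split()[:2]
        weights[term] = float(value)
    return weights


# A few weights copied from docking.wts, with fa_rep raised to an arbitrary
# value purely for illustration.
custom = {"fa_atr": 0.338, "fa_rep": 0.1, "fa_sol": 0.242, "fa_elec": 0.026}

with tempfile.TemporaryDirectory() as tmp:
    wts_path = write_weights_file(Path(tmp) / "my_docking.wts", custom)
    round_trip = read_weights_file(wts_path)
    # Line to place in a flags file; the path is passed to -score:weights.
    flags_line = f"-score:weights {wts_path}"

print(round_trip)
print(flags_line)
```

Keeping the weights in a dict makes it easy to derive several variants from one base set and to record exactly which weights produced a given score file.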
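Per-residue scores
Use the per_residue_energies executable, with the out:file:silent option naming the file to which the per-residue decomposition is written.
The flags file contains: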
-in:file:s input_files/1qys.pdb
-out:file:silent output_files/per_res.sc
Run:
per_residue_energies.mpi.linuxgccrelease @flag_per_residue
When the run finishes, a score file named per_res.sc is created in the output_files directory. Part of its content:
SCORE: pose_id pdb_id fa_atr fa_rep fa_sol fa_intra_rep fa_intra_sol_xover4 lk_ball_wtd fa_elec pro_close hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_fa13 omega fa_dun p_aa_pp yhh_planarity ref rama_prepro score description
SCORE: input_files/1qys.pdb 3A -2.902 0.187 3.047 0.011 0.940 -0.046 -1.618 0.000 0.000 0.000 0.000 -0.828 0.000 0.019 1.594 0.000 0.000 -2.146 0.000 -1.743 residue_1
SCORE: input_files/1qys.pdb 4A -7.106 0.617 2.317 0.034 0.096 -0.041 -1.661 0.000 0.000 -0.734 0.000 0.000 0.000 0.001 0.785 -0.448 0.000 2.304 -0.160 -3.997 residue_2
SCORE: input_files/1qys.pdb 5A -5.601 1.002 5.613 0.009 0.500 0.427 -3.629 0.000 0.000 -1.490 0.000 -1.216 0.000 -0.002 3.033 0.281 0.000 -1.451 -0.068 -2.592 residue_3
SCORE: input_files/1qys.pdb 6A -6.259 1.078 1.135 0.018 0.048 0.081 -1.953 0.000 0.000 -1.295 0.000 0.000 0.000 0.033 0.032 -0.594 0.000 2.643 -0.047 -5.081 residue_4
SCORE: input_files/1qys.pdb 7A -4.785 0.307 4.224 0.008 0.198 0.384 -3.215 0.000 0.000 -1.182 0.000 -1.140 0.000 -0.051 2.698 0.310 0.000 -1.451 -0.044 -3.740 residue_5
SCORE: input_files/1qys.pdb 8A -7.435 2.567 1.530 0.016 0.046 0.005 -1.899 0.000 0.000 -1.603 0.000 0.000 0.000 0.022 0.014 -0.674 0.000 2.643 -0.033 -4.802 residue_6
SCORE: input_files/1qys.pdb 9A -5.439 0.350 5.395 0.005 0.270 0.320 -3.457 0.000 0.000 -1.180 0.000 -1.140 0.000 -0.051 3.249 -0.007 0.000 -1.340 -0.032 -3.056 residue_7
SCORE: input_files/1qys.pdb 10A -7.240 1.299 2.026 0.028 0.084 0.212 -2.108 0.000 0.000 -1.389 0.000 0.000 0.000 0.002 0.191 -0.628 0.000 2.304 -0.042 -5.260 residue_8
SCORE: input_files/1qys.pdb 11A -4.225 0.271 4.842 0.004 0.293 0.030 -3.017 0.000 0.000 -0.400 0.000 -0.774 0.000 0.082 2.007 0.750 0.000 -2.146 0.068 -2.215 residue_9
SCORE: input_files/1qys.pdb 12A -4.157 0.258 4.716 0.009 0.869 -0.099 -0.996 0.000 0.000 0.000 0.000 -0.901 0.000 0.967 1.636 -0.746 0.000 -2.146 0.995 0.405 residue_10
SCORE: input_files/1qys.pdb 13A -1.747 0.226 2.290 0.007 0.316 -0.204 0.672 0.000 0.000 0.000 0.000 0.000 0.000 -0.014 1.687 -0.855 0.000 -1.340 2.307 3.344 residue_11
SCORE: input_files/1qys.pdb 14A -0.955 0.126 0.958 0.000 0.000 -0.060 0.031 0.000 0.000 0.000 0.000 0.000 0.000 -0.113 0.000 -1.245 0.000 0.798 1.152 0.691 residue_12
SCORE: input_files/1qys.pdb 15A -3.936 0.288 3.337 0.008 0.155 -0.289 -0.091 0.000 0.000 0.000 0.000 0.000 0.000 0.112 1.465 0.248 0.000 -0.715 0.436 1.018 residue_13
SCORE: input_files/1qys.pdb 16A -2.981 0.169 2.557 0.005 0.258 -0.360 -0.773 0.000 0.000 0.000 0.000 0.000 0.000 -0.023 2.023 0.432 0.000 -1.340 0.902 0.868 residue_14
SCORE: input_files/1qys.pdb 17A -6.072 0.782 3.154 0.022 0.282 0.068 -2.010 0.000 0.000 -1.389 0.000 0.000 0.000 0.087 1.913 -0.083 0.000 1.218 0.013 -2.015 residue_15
SCORE: input_files/1qys.pdb 18A -2.735 0.127 3.375 0.010 0.704 -0.141 -0.198 0.000 0.000 0.000 0.000 0.000 0.000 0.025 1.595 -0.603 0.000 -2.146 -0.003 0.010 residue_16
Note: use the same score function in per_residue_energies as in score_jd2 so that the results are comparable.
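One way to use the per-residue output is to rank residues by score and see which term dominates the worst one. The sketch below does this for three rows copied from the per_res.sc listing above (10A, 13A and 14A). The helper name `parse_score_table` and the set of columns treated as non-energy columns are assumptions for this illustration.

```python
PER_RES_TEXT = """\
SCORE: pose_id pdb_id fa_atr fa_rep fa_sol fa_intra_rep fa_intra_sol_xover4 lk_ball_wtd fa_elec pro_close hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_fa13 omega fa_dun p_aa_pp yhh_planarity ref rama_prepro score description
SCORE: input_files/1qys.pdb 10A -7.240 1.299 2.026 0.028 0.084 0.212 -2.108 0.000 0.000 -1.389 0.000 0.000 0.000 0.002 0.191 -0.628 0.000 2.304 -0.042 -5.260 residue_8
SCORE: input_files/1qys.pdb 13A -1.747 0.226 2.290 0.007 0.316 -0.204 0.672 0.000 0.000 0.000 0.000 0.000 0.000 -0.014 1.687 -0.855 0.000 -1.340 2.307 3.344 residue_11
SCORE: input_files/1qys.pdb 14A -0.955 0.126 0.958 0.000 0.000 -0.060 0.031 0.000 0.000 0.000 0.000 0.000 0.000 -0.113 0.000 -1.245 0.000 0.798 1.152 0.691 residue_12
"""

# Columns that identify a row or hold the total rather than a single term.
NON_TERM_COLUMNS = {"pose_id", "pdb_id", "score", "description"}


def parse_score_table(text):
    """Turn 'SCORE:' lines into dicts; the first such line is the header."""
    names = None
    rows = []
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if names is None:
            names = fields
        else:
            rows.append(dict(zip(names, fields)))
    return rows


rows = parse_score_table(PER_RES_TEXT)

# Highest (least favorable) score first.
ranked = sorted(rows, key=lambda r: float(r["score"]), reverse=True)
worst = ranked[0]

# The term with the largest positive contribution in the worst residue.
term_names = [name for name in worst if name not in NON_TERM_COLUMNS]
worst_term = max(term_names, key=lambda name: float(worst[name]))

for r in ranked:
    print(r["pdb_id"], r["score"])
print("largest unfavorable term in", worst["pdb_id"], "is", worst_term)
```

Among these three rows, residue 13A scores worst, and its largest unfavorable contribution comes from rama_prepro.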
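Per-residue energy breakdown
Use residue_energy_breakdown to decompose the per-residue scores further into one-body and two-body (residue-pair) energies.
The flags file contains: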
-in:file:s input_files/1qys.pdb
-out:file:silent output_files/energy_breakdown.sc
Run:
residue_energy_breakdown.mpi.linuxgccrelease @flag_residue_energy_breakdown
When the run finishes, a score file named energy_breakdown.sc is created in the output_files directory. Part of its content:
SCORE: pose_id resi1 pdbid1 restype1 resi2 pdbid2 restype2 fa_atr fa_rep fa_sol fa_intra_rep fa_elec pro_close hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_fa13 rama omega fa_dun p_aa_pp yhh_planarity ref total description
SCORE: input_files/1qys.pdb 1 3A ASP -- -- onebody 0.000 0.000 0.000 0.025 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
...
SCORE: input_files/1qys.pdb 1 3A ASP 2 4A ILE -1.518 0.072 1.027 0.000 0.721 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.301 input_files/1qys.pdb_1_2
...
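The first line after the header gives the internal (one-body) energy of the first residue (PDB number 3A); the second line gives the interaction energy between residues 3A and 4A.
Note: use the same score function in residue_energy_breakdown as in score_jd2 so that the results are comparable.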
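To illustrate how the breakdown rows can be used, the sketch below parses the header and the 3A/4A pair row shown above, separates one-body rows from pair rows, and collects the interaction partners of a residue. It assumes that one-body rows carry the literal value `onebody` in the restype2 column, as in the listing above; the helper names are invented for this example.

```python
BREAKDOWN_TEXT = """\
SCORE: pose_id resi1 pdbid1 restype1 resi2 pdbid2 restype2 fa_atr fa_rep fa_sol fa_intra_rep fa_elec pro_close hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_fa13 rama omega fa_dun p_aa_pp yhh_planarity ref total description
SCORE: input_files/1qys.pdb 1 3A ASP 2 4A ILE -1.518 0.072 1.027 0.000 0.721 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.301 input_files/1qys.pdb_1_2
"""


def parse_breakdown(text):
    """Turn 'SCORE:' lines into dicts; the first such line is the header."""
    names = None
    rows = []
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if names is None:
            names = fields
        else:
            rows.append(dict(zip(names, fields)))
    return rows


def is_onebody(row):
    """One-body rows have no second residue; restype2 reads 'onebody'."""
    return row["restype2"] == "onebody"


def partners(rows, pdbid):
    """Map each interaction partner of `pdbid` to the pair's total energy."""
    found = {}
    for row in rows:
        if is_onebody(row):
            continue
        if row["pdbid1"] == pdbid:
            found[row["pdbid2"]] = float(row["total"])
        elif row["pdbid2"] == pdbid:
            found[row["pdbid1"]] = float(row["total"])
    return found


rows = parse_breakdown(BREAKDOWN_TEXT)
pair_rows = [row for row in rows if not is_onebody(row)]

print(partners(rows, "3A"))
print(partners(rows, "4A"))
```

With a full energy_breakdown.sc file, the same `partners` call lists every residue that interacts with the residue of interest, which helps locate the pair responsible for an unfavorable per-residue score.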
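Other
Membrane proteins and symmetric proteins need additional files for scoring; they are not covered here.
Summary
Although scoring itself is deterministic and should give the same score for a given score function and input structure, running score_jd2 repeatedly on the same PDB may not give the same score, for example when Rosetta has to rebuild missing side chains, which it does non-deterministically.
Do not compare scores produced by different score functions; they can mean very different things.
Before comparing structures, relax them and score them with the same score function.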