Modern CDCL SAT solvers maintain a great deal of state, such as the partial assignment, the trail, the learnt clause database, and the saved phases.


Understanding VSIDS Branching Heuristics in Conflict-Driven Clause-Learning SAT Solvers

  • Jia Hui Liang
  • Vijay Ganesh
  • Ed Zulkoski
  • Atulan Zaman
  • Krzysztof Czarnecki

Conflict-Driven Clause-Learning (CDCL) SAT solvers crucially depend on the Variable State Independent Decaying Sum (VSIDS) branching heuristic for their performance. Although VSIDS was proposed nearly fifteen years ago, and many other branching heuristics for SAT solving have since been proposed, VSIDS remains one of the most effective branching heuristics. Despite its widespread use and repeated attempts to understand it, this additive bumping and multiplicative decay branching heuristic has remained an enigma.



In this paper, we advance our understanding of VSIDS by answering the following key questions. The first question we pose is “what is special about the class of variables that VSIDS chooses to additively bump?” In answering this question we show that VSIDS overwhelmingly picks, bumps, and learns bridge variables, defined as the variables that connect distinct communities in the community structure of SAT instances. This is surprising since VSIDS was invented more than a decade before the link between community structure and SAT solver performance was discovered. Additionally, we show that VSIDS viewed as a ranking function correlates strongly with temporal graph centrality measures. Putting these two findings together, we conclude that VSIDS picks high-centrality bridge variables.




The second question we pose is “what role does multiplicative decay play in making VSIDS so effective?” We show that the multiplicative decay behaves like an exponential moving average (EMA) that favors variables that persistently occur in conflicts (the signal) over variables that occur intermittently (the noise).




The third question we pose is “is VSIDS temporally and spatially focused?” We show that VSIDS disproportionately picks variables from a few communities unlike, say, the random branching heuristic. We put these findings together to invent a new adaptive VSIDS branching heuristic that solves more instances than one of the best-known VSIDS variants over the SAT Competition 2013 benchmarks.





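int lbd_val = lbd(learnt_clause);        // LBD of the newly learnt clause
// Exponential moving average of the LBD over all learnt clauses.
lbd_ema = lbd_ema_decay * lbd_ema + (1 - lbd_ema_decay) * lbd_val;
if (lbd_val >= lbd_ema) {                // Case 1: LBD at or above the moving average
    decays++;                            // counts uses of the regular decay mode
    varDecayActivity(var_decay);         // regular mode: var_decay is 0.85, so the activity bump grows faster
} else {                                 // Case 2: LBD of the learnt clause is below the moving average
    thresh_decays++;                     // counted separately
    varDecayActivity(var_thresh_decay);  // var_thresh_decay is 0.95, so the activity bump grows more slowly
}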


 2 Background


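The term VSIDS refers to a family of branching heuristics, widely used in modern CDCL SAT solvers, that rank all variables of the Boolean formula during the run of the solver. As things stand, VSIDS is markedly more effective than other well-known heuristics such as DLIS [33], MOM [18], Jeroslow-Wang [28], and BOHM [12]. VSIDS was a major breakthrough when it was first introduced as part of the Chaff solver [36]. The key idea is to collect statistics over the learnt clauses to guide the direction of the search, with recently learnt clauses being favored. The key characteristics of VSIDS are its additive bump and multiplicative decay behavior, described below. Another advantage of VSIDS is its low computational overhead. We focus on two of the better-known variants of VSIDS, namely the one in Chaff [36] and the one in MiniSat 2.2.0 [15]. We refer to these variants as cVSIDS and mVSIDS, respectively. Both variants share the common features listed below.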

 Common features of VSIDS ----------------------------------------

Activity Score, Initialization and VSIDS Ranking. VSIDS assigns a floating point number, called activity, to each variable in the Boolean formula. At the beginning of a run of a solver, the activity scores of all variables are typically initialized to 0. We refer to the ranking of variables according to their activity scores in decreasing order as the VSIDS ranking. VSIDS picks the variable with the highest activity to branch on.


Additive Bump and Multiplicative Decay. When the solver learns a clause, a set of variables is chosen, and their activities are additively increased, typically by 1. The quantum of this increase is called the (additive) bump. At regular intervals during the run of the solver, the activities of all variables are multiplied by a constant 0 < α < 1 called the (multiplicative) decay factor.
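
As a rough illustration of the bump-and-decay bookkeeping described above, here is a minimal C++ sketch. The struct `VsidsScores` and its member names are hypothetical and not taken from any solver. It decays by literally multiplying every score, and it ignores which variables are already assigned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive VSIDS activity table (hypothetical names): additive bump,
// multiplicative decay, pick the highest score.
struct VsidsScores {
    std::vector<double> activity;  // one score per variable, initialised to 0
    double bump;                   // additive bump (typically 1)
    double decay;                  // decay factor alpha, with 0 < alpha < 1

    VsidsScores(std::size_t num_vars, double bump_amount, double decay_factor)
        : activity(num_vars, 0.0), bump(bump_amount), decay(decay_factor) {
        assert(decay_factor > 0.0 && decay_factor < 1.0);
    }

    // Additively bump every variable in the chosen set.
    void bump_vars(const std::vector<std::size_t>& vars) {
        for (std::size_t v : vars) activity[v] += bump;
    }

    // Multiply all activities by the decay factor.
    void decay_all() {
        for (double& a : activity) a *= decay;
    }

    // Index of the variable with the highest activity
    // (ties go to the lowest index; assumes at least one variable).
    std::size_t pick() const {
        std::size_t best = 0;
        for (std::size_t v = 1; v < activity.size(); ++v)
            if (activity[v] > activity[best]) best = v;
        return best;
    }
};
```

MiniSat avoids the per-variable multiplication: its `varDecayActivity` divides the bump increment by the decay factor instead, which preserves the ranking while touching only one number per decay.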



The VSIDS variant in Chaff: cVSIDS ----------------------------------------

cVSIDS. The activities of variables occurring in the newest learnt clause are bumped up by 1, immediately after the clause is learnt. The activities of all variables are multiplied by a constant 0 < α < 1. The decay occurs after every i conflicts. We follow the policy used in recent solvers like MiniSAT and use i = 1.

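void Solver::analyze(CRef confl, vec<Lit>& out_learnt, int& out_btlevel)
{
    // ... (rest of the conflict analysis is omitted in this excerpt) ...
    // cVSIDS: bump only the variables that occur in the learnt clause.
    for (int i = 0; i < out_learnt.size(); i++) {
        varBumpActivity(var(out_learnt[i]));
    }
}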


The VSIDS variant in MiniSat 2.2.0: mVSIDS ----------------------------------------

mVSIDS. The activities of all variables resolved during conflict analysis that lead to the learnt clause (including the variables in the learnt clause) are bumped up by 1. The activities of all variables are decayed as in cVSIDS.

void Solver::analyze(CRef confl, vec<Lit>& out_learnt, int& out_btlevel)
{
    int pathC = 0;
    Lit p     = lit_Undef;

    // Generate conflict clause:
    //
    out_learnt.push();      // (leave room for the asserting literal)
    int index   = trail.size() - 1;

    do{
        assert(confl != CRef_Undef); // (otherwise should be UIP)
        Clause& c = ca[confl];

        if (c.learnt())
            claBumpActivity(c);

        for (int j = (p == lit_Undef) ? 0 : 1; j < c.size(); j++){
            Lit q = c[j];

            if (!seen[var(q)] && level(var(q)) > 0){
                varBumpActivity(var(q));    // mVSIDS: bump every variable met during resolution
                seen[var(q)] = 1;
                if (level(var(q)) >= decisionLevel())
                    pathC++;
                else
                    out_learnt.push(q);
            }
        }

        // Select next clause to look at:
        while (!seen[var(trail[index--])]);
        p     = trail[index+1];
        confl = reason(var(p));
        seen[var(p)] = 0;
        pathC--;

    }while (pathC > 0);
    out_learnt[0] = ~p;

    // Simplify conflict clause:
    //
    int i, j;
    out_learnt.copyTo(analyze_toclear);
    if (ccmin_mode == 2){
        uint32_t abstract_level = 0;
        for (i = 1; i < out_learnt.size(); i++)
            abstract_level |= abstractLevel(var(out_learnt[i]));

        for (i = j = 1; i < out_learnt.size(); i++)
            if (reason(var(out_learnt[i])) == CRef_Undef || !litRedundant(out_learnt[i], abstract_level))
                out_learnt[j++] = out_learnt[i];

    }else if (ccmin_mode == 1){
        for (i = j = 1; i < out_learnt.size(); i++){
            Var x = var(out_learnt[i]);

            if (reason(x) == CRef_Undef)
                out_learnt[j++] = out_learnt[i];
            else{
                Clause& c = ca[reason(var(out_learnt[i]))];
                for (int k = 1; k < c.size(); k++)
                    if (!seen[var(c[k])] && level(var(c[k])) > 0){
                        out_learnt[j++] = out_learnt[i];
                        break; }
            }
        }
    }else
        i = j = out_learnt.size();

    max_literals += out_learnt.size();
    out_learnt.shrink(i - j);
    tot_literals += out_learnt.size();

    // Find correct backtrack level:
    //
    if (out_learnt.size() == 1)
        out_btlevel = 0;
    else{
        int max_i = 1;
        // Find the first literal assigned at the next-highest level:
        for (int i = 2; i < out_learnt.size(); i++)
            if (level(var(out_learnt[i])) > level(var(out_learnt[max_i])))
                max_i = i;
        // Swap-in this literal at index 1:
        Lit p             = out_learnt[max_i];
        out_learnt[max_i] = out_learnt[1];
        out_learnt[1]     = p;
        out_btlevel       = level(var(p));
    }

    for (int j = 0; j < analyze_toclear.size(); j++) seen[var(analyze_toclear[j])] = 0;
}


Variable Incidence Graph (VIG)

The vertices of the graph are the variables in the formula. For every clause c ∈ F we have an edge between each pair of variables in c. In other words, each clause corresponds to a clique between its variables. The weight of an edge is ...

where |c| is the length of the clause. VIG does not distinguish between positive and negative occurrences of variables. We combine all edges between each pair of vertices into one weighted edge by summing the weights. More precisely, the VIG of a CNF formula F is a weighted graph defined as follows: set of vertices V = Var, set of edges ..., and the weight function w(xy) ...
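
The construction can be sketched in C++ as follows. The formulas are elided in these notes, so the per-clause edge weight used here, 1/(|c| - 1), is an assumption: it is the TVIG weight function given later in these notes with the aging factor removed. The names `Clause`, `Edge`, `WeightedGraph`, and `build_vig` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <utility>
#include <vector>

using Clause = std::vector<int>;               // DIMACS-style literals, e.g. {1, -2, 3}
using Edge = std::pair<int, int>;              // (smaller variable, larger variable)
using WeightedGraph = std::map<Edge, double>;  // edge -> summed weight

// Build the VIG of a CNF formula.
// ASSUMPTION: each clause c adds 1 / (|c| - 1) to every edge of its clique.
WeightedGraph build_vig(const std::vector<Clause>& formula) {
    WeightedGraph vig;
    for (const Clause& c : formula) {
        if (c.size() < 2) continue;            // unit clauses create no edges
        const double w = 1.0 / static_cast<double>(c.size() - 1);
        for (std::size_t i = 0; i < c.size(); ++i) {
            for (std::size_t j = i + 1; j < c.size(); ++j) {
                int x = std::abs(c[i]);        // polarity is ignored
                int y = std::abs(c[j]);
                if (x == y) continue;          // skip a variable repeated in the clause
                if (x > y) std::swap(x, y);
                vig[Edge(x, y)] += w;          // parallel edges are merged by summing
            }
        }
    }
    return vig;
}
```

For the two clauses (x1 ∨ ¬x2 ∨ x3) and (¬x1 ∨ x2), the edge between x1 and x2 receives 1/2 from the first clause and 1 from the second under this assumption.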



3 Contribution I and II: Community-Focused Search, Bridge Variables, and VSIDS

The Hypotheses. 

Here we state the three hypotheses that we tested in this section: (1) Bridge Experiment: VSIDS disproportionately picks, bumps, and learns the bridge variables, (2) Spatial Focus Experiment: VSIDS disproportionately picks from a smaller number of communities rather than a large fraction of the communities of a SAT instance, and (3) Temporal Focus Experiment: VSIDS typically picks from recently-seen communities.




Results and Interpretations of Bridge Variable Experiment. 

Recent research suggests that CDCL solvers take advantage of good community structure in SAT instances [38], leading to faster solving time. The reason for this phenomenon is not fully understood. One possibility is that good community structure lends itself to divide-and-conquer because the bridges are easier to cut (i.e., satisfy). More precisely, the solver can focus its attention on the bridges by picking the bridge variables and assigning them appropriate values. When it eventually assigns the correct values to enough bridges, the VIG is divided into multiple components, and each component can be solved with no interference from the others. Even if the VIG cannot be completely separated, it may still be beneficial to cut the bridges between communities so that these communities can be solved relatively independently.

(The passage above refers to Table 1.)


Results and Interpretations of Temporal and Spatial Focused Search Experiments. Table 2a depicts the average Gini coefficient for the Spatial-Experiment. Both VSIDS techniques exhibit much more inequality relative to random branching for the application and combinatorial instances, indicating that VSIDS may be attempting to home in on certain communities.


The very low values for random instances indicate that none of the branching heuristics typically favor certain communities, likely due to the poor community structures exhibited by such instances.
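
To make the inequality measure concrete, the following C++ sketch computes a Gini coefficient over per-community pick counts. The notes do not say which formulation was used in the experiment, so the standard sorted-values formula is assumed here, and the function name `gini` and its input layout (one count per community) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Gini coefficient of non-negative counts: 0 means perfectly even,
// values near 1 mean a few entries hold almost everything.
// ASSUMPTION: standard formula G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n
// over the counts x_1 <= ... <= x_n sorted in ascending order.
double gini(std::vector<double> counts) {
    const std::size_t n = counts.size();
    const double total = std::accumulate(counts.begin(), counts.end(), 0.0);
    if (n == 0 || total == 0.0) return 0.0;    // nothing picked: treat as even
    std::sort(counts.begin(), counts.end());   // ascending order
    assert(counts.front() >= 0.0);             // counts must be non-negative
    double weighted = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        weighted += static_cast<double>(i + 1) * counts[i];
    const double nd = static_cast<double>(n);
    return 2.0 * weighted / (nd * total) - (nd + 1.0) / nd;
}
```

A heuristic that spreads its picks evenly over four communities scores 0, while one that draws every pick from a single community out of four scores 0.75.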



Table 2b demonstrates that VSIDS techniques are much more temporally focused on average than random branching. It is commonly believed that VSIDS improves the search locality [32, 37], which in turn improves solver performance. However, the term search locality has not previously been rigorously defined. We precisely define spatial focus and temporal focus, and show that VSIDS displays high search locality in terms of these definitions.




4 Contribution III: Experimental Evidence Supporting Strong Correlation Between TGC and VSIDS

In this section, we describe the experiments to support the hypothesis that the VSIDS variants cVSIDS and mVSIDS, viewed as ranking functions, correlate strongly with both temporal degree centrality and temporal eigenvector centrality according to Spearman’s rank correlation coefficient and top-k measures. Combining the results of this section with Contribution I (namely, VSIDS picks, bumps and learns over bridge variables), we conclude that VSIDS picks high-centrality bridge variables.



(1) Temporal variable incidence graph (TVIG)

In the TVIG, every clause is labeled with a timestamp denoted t(c). The timestamp t(c) is 0 if c is a clause from the original input formula; otherwise t(c) is the number of conflicts up to the learning of c.

We refer to the difference between the current time t and the timestamp of a clause t(c) as the age of the clause:

age(c) = t-t(c)


Each clause c contributes the following weight to the edge between every pair of its variables:

\frac{\alpha ^{age(c)}}{|c|-1}

More precisely, the TVIG of a clause database at time t is defined in the same way as VIG except with a modified weight function that takes the ages of clauses into account:

w(xy) = \sum _{x,y\in c \in F} \frac{\alpha ^{age(c)}}{|c|-1}

 Observe that the TVIG evolves throughout the solving process: as new learnt clauses are added, new edges are added to the graph, and all the ages increase. As an edge’s age increases, its weight decreases exponentially with time assuming no new learnt clause contains its variables. In many domains, it is often the case that more recent data points are more useful than older data points.
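
The weight function above translates directly into code. This is a minimal C++ sketch under stated assumptions: the struct `TimedClause` and the functions `mentions` and `tvig_weight` are hypothetical names, and the clause database is scanned linearly for clarity rather than speed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

struct TimedClause {
    std::vector<int> lits;  // DIMACS-style literals
    int timestamp;          // t(c): 0 for input clauses, else the conflict count when learnt
};

// Does clause c mention variable v (in either polarity)?
bool mentions(const TimedClause& c, int v) {
    for (int lit : c.lits)
        if (std::abs(lit) == v) return true;
    return false;
}

// TVIG weight w(xy) at time `now`: the sum of alpha^age(c) / (|c| - 1)
// over all clauses c that contain both x and y.
double tvig_weight(const std::vector<TimedClause>& db, int x, int y,
                   double alpha, int now) {
    assert(x != y);
    double w = 0.0;
    for (const TimedClause& c : db) {
        if (c.lits.size() < 2 || !mentions(c, x) || !mentions(c, y)) continue;
        const int age = now - c.timestamp;  // age(c) = t - t(c)
        w += std::pow(alpha, age) / static_cast<double>(c.lits.size() - 1);
    }
    return w;
}
```

Calling `tvig_weight` again with a larger `now` and no new clauses returns a smaller value, which is the exponential fading described above.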


 (2) Graph centrality measure

 A graph centrality measure is a function that assigns a real number to each vertex in a graph. The number associated with each vertex denotes its relative importance in the graph [16, 19, 41]. For example, the degree centrality [16] of a vertex in a graph is defined as the degree of the vertex.


The eigenvector centrality of a vertex in a graph is defined as its corresponding value in the eigenvector of the greatest eigenvalue of the graph’s adjacency matrix. We similarly define the temporal versions of degree and eigenvector centrality. The key idea needed to define temporal graph centrality measures is to incorporate temporal information inside the TVIG. 
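
Eigenvector centrality can be approximated with power iteration, as in the C++ sketch below. This is an illustration under assumptions, not the computation used in the paper: the graph is assumed to be connected and given as a dense symmetric matrix of non-negative weights, and the names `Matrix` and `eigenvector_centrality` are hypothetical. The iteration is run on A + I rather than A; this keeps the same eigenvectors but avoids the oscillation that plain power iteration shows on bipartite graphs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // symmetric, non-negative weights

// Approximate the eigenvector of the greatest eigenvalue of `adj`
// by power iteration on (adj + I), starting from the all-ones vector.
std::vector<double> eigenvector_centrality(const Matrix& adj, int iterations) {
    const std::size_t n = adj.size();
    std::vector<double> v(n, 1.0), next(n);
    for (int it = 0; it < iterations; ++it) {
        double norm_sq = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            assert(adj[i].size() == n);           // matrix must be square
            next[i] = v[i];                       // the "+ I" term
            for (std::size_t j = 0; j < n; ++j) next[i] += adj[i][j] * v[j];
            norm_sq += next[i] * next[i];
        }
        const double norm = std::sqrt(norm_sq);
        for (std::size_t i = 0; i < n; ++i) v[i] = next[i] / norm;  // rescale to unit length
    }
    return v;
}
```

Passing the TVIG weights at time t as the matrix would give a temporal variant in the sense described in the text; on a three-vertex path the middle vertex comes out with the highest score.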


 The temporal degree centrality (TDC): the degree centrality of a vertex in the TVIG at time t.

The temporal eigenvector centrality (TEC): the eigenvector centrality of a vertex in the TVIG at time t.


 (3) Methodology for Comparing Rankings Based on Spearman’s Rank Correlation Coefficient
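
For reference, Spearman's rank correlation coefficient between two scorings of the same variables can be computed as in the C++ sketch below. It assumes all scores within a vector are distinct, so the closed form ρ = 1 - 6 Σ d_i² / (n(n² - 1)) applies; real VSIDS activities often contain ties (for example, many zeros), which would call for averaged ranks instead. The function names `ranks` and `spearman` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Rank of each item, where rank 1 is the highest score.
// ASSUMPTION: the scores are pairwise distinct (no ties).
std::vector<std::size_t> ranks(const std::vector<double>& scores) {
    std::vector<std::size_t> order(scores.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
    std::vector<std::size_t> rank(scores.size());
    for (std::size_t pos = 0; pos < order.size(); ++pos) rank[order[pos]] = pos + 1;
    return rank;
}

// Spearman's rho for two score vectors over the same variables:
// rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), with d_i the rank difference.
double spearman(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size() && a.size() >= 2);
    const std::vector<std::size_t> ra = ranks(a), rb = ranks(b);
    double sum_d2 = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(ra[i]) - static_cast<double>(rb[i]);
        sum_d2 += d * d;
    }
    const double n = static_cast<double>(a.size());
    return 1.0 - 6.0 * sum_d2 / (n * (n * n - 1.0));
}
```

Two scorings that order the variables identically give ρ = 1 regardless of scale, and exactly reversed orderings give ρ = -1.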


It is commonly believed that VSIDS focuses on the “most constrained part of the formula” [24], and that this is responsible for its effectiveness.


However, the term “most constrained part of the formula” has previously not been well-defined in a mathematically precise manner. 





7 Interpretation of Results

We began our research by posing a series of questions regarding VSIDS, and we now interpret the results obtained in light of these questions.


(1) What is special about the class of variables that VSIDS chooses to additively bump?

 In the bridge variables experiment (Sect. 3), we showed that VSIDS disproportionately favored bridge variables.



Even though SAT instances have a large number of bridge variables on average, the frequency with which VSIDS picks, bumps, and learns bridge variables is much higher.



There is no a priori reason to believe that VSIDS would behave like this. This surprising result, plus a previous result that good community structure correlates with faster solving time [38], suggests CDCL solvers exploit community structure.



More precisely, they target variables linking distinct communities, possibly as a way to solve the instance by a divide-and-conquer approach.



In the VSIDS vs. TGC experiments (Sect. 4), we used Spearman’s rank correlation coefficient to show that the VSIDS and TGC rankings are strongly correlated.



From our experiments, we can say that for all the VSIDS variants considered in this paper, additive bumping matches the increase in centrality of the chosen variables.



We also observe from our results that the variables that solvers pick for branching have very high TGC rank. The concept of centrality allows us to define in a mathematically precise way the intuition many solver developers have had, i.e., that branching on “highly constrained variables” is an effective strategy.



Our bridge variable experiment combined with the TGC experiment suggests that VSIDS focuses on high-centrality bridge variables.



 (2) What role does multiplicative decay play in making VSIDS so effective?

(Answered by Contribution IV, that in turn led to a new adaptive VSIDS presented as Contribution V.) We show that multiplicative decay is essentially a form of exponential smoothing (Sect. 5).



We add an explanation as to why this is important, namely, that exponential smoothing favors variables that persistently occur in conflicts and this is a better strategy for root-cause analysis.
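
The link to exponential smoothing can be made explicit with a short derivation. It assumes the cVSIDS schedule with i = 1 and a bump of 1: at conflict t a variable is bumped by b_t ∈ {0, 1} and all activities are then multiplied by α. The symbols a_t (activity), b_t (bump indicator), and s_t (the EMA of the bump indicator) are notation introduced here for illustration.

```latex
% Activity recurrence under the assumed bump-then-decay schedule, a_0 = 0:
a_t = \alpha \, (a_{t-1} + b_t)
\quad\Longrightarrow\quad
a_t = \sum_{k=1}^{t} \alpha^{\,t-k+1} \, b_k

% Exponential moving average of the bump indicator, s_0 = 0:
s_t = \alpha \, s_{t-1} + (1-\alpha) \, b_t
\quad\Longrightarrow\quad
s_t = (1-\alpha) \sum_{k=1}^{t} \alpha^{\,t-k} \, b_k

% Hence the two differ only by a positive constant factor:
a_t = \frac{\alpha}{1-\alpha} \, s_t
```

Because the factor α/(1 - α) is the same for every variable, ranking variables by activity is the same as ranking them by the EMA of how often they were bumped, with recent conflicts weighted most heavily.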



We designed a new VSIDS technique, which we call adaptVSIDS, based on the above results, wherein we rapidly decay the VSIDS activity if the learnt clause LBDs are large (Sect. 6). We showed that this technique is better than mVSIDS on the SAT Competition 2013 benchmark.



(3) Is VSIDS temporally and spatially focused? (Answered by Contribution II.)

We show that VSIDS exhibits spatial focus and temporal focus (Sect. 3), which are forms of locality in search. While there has been speculation among solver researchers that CDCL solvers with VSIDS perform local search, we precisely define spatial and temporal locality in terms of the community structure.



8 Related Work

Marques-Silva and Sakallah are credited with inventing the CDCL technique [34]. The original VSIDS heuristic was invented by the authors of Chaff [36].

Armin Biere [8] described the low-pass filter behavior of VSIDS, and Huang et al. [26] stated that VSIDS is essentially an EMA.



Katsirelos and Simon [30] were the first to publish a connection between eigenvector centrality and branching heuristics. In their paper [30], the authors computed eigenvector centrality (via Google PageRank) only once on the original input clauses and showed that most of the decision variables have higher than average centrality.




Also, it bears stressing that their definition of centrality is not temporal. By contrast, our results correlate VSIDS ranking with temporal degree and eigenvector centrality, and show the correlation holds dynamically throughout the run of the solver.




We also noticed that the correlation is significantly stronger after extending centrality with temporality. Simon and Katsirelos do hypothesize that VSIDS may be picking bridge variables (they call them fringe variables). However, they do not provide experimental evidence for this.


To the best of our knowledge, we are the first to establish the following results regarding VSIDS: first, VSIDS picks, bumps, and learns high-centrality bridge variables; second, VSIDS-influenced search is more spatially and temporally focused than the other branching heuristics we considered; third, we explain the importance of EMA (multiplicative decay) to the effectiveness of VSIDS; and fourth, we invent a new adaptive VSIDS branching heuristic based on our observations.








