Literature Study: Guiding CDCL SAT Search via Random Exploration amid Conflict Depression

Guiding CDCL SAT Search via Random Exploration amid Conflict Depression

Md Solimul Chowdhury, Martin Müller, Jia-Huai You
Department of Computing Science, University of Alberta
Edmonton, Alberta, Canada.
{mdsolimu, mmueller, jyou}@ualberta.ca

Abstract

The efficiency of Conflict Driven Clause Learning (CDCL) SAT solving depends crucially on finding conflicts at a
fast rate. State-of-the-art CDCL branching heuristics such as VSIDS, CHB and LRB conform to this goal.

We take a closer look at the way in which conflicts are generated over the course of a CDCL SAT search.

Our study of the VSIDS branching heuristic shows that conflicts are typically generated in short bursts, followed by what we call a conflict depression phase, in which the search fails to generate any conflicts over a span of decisions.

The lack of conflicts indicates that the variables currently ranked highest by the branching heuristic fail to generate conflicts.

Based on this analysis, we propose an exploration strategy, called expSAT, which randomly samples variable selection sequences in order to learn an updated heuristic from the generated conflicts.

The goal is to escape from conflict depressions expeditiously.

The branching heuristic deployed in expSAT combines these updates with the standard VSIDS activity scores.

An extensive empirical evaluation with four state-of-the-art CDCL SAT solvers demonstrates good-to-strong performance gains with the expSAT approach.

Introduction

	These heuristics reward variables involved in recent conflicts. The intuition is that assignments of these variables are likely to generate further conflicts, leading to useful learned clauses and thus pruning the search space.

Preliminaries



	We assume familiarity with SAT solving (Biere et al. 2009). Here we briefly review the most relevant concepts.

	VSIDS Heuristic: VSIDS (Moskewicz et al. 2001) is a popular family of dynamic branching heuristics. We focus on exponential VSIDS as used in gLCM. VSIDS maintains an activity score for each variable in the given formula. It increases the activity score of each variable that is involved in conflict resolution by a variable bumping factor g^z, where g > 1 is a constant and z is the number of conflicts in the search so far. This strongly favors variables that participated in the most recent conflicts.

	Literal Block Distance (LBD): The LBD score (Audemard and Simon 2009) of a learned clause is the number of distinct decision levels in it. If this score is n, then the clause contains n propagation blocks, where each block has been propagated within the same branching decision. As variables in a block are considered to be related, learned clauses with a lower LBD score are likely of higher quality. In particular, learned clauses with an LBD score of 2 are known as glue clauses.

	Global Learning Rate (GLR): Suppose a CDCL solver takes d decisions to solve a given formula F and generates q conflicts. The GLR of the solver for F is defined as q/d. GLR measures the overall ability of a solver to generate conflicts for a given problem (Liang et al. 2017).
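	The three definitions above can be made concrete with a small sketch. This is an illustration only, not the gLCM implementation: the function names, the dictionary-based data layout, and the DIMACS-style integer literals are assumptions introduced here.

```python
from typing import Dict, Iterable, List


def bump_activity(activity: Dict[int, float], conflict_vars: Iterable[int],
                  g: float, z: int) -> None:
    """Add the bumping factor g**z to every variable involved in conflict resolution.

    z is the number of conflicts so far, so later conflicts bump by larger amounts.
    """
    bump = g ** z
    for v in conflict_vars:
        activity[v] = activity.get(v, 0.0) + bump


def lbd(clause: List[int], level_of: Dict[int, int]) -> int:
    """Number of distinct decision levels among the variables of a learned clause."""
    return len({level_of[abs(lit)] for lit in clause})


def glr(conflicts: int, decisions: int) -> float:
    """Global learning rate q / d (0.0 if no decision was taken)."""
    return conflicts / decisions if decisions else 0.0


# Two conflicts with g = 2.0: the second bump (g**2) outweighs the first (g**1).
activity: Dict[int, float] = {}
bump_activity(activity, [1, 2], g=2.0, z=1)
bump_activity(activity, [2, 3], g=2.0, z=2)
print(activity)                                   # {1: 2.0, 2: 6.0, 3: 4.0}
print(lbd([1, -2, 3], {1: 1, 2: 1, 3: 4}))        # 2, i.e. a glue clause
print(glr(conflicts=50, decisions=200))           # 0.25
```

	Because g^z grows without bound, practical solvers typically rescale all activities from time to time; that detail is left out of this sketch.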

	Software, Hardware and Test Environment: In this work, we adopt four baseline solvers: gLCM, MplCOMSPS (winner of SAT-2016), MplCM (second runner up of SAT-2018) and MplCBT (winner of SAT-2018). While gLCM uses only VSIDS as its branching heuristic, the other three combine VSIDS with other heuristics. All experiments presented in this paper were run on a workstation with 64GB RAM and a processor clock speed of 2.4 GHz.

	Two test sets were used in the experiments. (a) Test Set 1 contains 750 instances from the main track of SAT-2017 (350) and SAT-2018 (400) and is run with a time limit of 5000 seconds per instance. (b) Test Set 2 consists of 52 hard instances from the SATCoin (Bitcoin mining) cryptographic benchmark, which are generated with the instance generator from (Manthey and Heusser 2018). We generated these instances by varying the range parameter, which determines the difficulty of a SATCoin instance. For these experiments, we set the time limit to 36,000 seconds per instance.

Conflict Depression and Conflict Bursts


	Let us represent the conflict history of the search by the sequence of c_i (read here as the number of conflicts produced at decision i) and define a conflict depression (CD) phase as a sequence of one or more consecutive decisions with no conflict.
	Let us define a conflict burst (CB) phase as a sequence of one or more consecutive decisions with at least one conflict.
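	The two definitions can be turned into a small segmentation routine. The sketch below assumes that the conflict history is available as a list of per-decision conflict counts and reads a CB phase as a maximal run of decisions that each produce at least one conflict; both the input format and the function names are assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Tuple

Phase = Tuple[str, int, int]   # (kind, number of decisions, number of conflicts)


def split_phases(conflicts_per_decision: List[int]) -> List[Phase]:
    """Split a conflict history into maximal alternating CD and CB phases."""
    phases = []
    for has_conflict, run in groupby(conflicts_per_decision, key=lambda c: c > 0):
        counts = list(run)
        phases.append(("CB" if has_conflict else "CD", len(counts), sum(counts)))
    return phases


def average_length(phases: List[Phase], kind: str) -> float:
    """Average number of decisions per phase of the given kind ("CD" or "CB")."""
    lengths = [n for k, n, _ in phases if k == kind]
    return sum(lengths) / len(lengths) if lengths else 0.0


history = [0, 0, 0, 2, 1, 0, 0, 3, 0]
phases = split_phases(history)
print(phases)
# [('CD', 3, 0), ('CB', 2, 3), ('CD', 2, 0), ('CB', 1, 3), ('CD', 1, 0)]
print(average_length(phases, "CD"), average_length(phases, "CB"))   # 2.0 1.5
```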


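	We note that in most cases the average CD phase length is short, but still spans multiple decisions (blue). Regardless of their average CD phase length, almost all CD phases (orange) occur at a high rate relative to the decision rate (yellow).
	The histogram on the right of Fig. 1 shows the distribution of the average CD phase length. This average ranges from 2.09 to 1402.30. For 263 instances the average length is short (at most 3). The distribution is heavy-tailed, with 69 instances whose average length is greater than 25 (the rightmost bin).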

	Overall, the data indicates that for gLCM on Test Set 1, conflict depressions occur frequently and often last over multiple decisions (high average CD phase length).

	Propagation Depression Amid a CD Phase: During a CD phase, VSIDS scores are not a good predictor of a variable's future performance, since branching decisions fail to produce any conflict and perform only truth value propagations. Are there any differences in the pattern of unit propagations between CD and CB phases? We define the Propagation Rate (PR) as the number of propagations per decision. Table 1 compares the average PR values for Test Set 1 over the decisions in CD and CB phases. On average, PR values during a CD phase are almost 10 times lower than during CB phases. This result demonstrates that during a CD phase, VSIDS branching decisions go through propagation depression as well.

	Conflict Bursts in gLCM: How long are the CB phases compared to CD phases? For the Test Set 1 instances, the average CB and CD phase lengths are 1.67 and 20.63, respectively. Thus, on average, short CB phases are followed by much longer CD phases.

	Bursts of Conflict Generation: Table 2 shows the average values of GLR, FDC, FDOC and FDMC for Test Set 1. (From the discussion here, FDC, FDOC and FDMC appear to denote the fractions of decisions that produce at least one conflict, exactly one conflict, and more than one conflict, respectively.) Column 3 shows the average GLR values for all three types of problems to be close to 0.5. In contrast, the average FDC values in column 4 are much lower, averaging 0.2507 over all instances. Therefore, on average, about 75% of all decisions do not produce any conflict and only 25% of all decisions produce at least one conflict. Further, the majority of the conflict producing decisions produce more than 1 conflict. This is evident in the average FDMC value (0.1535), which is 61% of the total conflict producing decisions (0.2507).

	In summary, we draw the following conclusions:
	• The typical search behavior consists of short CB phases followed by longer CD phases, in which the search does not find any conflicts.
	• During a CD phase, the search goes through propagation depression as well.
	• The short CB phases are conflict intense, i.e., within a few decisions, many conflicts are generated.
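	The statistics discussed above can be computed from a per-decision conflict history. In the sketch below, the readings of FDC, FDOC and FDMC as the fractions of decisions with at least one, exactly one, and more than one conflict are inferred from the surrounding discussion rather than quoted from the paper, and the function name and input format are hypothetical.

```python
from typing import Dict, List


def decision_fractions(conflicts_per_decision: List[int]) -> Dict[str, float]:
    """GLR plus the (assumed) FDC / FDOC / FDMC fractions for one run."""
    d = len(conflicts_per_decision)
    if d == 0:
        raise ValueError("need at least one decision")
    one = sum(1 for c in conflicts_per_decision if c == 1)
    multi = sum(1 for c in conflicts_per_decision if c > 1)
    return {
        "GLR": sum(conflicts_per_decision) / d,   # conflicts per decision
        "FDC": (one + multi) / d,                 # decisions with >= 1 conflict
        "FDOC": one / d,                          # decisions with exactly 1 conflict
        "FDMC": multi / d,                        # decisions with > 1 conflict
    }


stats = decision_fractions([0, 0, 0, 2, 1, 0, 0, 3])
print(stats)   # {'GLR': 0.75, 'FDC': 0.375, 'FDOC': 0.125, 'FDMC': 0.25}
```

	A GLR well above FDC, as in this toy history and in the averages reported above, means that the conflicts are concentrated in a minority of the decisions.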

Exploration Guided VSIDS

	Is it possible to correct the course of the search in a CD phase by identifying promising variables that are currently underranked by VSIDS? In this work, we address this question by formulating a solver framework, called expSAT, which performs random explorations that probe into the future search space. The goal is to discover branching variables that are likely to lead to conflicts from which clauses are learned.

	Given a CDCL SAT solver, expSAT modifies it as follows:
	• Before each branching decision, if a substantial CD phase is detected, then with probability pexp, expSAT performs an exploration episode, consisting of a fixed number nW of random walks. Each walk consists of a limited number of random steps. Each such step consists of the uniform random selection of an unassigned step variable, followed by unit propagation (UP). A walk terminates either when a conflict occurs during UP, or after a fixed number lW of random steps have been taken. After each walk, the search state is restored and the next walk begins. Fig. 2 illustrates an exploration episode with 3 walks and a maximum of 3 random steps per walk.
	• An exploration score is computed for each step variable.
	• In the CDCL search, branching variables are chosen that maximize the expVSIDS heuristic, which combines the VSIDS activity score of a variable and its exploration score. Ties are broken randomly.
	• All other elements, such as unit propagation, conflict analysis, restarts, and backjumping, remain the same as in the underlying CDCL SAT solver.
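	The exploration episode described above can be sketched on a toy CNF representation. This is not the expSAT code: the naive unit propagation, the copy-based state restoration, and the random choice of polarity for each step variable are simplifying assumptions, and all function names are hypothetical. Detecting a "substantial CD phase" and drawing the pexp coin are left to the caller.

```python
import random
from typing import Dict, List, Tuple

Clause = List[int]            # DIMACS-style literals: 3 means x3, -3 means not x3
Assignment = Dict[int, bool]


def unit_propagate(clauses: List[Clause], assign: Assignment) -> bool:
    """Naive unit propagation; extends assign in place, returns True on conflict."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                val = assign.get(abs(lit))
                if val is None:
                    unassigned.append(lit)
                elif val == (lit > 0):
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return True                   # every literal is false: conflict
            if len(unassigned) == 1:          # unit clause: force its last literal
                assign[abs(unassigned[0])] = unassigned[0] > 0
                changed = True
    return False


def random_walk(clauses: List[Clause], assign: Assignment, variables: List[int],
                l_w: int, rng: random.Random) -> Tuple[List[int], bool]:
    """One walk of at most l_w random steps; returns (step variables, hit conflict)."""
    trial = dict(assign)                      # work on a copy, so the search state is kept
    steps: List[int] = []
    for _ in range(l_w):
        free = [v for v in variables if v not in trial]
        if not free:
            break
        v = rng.choice(free)                  # uniform choice of an unassigned variable
        trial[v] = rng.random() < 0.5         # assumption: random polarity
        steps.append(v)
        if unit_propagate(clauses, trial):
            return steps, True
    return steps, False


def exploration_episode(clauses: List[Clause], assign: Assignment, variables: List[int],
                        n_w: int, l_w: int, rng: random.Random):
    """Run n_w independent walks from the same search state."""
    return [random_walk(clauses, assign, variables, l_w, rng) for _ in range(n_w)]


# Unsatisfiable toy formula: any first step leads to a conflict during propagation.
clauses = [[1, 2], [-1, 2], [1, -2], [-1, -2]]
for steps, conflict in exploration_episode(clauses, {}, [1, 2], n_w=3, l_w=3,
                                           rng=random.Random(0)):
    print(steps, conflict)
```

	Copying the assignment is the simplest way to restore the search state after a walk; a real solver would instead backtrack on its trail.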

	Algorithm Details

	Input and Parameters: All these parameters are explained above, except ω, which we explain below. When a random walk ends in a conflict after a series of random steps, some combination of the assigned variables has caused the conflict. In expSAT, we assign the most credit to the most recently assigned variable, and exponentially decay the credit for the variables assigned earlier in the walk, by a factor of ω per decision step. This approach is patterned on reward decay in reinforcement learning (Sutton and Barto 1998).
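	The decay rule can be written down directly. The sketch below only illustrates the ω-decay over the steps of a single conflicting walk; the base reward of 1.0 is a placeholder assumption (the notes do not say how a conflict is valued or how credits from several walks are aggregated), and the function name is hypothetical.

```python
from typing import Dict, List


def walk_credits(step_vars: List[int], omega: float, reward: float = 1.0) -> Dict[int, float]:
    """Credit for the step variables of one walk that ended in a conflict.

    step_vars lists the variables in the order they were assigned. The last one
    receives the full reward; a variable assigned j steps earlier receives
    reward * omega**j.
    """
    last = len(step_vars) - 1
    return {v: reward * omega ** (last - i) for i, v in enumerate(step_vars)}


# Walk x7 -> x3 -> x9, conflict after assigning x9, with omega = 0.5.
print(walk_credits([7, 3, 9], omega=0.5))   # {7: 0.25, 3: 0.5, 9: 1.0}
```

	With ω = 0.9, the same walk would give the three variables credits of roughly 0.81, 0.9 and 1.0, so earlier steps still receive most of the credit.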

Experiments

	We implemented expSAT in four systems, gLCM, MplCOMSPS, MplCM and MplCBT, and call the resulting solvers eGLCM, eMplCOMSPS, eMplCM and eMplCBT, respectively. While gLCM uses only VSIDS, MplCOMSPS and MplCM apply a combination of two heuristics, LRB and VSIDS. In addition, MplCBT employs a third heuristic called Dist (Xiao et al. 2017). Based on the activation of these heuristics, a run in the Maple (Mpl) based systems is divided into two phases: phase 1, which lasts for the first 2500 seconds of a run and uses a combination of these heuristics, and phase 2, which starts after 2500 seconds and uses VSIDS exclusively. For the Maple based systems, we apply the expSAT approach only to phase 2.

	We compare the performance of these systems on Test Sets 1 and 2. To set the values of the exploration parameters, we performed a small scale grid search with eGLCM: we took one instance at random out of each benchmark from SAT-2018, which gave a subset of 23 instances. We ran eGLCM on this subset for small parameter ranges, lW and nW in [4,5,6] and pexp in [0.01,0.02,0.03]. From this grid search, we chose our default parameter setting (nW, lW, pexp) = (5, 5, 0.02). We set the value of the exponential decay parameter ω to 0.9 based on intuition. These are the values used in the experiments.

	Comparison on Test Set 1: Table 3 (S: SAT, U: UNSAT) shows results for Test Set 1 for the four expSAT extensions and their baseline solvers. Overall, each expSAT extension solves more instances and has a lower (better) PAR-2 score than its respective baseline. For each of the Maple based systems, for a given instance, runs with a baseline and its expSAT extension are identical in phase 1. For these systems, only instances solved in phase 2 show the impact of the expSAT approach. For each Maple based expSAT solver, all the additional instances are solved in phase 2.

	The best performing system, eMplCOMSPS, solves 16 more instances than its baseline. eMplCBT solves 9 more instances than its baseline MplCBT. eMplCM solves only 1 more instance than its baseline MplCM. eGLCM solves 7 more instances than its baseline gLCM, with most of the improvement coming from 8 additional SAT instances solved in the SAT-2018 benchmarks.

	Fig. 3 compares the solving speed of eGLCM (blue line), eMplCOMSPS (red line), eMplCM (yellow line) and eMplCBT (purple line) against their baselines. This figure plots the difference in the number of instances solved as a function of time. In phase 1, there is no difference between the Maple based expSAT solvers and their baselines. eMplCOMSPS (red line) dominates its baseline for all of phase 2. eMplCBT (purple line) and eMplCM (yellow line) also solve instances at a faster speed than their baselines for most time points. eGLCM (blue line) performs slightly worse than gLCM at the earlier time points, but beats the baseline for the remaining time points.
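	For readers unfamiliar with the PAR-2 score used in Table 3: it penalizes every unsolved instance with twice the time limit. The sketch below computes it as an average over instances; whether the paper reports a sum or an average is not stated in these notes, and the function name and the use of None for unsolved runs are assumptions.

```python
from typing import List, Optional


def par2(solve_times: List[Optional[float]], time_limit: float) -> float:
    """Average PAR-2 score: unsolved runs (None or over the limit) cost 2 * time_limit."""
    if not solve_times:
        raise ValueError("need at least one instance")
    penalised = [t if t is not None and t <= time_limit else 2 * time_limit
                 for t in solve_times]
    return sum(penalised) / len(penalised)


# Three instances with the 5000 s limit of Test Set 1; the second one times out.
print(par2([100.0, None, 4000.0], time_limit=5000.0))   # 4700.0
```

	Lower is better: a solver improves its PAR-2 score both by solving more instances and by solving them faster.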

	Comparison on Test Set 2: For SAT-2018, 17 SATCoin instances were submitted. For the experimental results reported in Table 3, we observe that compared to the baselines gLCM (solves 1) and MplCOMSPS (solves 2), their expSAT extensions, eGLCM (solves 5) and eMplCOMSPS (solves 6), show strong performance gains over these 17 instances. We further evaluate the expSAT solvers on this benchmark by generating 52 hard instances (Test Set 2), which are different from the 17 instances submitted for SAT-2018.

	Table 4 compares our expSAT extensions with their respective baselines for Test Set 2. The best performing expSAT extensions, eMplCM and eMplCOMSPS, solve 10 and 13 instances respectively, beating their baselines by each solving 9 additional instances. Compared to their baselines, eGLCM and eMplCBT solve 5 and 2 additional instances, respectively. Fig. 4 shows the solve time comparison for the 8 solvers for Test Set 2. Here, all of our expSAT solvers solve the problems at a higher speed than their baselines at most of the time points. For this experiment, each of our extended solvers shows strong performance gains over its baseline.

	To put this experiment into perspective, we ran an experiment with CryptoMiniSAT5, which is known to be a strong system for solving cryptographic benchmarks. This system solves 41 instances with an average solve time of 8907.47 secs, while our best performing expSAT solver eMplCBT solves 43, with an average solve time of 2518.90 secs.

Analysis of the Experimental Results

Exploration Parameter Adaptation

	A parameter setting that is effective for one instance may not be effective for another. Based on this intuition, we developed an algorithm named paramAdapt to dynamically control when to trigger exploration episodes, and how much exploration to perform in an exploration episode.

	paramAdapt The three exploration parameters nW, lW, and pexp are adapted between CDCL restarts based on the search behavior. A parameter setting is a triple Σ = (nW, lW, pexp), which is updated at the beginning of each restart by paramAdapt by comparing the exploration performance of the two most recent search periods, the period between the latest two restarts and the period before it. The search in expSAT starts with a default value of Σ. paramAdapt keeps track of the following statistics about all

Related Work (earlier work on random exploration, and work combining CDCL with SLS)

Randomized exploration in SAT is used in local search methods such as GSAT (Selman, Levesque, and Mitchell 1992) and WalkSAT (Selman, Kautz, and Cohen 1993).

The Satz algorithm (Li and Anbulagan 1997) heuristically selects a variable x, then performs two separate unit propagations with x and (¬x) respectively, in order to evaluate the potential of x.

Modern CDCL SAT solvers include exploration components such as a small amount of random variable selection (Eén and Sörensson 2003).

UCTSAT (Previti et al. 2011) employs Monte Carlo Tree Search (MCTS) to build a SAT search tree.

Exploration can make a search process more robust by allowing an escape from early mistakes caused by inaccurate heuristics (Xie et al. 2014).

Examples of recently popular exploration methods in search are MCTS (Browne et al. 2012) and the random walk techniques used in classical planning (Nakhost and Müller 2009).

These techniques motivated our work on random exploration in CDCL SAT.

------------------------------------------------------------------------------

Work combining CDCL and SLS

SATHYS (Audemard et al. 2010) employs both a CDCL SAT solver and a local search SAT solver. The latter helps the CDCL solver by identifying the most promising literal assignment to branch on, and the CDCL search process guides the local search process to flee from local minima.

The Conflict History Based (CHB) (Liang et al. 2016a) and Learning Rate Based (LRB) (Liang et al. 2016b) heuristics model variable selection as a Multi-Armed Bandit (MAB) problem, which is solved using the Exponential Recency Weighted Average (ERWA) algorithm.

Both of these heuristics compute rewards from the conflict history of unassigned variables, in order to rank them.

In contrast, we modify the VSIDS rank of variables based on the quality of conflicts generated by random exploration of the future states.

Compared to the look-ahead based heuristic that maximizes the GLR score (Liang et al. 2017), we perform nondeterministic exploration of the search space with a small subset of unassigned variables per random walk, and prioritize variables that generate high-quality conflicts.

As overhead is disregarded in their work, there is no direct basis for comparison.

Future Work

The ineffectiveness of VSIDS in conflict depressions can be addressed by performing exploration.

Interesting research avenues to explore further include:
1. Integrate expSAT into LRB and CHB based systems.

2. Study exploration as in expSAT to guide polarity selection, e.g., by extending the phase-saving heuristic.

3. Develop machine learning methods to predict the onset of a long CD phase.

4. Better understand the relationship between properties of CD phases, such as length, and the performance of a solver.

5. Identify characteristics of SAT domains which influence the effectiveness of exploration.

posted on 2021-05-29 12:31 by 海阔凭鱼跃越
