Decision Variable Selection and Branching Strategies: Paper Notes on Learning Rate Based Branching Heuristic for SAT Solvers

Learning Rate Based Branching Heuristic for SAT Solvers

Liang J.H., Ganesh V., Poupart P., Czarnecki K. (2016) Learning Rate Based Branching Heuristic for SAT Solvers. In: Creignou N., Le Berre D. (eds) Theory and Applications of Satisfiability Testing – SAT 2016. SAT 2016. Lecture Notes in Computer Science, vol 9710. Springer, Cham

Abstract

In this paper, we propose a framework for viewing solver branching heuristics as optimization algorithms where the objective is to maximize the learning rate, defined as the propensity for variables to generate learnt clauses. By viewing online variable selection in SAT solvers as an optimization problem, we can leverage a wide variety of optimization algorithms, especially from machine learning, to design effective branching heuristics.

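Key idea: treat online variable selection in a SAT solver as an optimization problem, so that optimization algorithms, especially those from machine learning, can be used to design effective branching heuristics.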

In particular, we model the variable selection optimization problem as an online multi-armed bandit, a special case of reinforcement learning, to learn branching variables such that the learning rate of the solver is maximized. We develop a branching heuristic that we call learning rate branching or LRB, based on a well-known multi-armed bandit algorithm called exponential recency weighted average and implement it as part of MiniSat and CryptoMiniSat.

Relation between LRB and the multi-armed bandit algorithm: LRB is implemented with the exponential recency weighted average (ERWA) algorithm.

We upgrade the LRB technique with two additional novel ideas to improve the learning rate by accounting for reason side rate and exploiting locality. The resulting LRB branching heuristic is shown to be faster than the VSIDS and conflict history-based (CHB) branching heuristics on 1975 application and hard combinatorial instances from 2009 to 2014 SAT Competitions. We also show that CryptoMiniSat with LRB solves more instances than the one with VSIDS. These experiments show that LRB improves on state-of-the-art.

Keywords

Learning rate, Slot machine, Implication graph, CDCL solver, Clause learning

Main content: the learning rate branching (LRB) heuristic

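1. The learning rate (LR) of a variable is defined as follows: during BCP propagation and backtracking, a certain number of learnt clauses are generated between the moment the variable is assigned and the moment its assignment is undone; the LR is the fraction of these learnt clauses in which the variable participates.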

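2. Update formula (1) for a variable's learning-rate activity Q(v):

$$Q(v) \leftarrow (1 - \alpha)\, Q(v) + \alpha\, r$$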

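α is the step size, typically 0.4.
r is the learning rate, used as the reward: the numerator counts the learnt clauses in which the variable participates, either as a literal of the learnt clause or by lying on the conflict side of the implication graph during conflict analysis; the denominator is the total number of learnt clauses generated between the variable's assignment and its unassignment.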

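$$r = \frac{P(v, I)}{L(I)}$$

Here I is the interval from v's assignment to its unassignment, P(v, I) is the number of learnt clauses generated in I in which v participates, and L(I) is the total number of learnt clauses generated in I.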

That is, variables with high LR are the ones that frequently appear in the generated learnt clause and/or the conflict side of the implication graph.
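Formula (1) is the standard exponential recency weighted average update. Below is a minimal Python sketch, assuming a constant α = 0.4 as stated above; the function name `erwa_update` is hypothetical and is not taken from MiniSat or any other solver.

```python
def erwa_update(q: float, reward: float, alpha: float = 0.4) -> float:
    """One ERWA step (formula 1): move the estimate q a fraction alpha toward reward."""
    return (1.0 - alpha) * q + alpha * reward


# Hypothetical trace: v starts with Q(v) = 0 and, on two successive
# unassignments, receives the reward r = 2/5 each time.
q_v = 0.0
for r in (2 / 5, 2 / 5):
    q_v = erwa_update(q_v, r)
print(q_v)  # about 0.256
```

Each update keeps only a (1 − α) fraction of the old estimate, so older rewards fade geometrically; this is what lets the estimate follow a quantity that drifts over time.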

3. Extension: Reason Side Rate (RSR)

Let A(v, I) be the number of learnt clauses which v reasons in generating in interval I and let L(I) be the number of learnt clauses generated in interval I. The reason side rate (RSR) of variable v at interval I is defined as

$$\mathrm{RSR}(v, I) = \frac{A(v, I)}{L(I)}$$

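Taking into account a variable's contribution when it lies on the reason side, the improved update adds RSR to the reward:

$$Q(v) \leftarrow (1 - \alpha)\, Q(v) + \alpha\, \bigl(r + \mathrm{RSR}(v, I)\bigr)$$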
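The RSR extension only changes the reward fed into the ERWA step: the reward becomes r + RSR(v, I) = P(v, I)/L(I) + A(v, I)/L(I). A minimal sketch follows; the function names are hypothetical, α = 0.4 is assumed, and skipping the update when L(I) = 0 is my own choice for this sketch, since neither rate is defined in that case.

```python
from typing import Optional


def rsr_reward(participated: int, reasoned: int, learnt: int) -> Optional[float]:
    """Combined reward r + RSR(v, I) for one interval I.

    participated -- P(v, I): learnt clauses in I that v participated in
    reasoned     -- A(v, I): learnt clauses in I that v reasoned in generating
    learnt       -- L(I): total learnt clauses generated in I
    """
    if learnt == 0:
        return None  # no learnt clause in I: neither rate is defined, skip the update
    return participated / learnt + reasoned / learnt


def erwa_rsr_update(q: float, reward: Optional[float], alpha: float = 0.4) -> float:
    """Improved update: Q(v) <- (1 - alpha) * Q(v) + alpha * (r + RSR)."""
    if reward is None:
        return q  # nothing observed in this interval: keep the old estimate
    return (1.0 - alpha) * q + alpha * reward


# Hypothetical interval: v participated in 2 and reasoned in 1 of 5 learnt clauses.
reward = rsr_reward(2, 1, 5)
q_v = erwa_rsr_update(0.0, reward)
```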

4. Extension: Locality

The rationale is as follows:

Recent research shows that VSIDS exhibits locality [20], defined with respect to the community structure of the input CNF instance [1, 20, 25]. Intuitively, if the solver is currently working within a community, it is best to continue focusing on the same community rather than exploring another. We hypothesize that high LR variables also exhibit locality, that is, the branching heuristic can achieve higher LR by restricting exploration.
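The paper realizes locality with the decay reinforcement model [13, 32]: after every conflict, the Q value of each unassigned variable is multiplied by a decay factor, so variables outside the current working area gradually lose priority. The sketch below assumes a factor of 0.95 (check Section 5.3 of the paper for the exact setting) and shows an eager version plus a lazy equivalent; both function names are hypothetical.

```python
from typing import Dict, Set

DECAY = 0.95  # assumed per-conflict decay factor for unassigned variables


def decay_unassigned(q: Dict[int, float], assigned: Set[int]) -> None:
    """Eager form: call once after every conflict."""
    for v in q:
        if v not in assigned:
            q[v] *= DECAY


def lazy_decay(q_v: float, conflicts_now: int, unassigned_since: int) -> float:
    """Lazy form: apply at once all decays a variable missed while unassigned."""
    return q_v * DECAY ** (conflicts_now - unassigned_since)


# Hypothetical state: variable 1 stays assigned, variable 2 stays unassigned
# for three conflicts.
q = {1: 1.0, 2: 1.0}
for _ in range(3):
    decay_unassigned(q, assigned={1})
```

The eager loop touches every variable on every conflict; recording when a variable was unassigned and applying the missed decays in one multiplication avoids that cost and gives the same value.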

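5. Comparison of solvers using VSIDS, CHB, ERWA (conflict side only), ERWA+RSR (also considering the reason side) and LRB (combining all of the above) on the benchmark instances, which demonstrates the effect of each improvement.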

1 Introduction

Branching heuristic or technique (and its variants)	Reference
VSIDS, proposed in 2001	[24]
Improved variants of VSIDS	[7, 15, 16, 28]
Conflict analysis techniques	[23]
The conflict history-based (CHB) branching heuristic, 2016	[19]
Learning rate branching (LRB), 2016	this paper
Phase saving	[26]
Exponential recency weighted average (ERWA)	[31]
The decay reinforcement model	[13, 32]

Applications of SAT mentioned in the paper:

Modern Boolean SAT solvers are a critical component of many innovative techniques in security, software engineering, hardware verification, and AI such as solver-based automated testing with symbolic execution [9], bounded model checking [11] for software and hardware verification, and planning in AI [27] respectively.

Perspective on branching heuristics:

In this paper, we introduce a general principle for designing branching heuristics wherein online variable selection in SAT solvers is viewed as an optimization problem.

The objective to be maximized is called the learning rate (LR), a numerical characterization of a variable’s propensity to generate learnt clauses. The goal of the branching heuristic, given this perspective, is to select branching variables that will maximize the cumulative LR during the run of the solver.

Intuitively, achieving a perfect LR of 1 implies the assigned variable is responsible for every learnt clause generated during its lifetime on the assignment trail.

Contributions

Contribution I: We define a principle for designing branching heuristics, that is, a branching heuristic should maximize the learning rate (LR). We show that this principle yields highly competitive branching heuristics in practice.
Contribution II: We show how to abstract online variable selection in the multi-armed bandit (MAB) framework. This abstraction provides an interface for applying MAB algorithms directly as branching heuristics. Previously, we developed the conflict history-based (CHB) branching heuristic [19], also inspired by MAB. The key difference between this paper and CHB is that in the case of CHB the rewards are known a priori, and there is no metric being optimized. In this work, by contrast, the learning rate is being maximized and is unknown a priori, so optimizing it under uncertainty requires a bona fide machine learning algorithm.
Contribution III: We use the MAB abstraction to develop a new branching heuristic called learning rate branching (LRB). The heuristic is built on a well-known MAB algorithm called exponential recency weighted average (ERWA). Given our domain knowledge of SAT solving, we extend ERWA to take advantage of reason side rate and locality [20] to further maximize the learning rate objective. We show in comprehensive apples-to-apples experiments that it outperforms the current state-of-the-art VSIDS [24] and CHB [19] branching heuristics on 1975 instances from four recent SAT Competition benchmarks from 2009 to 2014 on the application and hard combinatorial categories. Additionally, we show that a modified version of CryptoMiniSat with LRB outperforms Glucose, and is very close to matching Lingeling over the same set of 1975 instances.

2 Preliminaries

	2.1 Simple Average and Exponential Moving Average
	2.2 Multi-Armed Bandit (MAB)
	2.3 Clause Learning
	2.4 The VSIDS Branching Heuristic

3. Definition of Learning Rate

For example, suppose variable v is assigned by the branching heuristic after 100 learnt clauses are produced. It participates in producing the 101st and 104th learnt clauses. Then v is unassigned after the 105th learnt clause is produced. The interval I therefore covers learnt clauses 101 through 105, so P(v, I) = 2 and L(I) = 5, and hence the LR of variable v is 2/5.
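The worked example can be reproduced with a few lines of Python. This is a sketch under the assumption that learnt clauses are numbered consecutively from 1; the function `learning_rate` and its parameters are hypothetical names chosen for illustration.

```python
from typing import Iterable


def learning_rate(assigned_after: int, unassigned_after: int,
                  participated_in: Iterable[int]) -> float:
    """LR of v over interval I.

    assigned_after   -- learnt clauses produced before v was assigned
    unassigned_after -- learnt clauses produced before v was unassigned
    participated_in  -- indices of learnt clauses v helped produce
    """
    total = unassigned_after - assigned_after  # L(I): clauses assigned_after+1 .. unassigned_after
    if total <= 0:
        raise ValueError("no learnt clause in I, so LR is undefined")
    # P(v, I): only clauses produced inside the interval count
    p = sum(1 for k in participated_in if assigned_after < k <= unassigned_after)
    return p / total


# The example above: assigned after clause 100, participates in 101 and 104,
# unassigned after clause 105.
print(learning_rate(100, 105, [101, 104]))  # 0.4
```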

The exact LR of a variable is usually unknown during branching. In the previous example, variable v was picked by the branching heuristic after 100 learnt clauses are produced, but the LR is not known until after the 105th learnt clause is produced. Therefore optimizing LR involves a degree of uncertainty, which makes the problem well-suited for learning algorithms.

In addition, the LR of a variable changes over time due to modifications to the learnt clause database, stored phases, and assignment trail. As such, estimating LR requires nonstationary algorithms to deal with changes in the underlying environment.
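Why a nonstationary estimator matters can be seen by comparing a simple average with an exponential moving average (the two estimators of Section 2.1) on a reward stream whose level changes midway. The stream below is made up for illustration and α = 0.4 is assumed.

```python
from typing import Sequence


def simple_average(rewards: Sequence[float]) -> float:
    """Sample mean, updated incrementally: every reward gets weight 1/n."""
    q = 0.0
    for n, r in enumerate(rewards, start=1):
        q += (r - q) / n
    return q


def exponential_moving_average(rewards: Sequence[float], alpha: float = 0.4) -> float:
    """ERWA: recent rewards dominate, old ones fade by a factor (1 - alpha) per step."""
    q = 0.0
    for r in rewards:
        q += alpha * (r - q)  # same as (1 - alpha) * q + alpha * r
    return q


# Hypothetical reward stream: a variable's LR jumps from 0 to 1 halfway through.
rewards = [0.0] * 10 + [1.0] * 10
print(simple_average(rewards))              # about 0.5: still anchored to the old regime
print(exponential_moving_average(rewards))  # about 0.994: has tracked the change
```

The simple average weights the stale zeros as heavily as the recent ones, while the exponential average has almost forgotten them after ten steps.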

4. Abstracting Online Variable Selection as a Multi-Armed Bandit (MAB) Problem

	Section 5 breaks a rather convoluted problem down into four very clear steps:
	5.1 Exponential Recency Weighted Average (ERWA): gives Algorithm 1.
	5.2 Extension: Reason Side Rate (RSR): gives Algorithm 2.
	5.3 Extension: Locality: applies the decay reinforcement model [13, 32].
	5.4 Putting it all Together to Obtain the Learning Rate Branching (LRB) Heuristic
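Putting the pieces together, here is a toy Python sketch of an LRB-style scorer that combines the ERWA update, the RSR reward and the locality decay. It is a simplification for these notes, not the paper's Algorithm or MiniSat's code: the hook names (`on_assign`, `on_conflict`, `on_unassign`, `pick_branch_variable`) are hypothetical, α is kept constant at 0.4, the decay factor 0.95 is an assumption, and the linear scans stand in for the priority queue a real solver would use.

```python
from typing import Dict, Set


class LRBSketch:
    """Toy LRB-style scorer. The hook names are hypothetical, not a real solver API."""

    def __init__(self, num_vars: int, alpha: float = 0.4, decay: float = 0.95):
        self.alpha = alpha        # ERWA step size (kept constant here)
        self.decay = decay        # locality: per-conflict decay of unassigned Q (assumed)
        self.learnt_counter = 0   # learnt clauses generated so far
        self.q: Dict[int, float] = {v: 0.0 for v in range(1, num_vars + 1)}
        self.assigned_at: Dict[int, int] = {}  # v -> learnt_counter when v was assigned
        self.participated: Dict[int, int] = {v: 0 for v in self.q}  # P(v, I)
        self.reasoned: Dict[int, int] = {v: 0 for v in self.q}      # A(v, I)

    def on_assign(self, v: int) -> None:
        """Start a new interval I for v."""
        self.assigned_at[v] = self.learnt_counter
        self.participated[v] = 0
        self.reasoned[v] = 0

    def on_conflict(self, participated: Set[int], reason_side: Set[int]) -> None:
        """Record one learnt clause.

        participated -- vars in the learnt clause or on the conflict side
        reason_side  -- vars that reasoned in generating it
        """
        self.learnt_counter += 1
        for v in participated:
            self.participated[v] += 1
        for v in reason_side:
            self.reasoned[v] += 1
        for v in self.q:  # locality extension: decay every unassigned variable
            if v not in self.assigned_at:
                self.q[v] *= self.decay

    def on_unassign(self, v: int) -> None:
        """Close v's interval and apply the ERWA + RSR update."""
        interval = self.learnt_counter - self.assigned_at.pop(v)  # L(I)
        if interval > 0:  # skip if I produced no learnt clause
            reward = (self.participated[v] + self.reasoned[v]) / interval  # r + RSR
            self.q[v] = (1 - self.alpha) * self.q[v] + self.alpha * reward

    def pick_branch_variable(self) -> int:
        """Greedy choice: the unassigned variable with the highest Q (linear scan)."""
        unassigned = [v for v in self.q if v not in self.assigned_at]
        return max(unassigned, key=lambda v: self.q[v])


# Hypothetical trace: variables 1 and 2 are assigned, two conflicts occur,
# then both are unassigned by backtracking.
s = LRBSketch(3)
s.on_assign(1)
s.on_assign(2)
s.on_conflict({1}, {2})
s.on_conflict({1}, set())
s.on_unassign(1)
s.on_unassign(2)
print(s.pick_branch_variable())  # 1
```

Variable 1 took part in both learnt clauses (reward 1.0), variable 2 only reasoned in one (reward 0.5), and variable 3 never earned a reward, so the greedy choice is variable 1.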

References

1. Ansótegui, C., Giráldez-Cru, J., Levy, J.: The community structure of SAT formulas. In: Cimatti, A., Sebastiani, R. (eds.) SAT 2012. LNCS, vol. 7317, pp. 410–423. Springer, Heidelberg (2012)
2. Audemard, G., Simon, L.: Predicting learnt clauses quality in modern SAT solvers. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence, IJCAI 2009, pp. 399–404. Morgan Kaufmann Publishers Inc., San Francisco (2009)
3. Audemard, G., Simon, L.: Refining restarts strategies for SAT and UNSAT. In: Milano, M. (ed.) CP 2012. LNCS, vol. 7514, pp. 118–126. Springer, Heidelberg (2012)
4. Audemard, G., Simon, L.: Glucose 2.3 in the SAT 2013 Competition. In: Proceedings of SAT Competition 2013, pp. 42–43 (2013)
5. Biere, A.: Adaptive restart strategies for conflict driven SAT solvers. In: Kleine Büning, H., Zhao, X. (eds.) SAT 2008. LNCS, vol. 4996, pp. 28–33. Springer, Heidelberg (2008)
6. Biere, A.: Lingeling, Plingeling, PicoSAT and PrecoSAT at SAT Race 2010. FMV Report Series Technical report 10(1) (2010)
7. Biere, A., Fröhlich, A.: Evaluating CDCL variable scoring schemes. In: Heule, M., Weaver, S. (eds.) SAT 2015. LNCS, vol. 9340, pp. 405–422. Springer, Heidelberg (2015). doi: 10.1007/978-3-319-24318-4_29
8. Brown, R.G.: Exponential smoothing for predicting demand. Oper. Res. 5, 145–145 (1957)
9. Cadar, C., Ganesh, V., Pawlowski, P.M., Dill, D.L., Engler, D.R.: EXE: automatically generating inputs of death. In: Proceedings of the 13th ACM Conference on Computer and Communications Security, CCS 2006, pp. 322–335. ACM, New York (2006)
10. Carvalho, E., Marques-Silva, J.P.: Using rewarding mechanisms for improving branching heuristics. In: Proceedings of the Seventh International Conference on Theory and Applications of Satisfiability Testing (2004)
11. Clarke, E., Biere, A., Raimi, R., Zhu, Y.: Bounded model checking using satisfiability solving. Form. Methods Syst. Des. 19(1), 7–34 (2001)
12. Eén, N., Sörensson, N.: An extensible SAT-solver. In: Giunchiglia, E., Tacchella, A. (eds.) SAT 2003. LNCS, vol. 2919, pp. 502–518. Springer, Heidelberg (2004)
13. Erev, I., Roth, A.E.: Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. Am. Econ. Rev. 88(4), 848–881 (1998)
14. Fröhlich, A., Biere, A., Wintersteiger, C., Hamadi, Y.: Stochastic local search for satisfiability modulo theories. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI 2015, pp. 1136–1143. AAAI Press (2015)
15. Gershman, R., Strichman, O.: HaifaSat: a new robust SAT solver. In: Ur, S., Bin, E., Wolfsthal, Y. (eds.) HVC 2005. LNCS, vol. 3875, pp. 76–89. Springer, Heidelberg (2006)
16. Goldberg, E., Novikov, Y.: BerkMin: a fast and robust sat-solver. Discrete Appl. Math. 155(12), 1549–1561 (2007)
17. Jeroslow, R.G., Wang, J.: Solving propositional satisfiability problems. Ann. Math. Artif. Intell. 1(1–4), 167–187 (1990)
18. Lagoudakis, M.G., Littman, M.L.: Learning to select branching rules in the DPLL procedure for satisfiability. Electron. Notes Discrete Math. 9, 344–359 (2001)
19. Liang, J.H., Ganesh, V., Poupart, P., Czarnecki, K.: Exponential recency weighted average branching heuristic for SAT solvers. In: Proceedings of AAAI 2016 (2016)
20. Liang, J.H., Ganesh, V., Zulkoski, E., Zaman, A., Czarnecki, K.: Understanding VSIDS branching heuristics in conflict-driven clause-learning SAT solvers. In: HVC 2015. LNCS, vol. 9434, pp. 225–241. Springer, Heidelberg (2015). doi: 10.1007/978-3-319-26287-1_14
21. Loth, M., Sebag, M., Hamadi, Y., Schoenauer, M.: Bandit-based search for constraint programming. In: Schulte, C. (ed.) CP 2013. LNCS, vol. 8124, pp. 464–480. Springer, Heidelberg (2013)
22. Marques-Silva, J.: The impact of branching heuristics in propositional satisfiability algorithms. In: Barahona, P., Alferes, J.J. (eds.) EPIA 1999. LNCS (LNAI), vol. 1695, pp. 62–74. Springer, Heidelberg (1999)
23. Marques-Silva, J.P., Sakallah, K.A.: GRASP: a new search algorithm for satisfiability. In: Proceedings of the 1996 IEEE/ACM International Conference on Computer-aided Design, ICCAD 1996, pp. 220–227. IEEE Computer Society, Washington, DC (1996)
24. Moskewicz, M.W., Madigan, C.F., Zhao, Y., Zhang, L., Malik, S.: Chaff: engineering an efficient SAT solver. In: Proceedings of the 38th Annual Design Automation Conference, DAC 2001, pp. 530–535. ACM, New York (2001)
25. Newsham, Z., Ganesh, V., Fischmeister, S., Audemard, G., Simon, L.: Impact of community structure on SAT solver performance. In: Sinz, C., Egly, U. (eds.) SAT 2014. LNCS, vol. 8561, pp. 252–268. Springer, Heidelberg (2014)
26. Pipatsrisawat, K., Darwiche, A.: A lightweight component caching scheme for satisfiability solvers. In: Marques-Silva, J., Sakallah, K.A. (eds.) SAT 2007. LNCS, vol. 4501, pp. 294–299. Springer, Heidelberg (2007)
27. Rintanen, J.: Planning and SAT. In: Biere, A., Heule, M., van Maaren, H., Walsh, T. (eds.) Handbook of Satisfiability, vol. 185, pp. 483–504. IOS Press, Amsterdam (2009)
28. Ryan, L.: Efficient Algorithms for Clause-Learning SAT Solvers. Master's thesis, Simon Fraser University (2004)
29. Soos, M.: CryptoMiniSat v4. In: SAT Competition, p. 23 (2014)
30. Stump, A., Sutcliffe, G., Tinelli, C.: StarExec: a cross-community infrastructure for logic solving. In: Demri, S., Kapur, D., Weidenbach, C. (eds.) IJCAR 2014. LNCS, vol. 8562, pp. 367–373. Springer, Heidelberg (2014)
31. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, vol. 1. MIT Press, Cambridge, Massachusetts (1998)
32. Yechiam, E., Busemeyer, J.R.: Comparison of basic assumptions embedded in learning models for experience-based decision making. Psychon. Bull. Rev. 12(3), 387–402 (2005)
33. Zhang, L., Madigan, C.F., Moskewicz, M.H., Malik, S.: Efficient conflict driven learning in a boolean satisfiability solver. In: Proceedings of the 2001 IEEE/ACM International Conference on Computer-aided Design, ICCAD 2001, pp. 279–285. IEEE Press, Piscataway (2001)

Posted on 2020-06-26 12:21 by 海阔凭鱼跃越
