Restart Strategy Selection Using Machine Learning Techniques

Haim S., Walsh T. (2009) Restart Strategy Selection Using Machine Learning Techniques. In: Kullmann O. (eds) Theory and Applications of Satisfiability Testing - SAT 2009. SAT 2009. Lecture Notes in Computer Science, vol 5584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02777-2_30


Abstract

 

Restart strategies are an important factor in the performance of conflict-driven Davis-Putnam style SAT solvers. Selecting a good restart strategy for a problem instance can enhance the performance of a solver. Inspired by recent success applying machine learning techniques to predict the runtime of SAT solvers, we present a method which uses machine learning to boost solver performance through a smart selection of the restart strategy.

Based on easy-to-compute features, we train both a satisfiability classifier and runtime models.

We use these models to choose between restart strategies. We present experimental results comparing this technique with the most commonly used restart strategies. Our results demonstrate that machine learning is effective in improving solver performance.

   

Keywords

Machine learning techniques, Observation window, Horn clause, Runtime model, Current partial assignment

 

 

3 LMPick: A Restart-Strategy Selector

 

Since restart strategies are an important factor in the performance of DPLL style SAT solvers, selecting a good restart strategy for a given instance should improve the performance of the solver on that instance. We suggest that, by using supervised machine learning, it is possible to select a good restart strategy for a given instance.

We present LMPick, a machine learning based technique which enhances CDCL solvers' performance.

   
   3.1 Restart Strategies Portfolio
   

LMPick uses a portfolio of restart strategies from which it chooses the best one for a given instance. Following [11], we recognize several restart strategies that have been shown to be effective on one or more benchmark families.

We chose nine restart strategies that represent, to our understanding, a good coverage of commonly used restart strategies.
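Several strategies of this kind are built on the universal sequence of Luby et al. [16]. As a minimal illustration (our own Python sketch, not code from the paper), a Luby-based restart schedule with a given unit run length can be generated as follows; the specific strategies and parameters of the portfolio itself are not reproduced in this sketch.

```python
def luby(i):
    """i-th term (1-indexed) of the Luby sequence: 1, 1, 2, 1, 1, 2, 4, ..."""
    k = 1
    while (1 << k) - 1 < i:          # find k such that 2^(k-1) <= i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:            # i is exactly 2^k - 1
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)

def luby_restart_schedule(unit):
    """Yield successive restart limits (in conflicts) for a Luby strategy
    with the given unit run length (the unit is a parameter of the strategy)."""
    i = 1
    while True:
        yield unit * luby(i)
        i += 1
```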

 
   
  3.2 Supervised Machine Learning 
   

Satisfiable and unsatisfiable instances from the same benchmark family tend to have different runtime distributions [7]. A runtime prediction model that is trained using both SAT and UNSAT instances performs worse than a homogeneous model. It is better to train a layer of two models, one trained with satisfiable instances (Msat) and the other with unsatisfiable instances (Munsat). Since in most cases we do not know in advance whether a given instance is satisfiable, we need to determine which of the two models to query according to the instance's probability of being satisfiable. Previous work ([26], [4]) suggests that machine learning can be successfully used for this task as well: a classifier can be trained to estimate the probability that an instance is satisfiable. Some classification techniques perform better than others, but it seems that for most benchmark families a classifier with 80% accuracy or more is achievable.
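To make the layering concrete, the following sketch (our own Python illustration; the names and the probability-weighted combination are assumptions, not a formula stated in this section) shows how the classifier output can be used to weigh the two runtime models for each candidate strategy.

```python
def expected_runtime(p_sat, t_sat, t_unsat):
    """Weight the SAT and UNSAT runtime predictions by the estimated
    probability that the instance is satisfiable (illustrative assumption)."""
    return p_sat * t_sat + (1.0 - p_sat) * t_unsat

def pick_strategy(p_sat, runtime_models, features):
    """runtime_models: {strategy: (m_sat, m_unsat)}, where each model maps a
    feature vector to a predicted cpu-runtime; returns the strategy with the
    lowest weighted prediction."""
    return min(
        runtime_models,
        key=lambda s: expected_runtime(p_sat,
                                       runtime_models[s][0](features),
                                       runtime_models[s][1](features)),
    )

# Toy usage with constant predictors and made-up strategy names:
models = {
    "luby-32":   (lambda f: 40.0, lambda f: 300.0),
    "geometric": (lambda f: 100.0, lambda f: 150.0),
}
print(pick_strategy(0.8, models, features=None))   # -> "luby-32"
```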

   
   

Using supervised machine learning, we train models offline in order to use them for predictions online. For every training example t ∈ T, where T is the training set, we gather the feature vector x = {x1, x2, ..., xn} using the features presented in Section 3.3. Once the raw data is gathered, we perform feature selection: we repeatedly remove the feature with the smallest standardised coefficient until no improvement is observed according to the AIC (Akaike Information Criterion), and we then search for and eliminate co-linear features in the chosen set. The reduced feature vector x̂ is then used to train a classifier and several runtime prediction models. The classifier predicts the probability that an instance is satisfiable, and the runtime models predict cpu-runtime. LMPick trains one classifier, but two runtime models for each restart strategy s ∈ S (where S is the set of all participating strategies), for a total of 2|S| models. Each training instance is used to train the satisfiability classifier, labeled with its satisfiability class, and |S| runtime models, labeled for each model with the appropriate runtime.
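The feature-selection step can be sketched as follows, using statsmodels' ordinary least squares as a stand-in regression. This is an illustrative reconstruction of the procedure just described, not the authors' code, and details such as the co-linearity filter are omitted.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_select_by_aic(X: pd.DataFrame, y: np.ndarray):
    """Repeatedly drop the feature with the smallest standardised coefficient,
    keeping each drop only while the AIC keeps improving (sketch only;
    assumes no constant-valued feature columns)."""
    selected = list(X.columns)
    Xz = (X - X.mean()) / X.std()            # standardise so coefficients are comparable
    best_aic = sm.OLS(y, sm.add_constant(Xz[selected])).fit().aic
    while len(selected) > 1:
        fit = sm.OLS(y, sm.add_constant(Xz[selected])).fit()
        weakest = fit.params.drop("const").abs().idxmin()   # smallest standardised coefficient
        trial = [f for f in selected if f != weakest]
        aic = sm.OLS(y, sm.add_constant(Xz[trial])).fit().aic
        if aic >= best_aic:                  # no improvement: stop
            break
        best_aic, selected = aic, trial
    return selected
```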

   
   

As the classifier, we used a Logistic Regression technique. Any classifier that returns probabilities would be suitable; we found Logistic Regression to be a simple yet effective classifier which was also robust enough to deal with different data sets. We considered both Sparse Multinomial Logistic Regression [15] (suggested to be effective for this task in [25]) and the classifiers suggested by Devlin and O'Sullivan in [4], but the results of all classifiers were on par when using the presented feature vector on our datasets.
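For illustration, such a classifier can be trained with any off-the-shelf logistic regression implementation; the sketch below uses scikit-learn and synthetic placeholder data (our choice of library and data, not the paper's).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder training data: rows are reduced feature vectors x̂,
# labels are 1 for satisfiable and 0 for unsatisfiable instances.
X_train = np.random.rand(200, 10)
y_train = np.random.randint(0, 2, size=200)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Estimated probability that a new instance is satisfiable.
x_new = np.random.rand(1, 10)
p_sat = clf.predict_proba(x_new)[0, 1]
```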

   
   

For the runtime prediction models we used Ridge Linear Regression. Using ridge linear regression, we fit a coefficient vector w to create a linear predictor f_w(x̂) = wᵀx̂. We chose ridge regression since it is a quick and simple technique for numerical prediction, and it was shown to be effective in the Linear Model Predictor (LMP) [10]. While LMP predicts the log of the number of conflicts, in this work we found that predicting cpu-runtime is more effective as a selection criterion for restart strategies. Using the number of conflicts as a selection criterion tends to bias the selection towards frequent restart strategies for large instances. This is because an instance with many variables spends more time going down the first branch to a conflict after a restart, and this work is unaccounted for when conflicts are used as the cost criterion. Hence a very frequent restart strategy might be very effective in terms of the number of conflicts while being much less effective in cpu-time.
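A corresponding runtime model can be sketched as follows, again with scikit-learn and placeholder data; the regularisation strength shown is illustrative, not a value reported in the paper.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Placeholder data: one row of reduced features x̂ per training instance,
# target is the measured cpu-runtime of that instance under one strategy s.
X_train = np.random.rand(200, 10)
runtimes = np.random.rand(200) * 100.0

# One such model is fitted per (strategy, SAT/UNSAT class) pair; alpha is illustrative.
m_sat_s = Ridge(alpha=1.0)
m_sat_s.fit(X_train, runtimes)

predicted_runtime = m_sat_s.predict(np.random.rand(1, 10))[0]
```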

   
   3.3 Feature Vector
   

There are four different sets of features used in this study, all inspired by the two previously discussed techniques, SatZilla [25] and LMP [10]. The first set includes only the number of variables and the number of clauses in the original clause database; these values are the only ones that are not normalized. The second set includes variables that are gathered before the solver starts, but after removing clauses that are already satisfied, shrinking clauses with multiple appearances, and propagating unit clauses in the original formula. These features are all normalized appropriately; they are inspired by SatZilla and were first suggested in [18]. The third set includes statistics that are gathered during the "Observation Window", a period in which we analyze the behavior of the solver while solving the instance. The "Observation Window" was first used in [10]; the way it is used in this study is discussed shortly. The variables in this set are the only ones which are DPLL dependent. The last set includes the same features as the second, but they are calculated at the end of the observation window. A full list of the features is presented in Fig. 1. For further explanation of these features see [18] and [10].
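As a flavour of the cheaper features, the sketch below computes a few SatZilla-style statistics (number of variables and clauses, clause-to-variable ratio, fraction of Horn clauses) directly from a DIMACS-style clause list. The selection of statistics here is our own illustration; the exact feature list is the one in Fig. 1, which this sketch does not reproduce.

```python
def basic_features(num_vars, clauses):
    """clauses: list of clauses, each a list of non-zero integer literals
    (DIMACS style). Returns a few SatZilla-style instance features."""
    num_clauses = len(clauses)
    horn = sum(1 for c in clauses if sum(1 for lit in c if lit > 0) <= 1)
    pos_lits = sum(1 for c in clauses for lit in c if lit > 0)
    total_lits = sum(len(c) for c in clauses)
    return {
        "num_vars": num_vars,
        "num_clauses": num_clauses,
        "clause_to_var_ratio": num_clauses / num_vars,
        "horn_clause_fraction": horn / num_clauses,
        "positive_literal_fraction": pos_lits / total_lits,
        "mean_clause_length": total_lits / num_clauses,
    }

# Example: (x1 ∨ ¬x2) ∧ (¬x1 ∨ ¬x3) ∧ (x2 ∨ x3 ∨ ¬x1)
print(basic_features(3, [[1, -2], [-1, -3], [2, 3, -1]]))
```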

   
 

 

Fig. 2. Steps in the operation of a restart strategy portfolio based solver. Feature sets I through IV are presented in Fig. 1.

   
   3.4 Operation of the Solver
 

Once all runtime models are fitted and the satisfiability classifier is trained, we can use them to improve performance on future instances. The steps taken by LMPick are presented in Fig. 2.
Since no prediction can be made before the observation window is terminated, and since we favor an early estimation, it is important that the observation window terminates early in the search. In our preliminary testing we noticed that the first restart tends to be very noisy, and that results are better if data is collected from the second restart onwards. We tried several options for the observation window location and size; eventually we opted for a first restart which is very short (100 conflicts), followed by a second restart (of size 2000) which hosts the observation window. Hence the observation window is closed and all data is gathered after 2100 conflicts.
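Putting the pieces together, the control flow around the observation window might look roughly like the sketch below. The solver interface, the function names, and the probability-weighted selection rule are our illustrative assumptions; Fig. 2 remains the authoritative description of the steps.

```python
def solve_with_lmpick(solver, clf, runtime_models, strategies):
    """Rough control flow of a portfolio-based run (illustrative only;
    the solver object and its methods are hypothetical)."""
    solver.restart_limit = 100                # very short first restart
    solver.run_until_restart()

    solver.restart_limit = 2000               # second restart hosts the window
    solver.open_observation_window()
    solver.run_until_restart()
    feats = solver.collect_features()         # feature sets I-IV of Fig. 1

    p_sat = clf.predict_proba([feats])[0, 1]
    best = min(
        strategies,
        key=lambda s: p_sat * runtime_models[s][0].predict([feats])[0]
                      + (1 - p_sat) * runtime_models[s][1].predict([feats])[0],
    )
    # Continue the same search (keeping the learnt clauses), but follow the
    # chosen strategy's restart schedule from the next restart onwards.
    return solver.continue_with_strategy(best)
```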

 
   
 

The restart strategy which is predicted to be the first to terminate is picked, and the solver follows this strategy from the next restart onwards. Although restart strategies are usually followed from the beginning of the search, we do not want to lose the clauses learnt during the first 2100 conflicts; therefore we continue the current solving process and keep the already learnt clauses. We denote the restart sequence that actually takes place, from the first restart to termination, as LMPick_sb, where sb is the chosen strategy. It is important to note that sb ≠ LMPick_sb.

   

4 Results

4.1 Experiment Settings

4.2 Benchmarks

 
  2 http://www.cprover.org/cbmc/
 

– bmc: An ensemble of software verification problems generated using CBMC2, verifying the C functions presented in Fig. 3. These two functions are almost identical, apart from a change in line 8 which causes the satisfiable version to overflow. The different instances use different array sizes and different numbers of unwindings. This dataset represents an ensemble of problems that are very similar and generated by the same process. We use 234 satisfiable and 237 unsatisfiable problems.

 

– velev: An ensemble of hardware formal verification problems distributed by Miroslav Velev3. These are well studied hardware verification benchmarks. This ensemble is not as homogeneous as bmc because it is a union of many small benchmark families. We use 72 satisfiable and 105 unsatisfiable instances.
– crypto: An ensemble of problems that are generated as part of an attack on the Bivium stream cipher, presented by Eibach, Pilz and Völkel [6]. This ensemble presents some interesting characteristics. While it is generated by a non-random process, the instances are significantly smaller than common industrial instances. The satisfiable instances we use were generated with 35 guesses, the unsatisfiable ones with 40 guesses. The reason for this discrepancy is that unsatisfiable instances are harder to solve in this benchmark family, and other numbers of guesses render the datasets too easy or too hard. We use 139 sat and 300 unsat instances.
– rand: An ensemble of 457 satisfiable and 601 unsatisfiable randomly generated 3-SAT problems with 250 to 450 variables and a clause-to-variable ratio of 4.1 to 5.0.

 

3 http://www.miroslav-velev.com/sat_benchmarks.html. We use the following benchmark families: vliw_sat_2.0, vliw_sat_2.1, vliw_sat_4.0, vliw_unsat_2.0, vliw_unsat_3.0, vliw_unsat_4.0, pipe_sat_1.0, pipe_sat_1.1, pipe_unsat_1.0, pipe_unsat_1.1, liveness_sat_1.0, liveness_unsat_1.0, liveness_unsat_2.0, dlx_iq_unsat_1.0, dlx_iq_unsat_2.0, engine_unsat_1.0, fvp_sat_3.0, fvp_unsat_1.0, fvp_unsat_2.0, fvp_unsat_3.0.

 

 

 

 

References

 
1. Biere, A.: Adaptive Restart Strategies for Conflict Driven SAT Solvers. In: Proc. of the 11th Int. Conf. on Theory and Applications of Satisfiability Testing (2008)
2. Biere, A.: PicoSAT Essentials. Journal on Satisfiability, Boolean Modeling and Computation 4, 75–97 (2008)
3. Bregman, D., Mitchell, D.: The SAT solver MXC (version 0.75). Solver Description for the SAT Race 2008 solver competition (2008)
4. Devlin, D., O'Sullivan, B.: Satisfiability as a Classification Problem. In: Proc. of the 19th Irish Conf. on Artificial Intelligence and Cognitive Science (2008)
5. Eén, N., Sörensson, N.: An extensible SAT-solver. In: Proc. of the 6th Int. Conf. on Theory and Applications of Satisfiability Testing (2003)
6. Eibach, T., Pilz, E., Völkel, G.: Attacking Bivium Using SAT Solvers. In: Proc. of the 11th Int. Conf. on Theory and Applications of Satisfiability Testing (2008)
7. Frost, D., Rish, I.: Summarizing CSP hardness with continuous probability distributions. In: Proc. of the 14th National Conf. on Artificial Intelligence (1997)
8. Goldberg, E., Novikov, Y.: BerkMin: A fast and robust SAT-solver. In: Proc. of Design Automation and Test in Europe (2002)
9. Gomes, C.P., Selman, B., Kautz, H.: Boosting Combinatorial Search through Randomization. In: Proc. of the 15th National Conf. on Artificial Intelligence (1998)
10. Haim, S., Walsh, T.: Online Estimation of SAT Solving Runtime. In: Proc. of the 11th Int. Conf. on Theory and Applications of Satisfiability Testing (2008)
11. Huang, J.: The effect of restarts on the efficiency of clause learning. In: Proc. of the 20th Int. Joint Conf. on Artificial Intelligence (2007)
12. Huang, J.: A Case for Simple SAT Solvers. In: Proc. of the 13th Int. Conf. on Principles and Practice of Constraint Programming (2007)
13. Hutter, F., Hamadi, Y., Hoos, H., Leyton-Brown, K.: Performance Prediction and Automated Tuning of Randomized and Parametric Algorithms. In: Proc. of the 12th Int. Conf. on Principles and Practice of Constraint Programming (2006)
14. Kautz, H., Horvitz, E., Ruan, Y., Gomes, C., Selman, B.: Dynamic Restart Policies. In: Proc. of the 18th National Conf. on Artificial Intelligence (2002)
15. Krishnapuram, B., Figueiredo, M., Carin, L., Hartemink, A.: Sparse Multinomial Logistic Regression: Fast Algorithms and Generalization Bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 957–968 (2005)
16. Luby, M., Sinclair, A., Zuckerman, D.: Optimal speedup of Las Vegas algorithms. In: Proc. of the 2nd Israel Symp. on the Theory and Computing Systems (1993)
17. Moskewicz, M.W., Madigan, C.F., Zhao, Y., Zhang, L., Malik, S.: Chaff: engineering an efficient SAT solver. In: Proc. of the 38th Design Automation Conference (2001)
18. Nudelman, E., Leyton-Brown, K., Hoos, H.H., Devkar, A., Shoham, Y.: Understanding Random SAT: Beyond the Clauses-to-Variables Ratio. In: Wallace, M. (ed.) CP 2004. LNCS, vol. 3258, pp. 438–452. Springer, Heidelberg (2004)
19. Ruan, Y., Horvitz, E., Kautz, H.: Restart Policies with Dependence among Runs: A Dynamic Programming Approach. In: Van Hentenryck, P. (ed.) CP 2002. LNCS, vol. 2470, p. 573. Springer, Heidelberg (2002)
20. Ruan, Y., Horvitz, E., Kautz, H.: Hardness-aware restart policies. In: The 18th Int. Joint Conference on Artificial Intelligence: Workshop on Stochastic Search (2003)
21. Ryan, L.: Efficient algorithms for clause learning SAT solvers. Master thesis, Simon Fraser University, School of Computing Science (2004)
22. Ryvchin, V., Strichman, O.: Local Restarts. In: Kleine Büning, H., Zhao, X. (eds.) SAT 2008. LNCS, vol. 4996, pp. 271–276. Springer, Heidelberg (2008)
23. Walsh, T.: Search in a Small World. In: Proc. of the 16th Int. Joint Conf. on Artificial Intelligence (1999)
24. Wu, H., van Beek, P.: On Universal Restart Strategies for Backtracking Search. In: Bessière, C. (ed.) CP 2007. LNCS, vol. 4741, pp. 681–695. Springer, Heidelberg (2007)
25. Xu, L., Hutter, F., Hoos, H., Leyton-Brown, K.: SATzilla: Portfolio-based Algorithm Selection for SAT. Journal of Artificial Intelligence Research 32, 565–606 (2008)
26. Xu, L., Hoos, H., Leyton-Brown, K.: Hierarchical Hardness Models for SAT. In: Bessière, C. (ed.) CP 2007. LNCS, vol. 4741, pp. 696–711. Springer, Heidelberg (2007)
   