Restart Strategy Selection Using Machine Learning Techniques

Haim S., Walsh T. (2009) Restart Strategy Selection Using Machine Learning Techniques. In: Kullmann O. (eds) Theory and Applications of Satisfiability Testing - SAT 2009. SAT 2009. Lecture Notes in Computer Science, vol 5584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02777-2_30


Abstract

 

Restart strategies are an important factor in the performance of conflict-driven Davis Putnam style SAT solvers. Selecting a good restart strategy for a problem instance can enhance the performance of a solver. Inspired by recent success applying machine learning techniques to predict the runtime of SAT solvers, we present a method which uses machine learning to boost solver performance through a smart selection of the restart strategy.

Based on easy-to-compute features, we train both a satisfiability classifier and runtime models.

We use these models to choose between restart strategies. We present experimental results comparing this technique with the most commonly used restart strategies. Our results demonstrate that machine learning is effective in improving solver performance.

   

Keywords

Machine Learning Technique, Observation Window, Horn Clause, Runtime Model, Current Partial Assignment

 

 

3 LMPick: A Restart-Strategy Selector

 

Since restart strategies are an important factor in the performance of DPLL-style SAT solvers, selecting a good restart strategy for a given instance should improve the performance of the solver on that instance. We suggest that, using supervised machine learning, it is possible to select a good restart strategy for a given instance.

We present LMPick, a machine-learning-based technique which enhances CDCL solvers’ performance.

   
   3.1 Restart Strategies Portfolio
   

LMPick uses a portfolio of restart strategies from which it chooses the best one for a given instance. Following [11], we recognize several restart strategies that have been shown to be effective on one or more benchmark families.

We chose nine restart strategies that represent, to our understanding, a good coverage of commonly used restart strategies.
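The concrete nine strategies are listed in the paper itself and are not reproduced in this excerpt. As a purely illustrative sketch of what portfolio members look like, the Python snippet below implements two schedules that are commonly used in CDCL solvers, a Luby-based one [16] and a geometric one; the unit of 512 conflicts and the factor 1.5 are arbitrary example values, not parameters from the paper.

```python
# Illustrative restart-length generators; example portfolio members only.

def luby(i: int) -> int:
    """i-th element (1-based) of the Luby sequence: 1, 1, 2, 1, 1, 2, 4, ..."""
    k = 1
    while True:
        if i == (1 << k) - 1:
            return 1 << (k - 1)
        if i < (1 << k) - 1:
            return luby(i - (1 << (k - 1)) + 1)
        k += 1

def luby_strategy(unit: int = 512):
    """Yield conflict limits unit * luby(i) for i = 1, 2, ... (unit is an example value)."""
    i = 1
    while True:
        yield unit * luby(i)
        i += 1

def geometric_strategy(first: int = 100, factor: float = 1.5):
    """Yield conflict limits first, first*factor, first*factor^2, ... (example values)."""
    limit = float(first)
    while True:
        yield int(limit)
        limit *= factor

# A portfolio is then simply a named collection of such schedule generators.
PORTFOLIO = {
    "luby-512": luby_strategy,
    "geometric-1.5": geometric_strategy,
}
```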

 
   
  3.2 Supervised Machine Learning 
   

Satisfiable and unsatisfiable instances from the same benchmark family tend to have different runtime distributions [7]. A runtime prediction model that is trained using both SAT and UNSAT instances performs worse than a homogeneous model. It is better to train a layer of two models, one trained with satisfiable instances (Msat) and the other with unsatisfiable instances (Munsat). Since in most cases we do not know in advance whether a given instance is satisfiable or not, we need to determine which of the models to query for a given instance according to its probability of being satisfiable. Previous work ([26], [4]) suggests that machine learning can be successfully used for this task as well. A classifier can be trained to estimate the probability that an instance is satisfiable. Some classification techniques perform better than others, but it seems that for most benchmark families a classifier with 80% accuracy or more is achievable.

   
   

Using supervised machine learning, we train models offline in order to use them for predictions online. For every training example t ∈ T, where T is the training set, we gather the feature vector x = {x1, x2, ..., xn} using the features presented in Section 3.3. Once the raw data is gathered, we perform feature selection: we repeatedly remove the feature with the smallest standardised coefficient until no improvement is observed based on the standard AIC (Akaike Information Criterion). We then search for and eliminate co-linear features in the chosen set. The reduced feature vector x̂ is then used to train a classifier and several runtime prediction models. The classifier predicts the probability that an instance is satisfiable, and the runtime models predict cpu-runtime. LMPick trains one classifier, but two runtime models for each restart strategy s ∈ S (where S is the set of all participating strategies), for a total of 2|S| models. Each training instance is used to train the satisfiability classifier, labeled with its satisfiability class, and |S| runtime models, labeled for each model with the appropriate runtime.
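The backward-elimination step can be sketched as follows. This is a minimal numpy illustration that assumes an ordinary least-squares fit and the Gaussian-likelihood form of AIC (n·ln(RSS/n) + 2k); the paper does not spell out these implementation details.

```python
import numpy as np

def aic(X: np.ndarray, y: np.ndarray) -> float:
    """AIC of an OLS fit under a Gaussian likelihood: n*ln(RSS/n) + 2k."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return n * np.log(rss / n) + 2 * k

def backward_eliminate(X: np.ndarray, y: np.ndarray) -> list[int]:
    """Drop the feature with the smallest standardised coefficient while AIC improves."""
    kept = list(range(X.shape[1]))
    best = aic(X[:, kept], y)
    while len(kept) > 1:
        Xk = X[:, kept]
        coef, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        # standardised coefficient approximated as |w_j| * std(x_j)
        weakest = int(np.argmin(np.abs(coef) * Xk.std(axis=0)))
        trial = kept[:weakest] + kept[weakest + 1:]
        trial_aic = aic(X[:, trial], y)
        if trial_aic >= best:      # no AIC improvement: stop eliminating
            break
        kept, best = trial, trial_aic
    return kept                    # co-linear features would be pruned in a further pass
```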

   
   

As the classifier, we used a Logistic Regression technique. Any classifier that returns probabilities would be suitable; we found Logistic Regression to be a simple yet effective classifier which was also robust enough to deal with different data sets. We considered both Sparse Multinomial Logistic Regression [15] (suggested to be effective for this task in [25]) and the classifiers suggested by Devlin and O’Sullivan in [4], but the results of all classifiers were on par when using the presented feature vector on our datasets.
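For concreteness, here is a minimal scikit-learn sketch of such a satisfiability classifier. The use of scikit-learn, the synthetic training data, and the feature count are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the reduced feature vectors and SAT/UNSAT labels;
# in LMPick these come from the training instances described above.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 12))                                # 200 instances, 12 selected features (placeholder)
y_train = (X_train[:, 0] + rng.normal(size=200) > 0).astype(int)    # 1 = SAT, 0 = UNSAT

sat_classifier = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
sat_classifier.fit(X_train, y_train)

# At solving time the classifier returns P(instance is satisfiable):
x_hat = rng.normal(size=12)            # features of a new instance, gathered online
p_sat = sat_classifier.predict_proba(x_hat.reshape(1, -1))[0, 1]
```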

   
   

For the runtime prediction models we used Ridge Linear Regression. Using ridge regression, we fit a coefficient vector w to create a linear predictor f_w(x̂) = wᵀx̂. We chose ridge regression since it is a quick and simple technique for numerical prediction, and it was shown to be effective in the Linear Model Predictor (LMP) [10]. While LMP predicts the log of the number of conflicts, in this work we found that predicting cpu-runtime is more effective as a selection criterion for restart strategies. Using the number of conflicts as a selection criterion tends to bias the selection towards frequent restart strategies for large instances. This is because an instance with many variables spends more time going down the first branch to a conflict after a restart, and this work is unaccounted for when conflicts are used as the cost criterion. Hence a very frequent restart strategy might be very effective in terms of the number of conflicts while being much less effective in cpu-time.
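A matching sketch of the 2|S| ridge-regression runtime models follows. The strategy names, the regularisation strength, and the synthetic training data are placeholders; in LMPick each model is fitted on the instances of one satisfiability class solved under one strategy.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
strategies = ["luby-512", "geometric-1.5"]   # placeholder names; the paper's portfolio has nine strategies

# runtime_models[(s, c)] predicts cpu-runtime under strategy s for satisfiability class c.
runtime_models = {}
for s in strategies:
    for sat_class in ("sat", "unsat"):
        # Synthetic stand-ins for the reduced feature vectors and observed
        # cpu-runtimes of this class's training instances under strategy s.
        X_s = rng.normal(size=(100, 12))
        y_s = np.abs(X_s @ rng.normal(size=12)) + 1.0
        model = Ridge(alpha=1.0)             # linear predictor f_w(x̂) = wᵀx̂ with an L2 penalty on w
        model.fit(X_s, y_s)
        runtime_models[(s, sat_class)] = model
```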

   
   3.3 Feature Vector
   

There are four different sets of features that we used in this study, all inspired by the two previously discussed techniques, SatZilla [25] and LMP [10]. The first set includes only the number of variables and the number of clauses in the original clause database. These values are the only ones that are not normalized. The second set includes variables that are gathered before the solver starts, but after removing clauses that are already satisfied, shrinking clauses with multiple appearances, and propagating unit clauses in the original formula. These features are all normalized appropriately. They are inspired by SatZilla and were first suggested in [18]. The third set includes statistics that are gathered during the “Observation Window”, a period in which we analyze the behavior of the solver while solving the instance. The “Observation Window” was first used in [10]. The way the observation window is used in this study is discussed shortly. The variables in this set are the only ones that are DPLL dependent. The last set includes the same features as the second, but they are calculated at the end of the observation window. A full list of the features is presented in Fig. 1. For further explanation of these features see [18] and [10].
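As a rough illustration of how the four feature sets combine into one vector, here is a placeholder container. The concrete features in each set are listed in Fig. 1 of the paper and are not reproduced here, so the field contents below are assumptions about structure only.

```python
from dataclasses import dataclass

@dataclass
class FeatureVector:
    """Placeholder mirroring the four feature sets described above."""
    set1_raw: list[float]            # number of variables, number of clauses (not normalized)
    set2_preprocessed: list[float]   # normalized statistics gathered after preprocessing, before search
    set3_window: list[float]         # statistics gathered during the observation window (DPLL dependent)
    set4_post_window: list[float]    # set-2 style statistics recomputed when the window closes

    def as_row(self) -> list[float]:
        # Concatenation used as the raw feature vector x before feature selection.
        return self.set1_raw + self.set2_preprocessed + self.set3_window + self.set4_post_window
```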

   
 

 

Fig. 2. Steps in the operation of a restart strategy portfolio based solver. Feature sets I through IV are presented in Fig. 1.

   
   3.4 Operation of the Solver
 

Once all runtime models are fitted and the satisfiability classifier is trained, we can use them to improve performance on future instances. The steps taken by LMPick are presented in Fig. 2.
Since no prediction can be made before the observation window terminates, and since we favor an early estimation, it is important that the observation window terminate early in the search. In our preliminary testing we noticed that the first restart tends to be very noisy, and that results are better if data is collected from the second restart onwards. We tried several options for the observation window location and size; eventually we opted for a first restart which is very short (100 conflicts), followed by a second restart (of size 2000) which hosts the observation window. Hence the observation window is closed and all data is gathered after 2100 conflicts.

 
   
 

The restart strategy which is predicted to be the first to terminate is picked, and the solver starts following this strategy from the next restart onwards. Although restart strategies are usually followed from the beginning of the search, we do not want to lose the clauses learned during the first 2100 conflicts. Therefore, we continue the current solving process and keep the already learnt clauses. We denote the restart sequence that takes place from the first restart to termination, when strategy s_b is picked, as LMPick_{s_b}. It is important to note that s_b ≠ LMPick_{s_b}.
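Putting the pieces together, a minimal sketch of the online selection step is given below, reusing the classifier and runtime models from the earlier sketches. The exact rule LMPick uses to combine the classifier output with the SAT and UNSAT runtime predictions is not spelled out in this excerpt; weighting the two predictions by the predicted satisfiability probability is one natural reading and is shown here as an assumption.

```python
def pick_strategy(x_hat, sat_classifier, runtime_models, strategies):
    """Pick the strategy with the smallest predicted cpu-runtime.

    x_hat: reduced feature vector gathered when the observation window closes
    (after 2100 conflicts). The probability-weighted combination below is an
    assumption about how the classifier arbitrates between the two models.
    """
    p_sat = sat_classifier.predict_proba(x_hat.reshape(1, -1))[0, 1]
    best_strategy, best_runtime = None, float("inf")
    for s in strategies:
        t_sat = runtime_models[(s, "sat")].predict(x_hat.reshape(1, -1))[0]
        t_unsat = runtime_models[(s, "unsat")].predict(x_hat.reshape(1, -1))[0]
        expected = p_sat * t_sat + (1.0 - p_sat) * t_unsat
        if expected < best_runtime:
            best_strategy, best_runtime = s, expected
    # The solver then follows best_strategy from the next restart on,
    # keeping the clauses learned during the first 2100 conflicts.
    return best_strategy
```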

   

4 Results

4.1 Experiment Settings

4.2 Benchmarks

 
  2 http://www.cprover.org/cbmc/
 

– bmc: An ensemble of software verification problems generated using CBMC², verifying the C functions presented in Fig. 3. These two functions are almost identical, apart from a change in line 8 which causes the sat script to overflow. The different instances use different array sizes and different numbers of unwindings. This dataset represents an ensemble of problems that are very similar and generated by the same process. We use 234 satisfiable and 237 unsatisfiable problems.

 

– velev: An ensemble of hardware formal verification problems distributed by Miroslav Velev³. These are well-studied hardware verification benchmarks. This ensemble is not as homogeneous as bmc because it is a union of many small benchmark families. We use 72 satisfiable and 105 unsatisfiable instances.
– crypto: An ensemble of problems that are generated as part of an attack on the Bivium stream cipher, presented by Eibach, Pilz and Völkel [6]. This ensemble presents some interesting characteristics. While it is generated by a non-random process, the instances are significantly smaller than common industrial instances. The satisfiable instances we use were generated with 35 guesses, the unsatisfiable ones with 40 guesses. The reason for this discrepancy is that unsatisfiable instances are harder to solve in this benchmark family, and using the same number of guesses would render one of the datasets either too easy or too hard. We use 139 satisfiable and 300 unsatisfiable instances.
– rand: An ensemble of 457 satisfiable and 601 unsatisfiable randomly generated 3-SAT problems with 250 to 450 variables and a clause-to-variable ratio of 4.1 to 5.0. (A minimal generator sketch follows this list.)
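As referenced above, here is a minimal sketch of a random 3-SAT generator in the stated parameter range. The paper does not describe its generator, so the uniform sampling scheme (three distinct variables per clause, random polarities) and the DIMACS output below are assumptions.

```python
import random

def random_3sat(num_vars: int, ratio: float, seed=None):
    """Uniformly sample a random 3-SAT formula as a list of clauses.

    num_vars in [250, 450] and ratio in [4.1, 5.0] match the rand ensemble's
    stated parameter range; the sampling scheme itself is an assumption.
    """
    rng = random.Random(seed)
    num_clauses = int(round(ratio * num_vars))
    clauses = []
    for _ in range(num_clauses):
        vars_ = rng.sample(range(1, num_vars + 1), 3)                 # three distinct variables
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in vars_))
    return clauses

def to_dimacs(clauses, num_vars: int) -> str:
    """Serialize a clause list in DIMACS CNF format."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)

# Example: one instance at the lower end of the stated range.
formula = random_3sat(num_vars=250, ratio=4.1, seed=0)
print(to_dimacs(formula, 250).splitlines()[0])
```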

 

3 http://www.miroslav-velev.com/sat_benchmarks.html. We use the following benchmark families: vliw_sat_2.0, vliw_sat_2.1, vliw_sat_4.0, vliw_unsat_2.0, vliw_unsat_3.0, vliw_unsat_4.0, pipe_sat_1.0, pipe_sat_1.1, pipe_unsat_1.0, pipe_unsat_1.1, liveness_sat_1.0, liveness_unsat_1.0, liveness_unsat_2.0, dlx_iq_unsat_1.0, dlx_iq_unsat_2.0, engine_unsat_1.0, fvp_sat_3.0, fvp_unsat_1.0, fvp_unsat_2.0, fvp_unsat_3.0.

 

 

 

 

References

 
1. Biere, A.: Adaptive Restart Strategies for Conflict Driven SAT Solvers. In: Proc. of the 11th Int. Conf. on Theory and Applications of Satisfiability Testing (2008)
2. Biere, A.: PicoSAT Essentials. Journal on Satisfiability, Boolean Modeling and Computation 4, 75–97 (2008)
3. Bregman, D., Mitchell, D.: The SAT solver MXC (version 0.75). Solver Description for the SAT Race 2008 solver competition (2008)
4. Devlin, D., O’Sullivan, B.: Satisfiability as a Classification Problem. In: Proc. of the 19th Irish Conf. on Artificial Intelligence and Cognitive Science (2008)
5. Eén, N., Sörensson, N.: An extensible SAT-solver. In: Proc. of the 6th Int. Conf. on Theory and Applications of Satisfiability Testing (2003)
6. Eibach, T., Pilz, E., Völkel, G.: Attacking Bivium Using SAT Solvers. In: Proc. of the 11th Int. Conf. on Theory and Applications of Satisfiability Testing (2008)
7. Frost, D., Rish, I.: Summarizing CSP hardness with continuous probability distributions. In: Proc. of the 14th National Conf. on Artificial Intelligence (1997)
8. Goldberg, E., Novikov, Y.: BerkMin: A fast and robust SAT-solver. In: Proc. of Design Automation and Test in Europe (2002)
9. Gomes, C.P., Selman, B., Kautz, H.: Boosting Combinatorial Search through Randomization. In: Proc. of the 15th National Conf. on Artificial Intelligence (1998)
10. Haim, S., Walsh, T.: Online Estimation of SAT Solving Runtime. In: Proc. of the 11th Int. Conf. on Theory and Applications of Satisfiability Testing (2008)
11. Huang, J.: The effect of restarts on the efficiency of clause learning. In: Proc. of the 20th Int. Joint Conf. on Artificial Intelligence (2007)
12. Huang, J.: A Case for Simple SAT Solvers. In: Proc. of the 13th Int. Conf. on Principles and Practice of Constraint Programming (2007)
13. Hutter, F., Hamadi, Y., Hoos, H., Leyton-Brown, K.: Performance Prediction and Automated Tuning of Randomized and Parametric Algorithms. In: Proc. of the 12th Int. Conf. on Principles and Practice of Constraint Programming (2006)
14. Kautz, H., Horvitz, E., Ruan, Y., Gomes, C., Selman, B.: Dynamic Restart Policies. In: Proc. of the 18th National Conf. on Artificial Intelligence (2002)
15. Krishnapuram, B., Figueiredo, M., Carin, L., Hartemink, A.: Sparse Multinomial Logistic Regression: Fast Algorithms and Generalization Bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 957–968 (2005)
16. Luby, M., Sinclair, A., Zuckerman, D.: Optimal speedup of Las Vegas algorithms. In: Proc. of the 2nd Israel Symp. on Theory of Computing and Systems (1993)
17. Moskewicz, M.W., Madigan, C.F., Zhao, Y., Zhang, L., Malik, S.: Chaff: engineering an efficient SAT solver. In: Proc. of the 38th Design Automation Conference (2001)
18. Nudelman, E., Leyton-Brown, K., Hoos, H.H., Devkar, A., Shoham, Y.: Understanding Random SAT: Beyond the Clauses-to-Variables Ratio. In: Wallace, M. (ed.) CP 2004. LNCS, vol. 3258, pp. 438–452. Springer, Heidelberg (2004)
19. Ruan, Y., Horvitz, E., Kautz, H.: Restart Policies with Dependence among Runs: A Dynamic Programming Approach. In: Van Hentenryck, P. (ed.) CP 2002. LNCS, vol. 2470, p. 573. Springer, Heidelberg (2002)
20. Ruan, Y., Horvitz, E., Kautz, H.: Hardness-aware restart policies. In: The 18th Int. Joint Conference on Artificial Intelligence: Workshop on Stochastic Search (2003)
21. Ryan, L.: Efficient algorithms for clause learning SAT solvers. Master thesis, Simon Fraser University, School of Computing Science (2004)
22. Ryvchin, V., Strichman, O.: Local Restarts. In: Kleine Büning, H., Zhao, X. (eds.) SAT 2008. LNCS, vol. 4996, pp. 271–276. Springer, Heidelberg (2008)
23. Walsh, T.: Search in a Small World. In: Proc. of the 16th Int. Joint Conference on Artificial Intelligence (1999)
24. Wu, H., van Beek, P.: On Universal Restart Strategies for Backtracking Search. In: Bessière, C. (ed.) CP 2007. LNCS, vol. 4741, pp. 681–695. Springer, Heidelberg (2007)
25. Xu, L., Hutter, F., Hoos, H., Leyton-Brown, K.: SATzilla: Portfolio-based Algorithm Selection for SAT. Journal of Artificial Intelligence Research 32, 565–606 (2008)
26. Xu, L., Hoos, H., Leyton-Brown, K.: Hierarchical Hardness Models for SAT. In: Bessière, C. (ed.) CP 2007. LNCS, vol. 4741, pp. 696–711. Springer, Heidelberg (2007)
   