Paper 1: Applying UCT to Boolean Satisfiability

Paper 2: Monte-Carlo Style UCT Search for Boolean Satisfiability

Paper 3:


 

Paper 1: a brief overview. It does not introduce the basic concepts.

 

UCT repeatedly starts from the root node and incrementally builds a tree based on estimates of node utilities and node visit frequencies computed from previous iterations.

 

In most implementations of UCT, the estimated utility of a new node is computed using Monte-Carlo methods, i.e., by generating random completions of the search (termed “playouts”) and averaging their outcomes.


This utility is revised each time the search revisits the node using the estimated values of the children. This technique is especially effective when no adequate heuristic is available to perform this value estimation task.


 

In this paper, we introduce and study an algorithm called UCTSAT that employs the UCT search control mechanism but replaces the playouts with a heuristic to estimate the initial utility of a node.

The heuristic we use is the fraction of the total set of clauses that are satisfied by the partial assignment associated with the node; this fraction is computed after the application of unit propagation.
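The heuristic above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names, the DIMACS-style integer encoding of literals, and the choice to score a conflicting node as 0 are all assumptions made for this example.

```python
from typing import Dict, List, Optional

Clause = List[int]            # DIMACS-style literals: 3 means x3, -3 means (not x3)
Assignment = Dict[int, bool]  # variable -> truth value


def unit_propagate(clauses: List[Clause], assignment: Assignment) -> Optional[Assignment]:
    """Extend `assignment` with all forced literals; return None on a conflict."""
    assignment = dict(assignment)  # work on a copy
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            # Skip clauses that already contain a true literal.
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue
            free = [l for l in clause if abs(l) not in assignment]
            if not free:
                return None  # every literal is false: conflict
            if len(free) == 1:
                # Unit clause: its last free literal must be made true.
                assignment[abs(free[0])] = free[0] > 0
                changed = True
    return assignment


def satisfied_fraction(clauses: List[Clause], assignment: Assignment) -> float:
    """Fraction of all clauses that contain at least one true literal."""
    if not clauses:
        return 1.0
    satisfied = sum(
        any(assignment.get(abs(l)) == (l > 0) for l in clause) for clause in clauses
    )
    return satisfied / len(clauses)


def heuristic(clauses: List[Clause], assignment: Assignment) -> float:
    """Initial utility of a node: satisfied fraction after unit propagation."""
    propagated = unit_propagate(clauses, assignment)
    if propagated is None:
        return 0.0  # assumption: a conflicting node gets utility 0
    return satisfied_fraction(clauses, propagated)


example = [[1, 2], [-1, 3], [-3, 4], [5, 6]]
score = heuristic(example, {1: True})  # x1 forces x3, which forces x4
```

In this example, setting x1 to true satisfies the first clause directly and the next two through propagation, so three of the four clauses count as satisfied.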

 

While we do not expect UCTSAT to outperform the highly-optimized, state of the art SAT solvers (especially with respect to CPU time), we believe that the development of an algorithm based on a radically different search technique is important for at least two reasons: (a) the hardness of SAT instances is related to the algorithm used [1], and hence UCTSAT, which uses a different search strategy, can provide useful and new insights into the complexity of SAT instances; and (b) because such an algorithm can be useful when included in a portfolio of algorithms (see, for example, [6]) where very different solution techniques can help expand the range of applicability of the portfolio.


As such, we focus our efforts on understanding whether UCTSAT is capable of solving SAT instances using smaller search trees than DPLL. To simplify the comparisons, we contrast our algorithm against a no-frills implementation of DPLL.

We set the exploration bias parameter in UCTSAT to 0 as this yielded the best performance on average.

We also experimented with varying the number of atoms that UCTSAT assigned at a given node in the search tree and discovered that setting more than one atom at once hurt the performance of the algorithm.

 

On uniform random 3-SAT and flat-graph coloring instances of various sizes, we found little difference in the sizes of the search trees constructed by the two algorithms.


We believe that this is due to the unstructured nature of these instances — UCTSAT works well when each exploration of the tree yields information that can be successfully used in subsequent iterations.


In instances drawn from real-world problems (namely, single-stuck-at-fault analysis problems) that exhibit structure, we discovered that UCTSAT constructs significantly smaller search trees than DPLL; this is illustrated in Table 1.


 Table 1. Average tree sizes (number of nodes) for SSA circuit fault analysis instances

 

Paper 2: Monte-Carlo Style UCT Search for Boolean Satisfiability

             Upper Confidence Bounds (UCB)

 Abstract.

In this paper, we investigate the feasibility of applying algorithms based on the Upper Confidence bounds applied to Trees [12] to the satisfiability of CNF formulas. We develop a new family of algorithms based on the idea of balancing exploitation (depth-first search) and exploration (breadth-first search), that can be combined with two different techniques to generate random playouts or with a heuristics-based evaluation function. We compare our algorithms with a DPLL-based algorithm and with WalkSAT, using the size of the tree and the number of flips as the performance measure. While our algorithms perform on par with DPLL on instances with little structure, they do quite well on structured instances where they can effectively reuse information gathered from one iteration on the next. We also discuss the pros and cons of our different algorithms and we conclude with a discussion of a number of avenues for future work.

 

 1 Introduction

In this paper we perform a preliminary investigation into the application of UCT-style search algorithms to satisfiability testing of propositional formulas in Conjunctive Normal Form (CNF).

 

Here we present in detail a family of algorithms called UCTSAT that employ the UCT search control mechanism but use different mechanisms to estimate the utility of a node.

In UCTSATh, a heuristic is used to estimate the initial utility of a node; more precisely, the heuristic is the fraction of the total set of clauses that are satisfied by the partial assignment associated with the node.

UCTSATcp and UCTSATsbs use search strategies that are closer to the more traditional usage of UCT algorithms, that is, using random tryouts in a Monte-Carlo style.

 
 2 Upper Confidence Bounds Applied to Trees (UCT)

Monte-Carlo tree search algorithms such as UCT [12] have recently received a great deal of attention from the planning and game-playing community, in particular due to their success in the domain of Go [10,16].

UCT builds on the UCB1 algorithm for multi-armed bandits [2], which is used to guide the search tree construction process.

Exploration of under-sampled actions is balanced against exploitation of known good actions to generate asymmetric trees that are deeper in more promising regions of the search space and shallower elsewhere.

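To make the exploration/exploitation balance concrete, here is a small sketch of a UCB1-style score. The notes do not reproduce the exact formula used in the paper, so the form below (mean utility plus `c * sqrt(ln N / n)`), the bias parameter `c`, and the function names are assumptions based on the standard UCB1 rule.

```python
import math


def ucb1(mean_utility: float, child_visits: int, parent_visits: int, c: float = 1.0) -> float:
    """Mean utility plus an exploration bonus that shrinks as the child is sampled."""
    if child_visits == 0:
        return math.inf  # unvisited children are tried first
    return mean_utility + c * math.sqrt(math.log(parent_visits) / child_visits)


def select(children: dict, parent_visits: int, c: float = 1.0):
    """children maps an action to (mean utility, visit count); return the best action."""
    return max(children, key=lambda a: ucb1(children[a][0], children[a][1], parent_visits, c))


children = {"a": (0.6, 10), "b": (0.5, 2)}
greedy_choice = select(children, parent_visits=12, c=0.0)     # pure exploitation
exploring_choice = select(children, parent_visits=12, c=1.0)  # bonus favors the rarely visited child
```

With `c = 0` the rule degenerates to picking the child with the best mean, while a positive `c` pulls the search toward under-sampled actions.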

 Algorithm 1 describes the recursive procedure UCT uses to build the search tree.


T(s, a) is the domain transition function that returns the state s' reached by taking action a in state s. The algorithm maintains two lookup tables: n(s) tracks the number of times state s has been visited, and Q(s) tracks the current estimated utility of state s.


 

The action selection operator π(s) is repeatedly applied to descend down the tree until a previously unvisited (or terminal) node k is reached. k is added to the tree and an estimate of its utility is computed which is used to update Q(s) and n(s) for all nodes s on the path from the root node to k, according to lines 11 and 12. Under this scheme, the size of the tree grows by one node on every iteration.
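Algorithm 1 itself is not reproduced in these notes, so the following is only a sketch of the recursion described above, not the paper's pseudo-code. The class and parameter names, the running-average update of Q(s), and the toy domain (bit tuples of length 3, scored by the fraction of ones) are assumptions made for illustration.

```python
import math


class UCT:
    def __init__(self, actions, T, is_terminal, estimate, c=1.0):
        self.actions, self.T = actions, T
        self.is_terminal, self.estimate, self.c = is_terminal, estimate, c
        self.n = {}  # n(s): number of visits to state s
        self.Q = {}  # Q(s): current estimated utility of state s

    def pi(self, s):
        """Action selection: UCB1 over the children of s, unvisited children first."""
        def score(a):
            child = self.T(s, a)
            if child not in self.n:
                return math.inf
            bonus = self.c * math.sqrt(math.log(self.n[s]) / self.n[child])
            return self.Q[child] + bonus
        return max(self.actions(s), key=score)

    def iterate(self, s):
        """One UCT iteration from s; returns the utility that is backed up."""
        if s not in self.n or self.is_terminal(s):
            r = self.estimate(s)  # new (or terminal) node: estimate its utility
        else:
            r = self.iterate(self.T(s, self.pi(s)))  # descend one level
        # Update n(s) and Q(s) on the way back up (assumed: running average).
        self.n[s] = self.n.get(s, 0) + 1
        q = self.Q.get(s, 0.0)
        self.Q[s] = q + (r - q) / self.n[s]
        return r


# Hypothetical toy domain: states are bit tuples, terminal at length DEPTH.
DEPTH = 3
uct = UCT(
    actions=lambda s: [0, 1],
    T=lambda s, a: s + (a,),
    is_terminal=lambda s: len(s) == DEPTH,
    estimate=lambda s: sum(s) / DEPTH,
)
root = ()
for _ in range(5):
    uct.iterate(root)
tree_size = len(uct.n)  # one new node per iteration
```

After five iterations the lookup tables hold exactly five states, which matches the remark that the tree grows by one node on every iteration.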

 

 

A UCT search consists of repeatedly calling the function given in Algorithm 1 on the root node for as long as time allows. At that point, the action that leads to the state with the highest average utility is returned. Alternate schemes include returning the action with the most number of visits and returning the action with the highest lower confidence bound. In practice, there is little difference between these approaches.

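The three ways of choosing the final action can be compared with a short sketch. The lower confidence bound used here (mean minus `c * sqrt(ln N / n)`) and all names are assumptions, and the sketch assumes every root child has been visited at least once.

```python
import math


def best_action(stats: dict, rule: str = "mean", c: float = 1.0):
    """stats maps each root action to (mean utility, visit count) of the resulting state."""
    total = sum(visits for _, visits in stats.values())

    def lower_bound(a):
        q, visits = stats[a]
        return q - c * math.sqrt(math.log(total) / visits)

    keys = {
        "mean": lambda a: stats[a][0],    # highest average utility
        "visits": lambda a: stats[a][1],  # most visits
        "lcb": lower_bound,               # highest lower confidence bound
    }
    return max(stats, key=keys[rule])


stats = {"a": (0.70, 3), "b": (0.65, 40)}
by_mean = best_action(stats, "mean")
by_visits = best_action(stats, "visits")
by_lcb = best_action(stats, "lcb")
```

The example is deliberately lopsided so that the rules disagree: action `a` has a slightly better mean from very few samples, while `b` is backed by many more visits.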

 
3 UCTSAT
Typical UCT implementations estimate the utility of a node n on the first visit by sampling the search space subsumed by n, via random or pseudo-random playouts. This idea is very appealing when no good heuristics are available for a domain. The pseudo-code for the recursive tree-building component of our procedure (which we call UCTSAT) is given by Algorithm 2.
 
Analogously to UCT, a UCTSAT search comprises repeated invocations of Algorithm 2 on the root node. UCTSAT behaves like a cross between a backtracking (DPLL-style) and a randomized algorithm (for example, WalkSAT [15]). It is a complete procedure that explores the search space in a very different fashion to that of DPLL. While DPLL only backtracks when it has finished completely evaluating a branch, UCTSAT repeatedly starts from the root node and only goes one level deeper on each iteration. As in UCT, the UCB1 formula is used to control the descent down the tree, where each step involves making a variable assignment and simplifying the original formula. In the flavor of local search methods, the most promising branch is typically chosen at each step, but occasional deviations to sub-optimal branches (that may still lead to solutions) also occur.
 

The search terminates when either:

1. a satisfying assignment is found (line 5)

2. the formula is determined to be unsatisfiable (line 17, when s is the root)

3. or the specified number of iterations is exceeded.
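The three termination conditions can be seen in a compact sketch of a UCTSAT-style search. Algorithm 2 is not reproduced in these notes, so this is an illustration under assumptions and not the paper's procedure: the `Node` class, branching on the first unassigned variable, the running-average utility update, and marking a node as closed once both of its branches are refuted are choices made for this example.

```python
import math


def propagate(clauses, assignment):
    """Unit propagation; returns the extended assignment, or None on a conflict."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue  # clause already satisfied
            free = [l for l in clause if abs(l) not in assignment]
            if not free:
                return None
            if len(free) == 1:
                assignment[abs(free[0])] = free[0] > 0
                changed = True
    return assignment


class Node:
    def __init__(self, clauses, assignment):
        self.assignment = propagate(clauses, assignment)
        self.closed = self.assignment is None  # conflict: this subtree is refuted
        self.children, self.n = {}, 1          # the heuristic counts as the first sample
        sat = 0 if self.closed else sum(
            any(self.assignment.get(abs(l)) == (l > 0) for l in c) for c in clauses)
        self.q = sat / max(len(clauses), 1)    # fraction of satisfied clauses
        self.solved = not self.closed and sat == len(clauses)


def iterate(node, clauses, variables, c):
    """One descent from `node`; returns (model or None, utility backed up)."""
    node.n += 1
    if node.solved:
        return node.assignment, 1.0
    if node.closed:
        return None, 0.0
    var = next(v for v in variables if v not in node.assignment)  # assumed branching rule
    missing = [b for b in (True, False) if b not in node.children]
    if missing:  # expand exactly one new node
        child = Node(clauses, {**node.assignment, var: missing[0]})
        node.children[missing[0]] = child
        model, r = (child.assignment if child.solved else None), child.q
    else:        # UCB1 choice among branches that are not yet refuted
        live = [ch for ch in node.children.values() if not ch.closed]
        best = max(live, key=lambda ch: ch.q + c * math.sqrt(math.log(node.n) / ch.n))
        model, r = iterate(best, clauses, variables, c)
    if len(node.children) == 2 and all(ch.closed for ch in node.children.values()):
        node.closed = True  # both branches refuted, so this node is refuted too
    node.q += (r - node.q) / node.n
    return model, r


def uctsat(clauses, max_iters=1000, c=0.0):
    variables = sorted({abs(l) for cl in clauses for l in cl})
    root = Node(clauses, {})
    for _ in range(max_iters):
        if root.closed:
            return "UNSAT", None   # condition 2: refutation reached the root
        model, _utility = iterate(root, clauses, variables, c)
        if model is not None:
            return "SAT", model    # condition 1: satisfying assignment found
    return ("UNSAT" if root.closed else "UNKNOWN"), None  # condition 3: budget exhausted


sat_clauses = [[1, 2], [-1, 3], [-2, -3]]
sat_status, sat_model = uctsat(sat_clauses)
unsat_status, _ = uctsat([[1, 2], [1, -2], [-1, 2], [-1, -2]])
unknown_status, _ = uctsat([[1, 2], [-1, -2]], max_iters=0)
```

The default `c=0.0` mirrors the exploration bias reported for Paper 1. Because every unsuccessful iteration adds one node to a finite tree, the sketch eventually answers SAT or UNSAT when the iteration budget is large enough.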

 
 
 
 
 
 
posted on 2020-06-29 18:44 by 海阔凭鱼跃越