Leela Chess Zero

Source: Chessprogramming wiki

Leela Chess Zero was initiated and announced by Stockfish co-author Gary Linscott. Leela Chess is open source. The goal is to build a strong chess-playing entity using the same deep-learning and Monte-Carlo tree search (MCTS) techniques as AlphaZero, as described in DeepMind's 2017 and 2018 papers, but with distributed training of the weights of the deep convolutional neural network (CNN, DNN, DCNN).

Lc0 is the actual chess engine that performs the MCTS and reads the self-taught CNN, whose weights are stored in a separate file. Lc0 is written in C++ and may be compiled for various platforms and backends.
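Since Lc0 is a standard UCI engine, one way to see it load a weights file is to drive it over the UCI protocol. Below is a minimal Python sketch, assuming an `lc0` binary on your PATH and a weights file at `./network.pb.gz` (both the path and the node budget are illustrative, not fixed by Lc0):

```python
import subprocess

# Start Lc0, pointing it at a weights file (path is an assumption).
eng = subprocess.Popen(
    ["lc0", "--weights=network.pb.gz"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def send(cmd):
    eng.stdin.write(cmd + "\n")
    eng.stdin.flush()

send("uci")              # engine lists its options, then replies "uciok"
send("isready")          # engine replies "readyok" once the net is loaded
send("position startpos")
send("go nodes 1000")    # search 1000 nodes; a "bestmove ..." line follows

for line in eng.stdout:  # echo engine output until the search finishes
    print(line.rstrip())
    if line.startswith("bestmove"):
        break
send("quit")
```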

Like AlphaZero, Lc0 evaluates positions using non-linear function approximation based on a deep neural network, rather than the linear function approximation used in classical chess programs. The neural network takes the board position as input and outputs a position evaluation (QValue) and a vector of move probabilities (PValue, policy). Once trained, the network is combined with a Monte-Carlo Tree Search (MCTS) that uses the policy to narrow the search down to high-probability moves, and uses the value, in conjunction with a fast rollout policy, to evaluate positions in the tree. The MCTS selection is done by a variation of Rosin's UCT improvement dubbed PUCT (Predictor + UCT).
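To make the PUCT selection rule concrete, here is a minimal sketch of the standard AlphaZero-style formula Q(s,a) + c_puct * P(s,a) * sqrt(Σ_b N(s,b)) / (1 + N(s,a)). The class layout and the constant are illustrative, not Lc0's actual internals (Lc0 exposes its own tunable CPuct parameter):

```python
import math

C_PUCT = 1.5  # exploration constant; value chosen for illustration only

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy head
        self.visit_count = 0      # N(s, a)
        self.value_sum = 0.0      # W(s, a), accumulated backed-up values
        self.children = {}        # move -> Node

    def q(self):
        # Mean action value Q(s, a); unvisited nodes default to 0 here.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node):
    """Pick the child maximizing Q(s, a) + U(s, a)."""
    total_visits = sum(ch.visit_count for ch in node.children.values())

    def puct(ch):
        u = C_PUCT * ch.prior * math.sqrt(total_visits) / (1 + ch.visit_count)
        return ch.q() + u

    # Returns the (move, child) pair with the highest PUCT score.
    return max(node.children.items(), key=lambda mc: puct(mc[1]))
```

The prior term steers early visits toward moves the policy likes, while the 1 + N(s, a) denominator decays that bonus as a move accumulates visits, letting the measured Q value take over.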

When they say that the rollout policy (I believe they borrowed the term "rollout" from backgammon) is a linear softmax function, they are referring to a generalization of the sigmoid function used in logistic regression. [link]
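A small NumPy sketch makes the "generalization of the sigmoid" claim concrete: the two-class softmax over logits [x, 0] reduces exactly to the logistic sigmoid 1 / (1 + exp(-x)):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result sums to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

x = 2.0
print(softmax(np.array([x, 0.0]))[0])  # 0.8807...
print(1.0 / (1.0 + np.exp(-x)))        # 0.8807... (the same value)
```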

As in AlphaZero, the Zero suffix implies no initial knowledge other than the rules of the game: to build a superhuman player, training starts with truly random self-play games and applies reinforcement learning based on the outcomes of those games.
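A rough sketch of the training targets this self-play scheme produces, assuming the AlphaZero recipe (policy target = normalized MCTS visit counts, value target = the final game outcome); the data layout and function signature are hypothetical, not Lc0's training code:

```python
import numpy as np

def training_examples(positions, visit_counts, outcome):
    """positions:    encoded board states from one self-play game, in order
    visit_counts: per-position MCTS visit counts over the legal moves
    outcome:      +1 / 0 / -1 from the perspective of the side to move
                  in the first position"""
    examples = []
    z = outcome
    for state, counts in zip(positions, visit_counts):
        pi = np.asarray(counts, dtype=np.float32)
        pi /= pi.sum()                    # policy target: visit distribution
        examples.append((state, pi, z))   # value target: the game result
        z = -z                            # flip perspective each ply
    return examples
```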

The distributed training is realized with a sophisticated client-server model. The client, written entirely in the Go programming language, incorporates Lc0 to produce self-play games. Controlled by the server, the client downloads the latest network, starts self-playing, and uploads games to the server, which in turn regularly produces and distributes new neural network weights once a certain number of games is available from contributors. The training software consists of Python code; the pipeline requires NumPy and TensorFlow running on Linux. The server is written in Go along with Python and shell scripts.
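The contribute loop described above (fetch latest network, self-play, upload, repeat) can be sketched as follows. This is only an illustration of the protocol's shape: the real client is written in Go, and the server URL, endpoint names, and payloads here are invented for the example:

```python
import time
import requests

SERVER = "https://example.org"  # placeholder, not the real training server

def run_selfplay(weights_path, num_games):
    # Stand-in for driving Lc0 in self-play mode; a real client shells out
    # to the engine. Here we just return placeholder bytes.
    return b"serialized training games would go here"

def client_loop():
    while True:
        # 1. Download the latest network weights from the server.
        weights = requests.get(f"{SERVER}/latest_network").content
        with open("network.pb.gz", "wb") as f:
            f.write(weights)
        # 2. Produce a batch of self-play games with those weights.
        games = run_selfplay("network.pb.gz", num_games=10)
        # 3. Upload the games; the server trains new weights once enough
        #    games have arrived from contributors.
        requests.post(f"{SERVER}/upload_games", files={"games": games})
        time.sleep(1)  # pause between batches
```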

象棋旋风 NNUE is a xiangqi (Chinese chess) engine designed by the 象棋旋风 development team in 2021; running on an ordinary laptop, it can crush human xiangqi champions.

Artificial Intelligence | Deep Learning | Neural Networks | Feed Forward Neural Network

posted @ 2022-12-13 09:07  Fun_with_Words
