ChatterBot - 03 The Training Process
When a trainer loads a data set, it builds the chatbot's knowledge graph from that data. The process is as follows:
1. Trainer classes
- Training with list data
chatterbot.trainers.ListTrainer(storage, **kwargs)
Example 1:
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer

chatterbot = ChatBot("Training Example")
chatterbot.set_trainer(ListTrainer)

chatterbot.train([
    "Hi there!",
    "Hello",
])

chatterbot.train([
    "Greetings!",
    "Hello",
])
Example 2:
chatterbot.train([
    "How are you?",
    "I am good.",
    "That is good to hear.",
    "Thank you",
    "You are welcome.",
])
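The difference between Example 1 and Example 2 is that Example 1 has two entry points while Example 2 has four. After training, saying "Hi there!" or "Greetings!" to the Example 1 bot returns "Hello" in both cases, and saying any of the first four sentences in the list to the Example 2 bot returns the sentence that follows it.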
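The pairing behavior described above can be modeled without ChatterBot at all. The following is a simplified sketch, not the library's actual implementation: the helper names `conversation_pairs` and `build_reply_table` are hypothetical, and the sketch assumes only what the text states, namely that each sentence in a training list becomes a possible input whose reply is the sentence after it.

```python
def conversation_pairs(conversation):
    """Return (statement, reply) pairs for consecutive items in a list."""
    return list(zip(conversation, conversation[1:]))


def build_reply_table(conversations):
    """Map every known statement to the reply that followed it in training."""
    table = {}
    for conversation in conversations:
        for statement, reply in conversation_pairs(conversation):
            table[statement] = reply
    return table


# Example 1: two separate two-item lists give two entry points, same reply.
example_1 = build_reply_table([
    ["Hi there!", "Hello"],
    ["Greetings!", "Hello"],
])

# Example 2: one five-item list gives four entry points.
example_2 = build_reply_table([[
    "How are you?",
    "I am good.",
    "That is good to hear.",
    "Thank you",
    "You are welcome.",
]])

print(example_1["Greetings!"])  # Hello
print(len(example_2))           # 4
print(example_2["Thank you"])   # You are welcome.
```

The last sentence of a list ("You are welcome.") never appears as a key, which is why a five-sentence list yields only four entry points.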
- Training with corpus data
chatterbot.trainers.ChatterBotCorpusTrainer(storage, **kwargs)
- Train with ChatterBot's built-in English corpus
from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer

chatterbot = ChatBot("Training Example")
chatterbot.set_trainer(ChatterBotCorpusTrainer)

chatterbot.train(
    "chatterbot.corpus.english"
)
- Train with selected parts of the built-in corpus
chatterbot.train(
    "chatterbot.corpus.english.greetings",
    "chatterbot.corpus.english.conversations"
)
- Train with a corpus file at a given path, or with all corpus files in a given directory
chatterbot.train(
    "./data/greetings_corpus/custom.corpus.json",
    "./data/my_corpus/"
)
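A custom corpus file such as `custom.corpus.json` can be produced with the standard `json` module. This is a sketch under an assumption: the layout used here (a JSON object whose values are lists of conversations, each conversation being a list of sentences) follows the corpus files bundled with older ChatterBot releases and may differ in your version, so compare it with a bundled file before relying on it. The helper name `write_corpus`, the category name, and the sample sentences are illustrative.

```python
import json
import os
import tempfile


def write_corpus(path, category, conversations):
    """Write conversations to a JSON corpus file under one category key."""
    with open(path, "w", encoding="utf-8") as corpus_file:
        json.dump({category: conversations}, corpus_file,
                  ensure_ascii=False, indent=4)


# Illustrative data: each inner list is one conversation, in order.
greetings = [
    ["Hi there!", "Hello"],
    ["Greetings!", "Hello"],
]

# A temporary directory keeps the sketch from touching real project files.
corpus_dir = tempfile.mkdtemp()
corpus_path = os.path.join(corpus_dir, "custom.corpus.json")
write_corpus(corpus_path, "greetings", greetings)

# Read the file back to confirm it round-trips.
with open(corpus_path, encoding="utf-8") as corpus_file:
    loaded = json.load(corpus_file)
print(loaded["greetings"][0])
```

The resulting path could then be passed to `chatterbot.train(...)` in the same way as the file path in the example above.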
- Training with the Twitter API
chatterbot.trainers.TwitterTrainer(storage, **kwargs)
Parameters:
- random_seed_word - the keyword used for the first search query; default: random
- twitter_consumer_key - consumer key
- twitter_consumer_secret - consumer secret
- twitter_access_token_key - access token key
- twitter_access_token_secret - access token secret
Example
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
from settings import TWITTER
import logging
'''
This example demonstrates how you can train your chat bot
using data from Twitter.
To use this example, create a new file called settings.py.
In settings.py define the following:
TWITTER = {
    "CONSUMER_KEY": "my-twitter-consumer-key",
    "CONSUMER_SECRET": "my-twitter-consumer-secret",
    "ACCESS_TOKEN": "my-access-token",
    "ACCESS_TOKEN_SECRET": "my-access-token-secret"
}
'''
# Comment out the following line to disable verbose logging
logging.basicConfig(level=logging.INFO)
chatbot = ChatBot(
    "TwitterBot",
    logic_adapters=[
        "chatterbot.logic.BestMatch"
    ],
    input_adapter="chatterbot.input.TerminalAdapter",
    output_adapter="chatterbot.output.TerminalAdapter",
    database="./twitter-database.db",
    twitter_consumer_key=TWITTER["CONSUMER_KEY"],
    twitter_consumer_secret=TWITTER["CONSUMER_SECRET"],
    twitter_access_token_key=TWITTER["ACCESS_TOKEN"],
    twitter_access_token_secret=TWITTER["ACCESS_TOKEN_SECRET"],
    trainer="chatterbot.trainers.TwitterTrainer"
)
chatbot.train()
chatbot.logger.info('Trained database generated successfully!')
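Hardcoding credentials in settings.py, as the docstring in the example suggests, is fine for a quick test. As an alternative, here is one possible sketch of a settings.py that reads them from environment variables. The environment variable names and the `load_twitter_settings` helper are hypothetical choices, not something ChatterBot or Twitter defines; only the four dictionary keys come from the example.

```python
import os

# Dictionary keys used by the example, mapped to hypothetical
# environment variable names chosen for this sketch.
ENV_NAMES = {
    "CONSUMER_KEY": "TWITTER_CONSUMER_KEY",
    "CONSUMER_SECRET": "TWITTER_CONSUMER_SECRET",
    "ACCESS_TOKEN": "TWITTER_ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_settings(environ=os.environ):
    """Build the TWITTER dict from a mapping of environment variables."""
    missing = sorted(name for name in ENV_NAMES.values() if name not in environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[name] for key, name in ENV_NAMES.items()}


# Demonstration with a fake environment so the sketch runs anywhere.
# In a real settings.py you would write: TWITTER = load_twitter_settings()
demo = load_twitter_settings({
    "TWITTER_CONSUMER_KEY": "a",
    "TWITTER_CONSUMER_SECRET": "b",
    "TWITTER_ACCESS_TOKEN": "c",
    "TWITTER_ACCESS_TOKEN_SECRET": "d",
})
print(sorted(demo))
```

Failing early with a list of the missing variables tends to be easier to debug than an authentication error from the API later on.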
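- Training with the Ubuntu dialog corpus
chatterbot.trainers.UbuntuCorpusTrainer(storage, **kwargs)
This trainer downloads and extracts the corpus automatically. If the downloaded file already exists, it is not downloaded again, and if the extracted files already exist, the extraction step is skipped. Because the Ubuntu corpus is large, downloading, extracting, and training take a long time.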
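The skip-if-present behavior described above can be sketched as follows. This is not the trainer's actual code: `ensure_archive` and `ensure_extracted` are hypothetical helpers, the download step is passed in as a callable so the sketch runs offline, and a tiny local tar archive stands in for the real corpus.

```python
import os
import tarfile
import tempfile


def ensure_archive(archive_path, download):
    """Call download(archive_path) only if the archive is not on disk yet."""
    if os.path.exists(archive_path):
        return False  # already downloaded, nothing to do
    download(archive_path)
    return True


def ensure_extracted(archive_path, target_dir):
    """Extract the archive only if the target directory does not exist."""
    if os.path.isdir(target_dir):
        return False  # already extracted, nothing to do
    with tarfile.open(archive_path) as archive:
        archive.extractall(target_dir)
    return True


def fake_download(archive_path):
    """Stand-in for a real download: builds a tiny tar.gz locally."""
    source = os.path.join(os.path.dirname(archive_path), "dialog.tsv")
    with open(source, "w") as handle:
        handle.write("hello\thi\n")
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source, arcname="dialog.tsv")


work_dir = tempfile.mkdtemp()
archive_path = os.path.join(work_dir, "corpus.tar.gz")
target_dir = os.path.join(work_dir, "corpus")

# The first run does both steps; the second run skips both.
first_run = (ensure_archive(archive_path, fake_download),
             ensure_extracted(archive_path, target_dir))
second_run = (ensure_archive(archive_path, fake_download),
              ensure_extracted(archive_path, target_dir))
print(first_run, second_run)
```

Checking for the files on disk is what makes a second training run cheaper than the first: only the training step itself has to be repeated.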
2. Creating a custom trainer class
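When the existing trainers cannot correctly parse the format of your data source, you need to build your own trainer class. A custom trainer class must inherit from chatterbot.trainers.Trainer and implement a train method, whose parameters you are free to define.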
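As a minimal sketch of such a class, assume the data source is a text format with one tab-separated question and answer per line. The names `TabSeparatedTrainer` and `parse_pairs` are hypothetical. The sketch also assumes the legacy API used throughout this article, where a trainer is constructed with a storage adapter and keeps it as `self.storage`; the fallback base class exists only so that the parsing part runs when ChatterBot is not installed.

```python
try:
    from chatterbot.trainers import Trainer
except ImportError:
    class Trainer(object):
        """Stand-in base class, used only when ChatterBot is not installed."""

        def __init__(self, storage, **kwargs):
            self.storage = storage


def parse_pairs(lines):
    """Turn 'question<TAB>answer' lines into two-item conversations."""
    conversations = []
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        question, answer = line.split("\t", 1)
        conversations.append([question.strip(), answer.strip()])
    return conversations


class TabSeparatedTrainer(Trainer):
    """Custom trainer for a format the built-in trainers do not read."""

    def train(self, lines):
        # Reuse ListTrainer for the actual learning step. This assumes
        # the legacy ListTrainer(storage) constructor shown earlier.
        from chatterbot.trainers import ListTrainer
        list_trainer = ListTrainer(self.storage)
        for conversation in parse_pairs(lines):
            list_trainer.train(conversation)


sample = ["How are you?\tI am good.\n", "\n", "Thank you\tYou are welcome.\n"]
print(parse_pairs(sample))
```

Under the same legacy API, such a class would presumably be attached like the built-in trainers, with `chatterbot.set_trainer(TabSeparatedTrainer)` followed by `chatterbot.train(lines)`. Keeping the parsing in a plain function makes it possible to test the format handling without a database.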