CatGPT beta2
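Updates
- Output words are now chosen by weighted random sampling; the weight of each candidate is the square of its frequency
- Added a phrase mode: words that repeatedly appear together are emitted as a bundle
- Added punctuation (this is only an experiment)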
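The squared-frequency weighting can be sketched as follows. This is a minimal illustration, not the code CatGPT uses: `pick_weighted` is a hypothetical helper, and it swaps in `std::discrete_distribution` for the listing's approach of storing each word once per unit of weight.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of CatGPT): pick one successor word, where a
// word seen `count` times gets weight count * count.
std::string pick_weighted(const std::vector<std::pair<std::string, int>>& successors,
                          std::mt19937_64& rng) {
    assert(!successors.empty());  // there must be something to choose from
    std::vector<double> weights;
    weights.reserve(successors.size());
    for (const auto& entry : successors) {
        weights.push_back(static_cast<double>(entry.second) * entry.second);
    }
    // discrete_distribution returns index i with probability
    // weights[i] / sum(weights), so no per-weight copies are stored.
    std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
    return successors[dist(rng)].first;
}
```

With counts of 3 and 1, the weights are 9 and 1, so the first word is drawn about nine times out of ten rather than three out of four. Squaring pushes the output toward common continuations.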
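Features
- Generates a passage from an input word (the training material is currently limited, so output runs to only about \(50\) words)
- Trains itself on a given text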
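How it works
- Accumulates word frequencies across training runs and uses them as weights
- Tracks how many times each word has been used (to prevent looping over the same words, among other things)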
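A minimal sketch of these two ideas, assuming whitespace-separated input. The names `BigramCounts`, `add_counts`, `may_use` and `demo_counts` are hypothetical and do not appear in the program; the usage rule mirrors the check in `test()`, where a word is rejected once it has been used as often as it was seen after the current word.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// word -> (successor -> how many times the successor followed the word)
using BigramCounts = std::map<std::string, std::map<std::string, int>>;

// Hypothetical helper: accumulate successor counts from whitespace-separated
// text. Calling it again on more text keeps adding to the same table.
void add_counts(BigramCounts& counts, const std::string& text) {
    std::istringstream in(text);
    std::string prev, word;
    bool has_prev = false;
    while (in >> word) {
        if (has_prev) ++counts[prev][word];
        prev = word;
        has_prev = true;
    }
}

// Hypothetical helper: `next` may follow `prev` only while it has been used
// fewer times than it was seen after `prev` during training.
bool may_use(const BigramCounts& counts, const std::map<std::string, int>& used,
             const std::string& prev, const std::string& next) {
    auto row = counts.find(prev);
    if (row == counts.end()) return false;
    auto seen = row->second.find(next);
    if (seen == row->second.end()) return false;
    auto uses = used.find(next);
    int used_times = (uses == used.end()) ? 0 : uses->second;
    return used_times < seen->second;
}

// Usage sketch on a toy sentence.
void demo_counts() {
    BigramCounts counts;
    add_counts(counts, "the cat saw the cat");
    assert(counts["the"]["cat"] == 2);  // "the cat" occurs twice
    std::map<std::string, int> used = {{"cat", 2}};
    assert(!may_use(counts, used, "the", "cat"));  // quota already spent
}
```

Capping uses at the observed count is what keeps generation from cycling through the same few words indefinitely.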
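Notes
- Generating from a word the model has never seen is not supported, so very obscure words will not work. You can, however, teach it a word by feeding it an article that contains that word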
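Usage
- Comment out the #define TRAIN above the main function to use test(), which generates a sentence from a word; leave it in place to train on the contents of the local train file
- Make sure an info file exists. To retrain from scratch, delete everything in info, but be sure to leave a single \(0\) in it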
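The layout of the info file follows from print_info in the listing: the first number is the count of known words, and each word is followed by its number of successors and then one "successor count" pair per successor. The reader below is a hedged sketch of that layout; `Model`, `parse_info` and `demo_parse_info` are hypothetical names, and unlike read_info it skips the remove_useless normalization.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// word -> (successor -> count), as stored in the info file
using Model = std::map<std::string, std::map<std::string, int>>;

// Hypothetical reader for the layout that print_info writes.
Model parse_info(std::istream& in) {
    Model model;
    int words = 0;
    if (!(in >> words)) return model;  // no leading count: treat as empty
    for (int w = 0; w < words; ++w) {
        std::string word;
        int successors = 0;
        if (!(in >> word >> successors)) break;  // truncated file
        for (int s = 0; s < successors; ++s) {
            std::string next;
            int times = 0;
            if (!(in >> next >> times)) return model;  // truncated file
            model[word][next] = times;
        }
    }
    return model;
}

// Usage sketch: a file holding only "0" describes an empty model.
void demo_parse_info() {
    std::istringstream reset("0");
    assert(parse_info(reset).empty());
}
```

For example, the text `1`, `cat 2`, `lady 3`, `girls 1` (one entry per line) describes a model in which "cat" was followed by "lady" three times and by "girls" once.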
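If you would rather not think about it, follow this version instead
- Testing: comment out the #define TRAIN above the main function, then change the argument passed to test() in the main function; this value is the first word of your sentence
- Training: paste text into the train file, uncomment #define TRAIN, then compile and run the program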
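Disclaimer
- This is only an experiment; the implementation and the generated output are both fairly rough, so treat it as entertainment and reference only
Sample output (beta2)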
cat girls are left the castle , it has a long as usual mr . b returned , said he saw cth said tao ge had completed his head , tao ge is on the castle , and looked at this time , tao ge to be honest there is not a bean tree and the tree and regulations every time , tao ge , he saw that the cat lady falls behind like a cat girls are quite peculiar
---
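Author's Chinese gloss, translated: The cat girls have all left the castle. A Mr. B, as long as usual, came back and said he saw cth say that Tao Ge had completed his head. Tao Ge was on the castle, looking at this time; Tao Ge, to be honest, there is not a single bean tree, and every time there are trees and regulations. Tao Ge, he saw the cat lady fall behind like a cat girl, which is quite peculiar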
cat lady back to the tree tao ge feels ashamed to grow in the castle, as he had a small village but also responsible for the person from the house so he quickly dodged and finally arrived at this moment , and scratched his head , and the castle needs to the castle , he had completed his hat
---
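Author's Chinese gloss, translated: The cat lady went back to the tree. Tao Ge felt ashamed of growing up in the castle, because he had a small village but was also responsible for the person from the house, so he quickly dodged and finally arrived at this moment, scratched his head; the castle needs the castle, and he had completed his hat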
cth scratched his eerie smile on the leaves , but also took off the castle , but the tree king ? hurry to the castle where tao ge was wrong strange , but he saw that the castle where tao ge , the castle manager mr . b is not right branch and the castle , it is already as much larger and doesnt look dazed and the tree , and doesnt look very smart . b is also recite praises loudly saying hi , but cth said tao ges hand holding hands high school gate like this time , they will be punished for a living how could build a big deal because he will be one bite dont know why did you can only found that read transport little cat lady this happens
---
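Author's Chinese gloss, translated: cth scratched his eerie smile on the leaves and also left the castle, but what about the tree king? Hurrying to the castle where Tao Ge was, strangely, he saw Mr. B, the manager of the castle where Tao Ge was. Not the right branch and the castle; it is already much larger and does not look so dazed, nor the tree, which does not look very smart. Mr. B also recited praises loudly to say hi, but cth said Tao Ge was holding hands at the high school gate like this time; how could they be punished with cleaning duty, because he will get one bite, no idea why, you can only find that this kind of thing happens to the little cat lady who reads transport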
huge had a walk alone until today is not enough , tao ge , and the castle this sense , he asked tao ge was his head vigorously
---
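Author's Chinese gloss, translated: huge walking alone until today is not enough, Tao Ge, and this sense of the castle; he vigorously asked Tao Ge whether it was his head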
cat lady this was too crowded around tao ge , and said , but also the castle , and the tree , and people who was oceansofstars , and each cat girl had never come knocking on the castle , the relationship between doujiao . tao ge used to be done ! ! ! ! i can be checked for a water well in a new day because the cat girl is not to see any pigs with a bean tree , he suddenly heard some noise ahead . at tao ge had no time , and the chef has been caused by the castle where he quickly jumped straight out of him to find me looking down regained some line segment tree , i dont have any pigs ? oh my buddy , tao ge lives on the castle manager still occasionally invites us are you want to see them from the chairman tree and asked sorry , and said that morning sunshine , and asked huge explained to the castle this moment
---
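Author's Chinese gloss, translated: This time the cat lady crowded around Tao Ge and said, and also the castle, and the tree, and who was oceansofstars, and that every cat girl had never come knocking on the castle, the relationship between doujiao. Tao Ge used to be done! I can check a water well on a new day, because the cat girl did not see any pigs with a bean tree; he suddenly heard some noise ahead. When Tao Ge had no time, the chef was caused by him in the castle; he quickly jumped straight out of him and found me looking down, having regained some segment tree. Do I have no pigs? Oh my buddy, Tao Ge lives in the castle, and the manager still occasionally invites us. Do you want to see them from the chairman tree? He apologized to us, said the morning was sunny, and explained this moment to the castle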
Downloads
Lightly trained version: punctuation is not trained well
Classical Chinese poetry version
Newer CatGPT version
#include<bits/stdc++.h>
using namespace std;
namespace hdk{
namespace Rand{
random_device __rd;
/**
* @brief random number generator wrapper
* @note 'device_srand()' has some problems under GCC 9.3.0 (Windows);
* if you hit them, try 'time_srand()' instead
* @param randt global instance that provides the random helpers
*/
struct __Rand{
mt19937_64 _Rand;
long long Rand(){
return _Rand();
}
int SystemRand(long long a,long long b){
return std::rand()%(b-a+1)+a;
}
int RandSignedInt(){
return (int)Rand();
}
int RandSignedInt(int l,int r){
int res=RandSignedInt();
while(res<l or res>r) res=RandSignedInt();
return res;
}
int RandInt(){
return abs(RandSignedInt());
}
int RandInt(int a,int b){
return abs(RandSignedInt())%(b-a+1)+a;
}
long long RandSignedLong(){
return (long long)Rand();
}
long long RandSignedLong(long long l,long long r){
long long res=RandSignedLong();
while(res<l or res>r) res=RandSignedLong();
return res;
}
long long RandLong(){
return llabs(RandSignedLong());
}
long long RandLong(long long a,long long b){
return RandLong()%(b-a+1)+a;
}
unsigned long long device_srand(){
unsigned long long seed=__rd();
_Rand=mt19937_64(seed);
return seed;
}
unsigned long long time_srand(){
unsigned long long seed=time(0);
_Rand=mt19937_64(seed);
return seed;
}
void seed_srand(unsigned long long seed=time(0)){
_Rand=mt19937_64(seed);
}
long double RandReal(int fixed){
long long res=1;
for(int i=1;i<=fixed;++i) res*=10;
// keep the draw in long long: an int overflows once fixed >= 10
long long rres=RandLong(0,res);
return rres*1.0L/res;
}
bool access(double access_p){
// true with probability of roughly access_p
long long res=RandLong();
return res<=LLONG_MAX*access_p;
}
template<typename T>
T randfrom(const vector<T>&A){
// take the vector by const reference: copying the whole candidate
// list on every draw is expensive
return A[RandLong(0,(long long)A.size()-1)];
}
template<typename T>
T randfrom(T A[],int l,int r){
return A[RandLong(l,r)];
}
}randt;
}
using namespace Rand;
}
using namespace hdk;
int store_size=0;
int output_size=0;
/**
* @brief one successor word together with how often it has appeared
* @param word the successor word
* @param appear_times how many times the word appeared
* in past training runs
* @note each set<vec> is ordered by ascending appear_times, so the
* most frequent successor is its last element
*/
struct vec{
string word;
int appear_times;
bool operator <(const vec&A)const{
if(appear_times==A.appear_times) return word<A.word;
return appear_times<A.appear_times;
}
};
set<vec>s[500001];
int cnt=0;
map<string,int>next_word;
map<string,int>appear_time[500001];
/**
* @brief split punctuation off the words of the training material
* @note run this once on a file before any 'remove_useless' call
*/
vector<string>store;
vector<string>store2;
void fixed_training(const string file){
ifstream _I(file);
store.clear();
string h;
// loop on the extraction itself so a failed final read adds no empty token
while(_I>>h){
bool flag=false;
for(char i:h){
if(i=='.' or i==',' or i=='?' or i=='!' or i==';'){
store2.clear();
store2.push_back("");
for(int j=0;j<=(int)h.length()-1;++j){
if(h[j]=='.' or h[j]==',' or h[j]=='?' or h[j]=='!' or h[j]==';'){
string fx;fx.push_back(h[j]);
store2.push_back(fx);
store2.push_back("");
}
else store2.back().push_back(h[j]);
}
flag=true;
break;
}
}
if(!flag) store.push_back(h);
else for(string i:store2) if(!i.empty()) store.push_back(i);
}
_I.close();
ofstream _O(file);
for(string i:store){
_O<<i<<" ";
}
}
/**
* @brief remove every character except 'a' to 'z' and 'A' to 'Z',
* then lowercase the result
* @note the result may be empty; callers must check for that
* @note punctuation tokens are returned unchanged, so punctuation must
* already be separated from words, e.g. by 'fixed_training'
*/
string remove_useless(string x){
if(x[0]=='.' or x[0]==',' or x[0]=='!' or x[0]=='?' or x[0]==';') return x;
string ans;
for(char i:x){
if(i>='a' and i<='z'){
ans.push_back(i);
}
if(i>='A' and i<='Z'){
ans.push_back(i-'A'+'a');
}
}
return ans;
}
vector<string>tot_word;
/**
* @brief read train record from &in
*/
void read_info(ifstream &in){
store_size=0;
int tot;in>>tot;store_size=tot;
while(tot--){
string x,y;int n,t;in>>x>>n;
x=remove_useless(x);
if(!next_word.count(x)){
next_word[x]=++cnt;
}
int tmp=next_word[x];
while(n--){
store_size++;
in>>y>>t;
y=remove_useless(y);
appear_time[tmp][y]=t;
s[tmp].insert({y,t});
}
}
for(auto i:next_word){
tot_word.push_back(i.first);
}
}
#define randword randt.randfrom(tot_word)
/**
* @brief print new train record to &out
*/
void print_info(ofstream &out){
out<<next_word.size()<<endl;
output_size=(int)next_word.size();
for(auto i:next_word){
out<<i.first<<" "<<s[i.second].size()<<endl;
for(auto j:s[i.second]){
output_size++;
out<<j.word<<" "<<j.appear_times<<endl;
}
}
}
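/**
* @brief train on the text material read from &in
* @note make sure English text material is
* available to train on
*/
void train(ifstream &in){
string x,last="eof";
// loop on the extraction itself: with 'while(!in.eof())', a file ending in
// whitespace makes the final read fail, x keeps its old value, and the
// last word is counted as following itself
while(in>>x){
x=remove_useless(x);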
if(x.empty()) continue;
if(last!="eof"){
if(!next_word.count(last)){
next_word[last]=++cnt;
}
int tmp=next_word[last];
if(!appear_time[tmp].count(x)){
appear_time[tmp][x]=1;
s[tmp].insert({x,1});
}
else{
auto iter=s[tmp].lower_bound({x,appear_time[tmp][x]});
auto st=*iter;s[tmp].erase(iter);
s[tmp].insert({st.word,st.appear_times+1});
appear_time[tmp][x]+=1;
}
}
last=x;
}
}
map<string,int>mp;
vector<string>wating_word;
/**
* @note if one successor accounts for more than this share of all
* occurrences, it is tied to the current word and always follows it
*/
const long double tied=0.76;
/**
* @brief print x, choose the word that follows it, and continue from there
* @note returns when nothing more can be printed; otherwise it
* recurses automatically on the chosen word
* @note rand_weight: candidates are drawn at random with a weight per word,
* and the weight of each word is the square of its appear_times
*/
void test(string x){
x=remove_useless(x);
if(x.empty()) return;
cout<<x<<" ";
if(!next_word.count(x)) return;
int tmp=next_word[x];
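wating_word.clear();
if(s[tmp].empty()) return;
long long maxword=0,totcnt=0;
// the set is ordered by ascending appear_times, so the most frequent
// successor is the last element, not the first
const vec top=*s[tmp].rbegin();
maxword=top.appear_times;
for(const auto &i:s[tmp]){
totcnt+=i.appear_times;
// squared weight: a word seen k times gets k*k entries
for(long long j=1;j<=1LL*i.appear_times*i.appear_times;++j){
wating_word.push_back(i.word);
}
}
// the use-count check keeps two tied words from calling each other forever
if(maxword*1.0/totcnt>tied and x!=top.word and mp[top.word]<top.appear_times){
mp[top.word]++;
test(top.word);
return;
}
// 'tries' replaces a local named 'cnt' that shadowed the global counter
string lt=randt.randfrom(wating_word);int tries=0;
while(mp.count(lt) and mp[lt]>=appear_time[tmp][lt]){
if(tries>=20) return;
tries++;lt=randt.randfrom(wating_word);
}
mp[lt]++;
test(lt);
}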
/**
* #define TRAIN turns on training mode:
* in this mode CatGPT learns from the file 'train'
* if TRAIN is not defined, the test (generation) mode runs instead
*/
#define TRAIN
int main(){
#if RAND_MAX==INT_MAX
randt.device_srand();
#else
randt.time_srand();
#endif
ifstream _I("info");
read_info(_I);
#ifndef TRAIN
//remember to change the test input
test(randword);
#else
cout<<"Read Finished -> ";
fixed_training("train");
ifstream _I2("train");
train(_I2);
cout<<"Train Finished"<<endl;
ofstream _O("info");
print_info(_O);
cout<<"Update: "<<output_size-store_size<<" words"<<endl;
cout<<"Now: "<<output_size<<" words"<<endl;
cout<<"Already Used Memory: "<<(int)(next_word.size())*1.0/5000<<"% "<<endl;
#endif
}
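The phrase mode rests on the tied threshold in the listing: when one successor accounts for more than 0.76 of everything seen after a word, it is bundled with that word. The sketch below isolates that decision under stated assumptions. `dominant_successor` is a hypothetical helper, not a function from the program, and it scans a plain map instead of the ordered set the listing keeps.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: return the successor whose share of all occurrences
// exceeds `threshold`, or an empty string when no successor dominates.
std::string dominant_successor(const std::map<std::string, int>& successors,
                               double threshold = 0.76) {
    assert(threshold > 0.0 && threshold <= 1.0);
    long long total = 0;
    int best_count = 0;
    std::string best;
    for (const auto& entry : successors) {
        total += entry.second;
        if (entry.second > best_count) {
            best_count = entry.second;
            best = entry.first;
        }
    }
    if (total == 0) return "";  // nothing was ever seen after this word
    double share = static_cast<double>(best_count) / static_cast<double>(total);
    return share > threshold ? best : "";
}
```

With counts of 9 and 1 the leading successor holds a share of 0.9 and is bundled; with counts of 3 and 1 the share is 0.75, just under the threshold, so the choice falls back to weighted random sampling.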