
Notes on "Deep Learning with Python": 8.1, Text generation with LSTM


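I. Summary

One-sentence summary:

The principle is quite simple: a single LSTM layer learns the statistical regularities of the words and characters in the training data, and the softmax layer then acts as a classifier that outputs a probability for each character in the vocabulary.
from tensorflow import keras
from tensorflow.keras import layers
model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))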

 

 

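1. What is the purpose of artificial intelligence?

[AI is not meant to replace our own intelligence]: Granted, the AI-generated artworks we have seen so far are of fairly low quality. AI is nowhere near rivaling human screenwriters, painters, and composers. But replacing humans was never the point: AI will not replace our own intelligence,
[It brings more intelligence into our lives and work]: instead it brings more intelligence, of a different kind, into our lives and work. In many fields, and especially in creative ones, humans will use AI as a tool to augment their own capabilities: more augmented intelligence than artificial intelligence.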

 

 

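2. Where does artificial intelligence help?

[Simple pattern recognition and technical skill]: A large part of artistic creation consists of simple pattern recognition and technical skill. This is precisely the part of the process that many people find unattractive or even dispensable.
[Our perceptual modalities, language, and artworks all have statistical structure]: Learning this structure is what deep learning algorithms excel at.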

 

 

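3. Is a machine learning model just a mathematical operation?

[A machine learning model can learn the statistical latent space of images, music, and stories, then sample from that space]: This creates new works with characteristics similar to those the model saw in its training data.
[A machine learning model is just a mathematical operation]: Of course, such sampling is not in itself an act of artistic creation. It is a mere mathematical operation: the algorithm has no grounding in human life, human emotions, or our experience of the world; instead, it learns from an experience that has little in common with ours.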

 

 

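4. How is sequence data generated in the LSTM text generation example?

[Use the previous tokens as input and train a network to predict the next token or tokens in the sequence]: The general way to generate sequence data with deep learning is to train a network (usually a recurrent neural network or a convolutional neural network) to predict the next token or next few tokens in a sequence, using the previous tokens as input.
[For example, given the input "the cat is on the ma", the network is trained to predict the target "t", the next character.]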
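
As a minimal sketch of how such (input, target) pairs could be built from raw text, the snippet below slides a fixed-size window over a string. The name `context_len` and its value are assumptions chosen only so that the output reproduces the example above; they are not taken from the book.

```python
# Toy illustration: build (input, target) pairs for next-character prediction.
text = "the cat is on the mat"
context_len = 20  # hypothetical window length, chosen to match the example

pairs = []
for i in range(len(text) - context_len):
    context = text[i:i + context_len]   # the preceding characters
    target = text[i + context_len]      # the character that follows them
    pairs.append((context, target))

for context, target in pairs:
    print(repr(context), "->", repr(target))
```

With a shorter `context_len`, the same loop yields one pair per position in the string rather than a single pair.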

 

 

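5. What is a language model?

[Any network that can model the probability of the next token given the previous ones]: As before when working with text data, tokens are typically words or characters, and any network that can model the probability of the next token given the previous ones is called a language model.
[The latent space of language, that is, its statistical structure]: A language model captures the latent space of language, that is, its statistical structure.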

 

 

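6. What are sampling and conditioning data in the LSTM text generation example?

[Sampling, that is, generating new sequences]: Once you have such a trained language model, you can sample from it (generate new sequences).
[An initial string of text, called the conditioning data]: You feed the model an initial string of text (the conditioning data), ask it to generate the next character or the next word (it can even generate several tokens at once), add the generated output back to the input data, and repeat the process many times.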
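
The feed-back loop described here can be sketched without a neural network. In the snippet below, a table of smoothed bigram counts stands in for the trained language model; this stand-in, the toy corpus, and the names `probs` and `generate` are all assumptions made for illustration. Unlike an LSTM, this toy model conditions only on the last character.

```python
import numpy as np

corpus = "the cat is on the mat. the cat sat on the mat."
chars = sorted(set(corpus))
char_to_idx = {c: i for i, c in enumerate(chars)}

# Stand-in "language model": bigram counts with add-one smoothing,
# normalized so that each row is a distribution over the next character.
counts = np.ones((len(chars), len(chars)))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[char_to_idx[prev], char_to_idx[nxt]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

def generate(seed, n_chars, rng):
    """Repeatedly sample one character and append it to the running text."""
    text = seed
    for _ in range(n_chars):
        dist = probs[char_to_idx[text[-1]]]   # toy model: looks at the last char only
        next_idx = rng.choice(len(chars), p=dist)
        text += chars[next_idx]               # feed the output back in as input
    return text

rng = np.random.default_rng(0)
print(generate("the c", 40, rng))
```

The seed string plays the role of the conditioning data; every character of the seed must appear in the toy corpus.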

 

 

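7. Why is the choice of the next character crucial when generating text?

[Greedy sampling]: A naive approach is greedy sampling, which always chooses the most likely next character. But this approach produces repetitive, predictable strings that do not look like coherent language.
[Stochastic sampling]: A more interesting approach makes slightly more surprising choices: it introduces randomness into the sampling process by sampling from the probability distribution for the next character. This is called stochastic sampling (stochasticity is what randomness is called in this field). In this setup, if the model gives the next character "e" a probability of 0.3, you will choose it 30% of the time.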
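
A small sketch contrasting the two strategies on a made-up distribution; the four-character vocabulary and the probabilities are hypothetical and only serve to show that a character with probability 0.3 is picked roughly 30% of the time under stochastic sampling, and never under greedy sampling unless it is the most likely one.

```python
import numpy as np

chars = ['a', 'e', 'i', 'o']            # hypothetical tiny vocabulary
probs = np.array([0.2, 0.3, 0.4, 0.1])  # hypothetical model output

# Greedy sampling: always the single most likely character.
greedy_choice = chars[int(np.argmax(probs))]

# Stochastic sampling: draw indices according to the distribution itself.
rng = np.random.default_rng(0)
draws = rng.choice(len(chars), size=10_000, p=probs)
freq_e = float(np.mean(draws == chars.index('e')))

print("greedy:", greedy_choice)
print("share of 'e' among stochastic draws:", round(freq_e, 3))
```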

 

 

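8. Why does sampling (generating new sequences) need some randomness?

[Pure random sampling has maximum entropy, and so maximum randomness]: Consider an extreme case: pure random sampling, that is, drawing the next character from a uniform probability distribution in which every character is equally likely. This scheme has maximum randomness; in other words, this probability distribution has maximum entropy. Naturally, it will not produce anything interesting.
[Greedy sampling has minimum entropy and no randomness]: At the other extreme is greedy sampling. It does not produce anything interesting either: it has no randomness, and the corresponding probability distribution has minimum entropy.
[Lower entropy gives generated sequences a more predictable structure (so they may look more realistic), while higher entropy yields more surprising and more creative sequences]: Between these two extremes there are many intermediate points with higher or lower entropy, and you may want to explore them all.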
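
To make the two extremes concrete, here is a minimal sketch that computes the Shannon entropy (in bits) of a uniform distribution, a one-hot distribution of the kind greedy sampling effectively uses, and one point in between. The helper name `entropy` and the example distributions are assumptions for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    # Adding 0.0 turns a possible negative zero into plain zero.
    return float(-np.sum(nz * np.log2(nz)) + 0.0)

n = 4
uniform = np.full(n, 1 / n)                  # pure random sampling
one_hot = np.array([0.0, 0.0, 1.0, 0.0])     # all mass on one character
in_between = np.array([0.1, 0.2, 0.6, 0.1])  # an intermediate point

print("uniform:   ", entropy(uniform))     # the maximum for 4 outcomes
print("one-hot:   ", entropy(one_hot))     # the minimum
print("in between:", entropy(in_between))
```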

 

 

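9. What is softmax temperature?

[To control the amount of randomness in the sampling process]: we introduce a parameter called the softmax temperature,
[which characterizes the entropy of the probability distribution used for sampling, that is, how surprising or predictable the choice of the next character will be]. Higher temperatures give sampling distributions of higher entropy and more surprising text; lower temperatures give less randomness and more predictable text.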
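
One common way to apply a temperature is to divide the log-probabilities by it and renormalize. The sketch below does this on a made-up distribution; the function name `reweight` and the values in `original` are hypothetical, and subtracting the maximum before exponentiating is a numerical-stability choice of this sketch.

```python
import numpy as np

def reweight(probs, temperature):
    """Rescale log-probabilities by 1/temperature and renormalize."""
    logits = np.log(np.asarray(probs, dtype=float)) / temperature
    weights = np.exp(logits - logits.max())   # subtract the max for stability
    return weights / weights.sum()

original = np.array([0.5, 0.3, 0.15, 0.05])  # hypothetical model output
for t in [0.2, 0.5, 1.0, 1.2]:
    print(t, np.round(reweight(original, t), 3))
```

A temperature of 1.0 leaves the distribution unchanged, values below 1.0 concentrate mass on the most likely character, and values above 1.0 flatten the distribution.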

 

 

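10. What does the single-layer LSTM model for predicting the next character look like?

The principle is quite simple: a single LSTM layer learns the statistical regularities of the words and characters in the training data, and the softmax layer then acts as a classifier that outputs a probability for each character in the vocabulary.
from tensorflow import keras
from tensorflow.keras import layers
model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))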

 

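11. Key points about text generation with LSTM?

You can generate discrete sequence data by training a model to predict the next token or tokens given the previous ones.
For text, such a model is called a language model. It can be word-level or character-level.
Sampling the next token requires a balance between adhering to what the model judges likely and introducing randomness.
One way to handle this is softmax temperature. Always try several different temperatures to find the right one.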

 

 

 

 

II. 8.1 Text generation with LSTM


 

[...]

Implementing character-level LSTM text generation

Let's put these ideas into practice in a Keras implementation. The first thing we need is a lot of text data that we can use to learn a language model. You could use any sufficiently large text file or set of text files: Wikipedia, the Lord of the Rings, etc. In this example we will use some of the writings of Nietzsche, the late-19th century German philosopher (translated to English). The language model we will learn will thus be specifically a model of Nietzsche's writing style and topics of choice, rather than a more generic model of the English language.

Preparing the data

Let's start by downloading the corpus and converting it to lowercase:

In [1]:
from tensorflow import keras
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
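# Read explicitly as UTF-8: the platform default encoding can mis-decode
# the few non-ASCII characters in the corpus.
with open(path, encoding='utf-8') as f:
    text = f.read().lower()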
print('Corpus length:', len(text))
Corpus length: 600893
In [2]:
print(text[0:400])
preface


supposing that truth is a woman--what then? is there not ground
for suspecting that all philosophers, in so far as they have been
dogmatists, have failed to understand women--that the terrible
seriousness and clumsy importunity with which they have usually paid
their addresses to truth, have been unskilled and unseemly methods for
winning a woman? certainly she has never allowed herself 

Next, we will extract partially overlapping sequences of length maxlen, one-hot encode them, and pack them into a 3D Numpy array x of shape (sequences, maxlen, unique_characters). At the same time, we prepare an array y containing the corresponding targets: the one-hot encoded characters that come right after each extracted sequence.

In [3]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = {char: i for i, char in enumerate(chars)}

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# The `np.bool` alias was removed from recent NumPy releases; the builtin `bool` works everywhere.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
Number of sequences: 200278
Unique characters: 58
Vectorization...
In [4]:
print(chars)
['\n', ' ', '!', '"', "'", '(', ')', ',', '-', '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '=', '?', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '忙', '毛', '盲', '脝', '茅']
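
To make the array shapes easier to see, the same windowing and one-hot encoding can be tried on a toy string. This is only a sketch: the string `toy_text` and the small `maxlen` and `step` values are hypothetical and chosen so the result can be checked by hand.

```python
import numpy as np

toy_text = "hello world"
maxlen, step = 4, 3   # small hypothetical values for illustration

starts = range(0, len(toy_text) - maxlen, step)
windows = [toy_text[i:i + maxlen] for i in starts]   # input sequences
targets = [toy_text[i + maxlen] for i in starts]     # character after each window

chars = sorted(set(toy_text))
char_indices = {c: i for i, c in enumerate(chars)}

# One-hot encode: x is (sequences, maxlen, unique_characters), y is (sequences, unique_characters).
x = np.zeros((len(windows), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(windows), len(chars)), dtype=bool)
for i, window in enumerate(windows):
    for t, char in enumerate(window):
        x[i, t, char_indices[char]] = True
    y[i, char_indices[targets[i]]] = True

print(list(zip(windows, targets)))
print(x.shape, y.shape)
```

Each row of `x` has exactly one `True` per time step, and each row of `y` has exactly one `True` overall.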

Building the network

Our network is a single LSTM layer followed by a Dense classifier and softmax over all possible characters. But let us note that recurrent neural networks are not the only way to do sequence data generation; 1D convnets also have proven extremely successful at it in recent times.

In [5]:
from tensorflow.keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

Since our targets are one-hot encoded, we will use categorical_crossentropy as the loss to train the model:

In [6]:
# `lr` is a deprecated alias; `learning_rate` is the current argument name.
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

Training the language model and sampling from it

Given a trained model and a seed text snippet, we generate new text by repeatedly:

  • 1) Drawing from the model a probability distribution over the next character given the text available so far
  • 2) Reweighting the distribution to a certain "temperature"
  • 3) Sampling the next character at random according to the reweighted distribution
  • 4) Adding the new character at the end of the available text

This is the code we use to reweight the original probability distribution coming out of the model, and draw a character index from it (the "sampling function"):

In [7]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
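
One detail worth knowing: `np.log` returns `-inf` and emits a runtime warning when a probability is exactly zero. The variant below is a hedged sketch, not the notebook's method; the name `sample_clipped`, the `eps` floor, and the use of a NumPy `Generator` are assumptions introduced here to sidestep that warning.

```python
import numpy as np

def sample_clipped(preds, temperature=1.0, eps=1e-12, rng=None):
    """Temperature sampling that clips zero probabilities before taking logs."""
    rng = np.random.default_rng() if rng is None else rng
    preds = np.asarray(preds, dtype='float64')
    logits = np.log(np.clip(preds, eps, None)) / temperature
    weights = np.exp(logits - logits.max())   # subtract the max for stability
    probs = weights / weights.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
preds = np.array([0.0, 0.7, 0.3, 0.0])   # hypothetical output with exact zeros
draws = [sample_clipped(preds, temperature=0.5, rng=rng) for _ in range(1000)]
counts = np.bincount(draws, minlength=len(preds))
print(counts)
```

With such a small `eps`, the clipped entries keep a negligible probability, so in practice they are never drawn.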

Finally, this is the loop where we repeatedly train and generate text. After every epoch, we generate text using a range of different temperatures. This allows us to see how the generated text evolves as the model starts converging, as well as the impact of temperature on the sampling strategy. Note that the seed is not reset between temperatures, so each temperature continues from the last characters produced by the previous one.

In [8]:
import random
import sys

for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
epoch 1
1565/1565 [==============================] - 14s 9ms/step - loss: 1.9697
--- Generating with seed: "let it be
permitted to designate by this expression the beli"
------ temperature: 0.2
let it be
permitted to designate by this expression the belies and some the self-contiment in the any the spirit of the say for the say the man and stan and stand and still one man are some and still of the say for the more the should the experience of the self-conscience of the self-in the and the exist of the more the should the exist of the say the self-conscience of the and the feases and condicious and states of the acts the sense the self-curtemption
------ temperature: 0.5
icious and states of the acts the sense the self-curtemption, the instinction of the morality one weal a case conduction of the earthing there should se for the all and finder conception of the more the some the formere. the at a moraling for the into a the despection of deep and self-dection of the sigie shound in them to and consigness, and man are they any
will general the more that is it is a sense of the were manger of the indesing the enters, exarmed
------ temperature: 1.0
sense of the were manger of the indesing the enters, exarmed ithe will light scient spacting in to fe sign to meare, liked encerstand: are with the shirally
foon caussive finhel to he, "p6ession, onitate of impasion of but in bloud; a man be
an--morality fow mabinebres and post whethers(_orders--is the
shild a more
have:--in present hatestageialequally bod; it from "say best by a false,
may a pe brail,
"
myasio "contuping within it
egom:, kant of "sympatio
------ temperature: 1.2
rail,
"
myasio "contuping within it
egom:, kant of "sympation; ragtlering bhacteezt., ti luec and nopence to that is thore le im-sphils asing, and mekeucimem of retodrancm tos.
a2u once surffar spivis anding fuint onciaveariace by coleres ouces-le, eremy virowe bamide have
fehemenesss. in
the yid thus
regral cladting"
-ipeible for pait
frumccordmal isps, love to geent expousies
theseupratt free which has alsophingd: in dumter onithes, it
is great sthen for
epoch 2
1565/1565 [==============================] - 14s 9ms/step - loss: 1.6167
--- Generating with seed: "about its being the best or the worst) and
that these ideas "
------ temperature: 0.2
about its being the best or the worst) and
that these ideas the consequently and the more the consequently and souls the consequently and such an accortion of the consequently the the content and the art an accounted the morality of the subtle of the latter the man in the consequently and attain the consequently the existence of the subtle the subtle and serition of the act of the constance of the not the consequently and self the consequently and such an 
------ temperature: 0.5
 not the consequently and self the consequently and such an existences of the strong the intimplesses or the ads under the sense, the will as the delicate feeling and sacriferent of the subtle explianly the soulter of the prenocle the to be a the time of but the strong the heart the nature, or the superity sight of the more there is any what is purit of the consequently and its friends with even the free simple, and in the will as the wart of the suberess 
------ temperature: 1.0
he free simple, and in the will as the wart of the suberess of they ancimal to arfiner enterlexised not with furdiamentvances act of huthed cormes it contannation, howeverintwame fear, an action and self,d the stalfn-ganes of the dogatey conis is the polity virtue the god and invented excliend the, one courtetacy the inslange of all iowned, a laffer
furreptions the cates harn byings open of the a delicate ancinch fcledo,
morality) that which an oppeenate l
------ temperature: 1.2
delicate ancinch fcledo,
morality) that which an oppeenate lithletxinces. it simply difficially could (bachigear-of europe of agriligy,. a parculalireavi.)
 be"-all, noteven, inverco-manial, the
gort ut and find that therress whochy
haught feeld heir, at sief hance, 
is fvensamentate horder by the suppossionware walt was, libe, partarisispssfy of a mind has to its"licy. it id, what cured. lotes basue: whihe taken time bvtward every
here abadmins thues who 
epoch 3
1565/1565 [==============================] - 15s 9ms/step - loss: 1.5280
--- Generating with seed: "the "otherwise"), nor does it address
itself to the individu"
------ temperature: 0.2
the "otherwise"), nor does it address
itself to the individual soul and should to the same that the same that the self-contemplation of the self-contiture of the strong and something of the self-call that when the self-calling to the same of the same that the self-contemplation of the same to man that it is that the self-corrant of the self-corrant and strong and stronger property of the self-contradict of the self-constance of the same soul and strong to 
------ temperature: 0.5
radict of the self-constance of the same soul and strong to obering of the self-cortion and the man as in not god that the times of the spicit that there are
later of the learny of the prople to the strength and something with a contended and as the most as one always, who seem of the same to methouther and later of the strength to be self-contully that, what we more for the ladding and contractlibuted in that the distrustion of every one his possible to d
------ temperature: 1.0
buted in that the distrustion of every one his possible to diffect of
the
searly for etrist his world--as being traupant that all impro?tion of
this qualt must undeemnes, more reterate of which with which shoride truths and wirds and pofitic origin and
anther ternian owing thas
engless prowly hid who outisgune of the
instruncing for a modest--he is
rightly and akindc ear honting,
done in prompuss, which
squirge--we charced, a think for dispirst of one has 
------ temperature: 1.2
 which
squirge--we charced, a think for dispirst of one has is flan to !

11. i man ourpved in
hound unto the are, recograiistly, in a bodiotes in invervients.---ashah tirlibal opinion; it is character powesty--skoul, labe was found and gencivication of the
bleaker aachoig;--who masks remathes philosophyriagesty objift of ects
no loveroginiresty, from noh feelration of perhondovantant and man anxurimabligion, an ax ambigionisifantion" doisism in fadt (w! i
epoch 4
1565/1565 [==============================] - 15s 9ms/step - loss: 1.4808
--- Generating with seed: "tire of "perfecting" ourselves in our
virtue, which alone re"
------ temperature: 0.2
tire of "perfecting" ourselves in our
virtue, which alone really of the present of the present and the pressing and suffering the same the predoce of a suffering to the present to the present and an entailed to the suffering of the present and the pressions of the present to the pressing of the conception of the pressing and suffer the subtle the pression of the present to our proper and the present to the predicious in the present and from the same as the
------ temperature: 0.5
nt to the predicious in the present and from the same as the "feeling and the tarnes when wishing of the misunderstand of morality of a desires of
the strong to sould and existence of all immoralist and puritical condition of the spirit of causes and also to the cheres and there is also the comprehend the soulse of his the sufferon, which the man his have deprehing in the last prode and cause and to of the preceams to auboug the philosophers as only in the
------ temperature: 1.0
to of the preceams to auboug the philosophers as only in the suproral an e signiain, in which ameribeness
of say exagonged after old already to their their virtues, for the treat our new dogenession. 
 sanism.--and the old reason, wishes he who hellance, that is not moral bad to
the germany emorism. sust they will if is not firmoure wishes
heid, as it has also to us as which they are pride. there, in chanberriris
he could quand,, who
referring
with the, ea
------ temperature: 1.2
 in chanberriris
he could quand,, who
referring
with the, easpeg,nes to according very have all eleble culture, as the are itself-grows is reffired tuen sphild, only out of the whoflearness" not this justme
toforehichespardly tread many not,
what it
boughr: ethic sapas;
by which
conduth that
acreng

super, timerowings phuloncenced, of fotuinlian, are all the
vided, if least stated too
cistment-nanma-change.--the
sencely truiss of which one different and
su
epoch 5
1565/1565 [==============================] - 15s 9ms/step - loss: 1.4511
--- Generating with seed: "f port royal, sainte-beuve, in spite of all
his hostility to"
------ temperature: 0.2
f port royal, sainte-beuve, in spite of all
his hostility to the same and something and something and best the precisely the subtle spirit of the states of the same distance and the subtle present and such a conscipuse and the distrust and the present and such a man and the fact of the same the contradicting and states which has a conscipuse and distance of the present of the same and such a man and such a man and mankind and precisely the subtle spiriture
------ temperature: 0.5
nd such a man and mankind and precisely the subtle spiriture the fact the best attermant looks and such as the part of the part of the bedom and possess, the dange
...................

As you can see, a low temperature results in extremely repetitive and predictable text, but where local structure is highly realistic: in particular, all words (a word being a local pattern of characters) are real English words. With higher temperatures, the generated text becomes more interesting, surprising, even creative; it may sometimes invent completely new words that sound somewhat plausible (such as "eterned" or "troveration"). With a high temperature, the local structure starts breaking down and most words look like semi-random strings of characters. Without a doubt, here 0.5 is the most interesting temperature for text generation in this specific setup. Always experiment with multiple sampling strategies! A clever balance between learned structure and randomness is what makes generation interesting.

Note that by training a bigger model, longer, on more data, you can achieve generated samples that will look much more coherent and realistic than ours. But of course, don't expect to ever generate any meaningful text, other than by random chance: all we are doing is sampling data from a statistical model of which characters come after which characters. Language is a communication channel, and there is a distinction between what communications are about, and the statistical structure of the messages in which communications are encoded. To evidence this distinction, here is a thought experiment: what if human language did a better job at compressing communications, much like our computers do with most of our digital communications? Then language would be no less meaningful, yet it would lack any intrinsic statistical structure, thus making it impossible to learn a language model like we just did.

Takeaways

  • We can generate discrete sequence data by training a model to predict the next token(s) given previous tokens.
  • In the case of text, such a model is called a "language model" and could be based on either words or characters.
  • Sampling the next token requires balance between adhering to what the model judges likely, and introducing randomness.
  • One way to handle this is the notion of softmax temperature. Always experiment with different temperatures to find the "right" one.
 
posted @ 2020-10-16 00:27  范仁义