Sequence Models - Natural Language Processing & Word Embeddings

Word Embeddings

Word Representation

  • 1-hot representation: the inner product of any two different one-hot vectors is 0, so it captures no relationship between words
  • Featurized representation: word embedding

Visualizing word embeddings

(figure: visualization of word embeddings)

t-SNE algorithm: maps the 300-D embeddings down to 2-D for visualization

The embedding learns that concepts which feel like they should be related end up close together.
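As a rough illustration (not from the course), a 2-D plot of a few embeddings could be produced like this, assuming word_to_vec_map maps words to 300-D NumPy vectors and the word list is longer than the perplexity value:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_words_2d(words, word_to_vec_map):
    # Stack the 300-D embeddings, then map them to 2-D with t-SNE for visualization.
    X = np.stack([word_to_vec_map[w] for w in words])
    X_2d = TSNE(n_components=2, perplexity=5, init="pca", random_state=0).fit_transform(X)
    plt.scatter(X_2d[:, 0], X_2d[:, 1])
    for (x, y), w in zip(X_2d, words):
        plt.annotate(w, (x, y))
    plt.show()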

Using word embeddings

Named entity recognition example

(figure: named entity recognition example)

The labeled training set for the new task can be much smaller than the corpus the embeddings were trained on, so this allows you to carry out transfer learning.

Transfer learning and word embeddings

  • Learn word embeddings from a large text corpus. (1-100B words)

    (or download pre-trained embedding online.)

  • Transfer embedding to new task with smaller training set.

    (say, 100k words)

  • Optional: Continue to fine-tune the word embeddings with new data (worthwhile only if the new training set is reasonably large)

Properties of Word Embeddings

Analogies

Man → Woman as King → ?

$e_{man} - e_{woman} \approx \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \end{bmatrix} \approx e_{king} - e_{queen}$

$e_? \approx e_{king} - e_{man} + e_{woman} \approx e_{queen}$

Find the word $w$ that satisfies $\arg\max_w \ \text{sim}(e_w,\ e_{king} - e_{man} + e_{woman})$.

  • Cosine similarity

    $\text{sim}(u, v) = \dfrac{u^T v}{\|u\|_2 \, \|v\|_2}$
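A minimal sketch of the analogy search using cosine similarity, assuming word_to_vec_map is a dictionary from words to NumPy embedding vectors (the function names are illustrative, not the assignment's exact API):

import numpy as np

def cosine_similarity(u, v):
    # sim(u, v) = u . v / (||u||_2 * ||v||_2)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def complete_analogy(word_a, word_b, word_c, word_to_vec_map):
    # man -> woman as king -> ?: maximize sim(e_w, e_b - e_a + e_c)
    e_a, e_b, e_c = (word_to_vec_map[w] for w in (word_a, word_b, word_c))
    target = e_b - e_a + e_c
    best_word, best_sim = None, -np.inf
    for w, e_w in word_to_vec_map.items():
        if w in (word_a, word_b, word_c):
            continue
        s = cosine_similarity(e_w, target)
        if s > best_sim:
            best_word, best_sim = w, s
    return best_word

# e.g. complete_analogy("man", "woman", "king", word_to_vec_map) should return "queen"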

Embedding Matrix

(figure: embedding matrix)
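In the course, the embedding matrix $E$ has shape $(300, 10000)$ and $E \cdot O_j = e_j$, the embedding of word $j$; in practice a direct column lookup is used instead of the matrix multiply. A quick sketch (random values for $E$, just for illustration):

import numpy as np

emb_dim, vocab_size = 300, 10000
E = np.random.randn(emb_dim, vocab_size)   # embedding matrix (illustrative random values)

j = 6527                                   # index of some word, e.g. "orange"
o_j = np.zeros(vocab_size)
o_j[j] = 1.0                               # one-hot vector O_j

e_j = E @ o_j                              # E * O_j selects column j (slow) ...
assert np.allclose(e_j, E[:, j])           # ... equivalent to a direct column lookup (fast)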

Learning Word Embeddings: Word2vec & GloVe

Learning Word Embeddings

  • Neural language model

    hide a word, build a network to predict it from its context, and keep the learned parameters (which include the embedding matrix $E$)

(figure: neural language model)
  • Other context/target pairs

    Context: last 4 words / 4 words on left & right / last 1 word / nearby 1 word (skip-gram)

    a glass of orange ___ to go along with
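A rough sketch of generating such context/target pairs from a sentence (the window size and function name are illustrative assumptions):

def context_target_pairs(tokens, window=4):
    # For each target word, pair it with nearby words within +/- `window` positions (skip-gram style).
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((tokens[j], target))   # (context, target)
    return pairs

# e.g. context_target_pairs("a glass of orange juice to go along with".split())
# contains ("orange", "juice"), ("glass", "juice"), ...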

Word2Vec

Skip-grams

Come up with a few context-to-target pairs to create our supervised learning problem.

  • Model

    Vocab size = 10,000

    Context c "orange"(6527)=Target t "juice"(4834)

    $O_c \to E \to e_c \ (= E \cdot O_c) \to \text{softmax} \to \hat{y}$

    softmax: $P(t \mid c) = \dfrac{e^{\theta_t^T e_c}}{\sum_{j=1}^{10000} e^{\theta_j^T e_c}}$

    $\theta_t$ is the parameter associated with output $t$

    Loss: $L(\hat{y}, y) = -\sum_{i=1}^{10000} y_i \log \hat{y}_i$ (a code sketch of this model appears at the end of this subsection)

  • Problems with softmax classification

    the denominator sums over the entire 10,000-word vocabulary, so the computation cost is too high

  • Solutions to the softmax computation cost

    hierarchical softmax classifier

(figure: hierarchical softmax classifier)
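A minimal NumPy sketch of the skip-gram softmax model above; E is the embedding matrix and theta the output parameters (shapes and names are assumptions, not the course's exact code):

import numpy as np

def skipgram_softmax(c, E, theta):
    # P(t | c) = exp(theta_t . e_c) / sum_j exp(theta_j . e_c), for every candidate target t.
    e_c = E[:, c]                 # embedding of the context word, shape (emb_dim,)
    logits = theta.T @ e_c        # theta has shape (emb_dim, vocab_size)
    logits -= logits.max()        # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()            # shape (vocab_size,)

def skipgram_loss(c, t, E, theta):
    # Cross-entropy -sum_i y_i log(y_hat_i) reduces to -log P(t | c) for a one-hot target.
    return -np.log(skipgram_softmax(c, E, theta)[t])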

Negative Sampling

context   word    target?
orange    juice   1
orange    king    0
orange    book    0
orange    the     0
orange    of      0

Defining a new learning problem & Model

  • pick a context word and a target word to get a positive example;

  • pair the same context word with k random words from the dictionary to get k negative examples (label 0).

    $k = \begin{cases} 5\text{--}20 & \text{(smaller dataset)} \\ 2\text{--}5 & \text{(larger dataset)} \end{cases}$

  • train 10,000 binary classifiers, but update only $k+1$ of them per iteration (one positive and $k$ negative examples) instead of a 10,000-way softmax, so the computation cost is much lower (see the sketch below)
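A sketch of the k+1 binary classifications (sigmoid instead of softmax); E and theta are as in the skip-gram sketch above, and the names are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_sampling_loss(c, targets, labels, E, theta):
    # Binary cross-entropy over one positive pair (label 1) and k negative pairs (label 0).
    # Only the k+1 classifiers touched here get updated, not all 10,000.
    e_c = E[:, c]
    loss = 0.0
    for t, y in zip(targets, labels):
        p = sigmoid(theta[:, t] @ e_c)           # P(y = 1 | c, t)
        loss += -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return loss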

Selecting negative examples

$P(w_i) = \dfrac{f(w_i)^{3/4}}{\sum_{j=1}^{10000} f(w_j)^{3/4}}$

$f(w_i)$ is the observed frequency of word $w_i$; the $3/4$ power is a heuristic between sampling by raw frequency and sampling uniformly.
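A sketch of drawing negative examples from this distribution, assuming the word counts are available as a frequency array:

import numpy as np

def negative_sampling_distribution(freqs):
    # P(w_i) proportional to f(w_i)^(3/4).
    p = np.asarray(freqs, dtype=float) ** 0.75
    return p / p.sum()

def draw_negative_examples(context, vocab_size, p, k=5):
    # Pair the context word with k randomly sampled words, labeled 0.
    negatives = np.random.choice(vocab_size, size=k, p=p)
    return [(context, int(w), 0) for w in negatives]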

GloVe Word Vectors

GloVe (global vectors for word representation)

$X_{ct} = X_{ij} =$ number of times word $i$ appears in the context of word $j$

$X_{ij} = X_{ji}$ (for a symmetric context window), and it measures how often words $i$ and $j$ appear close to each other.

$\min \sum_{i=1}^{10000} \sum_{j=1}^{10000} f(X_{ij}) \left( \theta_i^T e_j + b_i + b_j' - \log X_{ij} \right)^2$

$f(X_{ij})$ is a weighting term:

  • $f(X_{ij}) = 0$ if $X_{ij} = 0$ (using the convention $0 \log 0 = 0$)

  • not too high a weight for very frequent stop words (this, is, of, a, ...)

  • not too low a weight for rare words (durian, ...)

$\theta_i$ and $e_j$ are symmetric in this objective, so the final embedding can be taken as
$e_w^{(final)} = \dfrac{e_w + \theta_w}{2}$.

Applications Using Word Embeddings

Sentiment Classification

Average the word embeddings of the sentence and use a softmax to predict the sentiment.

(figure: sentiment classification by averaging embeddings)

But it makes some mistakes because averaging ignores word order, e.g. "Completely lacking in good taste, good service, and good ambience." looks positive because of the repeated "good".
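A minimal sketch of this averaging baseline; W and b stand for a hypothetical already-trained softmax layer over the sentiment classes:

import numpy as np

def average_embedding_predict(sentence, word_to_vec_map, W, b):
    # Average the word embeddings of the sentence, then apply a softmax layer.
    words = sentence.lower().split()
    avg = np.mean([word_to_vec_map[w] for w in words], axis=0)
    logits = W @ avg + b
    p = np.exp(logits - logits.max())
    return p / p.sum()            # word order is ignored, hence the "good ... good ... good" failure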

RNN for sentiment classification

Using a many-to-one RNN (feeding in the word embeddings at each time step) solves this problem, since it takes word order into account.

Debiasing word embeddings

Word embeddings can reflect the gender, ethnicity, age, sexual orientation, and other biases of the text used to train the model.

Addressing bias in word embeddings

  • Identify the bias direction

    average the differences $e_{he} - e_{she}$, $e_{male} - e_{female}$, $\ldots$

    bias direction ($1$-D)

    non-bias direction ($n-1$-D)

    SVD (singular value decomposition, similar to PCA) can be used to find it

  • Neutralize: For every word that is not definitional, project it onto the non-bias direction to remove the bias component.

    (you need to figure out which words should be neutralized; an SVM can be trained first to classify which words are definitional; a projection sketch follows this list)

  • Equalize pairs.

    grandmother / grandfather should have the same similarity and distance to gender-neutral words

    you can hand-pick these pairs (there are not many of them)
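A sketch of the neutralize step as a projection; g is the bias direction from the first step and e a word embedding, with names and the check function being illustrative assumptions:

import numpy as np

def neutralize(e, g):
    # Remove the component of e along the bias direction g, keeping the non-bias (n-1)-D part.
    e_bias = (np.dot(e, g) / np.dot(g, g)) * g
    return e - e_bias

def equalized_pair_check(e_grandmother, e_grandfather, e_neutral_word):
    # After equalization, both words of a pair should be equidistant from a gender-neutral word.
    d1 = np.linalg.norm(e_grandmother - e_neutral_word)
    d2 = np.linalg.norm(e_grandfather - e_neutral_word)
    return np.isclose(d1, d2)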

Homework - Emojify

Building the Emojifier-V2

(figure: Emojifier-V2 model architecture)
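The model below calls pretrained_embedding_layer, which is built earlier in the assignment; roughly, it packs the GloVe vectors into a frozen Keras Embedding layer. This is a from-memory sketch, not the graded solution:

import numpy as np
from tensorflow.keras.layers import Embedding

def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    # Build a non-trainable Embedding layer whose weights are the pre-trained GloVe vectors.
    vocab_size = len(word_to_index) + 1                  # +1 to fit Keras' index range
    any_word = next(iter(word_to_vec_map))
    emb_dim = word_to_vec_map[any_word].shape[0]         # e.g. 50 for GloVe-50d
    emb_matrix = np.zeros((vocab_size, emb_dim))
    for word, idx in word_to_index.items():
        emb_matrix[idx, :] = word_to_vec_map[word]
    embedding_layer = Embedding(vocab_size, emb_dim, trainable=False)
    embedding_layer.build((None,))                       # create the weights before setting them
    embedding_layer.set_weights([emb_matrix])
    return embedding_layer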
# UNQ_C5 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: Emojify_V2

def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.

    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """

    ### START CODE HERE ###

    # Define sentence_indices as the input of the graph.
    # It should be of shape input_shape and dtype 'int32' (as it contains indices, which are integers).
    sentence_indices = Input(input_shape, dtype = 'int32')

    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)

    # Propagate sentence_indices through your embedding layer
    # (See additional hints in the instructions).
    embeddings = embedding_layer(sentence_indices)

    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # The returned output should be a batch of sequences.
    X = LSTM(128, return_sequences = True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with 128-dimensional hidden state
    # The returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences = False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer with 5 units
    X = Dense(5)(X)
    # Add a softmax activation
    X = Activation('softmax')(X)

    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs = sentence_indices, outputs = X)

    ### END CODE HERE ###

    return model
model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()
Model: "functional_3" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_2 (InputLayer) [(None, 10)] 0 _________________________________________________________________ embedding_3 (Embedding) (None, 10, 50) 20000050 _________________________________________________________________ lstm_2 (LSTM) (None, 10, 128) 91648 _________________________________________________________________ dropout_2 (Dropout) (None, 10, 128) 0 _________________________________________________________________ lstm_3 (LSTM) (None, 128) 131584 _________________________________________________________________ dropout_3 (Dropout) (None, 128) 0 _________________________________________________________________ dense_1 (Dense) (None, 5) 645 _________________________________________________________________ activation_1 (Activation) (None, 5) 0 ================================================================= Total params: 20,223,927 Trainable params: 223,877 Non-trainable params: 20,000,050 _________________________________________________________________

Compile it

model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

Train it

X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
Y_train_oh = convert_to_one_hot(Y_train, C = 5)
model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)
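A typical follow-up to check the model, assuming X_test and Y_test are loaded the same way as the training split:

X_test_indices = sentences_to_indices(X_test, word_to_index, maxLen)
Y_test_oh = convert_to_one_hot(Y_test, C = 5)
loss, acc = model.evaluate(X_test_indices, Y_test_oh)
print("Test accuracy =", acc)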
