【489】Advanced Deep Learning Best Practices
Reference: Deep Learning with Python, p. 196
1. Beyond the Sequential model: the Keras functional API
The functional API handles architectures that the Sequential model cannot express, for example:
- a multi-input model
- a multi-output (or multi-head) model
1.1 Introduction to the functional API
Both follow the same input-to-output pattern; the two approaches below define identical networks.
from keras.models import Sequential, Model
from keras import layers, Input

# A Sequential model
seq_model = Sequential()
seq_model.add(layers.Dense(32, activation='relu', input_shape=(64,)))
seq_model.add(layers.Dense(32, activation='relu'))
seq_model.add(layers.Dense(10, activation='softmax'))

# The functional-API equivalent of the model above
input_tensor = Input(shape=(64,))
h1 = layers.Dense(32, activation='relu')(input_tensor)
h2 = layers.Dense(32, activation='relu')(h1)
output_tensor = layers.Dense(10, activation='softmax')(h2)

model = Model(input_tensor, output_tensor)
model.summary()
Output:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 64)                0
_________________________________________________________________
dense_1 (Dense)              (None, 32)                2080
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056
_________________________________________________________________
dense_3 (Dense)              (None, 10)                330
=================================================================
Total params: 3,466
Trainable params: 3,466
Non-trainable params: 0
_________________________________________________________________
When compiling, training, or evaluating such a Model instance, the API is the same as for a Sequential model.
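For instance, a minimal sketch with random placeholder data (the 1000-sample arrays are purely illustrative, shaped only to match the model above):

import numpy as np

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# Random arrays purely to exercise the API; not meaningful data
x_train = np.random.random((1000, 64))
y_train = np.random.random((1000, 10))

model.fit(x_train, y_train, epochs=10, batch_size=128)
score = model.evaluate(x_train, y_train)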
1.2 Multi-input models
A typical question-answering model has two inputs: a natural-language question, and a text snippet (such as a news article) that provides the information needed to answer it. The model must then produce an answer; in the simplest case this is a single word, obtained via a softmax over some predefined vocabulary.
Input: question + text snippet
Output: answer (a single word)
from keras.models import Model
from keras import layers
from keras import Input

text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

# Branch 1: encode the reference text
text_input = Input(shape=(None,), dtype='int32', name='text')
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)
encoded_text = layers.LSTM(32)(embedded_text)

# Branch 2: encode the question
question_input = Input(shape=(None,), dtype='int32', name='question')
embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)

# Concatenate both encodings and classify over the answer vocabulary
concatenated = layers.concatenate([encoded_text, encoded_question], axis=-1)
answer = layers.Dense(answer_vocabulary_size, activation='softmax')(concatenated)

model = Model([text_input, question_input], answer)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['acc'])
model.summary()
Output:
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
text (InputLayer)               (None, None)         0
__________________________________________________________________________________________________
question (InputLayer)           (None, None)         0
__________________________________________________________________________________________________
embedding_5 (Embedding)         (None, None, 64)     640000      text[0][0]
__________________________________________________________________________________________________
embedding_6 (Embedding)         (None, None, 32)     320000      question[0][0]
__________________________________________________________________________________________________
lstm_5 (LSTM)                   (None, 32)           12416       embedding_5[0][0]
__________________________________________________________________________________________________
lstm_6 (LSTM)                   (None, 16)           3136        embedding_6[0][0]
__________________________________________________________________________________________________
concatenate_3 (Concatenate)     (None, 48)           0           lstm_5[0][0]
                                                                 lstm_6[0][0]
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 500)          24500       concatenate_3[0][0]
==================================================================================================
Total params: 1,000,052
Trainable params: 1,000,052
Non-trainable params: 0
__________________________________________________________________________________________________
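To train this two-input model, you pass either a list of input arrays (in the order the Inputs were given to Model) or a dict keyed by the Input names 'text' and 'question'. A sketch with random placeholder data (num_samples and max_length are illustrative values, not from the original post):

import numpy as np

num_samples = 1000
max_length = 100

# Random token ids standing in for encoded texts and questions
text = np.random.randint(1, text_vocabulary_size, size=(num_samples, max_length))
question = np.random.randint(1, question_vocabulary_size, size=(num_samples, max_length))

# One-hot answer vectors, matching the softmax / categorical_crossentropy setup
answers = np.zeros((num_samples, answer_vocabulary_size))
answers[np.arange(num_samples), np.random.randint(0, answer_vocabulary_size, num_samples)] = 1

model.fit([text, question], answers, epochs=10, batch_size=128)  # list form
# model.fit({'text': text, 'question': question}, answers,
#           epochs=10, batch_size=128)                           # dict form, keyed by Input names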
1.3 Multi-output models
The functional API can also build models with several output heads, for example a model that reads a person's social-media posts and predicts their age, income, and gender. Training such a model requires the ability to assign a different loss function to each head: age prediction is a scalar regression task while gender prediction is a binary classification task, so they need different loss functions. But gradient descent requires minimizing a single scalar, so these losses must be combined into one value; the simplest way to combine different losses is to sum them.
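The compile and fit calls below reference output heads named 'age', 'income', and 'gender' on a model that is not defined in this post. A minimal sketch of such a three-head model, loosely following the book's social-media-posts example (vocabulary_size, layer widths, and num_income_groups are illustrative):

from keras.models import Model
from keras import layers, Input

vocabulary_size = 50000   # illustrative
num_income_groups = 10    # illustrative

posts_input = Input(shape=(None,), dtype='int32', name='posts')
embedded_posts = layers.Embedding(vocabulary_size, 256)(posts_input)
x = layers.Conv1D(128, 5, activation='relu')(embedded_posts)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation='relu')(x)

# Three named heads: naming the output layers is what enables the
# dict forms of loss, loss_weights, and fit targets shown below
age_prediction = layers.Dense(1, name='age')(x)
income_prediction = layers.Dense(num_income_groups, activation='softmax', name='income')(x)
gender_prediction = layers.Dense(1, activation='sigmoid', name='gender')(x)

model = Model(posts_input, [age_prediction, income_prediction, gender_prediction])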
# Compilation options for a multi-output model: multiple losses
# Option 1: a list of losses, in the same order as the outputs
model.compile(optimizer='rmsprop',
              loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'])

# Option 2: a dict keyed by output name (requires named output layers)
# model.compile(optimizer='rmsprop',
#               loss={'age': 'mse',
#                     'income': 'categorical_crossentropy',
#                     'gender': 'binary_crossentropy'})
# Compilation options for a multi-output model: loss weighting
# Option 1: a list of weights, in output order
model.compile(optimizer='rmsprop',
              loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'],
              loss_weights=[0.25, 1., 10.])

# Option 2: a dict keyed by output name
# model.compile(optimizer='rmsprop',
#               loss={'age': 'mse',
#                     'income': 'categorical_crossentropy',
#                     'gender': 'binary_crossentropy'},
#               loss_weights={'age': 0.25,
#                             'income': 1.,
#                             'gender': 10.})
Different losses take values on very different scales, so loss_weights should be set to balance their contributions to the combined loss: here the gender binary crossentropy (typically a small value) is scaled up by 10, while the age MSE (typically much larger) is scaled down to 0.25.
# Feeding data to a multi-output model
# Option 1: a list of target arrays, in output order
model.fit(posts, [age_targets, income_targets, gender_targets],
          epochs=10, batch_size=64)

# Option 2: a dict keyed by output name
# model.fit(posts, {'age': age_targets,
#                   'income': income_targets,
#                   'gender': gender_targets},
#           epochs=10, batch_size=64)
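Note that posts, age_targets, income_targets, and gender_targets are not defined in this post; placeholder versions (random data, only to make the fit call above runnable under the sketch model's assumptions) might look like:

import numpy as np

num_samples = 1000
max_length = 100

posts = np.random.randint(1, vocabulary_size, size=(num_samples, max_length))
age_targets = 100 * np.random.random((num_samples, 1))           # scalar regression targets
income_targets = np.zeros((num_samples, num_income_groups))      # one-hot categorical targets
income_targets[np.arange(num_samples), np.random.randint(0, num_income_groups, num_samples)] = 1
gender_targets = np.random.randint(0, 2, size=(num_samples, 1))  # binary targets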