SciTech-BigDataAIML-Tensorflow - Model training and evaluation: tf.keras.losses + tf.keras.optimizers + tf.keras.metrics

Model training: tf.keras.losses and tf.keras.optimizers
Define some model hyperparameters:
num_epochs = 5
batch_size = 50
learning_rate = 0.001
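As a quick check of what these hyperparameters imply, the sketch below works out how many steps the training and evaluation loops later in this section will run. It assumes the standard MNIST split of 60,000 training and 10,000 test images; `num_train_data` and `num_test_data` here are stand-ins for the `MNISTLoader` attributes of the same name.

```python
num_epochs = 5
batch_size = 50

# Assumed sizes of the standard MNIST split (stand-ins for MNISTLoader attributes)
num_train_data = 60000
num_test_data = 10000

batches_per_epoch = num_train_data // batch_size    # batches needed to see every image once
num_train_batches = batches_per_epoch * num_epochs  # total gradient steps in the training loop
num_test_batches = num_test_data // batch_size      # batches in the evaluation loop

print(batches_per_epoch, num_train_batches, num_test_batches)  # 1200 6000 200
```

With these values, each of the 6000 training steps sees 50 images, so the model passes over the training set 5 times.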

The signature and help text of Model.compile:

help(model.compile)

Help on method compile in module keras.src.trainers.trainer:

Model.compile(optimizer='rmsprop', loss=None, loss_weights=None, metrics=None, weighted_metrics=None, run_eagerly=False, steps_per_execution=1, jit_compile='auto', auto_scale_loss=True) method of __main__.Linear instance
Configures the model for training.

Example:

```python
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[
        keras.metrics.BinaryAccuracy(),
        keras.metrics.FalseNegatives(),
    ],
)
```

Args:
optimizer: String (name of optimizer) or optimizer instance. See
keras.optimizers.
loss: Loss function. May be a string (name of loss function), or
a keras.losses.Loss instance. See keras.losses. A
loss function is any callable with the signature
loss = fn(y_true, y_pred), where y_true are the ground truth
values, and y_pred are the model's predictions.
y_true should have shape (batch_size, d0, .. dN)
(except in the case of sparse loss functions such as
sparse categorical crossentropy which expects integer arrays of
shape (batch_size, d0, .. dN-1)).
y_pred should have shape (batch_size, d0, .. dN).
The loss function should return a float tensor.
loss_weights: Optional list or dictionary specifying scalar
coefficients (Python floats) to weight the loss contributions of
different model outputs. The loss value that will be minimized
by the model will then be the weighted sum of all individual
losses, weighted by the loss_weights coefficients. If a list,
it is expected to have a 1:1 mapping to the model's outputs. If a dict, it is expected to map output names (strings) to scalar coefficients.
metrics: List of metrics to be evaluated by the model during
training and testing. Each of this can be a string (name of a
built-in function), function or a keras.metrics.Metric
instance. See keras.metrics. Typically you will use
metrics=['accuracy']. A function is any callable with the
signature result = fn(y_true, y_pred). To specify different
metrics for different outputs of a multi-output model, you could
also pass a dictionary, such as
metrics={'a':'accuracy', 'b':['accuracy', 'mse']}.
You can also pass a list to specify a metric or a list of
metrics for each output, such as
metrics=[['accuracy'], ['accuracy', 'mse']]
or metrics=['accuracy', ['accuracy', 'mse']]. When you pass
the strings 'accuracy' or 'acc', we convert this to one of
keras.metrics.BinaryAccuracy,
keras.metrics.CategoricalAccuracy,
keras.metrics.SparseCategoricalAccuracy based on the
shapes of the targets and of the model output. A similar
conversion is done for the strings "crossentropy"
and "ce" as well.
The metrics passed here are evaluated without sample weighting;
if you would like sample weighting to apply, you can specify
your metrics via the weighted_metrics argument instead.
weighted_metrics: List of metrics to be evaluated and weighted by
sample_weight or class_weight during training and testing.
run_eagerly: Bool. If True, this model's forward pass
will never be compiled. It is recommended to leave this
as False when training (for best performance),
and to set it to True when debugging.
steps_per_execution: Int. The number of batches to run
during each single compiled function call. Running multiple batches inside a single compiled function call can greatly improve performance on TPUs or small models with a large
Python overhead. At most, one full epoch will be run each
execution. If a number larger than the size of the epoch is
passed, the execution will be truncated to the size of the
epoch. Note that if steps_per_execution is set to N,
Callback.on_batch_begin and Callback.on_batch_end methods
will only be called every N batches (i.e. before/after
each compiled function execution).
Not supported with the PyTorch backend.
jit_compile: Bool or "auto". Whether to use XLA compilation when
compiling a model. For jax and tensorflow backends,
jit_compile="auto" enables XLA compilation if the model
supports it, and disabled otherwise.
For torch backend, "auto" will default to eager
execution and jit_compile=True will run with torch.compile
with the "inductor" backend.
auto_scale_loss: Bool. If True and the model dtype policy is
"mixed_float16", the passed optimizer will be automatically

Instantiate the model and the data loader, and create an optimizer from tf.keras.optimizers (here the widely used Adam optimizer):

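import tensorflow as tf

model = MLP()                    # the MLP model defined earlier in the tutorial
data_loader = MNISTLoader()      # the MNIST data loader defined earlier in the tutorial
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)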
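Then repeat the following steps:

1. Draw a random batch of training data from the DataLoader;
2. Feed the batch into the model to compute its predictions;
3. Compare the predictions with the true values to compute the loss. Here the cross-entropy function from tf.keras.losses serves as the loss function;
4. Compute the gradients of the loss with respect to the model variables;
5. Pass the gradients to the optimizer and call its apply_gradients method to update the model parameters so as to minimize the loss (see the previous chapter for details on using optimizers).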

The implementation is as follows:

num_batches = int(data_loader.num_train_data // batch_size * num_epochs)  # total training steps
for batch_index in range(num_batches):
    X, y = data_loader.get_batch(batch_size)            # step 1: random training batch
    with tf.GradientTape() as tape:                     # record operations for differentiation
        y_pred = model(X)                               # step 2: forward pass (class probabilities)
        loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=y, y_pred=y_pred)
        loss = tf.reduce_mean(loss)                     # step 3: mean loss over the batch
    print("batch %d: loss %f" % (batch_index, loss.numpy()))
    grads = tape.gradient(loss, model.variables)        # step 4: d(loss) / d(variables)
    optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))  # step 5: update
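The manual loop above spells out every step. For comparison, here is a minimal sketch of the same kind of training expressed with the Model.compile / Model.fit API whose help text is shown earlier. To stay self-contained it swaps in random stand-in data (`x_train`, `y_train`, hypothetical) and a small `tf.keras.Sequential` model in place of `MLP` and `MNISTLoader`; the layer sizes (100 hidden units, 10 softmax outputs) are an assumption meant to mirror the MLP.

```python
import numpy as np
import tensorflow as tf

num_epochs = 5
batch_size = 50
learning_rate = 0.001

# Hypothetical stand-in data shaped like MNIST: 28x28x1 images, labels 0-9
rng = np.random.default_rng(0)
x_train = rng.random((500, 28, 28, 1)).astype("float32")
y_train = rng.integers(0, 10, size=(500,))

# Assumed to mirror the MLP: flatten -> 100 ReLU units -> 10-way softmax
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),  # outputs are probabilities, not logits
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

# fit() runs the batch -> forward pass -> loss -> gradients -> apply_gradients loop internally
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs, verbose=0)
loss, acc = model.evaluate(x_train, y_train, batch_size=batch_size, verbose=0)
```

The manual GradientTape loop gives full control over each step; compile/fit trades that control for less code. On random labels the accuracy is of course meaningless; the point is only the shape of the API.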

Cross entropy and tf.keras.losses

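You may have noticed that we did not write out a loss function explicitly here. Instead we used the sparse_categorical_crossentropy (cross-entropy) function from tf.keras.losses,
passing the model's predictions y_pred and the true labels y as arguments and letting Keras compute the loss value for us.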

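Cross entropy is widely used as a loss function in classification problems. Its discrete form is H(y, \hat{y}) = -\sum_{i=1}^{n} y_i \log(\hat{y}_i), where y is the true probability distribution, \hat{y} is the predicted probability distribution, and n is the number of classes in the classification task.
The closer the predicted distribution is to the true distribution, the smaller the cross entropy, and the further apart they are, the larger it is. For a more detailed introduction and its applications in machine learning, see this blog post.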
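To make the formula concrete, here is a small NumPy sketch (the helper name `cross_entropy` is hypothetical) that evaluates H(y, \hat{y}) for a one-hot true distribution against two predicted distributions. The clip to `eps` is an added assumption to avoid log(0).

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-7):
    """H(y, y_hat) = -sum_i y_i * log(y_hat_i) for one distribution over n classes."""
    y_hat = np.clip(y_hat, eps, 1.0)  # guard against log(0)
    return -np.sum(y * np.log(y_hat))

y_true = np.array([0.0, 1.0, 0.0])  # true class is index 1 (one-hot)
good = np.array([0.05, 0.90, 0.05])  # confident and correct
bad = np.array([0.60, 0.20, 0.20])   # puts most mass on the wrong class

h_good = cross_entropy(y_true, good)  # -log(0.9), about 0.105
h_bad = cross_entropy(y_true, bad)    # -log(0.2), about 1.609
```

With a one-hot y, every term except the true class vanishes, so the loss reduces to -log of the probability assigned to the correct class.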

tf.keras provides two cross-entropy loss functions, tf.keras.losses.categorical_crossentropy and tf.keras.losses.sparse_categorical_crossentropy. Here "sparse" means that the true labels y_true can be passed directly as integer class indices. Concretely,

loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=y, y_pred=y_pred)

gives the same result as

loss = tf.keras.losses.categorical_crossentropy(
    y_true=tf.one_hot(y, depth=tf.shape(y_pred)[-1]),
    y_pred=y_pred
)
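The equivalence can be checked directly. This sketch uses a hand-written batch of three predictions over four classes (made-up values) and compares the two functions sample by sample.

```python
import numpy as np
import tensorflow as tf

y = tf.constant([2, 0, 3])                   # integer class labels
y_pred = tf.constant([[0.1, 0.2, 0.6, 0.1],  # softmax-style probabilities, rows sum to 1
                      [0.7, 0.1, 0.1, 0.1],
                      [0.25, 0.25, 0.25, 0.25]])

sparse_loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=y, y_pred=y_pred)
dense_loss = tf.keras.losses.categorical_crossentropy(
    y_true=tf.one_hot(y, depth=tf.shape(y_pred)[-1]),  # integer labels -> one-hot rows
    y_pred=y_pred,
)
# Both hold per-sample losses: -log(0.6), -log(0.7), -log(0.25)
print(sparse_loss.numpy(), dense_loss.numpy())
```

The sparse variant simply skips building the one-hot matrix, which saves memory when the number of classes is large.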

Model evaluation: tf.keras.metrics
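Finally, we evaluate the model's performance on the test set.
Here we use the SparseCategoricalAccuracy metric from tf.keras.metrics. It compares the model's predictions with the true labels and outputs the fraction of samples predicted correctly. We iterate over the test set and, at each step, pass two arguments to the metric through its update_state() method: y_pred and y_true, the model's predictions and the true labels. The metric keeps internal variables that store the quantities needed for the current evaluation (for example, the cumulative number of samples passed in so far and the number of those predicted correctly). When the iteration ends, we call result() to output the final metric value (correct predictions divided by total samples).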
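The accumulate-then-read behavior described above can be seen on a tiny example. The sketch below (made-up labels and probabilities) feeds two batches into `tf.keras.metrics.SparseCategoricalAccuracy` and shows that `result()` reports correct / total over every batch seen so far, not just the last one; `reset_state()` clears the internal counters.

```python
import tensorflow as tf

acc = tf.keras.metrics.SparseCategoricalAccuracy()

# Batch 1: argmax predictions are [1, 0]; labels are [1, 1] -> 1 of 2 correct
acc.update_state(y_true=tf.constant([1, 1]),
                 y_pred=tf.constant([[0.2, 0.8], [0.9, 0.1]]))
# Batch 2: argmax predictions are [0, 1, 1]; labels are [0, 1, 1] -> 3 of 3 correct
acc.update_state(y_true=tf.constant([0, 1, 1]),
                 y_pred=tf.constant([[0.6, 0.4], [0.3, 0.7], [0.1, 0.9]]))

running = float(acc.result())      # (1 + 3) / (2 + 3) = 0.8
acc.reset_state()                  # clear accumulated counts before reusing the metric
after_reset = float(acc.result())  # no samples seen yet, so the result is 0.0
```

Because the counts accumulate across calls, the same metric object should be reset (or recreated) before evaluating a different model or dataset.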

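The code below instantiates a tf.keras.metrics.SparseCategoricalAccuracy metric,
uses a for loop to feed it the predictions and true labels of the test set batch by batch,
and prints the trained model's accuracy on the test set.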

sparse_categorical_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
num_batches = int(data_loader.num_test_data // batch_size)
for batch_index in range(num_batches):
    # slice out the next batch of test images and labels
    start_index, end_index = batch_index * batch_size, (batch_index + 1) * batch_size
    y_pred = model.predict(data_loader.test_data[start_index: end_index], verbose=0)
    # accumulate the correct / total counts inside the metric
    sparse_categorical_accuracy.update_state(y_true=data_loader.test_label[start_index: end_index], y_pred=y_pred)
print("test accuracy: %f" % sparse_categorical_accuracy.result())

Output:

test accuracy: 0.947900
As you can see, even this simple model already reaches roughly 95% accuracy.

The basic unit of neural networks: the neuron

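If we zoom in on the neural network above and examine its computation in detail,
taking for example the k-th computing unit of the second layer, we obtain the following diagram: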

[Figure: neuron.png, the k-th computing unit Q_k of the second layer]
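This computing unit Q_k has 100 weight parameters w_{0k}, w_{1k}, ..., w_{99k} and 1 bias parameter b_k. It takes the values of all 100 computing units P_0, P_1, ..., P_{99} in the first layer as input, sums them weighted by w_{ik} (i.e., \sum_{i=0}^{99} w_{ik} P_i), adds the bias b_k, and feeds the result into the activation function f, which yields the output Q_k = f(\sum_{i=0}^{99} w_{ik} P_i + b_k).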

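In fact, this structure resembles a real nerve cell (neuron). A neuron consists of dendrites, a cell body and an axon. The dendrites receive signals from other neurons as input (a single neuron can have thousands or even tens of thousands of dendrites), the cell body integrates the electrical potential signals, and the resulting signal travels along the axon to the synapses at the nerve terminals, from which it propagates to the next neuron (or neurons).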

[Figure: real_neuron.png]
Diagram of a nerve cell (adapted from Quasar Jarosz at English Wikipedia [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)])

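The computing unit above can be viewed as a mathematical model of the neuron's structure.
In the example above, each computing unit (artificial neuron) in the second layer has 100 weight parameters and 1 bias parameter,
and the second layer has 10 computing units, so this fully connected layer has 100 × 10 weight parameters and 10 bias parameters in total (1,010 parameters).
These are exactly the shapes of the layer's two variables, kernel (100 × 10) and bias (10).
If you look closely, you will find that this neuron-based description is equivalent to the matrix-based description given earlier.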
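The parameter count can be confirmed on a standalone `tf.keras.layers.Dense(10)` layer built for 100 inputs. This layer stands in for the second layer of the MLP, under the assumption that it is a 10-unit Dense layer fed by the 100 units of the first layer.

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(units=10)
layer.build((None, 100))  # 100 input features coming from the first layer

kernel_shape = tuple(layer.kernel.shape)  # (100, 10): one weight w_{ik} per (input, unit) pair
bias_shape = tuple(layer.bias.shape)      # (10,): one bias b_k per unit
num_params = layer.count_params()         # 100 * 10 + 10 = 1010
print(kernel_shape, bias_shape, num_params)
```

Column k of the kernel holds the 100 weights w_{0k}, ..., w_{99k} of unit Q_k, which is why the neuron view and the matrix view coincide.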

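Historically, the idea of modeling neurons came first, and artificial neural networks built from artificial neurons and layers came afterward.
Because this handbook focuses on how to use TensorFlow, we have reversed that order of presentation.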

Posted by abaelhe