【2022-06-11】Cat and Dog Classification Based on TensorFlow
1. Project Introduction
1.1 Background
Goal: build a high-accuracy classifier for cat and dog images. Materials: the image dataset datasets contains a training set (train) and a test set (test). The train folder holds 10,000 images, 5,000 cats and 5,000 dogs; the test folder holds 1,000 images, 500 of each. All images are named in the form "class.index.extension" (i.e., the first three characters of each filename are the class label), as shown in Figure 1-1.
1.2 Experimental Environment
1.3 File Structure
----CatAndDog\
|----utils\
| |----dataio.py  # read image paths, store them as CSV
| |----loaddata.py  # load the images
| |----plolfunction.py  # plotting functions
| |----createModels.py  # build the models
| |----viewImage.py  # test script
|----imgs\  # files produced by preprocessing and augmentation
|----logs\  # TensorBoard logs
|----models\  # trained models
|----main.py  # main program
|----loadmodel.py  # load a model and predict
1.4 Implementation Approach
Preprocess the images (normalization, standardization, smoothing) and apply data augmentation; build models with resNet34, vgg16, and an improved vgg16; finally, evaluate the models with a confusion matrix.
2. Data Stream Conversion
Convert the images into a CSV-format data stream (dataio.py):
import os
import pandas as pd

def dataio(open_path, save_path):
    images = []
    labels = []
    for root, _, filenames in os.walk(open_path):
        # strip the leading '.' so '../datasets/...' becomes './datasets/...',
        # a path relative to the project root where main.py runs
        root = root[1:]
        for filename in filenames:
            image_path = os.path.join(root, filename)
            images.append(image_path)
            # the first three characters of the filename are the class label
            labels.append(filename[0:3])
    dataframe = pd.DataFrame({'image': images, 'label': labels})
    dataframe.to_csv(save_path, index=False)

dataio('../datasets/test', '../datasets/testinfo.csv')
dataio('../datasets/train', '../datasets/traininfo.csv')
The resulting CSV format is shown in Figure 2-1.
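As a sanity check on the naming convention, here is a minimal sketch (with hypothetical filenames) showing how slicing the first three characters of a filename yields the class label stored in the CSV:

```python
import pandas as pd

# hypothetical filenames following the "class.index.extension" convention
filenames = ["cat.0.jpg", "cat.123.jpg", "dog.0.jpg", "dog.4999.jpg"]
labels = [name[0:3] for name in filenames]  # first three characters are the class

df = pd.DataFrame({"image": filenames, "label": labels})
print(df["label"].tolist())  # → ['cat', 'cat', 'dog', 'dog']
```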
3. Data Preprocessing
3.1 Original Image
Take an arbitrary image from the dataset for preprocessing tests. The original image and its corresponding matrix are shown in Figures 3-1 and 3-2:
import cv2
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing.image import array_to_img

im = cv2.imread(r'../imgs/cat.4.jpg')
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB
im = np.expand_dims(im, 0)  # add a batch dimension: (1, h, w, 3)
datagen = ImageDataGenerator()
View the image:
times = 1
i = 0
for batch in datagen.flow(im, batch_size=1):
    plt.imshow(array_to_img(np.squeeze(batch)))
    i += 1
    if i == times:
        plt.show()
        break
3.2 Normalization
Because normalization only rescales pixel values and does not change the image's appearance, the normalized image is not shown here; the matrix changes as follows:
datagen = ImageDataGenerator(rescale=1./255.)
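A quick numeric sketch of what rescale=1./255. does to the pixel matrix (pure NumPy, hypothetical pixel values):

```python
import numpy as np

# a hypothetical 2x2 grayscale patch with 8-bit pixel values
patch = np.array([[0, 51], [102, 255]], dtype=np.uint8)
normalized = patch.astype(np.float32) / 255.0  # the same scaling the generator applies

print(normalized)
# values now lie in [0, 1]; relative intensities (the appearance) are unchanged
```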
3.3 Standardization
Define a z-score standardization and pass it as the image generator's custom preprocessing function.
def zscore(image):
    image_zs = (image - np.mean(image)) / np.std(image)
    return image_zs

datagen = ImageDataGenerator(preprocessing_function=zscore)
Or equivalently:
datagen = ImageDataGenerator(preprocessing_function=tf.image.per_image_standardization)
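A small check that z-score standardization does what it promises: the output has mean close to 0 and standard deviation close to 1 (pure NumPy, random data standing in for an image):

```python
import numpy as np

def zscore(image):
    return (image - np.mean(image)) / np.std(image)

# a hypothetical random "image" in place of a real photo
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float32)
image_zs = zscore(image)

print(float(image_zs.mean()), float(image_zs.std()))  # ≈ 0.0 and ≈ 1.0
```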
3.4 Data Augmentation
Apply geometric transforms using the generator's built-in options: random channel shift range 10, random rotation up to 10 degrees, zoom range 0.2, and random vertical shift fraction 0.2.
datagen = ImageDataGenerator(
    channel_shift_range=10,   # random channel shift range
    rotation_range=10,        # maximum random rotation angle
    zoom_range=0.2,           # maximum random zoom
    height_shift_range=0.2,   # random vertical shift fraction
)
3.5 Mean Filtering
Smooth the image with a 4×4 mean filter; the effect is shown in Figure 3-5.
def MeanFilter(img, K_size=4):
    h, w, c = img.shape
    pad = K_size // 2
    # zero-pad the image so the kernel can cover border pixels
    # (np.float32 instead of the deprecated np.float alias)
    out = np.zeros((h + 2 * pad, w + 2 * pad, c), dtype=np.float32)
    out[pad:pad + h, pad:pad + w] = img.copy().astype(np.float32)
    tmp = out.copy()
    for y in range(h):
        for x in range(w):
            for ci in range(c):
                out[pad + y, pad + x, ci] = np.mean(tmp[y:y + K_size, x:x + K_size, ci])
    out = out[pad:pad + h, pad:pad + w].astype(np.uint8)
    return out

datagen = ImageDataGenerator(preprocessing_function=MeanFilter)
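One consequence of the zero padding in this filter is that border pixels are averaged with zeros and come out darker than the interior. A minimal sketch on a constant image (repeating the filter so the block is self-contained):

```python
import numpy as np

def MeanFilter(img, K_size=4):
    h, w, c = img.shape
    pad = K_size // 2
    out = np.zeros((h + 2 * pad, w + 2 * pad, c), dtype=np.float32)  # zero padding
    out[pad:pad + h, pad:pad + w] = img.copy().astype(np.float32)
    tmp = out.copy()
    for y in range(h):
        for x in range(w):
            for ci in range(c):
                out[pad + y, pad + x, ci] = np.mean(tmp[y:y + K_size, x:x + K_size, ci])
    return out[pad:pad + h, pad:pad + w].astype(np.uint8)

img = np.full((8, 8, 3), 200, dtype=np.uint8)  # a constant mid-gray image
smoothed = MeanFilter(img)
# interior pixels keep their value; the corner window sees 12 zeros out of 16 cells
print(int(smoothed[4, 4, 0]), int(smoothed[0, 0, 0]))  # → 200 50
```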
3.6 Reading the Data Stream
With the preprocessing methods defined, define the data-loading method (loaddata.py), which reads the paths and labels from the DataFrame and yields 64×64 RGB images. The default batch size is 32 samples, so each batch has shape (32, 64, 64, 3) for images and (32,) for labels.
def load_data(read_path, class_code, train=False):
    info = pd.read_csv(read_path)
    if train:
        # shuffle the rows; drop=True avoids adding the old index as a new column
        info = info.sample(frac=1).reset_index(drop=True)
        gen = ImageDataGenerator(
            # channel_shift_range=10,
            # rescale=1./255.,       # rescaling
            # rotation_range=10,     # maximum random rotation angle
            # zoom_range=0.2,        # maximum random zoom
            # height_shift_range=0.2,
            preprocessing_function=zscore
        )
        data_gen = gen.flow_from_dataframe(dataframe=info, x_col='image', y_col='label',
                                           target_size=(64, 64), shuffle=True, class_mode=class_code)
    else:
        gen = ImageDataGenerator(rescale=1./255.)
        data_gen = gen.flow_from_dataframe(dataframe=info, x_col='image', y_col='label',
                                           target_size=(64, 64), shuffle=False, class_mode=class_code)
    return data_gen
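A small pandas-only sketch (hypothetical rows) of the shuffle step: sample(frac=1) reorders all rows, and calling reset_index with drop=True renumbers them without leaving a stray 'index' column in the DataFrame:

```python
import pandas as pd

info = pd.DataFrame({"image": ["cat.0.jpg", "dog.0.jpg", "cat.1.jpg", "dog.1.jpg"],
                     "label": ["cat", "dog", "cat", "dog"]})

# random_state fixed here only to make the sketch reproducible
shuffled = info.sample(frac=1, random_state=0).reset_index(drop=True)

print(list(shuffled.columns))  # → ['image', 'label'] (no stray 'index' column)
print(list(shuffled.index))    # → [0, 1, 2, 3]
```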
4. Model Construction
4.1 resnet34
For the structure diagram, see: resnet34 structure diagram (link in the references).
4.2 vgg16
4.3 vgg16_imp
This model is based on VGG16 with the following changes: the max-pooling layers are removed (downsampling is done with stride-2 convolutions instead), the second convolution of each layer block is replaced with a depthwise convolution (DepthwiseConv2D), a BatchNormalization layer is added after each convolution to speed up training convergence, and the three fully connected layers are kept. The structure is shown in the figure:
from tensorflow.keras.layers import (Input, Conv2D, DepthwiseConv2D, BatchNormalization,
                                     Dropout, Flatten, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

def imp_vgg16(input_shape, num_classes, activiation):
    inputs = Input(shape=input_shape)
    # layer1
    x = Conv2D(filters=16, kernel_size=(3, 3), strides=(1, 1), padding='same',
               activation='relu', kernel_regularizer=l2(0.0005))(inputs)
    x = BatchNormalization()(x)
    x = DepthwiseConv2D(kernel_size=(3, 3), strides=(1, 1), padding='same',
                        activation='relu', depth_multiplier=3)(x)
    x = Dropout(0.1)(x)
    # layer2 (stride 2 downsamples in place of max pooling)
    x = Conv2D(filters=32, kernel_size=(3, 3), strides=(2, 2),
               activation='relu', padding='same', kernel_regularizer=l2(0.005))(x)
    x = BatchNormalization()(x)
    x = DepthwiseConv2D(kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu')(x)
    x = Dropout(0.1)(x)
    # layer3
    x = Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1),
               activation='relu', padding='same', kernel_regularizer=l2(0.005))(x)
    x = BatchNormalization()(x)
    x = DepthwiseConv2D(kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu')(x)
    x = Dropout(0.4)(x)
    # layer4
    x = Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1),
               activation='relu', padding='same', kernel_regularizer=l2(0.005))(x)
    x = BatchNormalization()(x)
    x = DepthwiseConv2D(kernel_size=(1, 1), strides=(1, 1), padding='same', activation='relu')(x)
    # layer5
    x = Conv2D(filters=128, kernel_size=(3, 3), strides=(2, 2),
               activation='relu', padding='same', kernel_regularizer=l2(0.005))(x)
    x = BatchNormalization()(x)
    x = DepthwiseConv2D(kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu')(x)
    # layer6
    x = Conv2D(filters=256, kernel_size=(1, 1), strides=(2, 2),
               activation='relu', padding='same', kernel_regularizer=l2(0.005))(x)
    x = BatchNormalization()(x)
    x = DepthwiseConv2D(kernel_size=(1, 1), strides=(1, 1), padding='same', activation='relu')(x)
    x = Dropout(0.4)(x)
    # classifier head: three fully connected layers
    x = Flatten()(x)
    x = Dropout(0.3)(x)
    x = Dense(2048, activation='relu')(x)
    x = Dropout(0.3)(x)
    x = Dense(512, activation='relu')(x)
    outputs = Dense(num_classes, activation=activiation)(x)
    return Model(inputs, outputs)
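To see why swapping the second convolution of each block for a depthwise one shrinks the model, here is a back-of-the-envelope parameter count for a 3×3 convolution over 64 channels (numbers chosen to match layer3/layer4 above; biases excluded):

```python
# standard 3x3 conv, 64 in -> 64 out channels: every output channel filters all inputs
standard = 3 * 3 * 64 * 64
# depthwise 3x3 conv with depth_multiplier=1: one 3x3 filter per input channel
depthwise = 3 * 3 * 64

print(standard, depthwise, standard // depthwise)  # → 36864 576 64
```

The reduction factor equals the number of input channels, which is why depthwise layers are so much cheaper than standard convolutions at the same spatial kernel size.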
5. Model Training
5.1 Importing Libraries
from tensorflow.keras.optimizers import *
from tensorflow.keras.callbacks import LearningRateScheduler, ReduceLROnPlateau,EarlyStopping
from tensorflow.keras.losses import *
from utils.createModels import *
from utils.loaddata import load_data
from utils.plolfunction import *
import pandas as pd
import numpy as np
import os
# must be set before the TensorFlow session initializes the GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
from tensorflow.keras.callbacks import TensorBoard
from sklearn.metrics import *
# allow GPU memory growth (allocate on demand)
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
5.2 Parameter Settings
For binary classification there are two consistent combinations of the four training parameters (activation function, loss function, label encoding, output dimension): (1) sigmoid + binary_crossentropy + binary + 1, or (2) softmax + categorical_crossentropy + categorical + 2. This project uses the former.
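The two combinations carry equivalent information in different shapes. A NumPy sketch (no TensorFlow needed, hypothetical logit) of the relationship between a 1-unit sigmoid output and a 2-unit softmax output:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = 0.7  # a hypothetical logit for the "dog" class
p_dog_sigmoid = sigmoid(z)        # combination 1: one output, P(dog)
p = softmax(np.array([0.0, z]))   # combination 2: two outputs, [P(cat), P(dog)]

# softmax over [0, z] equals sigmoid(z) for the positive class
print(bool(np.isclose(p[1], p_dog_sigmoid)))  # → True
```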
class_code = 'binary'        # label encoding
activiation = 'sigmoid'      # activation function
num_classes = 1              # output dimension
input_shape = (64, 64, 3)    # image shape
loss = binary_crossentropy   # loss function
epochs = 200                 # number of training epochs
label_dict = {0: 'cat', 1: 'dog'}

# read the data
train_gen = load_data(r'./datasets/traininfo.csv', class_code, train=True)
test_gen = load_data(r'./datasets/testinfo.csv', class_code)

# number of steps = dataset size // batch size
step_train = train_gen.n // train_gen.batch_size  # training steps per epoch
step_test = test_gen.n // test_gen.batch_size     # evaluation steps

# Learning-rate schedule: start at 0.001 (also used to initialize the Adam
# optimizer), wrap it in LearningRateScheduler, and pass it to the callbacks.
def lr_schedule(epoch):
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    print('Learning rate: ', lr)
    return lr

lr_scheduler = LearningRateScheduler(lr_schedule)

# reduce the learning rate when the monitored metric plateaus
lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               min_lr=0.5e-6)

# early stopping
earlystop = EarlyStopping(monitor='acc', patience=5, min_delta=1e-3, verbose=1)
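The schedule above is piecewise constant. A quick check of the effective learning rate just before and after each boundary epoch (copying the schedule without the print):

```python
def lr_schedule(epoch):  # copy of the schedule defined above
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    return lr

for epoch in (0, 80, 81, 121, 161, 181):
    print(epoch, lr_schedule(epoch))
# 1e-3 through epoch 80, then 1e-4, 1e-5 after 120, 1e-6 after 160, 5e-7 after 180
```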
5.3 Training the Model
def train(model_name):
    if model_name == 'vgg16':
        model = vgg16(input_shape, num_classes, activiation)
    elif model_name == 'imp_vgg16':
        model = imp_vgg16(input_shape, num_classes, activiation)
    elif model_name == 'resNet34':
        model = resNet34(input_shape, num_classes, activiation)
    else:
        return
    model.summary()
    model.compile(optimizer=Adam(learning_rate=lr_schedule(0)), loss=loss, metrics=['acc'])
    # model.fit accepts generators directly (fit_generator is deprecated in TF 2.x)
    model.fit(train_gen, epochs=epochs, workers=8, steps_per_epoch=step_train, shuffle=False,
              callbacks=[TensorBoard(log_dir=f'./logs/{model_name}_logs'), lr_scheduler])
    test_loss, test_acc = model.evaluate(test_gen, steps=step_test)
    print("loss:", test_loss, "acc:", test_acc)
    test_gen.reset()
    preds = model.predict(test_gen)
    if num_classes == 1:
        preds = np.where(preds > 0.5, 1, 0).reshape(-1).tolist()
    else:
        preds = np.argmax(preds, axis=1).astype('int').tolist()
    print(classification_report(test_gen.classes, preds))
    conf = confusion_matrix(y_true=test_gen.classes, y_pred=preds)  # confusion matrix
    # note: roc_curve here receives the thresholded 0/1 predictions, not probabilities
    fpr, tpr, threshold = roc_curve(test_gen.classes, preds)  # false/true positive rates
    roc_auc = auc(fpr, tpr)  # AUC score
    print(f"AUC: {roc_auc}\nFPR: {fpr[1]}\nTPR: {tpr[1]}")
    print("Confusion matrix:\n", conf)
    # save the model locally if it performs well enough
    if test_acc >= 0.88 and test_loss <= 0.4:
        model.save(f'./models/{model_name}.h5')
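A minimal NumPy sketch (hypothetical labels and probabilities) of the thresholding and confusion-matrix step used above, without needing a trained model:

```python
import numpy as np

y_true = np.array([0, 0, 0, 1, 1, 1])              # 0 = cat, 1 = dog
probs = np.array([0.1, 0.3, 0.8, 0.6, 0.9, 0.4])   # hypothetical sigmoid outputs

preds = np.where(probs > 0.5, 1, 0)                # same thresholding as in train()

# 2x2 confusion matrix: rows = true class, columns = predicted class
conf = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, preds):
    conf[t, p] += 1

print(conf)
# [[2 1]   two cats correct, one cat misread as a dog
#  [1 2]]  two dogs correct, one dog misread as a cat
```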
5.4 Training Results
5.4.1 resnet34
5.4.2 vgg16
5.4.3 vgg16_imp
6. Model Evaluation
7. Model Prediction
8. Summary
- Hyperparameter tuning relied on manual search; the tuning method should be improved, e.g., by using random search.
- Training sometimes failed to converge, with loss stuck at 0.69 and acc at 0.5. The VGG16 model, which consists only of convolution, pooling, and fully connected layers, never converged; adding BatchNormalization layers between the convolution layers solved the convergence problem.
- The initial learning rate affects convergence. With an initial learning rate of 1e-2 the same non-convergence appeared, so the rate had to be lowered; training worked best at 1e-3.
References
Image augmentation: https://blog.csdn.net
resnet34 structure: https://img-blog.csdnimg.cn
Overview of different convolution layers: https://zhuanlan.zhihu.com/p/117260363?utm_source=wechat_session
Data augmentation reference:
Hyperparameter tuning reference: https://www.zhihu.com/question/452410923/answer/2157005791