
[2022-06-11] Cat and Dog Recognition Based on TensorFlow

1. Project Introduction

1.1 Background

Goal: achieve high-accuracy automatic classification of cat and dog images. Material: the image dataset `datasets` contains a training set (train) and a test set (test). The train folder holds 10,000 images, 5,000 cats and 5,000 dogs; the test folder holds 1,000 images, 500 cats and 500 dogs. All images are named in the form "class.index.extension" (i.e. the first three characters of the file name are the image's class label), as shown in Figure 1-1.

Figure 1-1 Dataset directory preview

1.2 Experimental Environment

Table 1-1 Experimental environment configuration

1.3 File Structure

----CatAndDog\
    |----utils\
    |    |----dataio.py       # read images, save paths and labels as CSV
    |    |----loaddata.py     # load images
    |    |----plolfunction.py # plot charts
    |    |----createModels.py # build models
    |    |----viewImage.py    # test script
    |----imgs\                # output of image preprocessing and augmentation
    |----logs\                # TensorBoard logs
    |----models\              # trained models
    |----main.py              # main program
    |----loadmodel.py         # load a model and predict

1.4 Approach

Preprocess the images with normalization, standardization, and smoothing, apply data augmentation, build models with resNet34, VGG16, and an improved VGG16, and finally evaluate the models with a confusion matrix.

2. Data Stream Conversion

Convert the images into a CSV-format data stream (`dataio.py`):

import os
import pandas as pd

def dataio(open_path, save_path):
    # walk the image folder and record each image path together with its label
    images = []
    labels = []
    for root, _, filenames in os.walk(open_path):
        # strip the leading '.' so the stored paths are relative to the project root
        root = root[1:len(root)]
        for filename in filenames:
            image_path = os.path.join(root, filename)
            images.append(image_path)
            s = filename[0:3]   # the first 3 characters are the class label ('cat' / 'dog')
            labels.append(s)
    dataframe = pd.DataFrame({'image': images, 'label': labels})
    dataframe.to_csv(save_path, index=False)

dataio('../datasets/test','../datasets/testinfo.csv')
dataio('../datasets/train','../datasets/traininfo.csv')

The storage format is shown in Figure 2-1.

Figure 2-1 Data stream storage preview

3. Data Preprocessing

3.1 Original Image

Take an arbitrary image from the dataset for a preprocessing test. The original image and its corresponding matrix are shown in Figures 3-1 and 3-2:

Figure 3-1 Original image
Figure 3-2 Original image matrix
Code to read the image (`viewImage.py`):
import cv2
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing.image import array_to_img

im = cv2.imread(r'../imgs/cat.4.jpg')
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB
im = np.expand_dims(im, 0)                # add a batch dimension for the generator
datagen = ImageDataGenerator()
View the image:
times=1
i = 0
for batch in datagen.flow(im, batch_size=1):
    plt.imshow(array_to_img(np.squeeze(batch)))
    i += 1
    if i==times:
        plt.show()
        break

3.2 Normalization

Since normalization does not change how the image looks, no effect image is shown for this step; after normalization the image matrix changes as follows:

datagen = ImageDataGenerator(rescale=1./255.)
Figure 3-3 Normalized matrix

3.3 Standardization

Define a z-score standardization function and pass it to the image data generator as the custom preprocessing function.

def zscore(image):
    image_zs = (image - np.mean(image)) / np.std(image)
    return image_zs
datagen = ImageDataGenerator(preprocessing_function=zscore)
or, using TensorFlow's built-in per-image standardization (requires `import tensorflow as tf`):
datagen = ImageDataGenerator(preprocessing_function=tf.image.per_image_standardization)
Figure 3-4 Standardized matrix
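As a quick sanity check, the standardized matrix should have a mean close to 0 and a standard deviation close to 1. A minimal sketch, reusing the sample image `im` loaded in `viewImage.py` above:

# Sketch: verify the z-score output on the sample image from section 3.1.
im_zs = zscore(im.astype(np.float32))
print(np.mean(im_zs), np.std(im_zs))   # expected: roughly 0.0 and 1.0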

3.4 Data Augmentation

Geometric transformations use the generator's built-in parameters: the random channel-shift range is set to 10, the random rotation range to 10 degrees, the zoom range to 0.2, and the random vertical-shift range to 0.2.

datagen = ImageDataGenerator(channel_shift_range=10,  # random channel shift
                             rotation_range=10,       # maximum random rotation angle
                             zoom_range=0.2,          # maximum random zoom
                             height_shift_range=0.2)  # random vertical shift
Figure 3-5 Geometric transformation results
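To inspect (and keep) a few augmented samples, the generated images can be written to the imgs\ folder mentioned in section 1.3. A minimal sketch, assuming the `im` array and `datagen` defined above (the prefix 'aug' is an arbitrary choice):

# Sketch: save a handful of augmented samples to ../imgs for visual inspection.
i = 0
for _ in datagen.flow(im, batch_size=1,
                      save_to_dir='../imgs', save_prefix='aug', save_format='jpg'):
    i += 1
    if i == 5:   # stop after 5 augmented images
        break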

3.5 Mean Filtering

A 4×4 mean filter is applied to smooth the image; the result is shown in Figure 3-6.

def MeanFilter(img, K_size=4):
    # smooth the image with a K_size x K_size mean filter (zero padding at the borders)
    h, w, c = img.shape
    pad = K_size // 2
    out = np.zeros((h + 2 * pad, w + 2 * pad, c), dtype=np.float32)
    out[pad:pad + h, pad:pad + w] = img.copy().astype(np.float32)
    tmp = out.copy()
    for y in range(h):
        for x in range(w):
            for ci in range(c):
                out[pad + y, pad + x, ci] = np.mean(tmp[y:y + K_size, x:x + K_size, ci])
    out = out[pad:pad + h, pad:pad + w].astype(np.uint8)
    return out
datagen = ImageDataGenerator(preprocessing_function=MeanFilter)
Figure 3-6 Mean filter result
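The triple Python loop above is easy to follow but slow on larger images. OpenCV's built-in box filter performs the same neighborhood averaging much faster; a sketch of an equivalent preprocessing function (the helper name mean_filter_cv is arbitrary):

# Sketch: equivalent mean filtering using OpenCV's box filter.
def mean_filter_cv(img, k=4):
    return cv2.blur(img, (k, k))   # average each pixel over a k x k neighborhood

datagen = ImageDataGenerator(preprocessing_function=mean_filter_cv)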

3.6 Reading the Data Stream

With preprocessing defined, the data-loading function (`loaddata.py`) reads the records from the dataframe and yields 64×64 RGB images in batches (32 samples by default), i.e. tensors of shape (32, 64, 64, 3) for images and (32,) for labels.

import pandas as pd
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def zscore(image):
    # z-score standardization, as in section 3.3
    return (image - np.mean(image)) / np.std(image)

def load_data(read_path, class_code, train=False):
    info = pd.read_csv(read_path)
    if train:
        info = info.sample(frac=1).reset_index()   # shuffle the rows
        gen = ImageDataGenerator(
            # channel_shift_range=10,
            # rescale=1./255.,        # rescaling
            # rotation_range=10,      # maximum random rotation angle
            # zoom_range=0.2,         # maximum random zoom
            # height_shift_range=0.2,
            preprocessing_function=zscore
        )
        data_gen = gen.flow_from_dataframe(dataframe=info, x_col='image', y_col='label',
                                           target_size=(64, 64), shuffle=True, class_mode=class_code)
    else:
        gen = ImageDataGenerator(rescale=1./255.)
        data_gen = gen.flow_from_dataframe(dataframe=info, x_col='image', y_col='label',
                                           target_size=(64, 64), shuffle=False, class_mode=class_code)
    return data_gen
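To confirm the batch shapes quoted above, one batch can be pulled from the generator. A minimal sketch, assuming the CSV files from section 2 exist and the script is run from the project root:

# Sketch: pull one batch and check its shape.
train_gen = load_data('./datasets/traininfo.csv', 'binary', train=True)
images, labels = next(train_gen)
print(images.shape, labels.shape)   # expected: (32, 64, 64, 3) (32,)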

4. Model Construction

4.1 resnet34

For the structure diagram, see the resnet34 structure link in the references.
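The full resNet34 builder lives in createModels.py and is not reproduced here. As a rough illustration of the residual idea, a basic block in the same Keras functional style might look like the sketch below (filter counts and naming are illustrative, not the project's exact configuration):

from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, Add

def basic_block(x, filters, strides=1):
    # Sketch of a ResNet-34 style basic block: two 3x3 convolutions plus a shortcut.
    shortcut = x
    y = Conv2D(filters, (3, 3), strides=strides, padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, (3, 3), strides=1, padding='same')(y)
    y = BatchNormalization()(y)
    if strides != 1 or x.shape[-1] != filters:
        # 1x1 convolution so the shortcut matches the main branch when downsampling
        shortcut = Conv2D(filters, (1, 1), strides=strides, padding='same')(x)
    y = Add()([y, shortcut])
    return Activation('relu')(y)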

4.2 vgg16
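The project's vgg16 builder is also defined in createModels.py and is not shown in the original write-up. As a hedged sketch only, a standard VGG16-style stack of 3×3 convolutions, max pooling, and three fully connected layers (consistent with the description in section 8: convolutional, pooling, and fully connected layers only) could be built like this:

from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model

def vgg16_sketch(input_shape, num_classes, activiation):
    # Sketch of a VGG16-style network: 5 conv blocks followed by 3 dense layers.
    inputs = Input(shape=input_shape)
    x = inputs
    # (number of conv layers, filters) per block, as in the original VGG16 layout
    for n_convs, filters in [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]:
        for _ in range(n_convs):
            x = Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
        x = MaxPooling2D((2, 2))(x)
    x = Flatten()(x)
    x = Dense(4096, activation='relu')(x)
    x = Dense(4096, activation='relu')(x)
    outputs = Dense(num_classes, activation=activiation)(x)
    return Model(inputs, outputs)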

4.3 vgg16_imp

This model is based on VGG16: the max-pooling layers are removed, the second convolutional layer of each block is replaced with a depthwise separable convolution, a BN layer is added after each convolution to speed up training convergence, and the three fully connected layers are kept. The structure is shown below:

from tensorflow.keras.layers import (Input, Conv2D, DepthwiseConv2D, BatchNormalization,
                                     Dropout, Flatten, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

def imp_vgg16(input_shape, num_classes, activiation):
    inputs = Input(shape=input_shape)
    # layer1
    x = Conv2D(filters=16, kernel_size=(3, 3), strides=(1, 1), padding='same',
               activation='relu'
               ,kernel_regularizer=l2(0.0005)
               )(inputs)
    x = BatchNormalization()(x)
    x = DepthwiseConv2D(kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu',
                        depth_multiplier=3)(x)
    x = Dropout(0.1)(x)
    # layer2
    x = Conv2D(filters=32, kernel_size=(3, 3), strides=(2, 2),
               activation='relu', padding='same'
               ,kernel_regularizer=l2(0.005)
               )(x)
    x = BatchNormalization()(x)
    x = DepthwiseConv2D(kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu')(x)
    x = Dropout(0.1)(x)

    # layer3
    x = Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1),
               activation='relu', padding='same'
               ,kernel_regularizer=l2(0.005)
               )(x)
    x = BatchNormalization()(x)
    x = DepthwiseConv2D(kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu')(x)
    x = Dropout(0.4)(x)
    # layer4
    x = Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1),
               activation='relu', padding='same'
               ,kernel_regularizer=l2(0.005)
              )(x)
    x = BatchNormalization()(x)
    x = DepthwiseConv2D(kernel_size=(1, 1), strides=(1, 1), padding='same', activation='relu')(x)
    # layer5
    x = Conv2D(filters=128, kernel_size=(3, 3), strides=(2, 2),
               activation='relu', padding='same'
               ,kernel_regularizer=l2(0.005)
               )(x)
    x = BatchNormalization()(x)
    x = DepthwiseConv2D(kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu')(x)
    # layer6
    x = Conv2D(filters=256, kernel_size=(1, 1), strides=(2, 2),
               activation='relu', padding='same'
               ,kernel_regularizer=l2(0.005)
               )(x)
    x = BatchNormalization()(x)
    x = DepthwiseConv2D(kernel_size=(1, 1), strides=(1, 1), padding='same', activation='relu')(x)
    x = Dropout(0.4)(x)

    x = Flatten()(x)
    x = Dropout(0.3)(x)
    x = Dense(2048, activation='relu')(x)
    x = Dropout(0.3)(x)
    x = Dense(512, activation='relu')(x)

    outputs = Dense(num_classes, activation=activiation)(x)
    return Model(inputs, outputs)

5. Model Training

5.1 Import Libraries

from tensorflow.keras.optimizers import *
from tensorflow.keras.callbacks import LearningRateScheduler, ReduceLROnPlateau,EarlyStopping
from tensorflow.keras.losses import *
from utils.createModels import *
from utils.loaddata import load_data
from utils.plolfunction import *
import pandas as pd
import numpy as np
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
from tensorflow.keras.callbacks import TensorBoard
from sklearn.metrics import *

# Allow GPU memory to be allocated on demand
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

5.2 Parameter Settings

For binary classification there are two valid combinations of the four training settings (activation function, loss function, label encoding, output dimension): (1) sigmoid + binary_crossentropy + binary + 1, and (2) softmax + categorical_crossentropy + categorical + 2. This project uses the first combination.

class_code = 'binary'       # label encoding
activiation = 'sigmoid'     # activation function
num_classes = 1             # output dimension
input_shape = (64, 64, 3)   # image shape
loss = binary_crossentropy  # loss function
epochs = 200                # number of training epochs
label_dict = {0: 'cat', 1: 'dog'}
# load the data
train_gen = load_data(r'./datasets/traininfo.csv', class_code, train=True)
test_gen = load_data(r'./datasets/testinfo.csv', class_code)

# number of batches = dataset size / batch size
# number of training batches
step_train = train_gen.n // train_gen.batch_size
# number of evaluation batches
step_test = test_gen.n // test_gen.batch_size
# Learning-rate schedule: start from 0.001, pass the schedule to the Adam optimizer,
# and wrap it in LearningRateScheduler so it can be added to the callbacks.
def lr_schedule(epoch):
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    print('Learning rate: ', lr)
    return lr
lr_scheduler = LearningRateScheduler(lr_schedule)
# Reduce the learning rate when the monitored metric plateaus
lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               min_lr=0.5e-6)
# Early stopping
earlystop=EarlyStopping(monitor='acc',patience=5,min_delta=1e-3,verbose=1)

5.3 Model Training

def train(model_name):
    if model_name=="vgg16":
        model = vgg16(input_shape, num_classes, activiation)
    elif model_name=='imp_vgg16':
        model = imp_vgg16(input_shape, num_classes, activiation)
    elif model_name=='resNet34':
        model=resNet34(input_shape,num_classes,activiation)
    else:
        return 
    model.summary()
    model.compile(optimizer=Adam(lr=lr_schedule(0)),loss=loss,metrics=['acc'])
    model.fit_generator(train_gen, epochs=epochs, workers=8, steps_per_epoch=step_train,shuffle=False,callbacks=[TensorBoard(log_dir=f'./logs/{model_name}_logs'), lr_scheduler])
    test_loss, test_acc = model.evaluate_generator(test_gen, steps=step_test)
    print("loss:",test_loss, "acc",test_acc)
    test_gen.reset()
    preds = model.predict_generator(test_gen)

    if num_classes == 1:
        preds = np.where(preds > 0.5, 1, 0).reshape(-1).tolist()
    else:
        preds=np.argmax(preds,axis=1).astype('int').tolist()
    # acc = np.sum(preds == np.array(y_true)) / len(y_true)
    print(classification_report(test_gen.classes, preds))
    conf = confusion_matrix(y_true=test_gen.classes, y_pred=preds)  # confusion matrix
    fpr, tpr, threshold = roc_curve(test_gen.classes, preds)   # false positive rate and true positive rate
    roc_auc = auc(fpr, tpr)  # AUC score
    print(f"AUC: {roc_auc}\nFPR: {fpr[1]}\nTPR: {tpr[1]}")
    print("Confusion matrix:\n", conf)
    # save the model locally if it meets the accuracy/loss thresholds
    if test_acc >= 0.88 and test_loss <= 0.4:
        model.save(f'./models/{model_name}.h5')
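Training is started by calling train() with one of the three model names, and the logs written by the TensorBoard callback can be inspected afterwards. A minimal usage sketch:

# Sketch: train the improved VGG16 model and then inspect the curves in TensorBoard.
if __name__ == '__main__':
    train('imp_vgg16')
# From the project root, run:  tensorboard --logdir ./logs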

5.4 Training Results

5.4.1 resnet34

5.4.2 vgg16

5.4.3 vgg16_imp

6. Model Evaluation
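Section 1.4 names the confusion matrix as the evaluation tool; a hedged sketch of how the conf array computed in train() could be plotted with matplotlib (the helper name and styling are assumptions):

# Sketch: visualize a 2x2 confusion matrix with matplotlib.
import matplotlib.pyplot as plt

def plot_confusion_matrix(conf, labels=('cat', 'dog')):
    fig, ax = plt.subplots()
    ax.imshow(conf, cmap='Blues')
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel('Predicted')
    ax.set_ylabel('True')
    for i in range(conf.shape[0]):
        for j in range(conf.shape[1]):
            ax.text(j, i, conf[i, j], ha='center', va='center')
    plt.show()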

7. Model Prediction
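According to section 1.3, loadmodel.py loads a saved model and predicts on new images; its code is not included in the original post, so the following is only a sketch (the model file, image path, and the choice of z-score preprocessing are assumptions):

# Sketch of loadmodel.py: load a trained model and classify a single image.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

label_dict = {0: 'cat', 1: 'dog'}

model = load_model('./models/imp_vgg16.h5')         # a model saved by train()
img = cv2.imread('./datasets/test/cat.1.jpg')       # illustrative path
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (64, 64))
img = (img - np.mean(img)) / np.std(img)            # same z-score preprocessing as training
pred = model.predict(np.expand_dims(img, 0))[0][0]  # sigmoid output in [0, 1]
print(label_dict[int(pred > 0.5)])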

8. Summary

  1. Hyperparameters were tuned by manual search; the tuning method should be improved, for example by using random search.
  2. Training sometimes failed to converge (loss = 0.69, acc = 0.5). The plain VGG16 model, which contains only convolutional, pooling, and fully connected layers, never converged, so batch normalization layers were added between the convolutional layers, which solved the convergence problem.
  3. The initial learning rate affects convergence. With an initial learning rate of 1e-2 the same non-convergence appeared, so the learning rate had to be lowered; training worked best with an initial learning rate of 1e-3.

References

Image augmentation:
https://blog.csdn.net

resnet34 structure:
https://img-blog.csdnimg.cn

Overview of different convolution layers:
https://zhuanlan.zhihu.com/p/117260363?utm_source=wechat_session

Data augmentation reference:

Hyperparameter tuning reference:
https://www.zhihu.com/question/452410923/answer/2157005791
