Notes on How CTC Works

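CTC stands for Connectionist Temporal Classification. As the name suggests, it addresses classification problems on sequential (time-series) data, and it is a technique used in end-to-end speech recognition systems. It mainly solves the following two problems: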

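Aligning the speech input with its labels. A speech input is first converted into a spectrogram. A traditional acoustic model needs every frame of that spectrogram to be labeled with its corresponding phoneme, whereas training with CTC as the loss function requires only the input sequence and the output sequence.
Measuring the loss for inputs of varying length. CTC is a loss function that measures how far the network's output for an input sequence is from the true output. Take the utterance "nihao": different speakers pronounce it differently, so the frame-level sequence might be nnnnniiiihhhaaaooo or any number of other variants. CTC can still compute a loss between the network's output and the true result even though the inputs differ in length.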
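The way CTC relates a long frame-level sequence to a short label sequence can be illustrated with its collapse rule: merge consecutive repeats, then drop the blank symbol. The following is a minimal sketch for illustration only. The function name ctc_collapse and the use of "-" as the blank symbol are assumptions made here, not something taken from a library.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol, chosen only for this illustration


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level path to a label sequence.

    Step 1: merge runs of identical consecutive symbols.
    Step 2: remove every blank symbol.
    """
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(symbol for symbol in merged if symbol != blank)


# Different frame-level paths collapse to the same label sequence.
print(ctc_collapse("nnnnniiiihhhaaaooo"))  # nihao
print(ctc_collapse("nn-ii-hh-aa-oo"))      # nihao

# The blank is what makes a genuinely doubled letter representable.
print(ctc_collapse("hel-lo"))  # hello
print(ctc_collapse("hello"))   # helo
```

Because many paths collapse to the same labels, the CTC loss sums the probability of all of them instead of requiring one fixed frame-by-frame alignment.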

CTC implementation in Keras

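from keras import backend as K
from keras.models import Model
from keras.layers import (Input, Lambda)
from keras.optimizers import SGD
from keras.callbacks import ModelCheckpoint
import os

def ctc_lambda_func(args):
    # The layer receives (y_pred, labels, ...), but K.ctc_batch_cost expects
    # (y_true, y_pred, input_length, label_length), so the first two are swapped.
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

def add_ctc_loss(input_to_softmax):
    """Wrap an acoustic model so that the wrapped model outputs the CTC loss."""
    # Padded label sequences, shape (batch, max_label_length)
    the_labels = Input(name='the_labels', shape=(None,), dtype='float32')
    # True number of input frames and of labels per sample, shape (batch, 1)
    input_lengths = Input(name='input_length', shape=(1,), dtype='int64')
    label_lengths = Input(name='label_length', shape=(1,), dtype='int64')
    # output_length is not a built-in Keras attribute: the acoustic model is
    # expected to carry a callable mapping input lengths to softmax output lengths
    output_lengths = Lambda(input_to_softmax.output_length)(input_lengths)
    # CTC loss is implemented in a lambda layer
    loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
        [input_to_softmax.output, the_labels, output_lengths, label_lengths])
    # The model's output is the per-sample loss itself, shape (batch, 1)
    model = Model(
        inputs=[input_to_softmax.input, the_labels, input_lengths, label_lengths],
        outputs=loss_out)
    return model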
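To make the quantity behind a CTC loss more concrete, here is a small pure-Python sketch of the standard CTC forward (alpha) recursion for a single sample. It is not the Keras or TensorFlow implementation, only an illustration of the sum over all valid alignments. The function name ctc_loss, the list-of-lists input format, and passing the blank index explicitly are assumptions made for this example.

```python
import math


def ctc_loss(probs, labels, blank):
    """Negative log-likelihood of `labels` under CTC for one sample.

    probs  : list of T rows, each a list of per-class probabilities (softmax output)
    labels : list of integer class ids, without blanks
    blank  : integer class id reserved for the blank symbol
    """
    num_frames = len(probs)
    # Extended label sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    # alpha[t][s]: total probability of all paths that end in ext[s] at frame t
    alpha = [[0.0] * size for _ in range(num_frames)]
    alpha[0][0] = probs[0][ext[0]]
    if size > 1:
        alpha[0][1] = probs[0][ext[1]]

    for t in range(1, num_frames):
        for s in range(size):
            total = alpha[t - 1][s]          # stay on the same symbol
            if s >= 1:
                total += alpha[t - 1][s - 1]  # advance by one position
            # Skipping the blank in between is only allowed when the two
            # neighbouring labels differ; repeated labels need the blank.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[t - 1][s - 2]
            alpha[t][s] = total * probs[t][ext[s]]

    # A valid path ends on the last label or on the trailing blank.
    likelihood = alpha[-1][-1]
    if size > 1:
        likelihood += alpha[-1][-2]
    return -math.log(likelihood) if likelihood > 0 else math.inf


# Two frames, two classes (0 = "a", 1 = blank), uniform predictions.
# Three paths collapse to "a": "aa", "a-" and "-a", each with probability 0.25,
# so the loss is -log(0.75), roughly 0.288.
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(ctc_loss(uniform, [0], blank=1))
```

This version works in plain probabilities for readability. A practical implementation would work in log space to avoid underflow on long sequences.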