摘要: To be specific, given an audio clip, the two-dimensional time-frequency representation (e.g. Log-Mel) is first extracted. Convolutional layers are the 阅读全文
posted @ 2023-01-04 14:00 prettysky 阅读(16) 评论(0) 推荐(0) 编辑