SciTech-AV-Audio-Coding-Sampling-PCM (Pulse Code Modulation, uncompressed and lossless) + WAV format (PCM data preceded by a metadata header; developed by Microsoft and IBM)

SciTech-AV-Audio-DAP (Digital Audio Processing)-Loudness Normalization: Perceived Loudness + RMS (Root Mean Square)

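Digital Audio Processing: a processing roadmap

Normalize the format: use FFmpeg to convert audio files of different types and specifications into WAV at a sufficiently high sample rate and bit depth.
PCM preprocessing: extract the PCM data from the normalized WAV files and apply digital signal processing.
Statistical analysis and transforms: decompose the PCM data into time-domain, frequency-domain, and other representations, and process them:
- Time domain: the waveform, described by frequency, amplitude, and phase, together with the sample rate and bit depth.
- Frequency domain: the spectrum (the Fourier transform of the audio signal), which gives the frequency components.
- Mel spectrograms and other representations, obtained with librosa and FFmpeg.

Applications:

Intelligent audio transformation and generation: process audio with ML, AI, and statistics, for example original music composition, microphone arrays, and real-time audio transformation.
ChatGPT: encoding/decoding transforms built on AI and neural networks, such as the Transformer.
Text (NLP, corpora), images, audio, video, Word/PPT/Excel, and other data and file formats
are normalized to a common representation and then handled interactively by AI/ML/NN models.

References:

A comprehensive article on PCM: https://www.electronicshub.org/pulse-code-modulation-in-audio/
Glossary of digital audio terms: https://manual.audacityteam.org/man/glossary.html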

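PCM stands for Pulse Code Modulation.
PCM audio data is uncompressed. An analog signal is converted into standard digital audio data in three steps: sampling, quantization, and coding.
The conversion is illustrated in the figures below:

Complete PCM diagrams:
PCM Transmitter Section Block Diagram
PCM Encoder Block Diagram
PCM Transform Simple Diagram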
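
The quantization and coding steps can be sketched in C as follows. This is a minimal illustration rather than a full encoder: it assumes the signal has already been sampled into floating-point values normalized to [-1.0, 1.0], and the function name `quantize_block` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: quantize n normalized samples in [-1.0, 1.0]
 * to 16-bit signed PCM codes. */
void quantize_block(const double *in, int16_t *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        double x = in[i];
        /* Clamp out-of-range input so the scaled value fits in 16 bits. */
        if (x > 1.0)  x = 1.0;
        if (x < -1.0) x = -1.0;
        double scaled = x * 32767.0;
        /* Round to the nearest level (half away from zero), then code it. */
        out[i] = (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
    }
}
```

Scaling by 32767 keeps the mapping symmetric around zero, so the code -32768 is never produced; other conventions scale by 32768 instead.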

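Key parameters of audio sampling:

Sample rate
The sample rate is the number of digital snapshots taken of the audio signal per second. It determines the frequency range of the audio file.
The higher the sample rate, the closer the digital waveform is to the original analog waveform, but the more storage the audio needs.
A low sample rate reproduces the recording poorly.
- By the Nyquist sampling theorem, reproducing a given frequency requires a sample rate of at least twice that frequency.
  For example, a CD is sampled 44,100 times per second, so it can reproduce frequencies up to 22,050 Hz (beyond the 20,000 Hz limit of human hearing).
- Oversampling: sampling a signal at a frequency significantly higher than the Nyquist rate.
  Oversampling:
  - improves resolution,
  - reduces noise,
  - helps avoid aliasing and phase distortion by relaxing the performance requirements on the anti-aliasing filter.
- Interpolation: estimating new sample values between the existing samples.
- Sampling illustration:
  
  A is an audio signal sampled at a low rate, which distorts the original sound wave;
  B is an audio signal sampled at a high rate, which reproduces the original sound wave more faithfully.
- Common sample rates for digital audio include 8,000, 16,000, 22,050, 44,100, and 48,000 Hz.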
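
One consequence of the Nyquist limit is aliasing: a tone above half the sample rate folds back into the representable range. The sketch below computes where a pure tone would appear, assuming no anti-aliasing filter is applied before sampling; `alias_hz` is a hypothetical helper name.

```c
#include <assert.h>

/* Hypothetical helper: the frequency (Hz) at which a pure tone of
 * tone_hz appears after sampling at rate_hz without an anti-aliasing
 * filter. */
unsigned alias_hz(unsigned tone_hz, unsigned rate_hz) {
    assert(rate_hz > 0);
    unsigned r = tone_hz % rate_hz;              /* fold into [0, rate_hz) */
    return (r > rate_hz / 2) ? rate_hz - r : r;  /* mirror around Nyquist */
}
```

For instance, a 30,000 Hz tone sampled at 44,100 Hz would show up at 14,100 Hz, which is why the input is low-pass filtered before sampling.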
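Bit depth
Bit depth determines dynamic range: the higher the bit depth, the greater the dynamic range.
When a sound wave is sampled, each sample is assigned the amplitude value closest to the original waveform's amplitude.
A higher bit depth provides more possible amplitude values, giving a greater dynamic range, a lower noise floor, and higher fidelity.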
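
The link between bit depth and dynamic range can be made concrete with the usual rule of thumb for an ideal N-bit linear quantizer, which has 2^N amplitude levels. This is a theoretical upper bound, not a measurement of any particular device:

```latex
\mathrm{DR} = 20\log_{10}\!\left(2^{N}\right) \approx 6.02\,N\ \text{dB}
```

So 16-bit audio has a theoretical dynamic range of about 96 dB, and 24-bit audio about 144 dB.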

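II. PCM

PCM audio data layout:
- In a mono file, the samples are stored in time order.
  The LRLRLR layout can also be used, with the other channel's data all zeros.
- In a stereo file, the samples are usually interleaved as LRLRLR.
- Multi-byte PCM samples are usually stored little-endian (as they are in WAV).
- The data is laid out as shown in the figure below: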
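
As a sketch of the layout just described, the following C function interleaves two channels into LRLR order and writes each 16-bit sample low byte first. The name `interleave_s16le` is hypothetical, and the caller is assumed to supply an output buffer of `frames * 4` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interleave two channels as 16-bit little-endian
 * PCM bytes in L R L R ... order. out must hold frames * 4 bytes. */
void interleave_s16le(const int16_t *left, const int16_t *right,
                      size_t frames, uint8_t *out) {
    assert(left != NULL && right != NULL && out != NULL);
    for (size_t i = 0; i < frames; i++) {
        uint16_t l = (uint16_t)left[i];
        uint16_t r = (uint16_t)right[i];
        out[4 * i + 0] = (uint8_t)(l & 0xFF);  /* L low byte first */
        out[4 * i + 1] = (uint8_t)(l >> 8);    /* L high byte */
        out[4 * i + 2] = (uint8_t)(r & 0xFF);  /* R low byte */
        out[4 * i + 3] = (uint8_t)(r >> 8);    /* R high byte */
    }
}
```

Writing the bytes explicitly, instead of copying the `int16_t` values directly, keeps the output little-endian regardless of the host's byte order.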

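PCM audio data parameters
PCM audio data is typically described like this:

44100Hz 16bit stereo: 44,100 samples per second, 16 bits per sample, two channels (stereo)
22050Hz 8bit mono: 22,050 samples per second, 8 bits per sample, one channel
48000Hz 32bit 5.1ch: 48,000 samples per second, 32 bits (4-byte floating point) per sample, 5.1 channels
Notes:
44100Hz is the **sample rate**: the signal is sampled 44,100 times per second.
16bit is the **sample precision** (bit depth): each sample of the original analog signal is represented with 16 bits (2 bytes).
Stereo is the **channel count**: roughly, the number of microphones used when recording. More channels reproduce the sound more faithfully (microphone placement follows set conventions).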
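
These three parameters are enough to work out how much storage raw PCM needs. A minimal sketch, assuming the bit depth is a whole number of bytes; `pcm_data_bytes` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes of raw PCM needed for the given format
 * and duration. */
uint64_t pcm_data_bytes(uint32_t sample_rate, uint16_t bits_per_sample,
                        uint16_t channels, uint32_t seconds) {
    assert(bits_per_sample % 8 == 0);
    /* One frame holds one sample for every channel. */
    uint64_t bytes_per_frame = (uint64_t)channels * (bits_per_sample / 8);
    return (uint64_t)sample_rate * bytes_per_frame * seconds;
}
```

One minute of 44100Hz 16bit stereo comes to 10,584,000 bytes, roughly 10 MB.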

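III. WAV

WAV is an audio format developed by Microsoft and IBM for the PC. It conforms to the RIFF (Resource Interchange File Format) specification,
is used to store audio on Windows, and is widely supported by Windows and its applications.
Because a WAV file follows RIFF, its contents are stored in units called chunks.
A typical WAV file consists of three chunks: the RIFF chunk, the Format chunk, and the Data chunk.
It may also contain optional chunks such as the LIST chunk, the Fact chunk, and the Cue points chunk.
For the file header of the WAV subtype of RIFF, see Microsoft's documentation.
A common WAVE file is simply a RIFF file with a single "WAVE" chunk made up of two sub-chunks (the "fmt " chunk and the "data" chunk).
Its format is shown in the figure below: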
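- RIFF Chunk:
  - ID: always the string "RIFF" (hex: 0x52494646), stored in reading order (big-endian).
  - Size: the length of the whole file minus the 8 bytes of ID and Size. Note that it is little-endian.
    For example, if Size is 36 (hex: 0x24), adding the lengths of Size and ID gives a 44-byte file.
  - Type: always the string "WAVE", stored in reading order (big-endian). The Format and Data chunks follow it.
- FORMAT Chunk:
  - ID: always the string "fmt " with a trailing space (hex: 0x666D7420)
  - Size: the length of this chunk's data (excluding ID and Size); 16 for plain PCM
  - AudioFormat: the encoding of the audio in the Data chunk; 1 for PCM
  - NumChannels: the channel count of the audio in the Data chunk; commonly 1 (mono) or 2 (stereo)
  - SampleRate: the sample rate of the audio in the Data chunk
  - ByteRate: bytes per second = SampleRate * NumChannels * BitsPerSample / 8
  - BlockAlign: bytes per sample frame (one sample for every channel) = NumChannels * BitsPerSample / 8
  - BitsPerSample: the bit depth of each sample; 8: 8bit, 16: 16bit, 32: 32bit
- DATA Chunk:
  - ID: always the string "data"
  - Size: the length of the audio data in bytes, Size = ByteRate * seconds
  - Data: the audio data itself, starting immediately after the Size field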
- List Chunk:
  What is a "LIST" chunk in a RIFF/Wav header?
  I am writing a wav player and I am using this file format specification:
  http://soundfile.sapp.org/doc/WaveFormat/.
  As you can see, it expects:
  - an initial "Riff Chunk" (which tells us whether this is a wav file or some other RIFF file type)
  - followed by a "Format Chunk" (which tells us things like the bit rate and number of channels).
  - followed by a "Data Chunk" that has all of the audio data in it.
  I have an audio file which has a "LIST Chunk" between the Format Chunk and the Data Chunk, which you can see visually when I hex-edit the file:
  
  What is this LIST chunk, is it part of some standardized file format, and is there somewhere that has information on parsing it?
  Answer:
  Your example chunk contains a LIST Chunk of INFOrmation that includes the creating software (ISFT):
  
  LIST Chunks of type INFO are common in formats that extend RIFF. When a LIST Chunk carries the list type ID "INFO", the list contains information about the copyright, author, engineer of the file, and other similar text.
  ...
  ISFT Name of the software package used to create the file
  
  https://www.recordingblogs.com/wiki/list-chunk-of-a-wave-file
  There are other kinds of lists, but list of info is very common.
  The site above goes into great detail about what to expect, but in brief:
  - LIST Chunk is a well-defined container type in RIFF-based files such as WAVE and AVI.
  - Not every file has a LIST Chunk container.
  - Not every LIST Chunk will contain the same kind of information. For example,
    WAVE may include duration; AVI frame dimensions;
    AVI and WAVE may both have author.
  - DLS type files rely on LIST Chunk containers to annotate all the malleable properties of the file: samples, regions, articulations, etc. any of which may be binary data.
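
Because every chunk starts with a 4-byte ID and a 4-byte little-endian Size, a reader can step over chunks it does not understand, including LIST, instead of assuming that "data" sits at a fixed offset. The following is a minimal sketch that works on a file already loaded into memory; `find_chunk` and `rd_u32le` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian integer. */
static uint32_t rd_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: return the offset of the payload of the first
 * chunk named id inside a RIFF/WAVE buffer, or -1 if it is absent.
 * Other chunks, such as "LIST", are skipped using their Size field. */
long find_chunk(const uint8_t *buf, size_t len, const char *id,
                uint32_t *size_out) {
    assert(buf != NULL && id != NULL && size_out != NULL);
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;
    size_t pos = 12;                 /* first chunk follows "WAVE" */
    while (pos + 8 <= len) {
        uint32_t size = rd_u32le(buf + pos + 4);
        if (memcmp(buf + pos, id, 4) == 0) {
            *size_out = size;
            return (long)(pos + 8);
        }
        /* Chunks are word-aligned: an odd Size is followed by a pad byte. */
        pos += 8 + (size_t)size + (size & 1u);
    }
    return -1;
}
```

A player would call `find_chunk(buf, len, "data", &size)` and start reading samples at the returned offset, whether or not a LIST chunk precedes the data.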
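WAV file example (opened in a hex viewer)
This file contains only the WAV header, so it is just 44 bytes long:

WAV format definition

In most cases a WAV file is simply PCM data preceded by a metadata file header. The fields have the following meanings: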

#include <stdint.h>

typedef struct {
    char     ChunkID[4]; // "RIFF"
    uint32_t ChunkSize;  // number of bytes that follow (excludes the 8 bytes of ChunkID and ChunkSize)
    char     Format[4];  // "WAVE"
} WAVE_HEADER;

typedef struct {
    char     Subchunk1ID[4]; // "fmt " (note the trailing space)
    uint32_t Subchunk1Size;  // bytes in this subchunk (excludes the 8 bytes of Subchunk1ID and Subchunk1Size)
    uint16_t AudioFormat;    // encoding of the audio data; 1 for PCM
    uint16_t NumChannels;    // channel count: 1 for mono, 2 for stereo, and so on
    uint32_t SampleRate;     // sample rate in Hz, e.g. 8000, 44100, 48000, 96000
    uint32_t ByteRate;       // bytes per second = SampleRate * NumChannels * BitsPerSample / 8
    uint16_t BlockAlign;     // bytes per sample frame = NumChannels * BitsPerSample / 8
    uint16_t BitsPerSample;  // bits per sample, usually 8, 16, or 32
} WAVE_FMT;

typedef struct {
    char     Subchunk2ID[4]; // "data"
    uint32_t Subchunk2Size;  // bytes of audio data = NumSamples * NumChannels * BitsPerSample / 8
} WAVE_DATA;
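
Writing structs to disk as-is depends on the compiler's padding and on the host's byte order. One alternative, sketched here under the assumption of the canonical 44-byte PCM layout described above, is to fill the header byte by byte. The names `wav_header_fill`, `put_u16le`, and `put_u32le` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static void put_u16le(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

static void put_u32le(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: fill a canonical 44-byte PCM WAV header. */
void wav_header_fill(uint8_t out[44], uint16_t channels, uint32_t sample_rate,
                     uint16_t bits, uint32_t data_bytes) {
    assert(out != NULL && bits % 8 == 0);
    uint16_t block_align = (uint16_t)(channels * (bits / 8));
    memcpy(out, "RIFF", 4);
    put_u32le(out + 4, 36 + data_bytes);             /* ChunkSize */
    memcpy(out + 8, "WAVE", 4);
    memcpy(out + 12, "fmt ", 4);
    put_u32le(out + 16, 16);                         /* Subchunk1Size */
    put_u16le(out + 20, 1);                          /* AudioFormat: PCM */
    put_u16le(out + 22, channels);
    put_u32le(out + 24, sample_rate);
    put_u32le(out + 28, sample_rate * block_align);  /* ByteRate */
    put_u16le(out + 32, block_align);
    put_u16le(out + 34, bits);
    memcpy(out + 36, "data", 4);
    put_u32le(out + 40, data_bytes);                 /* Subchunk2Size */
}
```

The resulting 44 bytes can be written with a single `fwrite` before the PCM data.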

Parsing a WAV file header

Here are the first 72 bytes of a WAVE file, shown as hexadecimal bytes:

52 49 46 46 | 24 08 00 00 | 57 41 56 45
66 6d 74 20 | 10 00 00 00 | 01 00 02 00 
22 56 00 00 | 88 58 01 00 | 04 00 10 00
64 61 74 61 | 00 08 00 00 | 00 00 00 00 
24 17 1E F3 | 3C 13 3C 14 | 16 F9 18 F9
34 E7 23 A6 | 3C F2 24 F2 | 11 CE 1A 0D

The fields are decoded as shown in the figure below:
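
The same fields can also be decoded in code. The sketch below handles only the canonical layout of the dump above, with "fmt " at offset 12 and "data" at offset 36; the `WavInfo` struct and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {  /* hypothetical container for the decoded fields */
    uint16_t audio_format, channels, block_align, bits_per_sample;
    uint32_t sample_rate, byte_rate, data_bytes;
} WavInfo;

static uint16_t rd_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode a canonical 44-byte header. Returns 0 on success, -1 if the
 * buffer is too short or the expected IDs are not where they should be. */
int wav_header_parse(const uint8_t *h, size_t len, WavInfo *info) {
    assert(h != NULL && info != NULL);
    if (len < 44 || memcmp(h, "RIFF", 4) != 0 ||
        memcmp(h + 8, "WAVE", 4) != 0 ||
        memcmp(h + 12, "fmt ", 4) != 0 ||
        memcmp(h + 36, "data", 4) != 0)
        return -1;
    info->audio_format    = rd_u16le(h + 20);
    info->channels        = rd_u16le(h + 22);
    info->sample_rate     = rd_u32le(h + 24);
    info->byte_rate       = rd_u32le(h + 28);
    info->block_align     = rd_u16le(h + 32);
    info->bits_per_sample = rd_u16le(h + 34);
    info->data_bytes      = rd_u32le(h + 40);
    return 0;
}
```

Applied to the dump above, this gives PCM (format 1), 2 channels, 22,050 Hz, 88,200 bytes per second, a block alignment of 4, 16 bits per sample, and 2,048 bytes of audio data.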

IV. PCM & WAV in practice

1. Converting PCM to WAV (in C)

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Wrap raw 16-bit little-endian PCM in a 44-byte WAV header.
 * The structs are written as-is, so a little-endian host is assumed. */
int simplest_pcm16le_to_wave(const char *pcmpath,
    int channels, int sample_rate, const char *wavepath) {
    typedef struct _WAVE_HEADER {
        char     fccID[4];
        uint32_t dwSize;
        char     fccType[4];
    } WAVE_HEADER;

    typedef struct _WAVE_FMT {
        char     fccID[4];
        uint32_t dwSize;
        uint16_t wFormatTag;
        uint16_t wChannels;
        uint32_t dwSamplesPerSec;
        uint32_t dwAvgBytesPerSec;
        uint16_t wBlockAlign;
        uint16_t uiBitsPerSample;
    } WAVE_FMT;

    typedef struct _WAVE_DATA {
        char     fccID[4];
        uint32_t dwSize;
    } WAVE_DATA;

    // Fall back to CD-style defaults when no format is given
    if (channels == 0 || sample_rate == 0) {
        channels = 2;
        sample_rate = 44100;
    }

    const int bits = 16;
    WAVE_HEADER pcmHEADER;
    WAVE_FMT    pcmFMT;
    WAVE_DATA   pcmDATA;
    unsigned char buf[4096];
    size_t n;

    FILE *fp = fopen(pcmpath, "rb");
    if (fp == NULL) {
        printf("open pcm file error\n");
        return -1;
    }
    FILE *fpout = fopen(wavepath, "wb+");
    if (fpout == NULL) {
        printf("create wav file error\n");
        fclose(fp);
        return -1;
    }

    // WAVE_FMT
    memcpy(pcmFMT.fccID, "fmt ", 4);
    pcmFMT.dwSize = 16;
    pcmFMT.wFormatTag = 1;  // PCM
    pcmFMT.wChannels = (uint16_t)channels;
    pcmFMT.dwSamplesPerSec = (uint32_t)sample_rate;
    pcmFMT.uiBitsPerSample = (uint16_t)bits;
    pcmFMT.wBlockAlign = (uint16_t)(channels * bits / 8);
    pcmFMT.dwAvgBytesPerSec = pcmFMT.dwSamplesPerSec * pcmFMT.wBlockAlign;

    // WAVE_DATA: skip the header area, then copy the PCM data and count its bytes
    memcpy(pcmDATA.fccID, "data", 4);
    pcmDATA.dwSize = 0;
    fseek(fpout, sizeof(WAVE_HEADER) + sizeof(WAVE_FMT) + sizeof(WAVE_DATA), SEEK_SET);
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0) {
        fwrite(buf, 1, n, fpout);
        pcmDATA.dwSize += (uint32_t)n;
    }

    // WAVE_HEADER: dwSize is the file size minus the 8 bytes of "RIFF" and dwSize
    memcpy(pcmHEADER.fccID, "RIFF", 4);
    memcpy(pcmHEADER.fccType, "WAVE", 4);
    pcmHEADER.dwSize = 36 + pcmDATA.dwSize;

    // Go back and write the three header parts now that the sizes are known
    rewind(fpout);
    fwrite(&pcmHEADER, sizeof(WAVE_HEADER), 1, fpout);
    fwrite(&pcmFMT, sizeof(WAVE_FMT), 1, fpout);
    fwrite(&pcmDATA, sizeof(WAVE_DATA), 1, fpout);

    fclose(fp);
    fclose(fpout);
    return 0;
}

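Notes:

The size fields in the header must be exactly 32 bits wide. On some C compilers unsigned long is 64 bits, which produces an 88-byte header instead of the standard 44 bytes; declaring those fields as unsigned int (or uint32_t) fixes this.
The channel count and sample rate must also be set correctly for the conversion to come out right.
Confirm the sample rate; common values are 44100, 16000, and 8000.
The channel count is 1, 2, or more.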

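2. Lowering the volume of one channel of PCM audio (in C)

The amplitude of PCM audio data (that is, the magnitude of the sample values) represents the volume: the larger the waveform's amplitude, the louder the sound.
Scaling down every sample value of one channel therefore lowers that channel's volume.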

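#include <stdio.h>

int pcm16le_half_volume_left(const char *url) {
    FILE *fp_in = fopen(url, "rb");
    FILE *fp_out = fopen("output_half_left.pcm", "wb");
    if (fp_in == NULL || fp_out == NULL) {
        if (fp_in) fclose(fp_in);
        if (fp_out) fclose(fp_out);
        return -1;
    }

    // Read one stereo frame at a time: 2 channels x 2 bytes = 4 bytes.
    // Testing the fread result avoids processing the last frame twice.
    unsigned char sample[4];
    while (fread(sample, 1, 4, fp_in) == 4) {
        // Decode the little-endian left-channel sample and halve it
        short left = (short)(sample[0] | (sample[1] << 8));
        left = left / 2;
        sample[0] = (unsigned char)(left & 0xFF);
        sample[1] = (unsigned char)((left >> 8) & 0xFF);
        fwrite(sample, 1, 2, fp_out);      // L (halved)
        fwrite(sample + 2, 1, 2, fp_out);  // R (unchanged)
    }
    fclose(fp_in);
    fclose(fp_out);
    return 0;
}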

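In the example above:

After the 2-byte left-channel sample is read, it is converted to a C short.
The value is divided by 2 and written, together with the unchanged right-channel sample, to a new file, output_half_left.pcm.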
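
Halving a sample can never overflow, but a gain above 1 can push values outside the 16-bit range. The following sketch generalizes the idea to an arbitrary gain num/den on samples already in memory and clamps the result; the function name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scale one channel of interleaved 16-bit PCM by
 * num/den, clamping to the int16_t range instead of wrapping around. */
void scale_channel_s16(int16_t *pcm, size_t frames, int channels,
                       int channel, int num, int den) {
    assert(pcm != NULL && den != 0);
    assert(channel >= 0 && channel < channels);
    for (size_t i = 0; i < frames; i++) {
        int16_t *s = &pcm[i * (size_t)channels + (size_t)channel];
        int64_t v = (int64_t)*s * num / den;
        if (v > INT16_MAX) v = INT16_MAX;  /* clip positive peaks */
        if (v < INT16_MIN) v = INT16_MIN;  /* clip negative peaks */
        *s = (int16_t)v;
    }
}
```

Calling it with `channel = 0`, `num = 1`, `den = 2` on stereo data matches the left-channel halving shown above.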

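3. Separating the left and right channels of PCM audio

Because stereo PCM stores the left and right channels interleaved as LRLRLR,
the two channels can be separated by reading the samples alternately: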

#include <stdio.h>

int simplest_pcm16le_split(const char *url) {
    FILE *fp = fopen(url, "rb");
    FILE *fp1 = fopen("output_l.pcm", "wb");
    FILE *fp2 = fopen("output_r.pcm", "wb");
    if (fp == NULL || fp1 == NULL || fp2 == NULL) {
        if (fp) fclose(fp);
        if (fp1) fclose(fp1);
        if (fp2) fclose(fp2);
        return -1;
    }
    // One stereo frame = 2 bytes L + 2 bytes R
    unsigned char sample[4];
    while (fread(sample, 1, 4, fp) == 4) {
        fwrite(sample, 1, 2, fp1);      // L
        fwrite(sample + 2, 1, 2, fp2);  // R
    }
    fclose(fp);
    fclose(fp1);
    fclose(fp2);
    return 0;
}

4. Extracting a range of samples from PCM16LE mono audio

This function cuts a segment out of PCM16LE mono data and outputs its sample values.
The code is as follows:

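#include <stdio.h>

/**
 * Cut a segment out of a 16LE PCM single-channel file.
 * @param url        Location of PCM file.
 * @param start_num  Index of the first sample to keep.
 * @param dur_num    Number of samples to keep.
 */
int simplest_pcm16le_cut_singlechannel(const char *url, int start_num, int dur_num) {
    FILE *fp = fopen(url, "rb");
    FILE *fp1 = fopen("output_cut.pcm", "wb");
    if (fp == NULL || fp1 == NULL) {
        if (fp) fclose(fp);
        if (fp1) fclose(fp1);
        return -1;
    }

    int cnt = 0;
    unsigned char sample[2];  // one mono sample = 2 bytes

    while (fread(sample, 1, 2, fp) == 2) {
        if (cnt >= start_num && cnt < start_num + dur_num) {
            fwrite(sample, 1, 2, fp1);
            // Decode the little-endian bytes and print the sample value
            short samplenum = (short)(sample[0] | (sample[1] << 8));
            printf("%6d,", samplenum);
        }
        cnt++;
    }
    printf("\nSample Cnt:%d\n", cnt);

    fclose(fp);
    fclose(fp1);
    return 0;
}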

5. Converting PCM16LE stereo audio to PCM8

This function converts the sample size of PCM16LE stereo data from 16 bits to 8 bits.
The code is as follows:

#include <stdio.h>

/**
 * Convert PCM-16 data to PCM-8 data.
 * @param url  Location of PCM file.
 */
int simplest_pcm16le_to_pcm8(const char *url) {
    FILE *fp = fopen(url, "rb");
    FILE *fp1 = fopen("output_8.pcm", "wb");
    if (fp == NULL || fp1 == NULL) {
        if (fp) fclose(fp);
        if (fp1) fclose(fp1);
        return -1;
    }

    int cnt = 0;
    unsigned char sample[4];  // one stereo frame: 2 bytes L + 2 bytes R

    while (fread(sample, 1, 4, fp) == 4) {
        for (int ch = 0; ch < 2; ch++) {  // ch 0 = L, ch 1 = R
            // 16-bit signed value, range (-32768-32767)
            short samplenum16 = (short)(sample[2 * ch] | (sample[2 * ch + 1] << 8));
            // Keep the high byte: range (-128-127)
            signed char samplenum8 = (signed char)(samplenum16 >> 8);
            // Shift to unsigned: range (0-255)
            unsigned char samplenum8_u = (unsigned char)(samplenum8 + 128);
            fwrite(&samplenum8_u, 1, 1, fp1);
        }
        cnt++;
    }
    printf("Sample Cnt:%d\n", cnt);

    fclose(fp);
    fclose(fp1);
    return 0;
}

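In this program,
16-bit samples are held in short variables and
8-bit samples are held in unsigned char variables.

PCM16LE samples range over [-32768, 32767]; PCM8 samples range over [0, 255].
Converting PCM16LE to PCM8 therefore takes two steps:

Convert the 16-bit signed value in [-32768, 32767] to an 8-bit signed value in [-128, 127].
Convert the 8-bit signed value in [-128, 127] to an 8-bit unsigned value in [0, 255].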
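
The two steps can also be written without relying on a right shift of a negative number, whose result is implementation-defined in C. This sketch works on samples already in memory; `s16_to_u8` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert n signed 16-bit samples to unsigned
 * 8-bit samples. */
void s16_to_u8(const int16_t *in, uint8_t *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Step 1: keep the high byte of the two's-complement value. */
        uint8_t high = (uint8_t)((uint16_t)in[i] >> 8);
        /* Step 2: adding 128 modulo 256 is the same as flipping the top bit. */
        out[i] = (uint8_t)(high ^ 0x80u);
    }
}
```

With this mapping -32768 becomes 0, 0 becomes 128, and 32767 becomes 255.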

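6. Doubling the playback speed of PCM16LE stereo audio

This function keeps only every other sample of each channel (the odd-numbered ones).
Played back at the original sample rate, the resulting PCM16LE stereo audio runs at twice the speed, and its pitch rises by an octave.

The code is as follows: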

#include <stdio.h>

/**
 * Re-sample to double the speed of 16LE PCM file
 * @param url  Location of PCM file.
 */
int simplest_pcm16le_doublespeed(const char *url) {
    FILE *fp = fopen(url, "rb");
    FILE *fp1 = fopen("output_doublespeed.pcm", "wb");
    if (fp == NULL || fp1 == NULL) {
        if (fp) fclose(fp);
        if (fp1) fclose(fp1);
        return -1;
    }

    int cnt = 0;
    unsigned char sample[4];  // one stereo frame: 2 bytes L + 2 bytes R

    while (fread(sample, 1, 4, fp) == 4) {
        // Keep odd-numbered frames, drop even-numbered ones
        if (cnt % 2 != 0) {
            fwrite(sample, 1, 2, fp1);      // L
            fwrite(sample + 2, 1, 2, fp1);  // R
        }
        cnt++;
    }
    printf("Sample Cnt:%d\n", cnt);

    fclose(fp);
    fclose(fp1);
    return 0;
}
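
The same frame-dropping step can be expressed on buffers in memory, which makes it easier to test in isolation. A minimal sketch, assuming interleaved stereo input; `drop_even_frames_s16` is a hypothetical name, and no low-pass filtering is done before the frames are dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy only the odd-indexed stereo frames of in
 * to out and return the number of frames kept. in holds frames * 2
 * samples; out must hold at least (frames / 2) * 2 samples. */
size_t drop_even_frames_s16(const int16_t *in, size_t frames, int16_t *out) {
    assert(in != NULL && out != NULL);
    size_t kept = 0;
    for (size_t i = 1; i < frames; i += 2) {  /* odd-indexed frames only */
        out[2 * kept + 0] = in[2 * i + 0];    /* L */
        out[2 * kept + 1] = in[2 * i + 1];    /* R */
        kept++;
    }
    return kept;
}
```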

Posted 2024-10-16 22:42 by abaelhe
