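The MP4 container format and MPEG4Extractor
First, let's look at the MP4 container format. All data in an MP4 file is stored in boxes. Box fields use network byte order, that is, big-endian. A box consists of a header and a body: the header gives the size and type of the box, and the body holds the content that corresponds to that type.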
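The box size field has three possible cases:
The first 4 bytes of a box are the box size. This size covers the whole box, header included, so it can be used to locate each box in the file.
If the box size is 1, the real size is stored in a 64-bit largesize field that follows the box type. This is typically needed for mdat, which can be larger than 4 GB.
If the box size is 0, the box is the last one in the file and extends to the end of the file.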
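The three size cases can be sketched in a few lines of C++. This is a minimal illustration that works on an in-memory buffer instead of a DataSource; BoxHeader, readBE and parseBoxHeader are hypothetical names introduced here and are not part of MPEG4Extractor.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper type for illustration only.
struct BoxHeader {
    uint64_t size;        // total box size in bytes, header included
    std::string type;     // four-character code, e.g. "ftyp"
    uint32_t headerSize;  // 8, or 16 when a 64-bit largesize is present
};

// Reads a big-endian unsigned integer of n bytes starting at p.
inline uint64_t readBE(const uint8_t *p, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; ++i) {
        v = (v << 8) | p[i];
    }
    return v;
}

// Parses the header of the box that starts at `offset` in a file image of
// `fileSize` bytes. Returns false if the header is truncated or the size is
// inconsistent with the file.
inline bool parseBoxHeader(const uint8_t *data, uint64_t fileSize,
                           uint64_t offset, BoxHeader *out) {
    if (offset > fileSize || fileSize - offset < 8) {
        return false;
    }
    out->size = readBE(data + offset, 4);
    out->type.assign(reinterpret_cast<const char *>(data + offset + 4), 4);
    out->headerSize = 8;

    if (out->size == 1) {
        // Case 2: the real size is in the 64-bit largesize field.
        if (fileSize - offset < 16) {
            return false;
        }
        out->size = readBE(data + offset + 8, 8);
        out->headerSize = 16;
    } else if (out->size == 0) {
        // Case 3: the box runs to the end of the file.
        out->size = fileSize - offset;
    }
    // Case 1 (and sanity check for the others): the size must cover at
    // least the header and must not run past the end of the file.
    return out->size >= out->headerSize && out->size <= fileSize - offset;
}
```

Walking the top-level boxes is then a loop that calls parseBoxHeader at an offset and advances the offset by the returned size.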
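The box size is followed by a 32-bit box type, usually 4 characters such as ftyp or moov, so the whole box header is 8 bytes (16 bytes when the 64-bit large size is present). Here are the more important box types: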
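ftyp box: file type. There can be only one, and it should be placed at the very beginning of the file. It carries the brand and compatibility information of the MP4 file and cannot be contained in another box.
moov box: a container box, meaning that it holds other boxes. It contains the metadata of the media in the file, and the details are obtained by parsing its child boxes. There is only one moov box and it cannot be contained in another box. It normally contains one mvhd child box and several trak child boxes. It is the most important box for parsing an MP4 file: it describes the codec of the audio and video data, the samples, the chunk sizes, the storage locations (offsets, i.e. where each audio or video frame sits in the mdat box), the DTS, the PTS, and so on.
mvhd box: movie header box. It describes file-level information that does not depend on any particular audio or video stream. duration is the media duration and timescale is the number of time units per second, so the duration in seconds is duration / timescale.
trak box: track box. It is a container box that holds the references to and the description of the track's media data. A trak box must contain one tkhd and one mdia child box.
tkhd box: track header box. It describes the track; for a video track it carries the width and height.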
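elst box: edit list box. It records the start time of the stream, which is used when computing PTS and DTS.
mdia box: track media structure. It describes the main information about the media samples of this audio or video track and is very important. It is also a container box and contains mdhd, hdlr, minf and other boxes.
mdhd box: stores the timescale and duration of the current track. These differ from the timescale and duration in the mvhd box: the values here describe this track only, and the real duration is obtained by dividing duration by timescale.
hdlr box: stores the stream type of the current track, i.e. whether it is video or audio. MPEG4Extractor, however, does not seem to rely on this field to tell audio from video.
stbl box: sample table box. Its child boxes store the codec type and related information, the position of each frame in the file, the PTS, and so on.
stsd box: its child boxes store the codec type of the current track. For AVC, the avcC child box (inside the avc1 sample entry) stores the SPS, PPS and related information.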
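stts box: decoding time to sample box. It stores (sample_count, sample_delta) pairs. sample_delta is the duration of a sample in timescale units; dividing it by the timescale from mdhd gives the duration in seconds, so the frame rate is 1 / (sample_delta / timescale) = timescale / sample_delta.
stss box: sync sample box. It stores the sample numbers of the key frames; after a seek, decoding has to start from a key frame. Its entry count gives the number of key frames.
ctts box: composition time to sample box. It gives the difference between PTS and DTS. If this box is absent, there are no B-frames and PTS equals DTS. The DTS of a sample is the accumulated sample_delta of the samples before it (sample_delta * sample_cnt when the delta is constant) minus start_time; with B-frames, PTS = DTS + composition_offset.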
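stsc box: sample to chunk box. Media samples are packed into chunks, and neither chunks nor samples have a fixed size. This box describes which samples belong to which chunk.
stsz box: sample size box. It records the size of each sample.
stco box: chunk offset box. It gives the offset of each chunk relative to the start of the file. The offset of an individual sample has to be computed from it together with the information in stsc and the sample sizes in stsz.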
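To make the relationship between stsc, stsz and stco concrete, here is a small sketch that computes the file offset of one sample. It assumes the three tables have already been read into vectors; StscEntry and sampleFileOffset are hypothetical names for illustration, and MPEG4Extractor's own SampleTable is organised differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One stsc entry (sample_description_index is omitted in this sketch).
struct StscEntry {
    uint32_t firstChunk;       // 1-based index of the first chunk it applies to
    uint32_t samplesPerChunk;  // number of samples in each of those chunks
};

// Computes the file offset of sample `sampleIndex` (0-based).
// chunkOffsets comes from stco/co64, sampleSizes from stsz.
// Returns false if the sample does not exist in the tables.
inline bool sampleFileOffset(const std::vector<StscEntry> &stsc,
                             const std::vector<uint64_t> &chunkOffsets,
                             const std::vector<uint32_t> &sampleSizes,
                             size_t sampleIndex, uint64_t *offset) {
    if (stsc.empty() || sampleIndex >= sampleSizes.size()) {
        return false;
    }
    size_t firstSampleInChunk = 0;  // index of the first sample of this chunk
    size_t entry = 0;               // stsc entry that covers this chunk
    for (size_t chunk = 0; chunk < chunkOffsets.size(); ++chunk) {
        // An stsc entry stays valid until the next entry's firstChunk.
        while (entry + 1 < stsc.size() &&
               stsc[entry + 1].firstChunk <= chunk + 1) {
            ++entry;
        }
        size_t n = stsc[entry].samplesPerChunk;
        if (sampleIndex < firstSampleInChunk + n) {
            // Start at the chunk offset and skip the samples stored
            // before the requested one inside the same chunk.
            uint64_t pos = chunkOffsets[chunk];
            for (size_t s = firstSampleInChunk; s < sampleIndex; ++s) {
                pos += sampleSizes[s];
            }
            *offset = pos;
            return true;
        }
        firstSampleInChunk += n;
    }
    return false;
}
```

For example, with stsc = {(1, 2), (3, 1)}, the first two chunks hold two samples each and the third chunk holds one, so sample 3 lives in the second chunk right after sample 2.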
Reference: mp4封装格式各box类型讲解及IBP帧计算 - 知乎 (zhihu.com)
Reference: 视频解码研究之PTS(2)Mp4格式,AVI格式和MKV格式 - 面海烹鲜的博客 (CSDN)
Online MP4 parser: Online Mp4 Parser
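Coming back to the stts and ctts boxes described earlier, the timestamp arithmetic can be sketched as follows. The code assumes the two tables are already in memory and that the start time taken from elst is simply subtracted from the DTS, as the box description states; SttsEntry, CttsEntry, SampleTime, buildSampleTimes and ticksToUs are hypothetical names, not MPEG4Extractor APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct SttsEntry { uint32_t sampleCount; uint32_t sampleDelta; };
struct CttsEntry { uint32_t sampleCount; int32_t sampleOffset; };
struct SampleTime { int64_t dts; int64_t pts; };  // in timescale ticks

// Expands stts/ctts into one (DTS, PTS) pair per sample.
// startTime is the start time from elst, in the same timescale (assumption).
inline std::vector<SampleTime> buildSampleTimes(
        const std::vector<SttsEntry> &stts,
        const std::vector<CttsEntry> &ctts,
        int64_t startTime) {
    std::vector<SampleTime> out;
    int64_t dts = -startTime;
    // DTS: running sum of the sample_delta values of the preceding samples.
    for (const SttsEntry &e : stts) {
        for (uint32_t i = 0; i < e.sampleCount; ++i) {
            out.push_back({dts, dts});  // PTS defaults to DTS when ctts is absent
            dts += e.sampleDelta;
        }
    }
    // PTS: DTS plus the composition offset, when a ctts table is present.
    size_t sample = 0;
    for (const CttsEntry &e : ctts) {
        for (uint32_t i = 0; i < e.sampleCount && sample < out.size();
             ++i, ++sample) {
            out[sample].pts = out[sample].dts + e.sampleOffset;
        }
    }
    return out;
}

// Converts timescale ticks to microseconds using the timescale from mdhd.
inline int64_t ticksToUs(int64_t ticks, uint32_t timescale) {
    return ticks * 1000000 / timescale;
}
```

With a timescale of 90000 and a constant sample_delta of 3000, the frame rate is 90000 / 3000 = 30 fps, and each sample lasts ticksToUs(3000, 90000) microseconds.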
Next, let's look at how MPEG4Extractor parses the file.
status_t MPEG4Extractor::parseChunk(off64_t *offset, int depth) {
    ALOGV("entering parseChunk %lld/%d", (long long)*offset, depth);

    if (*offset < 0) {
        ALOGE("b/23540914");
        return ERROR_MALFORMED;
    }
    if (depth > 100) {
        ALOGE("b/27456299");
        return ERROR_MALFORMED;
    }
    // Read 8 bytes first: the first 4 bytes are the box size, the last 4 the box type
    uint32_t hdr[2];
    if (mDataSource->readAt(*offset, hdr, 8) < 8) {
        return ERROR_IO;
    }
    uint64_t chunk_size = ntohl(hdr[0]);
    int32_t chunk_type = ntohl(hdr[1]);
    off64_t data_offset = *offset + 8;

    // A size of 1 means the real size is in the 64-bit field that follows
    // (typical for mdat); such a box is at least 16 bytes long
    if (chunk_size == 1) {
        if (mDataSource->readAt(*offset + 8, &chunk_size, 8) < 8) {
            return ERROR_IO;
        }
        chunk_size = ntoh64(chunk_size);
        data_offset += 8;

        if (chunk_size < 16) {
            // The smallest valid chunk is 16 bytes long in this case.
            return ERROR_MALFORMED;
        }
    } else if (chunk_size == 0) {
        // A chunk_size of 0 means this is the last box
        if (depth == 0) {
            // atom extends to end of file
            off64_t sourceSize;
            if (mDataSource->getSize(&sourceSize) == OK) {
                // the size of the last box is derived from the file size
                chunk_size = (sourceSize - *offset);
            } else {
                // XXX could we just pick a "sufficiently large" value here?
                ALOGE("atom size is 0, and data source has no size");
                return ERROR_MALFORMED;
            }
        } else {
            // not allowed for non-toplevel atoms, skip it
            *offset += 4;
            return OK;
        }
    } else if (chunk_size < 8) {
        // The smallest valid chunk is 8 bytes long.
ALOGE("invalid chunk size: %" PRIu64, chunk_size); return ERROR_MALFORMED; } char chunk[5]; // 将type转换为ASSIC码 MakeFourCCString(chunk_type, chunk); ALOGV("chunk: %s @ %lld, %d", chunk, (long long)*offset, depth); if (kUseHexDump) { static const char kWhitespace[] = " "; const char *indent = &kWhitespace[sizeof(kWhitespace) - 1 - 2 * depth]; printf("%sfound chunk '%s' of size %" PRIu64 "\n", indent, chunk, chunk_size); char buffer[256]; size_t n = chunk_size; if (n > sizeof(buffer)) { n = sizeof(buffer); } if (mDataSource->readAt(*offset, buffer, n) < (ssize_t)n) { return ERROR_IO; } hexdump(buffer, n); } PathAdder autoAdder(&mPath, chunk_type); // (data_offset - *offset) is either 8 or 16 // 计算box中的数据的长度,data_offset为读取的位置,offset为起始位置 off64_t chunk_data_size = chunk_size - (data_offset - *offset); if (chunk_data_size < 0) { ALOGE("b/23540914"); return ERROR_MALFORMED; } // 检查box的大小,如果不是mdat,但是其数据大小超过一定范围说明这个box存在问题 if (chunk_type != FOURCC("mdat") && chunk_data_size > kMaxAtomSize) { char errMsg[100]; sprintf(errMsg, "%s atom has size %" PRId64, chunk, chunk_data_size); ALOGE("%s (b/28615448)", errMsg); android_errorWriteWithInfoLog(0x534e4554, "28615448", -1, errMsg, strlen(errMsg)); return ERROR_MALFORMED; } // 不去研究这个box if (chunk_type != FOURCC("cprt") && chunk_type != FOURCC("covr") && mPath.size() == 5 && underMetaDataPath(mPath)) { off64_t stop_offset = *offset + chunk_size; *offset = data_offset; while (*offset < stop_offset) { status_t err = parseChunk(offset, depth + 1); if (err != OK) { return err; } } if (*offset != stop_offset) { return ERROR_MALFORMED; } return OK; } switch(chunk_type) { case FOURCC("moov"): case FOURCC("trak"): case FOURCC("mdia"): case FOURCC("minf"): case FOURCC("dinf"): case FOURCC("stbl"): case FOURCC("mvex"): case FOURCC("moof"): case FOURCC("traf"): case FOURCC("mfra"): case FOURCC("udta"): case FOURCC("ilst"): case FOURCC("sinf"): case FOURCC("schi"): case FOURCC("edts"): case FOURCC("wave"): { // 如果是moov box,但是其深度不为0,意思是moov 
box在一个container box中,那么就报错 if (chunk_type == FOURCC("moov") && depth != 0) { ALOGE("moov: depth %d", depth); return ERROR_MALFORMED; } // 如果是moov box,但是已经初始化完毕了,说明前面已经解析过一个moov了,那也是不对的 if (chunk_type == FOURCC("moov") && mInitCheck == OK) { ALOGE("duplicate moov"); return ERROR_MALFORMED; } if (chunk_type == FOURCC("moof") && !mMoofFound) { // store the offset of the first segment mMoofFound = true; mMoofOffset = *offset; } if (chunk_type == FOURCC("stbl")) { ALOGV("sampleTable chunk is %" PRIu64 " bytes long.", chunk_size); if (mDataSource->flags() & (DataSourceBase::kWantsPrefetching | DataSourceBase::kIsCachingDataSource)) { CachedRangedDataSource *cachedSource = new CachedRangedDataSource(mDataSource); if (cachedSource->setCachedRange( *offset, chunk_size, true /* assume ownership on success */) == OK) { mDataSource = cachedSource; } else { delete cachedSource; } } if (mLastTrack == NULL) { return ERROR_MALFORMED; } // 扫描到stbl之后为Track创建一个SampleTable,后面来看这个SampleTable做什么用的 mLastTrack->sampleTable = new SampleTable(mDataSource); } bool isTrack = false; if (chunk_type == FOURCC("trak")) { if (depth != 1) { ALOGE("trak: depth %d", depth); return ERROR_MALFORMED; } isTrack = true; // 扫描到trak box,则在Track链表上添加一个节点 ALOGV("adding new track"); Track *track = new Track; if (mLastTrack) { mLastTrack->next = track; } else { mFirstTrack = track; } mLastTrack = track; track->meta = AMediaFormat_new(); // 给track设置一个默认的mime AMediaFormat_setString(track->meta, AMEDIAFORMAT_KEY_MIME, "application/octet-stream"); } // 上面的box type都是conatiner box,这里会去递归解析子box off64_t stop_offset = *offset + chunk_size; *offset = data_offset; // 子box的起始位置起始就是原先的起始位置 + box header length(8) while (*offset < stop_offset) { // pass udata terminate if (mIsQT && stop_offset - *offset == 4 && chunk_type == FOURCC("udta")) { // handle the case that udta terminates with terminate code x00000000 // note that 0 terminator is optional and we just handle this case. 
uint32_t terminate_code = 1; mDataSource->readAt(*offset, &terminate_code, 4); if (0 == terminate_code) { *offset += 4; ALOGD("Terminal code for udta"); continue; } else { ALOGW("invalid udta Terminal code"); } } // 递归去parse status_t err = parseChunk(offset, depth + 1); if (err != OK) { if (isTrack) { mLastTrack->skipTrack = true; break; } return err; } } if (*offset != stop_offset) { return ERROR_MALFORMED; } // 递归解析结束之后,如果是解析的trak box,那就要整理解析的内容到Track当中 if (isTrack) { int32_t trackId; // There must be exactly one track header per track. // 如果track没有trackid,那么将当前track置为skip if (!AMediaFormat_getInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_TRACK_ID, &trackId)) { mLastTrack->skipTrack = true; } status_t err = verifyTrack(mLastTrack); if (err != OK) { mLastTrack->skipTrack = true; } // skipTrack被置为true说明该track无效,会从链表中删除该Track if (mLastTrack->skipTrack) { ALOGV("skipping this track..."); Track *cur = mFirstTrack; if (cur == mLastTrack) { delete cur; mFirstTrack = mLastTrack = NULL; } else { while (cur && cur->next != mLastTrack) { cur = cur->next; } if (cur) { cur->next = NULL; } delete mLastTrack; mLastTrack = cur; } return OK; } // place things we built elsewhere into their final locations // put aggregated tx3g data into the metadata if (mLastTrack->mTx3gFilled > 0) { ALOGV("Putting %zu bytes of tx3g data into meta data", mLastTrack->mTx3gFilled); AMediaFormat_setBuffer(mLastTrack->meta, AMEDIAFORMAT_KEY_TEXT_FORMAT_DATA, mLastTrack->mTx3gBuffer, mLastTrack->mTx3gFilled); // drop it now to reduce our footprint free(mLastTrack->mTx3gBuffer); mLastTrack->mTx3gBuffer = NULL; mLastTrack->mTx3gFilled = 0; mLastTrack->mTx3gSize = 0; } const char *mime; AMediaFormat_getString(mLastTrack->meta, AMEDIAFORMAT_KEY_MIME, &mime); // 判断mime是否为Video_dobly_vision,后面的暂时就不看了 if (!strcasecmp(mime, MEDIA_MIMETYPE_VIDEO_DOLBY_VISION)) { void *data; size_t size; if (AMediaFormat_getBuffer(mLastTrack->meta, AMEDIAFORMAT_KEY_CSD_2, &data, &size)) { const uint8_t *ptr = (const uint8_t *)data; 
const uint8_t profile = ptr[2] >> 1; const uint8_t bl_compatibility_id = (ptr[4]) >> 4; bool create_two_tracks = false; if (bl_compatibility_id && bl_compatibility_id != 15) { create_two_tracks = true; } if (4 == profile || 7 == profile || (profile >= 8 && profile < 11 && create_two_tracks)) { // we need a backward compatible track ALOGV("Adding new backward compatible track"); Track *track_b = new Track; track_b->timescale = mLastTrack->timescale; track_b->sampleTable = mLastTrack->sampleTable; track_b->includes_expensive_metadata = mLastTrack->includes_expensive_metadata; track_b->skipTrack = mLastTrack->skipTrack; track_b->elst_needs_processing = mLastTrack->elst_needs_processing; track_b->elst_media_time = mLastTrack->elst_media_time; track_b->elst_segment_duration = mLastTrack->elst_segment_duration; track_b->elst_shift_start_ticks = mLastTrack->elst_shift_start_ticks; track_b->elst_initial_empty_edit_ticks = mLastTrack->elst_initial_empty_edit_ticks; track_b->subsample_encryption = mLastTrack->subsample_encryption; track_b->mTx3gBuffer = mLastTrack->mTx3gBuffer; track_b->mTx3gSize = mLastTrack->mTx3gSize; track_b->mTx3gFilled = mLastTrack->mTx3gFilled; track_b->meta = AMediaFormat_new(); AMediaFormat_copy(track_b->meta, mLastTrack->meta); mLastTrack->next = track_b; track_b->next = NULL; auto id = track_b->meta->mFormat->findEntryByName(AMEDIAFORMAT_KEY_CSD_2); track_b->meta->mFormat->removeEntryAt(id); if (4 == profile || 7 == profile || 8 == profile ) { AMediaFormat_setString(track_b->meta, AMEDIAFORMAT_KEY_MIME, MEDIA_MIMETYPE_VIDEO_HEVC); } else if (9 == profile) { AMediaFormat_setString(track_b->meta, AMEDIAFORMAT_KEY_MIME, MEDIA_MIMETYPE_VIDEO_AVC); } else if (10 == profile) { AMediaFormat_setString(track_b->meta, AMEDIAFORMAT_KEY_MIME, MEDIA_MIMETYPE_VIDEO_AV1); } // Should never get to else part mLastTrack = track_b; } } } } else if (chunk_type == FOURCC("moov")) { // 如果当前递归扫描的是moov box,那么将mInitCheck置为true mInitCheck = OK; return UNKNOWN_ERROR; // 
Return a dummy error. } break; } // 暂时不研究这个,应该是用于加密视频播放 case FOURCC("schm"): { *offset += chunk_size; if (!mLastTrack) { return ERROR_MALFORMED; } uint32_t scheme_type; if (mDataSource->readAt(data_offset + 4, &scheme_type, 4) < 4) { return ERROR_IO; } scheme_type = ntohl(scheme_type); int32_t mode = kCryptoModeUnencrypted; switch(scheme_type) { case FOURCC("cbc1"): { mode = kCryptoModeAesCbc; break; } case FOURCC("cbcs"): { mode = kCryptoModeAesCbc; mLastTrack->subsample_encryption = true; break; } case FOURCC("cenc"): { mode = kCryptoModeAesCtr; break; } case FOURCC("cens"): { mode = kCryptoModeAesCtr; mLastTrack->subsample_encryption = true; break; } } if (mode != kCryptoModeUnencrypted) { AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_CRYPTO_MODE, mode); } break; } // elst这个box 保存有视频的起始时间 case FOURCC("elst"): { *offset += chunk_size; if (!mLastTrack) { return ERROR_MALFORMED; } // 读取版本信息 // See 14496-12 8.6.6 uint8_t version; if (mDataSource->readAt(data_offset, &version, 1) < 1) { return ERROR_IO; } // 读取box中内容条数 uint32_t entry_count; if (!mDataSource->getUInt32(data_offset + 4, &entry_count)) { return ERROR_IO; } if (entry_count > 2) { /* We support a single entry for gapless playback or negating offset for * reordering B frames, two entries (empty edit) for start offset at the moment. 
*/ ALOGW("ignoring edit list with %d entries", entry_count); } else { off64_t entriesoffset = data_offset + 8; uint64_t segment_duration; int64_t media_time; bool empty_edit_present = false; for (int i = 0; i < entry_count; ++i) { switch (version) { // 这里只看version为0的版本 case 0: { uint32_t sd; int32_t mt; // 读取segment_duration,应该就是track的时长 // 读取media_time,为流的起始时间用于计算DTS和PTS if (!mDataSource->getUInt32(entriesoffset, &sd) || !mDataSource->getUInt32(entriesoffset + 4, (uint32_t*)&mt)) { return ERROR_IO; } segment_duration = sd; media_time = mt; // 4(segment duration) + 4(media time) + 4(media rate) entriesoffset += 12; break; } case 1: { if (!mDataSource->getUInt64(entriesoffset, &segment_duration) || !mDataSource->getUInt64(entriesoffset + 8, (uint64_t*)&media_time)) { return ERROR_IO; } // 8(segment duration) + 8(media time) + 4(media rate) entriesoffset += 20; break; } default: return ERROR_IO; break; } // Empty edit entry would have to be first entry. if (media_time == -1 && i == 0) { empty_edit_present = true; ALOGV("initial empty edit ticks: %" PRIu64, segment_duration); /* In movie header timescale, and needs to be converted to media timescale * after we get that from a track's 'mdhd' atom, * which at times come after 'elst'. */ mLastTrack->elst_initial_empty_edit_ticks = segment_duration; } else if (media_time >= 0 && i == 0) { ALOGV("first edit list entry - from gapless playback files"); // 保存elst信息到Track当中 mLastTrack->elst_media_time = media_time; mLastTrack->elst_segment_duration = segment_duration; ALOGV("segment_duration: %" PRIu64 " media_time: %" PRId64, segment_duration, media_time); // media_time is in media timescale as are STTS/CTTS entries. mLastTrack->elst_shift_start_ticks = media_time; } else if (empty_edit_present && i == 1) { // Process second entry only when the first entry was an empty edit entry. 
ALOGV("second edit list entry"); mLastTrack->elst_shift_start_ticks = media_time; } else { ALOGW("for now, unsupported entry in edit list %" PRIu32, entry_count); } } // save these for later, because the elst atom might precede // the atoms that actually gives us the duration and sample rate // needed to calculate the padding and delay values mLastTrack->elst_needs_processing = true; } break; } // 如果有frmabox case FOURCC("frma"): { *offset += chunk_size; uint32_t original_fourcc; if (mDataSource->readAt(data_offset, &original_fourcc, 4) < 4) { return ERROR_IO; } original_fourcc = ntohl(original_fourcc); ALOGV("read original format: %d", original_fourcc); if (mLastTrack == NULL) { return ERROR_MALFORMED; } // 设定track的mime AMediaFormat_setString(mLastTrack->meta, AMEDIAFORMAT_KEY_MIME, FourCC2MIME(original_fourcc)); uint32_t num_channels = 0; uint32_t sample_rate = 0; if (AdjustChannelsAndRate(original_fourcc, &num_channels, &sample_rate)) { AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_CHANNEL_COUNT, num_channels); AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_SAMPLE_RATE, sample_rate); } if (!mIsQT && original_fourcc == FOURCC("alac")) { off64_t tmpOffset = *offset; status_t err = parseALACSampleEntry(&tmpOffset); if (err != OK) { ALOGE("parseALACSampleEntry err:%d Line:%d", err, __LINE__); return err; } *offset = tmpOffset + 8; } break; } // ...... // 解析track header case FOURCC("tkhd"): { *offset += chunk_size; status_t err; // 主要用来解析track id,video track的width、height,并且保存在meta data中 if ((err = parseTrackHeader(data_offset, chunk_data_size)) != OK) { return err; } break; } // ...... 
// Parse mdhd
case FOURCC("mdhd"):
{
    *offset += chunk_size;

    if (chunk_data_size < 4 || mLastTrack == NULL) {
        return ERROR_MALFORMED;
    }

    uint8_t version;
    if (mDataSource->readAt(
                data_offset, &version, sizeof(version))
            < (ssize_t)sizeof(version)) {
        return ERROR_IO;
    }

    off64_t timescale_offset;

    if (version == 1) {
        timescale_offset = data_offset + 4 + 16;
    } else if (version == 0) {
        timescale_offset = data_offset + 4 + 8;
    } else {
        return ERROR_IO;
    }

    // Read the timescale
    uint32_t timescale;
    if (mDataSource->readAt(
                timescale_offset, &timescale, sizeof(timescale))
            < (ssize_t)sizeof(timescale)) {
        return ERROR_IO;
    }

    if (!timescale) {
        ALOGE("timescale should not be ZERO.");
        return ERROR_MALFORMED;
    }

    // Store the timescale in the track
    mLastTrack->timescale = ntohl(timescale);

    // 14496-12 says all ones means indeterminate, but some files seem to use
    // 0 instead. We treat both the same.
    int64_t duration = 0;
    if (version == 1) {
        if (mDataSource->readAt(
                    timescale_offset + 4, &duration, sizeof(duration))
                < (ssize_t)sizeof(duration)) {
            return ERROR_IO;
        }
        if (duration != -1) {
            duration = ntoh64(duration);
        }
    } else {
        // Only the version 0 path is discussed here
        uint32_t duration32;
        // Read the duration of the current track
        if (mDataSource->readAt(
                    timescale_offset + 4, &duration32, sizeof(duration32))
                < (ssize_t)sizeof(duration32)) {
            return ERROR_IO;
        }
        if (duration32 != 0xffffffff) {
            duration = ntohl(duration32);
        }
    }
    if (duration != 0 && mLastTrack->timescale != 0) {
        // The real duration is the value read here divided by the timescale
        long double durationUs = ((long double)duration * 1000000) / mLastTrack->timescale;
        if (durationUs < 0 || durationUs > INT64_MAX) {
            ALOGE("cannot represent %lld * 1000000 / %lld in 64 bits",
                  (long long) duration, (long long) mLastTrack->timescale);
            return ERROR_MALFORMED;
        }
        // The duration stored in the meta is in microseconds
        AMediaFormat_setInt64(mLastTrack->meta, AMEDIAFORMAT_KEY_DURATION, durationUs);
    }

    uint8_t lang[2];
    off64_t lang_offset;
    if (version == 1) {
        lang_offset = timescale_offset + 4 + 8;
    } else if (version == 0) {
        lang_offset = timescale_offset + 4 + 4;
    } else {
return ERROR_IO; } if (mDataSource->readAt(lang_offset, &lang, sizeof(lang)) < (ssize_t)sizeof(lang)) { return ERROR_IO; } // To get the ISO-639-2/T three character language code // 1 bit pad followed by 3 5-bits characters. Each character // is packed as the difference between its ASCII value and 0x60. char lang_code[4]; lang_code[0] = ((lang[0] >> 2) & 0x1f) + 0x60; lang_code[1] = ((lang[0] & 0x3) << 3 | (lang[1] >> 5)) + 0x60; lang_code[2] = (lang[1] & 0x1f) + 0x60; lang_code[3] = '\0'; // 给meta设置key language AMediaFormat_setString(mLastTrack->meta, AMEDIAFORMAT_KEY_LANGUAGE, lang_code); break; } // 非常中要的box,子box可以解析出mime case FOURCC("stsd"): { uint8_t buffer[8]; if (chunk_data_size < (off64_t)sizeof(buffer)) { return ERROR_MALFORMED; } if (mDataSource->readAt( data_offset, buffer, 8) < 8) { return ERROR_IO; } if (U32_AT(buffer) != 0) { // Should be version 0, flags 0. return ERROR_MALFORMED; } uint32_t entry_count = U32_AT(&buffer[4]); if (entry_count > 1) { // For 3GPP timed text, there could be multiple tx3g boxes contain // multiple text display formats. These formats will be used to // display the timed text. // For encrypted files, there may also be more than one entry. const char *mime; if (mLastTrack == NULL) return ERROR_MALFORMED; CHECK(AMediaFormat_getString(mLastTrack->meta, AMEDIAFORMAT_KEY_MIME, &mime)); if (strcasecmp(mime, MEDIA_MIMETYPE_TEXT_3GPP) && strcasecmp(mime, "application/octet-stream")) { // For now we only support a single type of media per track. 
mLastTrack->skipTrack = true; *offset += chunk_size; break; } } off64_t stop_offset = *offset + chunk_size; *offset = data_offset + 8; for (uint32_t i = 0; i < entry_count; ++i) { // 递归parse子box,可以解析出mime type status_t err = parseChunk(offset, depth + 1); if (err != OK) { return err; } } if (*offset != stop_offset) { return ERROR_MALFORMED; } break; } // stsd子box type如果是以下内容,说明是audio track case FOURCC("mp4a"): case FOURCC("enca"): case FOURCC("samr"): case FOURCC("sawb"): case FOURCC("Opus"): case FOURCC("twos"): case FOURCC("sowt"): case FOURCC("alac"): case FOURCC("fLaC"): case FOURCC(".mp3"): case 0x6D730055: // "ms U" mp3 audio { if (mIsQT && depth >= 1 && mPath[depth - 1] == FOURCC("wave")) { if (chunk_type == FOURCC("alac")) { off64_t offsetTmp = *offset; status_t err = parseALACSampleEntry(&offsetTmp); if (err != OK) { ALOGE("parseALACSampleEntry err:%d Line:%d", err, __LINE__); return err; } } // Ignore all atoms embedded in QT wave atom ALOGV("Ignore all atoms embedded in QT wave atom"); *offset += chunk_size; break; } uint8_t buffer[8 + 20]; if (chunk_data_size < (ssize_t)sizeof(buffer)) { // Basic AudioSampleEntry size. 
return ERROR_MALFORMED; } if (mDataSource->readAt( data_offset, buffer, sizeof(buffer)) < (ssize_t)sizeof(buffer)) { return ERROR_IO; } uint16_t data_ref_index __unused = U16_AT(&buffer[6]); uint16_t version = U16_AT(&buffer[8]); uint32_t num_channels = U16_AT(&buffer[16]); uint16_t sample_size = U16_AT(&buffer[18]); uint32_t sample_rate = U32_AT(&buffer[24]) >> 16; if (mLastTrack == NULL) return ERROR_MALFORMED; off64_t stop_offset = *offset + chunk_size; *offset = data_offset + sizeof(buffer); if (mIsQT) { if (version == 1) { if (mDataSource->readAt(*offset, buffer, 16) < 16) { return ERROR_IO; } #if 0 U32_AT(buffer); // samples per packet U32_AT(&buffer[4]); // bytes per packet U32_AT(&buffer[8]); // bytes per frame U32_AT(&buffer[12]); // bytes per sample #endif *offset += 16; } else if (version == 2) { uint8_t v2buffer[36]; if (mDataSource->readAt(*offset, v2buffer, 36) < 36) { return ERROR_IO; } #if 0 U32_AT(v2buffer); // size of struct only sample_rate = (uint32_t)U64_AT(&v2buffer[4]); // audio sample rate num_channels = U32_AT(&v2buffer[12]); // num audio channels U32_AT(&v2buffer[16]); // always 0x7f000000 sample_size = (uint16_t)U32_AT(&v2buffer[20]); // const bits per channel U32_AT(&v2buffer[24]); // format specifc flags U32_AT(&v2buffer[28]); // const bytes per audio packet U32_AT(&v2buffer[32]); // const LPCM frames per audio packet #endif *offset += 36; } } if (chunk_type != FOURCC("enca")) { // if the chunk type is enca, we'll get the type from the frma box later AMediaFormat_setString(mLastTrack->meta, AMEDIAFORMAT_KEY_MIME, FourCC2MIME(chunk_type)); AdjustChannelsAndRate(chunk_type, &num_channels, &sample_rate); if (!strcasecmp(MEDIA_MIMETYPE_AUDIO_RAW, FourCC2MIME(chunk_type))) { AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_BITS_PER_SAMPLE, sample_size); if (chunk_type == FOURCC("twos")) { AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_PCM_BIG_ENDIAN, 1); } } } // 将读取出的sample size和sample rate保存到meta当中 ALOGV("*** coding='%s' 
%d channels, size %d, rate %d\n", chunk, num_channels, sample_size, sample_rate); AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_CHANNEL_COUNT, num_channels); AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_SAMPLE_RATE, sample_rate); // ...... if (!mIsQT && chunk_type == FOURCC("alac")) { data_offset += sizeof(buffer); status_t err = parseALACSampleEntry(&data_offset); if (err != OK) { ALOGE("parseALACSampleEntry err:%d Line:%d", err, __LINE__); return err; } *offset = data_offset; CHECK_EQ(*offset, stop_offset); } if (chunk_type == FOURCC("fLaC")) { // From https://github.com/xiph/flac/blob/master/doc/isoflac.txt // 4 for mime, 4 for blockType and BlockLen, 34 for metadata uint8_t flacInfo[4 + 4 + 34]; // skipping dFla, version data_offset += sizeof(buffer) + 12; size_t flacOffset = 4; // Add flaC header mime type to CSD strncpy((char *)flacInfo, "fLaC", 4); if (mDataSource->readAt( data_offset, flacInfo + flacOffset, sizeof(flacInfo) - flacOffset) < (ssize_t)sizeof(flacInfo) - flacOffset) { return ERROR_IO; } data_offset += sizeof(flacInfo) - flacOffset; AMediaFormat_setBuffer(mLastTrack->meta, AMEDIAFORMAT_KEY_CSD_0, flacInfo, sizeof(flacInfo)); *offset = data_offset; CHECK_EQ(*offset, stop_offset); } while (*offset < stop_offset) { // 继续递归子box status_t err = parseChunk(offset, depth + 1); if (err != OK) { return err; } } if (*offset != stop_offset) { return ERROR_MALFORMED; } break; } // 如果box type是以下内容,那么说明当前track为video track case FOURCC("mp4v"): case FOURCC("encv"): case FOURCC("s263"): case FOURCC("H263"): case FOURCC("h263"): case FOURCC("avc1"): case FOURCC("hvc1"): case FOURCC("hev1"): case FOURCC("dvav"): case FOURCC("dva1"): case FOURCC("dvhe"): case FOURCC("dvh1"): case FOURCC("dav1"): case FOURCC("av01"): { uint8_t buffer[78]; if (chunk_data_size < (ssize_t)sizeof(buffer)) { // Basic VideoSampleEntry size. 
return ERROR_MALFORMED; } if (mDataSource->readAt( data_offset, buffer, sizeof(buffer)) < (ssize_t)sizeof(buffer)) { return ERROR_IO; } uint16_t data_ref_index __unused = U16_AT(&buffer[6]); uint16_t width = U16_AT(&buffer[6 + 18]); uint16_t height = U16_AT(&buffer[6 + 20]); // The video sample is not standard-compliant if it has invalid dimension. // Use some default width and height value, and // let the decoder figure out the actual width and height (and thus // be prepared for INFO_FOMRAT_CHANGED event). if (width == 0) width = 352; if (height == 0) height = 288; // printf("*** coding='%s' width=%d height=%d\n", // chunk, width, height); if (mLastTrack == NULL) return ERROR_MALFORMED; if (chunk_type != FOURCC("encv")) { // if the chunk type is encv, we'll get the type from the frma box later AMediaFormat_setString(mLastTrack->meta, AMEDIAFORMAT_KEY_MIME, FourCC2MIME(chunk_type)); } // 同样可以解析出视频的宽高,并且将他们设置到meta当中 AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_WIDTH, width); AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_HEIGHT, height); off64_t stop_offset = *offset + chunk_size; *offset = data_offset + sizeof(buffer); while (*offset < stop_offset) { // 继续parse子box status_t err = parseChunk(offset, depth + 1); if (err != OK) { return err; } } if (*offset != stop_offset) { return ERROR_MALFORMED; } break; } // 解析stco,这里面存储的是trunk在mtdt中的偏移量 case FOURCC("stco"): case FOURCC("co64"): { if ((mLastTrack == NULL) || (mLastTrack->sampleTable == NULL)) { return ERROR_MALFORMED; } // 设置chunk offset的参数,当时创建sampleTable时,是直接将包含stbl box在内的剩余数据全部拷贝到了sample table当中 status_t err = mLastTrack->sampleTable->setChunkOffsetParams( chunk_type, data_offset, chunk_data_size); *offset += chunk_size; if (err != OK) { return err; } break; } case FOURCC("stsc"): { if ((mLastTrack == NULL) || (mLastTrack->sampleTable == NULL)) return ERROR_MALFORMED; // 设置stsc的相关数据区域 status_t err = mLastTrack->sampleTable->setSampleToChunkParams( data_offset, chunk_data_size); *offset += 
chunk_size; if (err != OK) { return err; } break; } case FOURCC("stsz"): case FOURCC("stz2"): { if ((mLastTrack == NULL) || (mLastTrack->sampleTable == NULL)) { return ERROR_MALFORMED; } // 设置stsz的数据区域 status_t err = mLastTrack->sampleTable->setSampleSizeParams( chunk_type, data_offset, chunk_data_size); *offset += chunk_size; if (err != OK) { return err; } adjustRawDefaultFrameSize(); size_t max_size; err = mLastTrack->sampleTable->getMaxSampleSize(&max_size); if (err != OK) { return err; } if (max_size != 0) { // Assume that a given buffer only contains at most 10 chunks, // each chunk originally prefixed with a 2 byte length will // have a 4 byte header (0x00 0x00 0x00 0x01) after conversion, // and thus will grow by 2 bytes per chunk. if (max_size > SIZE_MAX - 10 * 2) { ALOGE("max sample size too big: %zu", max_size); return ERROR_MALFORMED; } AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_MAX_INPUT_SIZE, max_size + 10 * 2); } else { // No size was specified. Pick a conservatively large size. uint32_t width, height; if (!AMediaFormat_getInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_WIDTH, (int32_t*)&width) || !AMediaFormat_getInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_HEIGHT,(int32_t*) &height)) { ALOGE("No width or height, assuming worst case 1080p"); width = 1920; height = 1080; } else { // A resolution was specified, check that it's not too big. The values below // were chosen so that the calculations below don't cause overflows, they're // not indicating that resolutions up to 32kx32k are actually supported. 
if (width > 32768 || height > 32768) { ALOGE("can't support %u x %u video", width, height); return ERROR_MALFORMED; } } const char *mime; CHECK(AMediaFormat_getString(mLastTrack->meta, AMEDIAFORMAT_KEY_MIME, &mime)); if (!strncmp(mime, "audio/", 6)) { // for audio, use 128KB max_size = 1024 * 128; } else if (!strcmp(mime, MEDIA_MIMETYPE_VIDEO_AVC) || !strcmp(mime, MEDIA_MIMETYPE_VIDEO_HEVC) || !strcmp(mime, MEDIA_MIMETYPE_VIDEO_DOLBY_VISION)) { // AVC & HEVC requires compression ratio of at least 2, and uses // macroblocks max_size = ((width + 15) / 16) * ((height + 15) / 16) * 192; } else { // For all other formats there is no minimum compression // ratio. Use compression ratio of 1. max_size = width * height * 3 / 2; } // HACK: allow 10% overhead // TODO: read sample size from traf atom for fragmented MPEG4. max_size += max_size / 10; // 设定最大的buffer输入大小 AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_MAX_INPUT_SIZE, max_size); } // NOTE: setting another piece of metadata invalidates any pointers (such as the // mimetype) previously obtained, so don't cache them. const char *mime; CHECK(AMediaFormat_getString(mLastTrack->meta, AMEDIAFORMAT_KEY_MIME, &mime)); // Calculate average frame rate. 
if (!strncasecmp("video/", mime, 6)) { size_t nSamples = mLastTrack->sampleTable->countSamples(); if (nSamples == 0) { int32_t trackId; if (AMediaFormat_getInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_TRACK_ID, &trackId)) { for (size_t i = 0; i < mTrex.size(); i++) { Trex *t = &mTrex.editItemAt(i); if (t->track_ID == (uint32_t) trackId) { if (t->default_sample_duration > 0) { int32_t frameRate = mLastTrack->timescale / t->default_sample_duration; AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_FRAME_RATE, frameRate); } break; } } } } else { int64_t durationUs; if (AMediaFormat_getInt64(mLastTrack->meta, AMEDIAFORMAT_KEY_DURATION, &durationUs)) { if (durationUs > 0) { int32_t frameRate = (nSamples * 1000000LL + (durationUs >> 1)) / durationUs; // 给meta设置帧率 AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_FRAME_RATE, frameRate); } } ALOGV("setting frame count %zu", nSamples); // 给meta设置帧数量 AMediaFormat_setInt32(mLastTrack->meta, AMEDIAFORMAT_KEY_FRAME_COUNT, nSamples); } } break; } case FOURCC("stts"): { if ((mLastTrack == NULL) || (mLastTrack->sampleTable == NULL)) return ERROR_MALFORMED; *offset += chunk_size; if (depth >= 1 && mPath[depth - 1] != FOURCC("stbl")) { char chunk[5]; MakeFourCCString(mPath[depth - 1], chunk); ALOGW("stts's parent box (%s) is not stbl, skip it.", chunk); break; } status_t err = mLastTrack->sampleTable->setTimeToSampleParams( data_offset, chunk_data_size); if (err != OK) { return err; } break; } case FOURCC("ctts"): { if ((mLastTrack == NULL) || (mLastTrack->sampleTable == NULL)) return ERROR_MALFORMED; *offset += chunk_size; status_t err = mLastTrack->sampleTable->setCompositionTimeToSampleParams( data_offset, chunk_data_size); if (err != OK) { return err; } break; } case FOURCC("stss"): { if ((mLastTrack == NULL) || (mLastTrack->sampleTable == NULL)) return ERROR_MALFORMED; *offset += chunk_size; status_t err = mLastTrack->sampleTable->setSyncSampleParams( data_offset, chunk_data_size); if (err != OK) { return err; } 
            break;
        }
        // ......
        // If the child box of avc1 is avcC, the SPS and PPS can be parsed from it
        case FOURCC("avcC"):
        {
            *offset += chunk_size;

            auto buffer = heapbuffer<uint8_t>(chunk_data_size);

            if (buffer.get() == NULL) {
                ALOGE("b/28471206");
                return NO_MEMORY;
            }

            if (mDataSource->readAt(
                        data_offset, buffer.get(), chunk_data_size) < chunk_data_size) {
                return ERROR_IO;
            }

            if (mLastTrack == NULL)
                return ERROR_MALFORMED;

            // Store the bytes just read as the CSD buffer
            AMediaFormat_setBuffer(mLastTrack->meta,
                    AMEDIAFORMAT_KEY_CSD_AVC, buffer.get(), chunk_data_size);

            break;
        }
        case FOURCC("hvcC"):
        {
            auto buffer = heapbuffer<uint8_t>(chunk_data_size);

            if (buffer.get() == NULL) {
                ALOGE("b/28471206");
                return NO_MEMORY;
            }

            if (mDataSource->readAt(
                        data_offset, buffer.get(), chunk_data_size) < chunk_data_size) {
                return ERROR_IO;
            }

            if (mLastTrack == NULL)
                return ERROR_MALFORMED;

            // Likewise for HEVC: read the VPS, SPS and PPS as the CSD buffer and store it in meta
            AMediaFormat_setBuffer(mLastTrack->meta,
                    AMEDIAFORMAT_KEY_CSD_HEVC, buffer.get(), chunk_data_size);

            *offset += chunk_size;
            break;
        }
        case FOURCC("av1C"):
        {
            auto buffer = heapbuffer<uint8_t>(chunk_data_size);

            if (buffer.get() == NULL) {
                ALOGE("b/28471206");
                return NO_MEMORY;
            }

            if (mDataSource->readAt(
                        data_offset, buffer.get(), chunk_data_size) < chunk_data_size) {
                return ERROR_IO;
            }

            if (mLastTrack == NULL)
                return ERROR_MALFORMED;

            AMediaFormat_setBuffer(mLastTrack->meta,
                    AMEDIAFORMAT_KEY_CSD_0, buffer.get(), chunk_data_size);

            *offset += chunk_size;
            break;
        }
        // Dolby Vision configuration boxes
        case FOURCC("dvcC"):
        case FOURCC("dvvC"):
        {
            CHECK_EQ(chunk_data_size, 24);

            auto buffer = heapbuffer<uint8_t>(chunk_data_size);

            if (buffer.get() == NULL) {
                ALOGE("b/28471206");
                return NO_MEMORY;
            }

            if (mDataSource->readAt(data_offset, buffer.get(), chunk_data_size) < chunk_data_size) {
                return ERROR_IO;
            }

            if (mLastTrack == NULL)
                return ERROR_MALFORMED;

            AMediaFormat_setBuffer(mLastTrack->meta, AMEDIAFORMAT_KEY_CSD_2,
                    buffer.get(), chunk_data_size);
            AMediaFormat_setString(mLastTrack->meta, AMEDIAFORMAT_KEY_MIME,
                    MEDIA_MIMETYPE_VIDEO_DOLBY_VISION);

            *offset += chunk_size;
            break;
        }
        // ......
        // mvhd carries file-level metadata (timescale, duration, creation time)
        case FOURCC("mvhd"):
        {
            *offset += chunk_size;

            if (depth != 1) {
                ALOGE("mvhd: depth %d", depth);
                return ERROR_MALFORMED;
            }
            if (chunk_data_size < 32) {
                return ERROR_MALFORMED;
            }

            uint8_t header[32];
            if (mDataSource->readAt(
                        data_offset, header, sizeof(header))
                    < (ssize_t)sizeof(header)) {
                return ERROR_IO;
            }

            uint64_t creationTime;
            uint64_t duration = 0;
            if (header[0] == 1) {
                creationTime = U64_AT(&header[4]);
                mHeaderTimescale = U32_AT(&header[20]);
                duration = U64_AT(&header[24]);
                if (duration == 0xffffffffffffffff) {
                    duration = 0;
                }
            } else if (header[0] != 0) {
                return ERROR_MALFORMED;
            } else {
                creationTime = U32_AT(&header[4]);
                mHeaderTimescale = U32_AT(&header[12]);
                uint32_t d32 = U32_AT(&header[16]);
                if (d32 == 0xffffffff) {
                    d32 = 0;
                }
                duration = d32;
            }
            if (duration != 0 && mHeaderTimescale != 0 && duration < UINT64_MAX / 1000000) {
                AMediaFormat_setInt64(mFileMetaData,
                        AMEDIAFORMAT_KEY_DURATION, duration * 1000000 / mHeaderTimescale);
            }

            String8 s;
            if (convertTimeToDate(creationTime, &s)) {
                AMediaFormat_setString(mFileMetaData, AMEDIAFORMAT_KEY_DATE, s.string());
            }

            break;
        }
        // Set mMdatFound to true and skip past the box by adding chunk_size to *offset
        case FOURCC("mdat"):
        {
            mMdatFound = true;

            *offset += chunk_size;
            break;
        }
        // The handler_type in hdlr is not used as the MIME type here (apart from the
        // text/sbtl case below), although it should be usable to tell audio from video
        case FOURCC("hdlr"):
        {
            *offset += chunk_size;

            if (underQTMetaPath(mPath, 3)) {
                break;
            }

            uint32_t buffer;
            if (mDataSource->readAt(
                        data_offset + 8, &buffer, 4) < 4) {
                return ERROR_IO;
            }

            uint32_t type = ntohl(buffer);
            // For the 3GPP file format, the handler-type within the 'hdlr' box
            // shall be 'text'. We also want to support 'sbtl' handler type
            // for a practical reason as various MPEG4 containers use it.
            if (type == FOURCC("text") || type == FOURCC("sbtl")) {
                if (mLastTrack != NULL) {
                    AMediaFormat_setString(mLastTrack->meta,
                            AMEDIAFORMAT_KEY_MIME, MEDIA_MIMETYPE_TEXT_3GPP);
                }
            }

            break;
        }
        // ......
        // tx3g is the 3GPP timed-text (subtitle) sample entry, not a thumbnail box;
        // its contents are accumulated into the track's mTx3gBuffer
        case FOURCC("tx3g"):
        {
            if (mLastTrack == NULL)
                return ERROR_MALFORMED;

            // complain about ridiculous chunks
            if (chunk_size > kMaxAtomSize) {
                return ERROR_MALFORMED;
            }

            // complain about empty atoms
            if (chunk_data_size <= 0) {
                ALOGE("b/124330204");
                android_errorWriteLog(0x534e4554, "124330204");
                return ERROR_MALFORMED;
            }

            // should fill buffer based on "data_offset" and "chunk_data_size"
            // instead of *offset and chunk_size;
            // but we've been feeding the extra data to consumers for multiple releases and
            // if those apps are compensating for it, we'd break them with such a change
            //

            if (mLastTrack->mTx3gBuffer == NULL) {
                mLastTrack->mTx3gSize = 0;
                mLastTrack->mTx3gFilled = 0;
            }
            if (mLastTrack->mTx3gSize - mLastTrack->mTx3gFilled < chunk_size) {
                size_t growth = kTx3gGrowth;
                if (growth < chunk_size) {
                    growth = chunk_size;
                }
                // although this disallows 2 tx3g atoms of nearly kMaxAtomSize...
                if ((uint64_t) mLastTrack->mTx3gSize + growth > kMaxAtomSize) {
                    ALOGE("b/124330204 - too much space");
                    android_errorWriteLog(0x534e4554, "124330204");
                    return ERROR_MALFORMED;
                }
                uint8_t *updated = (uint8_t *)realloc(mLastTrack->mTx3gBuffer,
                        mLastTrack->mTx3gSize + growth);
                if (updated == NULL) {
                    return ERROR_MALFORMED;
                }
                mLastTrack->mTx3gBuffer = updated;
                mLastTrack->mTx3gSize += growth;
            }

            if ((size_t)(mDataSource->readAt(*offset,
                            mLastTrack->mTx3gBuffer + mLastTrack->mTx3gFilled,
                            chunk_size))
                    < chunk_size) {

                // advance read pointer so we don't end up reading this again
                *offset += chunk_size;
                return ERROR_IO;
            }

            mLastTrack->mTx3gFilled += chunk_size;
            *offset += chunk_size;
            break;
        }
        case FOURCC("ac-3"):
        {
            *offset += chunk_size;
            // bypass ac-3 if parse fail
            if (parseAC3SpecificBox(data_offset) != OK) {
                if (mLastTrack != NULL) {
                    ALOGW("Fail to parse ac-3");
                    mLastTrack->skipTrack = true;
                }
            }
            return OK;
        }
        case FOURCC("ec-3"):
        {
            *offset += chunk_size;
            // bypass ec-3 if parse fail
            if (parseEAC3SpecificBox(data_offset) != OK) {
                if (mLastTrack != NULL) {
                    ALOGW("Fail to parse ec-3");
                    mLastTrack->skipTrack = true;
                }
            }
            return OK;
        }
        case FOURCC("ac-4"):
        {
            *offset += chunk_size;
            // bypass ac-4 if parse fail
            if (parseAC4SpecificBox(data_offset) != OK) {
                if (mLastTrack != NULL) {
                    ALOGW("Fail to parse ac-4");
                    mLastTrack->skipTrack = true;
                }
            }
            return OK;
        }
        case FOURCC("ftyp"):
        {
            if (chunk_data_size < 8 || depth != 0) {
                return ERROR_MALFORMED;
            }

            off64_t stop_offset = *offset + chunk_size;
            uint32_t numCompatibleBrands = (chunk_data_size - 8) / 4;
            std::set<uint32_t> brandSet;
            for (size_t i = 0; i < numCompatibleBrands + 2; ++i) {
                if (i == 1) {
                    // Skip this index, it refers to the minorVersion,
                    // not a brand.
                    continue;
                }

                uint32_t brand;
                if (mDataSource->readAt(data_offset + 4 * i, &brand, 4) < 4) {
                    return ERROR_MALFORMED;
                }

                brand = ntohl(brand);
                brandSet.insert(brand);
            }

            if (brandSet.count(FOURCC("qt  ")) > 0) {
                mIsQT = true;
            } else {
                if (brandSet.count(FOURCC("mif1")) > 0
                        && brandSet.count(FOURCC("heic")) > 0) {
                    ALOGV("identified HEIF image");

                    mIsHeif = true;
                    brandSet.erase(FOURCC("mif1"));
                    brandSet.erase(FOURCC("heic"));
                }

                if (!brandSet.empty()) {
                    // This means that the file should have moov box.
                    // It could be any iso files (mp4, heifs, etc.)
                    mHasMoovBox = true;
                    if (mIsHeif) {
                        ALOGV("identified HEIF image with other tracks");
                    }
                }
            }

            *offset = stop_offset;

            break;
        }
        default:
        {
            // check if we're parsing 'ilst' for meta keys
            // if so, treat type as a number (key-id).
            if (underQTMetaPath(mPath, 3)) {
                status_t err = parseQTMetaVal(chunk_type, data_offset, chunk_data_size);
                if (err != OK) {
                    return err;
                }
            }

            *offset += chunk_size;
            break;
        }
    }

    return OK;
}
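To make the mvhd case easier to follow, here is a minimal standalone sketch of the same version check and duration conversion, operating on an in-memory copy of the box payload. The helper names (readU32, readU64, mvhdDurationUs) are hypothetical and not part of MPEG4Extractor; only the field offsets and the overflow guard are taken from the excerpt above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Box fields are big-endian (network byte order).
static uint32_t readU32(const uint8_t *p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

static uint64_t readU64(const uint8_t *p) {
    return (uint64_t(readU32(p)) << 32) | readU32(p + 4);
}

// Hypothetical helper: computes the movie duration in microseconds from the
// first 32 bytes of an mvhd payload (the bytes that follow the box header).
// Returns false if the payload is malformed or the duration is unknown.
bool mvhdDurationUs(const uint8_t *payload, size_t size, int64_t *outUs) {
    if (size < 32) return false;

    uint32_t timescale = 0;
    uint64_t duration = 0;
    if (payload[0] == 1) {
        // Version 1: 64-bit creation/modification times, 64-bit duration.
        timescale = readU32(payload + 20);
        duration = readU64(payload + 24);
        if (duration == UINT64_MAX) duration = 0;  // all ones means "unknown"
    } else if (payload[0] == 0) {
        // Version 0: 32-bit creation/modification times, 32-bit duration.
        timescale = readU32(payload + 12);
        uint32_t d32 = readU32(payload + 16);
        duration = (d32 == UINT32_MAX) ? 0 : d32;
    } else {
        return false;  // unsupported version
    }

    // Same guard as the excerpt: avoid overflow in duration * 1000000.
    if (duration == 0 || timescale == 0 || duration >= UINT64_MAX / 1000000) {
        return false;
    }
    *outUs = static_cast<int64_t>(duration * 1000000 / timescale);
    return true;
}
```

With a timescale of 1000 and a duration of 5000 ticks, the helper reports 5,000,000 microseconds, i.e. a five-second file.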
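SampleTable holds a DataSource. When boxes such as stts and stss are parsed, the data offset and size of each box are passed to the SampleTable, which uses them to initialize itself.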
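The idea can be sketched with a toy table that is handed the location of an stts payload and then answers timing queries. This is only an illustration under simplifying assumptions: MiniSampleTable, setSttsRange and decodeTime are hypothetical names, the "data source" is an in-memory byte vector, and the real SampleTable is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SttsEntry {
    uint32_t sampleCount;  // number of consecutive samples sharing one delta
    uint32_t sampleDelta;  // duration of each of those samples, in ticks
};

// Hypothetical, simplified stand-in for a sample table.
class MiniSampleTable {
public:
    explicit MiniSampleTable(const std::vector<uint8_t> &file) : mFile(file) {}

    // Called by the parser when it meets an stts box; dataOffset/dataSize
    // describe the payload (version/flags, entry_count, then the entries).
    bool setSttsRange(size_t dataOffset, size_t dataSize) {
        if (dataSize < 8 || dataOffset > mFile.size() ||
            dataSize > mFile.size() - dataOffset) {
            return false;
        }
        const uint8_t *p = mFile.data() + dataOffset;
        uint32_t count = readU32(p + 4);
        if ((dataSize - 8) / 8 < count) return false;  // entries must fit

        mStts.clear();
        for (uint32_t i = 0; i < count; ++i) {
            const uint8_t *e = p + 8 + 8 * size_t(i);
            mStts.push_back(SttsEntry{readU32(e), readU32(e + 4)});
        }
        return true;
    }

    // Decode time of the sample at sampleIndex, in track timescale ticks.
    bool decodeTime(uint32_t sampleIndex, uint64_t *ticks) const {
        uint64_t time = 0;
        for (const SttsEntry &e : mStts) {
            if (sampleIndex < e.sampleCount) {
                *ticks = time + uint64_t(sampleIndex) * e.sampleDelta;
                return true;
            }
            time += uint64_t(e.sampleCount) * e.sampleDelta;
            sampleIndex -= e.sampleCount;
        }
        return false;  // index is past the last sample
    }

private:
    static uint32_t readU32(const uint8_t *p) {
        return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
               (uint32_t(p[2]) << 8) | uint32_t(p[3]);
    }

    std::vector<uint8_t> mFile;  // copy of the file bytes (toy data source)
    std::vector<SttsEntry> mStts;
};
```

For an stts with the entries (3 samples, delta 1000) and (2 samples, delta 500), sample 4 decodes at 3 * 1000 + 1 * 500 = 3500 ticks; dividing by the timescale from mdhd gives the time in seconds.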
MPEG4Extractor::getTrack
MediaTrackHelper *MPEG4Extractor::getTrack(size_t index) {
    status_t err;
    if ((err = readMetaData()) != OK) {
        return NULL;
    }

    // Walk the track list to the entry at position index
    Track *track = mFirstTrack;
    while (index > 0) {
        if (track == NULL) {
            return NULL;
        }

        track = track->next;
        --index;
    }

    if (track == NULL) {
        return NULL;
    }

    // Look up the trex entry that matches the track ID
    Trex *trex = NULL;
    int32_t trackId;
    if (AMediaFormat_getInt32(track->meta, AMEDIAFORMAT_KEY_TRACK_ID, &trackId)) {
        for (size_t i = 0; i < mTrex.size(); i++) {
            Trex *t = &mTrex.editItemAt(i);
            if (t->track_ID == (uint32_t) trackId) {
                trex = t;
                break;
            }
        }
    } else {
        ALOGE("b/21657957");
        return NULL;
    }

    ALOGV("getTrack called, pssh: %zu", mPssh.size());

    // Check the MIME type
    const char *mime;
    if (!AMediaFormat_getString(track->meta, AMEDIAFORMAT_KEY_MIME, &mime)) {
        return NULL;
    }

    sp<ItemTable> itemTable;
    // For AVC, the CSD buffer must be validated
    if (!strcasecmp(mime, MEDIA_MIMETYPE_VIDEO_AVC)) {
        void *data;
        size_t size;
        if (!AMediaFormat_getBuffer(track->meta, AMEDIAFORMAT_KEY_CSD_AVC, &data, &size)) {
            return NULL;
        }

        const uint8_t *ptr = (const uint8_t *)data;
        // Read the CSD buffer and check the configurationVersion value
        if (size < 7 || ptr[0] != 1) {  // configurationVersion == 1
            return NULL;
        }
    } else if (!strcasecmp(mime, MEDIA_MIMETYPE_VIDEO_HEVC)
            || !strcasecmp(mime, MEDIA_MIMETYPE_IMAGE_ANDROID_HEIC)) {
        void *data;
        size_t size;
        if (!AMediaFormat_getBuffer(track->meta, AMEDIAFORMAT_KEY_CSD_HEVC, &data, &size)) {
            return NULL;
        }

        const uint8_t *ptr = (const uint8_t *)data;

        if (size < 22 || ptr[0] != 1) {  // configurationVersion == 1
            return NULL;
        }
        if (!strcasecmp(mime, MEDIA_MIMETYPE_IMAGE_ANDROID_HEIC)) {
            itemTable = mItemTable;
        }
    } else if (!strcasecmp(mime, MEDIA_MIMETYPE_VIDEO_DOLBY_VISION)) {
        void *data;
        size_t size;

        if (!AMediaFormat_getBuffer(track->meta, AMEDIAFORMAT_KEY_CSD_2, &data, &size)) {
            return NULL;
        }

        const uint8_t *ptr = (const uint8_t *)data;

        // dv_major.dv_minor Should be 1.0 or 2.1
        if (size != 24 || ((ptr[0] != 1 || ptr[1] != 0) && (ptr[0] != 2 || ptr[1] != 1))) {
            return NULL;
        }
    } else if (!strcasecmp(mime, MEDIA_MIMETYPE_VIDEO_AV1)) {
        void *data;
        size_t size;
        if (!AMediaFormat_getBuffer(track->meta, AMEDIAFORMAT_KEY_CSD_0, &data, &size)) {
            return NULL;
        }

        const uint8_t *ptr = (const uint8_t *)data;

        if (size < 5 || ptr[0] != 0x81) {  // configurationVersion == 1
            return NULL;
        }
    }

    ALOGV("track->elst_shift_start_ticks :%" PRIu64, track->elst_shift_start_ticks);

    uint64_t elst_initial_empty_edit_ticks = 0;
    if (mHeaderTimescale != 0) {
        // Convert empty_edit_ticks from movie timescale to media timescale.
        uint64_t elst_initial_empty_edit_ticks_mul = 0, elst_initial_empty_edit_ticks_add = 0;
        if (__builtin_mul_overflow(track->elst_initial_empty_edit_ticks, track->timescale,
                                   &elst_initial_empty_edit_ticks_mul) ||
            __builtin_add_overflow(elst_initial_empty_edit_ticks_mul, (mHeaderTimescale / 2),
                                   &elst_initial_empty_edit_ticks_add)) {
            ALOGE("track->elst_initial_empty_edit_ticks overflow");
            return nullptr;
        }
        elst_initial_empty_edit_ticks = elst_initial_empty_edit_ticks_add / mHeaderTimescale;
    }
    ALOGV("elst_initial_empty_edit_ticks in MediaTimeScale :%" PRIu64,
          elst_initial_empty_edit_ticks);

    // Create the MPEG4Source for this track and return it
    MPEG4Source* source =
            new MPEG4Source(track->meta, mDataSource, track->timescale, track->sampleTable,
                            mSidxEntries, trex, mMoofOffset, itemTable,
                            track->elst_shift_start_ticks, elst_initial_empty_edit_ticks);
    if (source->init() != OK) {
        delete source;
        return NULL;
    }
    return source;
}
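The conversion of the initial empty edit from the movie timescale (mvhd) to the media timescale (mdhd) near the end of getTrack can be isolated as follows. This is a sketch: movieToMediaTicks is a hypothetical name, and it relies on the same GCC/Clang overflow builtins that the excerpt uses. Unlike the excerpt, which leaves the value at 0 when the movie timescale is 0, this helper reports failure in that case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts ticks expressed in the movie timescale into
// ticks in the media (track) timescale, rounding to the nearest tick.
// Returns false on a zero movie timescale or on 64-bit overflow.
bool movieToMediaTicks(uint64_t movieTicks, uint32_t mediaTimescale,
                       uint32_t movieTimescale, uint64_t *mediaTicks) {
    if (movieTimescale == 0) return false;

    uint64_t product = 0;
    uint64_t rounded = 0;
    // product = movieTicks * mediaTimescale, checked for overflow.
    if (__builtin_mul_overflow(movieTicks, (uint64_t)mediaTimescale, &product)) {
        return false;
    }
    // Adding half the divisor before dividing rounds to nearest.
    if (__builtin_add_overflow(product, (uint64_t)(movieTimescale / 2), &rounded)) {
        return false;
    }
    *mediaTicks = rounded / movieTimescale;
    return true;
}
```

For example, one second of empty edit expressed as 1000 ticks in a movie timescale of 1000 becomes 48000 ticks for an audio track whose media timescale is 48000.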
MPEG4Source::read
media_status_t MPEG4Source::read(
        MediaBufferHelper **out, const ReadOptions *options) {
    Mutex::Autolock autoLock(mLock);

    CHECK(mStarted);

    if (options != nullptr && options->getNonBlocking() && !mBufferGroup->has_buffers()) {
        *out = nullptr;
        return AMEDIA_ERROR_WOULD_BLOCK;
    }

    if (mFirstMoofOffset > 0) {
        return fragmentedRead(out, options);
    }

    *out = NULL;

    int64_t targetSampleTimeUs = -1;

    int64_t seekTimeUs;
    ReadOptions::SeekMode mode;
    // Handle a seek request
    if (options && options->getSeekTo(&seekTimeUs, &mode)) {
        ALOGV("seekTimeUs:%" PRId64, seekTimeUs);
        if (mIsHeif) {
            CHECK(mSampleTable == NULL);
            CHECK(mItemTable != NULL);
            int32_t imageIndex;
            if (!AMediaFormat_getInt32(mFormat, AMEDIAFORMAT_KEY_TRACK_ID, &imageIndex)) {
                return AMEDIA_ERROR_MALFORMED;
            }

            status_t err;
            if (seekTimeUs >= 0) {
                err = mItemTable->findImageItem(imageIndex, &mCurrentSampleIndex);
            } else {
                err = mItemTable->findThumbnailItem(imageIndex, &mCurrentSampleIndex);
            }
            if (err != OK) {
                return AMEDIA_ERROR_UNKNOWN;
            }
        } else {
            // Map the seek mode onto SampleTable find flags
            uint32_t findFlags = 0;
            switch (mode) {
                case ReadOptions::SEEK_PREVIOUS_SYNC:
                    findFlags = SampleTable::kFlagBefore;
                    break;
                case ReadOptions::SEEK_NEXT_SYNC:
                    findFlags = SampleTable::kFlagAfter;
                    break;
                case ReadOptions::SEEK_CLOSEST_SYNC:
                case ReadOptions::SEEK_CLOSEST:
                    findFlags = SampleTable::kFlagClosest;
                    break;
                case ReadOptions::SEEK_FRAME_INDEX:
                    findFlags = SampleTable::kFlagFrameIndex;
                    break;
                default:
                    CHECK(!"Should not be here.");
                    break;
            }
            if (mode != ReadOptions::SEEK_FRAME_INDEX) {
                int64_t elstInitialEmptyEditUs = 0, elstShiftStartUs = 0;
                if (mElstInitialEmptyEditTicks > 0) {
                    elstInitialEmptyEditUs = ((long double)mElstInitialEmptyEditTicks * 1000000) /
                            mTimescale;
                    /* Sample's composition time from ctts/stts entries are non-negative(>=0).
                     * Hence, lower bound on seekTimeUs is 0.
                     */
                    seekTimeUs = std::max(seekTimeUs - elstInitialEmptyEditUs, (int64_t)0);
                }
                if (mElstShiftStartTicks > 0) {
                    elstShiftStartUs = ((long double)mElstShiftStartTicks * 1000000) / mTimescale;
                    seekTimeUs += elstShiftStartUs;
                }
                ALOGV("shifted seekTimeUs:%" PRId64 ", elstInitialEmptyEditUs:%" PRIu64
                      ", elstShiftStartUs:%" PRIu64, seekTimeUs, elstInitialEmptyEditUs,
                      elstShiftStartUs);
            }

            uint32_t sampleIndex;
            // Call SampleTable::findSampleAtTime to locate the target sample index
            // according to the seek mode
            status_t err = mSampleTable->findSampleAtTime(
                    seekTimeUs, 1000000, mTimescale,
                    &sampleIndex, findFlags);

            if (mode == ReadOptions::SEEK_CLOSEST
                    || mode == ReadOptions::SEEK_FRAME_INDEX) {
                // We found the closest sample already, now we want the sync
                // sample preceding it (or the sample itself of course), even
                // if the subsequent sync sample is closer.
                findFlags = SampleTable::kFlagBefore;
            }

            uint32_t syncSampleIndex = sampleIndex;
            // assume every non-USAC audio sample is a sync sample. This works around
            // seek issues with files that were incorrectly written with an
            // empty or single-sample stss block for the audio track
            if (err == OK && (!mIsAudio || mIsUsac)) {
                err = mSampleTable->findSyncSampleNear(
                        sampleIndex, &syncSampleIndex, findFlags);
            }

            // Fetch the composition time of the sample that was found
            // (offset and size are not requested here)
            uint64_t sampleTime;
            if (err == OK) {
                err = mSampleTable->getMetaDataForSample(
                        sampleIndex, NULL, NULL, &sampleTime);
            }

            if (err != OK) {
                if (err == ERROR_OUT_OF_RANGE) {
                    // An attempt to seek past the end of the stream would
                    // normally cause this ERROR_OUT_OF_RANGE error. Propagating
                    // this all the way to the MediaPlayer would cause abnormal
                    // termination. Legacy behaviour appears to be to behave as if
                    // we had seeked to the end of stream, ending normally.
                    return AMEDIA_ERROR_END_OF_STREAM;
                }
                ALOGV("end of stream");
                return AMEDIA_ERROR_UNKNOWN;
            }

            if (mode == ReadOptions::SEEK_CLOSEST
                    || mode == ReadOptions::SEEK_FRAME_INDEX) {
                if (mElstInitialEmptyEditTicks > 0) {
                    sampleTime += mElstInitialEmptyEditTicks;
                }
                if (mElstShiftStartTicks > 0) {
                    if (sampleTime > mElstShiftStartTicks) {
                        sampleTime -= mElstShiftStartTicks;
                    } else {
                        sampleTime = 0;
                    }
                }
                targetSampleTimeUs = (sampleTime * 1000000ll) / mTimescale;
            }

            // Resume reading from the sync sample
            mCurrentSampleIndex = syncSampleIndex;
        }

        if (mBuffer != NULL) {
            mBuffer->release();
            mBuffer = NULL;
        }

        // fall through
    }

    off64_t offset = 0;
    size_t size = 0;
    int64_t cts;
    uint64_t stts;
    bool isSyncSample;
    bool newBuffer = false;
    if (mBuffer == NULL) {
        newBuffer = true;

        status_t err;
        if (!mIsHeif) {
            // Look up the offset, size, composition time, sync flag and duration
            // of the current sample
            err = mSampleTable->getMetaDataForSample(mCurrentSampleIndex, &offset, &size,
                                                     (uint64_t*)&cts, &isSyncSample, &stts);
            if (err == OK) {
                if (mElstInitialEmptyEditTicks > 0) {
                    cts += mElstInitialEmptyEditTicks;
                }
                // Apply the edit-list start shift to the composition time
                if (mElstShiftStartTicks > 0) {
                    // cts can be negative. for example, initial audio samples for gapless playback.
                    cts -= (int64_t)mElstShiftStartTicks;
                }
            }
        } else {
            err = mItemTable->getImageOffsetAndSize(
                    options && options->getSeekTo(&seekTimeUs, &mode) ?
                            &mCurrentSampleIndex : NULL, &offset, &size);

            cts = stts = 0;
            isSyncSample = 0;
            ALOGV("image offset %lld, size %zu", (long long)offset, size);
        }

        if (err != OK) {
            if (err == ERROR_END_OF_STREAM) {
                return AMEDIA_ERROR_END_OF_STREAM;
            }
            return AMEDIA_ERROR_UNKNOWN;
        }

        // Acquire an output buffer from the buffer group (a pool of reusable buffers)
        err = mBufferGroup->acquire_buffer(&mBuffer);

        if (err != OK) {
            CHECK(mBuffer == NULL);
            return AMEDIA_ERROR_UNKNOWN;
        }
        if (size > mBuffer->size()) {
            ALOGE("buffer too small: %zu > %zu", size, mBuffer->size());
            mBuffer->release();
            mBuffer = NULL;
            return AMEDIA_ERROR_UNKNOWN; // ERROR_BUFFER_TOO_SMALL
        }
    }
    // ......
    // AVC/HEVC data: convert the sample and return it to the caller
    else {
        // Whole NAL units are returned but each fragment is prefixed by
        // the start code (0x00 00 00 01).
        ssize_t num_bytes_read = 0;
        bool mSrcBufferFitsDataToRead = size <= mSrcBufferSize;
        if (mSrcBufferFitsDataToRead) {
            // Read the whole sample into mSrcBuffer
            num_bytes_read = mDataSource->readAt(offset, mSrcBuffer, size);
        } else {
            // We are trying to read a sample larger than the expected max sample size.
            // Fall through and let the failure be handled by the following if.
            android_errorWriteLog(0x534e4554, "188893559");
        }

        if (num_bytes_read < (ssize_t)size) {
            mBuffer->release();
            mBuffer = NULL;
            return mSrcBufferFitsDataToRead ? AMEDIA_ERROR_IO : AMEDIA_ERROR_MALFORMED;
        }

        uint8_t *dstData = (uint8_t *)mBuffer->data();
        size_t srcOffset = 0;
        size_t dstOffset = 0;
        // A sample can consist of several NAL units, each prefixed by its length.
        // Walk them, validate each length, and replace the length prefix with a start code.
        while (srcOffset < size) {
            bool isMalFormed = !isInRange((size_t)0u, size, srcOffset, mNALLengthSize);
            size_t nalLength = 0;
            if (!isMalFormed) {
                nalLength = parseNALSize(&mSrcBuffer[srcOffset]);
                srcOffset += mNALLengthSize;
                isMalFormed = !isInRange((size_t)0u, size, srcOffset, nalLength);
            }

            if (isMalFormed) {
                //if nallength abnormal,ignore it.
                ALOGW("abnormal nallength, ignore this NAL");
                srcOffset = size;
                break;
            }

            if (nalLength == 0) {
                continue;
            }

            if (dstOffset > SIZE_MAX - 4 ||
                    dstOffset + 4 > SIZE_MAX - nalLength ||
                    dstOffset + 4 + nalLength > mBuffer->size()) {
                ALOGE("b/27208621 : %zu %zu", dstOffset, mBuffer->size());
                android_errorWriteLog(0x534e4554, "27208621");
                mBuffer->release();
                mBuffer = NULL;
                return AMEDIA_ERROR_MALFORMED;
            }

            // Prefix the AVC/HEVC NAL unit with the start code 0x00 00 00 01
            dstData[dstOffset++] = 0;
            dstData[dstOffset++] = 0;
            dstData[dstOffset++] = 0;
            dstData[dstOffset++] = 1;
            memcpy(&dstData[dstOffset],
                   &mSrcBuffer[srcOffset],
                   nalLength);
            srcOffset += nalLength;
            dstOffset += nalLength;
        }
        CHECK_EQ(srcOffset, size);
        CHECK(mBuffer != NULL);
        mBuffer->set_range(0, dstOffset);

        // Set the PTS and duration of the frame that was just read
        AMediaFormat *meta = mBuffer->meta_data();
        AMediaFormat_clear(meta);
        AMediaFormat_setInt64(
                meta, AMEDIAFORMAT_KEY_TIME_US, ((long double)cts * 1000000) / mTimescale);
        AMediaFormat_setInt64(
                meta, AMEDIAFORMAT_KEY_DURATION, ((long double)stts * 1000000) / mTimescale);

        if (targetSampleTimeUs >= 0) {
            AMediaFormat_setInt64(
                    meta, AMEDIAFORMAT_KEY_TARGET_TIME, targetSampleTimeUs);
        }

        if (mIsAVC) {
            uint32_t layerId = FindAVCLayerId(
                    (const uint8_t *)mBuffer->data(), mBuffer->range_length());
            AMediaFormat_setInt32(meta, AMEDIAFORMAT_KEY_TEMPORAL_LAYER_ID, layerId);
        } else if (mIsHEVC) {
            int32_t layerId = parseHEVCLayerId(
                    (const uint8_t *)mBuffer->data(), mBuffer->range_length());
            if (layerId >= 0) {
                AMediaFormat_setInt32(meta, AMEDIAFORMAT_KEY_TEMPORAL_LAYER_ID, layerId);
            }
        }

        if (isSyncSample) {
            AMediaFormat_setInt32(meta, AMEDIAFORMAT_KEY_IS_SYNC_FRAME, 1);
        }

        // Advance to the next sample
        ++mCurrentSampleIndex;

        // Hand the buffer to the caller
        *out = mBuffer;
        mBuffer = NULL;

        return AMEDIA_OK;
    }
}
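The NAL unit loop in MPEG4Source::read rewrites each length-prefixed NAL unit stored in the MP4 sample into start-code-prefixed form. A minimal standalone sketch of that conversion is shown below. lengthPrefixedToAnnexB is a hypothetical helper working on std::vector rather than on MediaBufferHelper, and it rejects a malformed length outright, whereas the excerpt logs a warning and keeps the units converted so far.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts one sample made of length-prefixed NAL units
// into start-code-prefixed NAL units. nalLengthSize is the size in bytes of
// each big-endian length field (1 to 4).
bool lengthPrefixedToAnnexB(const std::vector<uint8_t> &src, size_t nalLengthSize,
                            std::vector<uint8_t> *dst) {
    static const uint8_t kStartCode[4] = {0, 0, 0, 1};
    if (nalLengthSize < 1 || nalLengthSize > 4) return false;

    dst->clear();
    size_t pos = 0;
    while (pos < src.size()) {
        // The length field itself must fit in the remaining bytes.
        if (src.size() - pos < nalLengthSize) return false;

        size_t nalLength = 0;
        for (size_t i = 0; i < nalLengthSize; ++i) {
            nalLength = (nalLength << 8) | src[pos + i];
        }
        pos += nalLengthSize;

        // The NAL payload must fit as well.
        if (nalLength > src.size() - pos) return false;
        if (nalLength == 0) continue;  // skip empty NAL units

        dst->insert(dst->end(), kStartCode, kStartCode + 4);
        dst->insert(dst->end(), src.begin() + pos, src.begin() + pos + nalLength);
        pos += nalLength;
    }
    return true;
}
```

With 4-byte length fields the output has the same size as the input, because each length prefix is replaced by a 4-byte start code; with shorter length fields the output grows by the difference for every NAL unit.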