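Supporting DXVA 2.0 in DirectShow
I have been working on DXVA2 hardware acceleration for the past few days and could find very little material, so I translated two of Microsoft's related documents. I also plan to write up how to implement DXVA2 with ffmpeg, which will be the third post. This is the second post. Original English page: https://msdn.microsoft.com/en-us/library/aa965245(v=vs.85).aspx
The first post is a translation of the Direct3D device manager topic: http://www.cnblogs.com/betterwgo/p/6124588.html
This topic describes how to support DirectX Video Acceleration (DXVA) 2.0 in a DirectShow decoder filter. Specifically, it describes the communication between the decoder and the video renderer. It does not describe how to implement DXVA decoding.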
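1. Prerequisites
This topic assumes that you are familiar with writing DirectShow filters. For more information, see the topic Writing DirectShow Filters in the DirectShow SDK documentation (https://msdn.microsoft.com/en-us/library/dd391013(v=vs.85).aspx ). The code examples assume that the decoder filter derives from the CTransformFilter class and is declared as follows: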
class CDecoder : public CTransformFilter
{
public:
    static CUnknown* WINAPI CreateInstance(IUnknown *pUnk, HRESULT *pHr);

    HRESULT CompleteConnect(PIN_DIRECTION direction, IPin *pPin);

    HRESULT InitAllocator(IMemAllocator **ppAlloc);
    HRESULT DecideBufferSize(IMemAllocator *pAlloc, ALLOCATOR_PROPERTIES *pProp);

    // TODO: The implementations of these methods depend on the specific decoder.
    HRESULT CheckInputType(const CMediaType *mtIn);
    HRESULT CheckTransform(const CMediaType *mtIn, const CMediaType *mtOut);
    // Declared without the CTransformFilter:: qualifier, which is not valid
    // inside the CDecoder class body.
    HRESULT GetMediaType(int iPosition, CMediaType *pMediaType);

private:
    CDecoder(HRESULT *pHr);
    ~CDecoder();

    CBasePin * GetPin(int n);

    HRESULT ConfigureDXVA2(IPin *pPin);
    HRESULT SetEVRForDXVA2(IPin *pPin);

    HRESULT FindDecoderConfiguration(
        /* [in] */  IDirectXVideoDecoderService *pDecoderService,
        /* [in] */  const GUID& guidDecoder,
        /* [out] */ DXVA2_ConfigPictureDecode *pSelectedConfig,
        /* [out] */ BOOL *pbFoundDXVA2Configuration
        );

private:
    IDirectXVideoDecoderService *m_pDecoderService;

    DXVA2_ConfigPictureDecode m_DecoderConfig;
    GUID                      m_DecoderGuid;
    HANDLE                    m_hDevice;

    FOURCC                    m_fccOutputFormat;
};
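In this topic, the term decoder refers to the decoder filter, which receives compressed video and outputs uncompressed video. The term decoder device refers to a hardware video accelerator implemented by the graphics driver.
To support DXVA 2.0, a decoder must perform the following basic steps:
(1) Negotiate a media type. (My understanding: the format of the source stream, for example as reported by ffmpeg, has to be mapped to the corresponding format on the DXVA2 side.)
(2) Find a DXVA decoder configuration.
(3) Notify the video renderer that the decoder is using DXVA decoding.
(4) Provide a custom allocator that allocates Direct3D surfaces.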
2. Migration Notes
If you are migrating from DXVA 1.0 to DXVA 2.0, be aware of the following significant differences between the two versions:
(1) DXVA 2.0 does not use the IAMVideoAccelerator and IAMVideoAcceleratorNotify interfaces, because the decoder can access the DXVA 2.0 APIs directly through the IDirectXVideoDecoder interface.
(2) During media type negotiation, the decoder does not use a video acceleration GUID as the subtype. Instead, the subtype is simply the uncompressed video format (such as NV12), as with software decoding.
(3) The procedure for configuring the accelerator has changed. In DXVA 1.0, the decoder calls Execute with a DXVA_ConfigPictureDecode structure to configure the accelerator. In DXVA 2.0, the decoder uses the IDirectXVideoDecoderService interface, as described in the next section.
(4) The decoder allocates the uncompressed buffers. The video renderer no longer does this.
(5) Instead of calling IAMVideoAccelerator::DisplayFrame to display a decoded frame, the decoder delivers the frame to the renderer by calling IMemInputPin::Receive, as with software decoding.
(6) The decoder is no longer responsible for checking when data buffers are safe for updates. Therefore, DXVA 2.0 has no method equivalent to IAMVideoAccelerator::QueryRenderStatus.
(7) Subpicture blending is done by the video renderer, using the DXVA 2.0 video processor APIs. Decoders that provide subpictures (for example, DVD decoders) should send subpicture data on a separate output pin.
For decoding operations, DXVA 2.0 uses the same data structures as DXVA 1.0. (My understanding: "data structures" here means the structs that carry the decoding data.)
The enhanced video renderer (EVR) filter supports DXVA 2.0. The Video Mixing Renderer filters (VMR-7 and VMR-9) support DXVA 1.0 only.
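3. Finding a Decoder Configuration
After the decoder negotiates the output media type, it must find a compatible configuration for the DXVA decoder device. You can perform this step inside the output pin's CBaseOutputPin::CompleteConnect method. This step ensures that the graphics driver supports the capabilities needed by the decoder, before the decoder commits to using DXVA.
To find a configuration for the decoder device, do the following:
1) Query the renderer's input pin for the IMFGetService interface.
2) Call IMFGetService::GetService to get a pointer to the IDirect3DDeviceManager9 interface. The service GUID is MR_VIDEO_ACCELERATION_SERVICE.
3) Call IDirect3DDeviceManager9::OpenDeviceHandle to get a handle to the renderer's Direct3D device.
4) Call IDirect3DDeviceManager9::GetVideoService and pass in the device handle. This method returns a pointer to the IDirectXVideoDecoderService interface.
5) Call IDirectXVideoDecoderService::GetDecoderDeviceGuids. This method returns an array of decoder device GUIDs.
6) Loop through the array of decoder GUIDs to find the ones that the decoder filter supports. For example, for an MPEG-2 decoder you would look for DXVA2_ModeMPEG2_MOCOMP, DXVA2_ModeMPEG2_IDCT, or DXVA2_ModeMPEG2_VLD.
7) When you find a candidate decoder device GUID, pass the GUID to the IDirectXVideoDecoderService::GetDecoderRenderTargets method. This method returns an array of render target formats, specified as D3DFORMAT values.
8) Loop through the render target formats and look for one that matches your output format. Typically, a decoder device supports a single render target format. The decoder filter should connect to the renderer using this subtype. In the first call to CompleteConnect, the decoder can determine the render target format and then return this format as a preferred output type.
9) Call IDirectXVideoDecoderService::GetDecoderConfigurations. Pass in the same decoder device GUID, along with a DXVA2_VideoDesc structure that describes the proposed format. The method returns an array of DXVA2_ConfigPictureDecode structures. Each structure describes one possible configuration for the decoder device.
10) Assuming that the previous steps are successful, store the Direct3D device handle, the decoder device GUID, and the configuration structure. The filter will use this information to create the decoder device.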
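Step 8 compares D3DFORMAT values with the filter's output format. For YUV formats such as NV12, the D3DFORMAT value is the FOURCC code itself, which is why the sample code can cast m_fccOutputFormat to D3DFORMAT. The following is only a portable sketch of that comparison: MakeFourCC and FindRenderTarget are hypothetical helper names, and plain 32-bit integers stand in for D3DFORMAT so that the snippet compiles without the Windows SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packs four characters into a little-endian FOURCC, the same layout that
// the Windows MAKEFOURCC macro produces.
constexpr std::uint32_t MakeFourCC(char a, char b, char c, char d)
{
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Returns the index of the first render target format equal to `wanted`,
// or -1 if the decoder device does not offer it.
// `formats` stands in for the array returned by GetDecoderRenderTargets.
int FindRenderTarget(const std::uint32_t *formats, std::size_t count,
                     std::uint32_t wanted)
{
    assert(formats != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        if (formats[i] == wanted) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

In a real filter the array would come from the driver and would have to be released with CoTaskMemFree after the search.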
The following code shows how to find a decoder configuration:
HRESULT CDecoder::ConfigureDXVA2(IPin *pPin)
{
    UINT    cDecoderGuids = 0;
    BOOL    bFoundDXVA2Configuration = FALSE;
    GUID    guidDecoder = GUID_NULL;

    DXVA2_ConfigPictureDecode config;
    ZeroMemory(&config, sizeof(config));

    // Variables that follow must be cleaned up at the end.
    IMFGetService               *pGetService = NULL;
    IDirect3DDeviceManager9     *pDeviceManager = NULL;
    IDirectXVideoDecoderService *pDecoderService = NULL;

    GUID   *pDecoderGuids = NULL; // size = cDecoderGuids
    HANDLE hDevice = INVALID_HANDLE_VALUE;

    // Query the pin for IMFGetService.
    HRESULT hr = pPin->QueryInterface(IID_PPV_ARGS(&pGetService));

    // Get the Direct3D device manager.
    if (SUCCEEDED(hr))
    {
        hr = pGetService->GetService(
            MR_VIDEO_ACCELERATION_SERVICE,
            IID_PPV_ARGS(&pDeviceManager)
            );
    }

    // Open a new device handle.
    if (SUCCEEDED(hr))
    {
        hr = pDeviceManager->OpenDeviceHandle(&hDevice);
    }

    // Get the video decoder service.
    if (SUCCEEDED(hr))
    {
        hr = pDeviceManager->GetVideoService(
            hDevice, IID_PPV_ARGS(&pDecoderService));
    }

    // Get the decoder GUIDs.
    if (SUCCEEDED(hr))
    {
        hr = pDecoderService->GetDecoderDeviceGuids(
            &cDecoderGuids, &pDecoderGuids);
    }

    if (SUCCEEDED(hr))
    {
        // Look for the decoder GUIDs we want.
        for (UINT iGuid = 0; iGuid < cDecoderGuids; iGuid++)
        {
            // Do we support this mode?
            if (!IsSupportedDecoderMode(pDecoderGuids[iGuid]))
            {
                continue;
            }

            // Find a configuration that we support.
            hr = FindDecoderConfiguration(pDecoderService, pDecoderGuids[iGuid],
                &config, &bFoundDXVA2Configuration);
            if (FAILED(hr))
            {
                break;
            }

            if (bFoundDXVA2Configuration)
            {
                // Found a good configuration. Save the GUID and exit the loop.
                guidDecoder = pDecoderGuids[iGuid];
                break;
            }
        }
    }

    // Keep an earlier failure code; report E_FAIL only when every call
    // succeeded but no usable configuration was found.
    if (SUCCEEDED(hr) && !bFoundDXVA2Configuration)
    {
        hr = E_FAIL; // Unable to find a configuration.
    }

    if (SUCCEEDED(hr))
    {
        // Store the things we will need later.
        SafeRelease(&m_pDecoderService);
        m_pDecoderService = pDecoderService;
        m_pDecoderService->AddRef();

        m_DecoderConfig = config;
        m_DecoderGuid = guidDecoder;
        m_hDevice = hDevice;
    }

    if (FAILED(hr))
    {
        if (hDevice != INVALID_HANDLE_VALUE)
        {
            pDeviceManager->CloseDeviceHandle(hDevice);
        }
    }

    // The GUID array returned by GetDecoderDeviceGuids must be freed by the caller.
    CoTaskMemFree(pDecoderGuids);

    SafeRelease(&pGetService);
    SafeRelease(&pDeviceManager);
    SafeRelease(&pDecoderService);
    return hr;
}

HRESULT CDecoder::FindDecoderConfiguration(
    /* [in] */  IDirectXVideoDecoderService *pDecoderService,
    /* [in] */  const GUID& guidDecoder,
    /* [out] */ DXVA2_ConfigPictureDecode *pSelectedConfig,
    /* [out] */ BOOL *pbFoundDXVA2Configuration
    )
{
    HRESULT hr = S_OK;
    UINT cFormats = 0;
    UINT cConfigurations = 0;

    D3DFORMAT                 *pFormats = NULL; // size = cFormats
    DXVA2_ConfigPictureDecode *pConfig = NULL;  // size = cConfigurations

    // Find the valid render target formats for this decoder GUID.
    hr = pDecoderService->GetDecoderRenderTargets(
        guidDecoder,
        &cFormats,
        &pFormats
        );

    if (SUCCEEDED(hr))
    {
        // Look for a format that matches our output format.
        for (UINT iFormat = 0; iFormat < cFormats; iFormat++)
        {
            if (pFormats[iFormat] != (D3DFORMAT)m_fccOutputFormat)
            {
                continue;
            }

            // Fill in the video description. Set the width, height, format,
            // and frame rate.
            DXVA2_VideoDesc videoDesc = {0};

            FillInVideoDescription(&videoDesc); // Private helper function.
            videoDesc.Format = pFormats[iFormat];

            // Get the available configurations.
            hr = pDecoderService->GetDecoderConfigurations(
                guidDecoder,
                &videoDesc,
                NULL, // Reserved.
                &cConfigurations,
                &pConfig
                );

            if (FAILED(hr))
            {
                break;
            }

            // Find a supported configuration.
            for (UINT iConfig = 0; iConfig < cConfigurations; iConfig++)
            {
                if (IsSupportedDecoderConfig(pConfig[iConfig]))
                {
                    // This configuration is good.
                    *pbFoundDXVA2Configuration = TRUE;
                    *pSelectedConfig = pConfig[iConfig];
                    break;
                }
            }

            CoTaskMemFree(pConfig);
            break;

        } // End of formats loop.
    }

    CoTaskMemFree(pFormats);

    // Note: It is possible to return S_OK without finding a configuration.
    return hr;
}
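Because this example is generic, some of the logic has been placed in helper functions that the decoder would need to implement. The helper functions used are declared as follows: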
// Returns TRUE if the decoder supports a given decoding mode.
BOOL IsSupportedDecoderMode(const GUID& mode);

// Returns TRUE if the decoder supports a given decoding configuration.
BOOL IsSupportedDecoderConfig(const DXVA2_ConfigPictureDecode& config);

// Fills in a DXVA2_VideoDesc structure based on the input format.
void FillInVideoDescription(DXVA2_VideoDesc *pDesc);
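The source leaves these helpers to the decoder, so what follows is only one possible shape for the configuration check, not the documented implementation. It assumes a hypothetical decoder that can only submit raw bitstream buffers and therefore accepts a configuration only when its ConfigBitstreamRaw member is 1. PictureDecodeConfig is a stand-in for DXVA2_ConfigPictureDecode that carries just that one member, and SelectConfig is a hypothetical name for the inner loop of FindDecoderConfiguration.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for DXVA2_ConfigPictureDecode; only the member read by this
// sketch is modelled. Real code would use the SDK structure.
struct PictureDecodeConfig {
    unsigned ConfigBitstreamRaw; // 1: picture data is sent as raw bitstream
};

// Assumed policy: this decoder only produces raw bitstream buffers.
bool IsSupportedDecoderConfig(const PictureDecodeConfig &config)
{
    return config.ConfigBitstreamRaw == 1;
}

// Returns the index of the first acceptable configuration, or -1 if the
// driver offers none that the decoder can use.
int SelectConfig(const PictureDecodeConfig *configs, std::size_t count)
{
    assert(configs != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        if (IsSupportedDecoderConfig(configs[i])) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

A decoder that works at the macroblock or IDCT level would test different members, so the policy has to follow the decoding mode that was selected.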
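4. Notifying the Video Renderer
If the decoder finds a decoder configuration, the next step is to notify the video renderer that the decoder will use hardware acceleration. You can perform this step inside the CompleteConnect method. This step must occur before the allocator is selected, because it affects how the allocator is selected.
1) Query the renderer's input pin for the IMFGetService interface.
2) Call IMFGetService::GetService to get a pointer to the IDirectXVideoMemoryConfiguration interface. The service GUID is MR_VIDEO_ACCELERATION_SERVICE.
3) Call IDirectXVideoMemoryConfiguration::GetAvailableSurfaceTypeByIndex in a loop, incrementing the dwTypeIndex variable from zero. Stop when the method returns the value DXVA2_SurfaceType_DecoderRenderTarget in the pdwType parameter. This step ensures that the video renderer supports hardware-accelerated decoding. This step always succeeds for the EVR filter.
4) If the previous step succeeded, call IDirectXVideoMemoryConfiguration::SetSurfaceType with the value DXVA2_SurfaceType_DecoderRenderTarget. Calling SetSurfaceType with this value puts the video renderer into DXVA mode. When the video renderer is in this mode, the decoder must provide its own allocator.
The following code shows how to notify the video renderer: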
HRESULT CDecoder::SetEVRForDXVA2(IPin *pPin)
{
    HRESULT hr = S_OK;

    IMFGetService                    *pGetService = NULL;
    IDirectXVideoMemoryConfiguration *pVideoConfig = NULL;

    // Query the pin for IMFGetService.
    hr = pPin->QueryInterface(__uuidof(IMFGetService), (void**)&pGetService);

    // Get the IDirectXVideoMemoryConfiguration interface.
    if (SUCCEEDED(hr))
    {
        hr = pGetService->GetService(
            MR_VIDEO_ACCELERATION_SERVICE, IID_PPV_ARGS(&pVideoConfig));
    }

    // Notify the EVR.
    if (SUCCEEDED(hr))
    {
        DXVA2_SurfaceType surfaceType;

        for (DWORD iTypeIndex = 0; ; iTypeIndex++)
        {
            hr = pVideoConfig->GetAvailableSurfaceTypeByIndex(iTypeIndex, &surfaceType);

            if (FAILED(hr))
            {
                break;
            }

            if (surfaceType == DXVA2_SurfaceType_DecoderRenderTarget)
            {
                hr = pVideoConfig->SetSurfaceType(DXVA2_SurfaceType_DecoderRenderTarget);
                break;
            }
        }
    }

    SafeRelease(&pGetService);
    SafeRelease(&pVideoConfig);

    return hr;
}
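If the decoder finds a valid configuration and successfully notifies the video renderer, the decoder can use DXVA for decoding. The decoder must implement a custom allocator for its output pin, as described in the next section.
5. Allocating Uncompressed Buffers
In DXVA 2.0, the decoder is responsible for allocating the Direct3D surfaces that serve as uncompressed video buffers. Therefore, the decoder must implement a custom allocator that creates the surfaces. The media samples provided by this allocator hold pointers to the Direct3D surfaces. The EVR retrieves a pointer to the surface by calling IMFGetService::GetService on the media sample. The service identifier is MR_BUFFER_SERVICE.
To provide the custom allocator, perform the following steps:
1) Define a class for the media samples. This class can derive from CMediaSample. Inside this class, do the following:
    a) Store a pointer to the Direct3D surface.
    b) Implement the IMFGetService interface. In the GetService method, if the service GUID is MR_BUFFER_SERVICE, query the Direct3D surface for the requested interface. Otherwise, GetService can return MF_E_UNSUPPORTED_SERVICE.
    c) Override the CMediaSample::GetPointer method to return E_NOTIMPL.
2) Define a class for the allocator. The allocator can derive from the CBaseAllocator class. Inside this class, do the following:
    a) Override the CBaseAllocator::Alloc method. Inside this method, call IDirectXVideoAccelerationService::CreateSurface to create the surfaces. (The IDirectXVideoDecoderService interface inherits this method from IDirectXVideoAccelerationService.)
    b) Override the CBaseAllocator::Free method to release the surfaces.
3) In your filter's output pin, override the CBaseOutputPin::InitAllocator method. Inside this method, create an instance of your custom allocator.
4) In your filter, implement the CTransformFilter::DecideBufferSize method. The pProperties parameter indicates the number of surfaces that the EVR requires. Add to this value the number of surfaces that your decoder requires, and call IMemAllocator::SetProperties on the allocator.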
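The source does not show DecideBufferSize, so the following is a minimal sketch of the arithmetic in step 4 only. AllocatorProps is a stand-in that mirrors the four members of ALLOCATOR_PROPERTIES, and AddDecoderSurfaces is a hypothetical helper name; how many surfaces the decoder itself needs is an assumption that depends on the codec.

```cpp
#include <cassert>

// Stand-in with the same members as ALLOCATOR_PROPERTIES, so the sketch
// compiles without the DirectShow headers.
struct AllocatorProps {
    long cBuffers;  // number of buffers (here: Direct3D surfaces)
    long cbBuffer;  // size of each buffer in bytes
    long cbAlign;   // alignment
    long cbPrefix;  // prefix bytes
};

// Returns the renderer's request with the decoder's own surface count
// added on top. Only cBuffers changes.
AllocatorProps AddDecoderSurfaces(AllocatorProps evrRequest, long decoderSurfaces)
{
    assert(decoderSurfaces >= 0);
    if (evrRequest.cBuffers < 0) {
        evrRequest.cBuffers = 0; // treat a nonsensical request as zero
    }
    evrRequest.cBuffers += decoderSurfaces;
    return evrRequest;
}
```

In a real filter the result would be passed to IMemAllocator::SetProperties. The decoder's share would usually have to cover its reference frames plus the frame currently being decoded.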
The following code shows how to implement the media sample class:
class CDecoderSample : public CMediaSample, public IMFGetService
{
    friend class CDecoderAllocator;

public:

    CDecoderSample(CDecoderAllocator *pAlloc, HRESULT *phr)
        : CMediaSample(NAME("DecoderSample"), (CBaseAllocator*)pAlloc, phr, NULL, 0),
          m_pSurface(NULL),
          m_dwSurfaceId(0)
    {
    }

    // Note: CMediaSample does not derive from CUnknown, so we cannot use the
    // DECLARE_IUNKNOWN macro that is used by most of the filter classes.

    STDMETHODIMP QueryInterface(REFIID riid, void **ppv)
    {
        CheckPointer(ppv, E_POINTER);

        if (riid == IID_IMFGetService)
        {
            *ppv = static_cast<IMFGetService*>(this);
            AddRef();
            return S_OK;
        }
        else
        {
            return CMediaSample::QueryInterface(riid, ppv);
        }
    }

    STDMETHODIMP_(ULONG) AddRef()
    {
        return CMediaSample::AddRef();
    }

    STDMETHODIMP_(ULONG) Release()
    {
        // Return a temporary variable for thread safety.
        ULONG cRef = CMediaSample::Release();
        return cRef;
    }

    // IMFGetService::GetService
    STDMETHODIMP GetService(REFGUID guidService, REFIID riid, LPVOID *ppv)
    {
        if (guidService != MR_BUFFER_SERVICE)
        {
            return MF_E_UNSUPPORTED_SERVICE;
        }
        else if (m_pSurface == NULL)
        {
            return E_NOINTERFACE;
        }
        else
        {
            return m_pSurface->QueryInterface(riid, ppv);
        }
    }

    // Override GetPointer because this class does not manage a system memory buffer.
    // The EVR uses the MR_BUFFER_SERVICE service to get the Direct3D surface.
    STDMETHODIMP GetPointer(BYTE ** ppBuffer)
    {
        return E_NOTIMPL;
    }

private:

    // Sets the pointer to the Direct3D surface.
    void SetSurface(DWORD surfaceId, IDirect3DSurface9 *pSurf)
    {
        SafeRelease(&m_pSurface);

        m_pSurface = pSurf;
        if (m_pSurface)
        {
            m_pSurface->AddRef();
        }

        m_dwSurfaceId = surfaceId;
    }

    IDirect3DSurface9 *m_pSurface;
    DWORD             m_dwSurfaceId;
};
The following code shows how to implement the Alloc method on the allocator:
HRESULT CDecoderAllocator::Alloc()
{
    CAutoLock lock(this);

    HRESULT hr = S_OK;

    if (m_pDXVA2Service == NULL)
    {
        return E_UNEXPECTED;
    }

    hr = CBaseAllocator::Alloc();

    // If the requirements have not changed, do not reallocate.
    if (hr == S_FALSE)
    {
        return S_OK;
    }

    if (SUCCEEDED(hr))
    {
        // Free the old resources.
        Free();

        // Allocate a new array of pointers.
        m_ppRTSurfaceArray = new (std::nothrow) IDirect3DSurface9*[m_lCount];
        if (m_ppRTSurfaceArray == NULL)
        {
            hr = E_OUTOFMEMORY;
        }
        else
        {
            ZeroMemory(m_ppRTSurfaceArray, sizeof(IDirect3DSurface9*) * m_lCount);
        }
    }

    // Allocate the surfaces.
    if (SUCCEEDED(hr))
    {
        // CreateSurface takes a back buffer count and creates one more surface
        // than that count, hence m_lCount - 1.
        hr = m_pDXVA2Service->CreateSurface(
            m_dwWidth,
            m_dwHeight,
            m_lCount - 1,
            (D3DFORMAT)m_dwFormat,
            D3DPOOL_DEFAULT,
            0,
            DXVA2_VideoDecoderRenderTarget,
            m_ppRTSurfaceArray,
            NULL
            );
    }

    if (SUCCEEDED(hr))
    {
        for (m_lAllocated = 0; m_lAllocated < m_lCount; m_lAllocated++)
        {
            CDecoderSample *pSample = new (std::nothrow) CDecoderSample(this, &hr);

            if (pSample == NULL)
            {
                hr = E_OUTOFMEMORY;
                break;
            }
            if (FAILED(hr))
            {
                // The constructor reported an error; do not leak the sample.
                delete pSample;
                break;
            }

            // Assign the Direct3D surface pointer and the index.
            pSample->SetSurface(m_lAllocated, m_ppRTSurfaceArray[m_lAllocated]);

            // Add to the sample list.
            m_lFree.Add(pSample);
        }
    }

    if (SUCCEEDED(hr))
    {
        m_bChanged = FALSE;
    }
    return hr;
}
The following code shows the Free method:
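void CDecoderAllocator::Free()
{
    CMediaSample *pSample = NULL;

    // Delete every sample on the free list.
    do
    {
        pSample = m_lFree.RemoveHead();
        if (pSample)
        {
            delete pSample;
        }
    } while (pSample);

    if (m_ppRTSurfaceArray)
    {
        for (long i = 0; i < m_lAllocated; i++)
        {
            SafeRelease(&m_ppRTSurfaceArray[i]);
        }

        delete [] m_ppRTSurfaceArray;
        // Clear the pointer so that a second call to Free does not delete
        // the array again.
        m_ppRTSurfaceArray = NULL;
    }
    m_lAllocated = 0;
}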
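6. Decoding
To create the decoder device, call IDirectXVideoDecoderService::CreateVideoDecoder. The method returns a pointer to the decoder device's IDirectXVideoDecoder interface.
On each frame, call IDirect3DDeviceManager9::TestDevice to test the device handle. If the device has changed, the method returns DXVA2_E_NEW_VIDEO_DEVICE. If this occurs, do the following:
1) Close the device handle by calling IDirect3DDeviceManager9::CloseDeviceHandle.
2) Release the IDirectXVideoDecoderService and IDirectXVideoDecoder pointers.
3) Open a new device handle.
4) Negotiate a new decoder configuration, as described in section 3.
5) Create a new decoder device.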
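The five recovery steps above can be sketched as a small per-frame check. This is only a model under stated assumptions: DeviceStatus, DeviceSession, EnsureDevice and FakeSession are hypothetical names, and the virtual methods stand in for the real calls on IDirect3DDeviceManager9 and IDirectXVideoDecoderService so that the ordering can be exercised without a Direct3D device.

```cpp
#include <cassert>
#include <string>

// Stand-in for the outcome of IDirect3DDeviceManager9::TestDevice.
enum class DeviceStatus { Ok, NewVideoDevice, Failed };

// Hypothetical wrapper around the objects the filter holds.
struct DeviceSession {
    virtual ~DeviceSession() = default;
    virtual DeviceStatus TestDevice() = 0;
    virtual void CloseDeviceHandle() = 0;
    virtual void ReleaseDecoderInterfaces() = 0;
    virtual bool OpenDeviceHandle() = 0;
    virtual bool FindConfiguration() = 0;
    virtual bool CreateDecoder() = 0;
};

// Call once per frame. Returns true when decoding can proceed.
bool EnsureDevice(DeviceSession &session)
{
    const DeviceStatus status = session.TestDevice();
    if (status == DeviceStatus::Ok) {
        return true;
    }
    if (status != DeviceStatus::NewVideoDevice) {
        return false;                          // some other failure
    }
    session.CloseDeviceHandle();               // step 1
    session.ReleaseDecoderInterfaces();        // step 2
    return session.OpenDeviceHandle()          // step 3
        && session.FindConfiguration()         // step 4
        && session.CreateDecoder();            // step 5
}

// Fake that records the order of calls, for illustration only.
struct FakeSession : DeviceSession {
    DeviceStatus next = DeviceStatus::Ok;
    std::string calls;
    DeviceStatus TestDevice() override { return next; }
    void CloseDeviceHandle() override { calls += "close;"; }
    void ReleaseDecoderInterfaces() override { calls += "release;"; }
    bool OpenDeviceHandle() override { calls += "open;"; return true; }
    bool FindConfiguration() override { calls += "configure;"; return true; }
    bool CreateDecoder() override { calls += "create;"; return true; }
};
```

Because the steps are chained with &&, a failure in any of steps 3 to 5 stops the recovery and the frame is not decoded.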
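Assuming that the device handle is valid, the decoding process works as follows:
1) Call IDirectXVideoDecoder::BeginFrame.
2) Do the following one or more times:
    a) Call IDirectXVideoDecoder::GetBuffer to get a DXVA decoder buffer.
    b) Fill the buffer.
    c) Call IDirectXVideoDecoder::ReleaseBuffer.
3) Call IDirectXVideoDecoder::Execute to perform the decoding operations on the frame.
DXVA 2.0 uses the same data structures as DXVA 1.0 for decoding operations.
Between each pair of BeginFrame/Execute calls, you may call GetBuffer multiple times, but only once for each type of DXVA buffer. If you call it twice with the same buffer type, you will overwrite the data.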
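One way to catch an accidental second GetBuffer for the same buffer type is a small guard kept next to the decoder device. The sketch below is not part of DXVA: FrameBufferGuard and the BufferType enumeration are hypothetical stand-ins (a real decoder would use the DXVA2 buffer type constants), and the guard only tracks which types were requested in the current frame.

```cpp
#include <cassert>
#include <set>

// Stand-in for the DXVA2 buffer type constants.
enum class BufferType { PictureParameters, SliceControl, BitStreamData };

// Tracks which buffer types were requested between BeginFrame and Execute.
class FrameBufferGuard {
public:
    void BeginFrame()
    {
        m_inFrame = true;
        m_used.clear();
    }

    // Returns true if it is safe to call GetBuffer for `type` now.
    // Returns false outside a frame, or if the type was already requested
    // in this frame (a second GetBuffer would overwrite the data).
    bool Acquire(BufferType type)
    {
        if (!m_inFrame) {
            return false;
        }
        return m_used.insert(type).second;
    }

    void Execute()
    {
        m_inFrame = false;
    }

private:
    bool m_inFrame = false;
    std::set<BufferType> m_used;
};
```

A decoder could call Acquire just before GetBuffer and treat a false result as a logic error in its own frame submission code.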
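After calling Execute, call IMemInputPin::Receive to deliver the frame to the video renderer, as with software decoding. The Receive method is asynchronous; after it returns, the decoder can continue decoding the next frame. The display driver prevents any decoding commands from overwriting the buffer while the buffer is in use. The decoder should not reuse a surface to decode another frame until the renderer has released the sample. When the renderer releases the sample, the allocator puts the sample back into its pool of available samples. To get the next available sample, call CBaseOutputPin::GetDeliveryBuffer, which in turn calls IMemAllocator::GetBuffer.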