CefSharp请求资源拦截及自定义处理

前言

在CefSharp中，我们不仅可以使用Chromium浏览器内核，还可以通过Cef暴露出来的各种Handler来实现我们自己的资源请求处理。

什么是资源请求呢？简单来说，就是前端页面在加载的过程中，请求的各种文本（js、css以及html）。在以Chromium内核的浏览器上，我们可以使用浏览器为我们提供的开发者工具来检查每一次页面加载发生的请求。

准备

鉴于本文的重心是了解CefSharp的资源拦截处理，所以我们不讨论前端的开发以及客户端嵌入CefSharp组件的细节。我们首先完成一个基本的嵌入CefSharp的WinForm程序：该程序界面如下，拥有一个地址输入栏和一个显示网页的Panel：

并且编写一个极其简单的页面，该页面会请求1个js资源和1个css资源：

demo:
    - index.html
    - test1.js
    - test1.css

这几个文件的代码都十分简单：

body
{
    background-color: aqua
}

function myFunc() {
    return 'test1 js file';
}

<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta charset="utf-8" />
    <title>Home</title>
    <!-- 如下记载js、css资源 -->
    <script type="text/javascript" src="test1.js"></script>
    <link type="text/css" rel="stylesheet" href="test1.css"/>
</head>
<body>
<h1>Resource Intercept Example</h1>
<h2 id="result"></h2>
<script>
    // 调用test1.js中的myFunc
    document.getElementById('result').innerHTML = myFunc();
</script>
</body>
</html>

代码很简单，效果也很容易知道，页面加载后，页面背景色为aqua，页面上会显示文本“test1 js file”。同时，当我们使用开发工具，刷新页面，能够看到对应的资源加载：

CefSharp资源拦截及自定义处理

完成上述准备后，我们进入正文：资源拦截及自定义处理。首先我们需要对目标的理解达成一致，资源拦截是指我们能够检测到上图中的html、js还有css的资源请求事件，在接下来的Example中，因为我们是使用的客户端程序，所以会在请求的过程中弹出提示；自定义处理是指，在完成拦截提示后，我们还能够替换这些资源，这里我们设定完成拦截后，可以把js和css换为我们想要另外的文件：test2.js和test2.css：

function myFunc() {
    return 'test2 js file';
}

body
{
    background-color: beige
}

即我们希望拦截并替换后，页面上的文字不再是之前的，而是“test2 js file”，页面的背景色是beige。

IRequestHandler

在CefSharp中，要想对请求进行拦截处理，最为核心的Handler就是IRequestHandler这个接口，查看官方的源码，会发现里面有数个方法的定义，通过阅读官方的summary，我们可以聚焦到如下的两个定义（注释本人进行了删减）：

/// <summary>
/// Called before browser navigation.
/// 译：在浏览器导航前调用
/// If the navigation is allowed <see cref="E:CefSharp.IWebBrowser.FrameLoadStart" /> and <see cref="E:CefSharp.IWebBrowser.FrameLoadEnd" />
/// will be called. If the navigation is canceled <see cref="E:CefSharp.IWebBrowser.LoadError" /> will be called with an ErrorCode
/// value of <see cref="F:CefSharp.CefErrorCode.Aborted" />.
/// </summary>
bool OnBeforeBrowse(
  IWebBrowser chromiumWebBrowser,
  IBrowser browser,
  IFrame frame,
  IRequest request,
  bool userGesture,
  bool isRedirect);

/// <summary>
/// Called on the CEF IO thread before a resource request is initiated.
/// 在一个资源请求初始化前在CEF IO线程上调用
/// </summary>
IResourceRequestHandler GetResourceRequestHandler(
  IWebBrowser chromiumWebBrowser,
  IBrowser browser,
  IFrame frame,
  IRequest request,
  bool isNavigation,
  bool isDownload,
  string requestInitiator,
  ref bool disableDefaultHandling);

于是，我们继承一个默认的名为RequestHandler的类（请区分DefaultRequestHandler），只重写上述的两个方法。

  public class MyRequestHandler : RequestHandler
 {
     protected override bool OnBeforeBrowse(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, bool userGesture,
         bool isRedirect)
     {
         // 先调用基类的实现，断点调试
         return base.OnBeforeBrowse(chromiumWebBrowser, browser, frame, request, userGesture, isRedirect);
     }

     protected override IResourceRequestHandler GetResourceRequestHandler(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame,
         IRequest request, bool isNavigation, bool isDownload, string requestInitiator, ref bool disableDefaultHandling)
     {
         // 先调用基类的实现，断点调试
         return base.GetResourceRequestHandler(
             chromiumWebBrowser, browser, frame, request, isNavigation, 
             isDownload, requestInitiator, ref disableDefaultHandling);
     }
 }

然后完成对该Handler的注册：

  this._webBrowser = new ChromiumWebBrowser(string.Empty)
 {
     RequestHandler = new MyRequestHandler()
 };

打上断点，开始访问我们的Example：index.html。这里会发现，OnBeforeBrowse调用了一次，而GetResourceRequestHandler会调用3次。检查OnBeforeBrowse中的request参数内容，是一次主页的请求，而GetResourceRequestHandler中的3次分别是：主页html资源、test1.js和test1.css。

结合官方注释和调试的结果，我们可以得出结论：要进行导航的拦截，我们可以重写OnBeforeBrowse方法，要想进行资源的拦截，我们需要实现自己的ResourceRequestHandler。

IResourceRequestHandler

查看IResourceRequestHandler的定义，我们再次聚焦一个函数定义：

/// <summary>
/// Called on the CEF IO thread before a resource is loaded. To specify a handler for the resource return a <see cref="T:CefSharp.IResourceHandler" /> object
/// </summary>
/// <returns>To allow the resource to load using the default network loader return null otherwise return an instance of <see cref="T:CefSharp.IResourceHandler" /> with a valid stream</returns>
IResourceHandler GetResourceHandler(
  IWebBrowser chromiumWebBrowser,
  IBrowser browser,
  IFrame frame,
  IRequest request);

该定义从注释可以看出，如果实现返回null，那么Cef会使用默认的网络加载器来发起请求，或者我们可以返回一个自定义的资源处理器ResourceHandler来处理一个合法的数据流（Stream）。也就是说，对于资源的处理，要想实现自定义的处理（不是拦截，拦截到目前为止我们可以在上述的两个Handler中进行处理）我们还需要实现一个IResourceHandler接口的实例，并在GetResourceHandler处进行返回，Cef才会在进行处理的时候使用我们的Handler。所以在GetResourceHandler中，我们进行资源的判断，如果是想要替换的资源，我们就使用WinForm提供的OpenFileDialog来选择本地的js或是css文件，并传给我们自定义的ResourceHandler，如果不是想要拦截的资源或是用户未选择任何的文件就走默认的：

public class MyResourceRequestHandler : ResourceRequestHandler
{
    protected override IResourceHandler GetResourceHandler(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request)
    {
        if (request.Url.EndsWith("test1.js") || request.Url.EndsWith("test1.css"))
        {
            MessageBox.Show($@"资源拦截：{request.Url}");

            string type = request.Url.EndsWith(".js") ? "js" : "css"; // 这里简单判断js还是css，不过多编写
            string fileName = null;
            using (OpenFileDialog openFileDialog = new OpenFileDialog())
            {
                openFileDialog.Filter = $@"{type}文件|*.{type}"; // 过滤
                openFileDialog.Multiselect = true;
                if (openFileDialog.ShowDialog() == DialogResult.OK)
                {
                    fileName = openFileDialog.FileName;
                }
            }

            if (string.IsNullOrWhiteSpace(fileName))
            {
                // 没有选择文件，还是走默认的Handler
                return base.GetResourceHandler(chromiumWebBrowser, browser, frame, request);
            }
            // 否则使用选择的资源返回
            return new MyResourceHandler(fileName);
        }

        return base.GetResourceHandler(chromiumWebBrowser, browser, frame, request);
    }
}

IResourceHandler

根据上文，我们进一步探究IResourceHandler，对该Handler，官方有一个默认的实现：RequestHandler，该Handler通过阅读源码可以知道是网络加载的Handler，这里为了实现我们自定义拦截策略，我们最好单独实现自己的IResourceHandler。对于该接口，有如下的注释：

/// <summary>
/// Class used to implement a custom resource handler. The methods of this class will always be called on the CEF IO thread.
/// Blocking the CEF IO thread will adversely affect browser performance. We suggest you execute your code in a Task (or similar).
/// To implement async handling, spawn a new Task (or similar), keep a reference to the callback. When you have a
/// fully populated stream, execute the callback. Once the callback Executes, GetResponseHeaders will be called where you
/// can modify the response including headers, or even redirect to a new Url. Set your responseLength and headers
/// Populate the dataOut stream in ReadResponse. For those looking for a sample implementation or upgrading from
/// a previous version <see cref="T:CefSharp.ResourceHandler" />. For those upgrading, inherit from ResourceHandler instead of IResourceHandler
/// add the override keywoard to existing methods e.g. ProcessRequestAsync.
/// </summary>
public interface IResourceHandler : IDisposable
{ ... }

该类的注释意思大致为：我们可以通过实现该接口来实现自定义资源的处理类。该类中的方法总是在CEF的IO线程中调用。然而，阻塞CEF IO线程将会不利于浏览器的性能。所以官方建议开发者通过把自己的处理代码放在Task（或是类似的异步编程框架）中异步执行，然后在完成或取消（失败）时，在异步中调用callback对应的操作函数（continue、cancel等方法）。当你拥有一个完全填充（fully populated）好了的Stream的时候，再执行callback（这一步对应Open方法）。一旦callback执行了，GetResponseHeaders这个方法将会调用，于是你可以在这个方法里面对Reponse的内容包括headers进行修改，或者甚至是重定向到一个新的Url。设置你自己的reponseLength和headers。接下来，通过在ReadResponse（实际上即将作废，而是Read）函数中，实现并填充dataOut这个Stream。最终CEF会对该Stream进行读取数据，获得资源数据。

事实上，该Handler的实现可以有很多花样，这里我们实现一个最简单的。

Dispose

对于通常进行资源释放的Dispose，因为我们这里只是一个Demo，所以暂时留空。

Open（ProcessRequest）

官方注释指出，ProcessRequest将会在不久的将来弃用，改为Open。所以ProcessRequest我们直接返回true。对于Open方法，其注释告诉我们：

要想要立刻进行资源处理（同步），请设置handleRequest参数为true，并返回true
决定稍后再进行资源的处理（异步），设置handleRequest为false，并调用callback对应的continue和cancel方法来让请求处理继续还是取消，并且当前Open返回false。
要立刻取消资源的处理，设置handleRequest为true，并返回false。

也就是说，handleRequest的true或false决定是同步还是异步处理。若同步，则Cef会立刻通过Open的返回值true或false来决定后续继续进行还是取消。若为异步，则Cef会通过异步的方式来检查callback的调用情况（这里的callback实际上是要我们创建Task回调触发的）。这里我们选择同步的方式（选择异步也没有问题）编写如下的代码：

public bool Open(IRequest request, out bool handleRequest, ICallback callback)
{
    handleRequest = true;
    return true;
}

GetResponseHeaders

在上小节中我们已经完成了对资源数据的入口（Open）的分析。既然我们已经告诉了Cef我们准备开始进行资源请求的处理了，那么接下来我们显然需要着手进行资源的处理。根据前面的概要注释，我们需要实现GetResponseHeaders方法，因为这是资源处理的第二步。该方法的注释如下：

/// <summary>
/// Retrieve response header information. If the response length is not known
/// set <paramref name="responseLength" /> to -1 and ReadResponse() will be called until it
/// returns false. If the response length is known set <paramref name="responseLength" />
/// to a positive value and ReadResponse() will be called until it returns
/// false or the specified number of bytes have been read.
/// 
/// It is also possible to set <paramref name="response" /> to a redirect http status code
/// and pass the new URL via a Location header. Likewise with <paramref name="redirectUrl" /> it
/// is valid to set a relative or fully qualified URL as the Location header
/// value. If an error occured while setting up the request you can call
/// <see cref="P:CefSharp.IResponse.ErrorCode" /> on <paramref name="response" /> to indicate the error condition.
/// </summary>
void GetResponseHeaders(IResponse response, out long responseLength, out string redirectUrl);

Summary翻译解释如下：获取响应头信息。如果响应的数据长度未知，则设置responseLength为-1，然后CEF会一直调用ReadResponse（即将废除，实际上是Read方法）直到该Read方法返回false。如果响应数据的长度是已知的，可以直接设置responseLength长度为一个正数，然后ReadResponse（Read）将会一直调用，直到该Read方法返回false或者在已经读取的数据的字节长度达到了设置的responseLength的值。当然你也可以通过设置response.StatusCode值为重定向的值（30x）以及redirectUrl为对应的重定向Url来实现资源重定向。

在本文中，我们采取简单的方式：直接返回资源的长度，然后交给下一步的Read方法来进行真正的资源处理。在该步骤中，我们编写获取本地文件字节数据来实现js和css文件的本地加载，并且将该数据保存在该ResourceHanlder实例私有变量中。

public void GetResponseHeaders(IResponse response, out long responseLength, out string redirectUrl)
{
    using (FileStream fileStream = new FileStream(this._localResourceFileName, FileMode.Open, FileAccess.Read))
    {
        using (BinaryReader binaryReader = new BinaryReader(fileStream))
        {
            long length = fileStream.Length;
            this._localResourceData = new byte[length];
            // 读取文件中的内容并保存到私有变量字节数组中
            binaryReader.Read(this._localResourceData, 0, this._localResourceData.Length);
        }
    }

    responseLength = this._localResourceData.LongLength;
    redirectUrl = null;
}

Read

该方法的定义和注释如下：

/// <summary>
/// Read response data. If data is available immediately copy up to
/// dataOut.Length bytes into dataOut, set bytesRead to the number of
/// bytes copied, and return true. To read the data at a later time keep a
/// pointer to dataOut, set bytesRead to 0, return true and execute
/// callback when the data is available (dataOut will remain valid until
/// the callback is executed). To indicate response completion set bytesRead
/// to 0 and return false. To indicate failure set bytesRead to &lt; 0 (e.g. -2
/// for ERR_FAILED) and return false. This method will be called in sequence
/// but not from a dedicated thread.
/// 
/// For backwards compatibility set bytesRead to -1 and return false and the ReadResponse method will be called.
/// </summary>
bool Read(Stream dataOut, out int bytesRead, IResourceReadCallback callback);

Summary的翻译大致为：读取响应数据。如果数据是可以立即获得的，那么可以直接将dataOut.Length长度的字节数据拷贝到dataOut这个流中，然后设置bytesRead的值为拷贝的数据字节长度值，最后再返回true。如果开发者希望继续持有dataOut的引用（注释是pointer指针，但是个人觉得这里写为指向该dataOut的引用更好）然后在稍后填充该数据流，那么可以设置bytesRead为0，通过异步方式在数据准备好的时候执行callback的操作函数，然后立刻返回true。（dataOut这个流会一直保持不被释放直到callback被调用为止）。为了让CEF知道当前的响应数据已经填充完毕，需要设置bytesRead为0然后返回false。要想让CEF知道响应失败，需要设置bytesRead为一个小于零的数（例如ERR_FAILED: -2），然后返回false。这个方法将会依次调用但不是在一个专有线程。

根据上述的注释，总结如下：

bytesRead > 0，return true：填充了数据，但Read还会被调用
bytesRead = 0，return false：数据填充完毕，当前为最后一次调用
bytesRead < 0，return false：出错，当前为最后一次调用
bytesRead = 0，return true：CEF不会释放dataOut流，在异步调用中准备好数据后调用callback

针对本例，我们增加一个该类的私有变量_dataReadCount用于标识已读的资源数据字节量并在构造函数中初始化为0。

每次在Read中进行读取的时候，首先检查剩余待读取字节数this._localResourceData.LongLength - this._dataReadCount，如果该值为零，则表明已经将所有的数据通过dataOut拷贝给了外围，此时设置bytesRead为0，直接返回false；若剩余值大于0，则需要继续进行拷贝操作，但需要注意的是dataOut并不是一个无限大的流，而是一个类似于缓存的流，它的Length值为2^16 = 65536，所以我们需要设置bytesRead来让外围知道我们实际在这个流中放了多少字节的数据。同时在使用Stream.WriteAPI的时候，需要设置正确的offset和count。

最终，Read的实现如下：

public bool Read(Stream dataOut, out int bytesRead, IResourceReadCallback callback)
{
    int leftToRead = this._localResourceData.Length - this._dataReadCount;
    if (leftToRead == 0)
    {
        bytesRead = 0;
        return false;
    }

    int needRead = Math.Min((int)dataOut.Length, leftToRead); // 最大为dataOut.Lenght
    dataOut.Write(this._localResourceData, this._dataReadCount, needRead);
    this._dataReadCount += needRead;
    bytesRead = needRead;
    return true;
}