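Startup speed matters a great deal. How can it be improved? One simple idea underlies most of the answer: the principle of locality. Now that CPUs keep getting faster, the bottleneck is often I/O (a machine with an SSD is noticeably faster than one with a mechanical hard disk), so reducing the number of disk reads while a program runs is an effective way to speed it up. Reducing disk reads at run time means reducing hard page faults, that is, getting most of the data the program will touch into physical memory ahead of time. This is what "pre-reading" refers to.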

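I. What the operating system does to speed up startup

1. Prefetch

  Whenever a hard page fault occurs while an application is starting, the operating system records which file was accessed and at what position. This information is stored under \Windows\Prefetch; on my machine, for example, it was easy to find the file "CHROME.EXE-D999B1BA.pf". The next time the program starts, the system first loads the data needed for startup into memory. Because it knows in advance every disk location that will be read, it can read them as sequentially as possible, which reduces the number of I/O operations and saves seek time.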

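2. Spatial locality

  The operating system manages memory in pages, and the smallest unit of paging I/O is one page (4 KB). When a page fault triggers disk I/O, the system usually does not fetch only the page that was requested; it also reads the neighboring data into physical memory. If that neighboring data is exactly what gets accessed next, further I/O is avoided.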
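To get a feel for why fetching more than one page per request helps, here is a small arithmetic sketch. It only counts requests; the 30 MB module size and the 1 MB chunk size are made-up numbers for illustration, and the cluster size the operating system really uses is not something this post measures.

```cpp
#include <cstdint>

// Number of read requests needed to cover `file_bytes` when each request
// transfers at most `chunk_bytes` (ceiling division). Hypothetical helper.
constexpr std::uint64_t ReadRequests(std::uint64_t file_bytes,
                                     std::uint64_t chunk_bytes) {
    return (file_bytes + chunk_bytes - 1) / chunk_bytes;
}

// Hypothetical 30 MB module: 7680 page-sized reads vs. 30 reads of 1 MB each.
static_assert(ReadRequests(30ull << 20, 4096) == 7680, "page-sized reads");
static_assert(ReadRequests(30ull << 20, 1ull << 20) == 30, "1 MB reads");
```

Fewer, larger requests also tend to be sequential on disk, which is the point of the sections that follow.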

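3. Caching

  The virtual memory manager and the cache manager help here as well. After a program exits, the physical memory holding its code and data is usually not released all at once but is kept around for a while. If the program is started a second time soon afterwards, nothing has to be read from disk again: less I/O, faster startup. This can be called temporal locality: a user who opened a program and then closed it is quite likely to open it again.

  The system does so much to speed up startup that little is left for us to do, but little is not nothing.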

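II. Speeding up a cold start

  A cold start is the first launch of an application since the operating system booted; a warm start is any later launch of that application since boot.

  From the point of view of I/O cost, the ideal is that all the data is already in physical memory when the program starts, so nothing has to be brought in from disk. A warm start can achieve this (although we cannot confirm it, because the system's cache management is transparent to the program). For a warm start there is nothing we can do to reduce I/O; it is already in good shape. What can be optimized is the cold start, which inevitably triggers a large amount of I/O. How can the number of I/O operations and the time spent on them be reduced? Many scattered disk reads are clearly slower than one concentrated read, so reducing I/O time comes down to turning random, scattered reads into concentrated, sequential ones.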

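  The approach is simple: before the program really begins its startup work, read the dynamic libraries it is going to use once, as ordinary data. After this concentrated read the system has the file data cached in physical memory, and by temporal locality it keeps it there for a while. When the real startup then runs, the system no longer has to read the disk on demand, so startup is faster.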
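A minimal sketch of the idea, written with portable standard C++ so it is easy to try out. This is not Chromium's code; the function name PreReadFile and the 1 MB default chunk size are my own assumptions. It simply reads the file front to back in large chunks and discards the bytes, relying on the operating system's file cache to keep them.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Reads the whole file sequentially in large chunks and throws the bytes
// away; the only goal is to pull the file into the OS file cache.
// Returns the number of bytes read, or -1 if the file cannot be opened
// or chunk_bytes is zero.
std::int64_t PreReadFile(const std::string& path,
                         std::size_t chunk_bytes = 1 << 20) {
    if (chunk_bytes == 0) return -1;
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;
    std::vector<char> buffer(chunk_bytes);
    std::int64_t total = 0;
    while (in) {
        // The final read is usually short; gcount() reports what arrived.
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        total += in.gcount();
    }
    return total;
}
```

A caller would invoke something like PreReadFile("chrome.dll") just before loading the library. Whether the cached pages are still resident when the loader needs them depends on memory pressure, so the benefit is not guaranteed.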

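  Chromium does exactly this. The LoadChromeWithDirectory() function in \src\chrome\app\client_util.cc reads chrome.dll once before loading it. The pre-reading code itself lives in \src\chrome\app\image_pre_reader_win.cc. It treats Windows 7 and XP differently, presumably because the support for caching and locality differs between those versions. This part of Chromium's code is well done, and we can use it directly without spending time studying what each system supports.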

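III. How well Chromium's startup acceleration works

  I used Process Monitor to count the ReadFile operations on chrome.dll and found that pre-reading does not always help: ReadFile is sometimes still triggered while the program runs, which is probably related to how much physical memory is available at the time. In my tests the best case is opening Chromium right after boot, when the only ReadFile operations on chrome.dll during startup are the pre-reads themselves. The worst case is when memory usage is already high and the system cannot give the Chromium process enough physical memory. The pre-read data may then be evicted from physical memory after the ReadFile calls complete, later accesses page-fault and go back to disk, and the pre-read achieves nothing. Also, the duration of each ReadFile in Process Monitor varies: one slow read is often followed by several fast ones, so the disk may be doing some locality-based caching of its own.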

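  Chromium also pre-reads on a warm start. I suspect the benefit there is limited, so removing it is worth considering and might even make warm starts faster. How can a cold start be told apart from a warm one? One option is a global ATOM. In my experience this only works for GUI applications, not console programs; see MSDN for details. Example code: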

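#include <windows.h>

// Returns true if the marker atom does not exist yet, which is treated as a
// cold start. The answer is computed once and then cached for this process.
bool IsColdStartUp()
{
    static int nRet = -1;  // -1: not determined yet, 0: warm start, 1: cold start
    if (nRet != -1)
    {
        return nRet == 1;
    }
    nRet = 0;
    // The global atom outlives this process, so it marks "launched before".
    // The W variants are used explicitly because the name is a wide string.
    ATOM atom = ::GlobalFindAtomW(L"cswuyg_test_cold_startup");
    if (atom == 0)
    {
        nRet = 1;
        ::GlobalAddAtomW(L"cswuyg_test_cold_startup");
    }
    return nRet == 1;
}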
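To act on that check, the pre-read can be skipped on a warm start. The sketch below is only an illustration of that wiring: PreReadIfColdStart is a hypothetical helper, and the detector and the pre-read step are passed in as callables so the snippet stays independent of any Windows API.

```cpp
#include <functional>

// Runs `pre_read` only when `is_cold_start` reports a cold start.
// Returns true if the pre-read was performed, false if it was skipped.
bool PreReadIfColdStart(const std::function<bool()>& is_cold_start,
                        const std::function<void()>& pre_read) {
    if (!is_cold_start()) {
        return false;  // warm start: the data is probably still cached
    }
    pre_read();
    return true;
}
```

In a real program the first argument would be a function like IsColdStartUp and the second would read chrome.dll from disk. Whether skipping the pre-read really speeds up warm starts is something I have not measured.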

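IV. Process Monitor

  While watching the ReadFile operations in Process Monitor I was confused by the labels Fast I/O, Paging I/O and Non-cached I/O, so I looked for some material. Roughly:

1. Paging I/O is I/O issued by the virtual memory system, which means the data is read from disk.
2. Non-cached generally means the data is not in the cache and has to be read from disk, or that the cache is bypassed on purpose.
3. If the data is in the cache (cached), the request can be served through Fast I/O.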

  The detailed material follows:

1. The diagram in these slides makes the meaning fairly clear

Source: http://i-web.i.u-tokyo.ac.jp/edu/training/ss/lecture/new-documents/Lectures/15-CacheManager/CacheManager.pdf

2. Terminology

Q25 What is the difference between cached I/O, user non-cached I/O, and paging I/O? 


In a file system or file system filter driver, read and write operations fall into several different categories. For the purpose of discussing them, we normally consider the following types: 

- Cached I/O. This includes normal user I/O, both via the Fast I/O path as well as via the IRP_MJ_READ and IRP_MJ_WRITE path. It also includes the MDL operations (where the caller requests the FSD return an MDL pointing to the data in the cache). 

- Non-cached user I/O. This includes all non-cached I/O operations that originate outside the virtual memory system. 

- Paging I/O. These are I/O operations initiated by the virtual memory system in order to satisfy the needs of the demand paging system. 

Cached I/O is any I/O that can be satisfied by the file system data cache. In such a case, the operation is normally to copy the data from the virtual cache buffer into the user buffer. If the virtual cache buffer contents are resident in memory, the copy is fast and the results returned to the application quickly. If the virtual cache buffer contents are not all resident in memory, then the copy process will trigger a page fault, which generates a second re-entrant I/O operation via the paging mechanism. 

Non-cached user I/O is I/O that must bypass the cache - even if the data is present in the cache. For read operations, the FSD can retrieve the data directly from the storage device without making any changes to the cache. For write operations, however, an FSD must ensure that the cached data is properly invalidated (if this is even possible, which it will not be if the file is also memory mapped). 

Paging I/O is I/O that must be satisfied from the storage device (whether local to the system or located on some "other" computer system) and it is being requested by the virtual memory system as part of the paging mechanism (and hence has special rules that apply to its behavior as well as its serialization).

Source: http://www.osronline.com/article.cfm?article=17#Q25

3. What does Fast I/O DISALLOWED mean?

I noticed this "FAST IO DISALLOWED" againest createfile  API used in exe. What does this error mean for.?

It's benign but the explanation is a bit long.
 
Basically, for a few I/O operations there are two ways that a driver can service the request. The first is through a procedural interface where the driver is called with a set of parameters that describe the I/O operation. The other is an interface where the driver receives a packetized description of the I/O operation.
 
The former interface is called the "fast I/O" interface and is entirely optional, the latter interface is the IRP based interface and what most drivers use. A driver may choose to register for both interfaces and in the fast I/O path simply return a code that means, "sorry, can't do it via the fast path, please build me an IRP and call me at my IRP based entry point." This is what you're seeing in the Process Monitor output, someone is returning "no" to the fast I/O path and this results in an IRP being generated and the normal path being taken.

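  Fast I/O is optional: if the driver does not handle the request on the fast path, the result is DISALLOWED. So it cannot be used to decide whether a read hit the cache.

Source: http://forum.sysinternals.com/what-is-fast-io-disallowed_topic23154.html

Chapter 5 of "Windows NT File System Internals" covers this topic.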

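V. References

1. 《C++应用程序性能优化》 (C++ Application Performance Optimization), 2nd edition, chapters 9 and 10

2. The startup acceleration part of the Chromium source code

3. My earlier reading notes on 《C++应用程序性能优化》: http://www.cnblogs.com/cswuyg/archive/2010/08/27/1809808.html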

Posted on 2013-03-27 00:42 by 烛秋