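Starting a topic: C vs. C++ file operations

Most articles online just introduce the various ways of doing file I/O; few actually compare the two approaches. I'm opening this topic so I can research and summarize it when I have time:

A comparison of C and C++ file operations, covering the characteristics and efficiency of each. In C++ code, should file I/O go through stdio or through streams?

Some material I have collected: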

1. http://www.parashift.com/c++-faq-lite/input-output.html

[15.1] Why should I use `<iostream>` instead of the traditional `<cstdio>`?

Increase type safety, reduce errors, allow extensibility, and provide inheritability.

printf() is arguably not broken, and scanf() is perhaps livable despite being error prone, however both are limited with respect to what C++ I/O can do. C++ I/O (using << and >>) is, relative to C (using printf() and scanf()):

More type-safe: With <iostream>, the type of object being I/O'd is known statically by the compiler. In contrast, <cstdio> uses "%" fields to figure out the types dynamically.
Less error prone: With <iostream>, there are no redundant "%" tokens that have to be consistent with the actual objects being I/O'd. Removing redundancy removes a class of errors.
Extensible: The C++ <iostream> mechanism allows new user-defined types to be I/O'd without breaking existing code. Imagine the chaos if everyone was simultaneously adding new incompatible "%" fields to printf() and scanf()?!
Inheritable: The C++ <iostream> mechanism is built from real classes such as std::ostream and std::istream. Unlike <cstdio>'s FILE*, these are real classes and hence inheritable. This means you can have other user-defined things that look and act like streams, yet that do whatever strange and wonderful things you want. You automatically get to use the zillions of lines of I/O code written by users you don't even know, and they don't need to know about your "extended stream" class.
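To make the "type-safe" and "extensible" points concrete, here is a minimal sketch of my own (not from the FAQ). The `Point` type and the `describe` helper are hypothetical names used only for illustration; the only library facilities involved are `std::ostream` and `std::ostringstream`.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type, used only for illustration.
struct Point {
    int x;
    int y;
};

// Teach every std::ostream how to print a Point. The compiler picks this
// overload from the static type, so there is no "%" field to get wrong,
// and existing code that prints ints or strings is unaffected.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

// The same operator works with any ostream: std::cout, std::ofstream,
// or an in-memory std::ostringstream as used here.
std::string describe(const Point& p) {
    std::ostringstream out;
    out << "point=" << p;
    return out.str();
}
```

With this in place, `describe(Point{1, 2})` yields the string `point=(1, 2)`. There is no equivalent way to add a new conversion specifier to printf() in standard C.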

[15.7] Should I end my output lines with `std::endl` or `'\n'`?

Using std::endl flushes the output buffer after sending a '\n', which means std::endl is more expensive in performance. Obviously if you need to flush the buffer after sending a '\n', then use std::endl; but if you don't need to flush the buffer, the code will run faster if you use '\n'.

This code simply outputs a '\n':

 void f()
 {
   std::cout << ...stuff... << '\n';
 } 

This code outputs a '\n', then flushes the output buffer:

 void g()
 {
   std::cout << ...stuff... << std::endl;
 } 

This code simply flushes the output buffer:

 void h()
 {
   std::cout << ...stuff... << std::flush;
 } 

Note: all three of the above examples require #include <iostream>
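The flushing difference can be observed directly. The sketch below is my own illustration, not part of the FAQ: `CountingBuf` and `count_flushes` are hypothetical names, and the idea is simply to derive from `std::stringbuf` and count calls to `sync()`, which is what a flush ends up invoking on the buffer. It also happens to show the "inheritable" point from [15.1].

```cpp
#include <ostream>
#include <sstream>

// Hypothetical helper: a string buffer that counts how often it is flushed.
class CountingBuf : public std::stringbuf {
public:
    int flushes = 0;

protected:
    // ostream::flush() reaches the buffer through pubsync(), which calls sync().
    int sync() override {
        ++flushes;
        return std::stringbuf::sync();
    }
};

// Writes `lines` lines, ending each with std::endl or '\n', and returns
// how many flushes reached the buffer.
int count_flushes(int lines, bool use_endl) {
    CountingBuf buf;
    std::ostream os(&buf);
    for (int i = 0; i < lines; ++i) {
        if (use_endl) os << i << std::endl;
        else          os << i << '\n';
    }
    return buf.flushes;
}
```

Under these assumptions, `count_flushes(100, true)` reports one flush per line, while `count_flushes(100, false)` reports none. On a real file stream each of those flushes can turn into a system call, which is where the cost comes from.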

[15.16] Why can't I open a file in a different directory such as `"..\test.dat"`?

Because "\t" is a tab character.

You should use forward slashes in your filenames, even on operating systems that use backslashes (DOS, Windows, OS/2, etc.). For example:

 #include <iostream>
 #include <fstream>
 
 int main()
 {
   #if 1
     std::ifstream file("../test.dat");  // RIGHT!
   #else
     std::ifstream file("..\test.dat");  // WRONG!
   #endif
 
   ...
 }

Remember, the backslash ("\") is used in string literals to create special characters: "\n" is a newline, "\b" is a backspace, and "\t" is a tab, "\a" is an "alert", "\v" is a vertical-tab, etc. Therefore the file name "\version\next\alpha\beta\test.dat" is interpreted as a bunch of very funny characters. To be safe, use "/version/next/alpha/beta/test.dat" instead, even on systems that use a "\" as the directory separator. This is because the library routines on these operating systems handle "/" and "\" interchangeably.

Of course you could use "\\version\\next\\alpha\\beta\\test.dat", but that might hurt you (there's a non-zero chance you'll forget one of the "\"s, a rather subtle bug since most people don't notice it) and it can't help you (there's no benefit for using "\\" over "/"). Besides "/" is more portable since it works on all flavors of Unix, Plan 9, Inferno, all Windows, OS/2, etc., but "\\" works only on a subset of that list. So "\\" costs you something and gains you nothing: use "/" instead.
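A quick way to convince yourself of this is to look at what the compiler actually stores for each literal. This is a small sketch of my own; the variable names and the `contains_tab` helper are made up for the example.

```cpp
#include <string>

// "\t" inside an ordinary literal is ONE tab character,
// not a backslash followed by the letter 't'.
const std::string wrong   = "..\test.dat";   // 10 chars: '.', '.', TAB, "est.dat"
const std::string escaped = "..\\test.dat";  // 11 chars, contains a real backslash
const std::string forward = "../test.dat";   // 11 chars, the recommended form

// Returns true if the path accidentally contains a tab character.
bool contains_tab(const std::string& s) {
    return s.find('\t') != std::string::npos;
}
```

Here `contains_tab(wrong)` is true and the string is one character shorter than intended, so the open fails because no file with a tab in its name exists.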

2. Read performance

http://www.byvoid.com/blog/fast-readfile/

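To make sure the results were accurate, I also ran the test on Windows. The results are in the table below:

| Method (time in seconds) | Linux gcc | Windows MinGW | Windows VC2008 |
|---|---|---|---|
| scanf | 2.010 | 3.704 | 3.425 |
| cin | 6.380 | 64.003 | 19.208 |
| cin with sync disabled | 2.050 | 6.004 | 19.616 |
| fread | 0.290 | 0.241 | 0.304 |
| read | 0.290 | 0.398 | not supported |
| mmap | 0.250 | not supported | not supported |
| Pascal read | 2.160 | 4.668 | |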

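A few things stand out from the numbers above:

Programs generally run faster on Linux than on Windows.
On Windows, programs compiled with VC generally run faster than those compiled with MinGW (MINimal Gcc for Windows).
VC is insensitive to whether cin synchronization is disabled; performance is the same either way. MinGW, by contrast, is very sensitive to it: the two cases differ by roughly an order of magnitude (64.003 s versus 6.004 s in the table).
read is originally a Linux system function; MinGW presumably emulates it in some way, and there read is slower than fread.
The Pascal program's speed is really nothing to praise.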
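For reference, here is roughly what two ends of that table look like in code. This is my own minimal sketch, not the benchmark program from the linked article: the function names are hypothetical, and it assumes the input is a text file of whitespace-separated integers.

```cpp
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Stream version: reads integers one by one with operator>>.
long long sum_with_ifstream(const std::string& path) {
    std::ifstream in(path.c_str());
    long long sum = 0, value = 0;
    while (in >> value) sum += value;
    return sum;
}

// stdio version: pulls the whole file in with fread, then parses in memory.
long long sum_with_fread(const std::string& path) {
    std::FILE* fp = std::fopen(path.c_str(), "rb");
    if (!fp) return 0;

    std::vector<char> data;
    char chunk[4096];
    std::size_t n;
    while ((n = std::fread(chunk, 1, sizeof chunk, fp)) > 0)
        data.insert(data.end(), chunk, chunk + n);
    std::fclose(fp);
    data.push_back('\0');              // strtoll needs a terminated string

    long long sum = 0;
    char* cur = data.data();
    for (;;) {
        char* end = nullptr;
        long long value = std::strtoll(cur, &end, 10);
        if (end == cur) break;         // nothing more could be parsed
        sum += value;
        cur = end;
    }
    return sum;
}
```

Both functions return the same sum for the same file; the fread version just does far fewer library calls per number. For the "cin with sync disabled" row, the relevant call is `std::ios::sync_with_stdio(false)`, made once before any other I/O on the standard streams.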

3. Improving speed

(1) Memory mapping

(2) Use the Windows API

(3) Optimize the algorithm (this is what really matters)
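As an illustration of option (1), here is a minimal memory-mapping sketch of my own. It is POSIX-only (it will not build with VC, which matches the "not supported" entries in the table), and the function name `count_newlines_mmap` is hypothetical.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Counts newline characters by mapping the file into memory instead of
// copying it into a user-space buffer. Returns -1 on error. POSIX only.
long count_newlines_mmap(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0)     { close(fd); return 0; }  // mmap rejects length 0

    std::size_t len = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                         // the mapping stays valid after close
    if (p == MAP_FAILED) return -1;

    const char* data = static_cast<const char*>(p);
    long lines = 0;
    for (std::size_t i = 0; i < len; ++i)
        if (data[i] == '\n') ++lines;

    munmap(p, len);
    return lines;
}
```

On Windows the rough counterpart would go through CreateFileMapping and MapViewOfFile, which is also one concrete form of option (2).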

http://dev.firnow.com/course/3_program/c++/cppjs/20090403/163891.html

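FILE maintains its own buffering layer.
A FILE uses a default size for its I/O buffer (4 KB here; the actual default depends on the implementation), and the size can be changed with setvbuf (setbuf can only install or remove a buffer of the fixed size BUFSIZ).

Suppose you fread 1 byte: that triggers a ReadFile of 4 KB, and fread then copies the requested data into the buffer you supplied. As long as later reads stay within that boundary, they are served from the I/O buffer. fwrite works the same way: WriteFile is only really called once the I/O buffer boundary is crossed. You can call fflush to force a flush, which calls WriteFile, or let fclose flush implicitly, which also calls WriteFile (fclose blocks while this happens).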
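A minimal sketch of controlling that buffer from code, assuming nothing beyond the standard C library. The function name `write_bytes_buffered` and its parameters are my own; the calls themselves (`setvbuf`, `fputc`, `fflush`, `fclose`) are standard.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Writes `count` single bytes through a FILE* whose I/O buffer has a
// caller-chosen size. Returns true on success.
bool write_bytes_buffered(const char* path, std::size_t count, std::size_t buf_size) {
    assert(buf_size > 0);
    std::FILE* fp = std::fopen(path, "wb");
    if (!fp) return false;

    std::vector<char> buf(buf_size);
    // setvbuf must be called before any other operation on the stream.
    if (std::setvbuf(fp, buf.data(), _IOFBF, buf.size()) != 0) {
        std::fclose(fp);
        return false;
    }

    bool ok = true;
    for (std::size_t i = 0; i < count && ok; ++i)
        ok = (std::fputc('x', fp) != EOF);  // usually just a copy into buf

    ok = (std::fflush(fp) == 0) && ok;      // push buffered data to the OS now
    ok = (std::fclose(fp) == 0) && ok;      // buf must outlive the stream
    return ok;
}
```

With a 64 KB buffer, 10000 one-byte writes would normally reach the operating system in a single write when fflush is called, instead of one call per byte.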

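As for the disk itself: the disk's cache is managed and used by the disk controller and, like a processor's cache, cannot be manipulated directly. A write to the disk goes into the cache first, and the drive then gradually writes the data to the platters. Nothing is optimized in this step: whatever order the disk driver issued the writes in is the order in which they reach the platters.
In practice, though, a hard disk is a random-access device, so it does not matter which write goes first. That is why, after application-level I/O accesses have been turned into low-level I/O requests, the kernel usually sorts those requests to optimize them.

Suppose 10 requests are currently queued on an I/O queue. The kernel first works out the physical location of each request and then sorts them so that as many of the 10 as possible complete within a single revolution. In the best case, all 10 requests are on the same platter surface and all complete in one revolution. In the worst case it takes 10 revolutions, because only one head can be active at a time and the 10 requests may unluckily sit on 10 different surfaces (in which case the kernel does not sort them at all).

So the best results come from keeping your I/O in contiguous disk space that does not physically cross platter surfaces. To do that you may need the drive's exact parameters and some precise calculation.

The benefit of the cache is cancelled out under heavy I/O, because the disk's write speed never keeps up with the processor's requests. The cache can only absorb the load for a while, and a bigger cache absorbs it for longer. Once the cache is full, the hardware ready signal goes inactive, the disk driver cannot write any more, and the kernel's I/O queue has to be held. Meanwhile the upper layers keep issuing requests, so the kernel either keeps appending them to the I/O request queue or blocks the process that issued the I/O. When the cache has room again, the hardware asserts ready, the driver resumes taking requests off the kernel's I/O request queue and fills the cache, which fills up again, and so on. In other words, the cache only helps during that initial buffering period. That is especially good for small I/O requests, because they finish before the cache fills and so are never blocked or suspended.

To sum up, there is only so much that can be done in software, and it is exhausting. Why bother? orz...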

posted @ 2011-05-25 02:55 旧博客
