Common ways to read a file, compared
Unformatted input: reading the input stream directly, byte by byte
All tests use the same 24 MB English text file.
C, fread
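#include <stdio.h>

int main(void)
{
    char c[1];
    size_t num;   /* fread returns size_t */
    FILE *fp = fopen("http://www.cnblogs.com/big.log", "rb");
    if (fp == NULL)
        return 1;
    while (1) {
        /* Read one char per call: the first 1 is the element size in bytes,
           the second 1 is the number of elements to read. */
        num = fread(c, 1, 1, fp);
        /* printf("%c", c[0]); */
        if (num == 0)
            break;
    }
    fclose(fp);
    return 0;
}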
Run time
real 0m1.349s
user 0m1.028s
sys 0m0.316s
Change the buffer to char c[1024] so that each call reads one 1024-byte element, which cuts the number of fread calls:
num = fread(c, 1024, 1, fp);
real 0m0.041s
user 0m0.004s
sys 0m0.036s
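Note that the return value is the number of elements successfully read, not the number of bytes.
If fewer than 1024 bytes remain in the file, the fread above returns 0, so the loop exits without processing the trailing partial chunk.
For the Huffman program, the right form is one byte per element, with many elements read per call:
fread(c, 1, 1024, fp); the return value then equals the number of bytes read.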
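The difference between the two argument orders can be sketched as follows. This is only an illustration: the helper names count_bytes_by_byte_elements and count_bytes_by_1024_elements are hypothetical and do not come from the original program.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: count bytes with fread(buf, 1, N, fp).
// With an element size of 1 the return value is the number of bytes read,
// so a short final chunk is still counted.
std::size_t count_bytes_by_byte_elements(const char* path)
{
    std::FILE* fp = std::fopen(path, "rb");
    if (fp == NULL)
        return 0;
    char buf[1024];
    std::size_t total = 0;
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, fp)) > 0)
        total += n;
    std::fclose(fp);
    return total;
}

// Hypothetical helper: count bytes with fread(buf, 1024, 1, fp).
// The return value is the number of complete 1024-byte elements,
// so a trailing partial element is never counted.
std::size_t count_bytes_by_1024_elements(const char* path)
{
    std::FILE* fp = std::fopen(path, "rb");
    if (fp == NULL)
        return 0;
    char buf[1024];
    std::size_t total = 0;
    while (std::fread(buf, sizeof buf, 1, fp) == 1)
        total += sizeof buf;
    std::fclose(fp);
    return total;
}
```

On a file whose size is not a multiple of 1024, the second helper reports fewer bytes than the first, which is the pitfall described above.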
NAME
fread, fwrite - binary stream input/output
SYNOPSIS
#include <stdio.h>
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t fwrite(const void *ptr, size_t size, size_t nmemb,
FILE *stream);
The function fread() reads nmemb elements of data, each size bytes
long, from the stream pointed to by stream, storing them at the
location given by ptr.
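Python, read()
f = open('http://www.cnblogs.com/big.log', 'rb')
while True:
    c = f.read(1)   # read one byte per call
    if not c:       # an empty result means end of file (works for both str and bytes)
        break
    # print c,
f.close()
Run time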
real 0m15.013s
user 0m13.505s
Python is slow here because each function call costs far more than it does in C.
C++ istreambuf_iterator
#include <iostream>
#include <iterator>
#include <string>
#include <fstream>
using namespace std;

int main()
{
    ifstream input_file("http://www.cnblogs.com/big.log", ios::binary);
    char c;
    // istreambuf_iterator reads directly from the stream buffer
    istreambuf_iterator<char> eos;               // end-of-range iterator
    istreambuf_iterator<char> iit(input_file);
    while (iit != eos)
        c = *iit++;
    return 0;
}
real 0m3.414s
user 0m2.528s
sys 0m0.884s
C++ istream_iterator
#include <iostream>
#include <iterator>
#include <string>
#include <fstream>
// Show C++ read a file by bytes
// Show C read a file by bytes
// Show system c read
//==================================================================
using namespace std;

int main()
{
    ifstream input_file("http://www.cnblogs.com/big.log", ios::binary);
    input_file.unsetf(ios::skipws);   // do not skip whitespace characters
    char c;
    // istream_iterator reads through operator>>
    istream_iterator<char> eos;                  // end-of-range iterator
    istream_iterator<char> iit(input_file);
    while (iit != eos) {
        c = *iit++;
        // cout << c;
    }
    return 0;
}
real 0m2.433s
user 0m1.732s
sys 0m0.704s
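All of the results above are from gcc 4.2.4. This is odd: according to Item 29 of Effective STL, istream_iterator goes through operator>> while istreambuf_iterator reads directly from the stream buffer, so istreambuf_iterator should be much faster.
In my experiment, however, it was the slower of the two.
If file I/O speed is the priority, it is better to use C directly.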
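//C++ cin.get(ch): the return value is a reference to the istream object
#include <fstream>
using namespace std;

int main()
{
    ifstream input_file("http://www.cnblogs.com/big.log", ios::binary);
    char c[1024];
    // input_file.get(c, 1024);
    while (input_file.get(c[0]))   // the returned stream tests false at end of file
        ;                          // cout << c[0];
    return 0;
}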
real 0m2.300s
user 0m0.472s
sys 0m1.700s
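//C++ cin.get(): the return value is an int; -1 (EOF) signals the end of input
int ch;   // keep the result in an int so that EOF stays distinct from the byte 0xFF
while ((ch = input_file.get()) != EOF)
    cout << static_cast<char>(ch);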
real 0m0.885s
user 0m0.524s
sys 0m0.364s
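//Note: get(char *, streamsize, delimiter = '\n') extracts at most streamsize - 1 characters and then appends '\0'. With a size of 2, the second byte is therefore '\0' and input_file.gcount() reports 1.
So to read one character at a time with this function, pass 2 rather than 1.
input_file.get(c, 2, EOF);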
//System C read()
#include <fcntl.h>
#include <unistd.h>
#include <string>

int main()
{
    // char buf[2048];
    char c[1024];
    ssize_t num;   // read() returns ssize_t
    std::string s = "http://www.cnblogs.com/big.log";
    // std::string s = "http://www.cnblogs.com/test.log";
    // FILE* fp = fopen(s.c_str(), "rb");
    int fp = open(s.c_str(), O_RDONLY);
    if (fp < 0)
        return 1;
    while (1) {
        // num = fread(c, 1024, 1, fp);
        num = read(fp, c, 1);   // one system call per byte
        // printf("%c", c[0]);
        if (num <= 0)           // 0 means end of file, -1 means error
            // if (c[0] == EOF)
            break;
    }
    close(fp);
    return 0;
}
real 0m43.560s
user 0m4.908s
sys 0m38.594s
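Almost all of the time above is sys time, which suggests that the cost comes from issuing one system call per byte. Presumably the same fix that helped fread applies here: ask read() for a large block per call. The following is a sketch under that assumption; the helper name count_bytes_with_read and the 64 KB buffer size are illustrative choices, and it has not been timed against the test file.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: count the bytes in a file with POSIX read(),
// requesting up to 64 KB per system call instead of a single byte.
// Returns -1 if the file cannot be opened or a read fails.
long long count_bytes_with_read(const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char buf[64 * 1024];
    long long total = 0;
    ssize_t n;
    // read() returns the number of bytes actually read,
    // 0 at end of file, and -1 on error.
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    close(fd);
    return n < 0 ? -1 : total;
}
```

Unlike fread(c, 1024, 1, fp), read() always reports a byte count, so a short final block is handled without special care.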