Learning Notes on Files in Python

I. First, getting to know files

Differences between text files and binary files

A summary:

In the broad sense, every file on a computer is a binary file.

In the narrow sense, files fall into two categories: binary files and text files.

First, how data gets into a file (the write operation):

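In a text file, every piece of data is a character. The text file's encoder converts each character into its code (ASCII, or a Unicode encoding such as UTF-8) and stores those codes on disk in binary form. In ASCII every character takes exactly one byte; in UTF-8 an ASCII character still takes one byte, but other characters take two to four bytes, so the items in a text file are not always the same length.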
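To see what the encoder actually produces, you can encode a few strings in Python and look at the byte values. The sample strings below are arbitrary choices for illustration; for ASCII characters the UTF-8 bytes are identical to the ASCII codes.

```python
def utf8_bytes(text: str) -> list:
    """Return the byte values that UTF-8 encoding produces for `text`."""
    return list(text.encode("utf-8"))


for text in ["1,1", "A", "中"]:
    data = utf8_bytes(text)
    # len(text) counts characters, len(data) counts bytes on disk
    print(f"{text!r}: {len(text)} char(s) -> {data} ({len(data)} bytes)")
```

"1" encodes to 49 and "," to 44, one byte each, while the single character "中" needs three bytes.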

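A binary file, by contrast, has no fixed item size: a short typically takes 2 bytes, an int 4 bytes and a double 8 bytes (the exact sizes depend on the platform and compiler; these are only examples). Writing a binary file copies the bytes from memory into the file as-is, without converting them into characters.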
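A minimal sketch of such a binary write, using Python's standard struct module. The "<" prefix selects struct's standard (platform-independent) sizes rather than the native C sizes, which vary; the file name record.bin and the helper functions are hypothetical.

```python
import os
import struct
import tempfile

# '<' = little-endian with standard sizes:
# h = short (2 bytes), i = int (4 bytes), d = double (8 bytes)
FORMAT = "<hid"


def write_record(path: str, short_val: int, int_val: int, double_val: float) -> None:
    """Write the three values as raw bytes, with no conversion to text."""
    with open(path, "wb") as f:
        f.write(struct.pack(FORMAT, short_val, int_val, double_val))


def read_record(path: str) -> tuple:
    """Read the raw bytes back and reinterpret them with the same format."""
    with open(path, "rb") as f:
        return struct.unpack(FORMAT, f.read())


path = os.path.join(tempfile.mkdtemp(), "record.bin")
write_record(path, 1, 1, 1.0)
print(os.path.getsize(path))  # 2 + 4 + 8 = 14 bytes
print(read_record(path))      # (1, 1, 1.0)
```

Reading only works because the reader uses the same format string as the writer; the file itself does not record where one item ends and the next begins.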

Next, reading data:

Reading a file goes through two stages: disk >> file buffer >> application memory.

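People often say that "there is no difference between text files and binary files"; that statement is really about the first stage (disk to buffer), where both are just bytes. If there is no difference, why does the same file display differently depending on how it is opened? The difference comes from the second stage (buffer to application memory): text mode decodes the bytes into characters and may translate line endings, while binary mode passes the bytes through unchanged.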
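A small sketch of the second-stage difference in Python: the same bytes on disk are read once in binary mode and once in text mode. The file name line.txt is hypothetical; the Windows-style "\r\n" line ending is chosen because Python's text mode (with the default newline=None) translates it to "\n".

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "line.txt")

# Put exact bytes on disk, ending with a Windows-style line break.
with open(path, "wb") as f:
    f.write(b"1,1\r\n")

# Binary mode: the bytes arrive in memory untouched.
with open(path, "rb") as f:
    raw = f.read()

# Text mode: the bytes are decoded to str and "\r\n" becomes "\n".
with open(path, "r", encoding="ascii") as f:
    text = f.read()

print(raw)         # b'1,1\r\n'
print(repr(text))  # '1,1\n'
```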

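A file actually consists of two kinds of information: control information and content. A plain-text file is simply one that carries no formatting control information.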

1. Example: NumPy's multiarray.fromfile

numpy.fromfile()

def fromfile(file, dtype=None, count=-1, sep=''): # real signature unknown; restored from __doc__
    """
    fromfile(file, dtype=float, count=-1, sep='')
    
        Construct an array from data in a text or binary file.
    
        A highly efficient way of reading binary data with a known data-type,
        as well as parsing simply formatted text files.  Data written using the
        `tofile` method can be read using this function.
    
        Parameters
        ----------
        file : file or str
            Open file object or filename.
        dtype : data-type
            Data type of the returned array.
            For binary files, it is used to determine the size and byte-order
            of the items in the file.
        count : int
            Number of items to read. ``-1`` means all items (i.e., the complete
            file).
        sep : str
            Separator between items if file is a text file.
            Empty ("") separator means the file should be treated as binary.
            Spaces (" ") in the separator match zero or more whitespace characters.
            A separator consisting only of spaces must match at least one
            whitespace.
    
        See also
        --------
        load, save
        ndarray.tofile
        loadtxt : More flexible way of loading data from a text file.
    
        Notes
        -----
        Do not rely on the combination of `tofile` and `fromfile` for
        data storage, as the binary files generated are not platform
        independent.  In particular, no byte-order or data-type information is
        saved.  Data can be stored in the platform independent ``.npy`` format
        using `save` and `load` instead.
    
        Examples
        --------
        Construct an ndarray:
    
        >>> dt = np.dtype([('time', [('min', int), ('sec', int)]),
        ...                ('temp', float)])
        >>> x = np.zeros((1,), dtype=dt)
        >>> x['time']['min'] = 10; x['temp'] = 98.25
        >>> x
        array([((10, 0), 98.25)],
              dtype=[('time', [('min', '<i4'), ('sec', '<i4')]), ('temp', '<f8')])
    
        Save the raw data to disk:
    
        >>> import os
        >>> fname = os.tmpnam()
        >>> x.tofile(fname)
    
        Read the raw data from disk:
    
        >>> np.fromfile(fname, dtype=dt)
        array([((10, 0), 98.25)],
              dtype=[('time', [('min', '<i4'), ('sec', '<i4')]), ('temp', '<f8')])
    
        The recommended way to store and load data:
    
        >>> np.save(fname, x)
        >>> np.load(fname + '.npy')
        array([((10, 0), 98.25)],
              dtype=[('time', [('min', '<i4'), ('sec', '<i4')]), ('temp', '<f8')])
    """
    pass
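The docstring example relies on os.tmpnam, which no longer exists in Python 3. Below is a sketch of the same round trip using tempfile.mkdtemp instead, with explicit "<i4" fields in place of int (whose size differs between platforms) so the byte count is predictable; the file names x.raw and x.npy are hypothetical. It also shows why the Notes section warns against tofile/fromfile for storage: the raw file carries no dtype, so reading it with the wrong dtype silently gives meaningless values.

```python
import os
import tempfile

import numpy as np

dt = np.dtype([("time", [("min", "<i4"), ("sec", "<i4")]), ("temp", "<f8")])
x = np.zeros((1,), dtype=dt)
x["time"]["min"] = 10
x["temp"] = 98.25

tmpdir = tempfile.mkdtemp()
raw_path = os.path.join(tmpdir, "x.raw")
npy_path = os.path.join(tmpdir, "x.npy")

# tofile writes only the raw bytes: no shape, dtype or byte-order header.
x.tofile(raw_path)
print(os.path.getsize(raw_path))  # 4 + 4 + 8 = 16 bytes

y = np.fromfile(raw_path, dtype=dt)            # the reader must supply the dtype
wrong = np.fromfile(raw_path, dtype=np.uint8)  # same bytes, wrong dtype

# np.save stores dtype and shape in the .npy header, so np.load needs no hints.
np.save(npy_path, x)
z = np.load(npy_path)
print(y)
print(z)
```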

Note this line in particular:

Empty ("") separator means the file should be treated as binary.

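In other words, by default the file is read as binary: the raw bytes are interpreted directly as values of the given dtype. When a separator is supplied, the bytes are instead treated as text (ASCII/Unicode characters), and each field between separators is parsed as a number of the given dtype.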
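The two modes can be mimicked in plain Python. This is only an analogy for the behavior described above, not how NumPy implements it internally, and it assumes the file holds exactly the bytes of the string 1,1,1,1,1 with no trailing newline.

```python
raw = b"1,1,1,1,1"  # assumed file contents, no trailing newline

# Like sep="" with dtype=uint8: every byte becomes one element.
binary_view = list(raw)

# Like sep=",": decode the bytes to text, split on the separator, parse each field.
text_view = [int(field) for field in raw.decode("ascii").split(",")]

print(binary_view)  # [49, 44, 49, 44, 49, 44, 49, 44, 49]
print(text_view)    # [1, 1, 1, 1, 1]
```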

Take test.txt as an example (the ASCII code of "1" is 49 in decimal, and that of "," is 44):

test.txt

1,1,1,1,1

(1) Reading with the default sep parameter:

import numpy as np

filepath = "D://Documents/temp/testForPyStruct.txt"  # holds the contents of test.txt
data = np.fromfile(filepath, dtype=np.uint8, sep="")  # binary mode: one uint8 per byte
print(data)

Output:

[49 44 49 44 49 44 49 44 49]

(2) Reading with sep=",":

import numpy as np

filepath = "D://Documents/temp/testForPyStruct.txt"  # holds the contents of test.txt
data = np.fromfile(filepath, dtype=np.uint8, sep=",")  # text mode: parse comma-separated numbers
print(data)

Output:

[1 1 1 1 1]
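To reproduce both outputs without the D:// path, here is a self-contained sketch that writes the test file into a temporary directory (a hypothetical location chosen only for portability) and reads it both ways.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="ascii") as f:
    f.write("1,1,1,1,1")  # no trailing newline, so the file is exactly 9 bytes

as_binary = np.fromfile(path, dtype=np.uint8, sep="")  # raw bytes
as_text = np.fromfile(path, dtype=np.uint8, sep=",")   # parsed numbers

print(as_binary)  # [49 44 49 44 49 44 49 44 49]
print(as_text)    # [1 1 1 1 1]
```

If an editor saved test.txt with a trailing newline, the binary read would end with an extra 10 (or 13 10 for Windows line endings), because those bytes are part of the file too.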


2. Related functions (from the "See also" section of the fromfile docstring):

load, save
ndarray.tofile
loadtxt : More flexible way of loading data from a text file.
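As the list suggests, np.loadtxt is the more flexible tool for text data. A minimal sketch, using an in-memory io.StringIO as a stand-in for a file (np.loadtxt also accepts a filename) and a second row added purely for illustration: unlike fromfile, which returns a flat 1-D array here, loadtxt keeps one row per line.

```python
import io

import numpy as np

# In-memory stand-in for a text file; the second row is illustrative only.
text_file = io.StringIO("1,1,1,1,1\n2,2,2,2,2\n")

table = np.loadtxt(text_file, delimiter=",", dtype=np.uint8)
print(table.shape)  # (2, 5): one row per line, one column per field
print(table)
```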


posted @ 2018-11-30 12:25  汉尼拔草