Notes on learning about files in Python

I. First, getting to know files

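Differences between text files and binary files

To summarize:

Broadly speaking, every file on a computer is a binary file.

Narrowly speaking, files fall into two categories: binary files and text files.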

First, how the data is produced (that is, the write operation):

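In a text file, every item of data is a character. With a single-byte encoding such as ASCII, each character occupies exactly 1 byte; with Unicode encodings such as UTF-8 or UTF-16, a character may occupy more than one byte. When writing, the text file's encoder/decoder (codec) converts each character to its ASCII or Unicode code, which is then stored on disk in binary form.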
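As a small illustration of the encoding step, the sketch below uses Python's built-in str.encode to show which bytes a few characters turn into. The sample strings are chosen only for this example.

```python
# Encoding turns characters into bytes; the byte count depends on the encoding.
ascii_bytes = "1,1".encode("ascii")
print(list(ascii_bytes))   # [49, 44, 49]

# One non-ASCII character needs several bytes in UTF-8.
utf8_bytes = "汉".encode("utf-8")
print(len(utf8_bytes))     # 3

# Even an ASCII digit takes 2 bytes in UTF-16.
utf16_bytes = "1".encode("utf-16-le")
print(len(utf16_bytes))    # 2
```

So "one character, one byte" holds for ASCII text but not for text files in general.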

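In a binary file, by contrast, items are not all the same size: for example, a short takes 2 bytes, an int 4 bytes, and a double-precision float 8 bytes (not necessarily; the sizes depend on the platform and are given only as examples). A binary write copies the data in memory directly into the file, with no character-encoding step.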
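To make the sizes concrete, here is a minimal sketch using the standard struct module. The "<" prefix asks for little-endian byte order with standard sizes, so the results do not depend on the platform; the values packed are arbitrary.

```python
import struct

# "<" selects little-endian byte order with standard sizes (no padding).
short_bytes = struct.pack("<h", 1)      # 2-byte signed short
int_bytes = struct.pack("<i", 1)        # 4-byte signed int
double_bytes = struct.pack("<d", 1.0)   # 8-byte IEEE 754 double

print(len(short_bytes), len(int_bytes), len(double_bytes))  # 2 4 8
print(int_bytes)                        # b'\x01\x00\x00\x00'

# The same value written as text is the character "1", a single byte 49.
text_bytes = "1".encode("ascii")
print(list(text_bytes))                 # [49]
```

The integer 1 stored as a binary int and the character "1" stored as text are therefore different bytes on disk.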

Next, how the data is read:

A file read proceeds like this: disk -> file buffer -> application memory space.

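When we say that "text files and binary files are no different", we are really talking about the first step (disk to file buffer), where both are just bytes. If there is no difference, why does the content look different depending on how the file is opened? That difference comes from the second step (file buffer to application memory), where the bytes are either decoded into characters or handed over unchanged.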
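The two steps can be seen with Python's built-in open. The sketch below writes a few bytes to a scratch file (the temporary path and the name demo.txt are made up for this example) and then reads the same file in binary mode and in text mode.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "wb") as f:
    f.write(b"1,1,1")

# Binary mode: the bytes reach the program unchanged.
with open(path, "rb") as f:
    raw = f.read()

# Text mode: the same bytes are decoded into a str on the way in.
with open(path, "r", encoding="ascii") as f:
    text = f.read()

print(list(raw))   # [49, 44, 49, 44, 49]
print(text)        # 1,1,1
```

The bytes on disk are identical in both cases; only the interpretation applied while reading differs.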

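A file actually consists of two parts: control information and content. A plain-text file is simply one that carries no control (formatting) information.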

1. Taking NumPy's multiarray.fromfile as an example

numpy.fromfile()

def fromfile(file, dtype=None, count=-1, sep=''): # real signature unknown; restored from __doc__
    """
    fromfile(file, dtype=float, count=-1, sep='')
    
        Construct an array from data in a text or binary file.
    
        A highly efficient way of reading binary data with a known data-type,
        as well as parsing simply formatted text files.  Data written using the
        `tofile` method can be read using this function.
    
        Parameters
        ----------
        file : file or str
            Open file object or filename.
        dtype : data-type
            Data type of the returned array.
            For binary files, it is used to determine the size and byte-order
            of the items in the file.
        count : int
            Number of items to read. ``-1`` means all items (i.e., the complete
            file).
        sep : str
            Separator between items if file is a text file.
            Empty ("") separator means the file should be treated as binary.
            Spaces (" ") in the separator match zero or more whitespace characters.
            A separator consisting only of spaces must match at least one
            whitespace.
    
        See also
        --------
        load, save
        ndarray.tofile
        loadtxt : More flexible way of loading data from a text file.
    
        Notes
        -----
        Do not rely on the combination of `tofile` and `fromfile` for
        data storage, as the binary files generated are not platform
        independent.  In particular, no byte-order or data-type information is
        saved.  Data can be stored in the platform independent ``.npy`` format
        using `save` and `load` instead.
    
        Examples
        --------
        Construct an ndarray:
    
        >>> dt = np.dtype([('time', [('min', int), ('sec', int)]),
        ...                ('temp', float)])
        >>> x = np.zeros((1,), dtype=dt)
        >>> x['time']['min'] = 10; x['temp'] = 98.25
        >>> x
        array([((10, 0), 98.25)],
              dtype=[('time', [('min', '<i4'), ('sec', '<i4')]), ('temp', '<f8')])
    
        Save the raw data to disk:
    
        >>> import os
        >>> fname = os.tmpnam()
        >>> x.tofile(fname)
    
        Read the raw data from disk:
    
        >>> np.fromfile(fname, dtype=dt)
        array([((10, 0), 98.25)],
              dtype=[('time', [('min', '<i4'), ('sec', '<i4')]), ('temp', '<f8')])
    
        The recommended way to store and load data:
    
        >>> np.save(fname, x)
        >>> np.load(fname + '.npy')
        array([((10, 0), 98.25)],
              dtype=[('time', [('min', '<i4'), ('sec', '<i4')]), ('temp', '<f8')])
    """
    pass
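The Notes section of the docstring warns that tofile stores no data-type information. The following is a minimal sketch of what that means in practice, assuming NumPy is installed; the scratch directory and the file names values.bin and values.npy are invented for this example.

```python
import os
import tempfile

import numpy as np

tmp_dir = tempfile.mkdtemp()   # hypothetical scratch directory
raw_path = os.path.join(tmp_dir, "values.bin")
npy_path = os.path.join(tmp_dir, "values.npy")

x = np.arange(4, dtype=np.int32)   # 4 items * 4 bytes = 16 bytes
x.tofile(raw_path)                 # raw bytes only, no dtype or shape stored

right = np.fromfile(raw_path, dtype=np.int32)  # caller supplies the dtype
wrong = np.fromfile(raw_path)      # default dtype is float: same bytes, misread

np.save(npy_path, x)               # .npy records dtype and shape in a header
loaded = np.load(npy_path)

print(right, right.dtype)
print(wrong.size, wrong.dtype)
print(loaded, loaded.dtype)
```

With fromfile the reader has to know the dtype in advance; with save and load it is recovered from the file itself.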

Note in particular this line:

Empty ("") separator means the file should be treated as binary.

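In other words, by default the file is read as binary: its raw bytes are interpreted directly as items of the given dtype. When a non-empty sep is given, the file is treated as text instead: its bytes are read as characters, split on the separator, and each token is parsed into a number of the requested dtype.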

Take the file test.txt as an example (the ASCII code of the character 1 is decimal 49, and that of "," is 44).

test.txt

1,1,1,1,1

(1) Reading with the default sep argument:

import numpy as np

filepath = "D://Documents/temp/testForPyStruct.txt"
data = np.fromfile(filepath, dtype=np.uint8, sep="")  # empty sep: read raw bytes
print(data)

Output:

[49 44 49 44 49 44 49 44 49]

(2) Reading with sep=",":

import numpy as np

filepath = "D://Documents/temp/testForPyStruct.txt"
data = np.fromfile(filepath, dtype=np.uint8, sep=",")  # non-empty sep: parse as text
print(data)

Output:

[1 1 1 1 1]
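For a version that can be run anywhere, the sketch below first creates the test file in a temporary directory (a made-up location, not the path used above) and then compares both fromfile modes with what plain Python produces from the same bytes. It assumes NumPy is installed and that the file has no trailing newline.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # hypothetical location
with open(path, "w", encoding="ascii") as f:
    f.write("1,1,1,1,1")   # no trailing newline

as_bytes = np.fromfile(path, dtype=np.uint8, sep="")   # binary: one item per byte
as_text = np.fromfile(path, dtype=np.uint8, sep=",")   # text: parse the numbers

with open(path, "rb") as f:
    raw = f.read()

# Binary mode matches the raw byte values of the file.
print(as_bytes.tolist() == list(raw))
# Text mode matches decoding, splitting on "," and converting each token.
print(as_text.tolist() == [int(tok) for tok in raw.decode("ascii").split(",")])
```

Both comparisons should print True, which is one way to see that sep only changes how the same bytes are interpreted.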


posted @ 2018-11-30 12:25 汉尼拔草
