Some utilities provided by PyTables

PyTables ships with several utilities that make it easy to inspect and analyze the files it generates. A brief overview follows.

ptdump

Displays both the data and the metadata stored in a file.
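As a minimal sketch of how a ptdump call might be assembled from a script, the helper below builds an argument list using only flags that appear in the usage text in this section (`-v`, `-d`, `-a`, `-R`) and the `filename[:nodepath]` form. The helper name `build_ptdump_command`, the file name `data.h5`, and the node path `/group/table` are hypothetical and chosen only for illustration.

```python
import shlex


def build_ptdump_command(filename, nodepath=None, verbose=False, dump=False,
                         show_attrs=False, row_range=None):
    """Build an argv list for ptdump (hypothetical helper, not part of PyTables)."""
    cmd = ["ptdump"]
    if verbose:
        cmd.append("-v")  # dump more metainformation on nodes
    if dump:
        cmd.append("-d")  # dump data information on leaves
    if show_attrs:
        cmd.append("-a")  # show attributes (only useful with -v or -d)
    if row_range is not None:
        start, stop, step = row_range
        # RANGE is written as "start,stop,step"
        cmd += ["-R", f"{start},{stop},{step}"]
    # The positional argument has the form filename[:nodepath]
    target = filename if nodepath is None else f"{filename}:{nodepath}"
    cmd.append(target)
    return cmd


# Hypothetical file and node names, used only for illustration
cmd = build_ptdump_command("data.h5", nodepath="/group/table",
                           verbose=True, dump=True, row_range=(0, 10, 1))
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where PyTables and its command-line scripts are installed.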

  • Command
usage: ptdump [-h] [-v] [-d] [-a] [-s] [-c] [-i] [-R RANGE] filename[:nodepath]

The ptdump utility allows you look into the contents of your PyTables files. It lets you see not only the data but also the metadata
(that is, the *structure* and additional information in the form of *attributes*).

positional arguments:
  filename[:nodepath]   name of the HDF5 file to dump

options:
  -h, --help            show this help message and exit
  -v, --verbose         dump more metainformation on nodes
  -d, --dump            dump data information on leaves
  -a, --showattrs       show attributes in nodes (only useful when -v or -d are active)
  -s, --sort            sort output by node name
  -c, --colinfo         show info of columns in tables (only useful when -v or -d are active)
  -i, --idxinfo         show info of indexed columns (only useful when -v or -d are active)
  -R RANGE, --range RANGE
                        select a RANGE of rows (in the form "start,stop,step") during the copy of *all* the leaves. Default values are
                        "None,None,1", which means a copy of all the rows.
  • ptrepack

Provides copy operations for the data stored inside a file.

usage: ptrepack [-h] [-v] [-o] [-R RANGE] [--non-recursive] [--dest-title TITLE] [--dont-create-sysattrs] [--dont-copy-userattrs]
                [--overwrite-nodes] [--complevel COMPLEVEL]
                [--complib {zlib,lzo,bzip2,blosc,blosc:blosclz,blosc:lz4,blosc:lz4hc,blosc:zlib,blosc:zstd,blosc2,blosc2:blosclz,blosc2:lz4,blosc2:lz4hc,blosc2:zlib,blosc2:zstd}]
                [--shuffle {0,1}] [--bitshuffle {0,1}] [--fletcher32 {0,1}] [--keep-source-filters] [--chunkshape CHUNKSHAPE]
                [--upgrade-flavors] [--dont-regenerate-old-indexes] [--sortby COLUMN] [--checkCSI] [--propindexes]
                [--dont-allow-padding]
                sourcefile:sourcegroup destfile:destgroup

This utility is very powerful and lets you copy any leaf, group or complete subtree into another file. During the copy process you are
allowed to change the filter properties if you want so. Also, in the case of duplicated pathnames, you can decide if you want to
overwrite already existing nodes on the destination file. Generally speaking, ptrepack can be useful in may situations, like replicating
a subtree in another file, change the filters in objects and see how affect this to the compression degree or I/O performance,
consolidating specific data in repositories or even *importing* generic HDF5 files and create true PyTables counterparts.

positional arguments:
  sourcefile:sourcegroup
                        source file/group
  destfile:destgroup    destination file/group

options:
  -h, --help            show this help message and exit
  -v, --verbose         show verbose information
  -o, --overwrite       overwrite destination file
  -R RANGE, --range RANGE
                        select a RANGE of rows (in the form "start,stop,step") during the copy of *all* the leaves. Default values are
                        "None,None,1", which means a copy of all the rows.
  --non-recursive       do not do a recursive copy. Default is to do it
  --dest-title TITLE    title for the new file (if not specified, the source is copied)
  --dont-create-sysattrs
                        do not create sys attrs (default is to do it)
  --dont-copy-userattrs
                        do not copy the user attrs (default is to do it)
  --overwrite-nodes     overwrite destination nodes if they exist. Default is to not overwrite them
  --complevel COMPLEVEL
                        set a compression level (0 for no compression, which is the default)
  --complib {zlib,lzo,bzip2,blosc,blosc:blosclz,blosc:lz4,blosc:lz4hc,blosc:zlib,blosc:zstd,blosc2,blosc2:blosclz,blosc2:lz4,blosc2:lz4hc,blosc2:zlib,blosc2:zstd}
                        set the compression library to be used during the copy. Defaults to zlib
  --shuffle {0,1}       activate or not the shuffle filter (default is active if complevel > 0)
  --bitshuffle {0,1}    activate or not the bitshuffle filter (not active by default)
  --fletcher32 {0,1}    whether to activate or not the fletcher32 filter (not active by default)
  --keep-source-filters
                        use the original filters in source files. The default is not doing that if any of --complevel, --complib,
                        --shuffle --bitshuffle or --fletcher32 option is specified
  --chunkshape CHUNKSHAPE
                        set a chunkshape. Possible options are: "keep" | "auto" | int | tuple. A value of "auto" computes a sensible
                        value for the chunkshape of the leaves copied. The default is to "keep" the original value
  --upgrade-flavors     when repacking PyTables 1.x or PyTables 2.x files, the flavor of leaves will be unset. With this, such a leaves
                        will be serialized as objects with the internal flavor ('numpy' for 3.x series)
  --dont-regenerate-old-indexes
                        disable regenerating old indexes. The default is to regenerate old indexes as they are found
  --sortby COLUMN       do a table copy sorted by the index in "column". For reversing the order, use a negative value in the "step"
                        part of "RANGE" (see "-r" flag). Only applies to table objects
  --checkCSI            force the check for a CSI index for the --sortby column
  --propindexes         propagate the indexes existing in original tables. The default is to not propagate them. Only applies to table
                        objects
  --dont-allow-padding  remove the possible padding in compound types in source files. The default is to propagate it. Only applies to
                        table objects
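The ptrepack options listed above can be combined in many ways. As a rough sketch, the helper below assembles one such call, for example to repack a group with a different compressor. It uses only flags and `--complib` choices taken from the usage text above; the helper name `build_ptrepack_command`, the file names, and the group paths are hypothetical.

```python
# --complib choices copied from the usage text above
COMPLIB_CHOICES = {
    "zlib", "lzo", "bzip2",
    "blosc", "blosc:blosclz", "blosc:lz4", "blosc:lz4hc", "blosc:zlib", "blosc:zstd",
    "blosc2", "blosc2:blosclz", "blosc2:lz4", "blosc2:lz4hc", "blosc2:zlib", "blosc2:zstd",
}


def build_ptrepack_command(source, dest, source_group="/", dest_group="/",
                           complevel=None, complib=None, overwrite=False):
    """Build an argv list for ptrepack (hypothetical helper, not part of PyTables)."""
    if complib is not None and complib not in COMPLIB_CHOICES:
        raise ValueError(f"unsupported complib: {complib!r}")
    cmd = ["ptrepack"]
    if overwrite:
        cmd.append("-o")  # overwrite destination file
    if complevel is not None:
        cmd += ["--complevel", str(complevel)]
    if complib is not None:
        cmd += ["--complib", complib]
    # Positional arguments: sourcefile:sourcegroup destfile:destgroup
    cmd += [f"{source}:{source_group}", f"{dest}:{dest_group}"]
    return cmd


# Hypothetical file and group names, used only for illustration
cmd = build_ptrepack_command("in.h5", "out.h5", source_group="/raw",
                             dest_group="/packed", complevel=5,
                             complib="blosc:zstd", overwrite=True)
print(" ".join(cmd))
```

Validating `--complib` before launching the process is a design choice of this sketch; ptrepack itself also rejects values outside its choice list.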
  • pt2to3: a tool mainly used for version migration
usage: pt2to3 [-h] [-r] [-p] [-o OUTPUT] [-i] filename

PyTables 2.x -> 3.x API transition tool This tool displays to standard out, so it is common to pipe this to another file: $ pt2to3
oldfile.py > newfile.py

positional arguments:
  filename              path to input file.

options:
  -h, --help            show this help message and exit
  -r, --reverse         reverts changes, going from 3.x -> 2.x.
  -p, --no-ignore-previous
                        ignores previous_api() calls.
  -o OUTPUT             output file to write to.
  -i, --inplace         overwrites the file in-place.
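To give a feel for the kind of rewrite such a transition involves, here is a minimal sketch that renames a handful of 2.x camelCase API names to their 3.x snake_case equivalents, with a `reverse` switch mirroring the `-r` option. This is not pt2to3's actual implementation: the function name `migrate_source` is hypothetical and the `RENAMES` table is only a small illustrative subset of the renamed API.

```python
import re

# Small illustrative subset of the 2.x -> 3.x API renames
RENAMES = {
    "openFile": "open_file",
    "createGroup": "create_group",
    "createTable": "create_table",
    "createArray": "create_array",
    "getNode": "get_node",
    "walkNodes": "walk_nodes",
}


def migrate_source(source, reverse=False):
    """Rename known API identifiers in `source`; reverse=True maps 3.x back to 2.x."""
    mapping = {new: old for old, new in RENAMES.items()} if reverse else RENAMES
    # Match whole identifiers only, so longer names containing these are untouched
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], source)


# Hypothetical 2.x-style snippet, used only for illustration
old = 'h5 = tables.openFile("data.h5", "w")\ntbl = h5.createTable("/", "t", Row)\n'
new = migrate_source(old)
print(new)
```

A plain textual substitution like this also rewrites matching names inside strings and comments, which is one reason to review the output before overwriting a file in place.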

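Notes

The utilities that ship with PyTables are themselves just calls into its internal API. They simply offer a more convenient interface for analyzing generated data files and performing some additional operations.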
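As a rough illustration of that point, a listing of node paths similar in spirit to what ptdump prints can be produced directly with the public API. The sketch below assumes an already opened PyTables file object and relies on `File.walk_nodes` and the `_v_pathname` node attribute; the function name `list_node_paths` and the file name `data.h5` are hypothetical, and this is not how ptdump itself is implemented.

```python
def list_node_paths(h5file, where="/"):
    """Return the pathname of each node yielded by h5file.walk_nodes(where)."""
    return [node._v_pathname for node in h5file.walk_nodes(where)]


# Usage with a real file (requires PyTables; "data.h5" is a hypothetical name):
#
#     import tables
#     with tables.open_file("data.h5", mode="r") as h5file:
#         for path in list_node_paths(h5file):
#             print(path)
```

The function only needs an object with a `walk_nodes` method, so it can be exercised without touching a real HDF5 file.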

References

https://www.pytables.org/usersguide/utilities.html

Posted by 荣锋亮
