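PyTables Tutorial (2): multidimensional table cells and automatic sanity checks; using links for more convenient access to nodes

Translated from http://www.pytables.org/usersguide/tutorials.html

Multidimensional table cells and automatic sanity checks

Now it is time for a more realistic example (that is, one with errors in the code). We will create two groups that branch directly from the root node, Particles and Events. Then we will create three tables in each group. The tables in Particles are based on the Particle class, and the tables in Events are based on the Event descriptor.

After that, we will fill these tables with a number of records. Finally, we will read the newly created table /Events/TEvent3 and select some values from it using a list comprehension.

Look at the following script (you can find it in examples/tutorial2.py). It appears to do everything described above, but it contains a few bugs. Note that this Particle class is not directly related to the one defined in the previous tutorial; this class is simpler (note, however, the multidimensional columns named pressure and temperature).

We also introduce a new way to describe a Table: as a structured NumPy dtype (or even as a dictionary), as you can see in the Event descriptor. See File.create_table() for the different kinds of descriptor objects that can be passed to this method.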

import tables as tb
import numpy as np

# Describe a particle record
class Particle(tb.IsDescription):
    name        = tb.StringCol(itemsize=16)  # 16-character string
    lati        = tb.Int32Col()              # integer
    longi       = tb.Int32Col()              # integer
    pressure    = tb.Float32Col(shape=(2,3)) # array of floats (single-precision)
    temperature = tb.Float64Col(shape=(2,3)) # array of doubles (double-precision)

# Native NumPy dtype instances are also accepted
Event = np.dtype([
    ("name"     , "S16"),
    ("TDCcount" , np.uint8),
    ("ADCcount" , np.uint16),
    ("xcoord"   , np.float32),
    ("ycoord"   , np.float32)
    ])

# And dictionaries too (this defines the same structure as above)
# Event = {
#     "name"     : tb.StringCol(itemsize=16),
#     "TDCcount" : tb.UInt8Col(),
#     "ADCcount" : tb.UInt16Col(),
#     "xcoord"   : tb.Float32Col(),
#     "ycoord"   : tb.Float32Col(),
#     }

# Open a file in "w"rite mode
fileh = tb.open_file("tutorial2.h5", mode="w")

# Get the HDF5 root group
root = fileh.root

# Create the groups:
for groupname in ("Particles", "Events"):
    group = fileh.create_group(root, groupname)

# Now, create and fill the tables in Particles group
gparticles = root.Particles

# Create 3 new tables
for tablename in ("TParticle1", "TParticle2", "TParticle3"):
    # Create a table
    table = fileh.create_table("/Particles", tablename, Particle, "Particles: "+tablename)

    # Get the record object associated with the table:
    particle = table.row

    # Fill the table with 257 particles
    for i in range(257):
        # First, assign the values to the Particle record
        particle['name'] = f'Particle: {i:6d}'
        particle['lati'] = i
        particle['longi'] = 10 - i

        ########### Detectable errors start here. Play with them!
        particle['pressure'] = np.array(i * np.arange(2 * 3)).reshape((2, 4))  # Incorrect
        #particle['pressure'] = np.array(i * np.arange(2 * 3)).reshape((2, 3)) # Correct
        ########### End of errors

        particle['temperature'] = i ** 2     # Broadcasting

        # This injects the Record values
        particle.append()

    # Flush the table buffers
    table.flush()

# Now, go for Events:
for tablename in ("TEvent1", "TEvent2", "TEvent3"):
    # Create a table in Events group
    table = fileh.create_table(root.Events, tablename, Event, "Events: "+tablename)

    # Get the record object associated with the table:
    event = table.row

    # Fill the table with 257 events
    for i in range(257):
        # First, assign the values to the Event record
        event['name']  = f'Event: {i:6d}'
        event['TDCcount'] = i % (1<<8)   # Correct range

        ########### Detectable errors start here. Play with them!
        event['xcoor'] = float(i ** 2)     # Wrong spelling
        #event['xcoord'] = float(i ** 2)   # Correct spelling
        event['ADCcount'] = "sss"        # Wrong type
        #event['ADCcount'] = i * 2       # Correct type
        ########### End of errors

        event['ycoord'] = float(i) ** 4

        # This injects the Record values
        event.append()

    # Flush the buffers
    table.flush()

# Read the records from table "/Events/TEvent3" and select some
table = root.Events.TEvent3
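e = [p['TDCcount'] for p in table if p['ADCcount'] < 20 and 4 <= p['TDCcount'] < 15]
# The comprehension variable p is not visible here in Python 3, so read the last row directly
print(f"Last record ==> {table[-1]}")
print(f"Selected values ==> {e}")
print(f"Total selected records ==> {len(e)}")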

# Finally, close the file (this also will flush all the remaining buffers!)
fileh.close()
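
The script describes Event as a NumPy dtype and shows the dictionary form only in comments. The following is a minimal sketch, not part of the original tutorial, that builds one table from each descriptor and checks that they end up with the same columns. The file name descriptor_demo.h5 and the table names are made up for illustration, and the HDF5 core driver is used so that nothing is written to disk.

```python
import numpy as np
import tables as tb

# Structured NumPy dtype, as in the script above
event_dtype = np.dtype([
    ("name", "S16"),
    ("TDCcount", np.uint8),
    ("ADCcount", np.uint16),
    ("xcoord", np.float32),
    ("ycoord", np.float32),
])

# Dictionary of Col instances describing the same fields
event_dict = {
    "name": tb.StringCol(itemsize=16),
    "TDCcount": tb.UInt8Col(),
    "ADCcount": tb.UInt16Col(),
    "xcoord": tb.Float32Col(),
    "ycoord": tb.Float32Col(),
}

# In-memory HDF5 file (core driver without a backing store)
with tb.open_file("descriptor_demo.h5", mode="w",
                  driver="H5FD_CORE", driver_core_backing_store=0) as h5:
    from_dtype = h5.create_table("/", "from_dtype", event_dtype)
    from_dict = h5.create_table("/", "from_dict", event_dict)

    # Column order may differ between the two tables, so compare by name
    same_names = set(from_dtype.colnames) == set(from_dict.colnames)
    same_dtypes = all(
        from_dtype.coldtypes[name] == from_dict.coldtypes[name]
        for name in from_dtype.colnames
    )

print(same_names, same_dtypes)
```

Comparing by name rather than by position is deliberate: a dictionary descriptor carries no explicit column positions, so the resulting column order need not match the dtype's field order.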

 

1. Shape checking

If you look at the code carefully, you will see that it cannot work. Running it produces the following error:

$ python3 tutorial2.py
Traceback (most recent call last):
  File "tutorial2.py", line 60, in <module>
    particle['pressure'] = array(i * arange(2 * 3)).reshape((2, 4))  # Incorrect
ValueError: total size of new array must be unchanged
Closing remaining open files: tutorial2.h5... done

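This error indicates that you are trying to assign an array with an incompatible shape to a table cell. Looking at the source, we see that we tried to assign an array of shape (2,4) to the pressure element, which was defined with shape (2,3). In this script the exception is raised by NumPy's reshape call itself, because six elements cannot be arranged as (2,4); recent NumPy versions word the message as "cannot reshape array of size 6 into shape (2,4)".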
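
The failing line can be isolated with NumPy alone. This is a small sketch, separate from the tutorial script, with the loop variable i fixed to an arbitrary value:

```python
import numpy as np

i = 3                              # arbitrary stand-in for the loop variable
values = i * np.arange(2 * 3)      # six elements: 0, 3, 6, 9, 12, 15

# Six elements fit the (2, 3) shape declared for the pressure column
good = values.reshape((2, 3))

# They cannot fill a (2, 4) array, so reshape raises ValueError
try:
    values.reshape((2, 4))
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print(f"ValueError: {exc}")
```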

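In general, operations of this kind are forbidden, with one valid exception: when you assign a scalar value to a multidimensional column cell, all the elements of the cell are filled with that scalar. For example: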

particle['temperature'] = i ** 2    # Broadcasting

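The value i**2 is assigned to every element of the temperature table cell. This capability is provided by the NumPy package and is known as broadcasting.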
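
Broadcasting can be seen with a plain NumPy array standing in for a single table cell. This is an illustrative analogy rather than PyTables' internal code, and the name cell is hypothetical:

```python
import numpy as np

# A (2, 3) array playing the role of one temperature cell
cell = np.zeros((2, 3), dtype=np.float64)

# Assigning a scalar fills every element (broadcasting)
i = 7
cell[...] = i ** 2

# An array of a different, non-broadcastable shape is rejected
try:
    cell[...] = np.zeros((2, 4))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(cell)
```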

2. Field name checking

After fixing the previous error and rerunning the program, we run into another one.

$ python3 tutorial2.py
Traceback (most recent call last):
  File "tutorial2.py", line 73, in ?
    event['xcoor'] = float(i ** 2)     # Wrong spelling
  File "tableextension.pyx", line 1094, in tableextension.Row.__setitem__
  File "tableextension.pyx", line 127, in tableextension.get_nested_field_cache
  File "utilsextension.pyx", line 331, in utilsextension.get_nested_field
KeyError: 'no such column: xcoor'

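This error indicates that we are trying to assign a value to a field that does not exist in the Event table object. Looking carefully at the Event descriptor, we find that we misspelled the xcoord field (we wrote xcoor instead). This is unusual behavior for Python: normally, when you assign a value to a nonexistent instance variable, Python creates a new variable with that name. Such a feature can be dangerous when dealing with an object that holds a fixed list of field names, so PyTables checks that the field exists and raises a KeyError if the check fails.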
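
The contrast described above can be sketched as follows. The Plain class, the file name fieldcheck_demo.h5, and the one-column table are hypothetical and exist only for this illustration; the file lives in memory.

```python
import numpy as np
import tables as tb

# An ordinary Python object accepts a misspelled attribute without complaint
class Plain:
    pass

obj = Plain()
obj.xcoor = 1.0                    # silently creates a new attribute

# A table row has a fixed set of fields, so the same typo is rejected
with tb.open_file("fieldcheck_demo.h5", mode="w",
                  driver="H5FD_CORE", driver_core_backing_store=0) as h5:
    table = h5.create_table("/", "events", np.dtype([("xcoord", np.float32)]))
    row = table.row
    try:
        row["xcoor"] = 1.0         # misspelled field name
        typo_caught = False
    except KeyError as exc:
        typo_caught = True
        print(f"KeyError: {exc}")
```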

3. Data type checking

Finally, the last problem we will find here is a TypeError exception.

$ python3 tutorial2.py
Traceback (most recent call last):
  File "tutorial2.py", line 75, in ?
    event['ADCcount'] = "sss"          # Wrong type
  File "tableextension.pyx", line 1111, in tableextension.Row.__setitem__
TypeError: invalid type (<type 'str'>) for column ``ADCcount``

And if we change the affected line to read:

event['ADCcount'] = i * 2        # Correct type

we will see that the script finishes without problems.
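
The data type check can be reproduced in isolation with the hedged sketch below. The file and table names are hypothetical. The tutorial output shows a TypeError; the sketch also catches ValueError, on the assumption that the exact exception class may depend on the installed PyTables and NumPy versions.

```python
import numpy as np
import tables as tb

with tb.open_file("typecheck_demo.h5", mode="w",
                  driver="H5FD_CORE", driver_core_backing_store=0) as h5:
    table = h5.create_table("/", "events", np.dtype([("ADCcount", np.uint16)]))
    row = table.row

    # A string cannot be stored in an unsigned 16-bit integer column
    try:
        row["ADCcount"] = "sss"
        rejected = False
    except (TypeError, ValueError) as exc:   # exact class is an assumption
        rejected = True
        print(f"{type(exc).__name__}: {exc}")

    # An integer value is accepted and can be appended as usual
    row["ADCcount"] = 21 * 2
    row.append()
    table.flush()
    stored = int(table[0]["ADCcount"])
```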

You can see the structure created by this (corrected) script in Figure 4. In particular, note the multidimensional column cells in the table /Particles/TParticle2.


Figure 4. Table hierarchy for tutorial 2.

posted @ 2021-11-28 12:21  chinagod