feather-format in-memory analytics 内存分析 内存计算 内存数据

 in-memory analytics

Apache Arrow

Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware.

The project is developing a multi-language collection of libraries for solving systems problems related to in-memory analytical data processing. This includes such topics as:

  • Zero-copy shared memory and RPC-based data movement

  • Reading and writing file formats (like CSV, Apache ORC, and Apache Parquet)

  • In-memory analytics and query processing

 

Apache Arrow for Go

Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and inter-process communication.

Reference Counting

arrow makes use of reference counting so that it can track when memory buffers are no longer used. This allows arrow to update resource accounting, pool memory such and track overall memory usage as objects are created and released. Types expose two methods to deal with this pattern. The Retain method will increase the reference count by 1 and Release method will reduce the count by 1. Once the reference count of an object is zero, any associated object will be freed. Retain and Release are safe to call from multiple goroutines.

When to call Retain / Release?
  • If you are passed an object and wish to take ownership of it, you must call Retain. You must later pair this with a call to Release when you no longer need the object. "Taking ownership" typically means you wish to access the object outside the scope of the current function call.

  • You own any object you create via functions whose name begins with New or Copy or when receiving an object over a channel. Therefore you must call Release once you no longer need the object.

  • If you send an object over a channel, you must call Retain before sending it as the receiver is assumed to own the object and will later call Release when it no longer needs the object.

 

 

feather-format · PyPI https://pypi.org/project/feather-format/

Python interface to the Apache Arrow-based Feather File Format

Feather efficiently stores pandas DataFrame objects on disk. It depends on the Apache Arrow for Python.

 Feather File Format — Apache Arrow v12.0.1 https://arrow.apache.org/docs/python/feather.html

posted @ 2023-08-02 16:43  papering  阅读(14)  评论(0编辑  收藏  举报