Hierarchical data in postgres
https://coderwall.com/p/whf3-a/hierarchical-data-in-postgres
-------------------------------------------------------------------------------
This tip will try to answer the following questions:
- How can we represent a tree of data in postgres
- How can we efficiently query for any entire single node and all of it's children (and children's children).
The test data
Since we want to keep this simple we will assume our data is just a bunch of sections
. A section just has a name
and each section
has a single parent section
.
Section A
|--- Section A.1
Section B
|--- Section B.1
|--- Section B.1
|--- Section B.1.1
We'll use this simple data for examples below.
Simple self-referencing
When designing a self-referential table (something that joins itself to itself) the most obvious choice is to have some kind of parent_id
column on our table that references itself.
CREATE TABLE section (
id INTEGER PRIMARY KEY,
name TEXT,
parent_id INTEGER REFERENCES section,
);
ALTER TABLE page ADD COLUMN parent_id INTEGER REFERENCES page;
CREATE INDEX section_parent_id_idx ON section (parent_id);
Now insert our example data, using the parent_id
to related the nodes together:
INSERT INTO section (id, name, parent_id) VALUES (1, 'Section A', NULL);
INSERT INTO section (id, name, parent_id) VALUES (2, 'Section A.1', 1);
INSERT INTO section (id, name, parent_id) VALUES (3, 'Section B', NULL);
INSERT INTO section (id, name, parent_id) VALUES (4, 'Section B.1', 3);
INSERT INTO section (id, name, parent_id) VALUES (5, 'Section B.2', 3);
INSERT INTO section (id, name, parent_id) VALUES (6, 'Section B.2.1', 5);
This works great for simple queries such as, fetch the direct children of Section B:
SELECT * FROM section WHERE parent = 3
but it will require complex or recursive queries for questions like fetch me all the children and children's children of Section B:
WITH RECURSIVE nodes(id,name,parent_id) AS (
SELECT s1.id, s1.name, s1.parent_id
FROM section s1 WHERE parent_id = 3
UNION
SELECT s2.id, s2.name, s2.parent_id
FROM section s2, nodes s1 WHERE s2.parent_id = s1.id
)
SELECT * FROM nodes;
So we have answered the "how to build a tree" part of the question, but are not happy with the "how to query for a node and all it's children" part.
Enter ltree. (Short for "label tree" - I think?).
The ltree extension
The ltree extension is a great choice for querying hierarchical data. This is especially true for self-referential relationships.
Lets rebuild the above example using ltree. We'll use the page's primary keys as the "labels" within our ltree paths and a special "root" label to denote the top of the tree.
CREATE EXTENSION ltree;
CREATE TABLE section (
id INTEGER PRIMARY KEY,
name TEXT,
parent_path LTREE
);
CREATE INDEX section_parent_path_idx ON section USING GIST (parent_path);
We'll add in our data again, this time rather than using the id
for the parent, we will construct an ltree path that represents the parent node.
INSERT INTO section (id, name, parent_path) VALUES (1, 'Section 1', 'root');
INSERT INTO section (id, name, parent_path) VALUES (2, 'Section 1.1', 'root.1');
INSERT INTO section (id, name, parent_path) VALUES (3, 'Section 2', 'root');
INSERT INTO section (id, name, parent_path) VALUES (4, 'Section 2.1', 'root.3');
INSERT INTO section (id, name, parent_path) VALUES (4, 'Section 2.2', 'root.3');
INSERT INTO section (id, name, parent_path) VALUES (5, 'Section 2.2.1', 'root.3.4');
Cool. So now we can make use of ltree's operators @>
and <@
to answer our original question like:
SELECT * FROM section WHERE parent_path <@ 'root.3';
However we have introduced a few issues.
- Our simple
parent_id
version ensured referential consistancy by making use of theREFERENCES
constraint. We lost that by switching to ltree paths. - Ensuring that the ltree paths are valid can be a bit of a pain, and if paths become stale for some reason your queries may return unexpected results or you may "orphan" nodes.
The final solution
To fix these issues we want a hybrid of our original parent_id
(for the referential consistency and simplicity of the child/parent relationship) and our ltree paths (for improved querying power/indexing). To achieve this we will hide the management of the ltree path behind a trigger and only ever update the parent_id
column.
CREATE EXTENSION ltree;
CREATE TABLE section (
id INTEGER PRIMARY KEY,
name TEXT,
parent_id INTEGER REFERENCES section,
parent_path LTREE
);
CREATE INDEX section_parent_path_idx ON section USING GIST (parent_path);
CREATE INDEX section_parent_id_idx ON section (parent_id);
CREATE OR REPLACE FUNCTION update_section_parent_path() RETURNS TRIGGER AS $$
DECLARE
path ltree;
BEGIN
IF NEW.parent_id IS NULL THEN
NEW.parent_path = 'root'::ltree;
ELSEIF TG_OP = 'INSERT' OR OLD.parent_id IS NULL OR OLD.parent_id != NEW.parent_id THEN
SELECT parent_path || id::text FROM section WHERE id = NEW.parent_id INTO path;
IF path IS NULL THEN
RAISE EXCEPTION 'Invalid parent_id %', NEW.parent_id;
END IF;
NEW.parent_path = path;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER parent_path_tgr
BEFORE INSERT OR UPDATE ON section
FOR EACH ROW EXECUTE PROCEDURE update_section_parent_path();
Much better.
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· AI与.NET技术实操系列:基于图像分类模型对图像进行分类
· go语言实现终端里的倒计时
· 如何编写易于单元测试的代码
· 10年+ .NET Coder 心语,封装的思维:从隐藏、稳定开始理解其本质意义
· .NET Core 中如何实现缓存的预热?
· 分享一个免费、快速、无限量使用的满血 DeepSeek R1 模型,支持深度思考和联网搜索!
· 基于 Docker 搭建 FRP 内网穿透开源项目(很简单哒)
· 25岁的心里话
· ollama系列01:轻松3步本地部署deepseek,普通电脑可用
· 按钮权限的设计及实现
2017-04-27 cakephp2.3.8中何为component