This book is mainly intended to be a tutorial on working with graph databases and related
technology using the Gremlin query language. However, it is worth spending a few moments to
summarize what a graph database is, what some good use cases for graphs are, and why you should
care in a world that is already full of all kinds of SQL and NoSQL databases.
In this book we are going to be discussing directed property graphs. At the conceptual level these
types of graphs are quite simple to understand. You have three basic building blocks: vertices
(often referred to as nodes), edges and properties. Vertices represent "things" such as people or
places. Edges represent connections between those vertices, and properties are information added
to the vertices and edges as needed. The directed part of the name means that every edge has a
direction. It goes out from one vertex and in to another. You will sometimes hear people use the
word digraph as shorthand for directed graph. Consider the relationship "Kelvin knows Jack". This
could be modeled as a vertex for each of the people and an edge for the relationship as follows.
Kelvin — knows → Jack
Note the arrow, which indicates the direction of the relationship. If we wanted to record the fact
that Jack also admits to knowing Kelvin, we would need to add a second edge from Jack to Kelvin.
Properties could be added to each person to give more information about them. For example, my
age might be a property on my vertex.
It turns out that Jack really likes cats. We might want to store that in our graph as well, so we
could create the relationship:
Jack — likes → Cats
Now that we have a bit more in our graph we could answer the question "who does Kelvin know
that likes cats?"
Kelvin — knows → Jack — likes → Cats
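
The traversal above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how a graph database or Gremlin implements it: the edge-list representation and the helper names `out` and `friends_who_like` are assumptions made up for this example.

```python
# A minimal sketch of a directed graph stored as a list of edges.
# Each edge is (out_vertex, label, in_vertex): it goes out from the
# first vertex and in to the last one.
# The representation and helper names are hypothetical, for illustration only.
edges = [
    ("Kelvin", "knows", "Jack"),
    ("Jack", "likes", "Cats"),
]


def out(vertex, label):
    """Return the vertices reached by following outgoing edges with this label."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]


def friends_who_like(person, thing):
    """Who does `person` know that likes `thing`?"""
    return [friend for friend in out(person, "knows") if thing in out(friend, "likes")]


print(friends_who_like("Kelvin", "Cats"))  # ['Jack']

# Edges are directed: there is no "knows" edge going out from Jack,
# so following "knows" from Jack finds nothing.
print(out("Jack", "knows"))  # []
```

To record that Jack also knows Kelvin, a second edge `("Jack", "knows", "Kelvin")` would have to be added to the list.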
This is a simple example but hopefully you can already see that we are modeling our data the way
we think about it in the real world. Armed with this knowledge you now have all of the basic
building blocks you need in order to start thinking about how you might model things you are
familiar with as a graph.
So getting back to the question "why should I care?", well, if something looks like a graph, then
wouldn’t it be great if we could model it that way? Many things in our everyday lives can very
nicely be represented in a graph. Your social and business networks, the route you take to get to
work, the phone network, and the airline route choices for trips you need to take are all great
candidates. There are also many great business applications for graph databases and algorithms.
These include recommendation systems, crime prevention and fraud detection, to name but three.
The reverse is also true. If something does not feel like a graph then don’t try to force it to be. Your
videos are probably doing quite nicely living in the object store where you currently have them. A
sales ledger system built using a relational database is probably doing just fine where it is, and
likewise a document store is quite possibly just the right place to be storing your documents. So
"use the right tool for the job" remains as valid a phrase here as elsewhere. Graph databases come
into their own when the data you are storing is intrinsically linked by its very nature. The air
routes network used as the basis for all of the examples in this book is a perfect example of such a
situation.
Those of you who looked at graphs as part of a computer science course are correct if your reaction
was "Surely graphs have been around for ages, why is this considered new?". Indeed, Leonhard
Euler is credited with demonstrating the first graph problem and inventing the whole concept of
"Graph Theory" all the way back in 1736 when he investigated the now famous "Seven Bridges of
Königsberg" problem.
If you want to read a bit more about graph theory and its present-day applications, you can find a
lot of good information online. Here’s a Wikipedia link to get you started: https://en.wikipedia.org/wiki/Graph_theory
So, given Graph Theory is anything but a new idea, why is it that only recently we are seeing a
massive growth in the building and deployment of graph database systems and applications? At
least part of the answer is that computer hardware and software have reached the point where you
can build large data systems that scale well for a reasonable price. In fact, it’s easier than ever to
build large systems because you don’t have to buy the hardware that your system will run on
when you use the cloud.
While you can certainly run a graph database on your laptop (I do just that every day), the reality
is that in production, at scale, they are big data systems. Large graphs commonly have many
billions of vertices and edges in them, taking up petabytes of data on disk. Graph algorithms can be
both compute- and memory-intensive, and it is only fairly recently that deploying the necessary
resources for such big data systems has made financial sense for more everyday uses in business,
and not just in government or academia. Graph databases are becoming much more broadly
adopted across the spectrum, from high-end scientific research to financial networks and beyond.
Another factor that has really helped start this graph database revolution is the availability of
high-quality open source technology. There are a lot of great open source projects addressing
everything from the databases you need to store the graph data, to the query languages used to
traverse them, all the way up to visually displaying graphs as part of the user interface layer. In
particular, it is so-called property graphs where we are seeing the broadest development and
uptake. In a property graph, both vertices and edges can have properties (effectively, key-value
pairs) associated with them. There are many styles of graph that you may end up building and
there have been whole books written on these various design patterns, but the property graph
technology we will focus on in this book can support all of the most common usage patterns. If you
hear phrases such as directed graph and undirected graph, or cyclic and acyclic graph, and many
more as you work with graph databases, a quick online search will get you to a place where you
can get familiar with that terminology. A deep discussion of these patterns is beyond the scope of
this book, and it’s in no way essential to have a full background in graph theory to get productive
quickly.
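
To make "properties as key-value pairs" concrete, here is a small sketch in plain Python of a property graph in which both the vertices and the edges carry properties. It is an assumption-laden illustration, not the storage model of any particular graph database: the dictionary layout, the property names (`city`, `dist`), the helper `routes_from` and the distance value are all invented for this example.

```python
# A minimal sketch of a property graph: vertices and edges both carry
# properties, which are just key-value pairs.
# All names and values here are illustrative assumptions.
vertices = {
    "AUS": {"label": "airport", "city": "Austin"},
    "DFW": {"label": "airport", "city": "Dallas"},
}

edges = [
    # "out" is the vertex the edge leaves, "in" is the vertex it enters.
    {"out": "AUS", "in": "DFW", "label": "route", "dist": 190},
    {"out": "DFW", "in": "AUS", "label": "route", "dist": 190},
]


def routes_from(code):
    """Return (destination city, distance) for each route leaving the airport."""
    return [
        (vertices[edge["in"]]["city"], edge["dist"])
        for edge in edges
        if edge["out"] == code and edge["label"] == "route"
    ]


print(routes_from("AUS"))  # [('Dallas', 190)]
```

Here the `city` property lives on a vertex while `dist` lives on an edge, which is the distinguishing feature of a property graph described above.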
A third, and equally important, factor in the growth we are seeing in graph database adoption is the
low barrier to entry for programmers. As you will see from the examples in this book, someone
wanting to experiment with graph technology can download the Apache TinkerPop package and, as
long as Java 8 is installed, be up and running with zero configuration (other than doing an unzip of
the files) in as little as five minutes. Graph databases do not force you to define schemas or specify
the layout of tables and columns before you can get going and start building a graph. Programmers
also seem to find the graph style of programming quite intuitive as it closely models the way they
think of the world.
Graph database technology should not be viewed as a "rip and replace" technology, but as very
much complementary to other databases that you may already have deployed. One common use
case is for the graph to be used as a form of smart index into other data stores. This is sometimes
called having a polyglot data architecture.