I find that using Docker containers can be a great way to quickly get things running when you are
experimenting with new ideas or new technology or, as is often the case, both at the same time!
There is a very useful containerized implementation of Apache Cassandra available that you can
download and get running in a few seconds and use to test things with JanusGraph. In this section I
will walk you through the steps that I use to get a single Cassandra node up and running and use it
with JanusGraph to set up the air-routes graph. I am going to assume that you have already
downloaded and installed the necessary Docker runtime for your platform. I do most of my
Docker testing using Linux systems, but there are runtimes available for Windows and Mac OS as
well. Assuming you have Docker installed, Cassandra can be installed using a simple docker pull
command as shown below.
Note that, to make it clearer where commands need to be entered, commands that are entered
into the Linux terminal shell are prefixed with "sh>" and commands that are entered into
the Gremlin Console have the "gremlin>" prefix.
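A minimal sketch of the pull step follows, written as plain shell without the "sh>" prefix so that it can be pasted as is. The `cassandra:3.11` tag and the `pull_cassandra` helper name are assumptions made for illustration: a 3.x image is pinned because the Thrift support discussed later in this section was removed in Cassandra 4.0.

```shell
# Assumed image tag: a 3.x image is pinned because Thrift was removed in Cassandra 4.0.
# Plain "cassandra" would fetch the latest image instead.
CASS_IMAGE="cassandra:3.11"

# Hypothetical helper: download the Cassandra image from Docker Hub.
pull_cassandra() {
  docker pull "$CASS_IMAGE"
}

# Run the pull by calling:
#   pull_cassandra
```

The helper is only defined here, not invoked, so nothing is downloaded until you call it.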

 

6.9.1. Starting the Cassandra container
Once Docker has downloaded the Cassandra image for you, it is quite simple to get a single instance
of Cassandra up and running. There are different ways to configure Docker. To keep things simple
I am going to just use command line parameters. The command does several things, as shown in the
notes below it. I split the command over four lines to make it easier to read.

 

① Starts a new instance of the Cassandra container.
② Runs the command in the background using the "-d" flag.
③ Exposes the key ports that Cassandra uses so that JanusGraph can connect to this Cassandra
instance ("-p" flags).
④ Maps (mounts) the Cassandra volume to the local disk. This is where the data will be stored. If
we did not do this, the data would be lost whenever the container gets deleted ("-v" flag).
⑤ Enables Thrift support using the -e CASSANDRA_START_RPC=true setting. This is not needed if
you use CQL, which is enabled by default.
⑥ Names the container "cass", which makes it easier for us to refer to it later.
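The flags described in the notes above could be combined roughly as follows. This is a sketch under stated assumptions, not necessarily the exact command from the original listing: the image tag, the host data directory and the `start_cassandra` helper name are hypothetical, while ports 9042 (CQL) and 9160 (Thrift) and the `/var/lib/cassandra` data path are the defaults of the official Cassandra image.

```shell
# Assumed image tag (a 3.x image, since Thrift was removed in Cassandra 4.0)
# and a hypothetical host directory for the persisted data.
CASS_IMAGE="cassandra:3.11"
CASS_DATA_DIR="$HOME/cassandra/data"

# Hypothetical helper that starts the container. The numbers refer to the notes above.
start_cassandra() {
  mkdir -p "$CASS_DATA_DIR"
  # (1) docker run starts a new container    (2) -d runs it in the background
  # (3) -p publishes the CQL (9042) and Thrift (9160) ports
  # (4) -v mounts the host directory over Cassandra's data directory
  # (5) -e turns on Thrift                    (6) --name calls the container "cass"
  docker run \
    -d \
    -p 9042:9042 -p 9160:9160 \
    -v "$CASS_DATA_DIR":/var/lib/cassandra \
    -e CASSANDRA_START_RPC=true \
    --name cass "$CASS_IMAGE"
}

# Start the container by calling:
#   start_cassandra
```

If you only plan to use CQL, the `-e CASSANDRA_START_RPC=true` line and the 9160 port mapping can be left out.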
If you want to check on the progress of your new container at any time, you can just check the logs
using the command below.
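One way to check the logs is sketched here. The container name "cass" comes from the text; the helper name and the choice to show only the last 50 lines are assumptions for illustration.

```shell
# Hypothetical helper: print the most recent log lines from the "cass" container.
show_cassandra_logs() {
  # --tail limits the output to the last N lines; adding -f would follow the log live.
  docker logs --tail 50 cass
}

# View the logs by calling:
#   show_cassandra_logs
```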

As with other Docker containers, our Cassandra container can be stopped and started as needed
using the following commands. Care should be taken not to stop the container if JanusGraph is still
busy writing data.
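Stopping and restarting the named container might look like the sketch below. The helper names are hypothetical; `docker stop` and `docker start` act on the existing "cass" container and do not create a new one.

```shell
# Hypothetical helpers wrapping the standard Docker lifecycle commands.
stop_cassandra() {
  # Stops the running container; the mounted data directory is left untouched.
  docker stop cass
}

restart_cassandra() {
  # Starts the previously created container again with its original settings.
  docker start cass
}

# Usage:
#   stop_cassandra
#   restart_cassandra
```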

 

6.9.2. Connecting JanusGraph to Cassandra
Now that we have an instance of Cassandra running, it's time to start the Gremlin Console that is
included with the JanusGraph download and connect to Cassandra. Cassandra supports different
protocols that can be used when connecting to it. These include Astyanax (from Netflix), Thrift and
CQL. In this section I am just going to discuss Thrift and CQL. An in-depth study of these protocols is
beyond the scope of this book, but if you want to read more about them a few web searches will find
you plenty of documentation. It should be noted that both Thrift and Astyanax are being
deprecated in favor of CQL. At some point in the future support for the older protocols is likely to
be dropped, so it is probably a good idea to get comfortable using CQL as the primary way that you
connect JanusGraph to Cassandra.
A script called janus-cassandra.groovy is available in the sample-code folder at
https://github.com/krlawrence/graph/tree/master/sample-code. The script will
automate everything that we are about to discuss in this section and you are
encouraged to study it.
A number of properties files are included with the JanusGraph download. They are located in the
/conf folder below the root of the JanusGraph folder. The properties files can be used to help
connect JanusGraph to a number of different back end technologies. These properties files can be
edited as needed, but so long as you are using the default Cassandra ports with Cassandra running
on your local machine (localhost) you should not have to edit anything for the purpose of this
discussion.
If you decide to run Cassandra on a remote machine, you will need to edit the
properties file, or create a new one, so that it contains the appropriate host names
and IP addresses of the remote system.
If you want to connect JanusGraph to Cassandra using the CQL protocol you can use the
janusgraph-cql.properties file as shown below.
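A sketch of the CQL connection step follows. The Gremlin line itself uses the `JanusGraphFactory.open` call with the properties file named in the text; wrapping it in a small script file, the file location, and running it with the console's `-i` option are assumptions made so the example can be saved and reused.

```shell
# Hypothetical script location; any writable path works.
SCRIPT="${TMPDIR:-/tmp}/open-cql.groovy"

# Write the Gremlin Console command to a small Groovy script.
cat > "$SCRIPT" <<'EOF'
// Open JanusGraph using the CQL properties file from the conf folder.
graph = JanusGraphFactory.open('conf/janusgraph-cql.properties')
EOF

# From the JanusGraph root folder the script could then be run with:
#   bin/gremlin.sh -i "$SCRIPT"
# Alternatively, type the line inside the script directly at the gremlin> prompt.
```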
 

You may see a warning message followed by a long stack trace when you issue this command.
Despite looking like something horrible has happened, this can be ignored and things will still work.
I believe that this is a known issue in the community.
Aside from a potential warning message, if all goes well you should see something like the output
below after the command has run. This shows that we have a CQL connection to our Cassandra
instance running on our local machine at 127.0.0.1.

If you want to connect JanusGraph to Cassandra using the Thrift protocol you can use the
janusgraph-cassandra.properties file as shown below.
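The Thrift variant differs only in the properties file that is opened. As before, this is a sketch: the script file name and location are hypothetical, and it assumes the container was started with Thrift enabled and port 9160 published.

```shell
# Hypothetical script location; any writable path works.
SCRIPT="${TMPDIR:-/tmp}/open-thrift.groovy"

# Write the Gremlin Console command to a small Groovy script.
cat > "$SCRIPT" <<'EOF'
// Open JanusGraph using the Thrift-based Cassandra properties file.
graph = JanusGraphFactory.open('conf/janusgraph-cassandra.properties')
EOF

# From the JanusGraph root folder the script could then be run with:
#   bin/gremlin.sh -i "$SCRIPT"
```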

If the command succeeds, you should get back some output that looks like this.

When either of these commands is run, a new JanusGraph instance will be created and
JanusGraph will attempt to connect to Cassandra using the specified protocol. The first time you
connect to a brand new (empty) Cassandra instance you should first define the graph's schema by
creating key definitions and create any indexes that you need before creating any vertices, edges or
properties. If you would like to experiment with the air-routes data using Cassandra as the backing
store, the script called janus-cassandra.groovy from the sample-code folder can be used for this. If
you prefer, you can experiment yourself from the console, using the JanusGraph management API to
create keys and indexes and creating a traversal source object before adding any vertices and
edges.
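For readers who want to try the manual route, here is a minimal sketch of defining one property key and one composite index with the management API before creating a traversal source. The key name `code` matches the air-routes data, but the index name `airportCode`, the script file name and the choice of a single key are assumptions for illustration; the janus-cassandra.groovy script defines a fuller schema.

```shell
# Hypothetical script location; any writable path works.
SCRIPT="${TMPDIR:-/tmp}/define-schema.groovy"

cat > "$SCRIPT" <<'EOF'
// Connect first (CQL shown here).
graph = JanusGraphFactory.open('conf/janusgraph-cql.properties')

// Open a management transaction and define a property key.
mgmt = graph.openManagement()
code = mgmt.makePropertyKey('code').dataType(String.class).make()

// Build a composite index on that key (the index name is a made-up example).
mgmt.buildIndex('airportCode', Vertex.class).addKey(code).buildCompositeIndex()

// Commit the schema changes, then create the traversal source.
mgmt.commit()
g = graph.traversal()
EOF

# From the JanusGraph root folder:
#   bin/gremlin.sh -i "$SCRIPT"
```

Defining keys and indexes before loading data avoids having to reindex existing vertices later.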
If you choose to run the janus-cassandra.groovy script, it will create the keys and indexes needed,
then load the air-routes graph and also run a few tests to make sure everything is working. Note
that you only need to do this setup step once, as next time the data will have already been loaded
and the schema defined.
As we are storing our graph in an instance of Cassandra where the data is being
persisted on our local file system, the next time you start JanusGraph and
reconnect to Cassandra your data will be waiting for you!
To run the script from the Gremlin Console you can just use the :load command to load it as shown
below.
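Inside a running console the command takes the form `:load janus-cassandra.groovy` (with whatever path leads to your copy of the script). The sketch below shows an alternative that starts the console and runs the script in one step. The install location, the script path and the helper name are all hypothetical, and it assumes the console's `-i` option, which runs a script and leaves the console open.

```shell
# Hypothetical locations: adjust to where JanusGraph and the sample code live.
JANUS_HOME="$HOME/janusgraph"
LOAD_SCRIPT="$HOME/graph/sample-code/janus-cassandra.groovy"

# Hypothetical helper: start the Gremlin Console and run the loader script.
load_air_routes() {
  # The subshell keeps the caller's working directory unchanged.
  (cd "$JANUS_HOME" && bin/gremlin.sh -i "$LOAD_SCRIPT")
}

# Equivalent from inside an already running console:
#   gremlin> :load janus-cassandra.groovy
```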

If the script works as expected you should now be able to query the graph.
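A couple of simple queries that could serve as a sanity check are sketched below. They assume the air-routes data has been loaded with its usual `airport` vertex label, `code` property and `route` edge label; the script file name and the choice of AUS as the example airport are arbitrary.

```shell
# Hypothetical script location; any writable path works.
SCRIPT="${TMPDIR:-/tmp}/query-air-routes.groovy"

cat > "$SCRIPT" <<'EOF'
// Connect and create a traversal source.
graph = JanusGraphFactory.open('conf/janusgraph-cql.properties')
g = graph.traversal()

// How many airports are in the graph?
g.V().hasLabel('airport').count()

// How many routes leave the airport with code AUS?
g.V().has('airport','code','AUS').out('route').count()
EOF

# From the JanusGraph root folder:
#   bin/gremlin.sh -i "$SCRIPT"
```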

Whenever you are finished working with the graph, it is a good idea to close it. Once closed, you will
have to reconnect using one of the two open steps shown above before you can start working with
it again.
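Closing the graph is a single call on the graph object. The sketch below opens and then closes a connection so that it is self-contained; in an interactive session you would only need the `graph.close()` line. The script file name is hypothetical.

```shell
# Hypothetical script location; any writable path works.
SCRIPT="${TMPDIR:-/tmp}/close-graph.groovy"

cat > "$SCRIPT" <<'EOF'
// Open a connection (only so that this script stands on its own).
graph = JanusGraphFactory.open('conf/janusgraph-cql.properties')

// Release the connection to Cassandra when finished.
graph.close()
EOF

# From the JanusGraph root folder:
#   bin/gremlin.sh -i "$SCRIPT"
```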

If you are reconnecting to your graph, having previously loaded some data and closed it, you can
use the following commands. If you are using Thrift instead of CQL you would use the
janusgraph-cassandra.properties file instead.
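Reconnecting amounts to opening the graph again and creating a fresh traversal source, since the schema and data are already stored in Cassandra. A minimal sketch, with a hypothetical script file name:

```shell
# Hypothetical script location; any writable path works.
SCRIPT="${TMPDIR:-/tmp}/reconnect.groovy"

cat > "$SCRIPT" <<'EOF'
// Reopen the existing graph over CQL. For Thrift, swap in
// 'conf/janusgraph-cassandra.properties'.
graph = JanusGraphFactory.open('conf/janusgraph-cql.properties')

// Create a traversal source for issuing queries.
g = graph.traversal()
EOF

# From the JanusGraph root folder:
#   bin/gremlin.sh -i "$SCRIPT"
```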

A common requirement when testing and experimenting is to throw everything away and start
again. The easiest way to do this is to use the command shown below. This will remove all of your
data, indexes and schema definitions, so only do this if you really want to start over.
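One way to wipe the graph is the static `JanusGraphFactory.drop` method, sketched below on the assumption that your JanusGraph release includes it (it is not present in the earliest releases). The script is only written to disk here and nothing is deleted until you run it; the file name is hypothetical.

```shell
# Hypothetical script location; any writable path works.
SCRIPT="${TMPDIR:-/tmp}/drop-graph.groovy"

cat > "$SCRIPT" <<'EOF'
// Connect to the graph that should be removed.
graph = JanusGraphFactory.open('conf/janusgraph-cql.properties')

// DESTRUCTIVE: deletes all data, indexes and schema definitions.
JanusGraphFactory.drop(graph)
EOF

# Only run this if you really want to start over. From the JanusGraph root folder:
#   bin/gremlin.sh -i "$SCRIPT"
```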

Having done a drop operation, if you previously loaded the air-routes data using the
janus-cassandra.groovy script, you will need to run the script again to get the data, indexes and
schema back.
One other thing to realize is that, using the techniques shown in this section, we are connecting the
Gremlin Console and JanusGraph directly to Cassandra. This means that we can issue commands
directly from the Gremlin Console without needing any additional configuration or setup
steps other than telling JanusGraph how to connect to Cassandra using a properties file. Later in the
book we will introduce the Gremlin Server, which allows you to front end a graph with an HTTP
server. Remember also that JanusGraph is really a set of Java libraries (JAR files). It does not create
any processes of its own and does not run as a service. So in this instance JanusGraph is running in
the process of the Gremlin Console. Cassandra, of course, is running as a standalone service.
6.9.3. Finding nodetool
If for any reason you need to check on Cassandra settings or overall status, you typically use the
nodetool command. Because in this case we are using a containerized version of the Cassandra
code, to run nodetool you need to start a shell session inside the container. This can be done using
the docker exec command as shown below. Once you are inside the container you will find nodetool
available on the default path. The examples below show how to start a bash session and enter a few
nodetool commands. Finally we exit the session.

Once the shell process has started, the prompt will change and you are now running inside the
context of the container.
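Starting the shell session could look like the sketch below. The `-it` flags attach an interactive terminal to the container named "cass"; the helper name is hypothetical.

```shell
# Hypothetical helper: open an interactive bash session inside the "cass" container.
enter_cassandra() {
  # -i keeps stdin open and -t allocates a terminal for the session.
  docker exec -it cass bash
}

# Usage:
#   enter_cassandra
# Type "exit" inside the container to return to the host shell.
```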

We can now enter nodetool commands. I have truncated the output a bit to aid reading. First, let's
check the version of Cassandra we are running.

Let's check to see that Thrift is running.

If you want more information about the overall state of things you can use the nodetool info
command. I have truncated this output.
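The three checks described above use the `version`, `statusthrift` and `info` subcommands of nodetool. Inside the container they are typed directly at the prompt; the sketch below is an alternative that runs them from the host through `docker exec`, using a hypothetical wrapper name. Note that `statusthrift` assumes a Cassandra 3.x or earlier image, since Thrift was removed in 4.0.

```shell
# Hypothetical wrapper: run any nodetool subcommand inside the "cass" container.
cass_nodetool() {
  # "$@" forwards the subcommand and its arguments unchanged.
  docker exec cass nodetool "$@"
}

# Usage:
#   cass_nodetool version        # Cassandra release in the container
#   cass_nodetool statusthrift   # whether the Thrift server is running
#   cass_nodetool info           # overall node status
```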

Once we are done with the container, typing exit will return us to the Linux terminal session we
entered the container from.

 

6.10. Using an external index with JanusGraph
JanusGraph allows an external index to be created using a technology such as Elasticsearch or
Apache Solr. You would create such an index in cases where you need to do more sophisticated
pattern matching as part of a graph query. This topic is currently a little beyond the main focus of
this book, which is to give a detailed introduction to the Gremlin Query and Traversal language and
some of the ways that technology can be deployed. You can find a detailed explanation of how to
create an external index in the JanusGraph documentation, which is located at the following URLs:
https://docs.janusgraph.org/latest/indexes.html and
https://docs.janusgraph.org/latest/index-backends.html.
 
posted on 2022-05-04 17:27  bokeyuannicheng0000