Some problems with Titan DB

I tested TitanDB on a stack I already know fairly well, HBase + ES (Elasticsearch), and wrote down a few tips along the way.

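1. First, the only Hadoop version TitanDB supports is 1.2.1, so HBase can be at most 0.98. The official site does offer titan-1.0-hadoop2, but it did not work well for me: storing data into HBase threw errors, because the Hadoop 1 configuration format differs from the Hadoop 2 one and the configuration that gets created is unusable with HBase and Hadoop. I had to fall back to the versions above. (The bundled ES package is 1.5.1; I recommend using 1.5.2 to avoid strange errors.)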

2. Add indexes from Gremlin following the method in the official documentation (see chapter 8 of the official docs):

    mgmt = graph.openManagement()
    name = mgmt.getPropertyKey('name')
    age = mgmt.getPropertyKey('age')
    mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
    mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
    mgmt.commit()
    // Wait for the index to become available
    mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').call()
    mgmt.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
    // Reindex the existing data
    mgmt = graph.openManagement()
    mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
    mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
    mgmt.commit()

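After mgmt.commit() runs, the first management transaction is closed, so you must open a new management transaction before you can call updateIndex. This differs from version 0.5.1.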

3. The Java API has a catch when adding indexes. Titan builds an index roughly like this:

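(1) Create an empty index, which starts in the INSTALLED state (mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex())

(2) Wait until the index reaches the REGISTERED state (mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').call())

(3) Reindex the data already in the table, after which the index becomes ENABLED (mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get())

At first I could not find a method in the Java API for step (2) and was still exploring. An index created through Gremlin, however, is picked up correctly by queries issued from Java.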
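The three steps above can be pictured as a tiny state machine. The sketch below is an illustrative model only, not Titan's code: the `IndexLifecycle` class and its `next` and `walk` methods are hypothetical names. Only the status names INSTALLED, REGISTERED and ENABLED are taken from Titan's `SchemaStatus`, and the ordering reflects my understanding of the Titan 1.0 index lifecycle.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; this is NOT Titan's implementation.
class IndexLifecycle {
    // Status names mirror the Titan SchemaStatus values discussed in this post.
    enum Status { INSTALLED, REGISTERED, ENABLED }

    // Hypothetical helper: the state an index moves to once the current step completes.
    static Status next(Status current) {
        switch (current) {
            case INSTALLED:  return Status.REGISTERED; // every instance has acknowledged the index
            case REGISTERED: return Status.ENABLED;    // REINDEX over the existing data has finished
            default:         return Status.ENABLED;    // ENABLED is terminal in this sketch
        }
    }

    // Walks from INSTALLED to ENABLED and records each state on the way.
    static List<Status> walk() {
        List<Status> path = new ArrayList<>();
        Status s = Status.INSTALLED;
        path.add(s);
        while (s != Status.ENABLED) {
            s = next(s);
            path.add(s);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(walk()); // [INSTALLED, REGISTERED, ENABLED]
    }
}
```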

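Because the Titan project is not very active, importing it into Eclipse with "import Maven project" produced all kinds of errors. In the end I fell back to the most primitive approach: download the source package and add all the dependency jars by hand. There are still compile errors, but at least they do not get in the way of reading the code. I found the TitanFactory.open method, which looks like this: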

    public static TitanGraph open(ReadConfiguration configuration) {
        return new StandardTitanGraph(new GraphDatabaseConfiguration(configuration));
    }

Next, the openManagement method of StandardTitanGraph:

    public TitanManagement openManagement() {
        return new ManagementSystem(this,backend.getGlobalSystemConfig(),backend.getSystemMgmtLog(), mgmtLogger, schemaCache);
    }

So what comes back is ManagementSystem, an implementation of TitanManagement. Looking at the source of the ManagementSystem class, I found:

    public static GraphIndexStatusWatcher awaitGraphIndexStatus(TitanGraph g, String graphIndexName) {
        return new GraphIndexStatusWatcher(g, graphIndexName);
    }

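It turns out to be a static method. In Gremlin you call mgmt.awaitGraphIndexStatus (Groovy allows calling a static method through an instance) to wait for the index status change, whereas in the Java API you call the static method ManagementSystem.awaitGraphIndexStatus(g, indexName).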
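As far as I can tell, the watcher returned by awaitGraphIndexStatus does not change the index itself; its call() just blocks until the target status is observed or a timeout expires. A minimal stdlib-only sketch of that "poll until status or timeout" idea follows. `StatusWaiter`, `await` and the fake status supplier are hypothetical names for illustration, not part of Titan.

```java
import java.util.function.Supplier;

// Illustrative sketch of "wait until a status is reached"; NOT Titan's GraphIndexStatusWatcher.
class StatusWaiter {
    // Polls the (hypothetical) status supplier until it returns the target
    // or the timeout elapses. Returns true if the target status was observed.
    static boolean await(Supplier<String> currentStatus, String target,
                         long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!target.equals(currentStatus.get())) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // gave up: status never reached the target in time
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Fake status source that reports REGISTERED from the third poll onward.
        int[] polls = {0};
        Supplier<String> fake = () -> ++polls[0] >= 3 ? "REGISTERED" : "INSTALLED";
        System.out.println(await(fake, "REGISTERED", 1000, 10)); // true
    }
}
```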

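One thing still seems odd, though. When I change the status with the updateIndex method, Titan puts the change into a trigger instead of applying it directly. In principle nothing should touch the underlying data before commit, so why implement it as a trigger, and why put it in the log class?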

The whole flow now works end to end. The Titan + HBase + ES code is as follows (Scala):

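    import com.thinkaurelius.titan.core.TitanFactory
    import com.thinkaurelius.titan.core.schema.SchemaAction
    import com.thinkaurelius.titan.graphdb.database.management.ManagementSystem
    import org.apache.tinkerpop.gremlin.structure.Vertex

    // conf: the Titan configuration (HBase storage, ES index), defined elsewhere
    val g = TitanFactory.open(conf)
    var mgmt = g.openManagement
    val name = mgmt getPropertyKey "movieId"
    var index = mgmt.buildIndex("movie2WIndex666", classOf[Vertex]).addKey(name).buildCompositeIndex
    mgmt.updateIndex(index, SchemaAction.REGISTER_INDEX)
    mgmt.commit
    // Block until the watcher sees the index reach its target status
    val ms = ManagementSystem.awaitGraphIndexStatus(g, "movie2WIndex666").call
    println(ms.getTargetStatus)
    // commit closed the first management transaction, so open a new one
    mgmt = g.openManagement
    index = mgmt.getGraphIndex("movie2WIndex666")
    println(index.getIndexStatus(name))
    // Reindex the existing data
    mgmt.updateIndex(index,  SchemaAction.REINDEX).get
    mgmt.commit
    // Query through the new index
    val res = g.traversal().V().has("movieId",12345).out()
    println(res)
    g.close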

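4. The HBase connection uses the get method and fetches one row per get, so without an index it can only examine about 100 rows per second. In my test, running g.V().has(XX) over 180,000 rows took about 34 minutes; with an index in place, a single lookup takes only about 200 ms.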

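Building an index also makes one full pass over the data (here again as row-by-row gets rather than a real scan), so building a CompositeIndex on a single property over millions of rows takes several hours. I suspect mixed indexes and larger datasets would run into trouble.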
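A quick back-of-the-envelope check of these numbers, assuming a flat rate of about 100 rows per second as measured above: 180,000 rows take 1,800 s (30 minutes, close to the observed 34), and one million rows take 10,000 s (roughly 2.8 hours). The `ScanEstimate` class and `secondsFor` method are hypothetical names used only for this illustration.

```java
// Back-of-the-envelope estimate using the figures from this post.
class ScanEstimate {
    // Seconds needed to touch every row when each row costs one get.
    static double secondsFor(long rows, double rowsPerSecond) {
        return rows / rowsPerSecond;
    }

    public static void main(String[] args) {
        double fullPass = secondsFor(180_000, 100);   // unindexed g.V().has(...)
        double reindex = secondsFor(1_000_000, 100);  // index build over one million rows
        System.out.println("180k rows: " + fullPass / 60 + " minutes");
        System.out.println("1M rows: " + reindex / 3600 + " hours");
    }
}
```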

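5. For the reason above, the HBase connection timeouts have to be set very long. I currently use 180,000 seconds (the values in the file are in milliseconds). The configuration file is as follows.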

<property>
 <name>hbase.rootdir</name>
 <value>hdfs://cloud12:9000/hbase</value>
</property>
<property>
 <name>hbase.cluster.distributed</name>
 <value>true</value>
</property>
<property>
 <name>hbase.master</name>
 <value>cloud12:60000</value>
</property>
<property>
 <name>hbase.zookeeper.property.dataDir</name>
 <value>/home/Titan/hbase/zookeeperDir</value>
</property>
<property>
 <name>hbase.zookeeper.quorum</name>
 <value>192.168.12.148</value>
</property>
<property>
 <name>hbase.regionserver.lease.period</name>
 <value>180000000</value>
</property>
<property>
 <name>hbase.rpc.timeout</name>
 <value>180000000</value>
</property>
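Both timeout properties take their value in milliseconds, so it is worth double-checking what 180000000 actually means. A small sketch of the conversion, where `TimeoutUnits` and `fromConfigValue` are hypothetical names used only for illustration:

```java
import java.time.Duration;

// Converts the raw config value into human-readable units.
class TimeoutUnits {
    // hbase.rpc.timeout and hbase.regionserver.lease.period are given in milliseconds.
    static Duration fromConfigValue(long millis) {
        return Duration.ofMillis(millis);
    }

    public static void main(String[] args) {
        Duration d = fromConfigValue(180_000_000L);
        System.out.println(d.getSeconds()); // 180000
        System.out.println(d.toHours());    // 50
    }
}
```

So the value above is 180,000 seconds, which is 50 hours.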

posted @ 2016-11-28 10:49 月影舞华
