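I have been trying out Cassandra for a while, importing data from Oracle into it. I ran into problems and solved them along the way, and learned a lot. Besides designing a sensible data model, the other main task was talking to Cassandra through its API.
From the point of view of Cassandra's own development, the benefit of using Thrift is support for many languages. The drawback is just as obvious: the Thrift API is too bare-bones to be used directly in a production environment.
The Cassandra wiki also lists higher-level APIs built on top of the Thrift API, for a variety of languages. For details, see http://wiki.apache.org/cassandra/ClientExamples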
1 Thrift Java API
2 Hector
Thrift Java API
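If you want to use Cassandra, you need to understand the Thrift API, since all the higher-level APIs are wrappers around it.
Inserting data requires a keyspace, ColumnFamily, Column, Key, Value, timestamp, and consistency level. (For background on Cassandra's data model, see 《大话Cassandra数据模型》, "Cassandra's Data Model in Plain Words".)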
/**
 * Insert a Column consisting of (column_path.column, value, timestamp) at the given column_path.column_family and optional
 * column_path.super_column. Note that column_path.column is here required, since a SuperColumn cannot directly contain binary
 * values -- it can only contain sub-Columns.
 * @param keyspace
 * @param key
 * @param column_path
 * @param value
 * @param timestamp
 * @param consistency_level
 */
public void insert(String keyspace, String key, ColumnPath column_path, byte[] value, long timestamp, int consistency_level) throws InvalidRequestException, UnavailableException, TimedOutException, TException;
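To make the role of the key, column, value, and timestamp arguments concrete, here is a minimal in-memory sketch in plain Java. Nothing in it talks to Cassandra: the class ColumnStoreSketch and its methods are hypothetical names made up for illustration. It models one column family as row key -> column name -> (value, timestamp) and keeps a write only when its timestamp is strictly newer than the stored one, which is the basic idea behind the client-supplied timestamp. Keyspaces, consistency levels, and replication are left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of one column family; not part of any Cassandra API.
class ColumnStoreSketch {
    // A stored cell: the value bytes plus the client-supplied timestamp.
    static final class Cell {
        final byte[] value;
        final long timestamp;

        Cell(byte[] value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // row key -> column name -> cell
    private final Map<String, Map<String, Cell>> rows = new HashMap<>();

    // Stores the value unless a cell with an equal or newer timestamp already exists.
    void insert(String key, String column, byte[] value, long timestamp) {
        Map<String, Cell> row = rows.computeIfAbsent(key, k -> new HashMap<>());
        Cell existing = row.get(column);
        if (existing == null || timestamp > existing.timestamp) {
            row.put(column, new Cell(value, timestamp));
        }
    }

    // Returns the column value decoded as UTF-8, or null if the column is absent.
    String get(String key, String column) {
        Map<String, Cell> row = rows.get(key);
        if (row == null || !row.containsKey(column)) {
            return null;
        }
        return new String(row.get(column).value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ColumnStoreSketch cf = new ColumnStoreSketch();
        cf.insert("row1", "name", "new".getBytes(StandardCharsets.UTF_8), 200L);
        cf.insert("row1", "name", "old".getBytes(StandardCharsets.UTF_8), 100L); // stale write, ignored
        System.out.println(cf.get("row1", "name")); // prints "new"
    }
}
```

The practical consequence for the real API is that every client should use the same clock source for timestamps, for example System.currentTimeMillis() on synchronized machines.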
/**
 * Insert Columns or SuperColumns across different Column Families for the same row key. batch_mutation is a
 * map<string, list<ColumnOrSuperColumn>> -- a map which pairs column family names with the relevant ColumnOrSuperColumn
 * objects to insert.
 * @param keyspace
 * @param key
 * @param cfmap
 * @param consistency_level
 */
public void batch_insert(String keyspace, String key, Map<String,List<ColumnOrSuperColumn>> cfmap, int consistency_level) throws InvalidRequestException, UnavailableException, TimedOutException, TException;
/**
 * Get the Column or SuperColumn at the given column_path. If no value is present, NotFoundException is thrown. (This is
 * the only method that can throw an exception under non-failure conditions.)
 * @param keyspace
 * @param key
 * @param column_path
 * @param consistency_level
 */
public ColumnOrSuperColumn get(String keyspace, String key, ColumnPath column_path, int consistency_level) throws InvalidRequestException, NotFoundException, UnavailableException, TimedOutException, TException;
/**
 * Perform a get for column_path in parallel on the given list<string> keys. The return value maps keys to the
 * ColumnOrSuperColumn found. If no value corresponding to a key is present, the key will still be in the map, but both
 * the column and super_column references of the ColumnOrSuperColumn object it maps to will be null.
 * @param keyspace
 * @param keys
 * @param column_path
 * @param consistency_level
 */
public Map<String,ColumnOrSuperColumn> multiget(String keyspace, List<String> keys, ColumnPath column_path, int consistency_level) throws InvalidRequestException, UnavailableException, TimedOutException, TException;
/**
 * Get the group of columns contained by column_parent (either a ColumnFamily name or a ColumnFamily/SuperColumn name
 * pair) specified by the given SlicePredicate. If no matching values are found, an empty list is returned.
 * @param keyspace
 * @param key
 * @param column_parent
 * @param predicate
 * @param consistency_level
 */
public List<ColumnOrSuperColumn> get_slice(String keyspace, String key, ColumnParent column_parent, SlicePredicate predicate, int consistency_level) throws InvalidRequestException, UnavailableException, TimedOutException, TException;
/**
 * Performs a get_slice for column_parent and predicate for the given keys in parallel.
 * @param keyspace
 * @param keys
 * @param column_parent
 * @param predicate
 * @param consistency_level
 */
public Map<String,List<ColumnOrSuperColumn>> multiget_slice(String keyspace, List<String> keys, ColumnParent column_parent, SlicePredicate predicate, int consistency_level) throws InvalidRequestException, UnavailableException, TimedOutException, TException;
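The slice calls above select columns through a SlicePredicate, typically a range with a start, a finish, a reversed flag, and a count. The following is a rough sketch of how I understand those range semantics, using a TreeMap as a stand-in for one row. SliceSketch and slice are hypothetical names, column names are plain strings sorted in natural order (an assumption, since the real ordering depends on the column family's configured comparator), and an empty string plays the role of the empty byte array meaning "no bound".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of slicing one row; not the Cassandra implementation.
class SliceSketch {
    // Returns up to `count` column names from start to finish (both inclusive).
    // An empty start or finish means "unbounded" on that side.
    // With reversed = true the row is walked backwards, so start should be the larger name.
    static List<String> slice(NavigableMap<String, String> row,
                              String start, String finish, boolean reversed, int count) {
        NavigableMap<String, String> view = reversed ? row.descendingMap() : row;
        if (!start.isEmpty()) {
            view = view.tailMap(start, true);
        }
        if (!finish.isEmpty()) {
            view = view.headMap(finish, true);
        }
        List<String> names = new ArrayList<>();
        for (String name : view.keySet()) {
            if (names.size() == count) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (String c : new String[] {"a", "b", "c", "d", "e"}) {
            row.put(c, "value-" + c);
        }
        System.out.println(slice(row, "", "", false, 10));  // [a, b, c, d, e]
        System.out.println(slice(row, "b", "d", false, 10)); // [b, c, d]
        System.out.println(slice(row, "d", "b", true, 2));   // [d, c]
    }
}
```

The first call corresponds to a "read the whole row, at most 10 columns" predicate built from two empty byte arrays.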
To query a range of keys (this feature requires an order-preserving partitioner):
/**
 * @deprecated; use get_range_slice instead
 * @param keyspace
 * @param column_family
 * @param start
 * @param finish
 * @param count
 * @param consistency_level
 */
public List<String> get_key_range(String keyspace, String column_family, String start, String finish, int count, int consistency_level) throws InvalidRequestException, UnavailableException, TimedOutException, TException;
/**
 * returns a subset of columns for a range of keys.
 * @param keyspace
 * @param column_parent
 * @param predicate
 * @param start_key
 * @param finish_key
 * @param row_count
 * @param consistency_level
 */
public List<KeySlice> get_range_slice(String keyspace, ColumnParent column_parent, SlicePredicate predicate, String start_key, String finish_key, int row_count, int consistency_level) throws InvalidRequestException, UnavailableException, TimedOutException, TException;
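Why do key range queries need an order-preserving partitioner? A range scan walks rows in token order. With an order-preserving partitioner the token is derived from the key itself, so token order is key order. With a hash-based partitioner (Cassandra's RandomPartitioner uses MD5) neighbouring keys end up scattered. The sketch below only illustrates that difference; PartitionerSketch and its methods are hypothetical names, and the real partitioners do more than this.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of token ordering; not Cassandra's partitioner code.
class PartitionerSketch {
    // Token as a hash-based partitioner would derive it: the MD5 digest of the key.
    static BigInteger md5Token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Order in which a scan meets the keys when the token is the key itself.
    static List<String> orderPreserving(List<String> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    // Order in which a scan meets the keys when the token is the MD5 hash.
    static List<String> hashed(List<String> keys) {
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        for (String key : keys) {
            ring.put(md5Token(key), key);
        }
        return new ArrayList<>(ring.values());
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user3", "user1", "user2", "user4");
        System.out.println("order-preserving: " + orderPreserving(keys));
        System.out.println("hashed:           " + hashed(keys)); // same keys, hash order
    }
}
```

With the hashed layout a start_key/finish_key pair no longer describes a contiguous run of rows, which is why the range calls are only meaningful with an order-preserving partitioner.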
/**
 * get property whose value is of type string.
 * @param property
 */
public String get_string_property(String property) throws TException;
/**
 * get property whose value is list of strings.
 * @param property
 */
public List<String> get_string_list_property(String property) throws TException;
/**
 * describe specified keyspace
 * @param keyspace
 */
public Map<String,Map<String,String>> describe_keyspace(String keyspace) throws NotFoundException, TException;
One of the more interesting pieces of information you can query is the token map, which tells us which Cassandra services are available to serve requests.
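As a rough idea of how a client could use the token map, the sketch below extracts the host list from it. It assumes the property comes back as a flat JSON object mapping token strings to host addresses. That format is my assumption about this API version, and the tokens and addresses in the sample are made up. TokenMapSketch is a hypothetical name, and a real client would use a proper JSON library instead of a regular expression.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes the token map is a flat JSON object of string pairs.
class TokenMapSketch {
    // Matches one "token":"host" pair.
    private static final Pattern PAIR =
            Pattern.compile("\"([^\"]*)\"\\s*:\\s*\"([^\"]*)\"");

    // Returns token -> host in the order the pairs appear in the input.
    static Map<String, String> parse(String json) {
        Map<String, String> tokenToHost = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            tokenToHost.put(m.group(1), m.group(2));
        }
        return tokenToHost;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        String sample = "{\"1000\":\"10.0.0.1\",\"2000\":\"10.0.0.2\"}";
        System.out.println(parse(sample).values()); // [10.0.0.1, 10.0.0.2]
    }
}
```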
/**
 * Remove data from the row specified by key at the granularity specified by column_path, and the given timestamp. Note
 * that all the values in column_path besides column_path.column_family are truly optional: you can remove the entire
 * row by just specifying the ColumnFamily, or you can remove a SuperColumn or a single Column by specifying those levels too.
 * @param keyspace
 * @param key
 * @param column_path
 * @param timestamp
 * @param consistency_level
 */
public void remove(String keyspace, String key, ColumnPath column_path, long timestamp, int consistency_level) throws InvalidRequestException, UnavailableException, TimedOutException, TException;
import java.util.List;
import java.io.UnsupportedEncodingException;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.TException;
import org.apache.cassandra.service.*;

public class CClient
{
    public static void main(String[] args)
    throws TException, InvalidRequestException, UnavailableException, TimedOutException, UnsupportedEncodingException, NotFoundException
    {
        // open a Thrift connection to a local Cassandra node
        TTransport tr = new TSocket("localhost", 9160);
        TProtocol proto = new TBinaryProtocol(tr);
        Cassandra.Client client = new Cassandra.Client(proto);
        tr.open();

        String key_user_id = "逖靖寒的世界";

        // insert data
        // the two values written here are placeholders; the original values are not shown
        long timestamp = System.currentTimeMillis();
        client.insert("Keyspace1", key_user_id,
                      new ColumnPath("Standard1", null, "网址".getBytes("UTF-8")),
                      "<url>".getBytes("UTF-8"), timestamp, ConsistencyLevel.ONE);
        client.insert("Keyspace1", key_user_id,
                      new ColumnPath("Standard1", null, "作者".getBytes("UTF-8")),
                      "<author>".getBytes("UTF-8"), timestamp, ConsistencyLevel.ONE);

        // read single column (one of the columns inserted above)
        ColumnPath path = new ColumnPath("Standard1", null, "网址".getBytes("UTF-8"));
        System.out.println(client.get("Keyspace1", key_user_id, path, ConsistencyLevel.ONE));

        // read entire row (up to 10 columns, no start or finish bound)
        SlicePredicate predicate = new SlicePredicate(null, new SliceRange(new byte[0], new byte[0], false, 10));
        ColumnParent parent = new ColumnParent("Standard1", null);
        List<ColumnOrSuperColumn> results = client.get_slice("Keyspace1", key_user_id, parent, predicate, ConsistencyLevel.ONE);
        for (ColumnOrSuperColumn result : results)
        {
            Column column = result.column;
            System.out.println(new String(column.name, "UTF-8") + " -> " + new String(column.value, "UTF-8"));
        }

        tr.close();
    }
}
Hector
Hector is a Java client that wraps the Thrift Java API and provides a higher-level abstraction.
package me.prettyprint.cassandra.service;

import static me.prettyprint.cassandra.utils.StringUtils.bytes;
import static me.prettyprint.cassandra.utils.StringUtils.string;

import org.apache.cassandra.service.Column;
import org.apache.cassandra.service.ColumnPath;

public class ExampleClient {

  public static void main(String[] args) throws IllegalStateException, PoolExhaustedException,
      Exception {
    CassandraClientPool pool = CassandraClientPoolFactory.INSTANCE.get();
    CassandraClient client = pool.borrowClient("localhost", 9160);
    // A load balanced version would look like this:
    // CassandraClient client = pool.borrowClient(new String[] {"cas1:9160", "cas2:9160", "cas3:9160"});

    try {
      Keyspace keyspace = client.getKeyspace("Keyspace1");
      ColumnPath columnPath = new ColumnPath("Standard1", null, bytes("网址"));

      // insert
      keyspace.insert("逖靖寒的世界", columnPath, bytes("http://gpcuster.cnblogs.com"));

      // read
      Column col = keyspace.getColumn("逖靖寒的世界", columnPath);
      System.out.println("Read from cassandra: " + string(col.getValue()));
    } finally {
      // return client to pool. do it in a finally block to make sure it's executed
      pool.releaseClient(client);
    }
  }
}
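The bytes(...) and string(...) helpers matter here because column names and values are raw byte arrays, and the example uses non-ASCII text. As far as I can tell they are thin UTF-8 conversions. The stand-ins below are my own sketch under that assumption, not Hector's source, and Utf8Sketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for the bytes/string helpers, assuming plain UTF-8 conversion.
class Utf8Sketch {
    // Encodes text as UTF-8 bytes, suitable for a column name or value.
    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes read back from a column.
    static String string(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encoded = bytes("网址");
        System.out.println(encoded.length);                   // 6: three bytes per character here
        System.out.println(string(encoded).equals("网址"));   // true
    }
}
```

Always naming the charset explicitly avoids depending on the platform default encoding, which differs between machines.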
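Advantages:
1 Provides a connection pool.
2 Provides error handling: when an operation fails, Hector uses system information (the token map) to connect to another Cassandra service automatically.
3 The programming interface is easy to use.
4 Supports JMX.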
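The failover behaviour in point 2 can be pictured roughly as follows: try the operation against each known host in turn and return the first success. This is only a sketch of the idea, not Hector's actual code. FailoverSketch and withFailover are hypothetical names, and the host strings are borrowed from the load-balanced example above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of host failover; not Hector's implementation.
class FailoverSketch {
    // Runs the operation against each host in order and returns the first successful result.
    // If every host fails, throws IllegalStateException carrying the last failure as its cause.
    static <T> T withFailover(List<String> hosts, Function<String, T> operation) {
        RuntimeException last = null;
        for (String host : hosts) {
            try {
                return operation.apply(host);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next host
            }
        }
        throw new IllegalStateException("all hosts failed", last);
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("cas1:9160", "cas2:9160", "cas3:9160");
        String result = withFailover(hosts, host -> {
            if (host.startsWith("cas1")) {
                throw new RuntimeException(host + " is down"); // simulated outage
            }
            return "served by " + host;
        });
        System.out.println(result); // served by cas2:9160
    }
}
```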
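Disadvantages:
1 Does not support multithreaded environments.
2 The keyspace wrapper does too much (data validation and repackaging). For large volumes of data operations, this overhead needs to be taken into account.
3 Error handling is not user-friendly: if all Cassandra services are very busy, then after repeated failed attempts the operation ultimately fails.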
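What a better client should offer:
1 Thread safety.
2 Automatic multithreaded queries and inserts to improve throughput.
3 User-friendly error handling.
4 No excessive wrapping.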
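To give a feel for points 1 and 2, here is a small sketch of a thread-safe connection pool combined with parallel inserts. It is not based on any existing client. ParallelInsertSketch is a hypothetical name, connections are just strings, and a ConcurrentHashMap stands in for the cluster. The point is the borrow/return discipline and fanning writes out over a fixed thread pool.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a thread-safe pool with parallel inserts; no real Cassandra I/O.
class ParallelInsertSketch {
    private final BlockingQueue<String> pool;                             // idle connection names
    private final Map<String, String> store = new ConcurrentHashMap<>(); // stands in for the cluster

    ParallelInsertSketch(int poolSize) {
        pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.add("conn-" + i);
        }
    }

    // Borrows a connection, performs one write, and always returns the connection.
    void insert(String key, String value) throws InterruptedException {
        String conn = pool.take(); // blocks until a connection is free
        try {
            store.put(key, value); // a real client would send the write over conn
        } finally {
            pool.put(conn);
        }
    }

    // Inserts all rows using the given number of worker threads; returns the stored row count.
    int insertAll(Map<String, String> rows, int threads) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Map.Entry<String, String> e : rows.entrySet()) {
                pending.add(executor.submit(() -> {
                    insert(e.getKey(), e.getValue());
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for the write and surfaces any failure
            }
            return store.size();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> rows = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            rows.put("key" + i, "value" + i);
        }
        System.out.println(new ParallelInsertSketch(3).insertAll(rows, 8)); // 100
    }
}
```

Because the pool is a blocking queue, having more worker threads than connections simply makes the extra threads wait instead of sharing a connection.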