Benchmarking HBase with YCSB
About YCSB
YCSB (Yahoo! Cloud Serving Benchmark) is a general-purpose performance benchmarking tool open-sourced by Yahoo!.
With it we can run performance tests against a variety of NoSQL products, including:
For more documentation on YCSB, see:
Compared with HBase's bundled benchmark tool (PerformanceEvaluation), YCSB's advantages are:
- Extensibility: the client under test is not limited to HBase as a single product; it can also target different versions of HBase.
- Flexibility: you can choose the operation mix to test (read+write, read+scan, etc.), as well as the relative frequency of each operation and the way keys are selected.
- Monitoring:
- While a test is running, progress is reported in real time:
1340 sec: 751515 operations; 537.74 current ops/sec; [INSERT AverageLatency(ms)=1.77]
1350 sec: 755945 operations; 442.82 current ops/sec; [INSERT AverageLatency(ms)=2.18]
1360 sec: 761545 operations; 559.72 current ops/sec; [INSERT AverageLatency(ms)=1.71]
1370 sec: 767616 operations; 606.92 current ops/sec; [INSERT AverageLatency(ms)=1.58]
- After the test finishes, an overall summary is printed:
[OVERALL], RunTime(ms), 1762019.0
[OVERALL], Throughput(ops/sec), 567.5307700995279
[INSERT], Operations, 1000000
[INSERT], AverageLatency(ms), 1.698302
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 14048
[INSERT], 95thPercentileLatency(ms), 2
[INSERT], 99thPercentileLatency(ms), 3
[INSERT], Return=0, 1000000
[INSERT], 0, 29
[INSERT], 1, 433925
[INSERT], 2, 549176
[INSERT], 3, 10324
[INSERT], 4, 3629
[INSERT], 5, 1303
[INSERT], 6, 454
[INSERT], 7, 140
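The operation mix and key-selection flexibility described above is configured through YCSB's workload property files (the file passed with -P). As a sketch, a custom workload might look like the following; the property names are standard CoreWorkload settings, but the particular mix shown here is illustrative and is not the one used in this test:

```properties
# Illustrative workload file (values are examples, not from this test)
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=1000000
operationcount=1000000

# operation mix: 50% reads, 45% updates, 5% scans
readproportion=0.5
updateproportion=0.45
scanproportion=0.05
insertproportion=0

# how keys are chosen: uniform, zipfian, or latest
requestdistribution=zipfian
```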
YCSB's shortcomings:
The bundled workload models are still fairly simple, and there is no way to drive a test as a MapReduce job, so scaling a test out across many workers is awkward.
For example, to load data with multiple clients you have to start several load processes and give each one a different "start key" in its launch parameters; for the transaction phase, the only option is to start multiple threads across multiple machines.
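The multi-process load workaround described above can be sketched with the standard CoreWorkload properties insertstart and insertcount, which assign each process a disjoint key range. This is only a sketch; the record counts below are illustrative and it assumes the jars have been built as described later in this article:

```shell
# Sketch only: split a 1,000,000-record load across two processes,
# each owning a disjoint key range via insertstart/insertcount.
java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -load \
  -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada \
  -p columnfamily=f1 -p recordcount=1000000 \
  -p insertstart=0 -p insertcount=500000 &
java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -load \
  -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada \
  -p columnfamily=f1 -p recordcount=1000000 \
  -p insertstart=500000 -p insertcount=500000 &
wait
```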
Testing HBase 0.90.4 with YCSB
Download the YCSB 0.1.3 source from the official site:
http://github.com/brianfrankcooper/YCSB/tarball/0.1.3
Build YCSB:
[hdfs@hd0004-sw1 guopeng]$ cd YCSB-0.1.3/
[hdfs@hd0004-sw1 YCSB-0.1.3]$ pwd
/home/hdfs/guopeng/YCSB-0.1.3
[hdfs@hd0004-sw1 YCSB-0.1.3]$ ant
Buildfile: /home/hdfs/guopeng/YCSB-0.1.3/build.xml

compile:
    [javac] /home/hdfs/guopeng/YCSB-0.1.3/build.xml:50: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

makejar:

BUILD SUCCESSFUL
Total time: 0 seconds
Because the HBase client bundled with YCSB has some compatibility problems, we replace the bundled file (db/hbase/src/com/yahoo/ycsb/db/HBaseClient.java) with the following code:
package com.yahoo.ycsb.db;
import java.io.IOException;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.Vector;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import com.yahoo.ycsb.DBException;
/**
* HBase client for YCSB framework
* @see http://blog.data-works.org
* @see http://gpcuster.cnblogs.com/
*/
public class HBaseClient extends com.yahoo.ycsb.DB {
private static final Configuration config = new Configuration();
static {
config.addResource("hbase-default.xml");
config.addResource("hbase-site.xml");
}
public boolean _debug = false;
public String _table = "";
public HTable _hTable = null;
public String _columnFamily = "";
public byte _columnFamilyBytes[];
public static final int Ok = 0;
public static final int ServerError = -1;
public static final int HttpError = -2;
public static final int NoMatchingRecord = -3;
public static final Object tableLock = new Object();
/**
* Initialize any state for this DB. Called once per DB instance; there is
* one DB instance per client thread.
*/
public void init() throws DBException {
if ((getProperties().getProperty("debug") != null)
&& (getProperties().getProperty("debug").compareTo("true") == 0)) {
_debug = true;
}
_columnFamily = getProperties().getProperty("columnfamily");
if (_columnFamily == null) {
System.err
.println("Error, must specify a columnfamily for HBase table");
throw new DBException("No columnfamily specified");
}
_columnFamilyBytes = Bytes.toBytes(_columnFamily);
// read hbase client settings.
for (Object key : getProperties().keySet()) {
String pKey = key.toString();
if (pKey.startsWith("hbase.")) {
String pValue = getProperties().getProperty(pKey);
if (pValue != null) {
config.set(pKey, pValue);
}
}
}
}
/**
* Cleanup any state for this DB. Called once per DB instance; there is one
* DB instance per client thread.
*/
public void cleanup() throws DBException {
try {
if (_hTable != null) {
_hTable.flushCommits();
}
} catch (IOException e) {
throw new DBException(e);
}
}
public void getHTable(String table) throws IOException {
synchronized (tableLock) {
_hTable = new HTable(config, table);
}
}
/**
* Read a record from the database. Each field/value pair from the result
* will be stored in a HashMap.
*
* @param table
* The name of the table
* @param key
* The record key of the record to read.
* @param fields
* The list of fields to read, or null for all of them
* @param result
* A HashMap of field/value pairs for the result
* @return Zero on success, a non-zero error code on error
*/
public int read(String table, String key, Set<String> fields,
HashMap<String, String> result) {
// if this is a "new" table, init HTable object. Else, use existing one
if (!_table.equals(table)) {
_hTable = null;
try {
getHTable(table);
_table = table;
} catch (IOException e) {
System.err.println("Error accessing HBase table: " + e);
return ServerError;
}
}
Result r = null;
try {
if (_debug) {
System.out.println("Doing read from HBase columnfamily "
+ _columnFamily);
System.out.println("Doing read for key: " + key);
}
Get g = new Get(Bytes.toBytes(key));
if (fields == null) {
g.addFamily(_columnFamilyBytes);
} else {
for (String field : fields) {
g.addColumn(_columnFamilyBytes, Bytes.toBytes(field));
}
}
r = _hTable.get(g);
} catch (IOException e) {
System.err.println("Error doing get: " + e);
return ServerError;
} catch (ConcurrentModificationException e) {
// do nothing for now...need to understand HBase concurrency model
// better
return ServerError;
}
for (KeyValue kv : r.raw()) {
result.put(Bytes.toString(kv.getQualifier()),
Bytes.toString(kv.getValue()));
if (_debug) {
System.out.println("Result for field: "
+ Bytes.toString(kv.getQualifier()) + " is: "
+ Bytes.toString(kv.getValue()));
}
}
return Ok;
}
/**
* Perform a range scan for a set of records in the database. Each
* field/value pair from the result will be stored in a HashMap.
*
* @param table
* The name of the table
* @param startkey
* The record key of the first record to read.
* @param recordcount
* The number of records to read
* @param fields
* The list of fields to read, or null for all of them
* @param result
* A Vector of HashMaps, where each HashMap is a set field/value
* pairs for one record
* @return Zero on success, a non-zero error code on error
*/
public int scan(String table, String startkey, int recordcount,
Set<String> fields, Vector<HashMap<String, String>> result) {
// if this is a "new" table, init HTable object. Else, use existing one
if (!_table.equals(table)) {
_hTable = null;
try {
getHTable(table);
_table = table;
} catch (IOException e) {
System.err.println("Error accessing HBase table: " + e);
return ServerError;
}
}
Scan s = new Scan(Bytes.toBytes(startkey));
// HBase has no record limit. Here, assume recordcount is small enough
// to bring back in one call.
// We get back recordcount records
s.setCaching(recordcount);
// add specified fields or else all fields
if (fields == null) {
s.addFamily(_columnFamilyBytes);
} else {
for (String field : fields) {
s.addColumn(_columnFamilyBytes, Bytes.toBytes(field));
}
}
// get results
ResultScanner scanner = null;
try {
scanner = _hTable.getScanner(s);
int numResults = 0;
for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
// get row key
String key = Bytes.toString(rr.getRow());
if (_debug) {
System.out.println("Got scan result for key: " + key);
}
HashMap<String, String> rowResult = new HashMap<String, String>();
for (KeyValue kv : rr.raw()) {
rowResult.put(Bytes.toString(kv.getQualifier()),
Bytes.toString(kv.getValue()));
}
// add rowResult to result vector
result.add(rowResult);
numResults++;
if (numResults >= recordcount) // if hit recordcount, bail out
{
break;
}
} // done with row
}
catch (IOException e) {
if (_debug) {
System.out
.println("Error in getting/parsing scan result: " + e);
}
return ServerError;
}
finally {
// guard against NPE: scanner is still null if getScanner() threw
if (scanner != null) {
scanner.close();
}
}
return Ok;
}
/**
* Update a record in the database. Any field/value pairs in the specified
* values HashMap will be written into the record with the specified record
* key, overwriting any existing values with the same field name.
*
* @param table
* The name of the table
* @param key
* The record key of the record to write
* @param values
* A HashMap of field/value pairs to update in the record
* @return Zero on success, a non-zero error code on error
*/
public int update(String table, String key, HashMap<String, String> values) {
// if this is a "new" table, init HTable object. Else, use existing one
if (!_table.equals(table)) {
_hTable = null;
try {
getHTable(table);
_table = table;
} catch (IOException e) {
System.err.println("Error accessing HBase table: " + e);
return ServerError;
}
}
if (_debug) {
System.out.println("Setting up put for key: " + key);
}
Put p = new Put(Bytes.toBytes(key));
for (Map.Entry<String, String> entry : values.entrySet()) {
if (_debug) {
System.out.println("Adding field/value " + entry.getKey() + "/"
+ entry.getValue() + " to put request");
}
p.add(_columnFamilyBytes, Bytes.toBytes(entry.getKey()),
Bytes.toBytes(entry.getValue()));
}
try {
_hTable.put(p);
} catch (IOException e) {
if (_debug) {
System.err.println("Error doing put: " + e);
}
return ServerError;
} catch (ConcurrentModificationException e) {
// do nothing for now...hope this is rare
return ServerError;
}
return Ok;
}
/**
* Insert a record in the database. Any field/value pairs in the specified
* values HashMap will be written into the record with the specified record
* key.
*
* @param table
* The name of the table
* @param key
* The record key of the record to insert.
* @param values
* A HashMap of field/value pairs to insert in the record
* @return Zero on success, a non-zero error code on error
*/
public int insert(String table, String key, HashMap<String, String> values) {
return update(table, key, values);
}
/**
* Delete a record from the database.
*
* @param table
* The name of the table
* @param key
* The record key of the record to delete.
* @return Zero on success, a non-zero error code on error
*/
public int delete(String table, String key) {
// if this is a "new" table, init HTable object. Else, use existing one
if (!_table.equals(table)) {
_hTable = null;
try {
getHTable(table);
_table = table;
} catch (IOException e) {
System.err.println("Error accessing HBase table: " + e);
return ServerError;
}
}
if (_debug) {
System.out.println("Doing delete for key: " + key);
}
Delete d = new Delete(Bytes.toBytes(key));
try {
_hTable.delete(d);
} catch (IOException e) {
if (_debug) {
System.err.println("Error doing delete: " + e);
}
return ServerError;
}
return Ok;
}
}
With the modified HBase client, the client settings a test needs can be passed directly on the command line, e.g. the ZooKeeper connection info: -p hbase.zookeeper.quorum=hd0004-sw1.dc.sh-wgq.sdo.com,hd0001-sw1.dc.sh-wgq.sdo.com, the client-side write buffer size: -p hbase.client.write.buffer=100, and so on.
Then copy the jars and configuration files needed for building and running:
[hdfs@hd0004-sw1 YCSB-0.1.3]$ cp ~/hbase-current/*.jar ~/hbase-current/lib/*.jar ~/hbase-current/conf/hbase-*.xml db/hbase/lib/
Now we can build the HBase client:
[hdfs@hd0004-sw1 YCSB-0.1.3]$ ant dbcompile-hbase
Buildfile: /home/hdfs/guopeng/YCSB-0.1.3/build.xml

compile:
    [javac] /home/hdfs/guopeng/YCSB-0.1.3/build.xml:50: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

makejar:

dbcompile-hbase:

dbcompile:
    [javac] /home/hdfs/guopeng/YCSB-0.1.3/build.xml:63: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

makejar:

BUILD SUCCESSFUL
Total time: 0 seconds
Finally, create the HBase table used by the test (usertable):
hbase(main):004:0> create 'usertable', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
0 row(s) in 1.2940 seconds
The environment is now ready.
The following command starts loading the test data:
java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=f1 -p recordcount=1000000 -p hbase.zookeeper.quorum=hd0004-sw1.dc.sh-wgq.sdo.com,hd0001-sw1.dc.sh-wgq.sdo.com,hd0003-sw1.dc.sh-wgq.sdo.com,hd0149-sw18.dc.sh-wgq.sdo.com,hd0165-sw13.dc.sh-wgq.sdo.com -s
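After the load completes, the transaction phase can be started with the same client. A sketch follows: in YCSB 0.1.x the -t switch selects the transaction phase, -threads sets the client thread count, and -s prints status; the operation count and thread count below are illustrative, and the ZooKeeper quorum placeholder stands for the same host list used in the load command:

```shell
# Sketch of the transaction (run) phase; thread and operation counts
# are illustrative values, not taken from the test above.
java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -t \
  -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada \
  -p columnfamily=f1 -p operationcount=1000000 \
  -p hbase.zookeeper.quorum=<same quorum as the load command> \
  -threads 10 -s
```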