This analysis has four parts:

    • Case scenario
    • Flow analysis
    • Analysis of the 0.94-0.96 implementations
    • Simulation experiments and analysis

I. Case Scenario

Repost Weibo

Sorry, this Weibo post has been deleted by its author. See help: http://t.cn/zWSudZc

| Repost | Favorite | Comment

Posts originally carry different short-link codes, but once a post is deleted, every operation lands on the same token, namely zWSudZc.

This triggers several operations:

delete zWSudZc mid

decr zWSudZc  shareCount

The resulting problem: write operations were found to be blocked on the rowKey zWSudZc.

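If the Weibo feed were stored in HBase with mid as the rowKey, operations on hot posts would run into the same problem. Before analyzing it, we first need to understand how HBase guarantees write consistency: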

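  Conflict prevention: avoid concurrent distributed writes by routing all writes for a given data item to a single node (either a global master or a partition master). To avoid conflicts, the database must sacrifice availability when the network is partitioned. This approach is common in systems that provide strong consistency guarantees (for example most relational databases, HBase, and MongoDB).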
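To make the single-writer routing concrete, here is a minimal sketch of range-based routing. It assumes regions are identified by their start keys kept in a sorted map; the class name RegionLocatorSketch, the server names, and the split points are all hypothetical, and this is not HBase's client code. It only illustrates that every operation on one rowKey resolves to the same node.

```java
import java.util.TreeMap;

final class RegionLocatorSketch {
    // Hypothetical region start keys mapped to hypothetical server names.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    RegionLocatorSketch() {
        regionsByStartKey.put("", "server-a");   // the first region starts at the empty key
        regionsByStartKey.put("m", "server-b");
        regionsByStartKey.put("t", "server-c");
    }

    /** Returns the server of the region whose start key is the greatest one <= rowKey. */
    String locate(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        RegionLocatorSketch locator = new RegionLocatorSketch();
        // Both writes from the case scenario target the same rowKey, hence the same server.
        System.out.println("delete -> " + locator.locate("zWSudZc"));
        System.out.println("decr   -> " + locator.locate("zWSudZc"));
    }
}
```

Because the lookup depends only on the rowKey, a hot key cannot be spread across nodes by adding servers; all of its writes queue up on one region.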

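From this we can make the following conjectures about updates on a single node:

  • Write operations take a read lock
  • Writes are executed in one place, so time is lost queuing.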

II. Flow Analysis

checkAndPut / append / increment operations in HRegion (HBase 0.94.x):

    • startRegionOperation (lock.readLock().lock());
    • rowLock lock
    • updatesLock.readLock().lock()
    • mvcc begin
    • mvcc finish
    • updatesLock.readLock().unlock()
    • closeRegionOperation

get / scan operations:

    • startRegionOperation
    • MultiVersionConsistencyControl.setThreadReadPoint(this.readPt);
    • closeRegionOperation

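Differences among the three locks

The region lock and updatesLock are both ReentrantReadWriteLock instances. A ReentrantReadWriteLock allows many concurrent readers, but while its write lock is held all other operations block. The write lock of updatesLock is only taken during a region flush. I did not see the write lock of the region lock being taken in these paths, so I suspect it plays no role here. The row lock is a ConcurrentHashMap<HashedBytes, CountDownLatch> named lockedRows, declared in HRegion; each entry is a latch.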
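The read/write lock behavior described above can be checked with a small sketch. It uses only java.util.concurrent; the class ReadWriteLockSketch and its helper methods are hypothetical names for illustration, and the lock merely stands in for updatesLock. The attempts run on a second thread because lock ownership is tracked per thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class ReadWriteLockSketch {
    /** Runs the attempt on a separate thread and returns its result. */
    static boolean onOtherThread(Callable<Boolean> attempt) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(attempt).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** True if another thread could take the read lock right now. */
    static boolean otherThreadCanRead(ReentrantReadWriteLock lock) {
        return onOtherThread(() -> {
            boolean acquired = lock.readLock().tryLock();
            if (acquired) lock.readLock().unlock();
            return acquired;
        });
    }

    /** True if another thread could take the write lock right now. */
    static boolean otherThreadCanWrite(ReentrantReadWriteLock lock) {
        return onOtherThread(() -> {
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) lock.writeLock().unlock();
            return acquired;
        });
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();

        updatesLock.readLock().lock();       // an update is in progress
        System.out.println("second update may proceed: " + otherThreadCanRead(updatesLock));
        System.out.println("flush may proceed: " + otherThreadCanWrite(updatesLock));
        updatesLock.readLock().unlock();

        updatesLock.writeLock().lock();      // a flush is in progress
        System.out.println("update may proceed during flush: " + otherThreadCanRead(updatesLock));
        updatesLock.writeLock().unlock();
    }
}
```

Since every update only takes the read side, updatesLock by itself does not serialize writers; it only holds them back while a flush owns the write side.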

MVCC  MultiVersionConsistencyControl

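    • Manages read/write consistency for the memstore. "Use MVCC to make this set of increments/appends atomic to reads"
    • 0.94 and 0.94.2: not yet implemented. There is a TODO in increment, append, and checkAnd* (one MVCC pass fewer, as the flows below show)
    • 0.96: implemented
    • put is the operation our project currently uses most
    • 0.94: HRegion internalPut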
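To show why MVCC can become a region-wide queue, here is a simplified sketch of a write queue with a read point. It is an assumption-laden model written for this note, not the source of MultiVersionConsistencyControl: the names MiniMvcc, WriteEntry, begin, advance, complete, and readPoint are hypothetical. The idea it illustrates is that the read point can only move past a write once every earlier write has completed, whatever row it touched.

```java
import java.util.ArrayDeque;

final class MiniMvcc {
    /** One in-flight write, identified by a monotonically increasing number. */
    static final class WriteEntry {
        final long writeNumber;
        boolean completed;
        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private long lastWriteNumber = 0;   // last number handed out to a writer
    private long readPoint = 0;         // highest write number visible to readers
    private final ArrayDeque<WriteEntry> writeQueue = new ArrayDeque<>();

    /** Registers a new write at the tail of the queue. */
    synchronized WriteEntry begin() {
        WriteEntry entry = new WriteEntry(++lastWriteNumber);
        writeQueue.addLast(entry);
        return entry;
    }

    /** Marks the entry done and moves the read point over the completed prefix of the queue. */
    synchronized boolean advance(WriteEntry entry) {
        entry.completed = true;
        while (!writeQueue.isEmpty() && writeQueue.peekFirst().completed) {
            readPoint = writeQueue.removeFirst().writeNumber;
        }
        notifyAll();
        return readPoint >= entry.writeNumber;   // true once this write is visible
    }

    /** Completes the write and blocks until readers can see it. */
    synchronized void complete(WriteEntry entry) throws InterruptedException {
        advance(entry);
        while (readPoint < entry.writeNumber) {
            wait();   // woken by notifyAll() in advance()
        }
    }

    synchronized long readPoint() { return readPoint; }

    public static void main(String[] args) {
        MiniMvcc mvcc = new MiniMvcc();
        WriteEntry first = mvcc.begin();     // e.g. a write to row A
        WriteEntry second = mvcc.begin();    // e.g. a write to an unrelated row B
        System.out.println("second visible: " + mvcc.advance(second));
        System.out.println("first visible: " + mvcc.advance(first));
        System.out.println("read point: " + mvcc.readPoint());
    }
}
```

In this model the second write finishes first but stays invisible until the first one completes, so a writer calling complete would wait on a write to a different row. That is the cost the experiments in part IV try to measure.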

III. Analysis of the 0.94-0.96 Implementations

In 0.94:

    • increment, append, and checkAndPut all use both the row lock and MVCC, whereas internalPut, which put calls, uses only MVCC and no row lock
    • Flow:
    • startRegionOperation (lock.readLock().lock());
    • rowLock lock
    • updatesLock.readLock().lock()
    • mvcc begin
    • mvcc finish
    • updatesLock.readLock().unlock()
    • closeRegionOperation

0.96:

Flow:

    (1)  Acquire RowLock

    (1a) BeginMVCC + Finish MVCC

    (2)  Begin MVCC

    (3)  Do work

    (4)  Release RowLock

    (5)  Append to WAL

    (6)  Finish MVCC

wait for all prior MVCC transactions to finish - while we hold the row lock (so that we are guaranteed to see the latest state)

After an upgrade to 0.96, increment operations may therefore become slower because of the extra MVCC pass.

Improvement planned for 0.96:

The committers also consider the two MVCC passes unnecessary. The improved flow is described in https://issues.apache.org/jira/browse/HBASE-7263

(1)  Acquire RowLock

(1a) Grab+Release RowWriteLock (instead of BeginMVCC + Finish MVCC)

(1b) Grab RowReadLock (new step!)

(2)  Begin MVCC

(3)  Do work

(4)  Release RowLock

(5)  Append to WAL

(6)  Finish MVCC

(7)  Release RowReadLock (new step!)

The change also removes the unused client-side method for allocating lock IDs.
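The improved flow can be sketched with per-row locks. This is only my reading of the steps listed above, not the HBASE-7263 patch: the class RowLockFlowSketch, the choice of ReentrantLock for the RowLock and ReentrantReadWriteLock for the row read/write lock, and the in-memory store are all assumptions, and the MVCC and WAL steps are left as comments.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class RowLockFlowSketch {
    private final ConcurrentHashMap<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> rowRwLocks = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Long> store = new ConcurrentHashMap<>();

    void increment(String row) {
        ReentrantLock rowLock = rowLocks.computeIfAbsent(row, k -> new ReentrantLock());
        ReentrantReadWriteLock rowRw = rowRwLocks.computeIfAbsent(row, k -> new ReentrantReadWriteLock());

        rowLock.lock();                      // (1)  acquire RowLock
        rowRw.writeLock().lock();            // (1a) wait for in-flight writes to THIS row only
        rowRw.writeLock().unlock();
        rowRw.readLock().lock();             // (1b) mark this write as in flight
        try {
            try {
                // (2) begin MVCC would go here
                long current = store.getOrDefault(row, 0L);   // (3) do work: read-modify-write
                store.put(row, current + 1);
            } finally {
                rowLock.unlock();            // (4) release RowLock
            }
            // (5) append to WAL and (6) finish MVCC would go here
        } finally {
            rowRw.readLock().unlock();       // (7) release RowReadLock
        }
    }

    long get(String row) {
        return store.getOrDefault(row, 0L);
    }

    /** Runs concurrent increments on one hot row and returns the final value. */
    static long runDemo(int threads, int perThread) {
        RowLockFlowSketch region = new RowLockFlowSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) region.increment("zWSudZc");
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return region.get("zWSudZc");
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000));
    }
}
```

The point of step (1a) in this sketch is that a writer waits only for unfinished writes to its own row, whereas the BeginMVCC + Finish MVCC pair it replaces waits for every earlier write in the region.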

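IV. Simulation Experiments and Analysis

  • Building the simulation code

The simulation lives in the HBaseInsertTest1 class. TestKeyValueSkipListSet is HBase's KeyValueSkipListSet extracted into a public class and used to store the data.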

package com.daodao.hbase;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Created with IntelliJ IDEA.
 *
 * @author guanpu
 *         Date: 13-1-9
 *         Time: 5:53 PM
 *         Analyzes the performance of insert operations in 0.94
 */
public class HBaseInsertTest1 {
    volatile TestKeyValueSkipListSet kvset;
    final ReentrantReadWriteLock lock =
            new ReentrantReadWriteLock();
    final ReentrantReadWriteLock updatesLock =
            new ReentrantReadWriteLock();
    private final MultiVersionConsistencyControl mvcc =
            new MultiVersionConsistencyControl();
    private static AtomicInteger finishedCount;
    private static AtomicLong mvccTime = new AtomicLong(0l);
    private static AtomicLong rowlockTime = new AtomicLong(0l);
    private static AtomicLong lockTime = new AtomicLong(0l);
    private static AtomicLong updateLockTime = new AtomicLong(0l);
    private static AtomicLong insertTime = new AtomicLong(0l);
    private static AtomicLong releaseTime = new AtomicLong(0l);

    private final ConcurrentHashMap<String, CountDownLatch> lockedRows =
            new ConcurrentHashMap<String, CountDownLatch>();

    public HBaseInsertTest1() {
        kvset = new TestKeyValueSkipListSet(new KeyValue.KVComparator());
        finishedCount = new AtomicInteger(0);
    }

    class HBaseInsertTask implements Runnable {

        public void run() {
            for (int i = 0; i < 100000; i++) {
                String key = "key" + i;
                long time = System.nanoTime();
                MultiVersionConsistencyControl.WriteEntry localizedWriteEntry = null;
                try {


                    lock.readLock().lock();   // mirrors startRegionOperation
                    lockTime.addAndGet(System.nanoTime() - time);

                    time = System.nanoTime();
                    Integer lid = getLock(key);     // acquire the row lock for this rowKey
                    rowlockTime.addAndGet(System.nanoTime() - time);

                    time = System.nanoTime();
                    updatesLock.readLock().lock();
                    updateLockTime.addAndGet(System.nanoTime() - time);

                    time = System.nanoTime();
                    localizedWriteEntry = mvcc.beginMemstoreInsert();
                    mvccTime.addAndGet(System.nanoTime() - time);

                    time = System.nanoTime();
                    kvset.add(new KeyValue(Bytes.toBytes(key), Bytes.toBytes("f"), Bytes.toBytes("column"),
                            1l, Bytes.toBytes(1l)));
                    insertTime.addAndGet(System.nanoTime() - time);

                    time = System.nanoTime();
                    mvcc.completeMemstoreInsert(localizedWriteEntry);
                    mvccTime.addAndGet(System.nanoTime() - time);
                } catch (Exception e) {
                    System.out.println(e);
                } finally {
                    time = System.nanoTime();
                    updatesLock.readLock().unlock();

                    CountDownLatch rowLatch = lockedRows.remove(key);
                    if (rowLatch != null) {
                        rowLatch.countDown();   // wake threads waiting for this row
                    }

                    lock.readLock().unlock();
                    releaseTime.addAndGet(System.nanoTime() - time);

                }
            }
            finishedCount.incrementAndGet();   // atomic, unlike set(get() + 1)
        }

        private Integer getLock(String key) {
            CountDownLatch rowLatch = new CountDownLatch(1);

            // loop until we acquire the row lock (unless !waitForLock)
            while (true) {

                CountDownLatch existingLatch = lockedRows.putIfAbsent(key, rowLatch);
                if (existingLatch == null) {
                    break;
                } else {
                    try {
                        if (!existingLatch.await(30000,
                                TimeUnit.MILLISECONDS)) {
                            System.out.println("some thing wrong in waiting");
                            return null;
                        }
                    } catch (InterruptedException ie) {
                        // Empty
                    }
                }
            }
            return 1;
        }
    }

    private class DaodaoTestWatcher implements Runnable {

        @Override
        public void run() {
            long time = System.nanoTime();
            while (finishedCount.get() != 50) {

            }
            System.out.println("cost time:" + (System.nanoTime() - time) / 1000000000.0);
            System.out.println("cost time:  mvcc" + mvccTime.get() / 1000000000.0 / 50);
            System.out.println("cost time:  lock" + lockTime.get() / 1000000000.0 / 50);
            System.out.println("cost time:  update" + updateLockTime.get() / 1000000000.0 / 50);
            System.out.println("cost time:  rowlock" + rowlockTime.get() / 1000000000.0 / 50);
            System.out.println("cost time:  release" + releaseTime.get() / 1000000000.0 / 50);
        }
    }

    public void test() {
        ExecutorService executorService = Executors.newFixedThreadPool(200);
        for (int i = 0; i < 50; i++)
            executorService.execute(new HBaseInsertTask());
        executorService.execute(new DaodaoTestWatcher());
        // Let the JVM exit once all submitted tasks have finished.
        executorService.shutdown();
    }

    public static void main(String[] args) {
        new HBaseInsertTest1().test();
    }


}

 

Elapsed time:

cost time:24.727145
cost time: mvcc22.98698292
cost time: lock0.0
cost time: update0.009690879999999999
cost time: rowlock0.0
cost time: release0.05001874

With MVCC removed:

cost time:5.190751
cost time:  mvcc0.0073236
cost time:  lock0.0
cost time:  update0.017533220000000002
cost time:  rowlock0.0
cost time:  release1.3753079

For the 0.96 code, add the following after updatesLock.readLock().lock();:

                    time = System.nanoTime();
                    // wait for all prior MVCC transactions to finish - while we hold the row lock
                    // (so that we are guaranteed to see the latest state)
                    mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
                    mvccTime.addAndGet(System.nanoTime() - time);

Elapsed time:

cost time:43.04134
cost time:  mvcc40.70520202
cost time:  lock0.0
cost time:  update0.00937416
cost time:  rowlock0.0
cost time:  release0.05023072

 

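In 0.94, increment, append, and checkAndPut all use both the row lock and MVCC, whereas internalPut, which put calls, uses only MVCC and no row lock.

Optimization: for a single-version service, all of these operations could take the row lock and drop MVCC, which should further improve write performance.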

 

If the rowkey is changed to a single fixed rowkey

Elapsed time on 0.94 (every figure is a total):

cost time:27.660935
cost time: mvcc3.888678
cost time: lock0.0
cost time: insert9.319777
cost time: update0.964697
cost time: rowlock0.0
cost time: release16.997803

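However, when actually running inserts against HBase, varying keys were 4 times faster than a fixed key.

In a standalone single-machine test the two ran at essentially the same speed. The cost is therefore probably in locating the region or in network transfer, which needs further verification.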

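Summary:

    • Region update time is spent mainly in MVCC
    • For a single-version database, I believe MVCC can be removed from all update operations; acquiring the rowkey's write lock up front in each modifying operation is enough, and it avoids region-wide MVCC
    • The end-to-end bottleneck for a single rowkey, from client to HBase, still needs to be explored in a real distributed environment.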

 

----------------------------------------Extension----------------------------------

MySQL MVCC  by @曾经的阿飞(军伟)

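MySQL 5.6 optimizes read transactions (http://t.cn/zjnPhdq) by splitting trx_list into ro_trx_list and rw_trx_list. Creating a read view only requires a snapshot of rw_trx_list, so read transactions no longer affect the read-view snapshot and no longer hold up trx_sys_t::mutex, which improves performance. @yangwm @慢半拍de刀刀 @启盼cobain @jolestar @蔚1984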

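How MVCC works

1. Contrast with row-level locking

Row-level locking (pessimistic locking); y = can proceed together, n = blocks:

   R  W
R  y  n
W  n  n

MVCC

Keeps versions

While version 10 is being written, readers read version 9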

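Further reading: optimistic locking

select -> update => select again to check whether the row changed in the meantime, and roll back if it did. This suits workloads with few conflicts.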
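The read, update, re-check cycle above can be sketched with a compare-and-set on a versioned value. This is a generic illustration, not how any particular database implements it; the class OptimisticUpdateSketch and its Row type are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

final class OptimisticUpdateSketch {
    /** Immutable snapshot of a row: a value plus the version it was written at. */
    static final class Row {
        final long version;
        final long value;
        Row(long version, long value) {
            this.version = version;
            this.value = value;
        }
    }

    private final AtomicReference<Row> current = new AtomicReference<>(new Row(0, 0));

    /** The "select" step: take a snapshot without locking. */
    Row read() {
        return current.get();
    }

    /** The "update" step: succeeds only if the row is still the snapshot that was read. */
    boolean tryUpdate(Row snapshot, long newValue) {
        return current.compareAndSet(snapshot, new Row(snapshot.version + 1, newValue));
    }

    public static void main(String[] args) {
        OptimisticUpdateSketch table = new OptimisticUpdateSketch();
        Row seenByA = table.read();
        Row seenByB = table.read();
        System.out.println("A commits: " + table.tryUpdate(seenByA, 10));
        System.out.println("B commits: " + table.tryUpdate(seenByB, 20));   // stale snapshot
        System.out.println("value: " + table.read().value);
    }
}
```

A caller whose tryUpdate fails has to re-read and retry (or give up), which is why the approach pays off only when conflicts are rare.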

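Does the Redis server also implement optimistic locking? ---- To be confirmed. Does a single-threaded, serial execution model need locking at all?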

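2. InnoDB MVCC

Each row record carries two control fields, tx_id and rollback_point, which together implement MVCC.

table  : row  c1 c2  tx_id  rollback_point

rollback_point points to the previous version of the record.

MySQL has four isolation levels: read uncommitted (can read data from uncommitted transactions), read committed (can only read committed data, determined from the current active transaction list and by following the pointers back), repeatable read, and serializable (every statement behaves as select for update, with locking done behind the scenes).

A version older than the active transactions in the Read View is read normally. The Read View may have gaps, so reading an intermediate version is also correct.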
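A version-chain read can be sketched as follows. The fields tx_id and rollback_point come from the notes above; everything else (the class VersionChainSketch, the shape of ReadView, and the simplified visibility rule) is an assumption for illustration and not InnoDB's actual data structures.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class VersionChainSketch {
    /** One row version: the writing transaction, its value, and a pointer to the previous version. */
    static final class RowVersion {
        final long txId;
        final String value;
        final RowVersion rollbackPoint;
        RowVersion(long txId, String value, RowVersion rollbackPoint) {
            this.txId = txId;
            this.value = value;
            this.rollbackPoint = rollbackPoint;
        }
    }

    /** Snapshot taken when a read starts: transactions still active, and the next id to be assigned. */
    static final class ReadView {
        final Set<Long> activeTxIds;
        final long nextTxId;
        ReadView(long nextTxId, Long... activeTxIds) {
            this.nextTxId = nextTxId;
            this.activeTxIds = new HashSet<>(Arrays.asList(activeTxIds));
        }
        /** Simplified rule: visible if the writer started before the snapshot and had already finished. */
        boolean canSee(long txId) {
            return txId < nextTxId && !activeTxIds.contains(txId);
        }
    }

    /** Walks back from the newest version until one is visible to the view; null if none is. */
    static String read(RowVersion newest, ReadView view) {
        for (RowVersion v = newest; v != null; v = v.rollbackPoint) {
            if (view.canSee(v.txId)) {
                return v.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RowVersion v1 = new RowVersion(5, "a", null);
        RowVersion v2 = new RowVersion(8, "b", v1);
        RowVersion v3 = new RowVersion(12, "c", v2);
        // Snapshot taken while transaction 8 was still active and before 12 started.
        System.out.println(read(v3, new ReadView(10, 8L)));
        // Snapshot taken after every writer finished.
        System.out.println(read(v3, new ReadView(13)));
    }
}
```

With the first view the reader skips the versions written by transactions 12 and 8 and returns the one from transaction 5, which is the "gap" case mentioned above: an intermediate version is the correct answer for that snapshot.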

 

Below Serializable, select for update has to be issued manually.

 

The MVCC analysis by @蔚1984 at http://boneylw.sinaapp.com/?p=16 is also worth reading for comparison.

 

posted on 2013-01-16 17:24  架构师刀哥