Concurrency handling in distributed transaction systems, and implementing distributed locks with Redis and Zookeeper

Transaction system

Data structures of the transaction system

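A payment system API usually takes an "order number" as an input parameter, but the value actually passed to the API is often not the real business order number; it is a transaction order number. Payment system APIs are designed around the rule that "merchant number + order number" is unique, so the merchant side needs corresponding logic to keep the business consistent. This is where the transaction order table comes in: when a business order is paid, a transaction order is created and linked to the business order, the transaction order number is sent to the payment system, and the fund account data and business order data are processed according to the result. Because this is a remote call, which may be synchronous or asynchronous, the result can take many forms, such as success, failure, pending, or timeout. One business order may therefore correspond to multiple transaction orders, and each transaction order corresponds to one or more request results.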
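The relationship described above can be sketched as a small data model. This is only an illustration, not the author's schema: the class and field names (BusinessOrder, TransOrder, TransResult, tradeNo) and the way the transaction order number is generated are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; the article does not show its actual tables.
class TransResult {
    final String code;   // raw result code returned by the channel

    TransResult(String code) {
        this.code = code;
    }
}

class TransOrder {
    final String merchantNo;
    final String tradeNo;          // the number actually sent to the payment API
    final String businessOrderNo;  // link back to the business order
    final List<TransResult> results = new ArrayList<>(); // one or more request results

    TransOrder(String merchantNo, String tradeNo, String businessOrderNo) {
        this.merchantNo = merchantNo;
        this.tradeNo = tradeNo;
        this.businessOrderNo = businessOrderNo;
    }

    // "merchant number + order number" must be unique on the payment side
    String uniqueKey() {
        return merchantNo + ":" + tradeNo;
    }
}

class BusinessOrder {
    final String orderNo;
    final List<TransOrder> transOrders = new ArrayList<>();
    private int seq = 0;

    BusinessOrder(String orderNo) {
        this.orderNo = orderNo;
    }

    // Each payment attempt creates a new transaction order with its own number
    TransOrder newAttempt(String merchantNo) {
        seq++;
        TransOrder attempt = new TransOrder(merchantNo, orderNo + "-" + seq, orderNo);
        transOrders.add(attempt);
        return attempt;
    }
}
```

A retry after a failed or timed-out attempt calls newAttempt again, so the payment system never sees the same "merchant number + order number" pair twice.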

Transaction modes

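There are three main transaction modes: synchronous requests, asynchronous requests, and queries. Some systems also support batch submission, which can be classed as an asynchronous request.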

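For a synchronous transaction, the response to the request may be one of three outcomes: success, failure, or unknown;
For an asynchronous transaction, the notification may be one of two outcomes: success or failure;
For a query, the response may be success, failure, or unknown, the same as for a synchronous transaction.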

Handling transaction results

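Success: update the transaction order status, record the result, and continue with the actual business processing.
Failure: update the transaction order status, record the result, and, depending on the business, either create a new transaction order or mark the business order as failed.
Unknown: take no action; wait for the asynchronous notification, or query asynchronously from a scheduled task, or add the order to a queue for an asynchronous query.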

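Note that for a payment system with many result codes, the classification of every code must be explicit: the codes treated as "success" and those treated as "failure" must be exactly right. When the channel provider changes its result codes, update the mapping promptly.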
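One way to keep the classification explicit is a single lookup table in which every code that is not listed falls back to "unknown". The sketch below assumes this approach; the channel codes ("0000", "E101", "E102") are invented for illustration, and real codes have to come from the channel's documentation.

```java
import java.util.HashMap;
import java.util.Map;

enum Outcome { SUCCESS, FAILURE, UNKNOWN }

class ResultCodeClassifier {
    private final Map<String, Outcome> table = new HashMap<>();

    ResultCodeClassifier() {
        // Hypothetical channel codes, not taken from any real payment system
        table.put("0000", Outcome.SUCCESS);
        table.put("E101", Outcome.FAILURE); // e.g. insufficient balance
        table.put("E102", Outcome.FAILURE); // e.g. account closed
    }

    // Anything not explicitly classified (timeouts, "processing", newly added codes)
    // is treated as UNKNOWN, so the order waits for a notification or a query.
    Outcome classify(String code) {
        if (code == null) {
            return Outcome.UNKNOWN;
        }
        return table.getOrDefault(code, Outcome.UNKNOWN);
    }
}
```

Defaulting to UNKNOWN means a code the channel adds later never moves money by accident; it only delays the order until the table is updated.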

Database transaction control

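A database transaction should be confined to local methods, with no remote call in the middle: the remote interface is outside our control, and it cannot be rolled back together with the local calls when the transaction fails. A remote call inside a transaction can also keep local resources occupied for a long time, database connections in particular.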
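The rule can be sketched as two short local transactions with the remote call between them. Everything here is a stand-in chosen for illustration: inLocalTx only tracks a flag instead of opening a real database transaction, and the status strings and the "0000" success code are assumptions.

```java
import java.util.function.Function;
import java.util.function.Supplier;

class PaymentFlow {
    // Stand-in for a real transaction manager: only tracks whether a local transaction is open
    static boolean inTransaction = false;

    static <T> T inLocalTx(Supplier<T> work) {
        inTransaction = true;
        try {
            return work.get(); // a real implementation would commit or roll back here
        } finally {
            inTransaction = false;
        }
    }

    // remoteCall stands for the payment API: transaction order number in, result code out
    static String pay(String tradeNo, Function<String, String> remoteCall) {
        // 1. First local transaction: persist the transaction order as pending
        String status = inLocalTx(() -> "PENDING");

        // 2. Remote call, made outside any local transaction
        if (inTransaction) {
            throw new IllegalStateException("remote call inside a local transaction");
        }
        String code = remoteCall.apply(tradeNo);

        // 3. Second local transaction: record the result and update the status
        return inLocalTx(() -> "0000".equals(code) ? "SUCCESS" : status);
    }
}
```

If the remote call hangs or throws, no database connection is held while waiting, and the pending record written in step 1 is what a later query or notification picks up.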

Concurrency issues

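On a single machine, concurrency can be handled with synchronized or Lock (global consistency) or with optimistic locking (eventual consistency), combined with a queue to absorb sudden load. This is fairly simple and is not covered here.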

For a distributed system, concurrency can be handled in the following ways:

Optimistic locking

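Optimistic locking achieves eventual consistency of business data by checking the data version when writing to the database. It works in single-machine, distributed, and other environments. Its advantages are that it is simple to implement and read performance is very good. The drawbacks are just as clear: when the transaction chain is long, one rollback can fail the entire upper-level transaction. This keeps the funds correct, but the higher the transaction frequency, especially on a single fund account, the more likely the optimistic lock is to fail, and frequent retries hurt business throughput. Optimistic locking is therefore only a foundation that keeps the business data correct; solving the concurrency problem needs other measures as well.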
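The version check can be illustrated without a database. The sketch below is an in-memory model, assuming a row with a balance and a version column; compareAndSet plays the role of the conditional UPDATE shown in the comment. The class, the column names, and the retry policy are illustrative only.

```java
import java.util.concurrent.atomic.AtomicReference;

class Account {
    // Immutable snapshot of a row: balance plus version column
    static final class Row {
        final long balance;
        final long version;

        Row(long balance, long version) {
            this.balance = balance;
            this.version = version;
        }
    }

    private final AtomicReference<Row> row;

    Account(long balance) {
        row = new AtomicReference<>(new Row(balance, 0));
    }

    Row read() {
        return row.get();
    }

    // Equivalent in spirit to:
    //   UPDATE account SET balance = ?, version = version + 1 WHERE id = ? AND version = ?
    // Returns false when another writer changed the row after it was read.
    boolean update(Row expected, long newBalance) {
        return row.compareAndSet(expected, new Row(newBalance, expected.version + 1));
    }

    // Re-read and retry a bounded number of times; under heavy contention
    // on one account this loop is where throughput is lost.
    boolean debit(long amount, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Row current = read();
            if (current.balance < amount) {
                return false;
            }
            if (update(current, current.balance - amount)) {
                return true;
            }
        }
        return false;
    }
}
```

A writer holding a stale snapshot is rejected rather than overwriting newer data, which is the "funds stay correct" guarantee; the cost is the retry loop.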

Distributed locks

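A distributed lock can be built on a database, Redis, Zookeeper, and so on. The simplest lock implements only lock and unlock. In practice two more features are needed: reentrancy, so that the same thread can re-enter, and a timeout, so that a failure in one instance does not leave the whole transaction suspended forever. Locks built on Redis or Zookeeper are the most common. When there are many fund accounts and transactions are spread widely across them, a distributed lock acts as a "micro queue" and has little impact on processing efficiency.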

Message queues

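Message queues are commonly used for performance optimization between services and between modules: the queue buffers sudden load, deduplicates requests to reduce the number of actual transactions, and serializes requests to avoid resource conflicts. Simple cases can use Redis rpush + blpop as a lightweight queue; more demanding scenarios can use rabbitmq.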

Distributed lock with Redisson

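Redisson's getLock and getReadWriteLock methods provide locks that are reentrant for the same thread and support both a lock timeout and an acquisition timeout. The lock itself has a default timeout of 30 seconds.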

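import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;

import org.redisson.Redisson;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// LockManager, ZookeeperManager and ZookeeperValue are project classes not shown here.
public class LockManagerImpl implements LockManager {
    // Assumes SLF4J; the original snippet used a logger without declaring it.
    private static final Logger logger = LoggerFactory.getLogger(LockManagerImpl.class);
    private final RedissonClient redisson;

    public LockManagerImpl(ZookeeperManager zookeeperManager) {
        // Connection settings are loaded from the "/lock" node, with local defaults
        Map<String, ZookeeperValue> settings = zookeeperManager.load("/lock");
        Config config = new Config();
        config
                .useSingleServer()
                .setAddress(settings.get("address").getString("redis://127.0.0.1:6379"))
                .setTimeout(settings.get("timeout").getInteger(3000))
                .setPassword(settings.get("password").getString(null));
        redisson = Redisson.create(config);
    }

    public void init() {
        logger.debug("init()");
    }

    public void destroy() {
        logger.debug("destroy()");
        // Release the Redis connections held by the client
        redisson.shutdown();
    }

    @Override
    public Lock getLock(String key) {
        return redisson.getLock(key);
    }

    @Override
    public ReadWriteLock getReadWriteLock(String key) {
        return redisson.getReadWriteLock(key);
    }
}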
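Because getLock returns a java.util.concurrent.locks.Lock, calling code can use the standard tryLock/unlock pattern. The helper below is a hypothetical sketch of that pattern (the names LockUsage and runLocked are not from the article); it works with any Lock, so it can be tried with a local ReentrantLock without a Redis server.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

class LockUsage {
    // Runs the task while holding the lock.
    // Returns false if the lock could not be obtained within waitMillis.
    static boolean runLocked(Lock lock, long waitMillis, Runnable task) {
        boolean acquired;
        try {
            acquired = lock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller and give up
            Thread.currentThread().interrupt();
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            // Always release, even if the task throws
            lock.unlock();
        }
    }
}
```

With a reentrant lock the task may call runLocked on the same lock again from the same thread; each nested call releases only its own hold.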

Distributed lock with Jedis

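For Redis versions below 2.6, the lock relies on the SETNX command, which sets a key only if it does not already exist. This is a simple implementation: it supports a timeout, but the same thread cannot re-enter. For the code, see https://github.com/abelaska/jedis-lock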

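Since Redis 2.6.0, Lua scripting is built in, and scripts can be run with EVAL and EVALSHA. A distributed lock implemented with a Lua script supports thread reentrancy and timeouts much better.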

Acquiring the lock

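This is effectively a spinlock: it keeps trying to acquire the lock until the acquisition timeout expires. In Redis the lock is a hash. The hash name is the resource ID, the expiry of the hash is the lock timeout, the field in the hash is uuid + thread ID, and its value is the reentry count of the current thread.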

    // Note: index of key&argv starts from 1
    private static final String COMMAND_LOCK =
            "if (redis.call('exists', KEYS[1]) == 0) then " +
                "redis.call('hset', KEYS[1], ARGV[1], 1); " +
                "redis.call('pexpire', KEYS[1], ARGV[2]); " +
                "return 1; " +
            "end; " +
            "if (redis.call('hexists', KEYS[1], ARGV[1]) == 1) then " +
                "local counter = redis.call('hincrby', KEYS[1], ARGV[1], 1); " +
                "redis.call('pexpire', KEYS[1], ARGV[2]); " +
                "return counter; " +
            "end; " +
            "return nil;";

    public boolean acquire() {
        int timeout = acquiryTimeoutInMillis;
        while (timeout >= 0) {
            Object result = client.eval(COMMAND_LOCK, 1, lockKeyPath, getId(), lockExpiryInMillis + "");
            if (result == null) {
                timeout -= DEFAULT_ACQUIRY_RESOLUTION_MILLIS;
                try {
                    Thread.sleep(DEFAULT_ACQUIRY_RESOLUTION_MILLIS);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop spinning
                    Thread.currentThread().interrupt();
                    return false;
                }
            } else {
                this.counter = (Long)result;
                return true;
            }

        }
        return false;
    }

Releasing the lock

Releasing the lock decrements the thread's reentry count; the lock is deleted only when the count reaches 0.

    private static final String COMMAND_UNLOCK =
            "if (redis.call('hexists', KEYS[1], ARGV[1]) == 0) then " +
                "return nil;" +
            "end; " +
            "local counter = redis.call('hincrby', KEYS[1], ARGV[1], -1); " +
            "if (counter > 0) then " +
                "redis.call('pexpire', KEYS[1], ARGV[2]); " +
                "return counter; " +
            "else " +
                "redis.call('del', KEYS[1]); " +
                "return 0; "+
            "end; " +
            "return nil;";

    public void release() {
        Object result = client.eval(COMMAND_UNLOCK, 1, lockKeyPath, getId(), lockExpiryInMillis + "");
        if (result == null) {
            this.counter = 0;
        } else {
            this.counter = (Long)result;
        }
    }

For the full code, see https://github.com/MiltonLai/jedis-lock

Distributed lock with Zookeeper

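This uses Zookeeper's Watcher mechanism. The nodes are created with type EPHEMERAL_SEQUENTIAL: such a node is deleted automatically once its client is no longer valid, and nodes created with the same name are distinguished by an increasing numeric suffix. In effect two sequences are maintained: in Zookeeper, a sequence of nodes with the same name and increasing suffixes; locally, a sequence of threads that use a synchronized lock around getChildren. Each local thread creates a sequenced node in Zookeeper and waits for the resource lock to be released. Once it holds the resource lock, it checks whether its own node is the top one. If not, it releases the resource lock and keeps waiting. If so, it holds the business lock; after the business logic finishes it must call unlock to release the business lock, which triggers the watcher event. If the thread holding the business lock exits midway without calling unlock, Zookeeper deletes the corresponding node once it detects that the client has gone, which also triggers the watcher event.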

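import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class DistributedLock {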
    private final ZooKeeper zk;
    private final String lockBasePath;
    private final String lockName;
    private String lockPath;

    public DistributedLock(ZooKeeper zk, String lockBasePath, String lockName) {
        this.zk = zk;
        this.lockBasePath = lockBasePath;
        this.lockName = lockName;
    }

    public void lock() throws IOException {
        try {
            // lockPath will be different from (lockBasePath + "/" + lockName) because of the sequence number ZooKeeper appends
            lockPath = zk.create(lockBasePath + "/" + lockName, null, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            final Object lock = new Object();
            // The monitor is private to this call. Holding it from getChildren() until wait() ensures the watcher's notifyAll() cannot be missed.
            synchronized(lock) {
                while(true) {
                    List<String> nodes = zk.getChildren(lockBasePath, new Watcher() {
                        @Override
                        public void process(WatchedEvent event) {
                            synchronized (lock) {
                                // When the brother nodes are changed, all waiting threads will be notified.
                                lock.notifyAll();
                            }
                        }
                    });
                    Collections.sort(nodes); // ZooKeeper node names can be sorted lexicographically
                    if (lockPath.endsWith(nodes.get(0))) {
                        return;
                    } else {
                        // Release the monitor and wait for the next notification; once woken, go through the while loop again
                        lock.wait();
                    }
                }
            }
        } catch (KeeperException e) {
            throw new IOException (e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before reporting the failure
            Thread.currentThread().interrupt();
            throw new IOException (e);
        }
    }

    public void unlock() throws IOException {
        try {
            // This will trigger the Watcher.process()
            zk.delete(lockPath, -1);
            lockPath = null;
        } catch (KeeperException e) {
            throw new IOException (e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before reporting the failure
            Thread.currentThread().interrupt();
            throw new IOException (e);
        }
    }
}
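The ordering check at the heart of lock() can be isolated as a pure function over the child node names. The sketch below also returns the node immediately ahead of our own. That part is an extension and not what the code above does: watching only the predecessor instead of the whole parent is a common variant that avoids waking every waiter on each change. The class name LockOrder and the sample node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LockOrder {
    // Returns null if ownNode sorts first (this client holds the lock),
    // otherwise the name of the node immediately before it.
    static String predecessor(List<String> children, String ownNode) {
        // Copy before sorting so the caller's list is left untouched
        List<String> sorted = new ArrayList<>(children);
        // Sequence suffixes are zero-padded, so lexicographic order matches creation order
        Collections.sort(sorted);
        int index = sorted.indexOf(ownNode);
        if (index < 0) {
            throw new IllegalArgumentException("own node not found: " + ownNode);
        }
        return index == 0 ? null : sorted.get(index - 1);
    }
}
```

For children lock-0000000003, lock-0000000001 and lock-0000000002, the client that owns lock-0000000001 gets null and holds the lock, while the owner of lock-0000000003 would wait on lock-0000000002.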

Queue with Jedis

Messages are produced and consumed with the RPUSH and BLPOP commands on the Redis LIST type.

public long rpush(final String... value) {
    if (value == null) return -1;
    return (Long) execute((Jedis jedis) -> jedis.rpush(getId(), value));
}

public long rpushObject(final Object value) {
    if (value == null) return -1;
    return (Long) execute((Jedis jedis) -> jedis.rpush(getId().getBytes(), SerializeUtil.serialize(value)));
}

public long rpushObject(final Object... value) {
    if (value == null || value.length == 0) return -1;
    return (Long) execute((Jedis jedis) -> jedis.rpush(getId().getBytes(), SerializeUtil.serialize(value)));
}

public List<String> blpop(int timeout) {
    return (List<String>) execute((Jedis jedis)-> jedis.blpop(timeout, getId()));
}

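public List<Object> blpopObject(int timeout) {
    return (List<Object>)execute((Jedis jedis) -> {
        List<Object> objects = new ArrayList<>();
        List<byte[]> bytesList = jedis.blpop(timeout, getId().getBytes());
        // On timeout BLPOP yields no [key, value] pair (null or empty, depending on the Jedis version)
        if (bytesList == null || bytesList.size() < 2) {
            return objects;
        }
        // The first element is the list key, not a serialized object
        objects.add(new String(bytesList.get(0)));
        objects.add(SerializeUtil.unserialize(bytesList.get(1)));
        return objects;
    });
}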

Using the queue in business code

@Override
public long lRpush(String id, String value) {
    return factory.getList(id).rpush(value);
}

@Override
public long lRpushObject(String id, Object value) {
    return factory.getList(id).rpushObject(value);
}

@Override
public List<String> lBlpop(String id, int timeout) {
    return factory.getList(id).blpop(timeout);
}

@Override
public List<Object> lBlpopObject(String id, int timeout) {
    return factory.getList(id).blpopObject(timeout);
}


/*
 * =========================================
 */

@Override
public long pushToQueue(int type, String id) {
    QueueItemDTO item = new QueueItemDTO(type, id);
    String value = JacksonUtils.compressObject(item);
    if (redisService.sIsMember(REDIS_SET_TRANS, value)) {
        logger.info("Item:{} exists in queue, skip.", value);
        return 0;
    }
    redisService.sAdd(REDIS_SET_TRANS, value);
    long size = redisService.lRpush(REDIS_QUEUE_TRANS, value);
    logger.info("Request:{} pushed to queue. size:{}", value, size);
    return size;
}

@Override
public QueueItemDTO readQueue() {
    List<String> list = redisService.lBlpop(REDIS_QUEUE_TRANS, 5);
    if (list != null && list.size() > 1) {
        logger.info("Queue:{}, pop:{}", list.get(0), list.get(1));
        redisService.sRemove(REDIS_SET_TRANS, list.get(1));
        return JacksonUtils.extractObject(list.get(1), QueueItemDTO.class);
    } else {
        return null;
    }
}
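In pushToQueue the membership check and the add are two separate calls, so two concurrent callers could both pass the check and enqueue the same item. Redis SADD returns the number of elements actually added, so the check and the insert can be a single step if the wrapper exposes that return value (whether redisService.sAdd does is not shown here). The sketch below is an in-memory model of that idea, with a Set standing in for the Redis set and a BlockingQueue for the list; the class DedupQueue is hypothetical.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class DedupQueue {
    // Plays the role of the Redis set used for deduplication
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    // Plays the role of the Redis list used as the queue
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Returns the queue size after the push, or 0 if the item was already queued
    long push(String value) {
        // add() reports whether the value was new, so check and insert are one atomic step
        if (!pending.add(value)) {
            return 0;
        }
        queue.add(value);
        return queue.size();
    }

    // Waits up to timeoutMillis for an item; returns null on timeout
    String pop(long timeoutMillis) {
        try {
            String value = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (value != null) {
                // Once consumed, the same item may be queued again
                pending.remove(value);
            }
            return value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```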

posted on 2018-10-18 10:20 Milton