High-Performance Queue Disruptor Series 3 -- Getting Started with the Disruptor (Translation)
Basic Usage
Let's walk through a simple example of the Disruptor: a producer sends long values as messages, and a consumer receives them and prints them out.
First, we define the Event that will carry the data:
public class LongEvent
{
private long value;
public void set(long value)
{
this.value = value;
}
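// Without toString(), printing the event would show only the class name and hash code
@Override
public String toString()
{
return "LongEvent{value=" + value + "}";
}
}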
To let the Disruptor preallocate these events for us, we need an EventFactory that performs the construction:
import com.lmax.disruptor.EventFactory;
public class LongEventFactory implements EventFactory<LongEvent>
{
public LongEvent newInstance()
{
return new LongEvent();
}
}
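To make "preallocation" concrete, here is a stdlib-only sketch of the idea, not the Disruptor's actual RingBuffer. The names `PreallocatedRing` and `MutableLong` are hypothetical, and `java.util.function.Supplier` plays the role that `EventFactory.newInstance()` plays above: every slot is created once, and later publications only mutate those objects.

```java
import java.util.function.Supplier;

// Hypothetical sketch: all slots are allocated up front by the factory,
// and sequences that run past the end reuse the same objects.
final class PreallocatedRing<E> {
    private final Object[] slots;

    PreallocatedRing(Supplier<E> factory, int size) {
        slots = new Object[size];
        for (int i = 0; i < size; i++) {
            slots[i] = factory.get(); // the only allocations this ring ever makes
        }
    }

    // Sequences grow forever; once they pass the end, slots are reused.
    @SuppressWarnings("unchecked")
    E get(long sequence) {
        return (E) slots[(int) (sequence % slots.length)];
    }
}

// Stand-in for LongEvent so the sketch compiles on its own.
final class MutableLong {
    long value;
}
```

With a size of 4, sequence 0 and sequence 4 resolve to the very same object, which is why a producer fills in fields rather than creating new events.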
With the event defined, we need a consumer to handle these events. In our case, all it does is print the received value to the console:
import com.lmax.disruptor.EventHandler;
public class LongEventHandler implements EventHandler<LongEvent>
{
public void onEvent(LongEvent event, long sequence, boolean endOfBatch)
{
System.out.println("Event: " + event);
}
}
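The `endOfBatch` parameter of `onEvent` is easier to understand with a model of the loop that calls it. The sketch below is a simplified, assumption-based model of a consumer loop (roughly the idea behind the Disruptor's batch event processor, not its code): it delivers every event from the consumer's next sequence up to the highest available one and flags the last. `SketchHandler` and `BatchDispatch` are hypothetical names, and a `List` stands in for the ring buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in with the same parameter shape as EventHandler.onEvent.
interface SketchHandler<E> {
    void onEvent(E event, long sequence, boolean endOfBatch);
}

final class BatchDispatch {
    // Delivers events nextSequence..availableSequence (inclusive) and marks the
    // last one, so a handler could, e.g., flush buffered output once per batch.
    static <E> long dispatch(List<E> events, long nextSequence, long availableSequence,
                             SketchHandler<E> handler) {
        for (long seq = nextSequence; seq <= availableSequence; seq++) {
            handler.onEvent(events.get((int) seq), seq, seq == availableSequence);
        }
        return availableSequence + 1; // the next sequence this consumer will wait for
    }
}
```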
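We also need a source for these events. For the sake of the example, assume the data comes from some I/O device, such as the network or a file, in the form of a ByteBuffer.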
import com.lmax.disruptor.RingBuffer;
import java.nio.ByteBuffer;
public class LongEventProducer
{
private final RingBuffer<LongEvent> ringBuffer;
public LongEventProducer(RingBuffer<LongEvent> ringBuffer)
{
this.ringBuffer = ringBuffer;
}
public void onData(ByteBuffer bb)
{
long sequence = ringBuffer.next(); // Grab the next sequence
try
{
LongEvent event = ringBuffer.get(sequence); // Get the entry in the Disruptor
// for the sequence
event.set(bb.getLong(0)); // Fill with data
}
finally
{
ringBuffer.publish(sequence);
}
}
}
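As you can see, publishing an event is more involved than with a simple queue. This is a consequence of event preallocation: publication is a two-phase operation, where you first claim a slot in the ring buffer and then publish the available data, and the publish step must be wrapped in a try/finally block. If we claim a slot in the RingBuffer (by calling RingBuffer.next()), then we must publish that sequence. Failing to do so can corrupt the state of the Disruptor; specifically, in the multi-producer case it will cause the consumers to stall, and they cannot recover without a restart.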
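The claim/publish split can be sketched without the library. The `TwoPhaseRing` class below is hypothetical and models only the single-producer case: `next()` claims a sequence, `publish()` advances the cursor consumers read, and `onData` follows the same try/finally shape as `LongEventProducer`. It deliberately omits the wrap-around check a real ring buffer needs. In the multi-producer case the Disruptor tracks availability per slot, so an unpublished slot blocks consumers indefinitely, which this single-cursor model does not show.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-producer model of two-phase publication.
// No wrap-around/capacity check: illustration only.
final class TwoPhaseRing {
    private final long[] slots;
    private long nextToClaim = 0;                          // touched only by the producer
    private final AtomicLong cursor = new AtomicLong(-1);  // highest published sequence

    TwoPhaseRing(int size) { slots = new long[size]; }

    long next() { return nextToClaim++; }                  // phase 1: claim a slot

    void set(long sequence, long value) { slots[(int) (sequence % slots.length)] = value; }

    void publish(long sequence) { cursor.set(sequence); }  // phase 2: make it visible

    long cursor() { return cursor.get(); }

    // Same shape as LongEventProducer.onData: publish even if filling the slot fails.
    void onData(long value) {
        long sequence = next();
        try {
            set(sequence, value);
        } finally {
            publish(sequence);
        }
    }
}
```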
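Using the Version 3.0 Translators
Version 3.0 of the Disruptor added a richer, lambda-style API that helps developers by encapsulating this complexity inside the RingBuffer. As a result, from 3.0 onward the preferred way to publish events is through the event-translator part of the API, for example: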
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.EventTranslatorOneArg;
import java.nio.ByteBuffer;
public class LongEventProducerWithTranslator
{
private final RingBuffer<LongEvent> ringBuffer;
public LongEventProducerWithTranslator(RingBuffer<LongEvent> ringBuffer)
{
this.ringBuffer = ringBuffer;
}
private static final EventTranslatorOneArg<LongEvent, ByteBuffer> TRANSLATOR =
new EventTranslatorOneArg<LongEvent, ByteBuffer>()
{
public void translateTo(LongEvent event, long sequence, ByteBuffer bb)
{
event.set(bb.getLong(0));
}
};
public void onData(ByteBuffer bb)
{
ringBuffer.publishEvent(TRANSLATOR, bb);
}
}
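Another advantage of this approach is that the translator code can be pulled out into a separate class and easily unit tested on its own. The Disruptor provides several interfaces (EventTranslator, EventTranslatorOneArg, EventTranslatorTwoArg, etc.) that can be implemented to supply a translator. The reason is that the arguments are passed through the call on the RingBuffer to the translator, so a translator can be written as a static class or as a non-capturing lambda.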
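As an illustration of testing a translator in isolation, here is a sketch that needs no Disruptor classes at all. `LongEventTranslator` is a hypothetical name, and the `LongEvent` below is a minimal copy of the article's class with an added `get()` accessor (an assumption, used only for the assertion). The static method has the same parameter shape as `EventTranslatorOneArg.translateTo(event, sequence, arg0)`.

```java
import java.nio.ByteBuffer;

// Minimal copy of the article's LongEvent; get() is added for testing.
final class LongEvent {
    private long value;
    void set(long value) { this.value = value; }
    long get() { return value; }
}

// Hypothetical translator pulled into its own class so it can be unit tested
// by calling it directly on a fresh event, without a RingBuffer.
final class LongEventTranslator {
    static void translateTo(LongEvent event, long sequence, ByteBuffer bb) {
        event.set(bb.getLong(0)); // read the long at absolute index 0
    }
}
```

With the Disruptor on the classpath, the same method could be passed as `ringBuffer.publishEvent(LongEventTranslator::translateTo, bb)`, mirroring the method-reference example later in this article.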
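The final step is to wire everything together. All of the components can be wired up manually, but that is somewhat complicated, so a DSL is provided to simplify construction. Some of the more complex options are not available through the DSL, but it is suitable for most situations.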
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.RingBuffer;
import java.nio.ByteBuffer;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
public class LongEventMain
{
public static void main(String[] args) throws Exception
{
// Executor that will be used to construct new threads for consumers
Executor executor = Executors.newCachedThreadPool();
// Specify the size of the ring buffer, must be power of 2.
int bufferSize = 1024;
// Construct the Disruptor
Disruptor<LongEvent> disruptor = new Disruptor<>(LongEvent::new, bufferSize, executor);
// Connect the handler
disruptor.handleEventsWith((event, sequence, endOfBatch) -> System.out.println("Event: " + event));
// Start the Disruptor, starts all threads running
disruptor.start();
// Get the ring buffer from the Disruptor to be used for publishing.
RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();
ByteBuffer bb = ByteBuffer.allocate(8);
for (long l = 0; true; l++)
{
bb.putLong(0, l);
ringBuffer.publishEvent((event, sequence, buffer) -> event.set(buffer.getLong(0)), bb);
Thread.sleep(1000);
}
}
}
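The comment in `main` says the ring buffer size must be a power of 2. A likely reason, sketched below with hypothetical names, is that a sequence can then be mapped to a slot with a single bitwise AND instead of a division, because for power-of-two sizes `sequence & (size - 1)` equals `sequence % size` for non-negative sequences. The validation mirrors the requirement stated in the article; the exact exception the Disruptor itself throws is not shown here.

```java
// Hypothetical helper showing why power-of-two sizes are convenient.
final class RingIndex {
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    static int indexOf(long sequence, int bufferSize) {
        if (!isPowerOfTwo(bufferSize)) {
            throw new IllegalArgumentException("bufferSize must be a power of 2");
        }
        // Same result as sequence % bufferSize for non-negative sequences, but cheaper.
        return (int) (sequence & (bufferSize - 1));
    }
}
```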
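Note that several of the classes (e.g. the handler and the translator) are no longer needed. Also note that the lambda passed to publishEvent() refers only to the parameters that are passed in. If we instead wrote the code as: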
ByteBuffer bb = ByteBuffer.allocate(8);
for (long l = 0; true; l++)
{
bb.putLong(0, l);
ringBuffer.publishEvent((event, sequence) -> event.set(bb.getLong(0)));
Thread.sleep(1000);
}
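This would create a capturing lambda, which means an object has to be instantiated to hold the ByteBuffer bb variable when the lambda is passed to publishEvent(). This creates additional (unnecessary) garbage, so if low GC pressure is a requirement, the form that passes the argument through to the lambda should be preferred.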
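The difference between the two lambda forms can be observed with plain JDK functional interfaces. In the sketch below (hypothetical class name `LambdaCapture`), the non-capturing lambda only uses its parameter, while the capturing one closes over `bb`. On OpenJDK/HotSpot a non-capturing lambda expression evaluates to the same cached instance every time, whereas a capturing one allocates a new object per evaluation; this is an implementation detail of the JVM, not a guarantee of the Java Language Specification.

```java
import java.nio.ByteBuffer;
import java.util.function.LongSupplier;
import java.util.function.ToLongFunction;

final class LambdaCapture {
    // Non-capturing: like publishEvent((event, sequence, buffer) -> ..., bb).
    static ToLongFunction<ByteBuffer> nonCapturing() {
        return buffer -> buffer.getLong(0);
    }

    // Capturing: like publishEvent((event, sequence) -> ... bb ...);
    // each evaluation must allocate an object that holds 'bb'.
    static LongSupplier capturing(ByteBuffer bb) {
        return () -> bb.getLong(0);
    }
}
```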
Since method references can be used in place of lambdas, the example can also be written in this fashion:
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.RingBuffer;
import java.nio.ByteBuffer;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
public class LongEventMain
{
public static void handleEvent(LongEvent event, long sequence, boolean endOfBatch)
{
System.out.println(event);
}
public static void translate(LongEvent event, long sequence, ByteBuffer buffer)
{
event.set(buffer.getLong(0));
}
public static void main(String[] args) throws Exception
{
// Executor that will be used to construct new threads for consumers
Executor executor = Executors.newCachedThreadPool();
// Specify the size of the ring buffer, must be power of 2.
int bufferSize = 1024;
// Construct the Disruptor
Disruptor<LongEvent> disruptor = new Disruptor<>(LongEvent::new, bufferSize, executor);
// Connect the handler
disruptor.handleEventsWith(LongEventMain::handleEvent);
// Start the Disruptor, starts all threads running
disruptor.start();
// Get the ring buffer from the Disruptor to be used for publishing.
RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();
ByteBuffer bb = ByteBuffer.allocate(8);
for (long l = 0; true; l++)
{
bb.putLong(0, l);
ringBuffer.publishEvent(LongEventMain::translate, bb);
Thread.sleep(1000);
}
}
}
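Basic Tuning Options
The approach above works in the widest range of deployment scenarios. However, if you can make certain assumptions about the hardware and software environment the Disruptor will run in, you can take advantage of a number of tuning options to improve performance. There are two main options for tuning: single vs. multiple producers, and alternative wait strategies.
Single vs. Multiple Producers
One of the best ways to improve performance in concurrent systems is to adhere to the Single Writer Principle, and this applies to the Disruptor too. If you are in a situation where only a single thread will ever produce events into the Disruptor, you can take advantage of that to gain additional performance.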
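import com.lmax.disruptor.BlockingWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
public class LongEventMain
{
public static void main(String[] args) throws Exception
{
//.....
// Construct the Disruptor with a SingleProducerSequencer
// (bufferSize and executor are set up as in the earlier example)
Disruptor<LongEvent> disruptor = new Disruptor<>(
new LongEventFactory(), bufferSize, executor, ProducerType.SINGLE, new BlockingWaitStrategy());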
//.....
}
}
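One way to see why a single writer helps is to compare how a sequence is claimed. The classes below are hypothetical and show only the claim step: with one producer thread a plain field increment is enough, while multiple producers need an atomic read-modify-write so they never receive the same sequence. The real single- and multi-producer sequencers also handle wrap-around and slot availability, which is omitted here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Only ever called from one producer thread: no atomic instruction needed.
final class SingleWriterClaim {
    private long next = -1;
    long claim() { return ++next; }
}

// Safe for many producer threads, at the cost of an atomic operation per claim.
final class MultiWriterClaim {
    private final AtomicLong next = new AtomicLong(-1);
    long claim() { return next.incrementAndGet(); }
}
```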
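To give an idea of how much performance this can gain, we can change the producer type in the OneToOne performance test. The tests were run on an i7 Sandy Bridge MacBook Air.
Multiple producer: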
Run 0, Disruptor=26,553,372 ops/sec
Run 1, Disruptor=28,727,377 ops/sec
Run 2, Disruptor=29,806,259 ops/sec
Run 3, Disruptor=29,717,682 ops/sec
Run 4, Disruptor=28,818,443 ops/sec
Run 5, Disruptor=29,103,608 ops/sec
Run 6, Disruptor=29,239,766 ops/sec
Single producer:
Run 0, Disruptor=89,365,504 ops/sec
Run 1, Disruptor=77,579,519 ops/sec
Run 2, Disruptor=78,678,206 ops/sec
Run 3, Disruptor=80,840,743 ops/sec
Run 4, Disruptor=81,037,277 ops/sec
Run 5, Disruptor=81,168,831 ops/sec
Run 6, Disruptor=81,699,346 ops/sec
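Averaging the seven runs in each list gives a rough summary of the numbers above: about 28.85 million ops/sec with multiple producers versus about 81.48 million ops/sec with a single producer, i.e. roughly 2.8x on this one machine. The class name `OneToOneSummary` is hypothetical; the figures are copied from the runs listed above, and seven runs on one laptop are only a rough indication.

```java
import java.util.Arrays;

// Mean throughput of the reported runs and the resulting single/multi ratio.
final class OneToOneSummary {
    static final long[] MULTI = {26_553_372, 28_727_377, 29_806_259, 29_717_682,
                                 28_818_443, 29_103_608, 29_239_766};
    static final long[] SINGLE = {89_365_504, 77_579_519, 78_678_206, 80_840_743,
                                  81_037_277, 81_168_831, 81_699_346};

    static double mean(long[] runs) {
        return Arrays.stream(runs).average().getAsDouble();
    }

    static double speedup() {
        return mean(SINGLE) / mean(MULTI);
    }
}
```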
Alternative Wait Strategies
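The default wait strategy used by the Disruptor is BlockingWaitStrategy. Internally, BlockingWaitStrategy uses a typical lock and condition variable to handle thread wake-up. It is the slowest of the available wait strategies, but it is the most conservative with respect to CPU usage and gives the most consistent behavior across the widest variety of deployment options.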
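The lock-and-condition idea can be sketched with `java.util.concurrent.locks`. `BlockingCursor` is a hypothetical class, not the Disruptor's implementation: consumers sleep on a `Condition` until the cursor reaches the sequence they need, and the producer pays for taking the lock and signalling on every publish. `awaitUninterruptibly()` is used only to keep the sketch free of checked exceptions.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a lock + condition wait, in the spirit of BlockingWaitStrategy.
final class BlockingCursor {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition published = lock.newCondition();
    private long cursor = -1; // guarded by lock

    void publish(long sequence) {
        lock.lock();
        try {
            cursor = sequence;
            published.signalAll(); // producer-side cost: lock + signal on every publish
        } finally {
            lock.unlock();
        }
    }

    long waitFor(long sequence) {
        lock.lock();
        try {
            while (cursor < sequence) {
                published.awaitUninterruptibly(); // releases the lock and sleeps, no spinning
            }
            return cursor;
        } finally {
            lock.unlock();
        }
    }
}
```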
SleepingWaitStrategy
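Like BlockingWaitStrategy, SleepingWaitStrategy tries to be conservative with CPU usage; it does so with a simple busy-wait loop that calls LockSupport.parkNanos(1) in the middle of the loop. On a typical Linux system this pauses the thread for around 60µs. The benefit is that the producing thread does not need to do anything beyond incrementing the appropriate counter, and it does not pay the cost of signalling a condition variable. The drawback is that the mean latency of moving an event between the producer and consumer threads is higher. It works best where low latency is not required but a low impact on the producing thread is desired; a common use case is asynchronous logging.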
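A spin, then yield, then park back-off loop captures the spirit of this strategy. The sketch below is hypothetical (`SleepingWait`, and the retry budget of 200 split into 100 spins and 100 yields, are illustrative assumptions): the consumer polls an `AtomicLong` cursor, and the producer only has to write that cursor, with no lock and no signal.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical spin -> yield -> park loop in the spirit of SleepingWaitStrategy.
final class SleepingWait {
    static long waitFor(AtomicLong cursor, long sequence) {
        int counter = 200; // assumed retry budget, for illustration only
        long available;
        while ((available = cursor.get()) < sequence) {
            if (counter > 100) {
                counter--;                 // pure busy spin
            } else if (counter > 0) {
                counter--;
                Thread.yield();            // let other runnable threads in
            } else {
                LockSupport.parkNanos(1L); // the article cites ~60µs per park on typical Linux
            }
        }
        return available;
    }
}
```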
YieldingWaitStrategy
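YieldingWaitStrategy is one of two wait strategies suitable for low-latency systems, where you can afford to burn CPU cycles to improve latency. It busy-spins, waiting for the sequence to increase to the appropriate value, and calls Thread.yield() inside the loop body to let other queued threads run. This is the recommended wait strategy when you need very high performance and the number of event handler threads is less than the total number of logical cores, e.g. when hyper-threading is enabled.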
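A minimal sketch of the yielding loop, assuming a hypothetical `YieldingWait` class and an illustrative 100 pure spins before it starts yielding: the thread never sleeps, so it stays responsive, but it keeps a core busy while waiting.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical busy-spin-then-yield loop in the spirit of YieldingWaitStrategy.
final class YieldingWait {
    static long waitFor(AtomicLong cursor, long sequence) {
        int spinTries = 100; // assumed number of pure spins before yielding
        long available;
        while ((available = cursor.get()) < sequence) {
            if (spinTries > 0) {
                spinTries--;      // spin without giving up the CPU
            } else {
                Thread.yield();   // still polling, but let other queued threads run
            }
        }
        return available;
    }
}
```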
BusySpinWaitStrategy
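BusySpinWaitStrategy is the highest-performing wait strategy, but it places the strictest constraints on the deployment environment. It should only be used if the number of event handler threads is smaller than the number of physical cores on the machine; for example, hyper-threading should be disabled.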
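The busy-spin loop is the simplest of all, which is also why it needs a dedicated physical core: the waiting thread never yields or parks. The sketch below uses a hypothetical `BusySpinWait` class and `Thread.onSpinWait()` (Java 9+), a CPU hint such as the x86 PAUSE instruction that does not release the core.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical pure busy-spin loop in the spirit of BusySpinWaitStrategy.
final class BusySpinWait {
    static long waitFor(AtomicLong cursor, long sequence) {
        long available;
        while ((available = cursor.get()) < sequence) {
            Thread.onSpinWait(); // hint to the CPU; the thread keeps the core
        }
        return available;
    }
}
```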
Clearing Objects From the Ring Buffer
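When passing data through the Disruptor, objects may live longer than intended. To avoid this, you may need to clear out the event after processing it. If you have a single event handler, clearing the value in that same handler is sufficient. If you have a chain of event handlers, you may need a dedicated handler at the end of the chain to clear out the object.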
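import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.dsl.Disruptor;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
class ObjectEvent<T>
{
T val;
void clear()
{
val = null;
}
}
public class ClearingEventHandler<T> implements EventHandler<ObjectEvent<T>>
{
public void onEvent(ObjectEvent<T> event, long sequence, boolean endOfBatch)
{
// Failing to call clear here will cause the
// object associated with the event to live until
// it is overwritten once the ring buffer has wrapped
// around to the beginning.
event.clear();
}
}
// main() belongs in your application's main class
public static void main(String[] args)
{
int bufferSize = 1024;
Executor executor = Executors.newCachedThreadPool();
Disruptor<ObjectEvent<String>> disruptor = new Disruptor<>(
() -> new ObjectEvent<String>(), bufferSize, executor);
// ProcessingEventHandler stands for the application's own handler(s)
disruptor
.handleEventsWith(new ProcessingEventHandler())
.then(new ClearingEventHandler<String>());
disruptor.start();
}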
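The effect of the clearing handler can be checked with a stdlib-only model of a handler chain running over one preallocated slot. `Slot` and `ClearingChain` are hypothetical names, and `BiConsumer` stands in for `EventHandler`: without a final clearing stage, the slot keeps the payload reachable after processing; with it, the reference is dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical stand-in for ObjectEvent<T>.
final class Slot<T> {
    T val;
    void clear() { val = null; }
}

final class ClearingChain {
    // Runs each stage over the slot in order, as a chain of handlers would.
    static <T> void process(Slot<T> slot, List<BiConsumer<Slot<T>, Long>> stages, long sequence) {
        for (BiConsumer<Slot<T>, Long> stage : stages) {
            stage.accept(slot, sequence);
        }
    }
}
```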