jeroMq示例之[4] [push-pull] 分布式处理

分布式处理

下面一个示例程序中,我们将使用ZMQ进行超级计算,也就是并行处理模型:

  • 任务分发器ventilator会分发大量可以并行计算的任务;
  • 有一组worker会处理这些任务;
  • 结果收集器sinker会在末端接收所有worker的处理结果,进行汇总。

现实中,worker可能散落在不同的计算机中,利用GPU(图像处理单元)进行复杂计算。下面是任务分发器的代码,它会生成100个任务,任务内容是让收到的worker延迟若干毫秒。

taskvent: Parallel task ventilator in java

package guide;

import java.util.Random;
import org.zeromq.ZMQ;

//
//  Task ventilator in Java
//  Binds PUSH socket to tcp://localhost:5557
//  Sends batch of tasks to workers via that socket
//
public class taskvent {

    public static void main (String[] args) throws Exception {
        ZMQ.Context context = ZMQ.context(1);

        //  Socket to send messages on
        ZMQ.Socket sender = context.socket(ZMQ.PUSH);
        sender.bind("tcp://*:5557");

        //  Socket to send messages on
        ZMQ.Socket sink = context.socket(ZMQ.PUSH);
        sink.connect("tcp://localhost:5558");

        System.out.println("Press Enter when the workers are ready: ");
        System.in.read();
        System.out.println("Sending tasks to workers\n");

        //  The first message is "0" and signals start of batch
        sink.send("0", 0);

        //  Initialize random number generator
        Random srandom = new Random(System.currentTimeMillis());

        //  Send 100 tasks
        int task_nbr;
        int total_msec = 0;     //  Total expected cost in msecs
        for (task_nbr = 0; task_nbr < 100; task_nbr++) {
            int workload;
            //  Random workload from 1 to 100msecs
            workload = srandom.nextInt(100) + 1;
            total_msec += workload;
            System.out.print(workload + ".");
            String string = String.format("%d", workload);
            sender.send(string, 0);
        }
        System.out.println("Total expected cost: " + total_msec + " msec");
        Thread.sleep(1000);              //  Give 0MQ time to deliver

        sink.close();
        sender.close();
        context.term();
    }
}

5

下面是worker的代码,它接受信息并延迟指定的毫秒数,并发送执行完毕的信号:

worker

package guide;

import org.zeromq.ZMQ;

//
//  Task worker in Java
//  Connects PULL socket to tcp://localhost:5557
//  Collects workloads from ventilator via that socket
//  Connects PUSH socket to tcp://localhost:5558
//  Sends results to sink via that socket
//
public class taskwork {

    public static void main (String[] args) throws Exception {
        ZMQ.Context context = ZMQ.context(1);

        //  Socket to receive messages on
        ZMQ.Socket receiver = context.socket(ZMQ.PULL);
        receiver.connect("tcp://localhost:5557");

        //  Socket to send messages to
        ZMQ.Socket sender = context.socket(ZMQ.PUSH);
        sender.connect("tcp://localhost:5558");

        //  Process tasks forever
        while (!Thread.currentThread ().isInterrupted ()) {
            String string = new String(receiver.recv(0), ZMQ.CHARSET).trim();
            long msec = Long.parseLong(string);
            //  Simple progress indicator for the viewer
            System.out.flush();
            System.out.print(string + '.');

            //  Do the work
            Thread.sleep(msec);

            //  Send results to sink
            sender.send(ZMQ.MESSAGE_SEPARATOR, 0);
        }
        sender.close();
        receiver.close();
        context.term();
    }
}

下面是结果收集器的代码。它会收集100个处理结果,并计算总的执行时间,让我们由此判别任务是否是并行计算的。

sink

package guide;

import org.zeromq.ZMQ;

//
//  Task sink in Java
//  Binds PULL socket to tcp://localhost:5558
//  Collects results from workers via that socket
//
public class tasksink {

    public static void main (String[] args) throws Exception {

        //  Prepare our context and socket
        ZMQ.Context context = ZMQ.context(1);
        ZMQ.Socket receiver = context.socket(ZMQ.PULL);
        receiver.bind("tcp://*:5558");

        //  Wait for start of batch
        String string = new String(receiver.recv(0), ZMQ.CHARSET);

        //  Start our clock now
        long tstart = System.currentTimeMillis();

        //  Process 100 confirmations
        int task_nbr;
        int total_msec = 0;     //  Total calculated cost in msecs
        for (task_nbr = 0; task_nbr < 100; task_nbr++) {
            string = new String(receiver.recv(0), ZMQ.CHARSET).trim();
            if ((task_nbr / 10) * 10 == task_nbr) {
                System.out.print(":");
            } else {
                System.out.print(".");
            }
        }
        //  Calculate and report duration of batch
        long tend = System.currentTimeMillis();

        System.out.println("\nTotal elapsed time: " + (tend - tstart) + " msec");
        receiver.close();
        context.term();
    }
}

一组任务的平均执行时间在5秒左右,以下是分别开始1个、2个、5个worker时的执行结果:

#   1 worker
Total elapsed time: 5311 msec
#   2 workers
Total elapsed time: 2421 msec
#   5 workers
Total elapsed time: 1177 msec

 

关于这段代码的几个细节:

  • worker上游和任务分发器相连,下端和结果收集器相连,这就意味着你可以开启任意多个worker。但若worker是绑定至端点的,而非连接至端点,那我们就需要准备更多的端点,并配置任务分发器和结果收集器。所以说,任务分发器和结果收集器是这个网络结构中较为稳定的部分,因此应该由它们绑定至端点,而非worker,因为它们较为动态。

  • 我们需要做一些同步的工作,ventilator中的等所有worker启动后按enter开始的逻辑,等待worker全部启动之后再分发任务。这点在ZMQ中很重要,且不易解决。zmq非常快,而连接套接字的动作会耗费一定的时间,因此当第一个worker连接成功时,它会一下收到很多任务。所以说,如果我们不进行同步,那这些任务根本就不会被并行地执行(全都给了第一个worker了)。你可以自己试验一下。

  • 任务分发器使用PUSH套接字向worker均匀地分发任务(假设所有的worker都已经连接上了),这种机制称为负载均衡,以后我们会见得更多。

  • 结果收集器的PULL套接字会均匀地从worker处收集消息,这种机制称为公平队列

6

管道模式也会出现慢连接的情况,让人误以为PUSH套接字没有进行负载均衡。如果你的程序中某个worker接收到了更多的请求,那是因为它的PULL套接字连接得比较快,从而在别的worker连接之前获取了额外的消息。

posted @ 2014-09-18 09:25  ScottGu  阅读(1014)  评论(0编辑  收藏  举报