I/O Concurrency

Recall timeline
  Time-lines for CPU, disk, network
  How can we use the system's resources more efficiently?

What we want is *I/O concurrency*
  Ability to overlap I/O wait with other useful work.
  In web server case, I/O wait mostly for net transfer to client.
  Could be disk I/O: compile 1st part of file while fetching 2nd part.
  Could be user interaction: emacs GC while waiting for you to type.

Performance benefits of I/O concurrency can be huge
  Suppose we're waiting for disk for client one: 10 milliseconds
  We can probably serve 100 other clients from cache during that time!
    (if a cache hit takes ~100 microseconds, 100 x 0.1 ms = 10 ms)

Typical ways to get concurrency.
  This is about s/w structure.
  There are any number of potential structures.
  [list these quickly]
  0. (One process)
  1. Multiple processes
  2. One process, many threads
  3. Event-driven
  Depends on O/S facilities and type of application.
    Degree of interaction among different sub-tasks.

One process can be better than you think!
  O/S provides I/O concurrency transparently when it can
  O/S does read-ahead into cache, write-behind from buffer
    works for disk and network connections

I/O Concurrency with multiple processes
  Start a new UNIX process for each client connection / request
  Master process hands out connections.
  Now plenty of work available to keep system busy
  Still simple:
    look at server_2() in handout.
    fork() after accept()
    Preserves original s/w structure.
  Isolated: bug for one client does not crash the whole server
    Most interaction hidden by the O/S, e.g. the kernel locks its own disk queue.
  If > 1 CPU, CPU concurrency as a side effect

We may also want *CPU concurrency*
  Make use of multiple CPUs on shared memory machine.
  Often I/O concurrency tools can be used to get CPU concurrency.
  Of course O/S designer had to work a lot harder...
  CPU concurrency much less important than I/O concurrency: 2x, not 100x
    In general, very hard to program to get good scaling.
    Usually easier to buy two separate computers, which we *will* talk about.

Multiple process problems
  Cost of starting a new process (fork()) may be high.
    New address space &c. 300 microseconds *min* on my computer.
  Processes are fairly isolated by default
    E.g. they do not share memory
    What if you want a web cache? Must be shared among processes.
    Or even just keep statistics?

Concurrency with threads
  Looks a bit like multiple processes
  But thread_fork() leaves address space alone
  So all threads share memory
  One stack per thread, inside process
  [picture: thread boxes inside process boxes]
  Seems simple -- still preserves single-process structure.
  Potentially easier to have e.g. shared web cache
    But programmer needs to know about some kind of locking.
  Also easier for one thread to corrupt another
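
  As a concrete sketch: thread-per-connection with POSIX pthreads standing
  in for the hypothetical thread_fork() (handle() and setup() are from the
  handout later in these notes; server_threads() is a made-up name):

#include <pthread.h>
#include <sys/socket.h>

void handle(int s1);   /* from the handout below */
int setup(void);

void *handle_thread(void *arg)
{
  handle((int)(long)arg);  /* shares the process's memory with all threads */
  return 0;
}

void server_threads(void)
{
  int s = setup();
  while(1){
    int s1 = accept(s, 0, 0);
    pthread_t tid;
    /* like fork() in server_2(), but no new address space */
    pthread_create(&tid, 0, handle_thread, (void *)(long)s1);
    pthread_detach(tid);   /* reclaim thread resources when it exits */
  }
}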

There are some low-level but very important details that are hard to get right.
  What happens when a thread calls read()? Or some other blocking system call?
  Does the whole process block until disk I/O has finished?
  If you don't get this right, you don't get I/O concurrency.

Kernel-supported threads
  O/S kernel knows about each thread
  It knows a thread was just blocked, e.g. in disk read wait
    Can schedule another thread
  [picture: thread boxes dip down into the kernel]
  What does kernel need for this?
    Per-thread kernel stack.
    Per-thread tables (e.g. saved registers).
  Semantics:
    per-process resources: addr space, file descriptors
    per-thread resources: user stack, kernel stack, kernel state
  Kernel can schedule one thread per CPU
  This sounds like just what we want for our server
  BUT kernel threads are usually expensive, just like processes
    Kernel has to help create each thread
    Kernel has to help with each context switch?
      So it knows which thread took a fault...
    lock/unlock must go through kernel, but bad for them to be slow
  Many O/Ses do not provide kernel-supported threads, so they're not portable

User-level threads
  Implemented purely inside program, kernel does not know
  User scheduler for threads inside the program
    In addition to kernel process scheduler
  [picture]
  User-level scheduler must:
    Know when a thread is making a blocking system call.
    Don't actually block, but switch to another thread.
    Know when I/O has completed so it can wake up original thread.
  Answer:
    thread library has fake read(), write(), accept(), &c system calls
    library knows how to *start* syscall operations without waiting
    library marks threads as waiting, switches to a runnable thread
    kernel notifies library of I/O completion and other events
      library marks waiting thread runnable
  read(){
    tell kernel to start read;
    mark thread as waiting for read;
    sched();
  }
  sched(){
    ask kernel for I/O completion events
      mark threads runnable
    find a runnable thread;
    restore registers and return;
  }
  Events we would like from kernel:
    new network connection
    data arrived on socket
    disk read completed
    client/socket ready to receive new data
  Like a miniature O/S inside the process

Problem: user-level threads need significant kernel support
  1. non-blocking system calls
  2. uniform event delivery mechanism

Typical O/S provides only partial support for event notification
   yes: new TCP connections, arriving TCP/pipe/tty data
   no: file-system operation completion

Similarly, not all system call operations can be started w/o waiting
   yes: connect(), socket read(), write()
   no: open(), stat()
   maybe: disk read()
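
  For the "yes" cases, the standard POSIX mechanism is the O_NONBLOCK flag;
  a minimal sketch (presumably roughly what libasync's make_async() does):

#include <fcntl.h>

/* Put fd into non-blocking mode: read()/write() return EWOULDBLOCK and
   connect() returns EINPROGRESS instead of waiting. */
int make_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags < 0)
    return -1;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}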

Why are non-blocking system calls hard in general?
  Typical system call implementation, inside the kernel:
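    (hypothetical sketch, not any particular O/S:)
    sys_read(fd, buf, n){
      figure out which disk block holds fd's current offset;
      start the disk read;
      wait_for_disk();      /* puts this whole process to sleep */
      copy the data into the user's buf;
      return n;
    }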
  Can we just return to user program instead of wait_for_disk?
    No: how will kernel know where to continue?
    i.e. should it run user-space code or continue in the kernel syscall?
  Big problem: keeping state for multi-step operations.

Options:
  Live with only partial support for user-level threads
  New operating system with totally different syscall interface.
    One system call per non-blocking sub-operation.
    So kernel doesn't need to keep state across multiple steps.
    e.g. lookup_one_path_component()
  Microkernel: no system calls, only messages to servers.
    and non-blocking communication
  Helper processes that block for you (Flash paper next week)

Threads are hard to program
  The point is to share data structures in one address space
  Thread *model* involves CPU concurrency even on a single CPU
    so programmer may need to use locks
    even if only goal was to overlap I/O wait
  But *events* usually occur one at a time
    could do CPU processing sequentially, overlap only the I/O waiting

Event-driven programming
  Suggested by user threads implementation
  Organize the s/w around arrival of events
  Write s/w in state-machine style
    When this event occurs, execute this function
  Library support to register interest in events
  The point: this preserves the serial nature of the events
    Programmer sees events/functions occurring one at a time

Recall simple web-like server?
/* s1 is a TCP socket from a client */
handle(int s1)
{
  char request[1024], buf[8192];
  int i = 0, n, fd;
  char c;

  /* Read the request (a file name) from the client. */
  while(read(s1, &c, 1) == 1 && c != '\n' && c != '\r'
        && i < sizeof(request)-1)
    request[i++] = c;
  request[i] = '\0';

  /* Open the file, send the contents to the client. */
  fd = open(request, 0);
  while((n = read(fd, buf, sizeof(buf))) > 0)
    write(s1, buf, n);
  close(fd);
  close(s1);
}

setup()
{
  int s;
  struct sockaddr_in sin;

  /* Allocate a TCP/IP socket. */
  s = socket(AF_INET, SOCK_STREAM, 0);
 
  /* Listen for connections on port 80 (http). */
  bzero(&sin, sizeof(sin));
  sin.sin_family = AF_INET;
  sin.sin_port = htons(80);
  bind(s, (struct sockaddr *) &sin, sizeof(sin));
  listen(s, 128);

  return(s);
}

server_1()
{
  int s, s1, addrlen;
  struct sockaddr_in from;

  /* create a TCP socket that listens for HTTP connections */
  s = setup();

  while(1){
    /* Wait for a new connection from a client. */
    addrlen = sizeof(from);
    s1 = accept(s, (struct sockaddr *) &from, &addrlen);

    /* Perform the client's request. */
    handle(s1);
  }
}

server_2()
{
  int s, s1, addrlen, status;
  struct sockaddr_in from;

  s = setup();

  while(1){
    /* Wait for a new connection from a client. */
    addrlen = sizeof(from);
    s1 = accept(s, (struct sockaddr *) &from, &addrlen);

    /* Create a new child process. */
    if(fork() == 0){
      /* Perform the client's request in the child process. */
      handle(s1);
      exit(0);
    }
    close(s1);

    /* Collect dead children, but don't wait for them. */
    waitpid(-1, &status, WNOHANG);
  }
}

main()
{
  server_1();
  /* server_2(); */
}
  Now we'll look at a client for the server
  Show how to write some asynchronous code
  Build up to libasync, which you have been using (and will keep using) for the labs

int client (char *host, int port, char *filename);

int main (int argc, char *argv[])
{
  char *host, *filename;
  int port, r;

  assert (argc == 4);
  host = argv[1];
  port = atoi (argv[2]);
  filename = argv[3];

  r = client (host, port, filename);
  return r;
}

int client (char *host, int port, char *filename)
{
  int r, s;
  struct sockaddr_in sin;
  char buf[1024];

  /* Setup the socket */
  s = socket (AF_INET, SOCK_STREAM, 0);

  /* Make the connection */
  bzero (&sin, sizeof (sin));
  sin.sin_family = AF_INET;
  sin.sin_port = htons (port);
  inet_aton (host, &sin.sin_addr);

  connect (s, (struct sockaddr *) &sin, sizeof (sin));

  /* Write the filename */
  write (s, filename, strlen (filename));
  write (s, "\n", 1);

  /* Send the bytes that come back to stdout */
  while ((r = read (s, buf, sizeof (buf))) > 0)
    write (1, buf, r);

  /* Finish out */
  close (s);
  return 0;
}

Where does this block?
  connect: makes a tcp connection
  write(s): is remote side willing to take data?
  read(s): has data come back from the remote side?
  write(1): is terminal ready for output?

How to program in event style?
  Identify events and appropriate responses: state machine
    programmer has to know when something might block!
  Write a loop that handles incoming events (I/O events)
  [events.c Example 1]

select()
  Need a way to multiplex sockets
  Program must then interleave operations

  [write prototype on the board: nfds, reads, writes, excepts, time]
  Condition is actually "read() would not block"; might be EOF.
  select() blocks (if timeout > 0) to avoid wasteful polling.
    this is important; you *do* want to block in select().
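
  A minimal sketch of using it (standard POSIX; read_when_ready() is a
  made-up helper; error handling omitted):

#include <sys/select.h>
#include <unistd.h>

/* Block in select() until s is readable, then read once.
   The read() won't block, but it may return 0: EOF counts as readable. */
int read_when_ready(int s, char *buf, int len)
{
  fd_set rfds;
  FD_ZERO(&rfds);
  FD_SET(s, &rfds);
  select(s + 1, &rfds, 0, 0, 0);  /* null timeout: block until ready */
  return read(s, buf, len);
}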

Translate low-level system events into application level events
  Buffer net I/O, maintain individual application state
  Writing this event loop for each program is tedious
  [sketch implementation on the board]
  What if your program does the same thing for many clients in parallel?
    Have to partition up state for each client
    Need to maintain sets of file descriptors
  What if your program does many things?
    e.g. let's add DNS resolution
  Hard to be modular if event loop knows about all activities.
  And knows how to consult all state.

We would prefer abstraction...
  Use a library to provide main loop (e.g. libasync)
  Programmer provides "callbacks" to handle events

// initialize state
while (event = get event) {
  switch (event.type) {
    case readable:
      // decide what read action is appropriate
      read (event.fd);
      // update state
      break;
    case writable:
      // decide what write action is appropriate
      write (event.fd);
      // update state
      break;
   }
 }

//
// Example 2
// Top-level driver loop for an event-driven programming library.
//

list<when, callback> timeouts;
callback fds[...];

// call amain() from main()
amain() {
  while(1){
    select() for fds[] and earliest timeout;
    for each readable fd
      cb = fds[selread][fd];
      cb();
    for each writable fd
      cb = fds[selwrite][fd];
      cb();
    if a timeout has expired
      cb = timeouts.pop();
      cb();
  }
}

// register cb to be called when fd is ready for op (selread or selwrite)
// Set to NULL to clear
fdcb(fd, op, cb) {
  fds[op][fd] = cb;
}

// register cb to be called at specified time
delaycb(when, cb){
  timeouts.push(list(when, cb));
}


Break up code into functions with non-blocking ops
  let the library handle the boring async stuff
  [prototypes in webclient_libasync.c]

It's unfortunately hard for async programs to maintain state
  [draw logical diagram of select loop and function calls]
  Ordinary programs, and threads, use variables.
    Which persist across function calls, and blocking operations.
    Since they are stored on the stack.
  Async programs can't keep state on the stack.
    Since each callback must return immediately.

How can they maintain state across calls?
  Use global variables
  Use the heap:
    Programmers package up state in a struct, malloc struct
    Each callback could take a void * (libevent; see the sketch below)
    (In C++, can do this somewhat implicitly using an object.)
  This turns out to be hard to program
    No type safety
    Must declare a struct for every set of state transferred
    User has to manage memory in potentially tricky cases
  libasync provides a form of closures
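
  For contrast with wrap() below, the malloc-a-struct style just described
  might look like this (hypothetical callback signature, in the style of
  libevent; error handling omitted):

#include <stdlib.h>
#include <unistd.h>

/* All the state that would have lived on the stack, packaged by hand. */
struct fetch_state {
  int s;          /* the socket */
  char *buf;      /* partially-read reply, malloc()ed elsewhere */
  int nread;
};

/* Hypothetical event-library callback: state arrives as a void*,
   so the compiler can't check that arg really is a fetch_state. */
void read_cb(int fd, void *arg)
{
  struct fetch_state *fs = arg;   /* no type safety */
  fs->nread += read(fd, fs->buf + fs->nread, 1024 - fs->nread);
  /* when done: free(fs->buf); free(fs); -- miss one path and you leak */
}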
 
cb = wrap(fn, a, b) generates a closure.
  That is, a function pointer plus an environment of saved values.
      cb() calls fn(a, b)
  Also provides something like function currying.
  useful later on when callbacks do different things based on input
  Given a function with signature "R fn (A, B)":
    cb = wrap (fn) -> callback<R, A, B>::ref
  use it like this:
    cb (a, b)
  Or:
    wrap (fn, a) -> callback<R, B>::ref

Limited compared to Scheme closures:
  You must explicitly indicate what variables to keep around.
  Can only pass a certain number of arguments

How are callbacks implemented?
  See callback.h: one of the few commented files in libasync.
  templates to generate dynamic structs holding values
  templates provide type safety:
    R fn (A, B);
    cb = wrap (fn) -> callback<R, A, B>::ref cb (a, b)
    cb = wrap (fn, a) -> callback<R, B>::ref cb (b);
    cb = wrap (fn, a, b) -> callback<R>::ref cb ();

  callbacks are reference counted to simplify mem mgmt
    normally, arguments in the wrap would have been on stack
    now, values are stored in closures created by wrap().
    How do we know when we've used a callback the last time?
    That's why they're reference counted.
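
A drastically simplified sketch of the idea, for one saved argument only
(hypothetical wrap1/closure1; the real callback.h adds reference counting
and many arities):

// Miniature of wrap(): save fn plus its first argument in a "closure";
// calling the closure supplies the remaining argument.
template<class R, class A, class B>
struct closure1 {
  R (*fn)(A, B);
  A a;                                      // environment saved by wrap
  closure1(R (*f)(A, B), A a0) : fn(f), a(a0) {}
  R operator()(B b) { return fn(a, b); }    // cb(b) calls fn(a, b)
};

template<class R, class A, class B>
closure1<R, A, B> wrap1(R (*fn)(A, B), A a)
{
  return closure1<R, A, B>(fn, a);          // types checked at compile time
}

// int f(int a, int b);
// closure1<int, int, int> cb = wrap1(f, 1);
// cb(2);   // calls f(1, 2)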

What is the result?

void start_connect (char *host, int port, char *filename);
void write_request (int s, char *filename);
void read_data (int s);
void write_data (int s, char *buf, int len);

int
main (int argc, char *argv[])
{
  char *host;
  int port;
  char *filename;
  int r;

  assert (argc == 4);
  host = argv[1];
  port = atoi (argv[2]);
  filename = argv[3];

  make_async (1);

  start_connect (host, port, filename);
  /* start_connect (host2, port2, filename2); */

  amain ();
}

void start_connect (char *host, int port, char *filename)
{
  int r, s;
  struct sockaddr_in sin;

  /* Setup the socket and make it asynchronous! */
  s = socket (AF_INET, SOCK_STREAM, 0);
  make_async (s);

  /* Make the connection; get ready for select */
  bzero (&sin, sizeof (sin));
  sin.sin_family = AF_INET;
  sin.sin_port = htons (port);
  inet_aton (host, &sin.sin_addr);
  connect (s, (struct sockaddr *) &sin, sizeof (sin));
  /* This no longer blocks! */

  fdcb (s, selwrite, wrap (write_request, s, filename));
}

void write_request (int s, char *filename)
{
  write (s, filename, strlen (filename));
  write (s, "\n", 1);

  fdcb (s, selwrite, NULL);
  fdcb (s, selread, wrap (read_data, s));
}

void read_data (int s)
{
  int r;
  /* char buf[1024]; */ /* WRONG! */
  char *buf = (char *) malloc (1024);
  r = read (s, buf, 1024);

  fdcb (s, selread, NULL);
  if (r > 0) {
    fdcb (1, selwrite, wrap (write_data, s, buf, r));
  } else {
    close (s);
    exit (0);
  }
}

void write_data (int s, char *buf, int len)
{
  write (1, buf, len);
  free (buf);
  fdcb (1, selwrite, NULL);
  fdcb (s, selread, wrap (read_data, s));
}

  what's the difference between filename and buf?
    filename points into argv[], which lives for the whole program;
    buf is malloc()ed per read and must be free()d after the write.

This is still somewhat tedious...
  Must handle memory allocation for strings
  Must manually buffer data to and from client
  Have to translate network read/writes into application level events

libasync provides some solutions:
  suio and aios handle raw and line-oriented i/o
  reference counted data (strings and general dynamic structs)
  asynchronous RPC library
  but you still have to do work like splitting your code up into functions
  loops can still be a pain

Event driven programming
  Achieve I/O concurrency for communication efficiently
  Threads give cpu *and* i/o concurrency
    Never quite clear when you'll context switch: cpu+i/o concurrency
  State machine style execution
    Lots of "threads": request handling state machines in parallel
    Single address space: no context switch overhead ==> efficient
    Have kernel notify us of I/O events that we can handle w/o blocking
  The point: this preserves the serial nature of the events
    Programmer sees events/functions occurring one at a time
    Simplifies locking (but when do you still need it?)

libasync handles most of the busywork
  [draw amain/select on board again]
  e.g. write-ability events are usually boring
  libarpc translates to events that the programmer might care about: rpcs

ccfs architecture:
  [draw block diagram on the board:
     OS [app, ccfs] --> blockserver <-- [ccfs, app] OS
                    \-> lockserver  <-/
  ]
  ccfs communicate through RPC: you'll be writing clients and servers
  [include names of RPCs on the little lines]
  real apps can be structured just like this: okws, chord/dhash

Synchronous RPC:
  [Example 1]
  [Sketch this on the board and use it to show evolution]

Making RPCs
  Already saw basic framework in Lab 1
  libarpc provides an rpc compiler: protocol.x -> .C and .h
    Provides (un)marshalling of structs into strings
    External Data Representation, XDR (RFC 1832)
    [Example 2]
  libraries to help:
    handle the network (axprt: asynchronous transport)
    write clients (aclnt),
      aclnt handles all bookkeeping/formatting/etc for us:
      e.g. which cb gets called
    write servers (asrv/svccb)
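
  A hypothetical protocol.x for the block server in the examples below
  (the labs' real file may differ):

/* block_prot.x (hypothetical) */
struct get_args {
  string key<>;
};
struct get_result {
  bool ok;
  string value<>;
};
program BLOCK_PROG {
  version BLOCK_VERS {
    get_result BLOCK_GET(get_args) = 1;
    /* BLOCK_PUT = 2, BLOCK_REMOVE = 3, ... */
  } = 1;
} = 400000;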

Asynchronous RPC: needs a callback!
  [Example 3]
  Note:
  1. Need to split code into separate functions: need to declare prototypes
  2. "return values" passed in by aclnt as arguments: e.g. clnt_stat
  3. cb must keep track of where results will be stored.
  4. Actually must split everything that uses an async function!

How do we translate this into a stub function?
  Need to provide our own callback....
  [Example 4]
  ...translate RPC results/error into something the app can use.

Server side:
  Setup involves listening on a socket, allocating a server with dispatch cb

  [Example 5]
  dispatch (svccb *sbp):
    switch on sbp->proc () to pick the handler;
    call sbp->reply (res);

  You must not block when handling proc ()
    you don't need to reply right away but blocking would be bad

Managing memory with svccb:
  Use getarg<type> to get pointer to argument, svccb managed
  Use getres<type> to get a pointer to a reply struct, svccb managed
  sbp->reply causes the sbp to get deleted.

Writing user-level NFS servers:
  classfsd code will allow you to mount a local NFS server w/o root
  nfsserv_udp handles tedious work, we register a dispatch function
  Similar to generic RPC server but use nfscall *, instead of svccb.
  Adds features like nc->error ()

You'll need to do multiple operations to handle each RPC
  [draw RPC issue timeline os->kernel->ccfs->lockserver/blockserver]
  Not unlike how we might operate:
    get an e-mail from friend: can you make it to my wedding?
    check class calendar on web, check research deadlines
    send IM to wife, research ticket prices, reply
  Or Amazon.com login...
  [Example 6]

An aside on locking:
  Usually no locking etc. needed, e.g. to increment a variable
  When do you need locking?
    When an operation involves multiple stages (see the sketch below)
  Be careful about callbacks that are supposed to happen "later"
    e.g. delaycb (send_grant);
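
  A sketch of the multi-stage case (db->get() as in Example 4 below;
  db->put() is a hypothetical analogue):

// db: the blockdbc from Example 4
void increment ()
{
  db->get ("counter", wrap (got_counter));
}
void got_counter (bool ok, str val)
{
  // Other callbacks ran between the get and here, so val may be stale:
  // two in-flight increments can both read 5 and both write 6.
  char buf[32];
  sprintf (buf, "%d", atoi (val.cstr ()) + 1);
  db->put ("counter", str (buf), wrap (put_done));  // hypothetical put()
}
void put_done (bool ok) {}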

Parallelism and loops
  [Example 7a]: synchronous code
  [Example 7b]: serialized and async
  [Example 7c]: parallelism but yet...
  [Example 7d]: better parallelism?

// Example 1
// Synchronous RPC
//
void fn () {
  get_args a; a.key = ...;
  get_result r;
  clnt_stat stat = call (BLOCK_GET, &a, &r); // blocks
  if (stat) {
    handle_error ();
    return;
  }
  printf ("%s\n", r.value);
  do_something_else ();
}

//
// Example 2
// What call does (conceptual "pseudocode")
//
int serverfd;
int
call (int proc, void *args, void *res) {
  rpc_msg m;
  m.xid = random ();
  m.call.prog = proc;
  m.call.args = args;
  str out = xdr2str (m); // libarpc marshalling code
  write (serverfd, out.cstr (), out.len ());

  char reply[1024]; // Block waiting for reply
  int len = read (serverfd, reply, sizeof (reply));
  rpc_msg r;
  if (str2xdr (r, str (reply, len))) { // unmarshalling
    assert (r.xid == m.xid);
    memcpy (res, r.resp.res, r.resp.len); // copy reply data into caller's res
    return RPC_SUCCESS;
  }
  return RPC_FAILED;
}

//
// Example 3
// Asynchronous RPC
//
ptr<aclnt> c;
void fn () {
  get_args a;
  a.key = key;
  ptr<get_result> r = New refcounted<get_result> ();
  c->call (BLOCK_GET, &a, r, wrap (use_results, key, r));
}

void use_results (str key, ptr<get_result> r, clnt_stat stat) {
  if (stat) {
    handle_error ();
    return;
  }
  printf ("%s\n", r->value);
  do_something_else ();
}

//
// Example 4
// Using asynchronous RPCs in context
//
void
blockdbc::get (str key, callback<void, bool, str>::ref cb)
{
  get_args a;
  a.key = key;
  ptr<get_result> r = New refcounted<get_result> ();
  c->call (BLOCK_GET, &a, r, wrap (this, &blockdbc::get_helper, cb, r));
}
void
blockdbc::get_helper (callback<void, bool, str>::ref cb,
  ptr<get_result> r,
  clnt_stat stat)
{
  if (stat) {
    cb (false, "");
  } else {
    // XXX more or less
    cb (true, r->value);
  }
}

typedef callback<void>::ref cbv;
void fn ()
{
  db = New blockdbc (...);
  cb = wrap (do_something_else);
  db->get (key, wrap (use_results, cb, key));
}
void use_results (cbv cb, str key, bool ok, str data)
{
  assert (ok);
  warn << "key: " << key << "\n";
  warn << "data: " << data << "\n";
  cb ();
}

//
// Example 5
// Simple RPC dispatcher
//
int main ()
{
  int serverfd = setup (port);
  ptr<axprt> ax = axprt::alloc (serverfd);
  BS *bs = New BS ();
  ptr<asrv> srv = asrv::alloc (ax, block_prog_1, wrap (bs, &BS::dispatch));
  amain ();
}
void BS::dispatch(svccb *sbp)
{
  switch(sbp->proc()){
  case BLOCK_GET:
    {
      gets++;
      get_args *a = sbp->Xtmpl getarg<get_args>();
      db->get(str(a->key.base(), a->key.size()),
              wrap(this, &BS::get_cb, sbp));
    }
    break;
  case BLOCK_PUT:
    // ...
    break;
  case BLOCK_REMOVE:
    // ...
    break;
  default:
    fprintf(stderr, "blockdbd: unknown RPC %d\n", sbp->proc());
    sbp->reject(PROC_UNAVAIL);
    break;
  }
}
void BS::get_cb(svccb *sbp, bool ok, str value)
{
  get_result *r = sbp->Xtmpl getres<get_result>();
  r->ok = ok;
  r->value = value;
  sbp->reply(r);
}
//
// Example 6: NFS create
//
void fs::nfs3_create (nfscall *nc)
{
  nfs_fh3 dir = nc->getarg ...;
  get_dir_block (dir, wrap (this, &fs::nfs3_create_cb1, nc));
}
void fs::nfs3_create_cb1 (nfscall *nc, bool ok, str dirblock)
{
  nfs_fh3 dir = nc->getarg ...;
  str name = nc->getarg ...;
  nfs_fh3 nfh;
  new_fh (&nfh);
  // update dirblock
  put_dir_block (dir, dirblock, wrap (this, &fs::nfs3_create_cb2, nc, nfh));
}
void fs::nfs3_create_cb2 (nfscall *nc, nfs_fh3 nfh, bool ok)
{
  put_fh (nfh, wrap (this, &fs::nfs3_create_cb3, nc, nfh));
}
void fs::nfs3_create_cb3 (nfscall *nc, nfs_fh3 nfh, bool ok)
{
  diropres3 *res = nc->getres<diropres3> ();
  // fill out res
  nc->reply (res);
}

//
// Example 7a: synchronous
//
void fn ()
{
  // ...
  for (i = 0; i < nblocks; i++) {
    str block = get (str (i));
    if (block == "XXX") {
      printf ("found!");
      break;
    }
  }
  do_something_else ();
}

//
// Example 7b: serially asynchronous
//
void fn ()
{
  cbv cb = wrap (do_something_else);
  db->get (str (0), wrap (helper, cb, 0, nblocks));
}

void helper (cbv cb, int i, int nblocks, bool ok, str block)
{
  if (block == "XXX") {
    printf ("found!");
    cb ();
  } else {
    if (i + 1 < nblocks) {
      // tail "recurse"
      db->get (str (i+1), wrap (helper, cb, i+1, nblocks));
    } else {
      cb ();
    }
  }
}

//
// Example 7c: parallelism (with "bug")
//
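// (The "bug": if several blocks match, cb() runs more than once, so
// do_something_else() is repeated; Example 7d's shared flag fixes that.
// And if nothing matches, cb() never runs at all.)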
void fn ()
{
  cbv cb = wrap (do_something_else);
  for (i = 0; i < nblocks; i++) {
    db->get (str (i), wrap (helper, cb));
  }
}

void helper (cbv cb, bool ok, str block)
{
  if (block == "XXX") {
    printf ("found!");
    cb ();
  }
}

//
// Example 7d: parallelism with shared state
//
void fn ()
{
  cbv cb = wrap (do_something_else);
  ptr<bool> done = New refcounted<bool> (false);
  for (i = 0; i < nblocks; i++) {
    db->get (str (i), wrap (helper, cb, done));
  }
}

void helper (cbv cb, ptr<bool> done, bool ok, str block)
{
  if (!*done && block == "XXX") {
    printf ("found!");
    *done = true;
    cb ();
  }
}

Summary
  Event programming gives the programmer a view that is roughly
  consistent with what actually happens.
  Can build abstractions to handle app level events
  Need to break up state and program flow
    but always know when there's a wait,
    and have good control over parallelism
