Kafka Network
Kafka Network Processor
SocketServer.Processor
    override def run() {
      startupComplete()
      try {
        while (isRunning) {
          try {
            // Take new connections off the connections queue, register them for OP_READ,
            // and initialize their configuration
            configureNewConnections()
            // Handle all pending I/O writes that respond to clients
            // (request -> process -> response queue -> queue poll -> record metrics
            //  -> write NetworkInterface)
            processNewResponses()
            // Poll for channels with ready OP_READ events
            poll()
            // Handle client requests: add each one to the request queue as a task
            processCompletedReceives()
            // Handle sends that have completed
            processCompletedSends()
            // Handle client disconnects
            processDisconnected()
          } catch {
            case e: Throwable => processException("Processor got uncaught exception.", e)
          }
        }
      } finally {
        debug("Closing selector - processor " + id)
        swallowError(closeAll())
        shutdownComplete()
      }
    }
Client request processing
    class KafkaRequestHandler(id: Int,
                              brokerId: Int,
                              val aggregateIdleMeter: Meter,
                              val totalHandlerThreads: Int,
                              val requestChannel: RequestChannel,
                              apis: KafkaApis,
                              time: Time) extends Runnable with Logging {
      this.logIdent = "[Kafka Request Handler " + id + " on Broker " + brokerId + "], "
      private val latch = new CountDownLatch(1)

      def run() {
        while(true) {
          val startSelectTime = time.nanoseconds
          // Take a task off the request queue (waits up to 300 ms)
          val req = requestChannel.receiveRequest(300)
          val endTime = time.nanoseconds
          val idleTime = endTime - startSelectTime
          aggregateIdleMeter.mark(idleTime / totalHandlerThreads)

          req match {
            case RequestChannel.ShutdownRequest =>
              debug(s"Kafka request handler $id on broker $brokerId received shut down command")
              latch.countDown()
              return

            // Client request
            case request: RequestChannel.Request =>
              try {
                request.requestDequeueTimeNanos = endTime
                trace(s"Kafka request handler $id on broker $brokerId handling request $request")
                // KafkaApis dispatches the request; the client request handling logic lives there
                apis.handle(request)
              } catch {
                case e: FatalExitError =>
                  latch.countDown()
                  Exit.exit(e.statusCode)
                case e: Throwable => error("Exception when handling request", e)
              } finally {
                request.releaseBuffer()
              }

            case null => // continue
          }
        }
      }
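The hand-off between the network thread and the handler thread can be modeled with a plain blocking queue. The following is a minimal sketch, not Kafka's implementation: `Channel`, `Message`, `Request`, `Shutdown`, and `Handler` are hypothetical stand-ins for `RequestChannel`, its request types, and `KafkaRequestHandler`. It only illustrates the loop shape above: poll with a timeout, treat `null` as "nothing arrived", and stop on a shutdown sentinel.

```scala
import java.util.concurrent.{ArrayBlockingQueue, CountDownLatch, TimeUnit}
import scala.collection.mutable.ListBuffer

object HandlerSketch {
  // Hypothetical stand-ins for RequestChannel.Request and RequestChannel.ShutdownRequest.
  sealed trait Message
  final case class Request(payload: String) extends Message
  case object Shutdown extends Message

  // Hypothetical stand-in for RequestChannel: a single bounded request queue.
  final class Channel(capacity: Int) {
    private val requests = new ArrayBlockingQueue[Message](capacity)
    def send(m: Message): Unit = requests.put(m)
    // Returns null when the timeout expires, which the handler treats as "continue".
    def receive(timeoutMs: Long): Message = requests.poll(timeoutMs, TimeUnit.MILLISECONDS)
  }

  final class Handler(channel: Channel, handle: Request => Unit) extends Runnable {
    private val latch = new CountDownLatch(1)

    override def run(): Unit = {
      var running = true
      while (running) {
        channel.receive(300) match {
          case Shutdown =>
            latch.countDown()
            running = false
          case r: Request =>
            // One failing request must not kill the handler thread.
            try handle(r)
            catch { case e: Exception => System.err.println(s"Exception when handling request: $e") }
          case null => // timed out, poll again
        }
      }
    }

    def awaitShutdown(): Unit = latch.await()
  }

  // Sends two requests and a shutdown marker, then returns what the handler processed.
  def demo(): List[String] = {
    val handled = ListBuffer.empty[String]
    val channel = new Channel(16)
    val handler = new Handler(channel, r => { handled += r.payload; () })
    val thread = new Thread(handler, "handler-0")
    thread.start()
    channel.send(Request("produce"))
    channel.send(Request("fetch"))
    channel.send(Shutdown)
    handler.awaitShutdown()
    thread.join() // join makes the handler's writes to `handled` visible here
    handled.toList
  }
}
```

With one queue and one handler thread, requests are processed in the order they were enqueued. The real broker runs several handler threads against the same queue, so that ordering guarantee does not carry over to this sketch's multi-threaded equivalent.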
Index file
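A conventional file read first issues a read request to the file device. The driver reads the requested data into the file's buffer, which lives in the kernel, and the data is then copied from that buffer into a region of the program's virtual address space.
A conventional file write issues a write request to the file device. The data to be written is first placed in the program's buffer, which lives in user space, and is then copied from that buffer into the file's buffer in the kernel.
A memory-mapped file treats a file on disk as the physical backing store for a region of the program's address space, so the file's data is the data in that region of memory. Reading or writing the file is just a matter of operating on addresses in that region, which removes one memory copy.
Memory-mapped files are typically more efficient than conventional file I/O, and the larger the file, the more noticeable the difference.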
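To make the two access paths concrete, here is a small sketch that touches the same file both ways through the JDK: `RandomAccessFile.writeInt`/`readInt` go through the copy-based path, while `FileChannel.map` returns a `MappedByteBuffer` whose `getInt`/`putInt` operate on the mapped region directly. The object name `MmapSketch` and the temp file are illustrative assumptions, and the sketch relies on the mapping and ordinary I/O seeing the same data, which holds on common platforms but is not strictly guaranteed by the JDK specification.

```scala
import java.io.{File, RandomAccessFile}
import java.nio.channels.FileChannel

object MmapSketch {
  // Returns (value read through the mapping, value read through ordinary I/O).
  def demo(): (Int, Int) = {
    val file = File.createTempFile("mmap-demo", ".bin")
    file.deleteOnExit()
    val raf = new RandomAccessFile(file, "rw")
    try {
      raf.setLength(8)

      // Conventional path: the int is copied from a user-space buffer into the kernel's file buffer.
      raf.seek(0)
      raf.writeInt(42)

      // Mapped path: the file's first 8 bytes become a region of this process's address space.
      val mapped = raf.getChannel.map(FileChannel.MapMode.READ_WRITE, 0, 8)
      mapped.putInt(4, 7) // absolute write at byte offset 4, no explicit write() call
      mapped.force()      // ask the OS to flush the dirty pages to the storage device

      val viaMmap = mapped.getInt(0) // sees the value written with writeInt
      raf.seek(4)
      val viaRead = raf.readInt()    // sees the value written through the mapping
      (viaMmap, viaRead)
    } finally raf.close()
  }
}
```

Both `RandomAccessFile` and a freshly mapped buffer use big-endian byte order by default, which is why the integers round-trip between the two paths without conversion.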
The index file uses mmap to improve efficiency:
    abstract class AbstractIndex[K, V](@volatile var file: File,
                                       val baseOffset: Long,
                                       val maxIndexSize: Int = -1,
                                       val writable: Boolean) extends Logging {

      @volatile
      protected var mmap: MappedByteBuffer = {
        val newlyCreated = file.createNewFile()
        val raf = if (writable) new RandomAccessFile(file, "rw") else new RandomAccessFile(file, "r")
        try {
          /* Pre-allocate the file if it was newly created */
          if(newlyCreated) {
            if(maxIndexSize < entrySize)
              throw new IllegalArgumentException("Invalid max index size: " + maxIndexSize)
            raf.setLength(roundDownToExactMultiple(maxIndexSize, entrySize))
          }

          /* Memory-map the file */
          val len = raf.length()
          val idx = {
            if (writable)
              raf.getChannel.map(FileChannel.MapMode.READ_WRITE, 0, len)
            else
              raf.getChannel.map(FileChannel.MapMode.READ_ONLY, 0, len)
          }
          /* Set the buffer position: 0 for a new file, otherwise the end of the last whole entry */
          if(newlyCreated)
            idx.position(0)
          else
            idx.position(roundDownToExactMultiple(idx.limit(), entrySize))
          idx
        } finally {
          CoreUtils.swallow(raf.close())
        }
      }
    }
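The pre-allocation and positioning logic is easier to follow with concrete numbers. The sketch below is a simplified illustration rather than Kafka's code: it assumes an 8-byte entry (a 4-byte relative offset followed by a 4-byte file position, the layout used by the offset index), a hypothetical size limit of 100 bytes, and an illustrative object name `IndexMmapSketch`. It covers only the newly created file case.

```scala
import java.io.{File, RandomAccessFile}
import java.nio.channels.FileChannel

object IndexMmapSketch {
  // Assumed entry layout: 4-byte relative offset + 4-byte physical position.
  val EntrySize = 8

  // Largest multiple of `factor` that is <= `number`.
  def roundDownToExactMultiple(number: Int, factor: Int): Int = factor * (number / factor)

  // Returns (pre-allocated file length, number of entries written, position stored in entry 1).
  def demo(): (Long, Int, Int) = {
    val file = File.createTempFile("sketch", ".index")
    file.deleteOnExit()
    val raf = new RandomAccessFile(file, "rw")
    try {
      // Pre-allocate: a 100-byte limit is rounded down to 96 so the file holds only whole entries.
      raf.setLength(roundDownToExactMultiple(100, EntrySize))
      val len = raf.length()
      val mmap = raf.getChannel.map(FileChannel.MapMode.READ_WRITE, 0, len)

      // New file: start appending at position 0.
      mmap.position(0)
      mmap.putInt(5); mmap.putInt(1024) // entry 0: relative offset 5 -> byte 1024
      mmap.putInt(9); mmap.putInt(2048) // entry 1: relative offset 9 -> byte 2048

      // The buffer position doubles as the write cursor, so it gives the entry count.
      val entries = mmap.position() / EntrySize
      // Fixed-size entries allow random access by slot number.
      val secondPosition = mmap.getInt(1 * EntrySize + 4)
      (len, entries, secondPosition)
    } finally raf.close()
  }
}
```

Rounding the length down to a multiple of the entry size means every slot in the mapped region is a whole entry, which is what makes addressing an entry by `slot * EntrySize` safe.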
The index is sparse, which reduces the space it occupies:
    @nonthreadsafe
    class LogSegment(val log: FileRecords,
                     val index: OffsetIndex,
                     val timeIndex: TimeIndex,
                     val txnIndex: TransactionIndex,
                     val baseOffset: Long,
                     val indexIntervalBytes: Int,
                     val rollJitterMs: Long,
                     time: Time) extends Logging {

      @nonthreadsafe
      def append(firstOffset: Long,
                 largestOffset: Long,
                 largestTimestamp: Long,
                 shallowOffsetOfMaxTimestamp: Long,
                 records: MemoryRecords): Unit = {
          ...
          // Decide whether to write an entry to the index file: only once more than
          // indexIntervalBytes of records have accumulated since the last entry
          if(bytesSinceLastIndexEntry > indexIntervalBytes) {
            index.append(firstOffset, physicalPosition)
            timeIndex.maybeAppend(maxTimestampSoFar, offsetOfMaxTimestamp)
            bytesSinceLastIndexEntry = 0
          }
          bytesSinceLastIndexEntry += records.sizeInBytes
        } // closes a block opened in the elided code above
      }
    }
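The effect of the `bytesSinceLastIndexEntry > indexIntervalBytes` check can be seen in a small in-memory model. This is a sketch under stated assumptions, not Kafka's `LogSegment`: `Segment`, `Entry`, and `lookup` are hypothetical names, the index is an `ArrayBuffer` instead of a mapped file, and every batch is assumed to be 40 bytes. The lookup shows how a sparse index is typically used: binary-search for the largest indexed offset not greater than the target, then scan the log forward from that position.

```scala
import scala.collection.mutable.ArrayBuffer

object SparseIndexSketch {
  // One index entry: a message offset and the byte position where that batch starts.
  final case class Entry(offset: Long, position: Int)

  final class Segment(indexIntervalBytes: Int) {
    private val index = ArrayBuffer.empty[Entry]
    private var bytesSinceLastIndexEntry = 0
    private var physicalPosition = 0

    def append(firstOffset: Long, sizeInBytes: Int): Unit = {
      // Add an index entry only after enough bytes have accumulated since the last one.
      if (bytesSinceLastIndexEntry > indexIntervalBytes) {
        index += Entry(firstOffset, physicalPosition)
        bytesSinceLastIndexEntry = 0
      }
      bytesSinceLastIndexEntry += sizeInBytes
      physicalPosition += sizeInBytes
    }

    def indexEntries: List[Entry] = index.toList

    // Largest indexed entry with offset <= target, or the segment start if there is none.
    def lookup(target: Long): Entry = {
      var lo = 0
      var hi = index.length - 1
      var best = Entry(0L, 0)
      while (lo <= hi) {
        val mid = (lo + hi) >>> 1
        if (index(mid).offset <= target) { best = index(mid); lo = mid + 1 }
        else hi = mid - 1
      }
      best
    }
  }

  // Appends ten 40-byte batches (offsets 0 to 9) with a 100-byte index interval.
  def demo(): (List[Entry], Entry) = {
    val segment = new Segment(indexIntervalBytes = 100)
    (0 until 10).foreach(i => segment.append(i.toLong, 40))
    (segment.indexEntries, segment.lookup(7))
  }
}
```

In this run only offsets 3, 6, and 9 get index entries, so ten batches cost three entries. A lookup for offset 7 lands on the entry for offset 6 at byte 240, and a reader would scan forward from there. A larger interval shrinks the index further at the price of a longer scan after each lookup.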