记一次线上Logback阻塞导致接口调用超时的问题

故障场景

某个服务凌晨0点之后突然出现接口调用超时,进入监控系统发现多台服务器先后出现了一次 IO 100% 的情况 。

问题原因

云服务商磁盘水位过高以后进行了新增,之后进行了数据迁移,导致业务日志无法进行写入,进而产生大量线程的阻塞,导致上游调用瞬间产生大量的超时。

处置方案

之前一直以为Logback默认是异步写入,深入代码后发现依然是同步写入,遂后续将Lockback的日志的同步写入修改为异步写入。

Logback 同步的写入逻辑

logback文件输出类是RollingFileAppender,打印日志的方法是subAppend,rollover是关闭文件切换日志文件。

    /**
     * Implemented by delegating most of the rollover work to a rolling policy.
     */
    public void rollover() {
        lock.lock();
        try {
            // Note: This method needs to be synchronized because it needs exclusive
            // access while it closes and then re-opens the target file.
            //
            // make sure to close the hereto active log file! Renaming under windows
            // does not work for open files.
            this.closeOutputStream();
            attemptRollover();
            attemptOpenFile();
        } finally {
            lock.unlock();
        }
    }

    /**
    * This method differentiates RollingFileAppender from its super class.
    */
    @Override
    @Override
    protected void subAppend(E event) {
        // The roll-over check must precede actual writing. This is the
        // only correct behavior for time driven triggers.

        // We need to synchronize on triggeringPolicy so that only one rollover
        // occurs at a time
        synchronized (triggeringPolicy) {
            if (triggeringPolicy.isTriggeringEvent(currentlyActiveFile, event)) {
                rollover();
            }
        }

        super.subAppend(event);
    }

super.subAppend,代码在父类OutputStreamAppender中

    @Override
    protected void append(E eventObject) {
        if (!isStarted()) {
            return;
        }

        subAppend(eventObject);
    }
//......

    /**
     * Actual writing occurs here.
     * <p>
     * Most subclasses of <code>WriterAppender</code> will need to override this
     * method.
     * 
     * @since 0.9.0
     */
    protected void subAppend(E event) {
        if (!isStarted()) {
            return;
        }
        try {
            // this step avoids LBCLASSIC-139
            if (event instanceof DeferredProcessingAware) {
                ((DeferredProcessingAware) event).prepareForDeferredProcessing();
            }
            // the synchronization prevents the OutputStream from being closed while we
            // are writing. It also prevents multiple threads from entering the same
            // converter. Converters assume that they are in a synchronized block.
            // lock.lock();

            byte[] byteArray = this.encoder.encode(event);
            writeBytes(byteArray);

        } catch (IOException ioe) {
            // as soon as an exception occurs, move to non-started state
            // and add a single ErrorStatus to the SM.
            this.started = false;
            addStatus(new ErrorStatus("IO failure in appender", this, ioe));
        }
    }

// ......

    private void writeBytes(byte[] byteArray) throws IOException {
        if(byteArray == null || byteArray.length == 0)
            return;
        
        lock.lock();
        try {
            this.outputStream.write(byteArray);
            if (immediateFlush) {
                this.outputStream.flush();
            }
        } finally {
            lock.unlock();
        }
    }

跟关闭文件使用同一个锁,保证文件关闭切换不会跟写文件冲突。

异步写入

异步写入的类为 AsyncAppender,其基类为 AsyncAppenderBase ,主要的操作定义在 AsyncAppenderBase 中。
定义了阻塞队列、阻塞队列的大小,worker 等

BlockingQueue<E> blockingQueue;

    /**
     * The default buffer size.
     */
    public static final int DEFAULT_QUEUE_SIZE = 256;
    int queueSize = DEFAULT_QUEUE_SIZE;

    int appenderCount = 0;

    static final int UNDEFINED = -1;
    int discardingThreshold = UNDEFINED;
    boolean neverBlock = false;

    Worker worker = new Worker();

    /**
     * The default maximum queue flush time allowed during appender stop. If the 
     * worker takes longer than this time it will exit, discarding any remaining 
     * items in the queue
     */
    public static final int DEFAULT_MAX_FLUSH_TIME = 1000;
    int maxFlushTime = DEFAULT_MAX_FLUSH_TIME;

logback 设置为缓冲输出

<appender name="FILE" class="ch.qos.logback.core.FileAppender"> 
  <file>foo.log</file>
  <encoder>
    <pattern>%d %-5level [%thread] %logger{0}: %msg%n</pattern>
    <!-- this quadruples logging throughput -->
    <immediateFlush>false</immediateFlush>
  </encoder> 
</appender>

缓冲默认是8192也就是8K.

参考

posted @ 2020-12-07 14:23  紫气東來  阅读(2349)  评论(0编辑  收藏  举报