Windows oplocks

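Oplocks (opportunistic locks) are used mainly for file access in networked environments such as file shares. Because each client caches file data locally, oplocks keep that data consistent when several network clients access the same file.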

 

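Oplocks are usually divided into three types: level 1, batch, and level 2.

1. Level 1: an exclusive lock, used when modifying file contents.

2. Batch: similar to level 1, but typically used for Windows script-type (batch) files.

3. Level 2: a shared lock.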

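The usual sequence for acquiring an oplock is as follows.

Say client1 first calls CreateFile and gets a handle.

It then calls DeviceIoControl with one of the control codes FSCTL_REQUEST_OPLOCK_LEVEL_1, FSCTL_REQUEST_BATCH_OPLOCK, or FSCTL_REQUEST_OPLOCK_LEVEL_2.

If this DeviceIoControl operation goes pending, the oplock has been granted (the file system underneath is holding the IRP as pending). In user mode this implies the handle was opened for overlapped I/O, so the call can return while the request is still outstanding.

At that point client1 can read and write the file.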

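When the DeviceIoControl operation completes, that is, when the IRP is completed, the oplock has been broken. For example, client2 wants to operate on the file; to keep the cached file data consistent, the oplock must be broken first. The break tells client1 that its cache is no longer valid and must be flushed: if client1 has modified data, it should write that data back to the file server promptly. After flushing, client1 sends the FSCTL_OPLOCK_BREAK_ACKNOWLEDGE or FSCTL_OPLOCK_BREAK_ACK_NO_2 control code to say that its cleanup work on the file is done (it can also skip the acknowledgment and simply close the handle, which has the same effect). Only after this step can client2 continue with its operations on the file.

So if client1 holds an oplock on a file (say level 1), client2's CreateFile call on that file blocks inside the file system's create path, waiting on an event. I ran into this in BAV, where the call took a long time to return. (The oplock break is in fact started as soon as client2 issues the create, but client2's create does not return until client1 sends the acknowledgment.)

This is sometimes unacceptable for client2, which may not know whether an oplock exists and does not want to be blocked. For this case the system provides a create option, FILE_COMPLETE_IF_OPLOCKED (a CreateOptions flag of NtCreateFile/ZwCreateFile rather than a Win32 CreateFile flag). When the FSD sees this flag and the file is oplocked by another client, it returns immediately with the status code STATUS_OPLOCK_BREAK_IN_PROGRESS. This means the oplock break has started but has not finished: client1's pending IRP has been completed, and client1 may still be flushing its cache and cleaning up.

How, then, does client2 learn that client1 has finished? client1 sends FSCTL_OPLOCK_BREAK_ACKNOWLEDGE or FSCTL_OPLOCK_BREAK_ACK_NO_2 directly to the file server, so client2 cannot see it. The system provides a mechanism for this: after receiving STATUS_OPLOCK_BREAK_IN_PROGRESS, client2 can send the FSCTL_OPLOCK_BREAK_NOTIFY control code for the file. That operation is pended. Once client1 finishes its cleanup and sends FSCTL_OPLOCK_BREAK_ACKNOWLEDGE or FSCTL_OPLOCK_BREAK_ACK_NO_2, the pending FSCTL_OPLOCK_BREAK_NOTIFY IRP is completed and returns. client2 then knows the oplock break is complete and can operate on the file safely.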
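The handshake described above can be sketched as a small state machine. This is a toy model in portable C, not Windows code: every name in it (ToyFile, toy_open, and so on) is hypothetical, it only tracks who is waiting on whom, and it assumes the oplock is always broken to none.

```c
#include <stdbool.h>

/* Toy model only: these names are illustrative, not Windows types. */
typedef enum {
    OPLOCK_NONE,               /* no oplock on the file */
    OPLOCK_GRANTED,            /* client1's request IRP is pending: oplock held */
    OPLOCK_BREAK_IN_PROGRESS   /* request IRP completed, waiting for client1's ack */
} ToyOplockState;

typedef enum {
    OPEN_SUCCESS,              /* open finished, file is safe to use */
    OPEN_BLOCKED,              /* open waits until client1 acknowledges */
    OPEN_BREAK_IN_PROGRESS     /* returned at once, break still under way */
} ToyOpenResult;

typedef struct {
    ToyOplockState state;
    bool notify_pending;       /* client2 has a break-notify request waiting */
    bool open_blocked;         /* client2 has a create waiting */
} ToyFile;

/* client1: the oplock request goes pending, which means it was granted. */
static void toy_request_oplock(ToyFile *f)
{
    f->state = OPLOCK_GRANTED;
}

/* client2: open the file, with or without the complete-if-oplocked option. */
static ToyOpenResult toy_open(ToyFile *f, bool complete_if_oplocked)
{
    if (f->state == OPLOCK_NONE) {
        return OPEN_SUCCESS;
    }
    /* Opening an oplocked file starts the break right away. */
    f->state = OPLOCK_BREAK_IN_PROGRESS;
    if (complete_if_oplocked) {
        return OPEN_BREAK_IN_PROGRESS;
    }
    f->open_blocked = true;
    return OPEN_BLOCKED;
}

/* client2: ask to be told when the break is done.
   Returns true if the request completes immediately (model assumption:
   it does so whenever no break is in progress). */
static bool toy_break_notify(ToyFile *f)
{
    if (f->state != OPLOCK_BREAK_IN_PROGRESS) {
        return true;
    }
    f->notify_pending = true;
    return false;
}

/* client1: acknowledge the break (or close the handle). Everything that
   was waiting on the break is released. */
static void toy_acknowledge(ToyFile *f)
{
    f->state = OPLOCK_NONE;
    f->notify_pending = false;
    f->open_blocked = false;
}
```

Calling toy_request_oplock, then toy_open with the option set, then toy_break_notify, then toy_acknowledge walks through the non-blocking path; calling toy_open without the option shows the blocked create instead.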

 

 

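PS: When the create specifies FILE_COMPLETE_IF_OPLOCKED, STATUS_OPLOCK_BREAK_IN_PROGRESS is not the only result that indicates the oplock break was triggered. Sometimes the create returns STATUS_SHARING_VIOLATION with IoStatus->Information set to FILE_OPBATCH_BREAK_UNDERWAY, which also indicates a break. This happens with batch oplocks and filter oplocks.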
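A caller therefore has to check two result shapes after such a create. The helper below is a minimal sketch of that check; oplock_break_underway is a hypothetical name, and the three constants are repeated locally, with the values I recall from the Windows headers, so that the sketch compiles without the WDK. Verify them against your own headers before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as I recall them from ntstatus.h / wdm.h, redefined here only so
   the sketch builds anywhere. Check them against the real headers. */
#define STATUS_OPLOCK_BREAK_IN_PROGRESS 0x00000108u
#define STATUS_SHARING_VIOLATION        0xC0000043u
#define FILE_OPBATCH_BREAK_UNDERWAY     0x00000009u

/* Hypothetical helper: given the status and IoStatus.Information returned
   by a create issued with FILE_COMPLETE_IF_OPLOCKED, report whether an
   oplock break was triggered and may still be in progress. */
static bool oplock_break_underway(uint32_t status, uintptr_t information)
{
    if (status == STATUS_OPLOCK_BREAK_IN_PROGRESS) {
        return true;
    }
    /* Batch / filter oplock case described in the PS above. */
    return status == STATUS_SHARING_VIOLATION &&
           information == FILE_OPBATCH_BREAK_UNDERWAY;
}
```

An ordinary STATUS_SHARING_VIOLATION, where Information is not FILE_OPBATCH_BREAK_UNDERWAY, is still a plain sharing failure and makes the helper return false.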

 

PPS:

As noted above, client2's create breaks the oplock. The following IRPs can break it as well:

IRP_MJ_CLEANUP

IRP_MJ_CREATE

IRP_MJ_FILE_SYSTEM_CONTROL

IRP_MJ_FLUSH_BUFFERS

IRP_MJ_LOCK_CONTROL

IRP_MJ_READ

IRP_MJ_SET_INFORMATION

IRP_MJ_WRITE

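The FsRtlCheckOplock function in \wrk-v1.2\base\ntos\fsrtl\oplock.c handles each of these IRPs.

It shows that if client2 later uses the handle it obtained to send an IRP_MJ_READ, the result is BreakToII, while an IRP_MJ_WRITE results in BreakToNone.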

 

NTSTATUS
FsRtlCheckOplock (
    __in POPLOCK Oplock,
    __in PIRP Irp,
    __in_opt PVOID Context,
    __in_opt POPLOCK_WAIT_COMPLETE_ROUTINE CompletionRoutine,
    __in_opt POPLOCK_FS_PREPOST_IRP PostIrpRoutine
    )

/*++

Routine Description:

    This routine is called as a support routine from a file system.
    It is used to synchronize I/O requests with the current Oplock
    state of a file.  If the I/O operation will cause the Oplock to
    break, that action is initiated.  If the operation cannot continue
    until the Oplock break is complete, STATUS_PENDING is returned and
    the caller supplied routine is called.

Arguments:

    Oplock - Supplies a pointer to the non-opaque oplock structure for
             this file.

    Irp - Supplies a pointer to the Irp which declares the requested
          operation.

    Context - This value is passed as a parameter to the completion routine.

    CompletionRoutine - This is the routine which is called if this
                        Irp must wait for an Oplock to break.  This
                        is a synchronous operation if not specified
                        and we block in this thread waiting on
                        an event.

    PostIrpRoutine - This is the routine to call before we put anything
                     on our waiting Irp queue.

Return Value:

    STATUS_SUCCESS if we can complete the operation on exiting this thread.
    STATUS_PENDING if we return here but hold the Irp.
    STATUS_CANCELLED if the Irp is cancelled before we return.

--*/

{
    NTSTATUS Status = STATUS_SUCCESS;
    PNONOPAQUE_OPLOCK ThisOplock = *Oplock;

    PIO_STACK_LOCATION IrpSp;

    DebugTrace( +1, Dbg, "FsRtlCheckOplock:  Entered\n", 0 );
    DebugTrace(  0, Dbg, "Oplock    -> %08lx\n", Oplock );
    DebugTrace(  0, Dbg, "Irp       -> %08lx\n", Irp );

    //
    //  If there is no oplock structure or this is system I/O, we allow
    //  the operation to continue.  Otherwise we check the major function code.
    //

    if ((ThisOplock != NULL) &&
        !FlagOn( Irp->Flags, IRP_PAGING_IO )) {

        OPLOCK_STATE OplockState;
        PFILE_OBJECT OplockFileObject;

        BOOLEAN BreakToII;
        BOOLEAN BreakToNone;

        ULONG CreateDisposition;

        //
        //  Capture the file object first and then the oplock state to perform
        //  the unsafe checks below.  We capture the file object first in case
        //  there is an exclusive oplock break in progress.  Otherwise the oplock
        //  state may indicate break in progress but it could complete by
        //  the time we snap the file object.
        //

        OplockFileObject = ThisOplock->FileObject;
        OplockState = ThisOplock->OplockState;

        //
        //  Examine the Irp for the appropriate action provided there are
        //  current oplocks on the file.
        //

        if (OplockState != NoOplocksHeld) {

            BreakToII = FALSE;
            BreakToNone = FALSE;

            IrpSp = IoGetCurrentIrpStackLocation( Irp );

            //
            //  Determine whether we are going to BreakToII or BreakToNone.
            //

            switch (IrpSp->MajorFunction) {

            case IRP_MJ_CREATE :

                //
                //  If we are opening for attribute access only, we
                //  return status success.  Always break the oplock if this caller
                //  wants a filter oplock.
                //

                if (!FlagOn( IrpSp->Parameters.Create.SecurityContext->DesiredAccess,
                             ~(FILE_READ_ATTRIBUTES | FILE_WRITE_ATTRIBUTES | SYNCHRONIZE) ) &&
                    !FlagOn( IrpSp->Parameters.Create.Options, FILE_RESERVE_OPFILTER )) {

                    break;
                }

                //
                //  If there is a filter oplock granted and this create iS reading
                //  the file then don't break the oplock as long as we share
                //  for reads.
                //

                if (FlagOn( OplockState, FILTER_OPLOCK ) &&
                    !FlagOn( IrpSp->Parameters.Create.SecurityContext->DesiredAccess,
                             ~FILTER_OPLOCK_VALID_FLAGS ) &&
                    FlagOn( IrpSp->Parameters.Create.ShareAccess, FILE_SHARE_READ )) {

                    break;
                }

                //
                //  We we are superseding or overwriting, then break to none.
                //

                CreateDisposition = (IrpSp->Parameters.Create.Options >> 24) & 0x000000ff;

                if ((CreateDisposition == FILE_SUPERSEDE) ||
                    (CreateDisposition == FILE_OVERWRITE) ||
                    (CreateDisposition == FILE_OVERWRITE_IF) ||
                    FlagOn( IrpSp->Parameters.Create.Options, FILE_RESERVE_OPFILTER )) {

                    BreakToNone = TRUE;

                } else {

                    BreakToII = TRUE;
                }

                break;

            case IRP_MJ_READ :

                //
                //  If a filter oplock has been granted then do nothing.
                //  We will assume the oplock will have been broken
                //  if this create needed to do that.
                //

                if (!FlagOn( OplockState, FILTER_OPLOCK )) {

                    BreakToII = TRUE;
                }

                break;

            case IRP_MJ_FLUSH_BUFFERS :

                BreakToII = TRUE;
                break;

            case IRP_MJ_CLEANUP :

                FsRtlOplockCleanup( (PNONOPAQUE_OPLOCK) *Oplock,
                                    IrpSp );

                break;

            case IRP_MJ_LOCK_CONTROL :

                //
                //  If a filter oplock has been granted then do nothing.
                //  We will assume the oplock will have been broken
                //  if this create needed to do that.
                //

                if (FlagOn( OplockState, FILTER_OPLOCK )) {

                    break;
                }

            case IRP_MJ_WRITE :

                BreakToNone = TRUE;
                break;

            case IRP_MJ_SET_INFORMATION :

                //
                //  We are only interested in calls that shrink the file size
                //  or breaking batch oplocks for the rename case.
                //

                switch (IrpSp->Parameters.SetFile.FileInformationClass) {

                case FileEndOfFileInformation :

                    //
                    //  Break immediately if this is the lazy writer callback.
                    //

                    if (IrpSp->Parameters.SetFile.AdvanceOnly) {

                        break;
                    }

                case FileAllocationInformation :

                    BreakToNone = TRUE;
                    break;

                case FileRenameInformation :
                case FileLinkInformation :
                case FileShortNameInformation :

                    if (FlagOn( OplockState, BATCH_OPLOCK | FILTER_OPLOCK )) {

                        BreakToNone = TRUE;
                    }

                    break;
                }

            case IRP_MJ_FILE_SYSTEM_CONTROL :

                //
                //  We need to break to none if this is a zeroing operation.
                //

                if (IrpSp->Parameters.FileSystemControl.FsControlCode == FSCTL_SET_ZERO_DATA) {

                    BreakToNone = TRUE;
                }
            }

            if (BreakToII) {

                //
                //  If there are no outstanding oplocks or level II oplocks are held,
                //  we can return immediately.  If the first two tests fail then there
                //  is an exclusive oplock.  If the file objects match we allow the
                //  operation to continue.
                //

                if ((OplockState != OplockIIGranted) &&
                    (OplockFileObject != IrpSp->FileObject)) {

                    Status = FsRtlOplockBreakToII( (PNONOPAQUE_OPLOCK) *Oplock,
                                                    IrpSp,
                                                    Irp,
                                                    Context,
                                                    CompletionRoutine,
                                                    PostIrpRoutine );
                }

            } else if (BreakToNone) {

                //
                //  If there are no oplocks, we can return immediately.
                //  Otherwise if there is no level 2 oplock and this file
                //  object matches the owning file object then this write is
                //  on behalf of the owner of the oplock.
                //

                if ((OplockState == OplockIIGranted) ||
                    (OplockFileObject != IrpSp->FileObject)) {

                    Status = FsRtlOplockBreakToNone( (PNONOPAQUE_OPLOCK) *Oplock,
                                                      IrpSp,
                                                      Irp,
                                                      Context,
                                                      CompletionRoutine,
                                                      PostIrpRoutine );
                }
            }
        }
    }

    DebugTrace( -1, Dbg, "FsRtlCheckOplock:  Exit -> %08lx\n", Status );

    return Status;
}
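The decision part of the listing above can be condensed into a small classifier. This is a simplified sketch in portable C with hypothetical names (ToyMajor, toy_classify); it leaves out attribute-only opens, FILE_RESERVE_OPFILTER, the filter-oplock share-read case, the cleanup, set-information and file-system-control paths, and the later checks against the owning file object and level 2 state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the IRP major function codes covered here. */
typedef enum {
    MJ_CREATE, MJ_READ, MJ_WRITE, MJ_FLUSH_BUFFERS, MJ_LOCK_CONTROL
} ToyMajor;

typedef enum { BREAK_NOT_NEEDED, BREAK_TO_II, BREAK_TO_NONE } ToyBreak;

/* Create dispositions, numbered as in the Windows headers. */
enum {
    TOY_FILE_SUPERSEDE    = 0,
    TOY_FILE_OPEN         = 1,
    TOY_FILE_CREATE       = 2,
    TOY_FILE_OPEN_IF      = 3,
    TOY_FILE_OVERWRITE    = 4,
    TOY_FILE_OVERWRITE_IF = 5
};

/* The listing recovers the disposition from the top byte of Create.Options. */
static uint32_t toy_disposition(uint32_t create_options)
{
    return (create_options >> 24) & 0xffu;
}

/* Which kind of break does the request ask for? */
static ToyBreak toy_classify(ToyMajor major, uint32_t create_options,
                             bool filter_oplock)
{
    switch (major) {
    case MJ_CREATE: {
        uint32_t d = toy_disposition(create_options);
        /* Superseding or overwriting destroys the data: break to none. */
        if (d == TOY_FILE_SUPERSEDE || d == TOY_FILE_OVERWRITE ||
            d == TOY_FILE_OVERWRITE_IF) {
            return BREAK_TO_NONE;
        }
        return BREAK_TO_II;
    }
    case MJ_READ:
        /* A filter oplock is left alone on reads. */
        return filter_oplock ? BREAK_NOT_NEEDED : BREAK_TO_II;
    case MJ_FLUSH_BUFFERS:
        return BREAK_TO_II;
    case MJ_LOCK_CONTROL:
        /* Same as a write unless a filter oplock is held. */
        return filter_oplock ? BREAK_NOT_NEEDED : BREAK_TO_NONE;
    case MJ_WRITE:
        return BREAK_TO_NONE;
    }
    return BREAK_NOT_NEEDED;
}
```

For instance, toy_classify(MJ_READ, 0, false) yields BREAK_TO_II and toy_classify(MJ_WRITE, 0, false) yields BREAK_TO_NONE, matching the read and write cases noted before the listing.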

  

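Following FsRtlOplockBreakToII or FsRtlOplockBreakToNone shows that the oplock break they trigger is largely the same as in the IRP_MJ_CREATE case; the only difference is that IRP_MJ_CREATE supports the early exit via FILE_COMPLETE_IF_OPLOCKED. So it appears that if client2 sends an IRP_MJ_READ right after getting its handle and client1 has not yet sent the acknowledgment, that read should block as well.

The MSDN page at http://msdn.microsoft.com/en-us/library/cc308447.aspx describes this in more detail (for example, an IRP_MJ_READ does not block when the current oplock is a level 2 or filter oplock).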

        if ((IrpSp->MajorFunction == IRP_MJ_CREATE) &&
            FlagOn( IrpSp->Parameters.Create.Options, FILE_COMPLETE_IF_OPLOCKED )) {

            DebugTrace( 0, Dbg, "Don't block open\n", 0 );

            try_return( Status = STATUS_OPLOCK_BREAK_IN_PROGRESS );
        }

        //
        //  If we get here that means that this operation can't continue
        //  until the oplock break is complete.
        //
        //  FsRtlWaitOnIrp will release the mutex.
        //

        AcquiredMutex = FALSE;

        Status = FsRtlWaitOnIrp( Oplock,
                                 Irp,
                                 Context,
                                 CompletionRoutine,
                                 PostIrpRoutine,
                                 &Event );

  

http://www.osronline.com/article.cfm?id=90

https://files.cnblogs.com/kkindof/Understand_oplock_and_avoid_sharing_violation.zip

posted @ 2014-02-16 13:32  kkindof