Windows oplock
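Oplocks (opportunistic locks) are mainly used for file access in networked environments such as shares. Their purpose is to keep file data consistent when several network clients access the same file, because each client caches data locally.
Oplocks generally come in three types: level 1, batch, and level 2.
1. Level 1: an exclusive lock, used when modifying file contents.
2. Batch: similar to level 1, but typically used for Windows script-type files (batch files).
3. Level 2: a shared lock.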
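Acquiring a lock generally goes like this:
client1 first calls CreateFile to get a handle,
then calls DeviceIoControl with one of the control codes FSCTL_REQUEST_OPLOCK_LEVEL_1, FSCTL_REQUEST_BATCH_OPLOCK, or FSCTL_REQUEST_OPLOCK_LEVEL_2.
If the DeviceIoControl request is left pending, the lock has been granted (the file system underneath is holding the IRP).
From that point client1 can read and write the file through its cache (with a level 2 oplock, only reads can be cached).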
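A minimal sketch of the request step, assuming the handle was opened with FILE_FLAG_OVERLAPPED so that DeviceIoControl can return while the IRP stays pending. The names classify_oplock_request, request_level1_oplock, and oplock_request_result are made up for this illustration; the non-Windows stand-ins exist only so the pure helper compiles elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#include <winioctl.h>
#else
/* Stand-ins (assumption) so the pure helper below also compiles off Windows. */
typedef unsigned long DWORD;
#define ERROR_IO_PENDING 997UL
#endif

typedef enum {
    OPLOCK_GRANTED,    /* request is pended: the oplock is held */
    OPLOCK_COMPLETED,  /* request completed at once: nothing is held now */
    OPLOCK_REFUSED     /* request failed with some other error */
} oplock_request_result;

/* Interpret the outcome of an overlapped DeviceIoControl oplock request. */
oplock_request_result classify_oplock_request(bool call_succeeded, DWORD last_error)
{
    if (call_succeeded) {
        return OPLOCK_COMPLETED;
    }
    if (last_error == ERROR_IO_PENDING) {
        return OPLOCK_GRANTED;
    }
    return OPLOCK_REFUSED;
}

#ifdef _WIN32
/* `file` is assumed to be opened with FILE_FLAG_OVERLAPPED; `ov->hEvent`
 * gets signaled when the pended request completes, i.e. when the oplock
 * is broken. */
oplock_request_result request_level1_oplock(HANDLE file, OVERLAPPED *ov)
{
    DWORD bytes = 0;
    BOOL ok;

    assert(ov != NULL);
    ok = DeviceIoControl(file, FSCTL_REQUEST_OPLOCK_LEVEL_1,
                         NULL, 0, NULL, 0, &bytes, ov);
    return classify_oplock_request(ok != FALSE, ok ? 0 : GetLastError());
}
#endif
```

On Windows, the caller would keep the OVERLAPPED structure alive for as long as the oplock is held and wait on its event to learn about the break.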
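When the DeviceIoControl request completes, that is, when the IRP is completed, the lock has been broken. For example, client2 wants to operate on the file; to keep the cached data consistent, the oplock has to be broken first. client1 is told that its cache is no longer valid and that it must flush any modified data back to the file server right away. After flushing, client1 sends FSCTL_OPLOCK_BREAK_ACKNOWLEDGE or FSCTL_OPLOCK_BREAK_ACK_NO_2 to signal that its cleanup is done (closing the handle instead of sending an acknowledge code has the same effect). Only after this step can client2 carry on with the file.
So if client1 holds an oplock on a file (say level 1), client2's CreateFile call gets stuck in the file system's create path, waiting on an event. I ran into this in BAV, where the call took a long time to return. (The oplock break actually starts as soon as client2 issues the create, but the create does not return until client1 sends the acknowledge.)
That is sometimes unacceptable for client2, which may not know whether an oplock exists and does not want to block. For this case the system provides the create option FILE_COMPLETE_IF_OPLOCKED (a CreateOptions flag of NtCreateFile/ZwCreateFile). When the FSD sees this flag and the file is oplocked by another client, it immediately returns STATUS_OPLOCK_BREAK_IN_PROGRESS. The break has started but not finished: client1's pending IRP has been completed, and client1 may still be flushing its cache and cleaning up.
How does client2 learn that client1 has finished? client1 sends FSCTL_OPLOCK_BREAK_ACKNOWLEDGE or FSCTL_OPLOCK_BREAK_ACK_NO_2 directly to the file server, so client2 cannot see it. The system provides a mechanism for this: after receiving STATUS_OPLOCK_BREAK_IN_PROGRESS, client2 can send FSCTL_OPLOCK_BREAK_NOTIFY on the file. That request is pended, and once client1 sends its acknowledge, the pending FSCTL_OPLOCK_BREAK_NOTIFY IRP is completed and returns. client2 then knows the oplock break is complete and can safely operate on the file.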
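The sequence above can be traced with a toy state model. This is only a sketch of the ordering described in these notes, not the file system's actual implementation: all names (oplock_model, model_second_open, and so on) are hypothetical, and the model ignores level 2, batch, and filter oplocks.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { NO_EXCLUSIVE_OPLOCK, EXCLUSIVE_HELD, BREAK_IN_PROGRESS } oplock_state;

typedef enum {
    OPEN_PROCEEDS,                  /* nothing to break */
    OPEN_WAITS_FOR_ACK,             /* create blocks until client1 acknowledges */
    OPEN_RETURNS_BREAK_IN_PROGRESS  /* FILE_COMPLETE_IF_OPLOCKED was given */
} open_outcome;

typedef struct {
    oplock_state state;
    bool owner_irp_pending;   /* client1's oplock request IRP */
    bool notify_irp_pending;  /* client2's FSCTL_OPLOCK_BREAK_NOTIFY IRP */
} oplock_model;

/* client1 requests a level 1 oplock and the request is pended. */
void model_grant_exclusive(oplock_model *m)
{
    m->state = EXCLUSIVE_HELD;
    m->owner_irp_pending = true;
    m->notify_irp_pending = false;
}

/* client2 opens the file. The create itself starts the break. */
open_outcome model_second_open(oplock_model *m, bool complete_if_oplocked)
{
    if (m->state == NO_EXCLUSIVE_OPLOCK) {
        return OPEN_PROCEEDS;
    }
    m->state = BREAK_IN_PROGRESS;
    m->owner_irp_pending = false;   /* client1's pended IRP is completed */
    return complete_if_oplocked ? OPEN_RETURNS_BREAK_IN_PROGRESS
                                : OPEN_WAITS_FOR_ACK;
}

/* client2 sends the break-notify request; returns true if it is pended. */
bool model_break_notify(oplock_model *m)
{
    m->notify_irp_pending = (m->state == BREAK_IN_PROGRESS);
    return m->notify_irp_pending;
}

/* client1 acknowledges the break (or closes its handle). */
void model_acknowledge(oplock_model *m)
{
    assert(m->state == BREAK_IN_PROGRESS);
    m->state = NO_EXCLUSIVE_OPLOCK;
    m->notify_irp_pending = false;  /* the notify IRP is completed */
}
```

Walking the model through grant, second open with the flag set, notify, and acknowledge reproduces the order of IRP completions described above.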
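PS: When the create specifies FILE_COMPLETE_IF_OPLOCKED, STATUS_OPLOCK_BREAK_IN_PROGRESS is not the only result that means the oplock break was started successfully. The create may also return STATUS_SHARING_VIOLATION with IoStatus->Information = FILE_OPBATCH_BREAK_UNDERWAY, which likewise indicates a successful break. This happens with batch oplocks and filter oplocks.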
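The two outcomes in the PS can be folded into a single check. The sketch below defines the constants locally, with an SK_ prefix, so it stands alone; the numeric values are the ones published in the Windows headers, while sk_io_status and oplock_break_started are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the header values (SK_ prefix marks them as sketch-only). */
#define SK_STATUS_OPLOCK_BREAK_IN_PROGRESS 0x00000108u
#define SK_STATUS_SHARING_VIOLATION        0xC0000043u
#define SK_FILE_OPBATCH_BREAK_UNDERWAY     0x00000009u

/* Simplified stand-in for the Status/Information pair of IO_STATUS_BLOCK. */
typedef struct {
    uint32_t  status;
    uintptr_t information;
} sk_io_status;

/* Returns true if the create result says an oplock break is under way. */
bool oplock_break_started(const sk_io_status *io)
{
    assert(io != NULL);
    if (io->status == SK_STATUS_OPLOCK_BREAK_IN_PROGRESS) {
        return true;
    }
    /* Batch/filter case: sharing violation plus the "underway" marker. */
    return io->status == SK_STATUS_SHARING_VIOLATION &&
           io->information == SK_FILE_OPBATCH_BREAK_UNDERWAY;
}
```

A plain STATUS_SHARING_VIOLATION without the FILE_OPBATCH_BREAK_UNDERWAY marker is an ordinary sharing failure and is deliberately not treated as a break.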
PPS:
As mentioned above, client2's create breaks the oplock. The following IRPs can break it as well:
IRP_MJ_CLEANUP
IRP_MJ_CREATE
IRP_MJ_FILE_SYSTEM_CONTROL
IRP_MJ_FLUSH_BUFFERS
IRP_MJ_LOCK_CONTROL
IRP_MJ_READ
IRP_MJ_SET_INFORMATION
IRP_MJ_WRITE
FsRtlCheckOplock in \wrk-v1.2\base\ntos\fsrtl\oplock.c has handling for each of these IRPs.
It shows that when client2 already holds a handle and sends IRP_MJ_READ, the result is BreakToII, while IRP_MJ_WRITE results in BreakToNone:
NTSTATUS
FsRtlCheckOplock (
    __in POPLOCK Oplock,
    __in PIRP Irp,
    __in_opt PVOID Context,
    __in_opt POPLOCK_WAIT_COMPLETE_ROUTINE CompletionRoutine,
    __in_opt POPLOCK_FS_PREPOST_IRP PostIrpRoutine
    )
/*++
Routine Description:
    This routine is called as a support routine from a file system.
    It is used to synchronize I/O requests with the current Oplock
    state of a file.  If the I/O operation will cause the Oplock to
    break, that action is initiated.  If the operation cannot continue
    until the Oplock break is complete, STATUS_PENDING is returned and
    the caller supplied routine is called.
Arguments:
    Oplock - Supplies a pointer to the non-opaque oplock structure for
             this file.
    Irp - Supplies a pointer to the Irp which declares the requested
          operation.
    Context - This value is passed as a parameter to the completion routine.
    CompletionRoutine - This is the routine which is called if this
                        Irp must wait for an Oplock to break.  This
                        is a synchronous operation if not specified
                        and we block in this thread waiting on
                        an event.
    PostIrpRoutine - This is the routine to call before we put anything
                     on our waiting Irp queue.
Return Value:
    STATUS_SUCCESS if we can complete the operation on exiting this thread.
    STATUS_PENDING if we return here but hold the Irp.
    STATUS_CANCELLED if the Irp is cancelled before we return.
--*/
{
    NTSTATUS Status = STATUS_SUCCESS;
    PNONOPAQUE_OPLOCK ThisOplock = *Oplock;
    PIO_STACK_LOCATION IrpSp;
    DebugTrace( +1, Dbg, "FsRtlCheckOplock: Entered\n", 0 );
    DebugTrace( 0, Dbg, "Oplock -> %08lx\n", Oplock );
    DebugTrace( 0, Dbg, "Irp -> %08lx\n", Irp );
    //
    // If there is no oplock structure or this is system I/O, we allow
    // the operation to continue. Otherwise we check the major function code.
    //
    if ((ThisOplock != NULL) &&
        !FlagOn( Irp->Flags, IRP_PAGING_IO )) {
        OPLOCK_STATE OplockState;
        PFILE_OBJECT OplockFileObject;
        BOOLEAN BreakToII;
        BOOLEAN BreakToNone;
        ULONG CreateDisposition;
        //
        // Capture the file object first and then the oplock state to perform
        // the unsafe checks below. We capture the file object first in case
        // there is an exclusive oplock break in progress. Otherwise the oplock
        // state may indicate break in progress but it could complete by
        // the time we snap the file object.
        //
        OplockFileObject = ThisOplock->FileObject;
        OplockState = ThisOplock->OplockState;
        //
        // Examine the Irp for the appropriate action provided there are
        // current oplocks on the file.
        //
        if (OplockState != NoOplocksHeld) {
            BreakToII = FALSE;
            BreakToNone = FALSE;
            IrpSp = IoGetCurrentIrpStackLocation( Irp );
            //
            // Determine whether we are going to BreakToII or BreakToNone.
            //
            switch (IrpSp->MajorFunction) {
            case IRP_MJ_CREATE :
                //
                // If we are opening for attribute access only, we
                // return status success. Always break the oplock if this caller
                // wants a filter oplock.
                //
                if (!FlagOn( IrpSp->Parameters.Create.SecurityContext->DesiredAccess,
                             ~(FILE_READ_ATTRIBUTES | FILE_WRITE_ATTRIBUTES | SYNCHRONIZE) ) &&
                    !FlagOn( IrpSp->Parameters.Create.Options, FILE_RESERVE_OPFILTER )) {
                    break;
                }
                //
                // If there is a filter oplock granted and this create iS reading
                // the file then don't break the oplock as long as we share
                // for reads.
                //
                if (FlagOn( OplockState, FILTER_OPLOCK ) &&
                    !FlagOn( IrpSp->Parameters.Create.SecurityContext->DesiredAccess,
                             ~FILTER_OPLOCK_VALID_FLAGS ) &&
                    FlagOn( IrpSp->Parameters.Create.ShareAccess, FILE_SHARE_READ )) {
                    break;
                }
                //
                // We we are superseding or overwriting, then break to none.
                //
                CreateDisposition = (IrpSp->Parameters.Create.Options >> 24) & 0x000000ff;
                if ((CreateDisposition == FILE_SUPERSEDE) ||
                    (CreateDisposition == FILE_OVERWRITE) ||
                    (CreateDisposition == FILE_OVERWRITE_IF) ||
                    FlagOn( IrpSp->Parameters.Create.Options, FILE_RESERVE_OPFILTER )) {
                    BreakToNone = TRUE;
                } else {
                    BreakToII = TRUE;
                }
                break;
            case IRP_MJ_READ :
                //
                // If a filter oplock has been granted then do nothing.
                // We will assume the oplock will have been broken
                // if this create needed to do that.
                //
                if (!FlagOn( OplockState, FILTER_OPLOCK )) {
                    BreakToII = TRUE;
                }
                break;
            case IRP_MJ_FLUSH_BUFFERS :
                BreakToII = TRUE;
                break;
            case IRP_MJ_CLEANUP :
                FsRtlOplockCleanup( (PNONOPAQUE_OPLOCK) *Oplock,
                                    IrpSp );
                break;
            case IRP_MJ_LOCK_CONTROL :
                //
                // If a filter oplock has been granted then do nothing.
                // We will assume the oplock will have been broken
                // if this create needed to do that.
                //
                if (FlagOn( OplockState, FILTER_OPLOCK )) {
                    break;
                }
            case IRP_MJ_WRITE :
                BreakToNone = TRUE;
                break;
            case IRP_MJ_SET_INFORMATION :
                //
                // We are only interested in calls that shrink the file size
                // or breaking batch oplocks for the rename case.
                //
                switch (IrpSp->Parameters.SetFile.FileInformationClass) {
                case FileEndOfFileInformation :
                    //
                    // Break immediately if this is the lazy writer callback.
                    //
                    if (IrpSp->Parameters.SetFile.AdvanceOnly) {
                        break;
                    }
                case FileAllocationInformation :
                    BreakToNone = TRUE;
                    break;
                case FileRenameInformation :
                case FileLinkInformation :
                case FileShortNameInformation :
                    if (FlagOn( OplockState, BATCH_OPLOCK | FILTER_OPLOCK )) {
                        BreakToNone = TRUE;
                    }
                    break;
                }
            case IRP_MJ_FILE_SYSTEM_CONTROL :
                //
                // We need to break to none if this is a zeroing operation.
                //
                if (IrpSp->Parameters.FileSystemControl.FsControlCode == FSCTL_SET_ZERO_DATA) {
                    BreakToNone = TRUE;
                }
            }
            if (BreakToII) {
                //
                // If there are no outstanding oplocks or level II oplocks are held,
                // we can return immediately. If the first two tests fail then there
                // is an exclusive oplock. If the file objects match we allow the
                // operation to continue.
                //
                if ((OplockState != OplockIIGranted) &&
                    (OplockFileObject != IrpSp->FileObject)) {
                    Status = FsRtlOplockBreakToII( (PNONOPAQUE_OPLOCK) *Oplock,
                                                   IrpSp,
                                                   Irp,
                                                   Context,
                                                   CompletionRoutine,
                                                   PostIrpRoutine );
                }
            } else if (BreakToNone) {
                //
                // If there are no oplocks, we can return immediately.
                // Otherwise if there is no level 2 oplock and this file
                // object matches the owning file object then this write is
                // on behalf of the owner of the oplock.
                //
                if ((OplockState == OplockIIGranted) ||
                    (OplockFileObject != IrpSp->FileObject)) {
                    Status = FsRtlOplockBreakToNone( (PNONOPAQUE_OPLOCK) *Oplock,
                                                     IrpSp,
                                                     Irp,
                                                     Context,
                                                     CompletionRoutine,
                                                     PostIrpRoutine );
                }
            }
        }
    }
    DebugTrace( -1, Dbg, "FsRtlCheckOplock: Exit -> %08lx\n", Status );
    return Status;
}
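Two details of the listing are easy to miss: the create disposition is taken from the top byte of Parameters.Create.Options, and the chosen break is only issued after a final check against the oplock state and the owning file object. The sketch below restates just those two pieces as standalone functions. It is a simplified restatement under stated assumptions, not the WRK code: the SK_ constants are local copies of header values, the function names are hypothetical, and the IRP_MJ_CREATE early exits (attribute-only opens, read-sharing opens under a filter oplock) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Create dispositions and option flag, copied locally for the sketch. */
#define SK_FILE_SUPERSEDE        0u
#define SK_FILE_OPEN             1u
#define SK_FILE_CREATE           2u
#define SK_FILE_OPEN_IF          3u
#define SK_FILE_OVERWRITE        4u
#define SK_FILE_OVERWRITE_IF     5u
#define SK_FILE_RESERVE_OPFILTER 0x00100000u

typedef enum { BREAK_TO_II, BREAK_TO_NONE } break_kind;

/* The disposition lives in the top byte of the create Options value. */
uint32_t create_disposition(uint32_t options)
{
    return (options >> 24) & 0x000000ffu;
}

/* Break kind chosen for a create that did not take an early exit. */
break_kind create_break_kind(uint32_t options)
{
    uint32_t disposition = create_disposition(options);

    assert(disposition <= SK_FILE_OVERWRITE_IF);
    if (disposition == SK_FILE_SUPERSEDE ||
        disposition == SK_FILE_OVERWRITE ||
        disposition == SK_FILE_OVERWRITE_IF ||
        (options & SK_FILE_RESERVE_OPFILTER) != 0) {
        return BREAK_TO_NONE;   /* the file data is about to be replaced */
    }
    return BREAK_TO_II;
}

/* Final gate: is the break routine actually called? */
bool break_is_issued(break_kind kind, bool level2_granted, bool same_file_object)
{
    if (kind == BREAK_TO_II) {
        /* Level 2 holders need no break; the owner never breaks itself. */
        return !level2_granted && !same_file_object;
    }
    /* Break to none: level 2 oplocks are always broken; an exclusive
     * oplock is broken only by someone other than its owner. */
    return level2_granted || !same_file_object;
}
```

For instance, an open with disposition FILE_OPEN maps to BreakToII, which explains why a plain second open leaves client1 with a chance to keep a level 2 oplock, while an overwrite forces a break to none.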
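Stepping into FsRtlOplockBreakToII or FsRtlOplockBreakToNone, the oplock break they trigger is largely the same as for IRP_MJ_CREATE; the only difference is that IRP_MJ_CREATE can bail out early when FILE_COMPLETE_IF_OPLOCKED is set. So it looks as though, if client2 obtains a handle and then sends IRP_MJ_READ while client1 has not yet sent the acknowledge, the read IRP will block as well.
This MSDN page has more detail (for example, IRP_MJ_READ does not block when the current oplock is a level 2 or filter oplock): http://msdn.microsoft.com/en-us/library/cc308447.aspx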
        if ((IrpSp->MajorFunction == IRP_MJ_CREATE) &&
            FlagOn( IrpSp->Parameters.Create.Options, FILE_COMPLETE_IF_OPLOCKED )) {
            DebugTrace( 0, Dbg, "Don't block open\n", 0 );
            try_return( Status = STATUS_OPLOCK_BREAK_IN_PROGRESS );
        }
        //
        // If we get here that means that this operation can't continue
        // until the oplock break is complete.
        //
        // FsRtlWaitOnIrp will release the mutex.
        //
        AcquiredMutex = FALSE;
        Status = FsRtlWaitOnIrp( Oplock,
                                 Irp,
                                 Context,
                                 CompletionRoutine,
                                 PostIrpRoutine,
                                 &Event );
http://www.osronline.com/article.cfm?id=90
https://files.cnblogs.com/kkindof/Understand_oplock_and_avoid_sharing_violation.zip
