Locks in DotLucene...
When a Lucene index is accessed concurrently, a few rules keep the index files intact. In short:
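1. Read-only operations can run at any time, including while documents are being added, deleted, merged, or optimized.
2. At any given moment, write operations may go through only a single object instance (an IndexReader or an IndexWriter).
To enforce these rules, Lucene uses file-system-based locks: it creates lock files for the index in the system temp directory (java.io.tmpdir in Java, System.IO.Path.GetTempPath() in .NET). The file names look like this: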
lucene-b55cb101e5b6f8eab981273d4d641ede-write.lock
lucene-b55cb101e5b6f8eab981273d4d641ede-commit.lock
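The middle part is an MD5 hash of the index path, which tells different indexes apart. The suffix is the lock type. The two types are described in the passage below, quoted in its original English.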
The write.lock file is used to keep processes from concurrently attempting to modify an index. More precisely, the write.lock is obtained by IndexWriter when IndexWriter is instantiated and kept until it’s closed. The same lock file is also obtained by IndexReader when it’s used for deleting Documents, undeleting them, or setting Field norms. As such, write.lock tends to lock the index for writing for longer periods of time.
The commit.lock is used whenever segments are being read or merged. It’s obtained by an IndexReader before it reads the segments file, which names all index segments, and it’s released only after IndexReader has opened and read all the referenced segments. IndexWriter also obtains the commit.lock right before it creates a new segments file and keeps it until it removes the index files that have been made obsolete by operations such as segment merges.
Thus, the commit.lock may be created more frequently than the write.lock, but it should never lock the index for long since during its existence index files are only opened or deleted and only a small segments file is written to disk.
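As a side note on the file names shown earlier, here is a minimal sketch of how such a name could be derived. The class name LockNames and the choice of UTF-8 for turning the path into bytes are assumptions made for illustration; the actual port may canonicalize the path first and use a different encoding, so the hash it produces for a given path can differ.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class LockNames
{
    // Builds "lucene-<md5 hex of indexPath>-<lockName>".
    // Assumption: the path is hashed as UTF-8 bytes (hypothetical choice).
    public static string For(string indexPath, string lockName)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(indexPath));
            StringBuilder sb = new StringBuilder("lucene-");
            foreach (byte b in digest)
                sb.Append(b.ToString("x2")); // two lowercase hex digits per byte
            return sb.Append('-').Append(lockName).ToString();
        }
    }
}
```

Calling LockNames.For(path, "write.lock") and LockNames.For(path, "commit.lock") for the same path gives two names that share the 32-character hash and differ only in the suffix.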
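So when a process or thread wants to write to the index, it first has to check whether the corresponding lock file already exists; if it does, it must wait until the other party releases the index.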
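The "check, then wait" behavior just described could look roughly like the polling loop below. This is only a sketch: LockWaiter, ObtainOrThrow, and the tryObtain delegate are hypothetical names standing in for whatever Obtain() method the lock exposes, and the timeout and poll interval are left to the caller.

```csharp
using System;
using System.IO;
using System.Threading;

static class LockWaiter
{
    // Calls tryObtain repeatedly until it returns true or the timeout elapses.
    // tryObtain is a stand-in for a lock's Obtain() method (hypothetical wiring).
    public static void ObtainOrThrow(Func<bool> tryObtain, TimeSpan timeout, TimeSpan pollInterval)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (!tryObtain())
        {
            if (DateTime.UtcNow >= deadline)
                throw new IOException("Lock obtain timed out");
            Thread.Sleep(pollInterval); // give the current holder time to release
        }
    }
}
```

A loop like this is only useful if the underlying Obtain() really reports a held lock by returning false.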
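In the DotLucene implementation, look at the final try/catch block of Obtain(), the one wrapped around lockFile.Create(). The porter apparently expected the code to throw when the lock file already exists, so that the method would return false to signal that the index cannot be written. But FileInfo.Create() does not throw when the file already exists; it simply overwrites it. So this implementation never returns false merely because the lock is already held, which means the FSDirectory lock does nothing... A moment of silence...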
Lucene.Net.Store.FSDirectory.AnonymousClassLock
public override bool Obtain()
{
    if (Lucene.Net.Store.FSDirectory.disableLocks)
        return true;
    // Make sure the lock directory exists before touching the lock file.
    bool tmpBool;
    if (System.IO.File.Exists(Enclosing_Instance.lockDir.FullName))
        tmpBool = true;
    else
        tmpBool = System.IO.Directory.Exists(Enclosing_Instance.lockDir.FullName);
    if (!tmpBool)
    {
        try
        {
            System.IO.Directory.CreateDirectory(Enclosing_Instance.lockDir.FullName);
        }
        catch (Exception)
        {
            throw new System.IO.IOException("Cannot create lock directory: " + Enclosing_Instance.lockDir);
        }
    }
    // The problematic part: Create() overwrites an existing file instead of
    // throwing, so an already-held lock does not make this return false.
    try
    {
        System.IO.FileStream createdFile = lockFile.Create();
        createdFile.Close();
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}
if(lockFile.Exists) return false; Deciding the return value by checking whether the file exists looks like the implementation that matches the original intent.
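One caveat with a separate Exists check is that another process can create the file between the check and the Create() call. A possible alternative, sketched here and not taken from DotLucene, is to let the operating system do the check and the creation in one step with FileMode.CreateNew, which throws an IOException when the file already exists. The names AtomicLockFile, TryObtain, and Release are hypothetical.

```csharp
using System.IO;

static class AtomicLockFile
{
    // Returns true only if this call created the lock file.
    // FileMode.CreateNew fails when the file already exists, so checking
    // and creating happen as a single operation.
    public static bool TryObtain(string lockPath)
    {
        try
        {
            using (new FileStream(lockPath, FileMode.CreateNew, FileAccess.Write, FileShare.None))
            {
                return true;
            }
        }
        catch (IOException)
        {
            // "File already exists" lands here; other I/O failures are
            // likewise reported as "lock not obtained".
            return false;
        }
    }

    // Releases the lock by deleting the file.
    public static void Release(string lockPath)
    {
        File.Delete(lockPath);
    }
}
```

With this shape, a second TryObtain on the same path returns false until Release has been called.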
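With the existence check in place, client code written like the following raises an exception, whereas with the original implementation nothing happens at all. What would happen under multithreaded access is anyone's guess...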
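IndexWriter writer1 = null;
IndexWriter writer2 = null;
try
{
    // Both writers target the same index directory.
    writer1 = new IndexWriter(@"Index", FullTextAnalyzer.Instance(), true);
    // With a working lock, this second writer cannot obtain write.lock.
    writer2 = new IndexWriter(@"Index", FullTextAnalyzer.Instance(), true);
    writer1.AddDocument(new Document());
    writer2.AddDocument(new Document());
}
catch (Exception)
{
    // writer1 is still null if its own constructor was the one that failed.
    if (writer1 != null)
        writer1.Close();
    throw; // rethrow without resetting the stack trace
}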
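Conclusion:
Playing around with someone else's code takes no worrying at all; using it in a real project, though... be careful, careful, and careful again...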