NLucene Research Series (9): Store

Store

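The Store package consists mainly of classes that operate on files. Its purpose is to provide a storage abstraction that is independent of the platform's file system, offering a directory service (creating and deleting files) as well as input and output streams.
The main classes are three abstract classes: Directory, InputStream, and OutputStream.
FSDirectory and RAMDirectory extend the abstract class Directory;
FSInputStream and RAMInputStream extend the abstract class InputStream;
FSOutputStream and RAMOutputStream extend the abstract class OutputStream.
Classes prefixed with FS are backed by the real file system, while classes prefixed with RAM implement a virtual file system in memory. The virtual file class is RAMFile, which contains: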
Vector buffers = new Vector();
long length;
long lastModified = System.currentTimeMillis();
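RAMFile represents a file's storage space as a Vector of byte-array buffers. Implementing each operation on top of this structure yields a virtual file system that lives entirely in memory.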
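To make the buffer layout concrete, here is a minimal sketch that assumes fixed 1024-byte chunks. The class name ChunkedRamFile and its append/byteAt methods are hypothetical and not Lucene's API. The point is how an absolute file position maps to a chunk index (pos / 1024) and to an offset inside that chunk (pos % 1024).

```java
import java.util.Vector;

/**
 * Hypothetical RAMFile-like store: the file's bytes live in a Vector of
 * fixed-size byte[] chunks. CHUNK_SIZE = 1024 is an assumption.
 */
class ChunkedRamFile {
    static final int CHUNK_SIZE = 1024;
    final Vector<byte[]> buffers = new Vector<>();
    long length;
    long lastModified = System.currentTimeMillis();

    /** Appends bytes at the end of the file, allocating new chunks as needed. */
    void append(byte[] src, int offset, int len) {
        while (len > 0) {
            int chunkIndex = (int) (length / CHUNK_SIZE);  // which chunk the next byte goes into
            int chunkOffset = (int) (length % CHUNK_SIZE); // where inside that chunk
            if (chunkIndex == buffers.size()) {
                buffers.addElement(new byte[CHUNK_SIZE]);  // file grew past the last chunk
            }
            byte[] chunk = buffers.elementAt(chunkIndex);
            int n = Math.min(len, CHUNK_SIZE - chunkOffset); // never copy past the chunk end
            System.arraycopy(src, offset, chunk, chunkOffset, n);
            offset += n;
            len -= n;
            length += n;
        }
        lastModified = System.currentTimeMillis();
    }

    /** Returns the byte stored at an absolute file position. */
    byte byteAt(long pos) {
        if (pos < 0 || pos >= length) throw new IndexOutOfBoundsException("pos " + pos);
        return buffers.elementAt((int) (pos / CHUNK_SIZE))[(int) (pos % CHUNK_SIZE)];
    }
}
```

With this layout, a 2500-byte file occupies three chunks, and byte 1024 is the first byte of the second chunk.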
Let's look at this in-memory virtual file system in more detail.
In the RAMDirectory class, file information is stored in a Hashtable:
Hashtable files = new Hashtable();
The method that creates a file is:
public final OutputStream createFile(String name)
{
  RAMFile file = new RAMFile();
  files.put(name, file);
  return new RAMOutputStream(file);
}

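In other words, the name and its RAMFile are stored together in the Hashtable.
An in-memory file is opened like this:
public final InputStream openFile(String name)
{
  RAMFile file = (RAMFile)files.get(name);
  return new RAMInputStream(file);
}
In effect, a single Hashtable is enough to form a file system in memory.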
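The idea that a Hashtable alone forms the file system fits in a few lines. This is a simplified illustration, not NLucene's code. MemFile and MemDirectory are hypothetical names, a file is just a byte array, and openFile returns a plain java.io.ByteArrayInputStream. One difference is deliberate. In the openFile shown above, a missing name makes files.get return null, and the RAMInputStream constructor then fails with a NullPointerException on file.length. The sketch throws FileNotFoundException up front instead.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.util.Hashtable;

/** Hypothetical in-memory file: a byte array plus a timestamp. */
class MemFile {
    byte[] data = new byte[0];
    long lastModified = System.currentTimeMillis();
}

/** Sketch of the RAMDirectory idea: the whole "file system" is one Hashtable. */
class MemDirectory {
    private final Hashtable<String, MemFile> files = new Hashtable<>();

    /** Registers a fresh, empty file under the name, replacing any previous one. */
    MemFile createFile(String name) {
        MemFile file = new MemFile();
        files.put(name, file);
        return file;
    }

    /** Opens an existing file for reading; fails loudly if the name is unknown. */
    ByteArrayInputStream openFile(String name) throws FileNotFoundException {
        MemFile file = files.get(name);
        if (file == null) throw new FileNotFoundException(name);
        return new ByteArrayInputStream(file.data);
    }

    boolean fileExists(String name) { return files.containsKey(name); }

    void deleteFile(String name) { files.remove(name); }

    String[] list() { return files.keySet().toArray(new String[0]); }
}
```

Because Hashtable's methods are synchronized, concurrent createFile and openFile calls cannot corrupt the table itself.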
Next is RAMInputStream. It inherits the methods of the InputStream class and overrides readInternal and seekInternal.
The constructor:
public RAMInputStream(RAMFile f)
{
  file = f;
  length = file.length;
}
This is straightforward.
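The constructor only records the file and its length; the actual copying happens in readInternal. Below is a minimal sketch of such a copy loop, assuming the file is a Vector of 1024-byte chunks. A read that straddles a chunk boundary is split so that each System.arraycopy stays within one chunk. ChunkReader, its explicit pointer field and CHUNK_SIZE are hypothetical names for illustration.

```java
import java.util.Vector;

/** Hypothetical RAMInputStream-style reader over a file stored as fixed-size chunks. */
class ChunkReader {
    static final int CHUNK_SIZE = 1024;
    private final Vector<byte[]> buffers;
    private final long length;
    private long pointer = 0; // absolute position of the next byte to read

    ChunkReader(Vector<byte[]> buffers, long length) {
        this.buffers = buffers;
        this.length = length;
    }

    /** Analogue of readInternal: fills b[offset..offset+len) starting at pointer. */
    void readInternal(byte[] b, int offset, int len) {
        if (pointer + len > length) throw new IndexOutOfBoundsException("read past EOF");
        int remainder = len;
        while (remainder > 0) {
            int chunkIndex = (int) (pointer / CHUNK_SIZE);
            int chunkOffset = (int) (pointer % CHUNK_SIZE);
            int n = Math.min(remainder, CHUNK_SIZE - chunkOffset); // stay inside one chunk
            System.arraycopy(buffers.elementAt(chunkIndex), chunkOffset, b, offset, n);
            offset += n;
            pointer += n;
            remainder -= n;
        }
    }

    /** Analogue of seekInternal: random access in memory is just moving the pointer. */
    void seekInternal(long pos) {
        pointer = pos;
    }
}
```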
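This class relies heavily on methods of the abstract class InputStream. Lucene defines its own data types on top of a buffer. It sets BUFFER_SIZE to 1024 bytes and keeps a bufferPosition field that points to the current position within the buffer. Four basic data types are built on this buffer.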
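The following is an illustrative sketch, not Lucene's actual code, of how typed reads can sit on top of a 1024-byte buffer with a bufferPosition cursor. The class names and the readAt hook are assumptions. readInt (four bytes, high-order byte first) and readVInt (seven data bits per byte, high bit set when more bytes follow) follow the encodings described in Lucene's file-format documentation.

```java
/** Hypothetical buffered reader: refills a 1024-byte buffer on demand. */
abstract class BufferedInputSketch {
    static final int BUFFER_SIZE = 1024;
    private final byte[] buffer = new byte[BUFFER_SIZE];
    private long bufferStart = 0;   // file position of buffer[0]
    private int bufferLength = 0;   // number of valid bytes in buffer
    private int bufferPosition = 0; // next byte in buffer to hand out
    protected long length;          // total file length, set by subclasses

    /** Hypothetical hook: copy len bytes starting at file position pos into b. */
    protected abstract void readAt(long pos, byte[] b, int offset, int len);

    byte readByte() {
        if (bufferPosition >= bufferLength) refill();
        return buffer[bufferPosition++];
    }

    /** 32-bit int from four bytes, high-order byte first. */
    int readInt() {
        return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
             | ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
    }

    /** Variable-length int: low 7 bits per byte, high bit means "another byte follows". */
    int readVInt() {
        byte b = readByte();
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = readByte();
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    /** Loads the next block of the file into the buffer. */
    private void refill() {
        bufferStart += bufferLength;
        if (bufferStart >= length) throw new IllegalStateException("read past EOF");
        bufferLength = (int) Math.min(BUFFER_SIZE, length - bufferStart);
        readAt(bufferStart, buffer, 0, bufferLength);
        bufferPosition = 0;
    }
}

/** Hypothetical concrete reader over an in-memory byte array. */
class ByteArrayInputSketch extends BufferedInputSketch {
    private final byte[] data;

    ByteArrayInputSketch(byte[] data) {
        this.data = data;
        this.length = data.length;
    }

    protected void readAt(long pos, byte[] b, int offset, int len) {
        System.arraycopy(data, (int) pos, b, offset, len);
    }
}
```

For example, the bytes 0x96 0x01 decode as a VInt to 22 + (1 << 7) = 150.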

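In short, both InputStream and OutputStream use a 1024-byte buffer stored in a byte array. Locating a position by array index takes constant time, so positioning within the buffer is fast. For files on disk, overall speed is determined mainly by disk seek time. Access in memory is faster still because the cost of reading the file from disk is avoided. This is why it pays to keep frequently used information in memory. Google is said to keep at least 256 MB of its dictionary alone in memory, which illustrates the point. That covers the in-memory file system.

Lucene also pays close attention to thread safety, as its choice of Java APIs shows. It uses Hashtable rather than HashMap and Vector rather than ArrayList, since the former of each pair are synchronized. It also provides an abstract Lock class to control reading and writing of files.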
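The Lock idea can be sketched as an abstract class with an obtain/release pair, plus a file-based implementation that treats the existence of a marker file as "locked". LockSketch, FileLockSketch and their method names are assumptions for illustration, not a transcription of Lucene's Lock class.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical abstract lock: try to acquire, then release. */
abstract class LockSketch {
    /** Tries to acquire the lock; returns false if it is already held. */
    abstract boolean obtain() throws IOException;

    /** Gives the lock up. */
    abstract void release();
}

/** Hypothetical file-based lock: held while the marker file exists. */
class FileLockSketch extends LockSketch {
    private final File lockFile;

    FileLockSketch(File lockFile) { this.lockFile = lockFile; }

    boolean obtain() throws IOException {
        // createNewFile checks for existence and creates the file as one atomic step,
        // returning false if the file was already there.
        return lockFile.createNewFile();
    }

    void release() {
        lockFile.delete();
    }
}
```

A caveat on this design: the Java documentation for File.createNewFile advises against using it for file locking and recommends java.nio.channels.FileLock instead. A stale marker file left behind by a crashed process would also keep the lock "held" until someone removes it.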
Next come the classes that read and write real files: FSDirectory, FSInputStream, and FSOutputStream. FSDirectory is mainly responsible for reading a directory (folder). FSDirectory instances are kept in a Hashtable as (File, FSDirectory) pairs.
The create() method:
private synchronized void create() throws IOException
{
  if (!directory.exists())
    if (!directory.mkdirs())
      throw new IOException("Cannot create directory: " + directory);
  String[] files = directory.list(); // clear old files
  for (int i = 0; i < files.length; i++) {
    File file = new File(directory, files[i]);
    if (!file.delete())
      throw new IOException("Cannot delete " + files[i]);
  }
}
A directory is obtained by looking it up in that Hashtable first:
public static FSDirectory getDirectory(File file, boolean create)
    throws IOException {
  file = new File(file.getCanonicalPath());
  FSDirectory dir;
  synchronized (DIRECTORIES) {
    dir = (FSDirectory)DIRECTORIES.get(file); // look it up in the Hashtable first
    if (dir == null) {                        // not cached yet: create and cache it
      dir = new FSDirectory(file, create);
      DIRECTORIES.put(file, dir);
    } else if (create) {                      // already cached, but a fresh directory was requested
      dir.create();
    }
  }
  return dir;
}
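getDirectory is a per-path singleton cache. Canonicalizing the path first means that different spellings of the same folder map to the same Hashtable key. The synchronized block keeps two threads from creating two instances for one directory. The sketch below strips this down to just that pattern. CachedDir is a hypothetical stand-in for FSDirectory, and the create flag and file clearing are left out.

```java
import java.io.File;
import java.io.IOException;
import java.util.Hashtable;

/** Hypothetical stand-in for FSDirectory: at most one instance per canonical path. */
class CachedDir {
    private static final Hashtable<File, CachedDir> DIRECTORIES = new Hashtable<>();
    final File directory;

    private CachedDir(File directory) { this.directory = directory; }

    static CachedDir get(File path) throws IOException {
        // Normalize the path (removes "." and "..", resolves symlinks on Unix)
        // so that equivalent paths produce equal keys.
        File key = new File(path.getCanonicalPath());
        synchronized (DIRECTORIES) {
            CachedDir dir = DIRECTORIES.get(key);
            if (dir == null) {
                dir = new CachedDir(key);
                DIRECTORIES.put(key, dir);
            }
            return dir;
        }
    }
}
```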
FSDirectory works mainly with the File class and handles creating and reading directories.
FSInputStream: random-access reading is implemented through a private inner class that extends RandomAccessFile from Java's IO package.
The inner class:
private class Descriptor extends RandomAccessFile
{
  // private String name;
  public long position;
  public Descriptor(File file, String mode) throws IOException // mode: file access mode, e.g. "r"
  {
    super(file, mode);
    // name = file.toString();
    // debug_printInfo("OPEN");
  }
}

Descriptor file = null; // instance of the inner class
public FSInputStream(File path) throws IOException { // open a file for reading
  file = new Descriptor(path, "r");
  length = file.length();
}
/** The actual read method. */
protected final void readInternal(byte[] b, int offset, int len)
    throws IOException {
  synchronized (file)
  {
    long position = getFilePointer(); // file position where this read starts
    if (position != file.position) {
      file.seek(position);            // move the underlying file pointer there
      file.position = position;
    }
    int total = 0;
    do {
      int i = file.read(b, offset+total, len-total); // read data into byte array b
      if (i == -1)
        throw new IOException("read past EOF");      // reached end of file
      file.position += i;
      total += i;
    } while (total < len);
  }
}
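readInternal combines two ideas. It re-seeks only when the underlying file pointer is not already where the read must start. It also loops, because RandomAccessFile.read may return fewer bytes than requested. The self-contained sketch below shows this read path. It uses a hypothetical SharedFileReader class and takes an explicit pos argument in place of getFilePointer().

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical reader: one RandomAccessFile, with its position tracked to avoid redundant seeks. */
class SharedFileReader {
    private final RandomAccessFile file;
    private long sharedPosition = 0; // mirrors the file pointer of `file`

    SharedFileReader(File path) throws IOException {
        file = new RandomAccessFile(path, "r");
    }

    /** Reads exactly len bytes starting at pos into b[offset..]. */
    void readAt(long pos, byte[] b, int offset, int len) throws IOException {
        synchronized (file) {
            if (pos != sharedPosition) { // only seek when the pointer is elsewhere
                file.seek(pos);
                sharedPosition = pos;
            }
            int total = 0;
            while (total < len) {        // read() may return fewer bytes than asked
                int n = file.read(b, offset + total, len - total);
                if (n == -1) throw new IOException("read past EOF");
                sharedPosition += n;
                total += n;
            }
        }
    }

    void close() throws IOException { file.close(); }
}
```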
FSOutputStream:
It holds a RandomAccessFile:
RandomAccessFile file = null;
which is then opened for both reading and writing.
public final void seek(long pos) throws IOException {
  super.seek(pos);
  file.seek(pos);
} // move to a position
public final void flushBuffer(byte[] b, int size) throws IOException {
  file.write(b, 0, size);
} // write data to the file
protected final void finalize() throws IOException
{
  file.close(); // close the file when the object is garbage-collected
}
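The FSOutputStream pieces fit together as a buffered writer over a RandomAccessFile opened in "rw" mode. The sketch below assumes a 1024-byte buffer, and its class name is hypothetical. One design difference is intentional: the file is closed in an explicit close(), which works with try-with-resources, rather than in finalize(). Finalization timing is not guaranteed, and Object.finalize has been deprecated since Java 9.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical buffered writer in the spirit of FSOutputStream. */
class BufferedFileWriterSketch implements AutoCloseable {
    static final int BUFFER_SIZE = 1024;
    private final RandomAccessFile file;
    private final byte[] buffer = new byte[BUFFER_SIZE];
    private int bufferPosition = 0; // bytes currently waiting in the buffer
    private long bufferStart = 0;   // file position where buffer[0] will land

    BufferedFileWriterSketch(File path) throws IOException {
        file = new RandomAccessFile(path, "rw"); // open for reading and writing
    }

    void writeByte(byte b) throws IOException {
        if (bufferPosition >= BUFFER_SIZE) flush();
        buffer[bufferPosition++] = b;
    }

    /** Writes pending bytes at the current file pointer, like flushBuffer(b, size). */
    void flush() throws IOException {
        file.write(buffer, 0, bufferPosition);
        bufferStart += bufferPosition;
        bufferPosition = 0;
    }

    /** Flushes, then moves both the logical position and the file pointer, like seek(pos). */
    void seek(long pos) throws IOException {
        flush();
        bufferStart = pos;
        file.seek(pos);
    }

    long getFilePointer() { return bufferStart + bufferPosition; }

    @Override
    public void close() throws IOException {
        flush();
        file.close();
    }
}
```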

posted @ 2010-11-28 15:18 Anders.Lee
