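In the previous article we mentioned that the object obtained via Recorder httpRecorder = Recorder.getHttpRecorder() wraps the output and input streams of the socket connection. Below we look at how the Recorder class wraps the socket's input and output streams.
The important members of the Recorder class are shown below. They consist mainly of the wrapped input and output streams, whose recorded data is cached to local files, and a ReplayCharSequence that exposes the recorded content as an ordered character sequence.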
private RecordingInputStream ris = null;
private RecordingOutputStream ros = null;

/**
 * Backing file basename.
 *
 * Keep it around so can clean up backing files left on disk.
 */
private String backingFileBasename = null;

/**
 * Backing file output stream suffix.
 */
private static final String RECORDING_OUTPUT_STREAM_SUFFIX = ".ros";

/**
 * Backing file input stream suffix.
 */
private static final String RECORDING_INPUT_STREAM_SUFFIX = ".ris";

/**
 * recording-input (ris) content character encoding.
 */
protected String characterEncoding = null;

/**
 * Charset to use for CharSequence provision. Will be UTF-8 if no
 * encoding ever requested; a Charset matching above characterEncoding
 * if possible; ISO_8859 if above characterEncoding is unsatisfiable.
 * TODO: unify to UTF-8 for unspecified and bad-specified cases?
 * (current behavior is for consistency with our prior but perhaps not
 * optimal behavior)
 */
protected Charset charset = Charsets.UTF_8;

/** whether recording-input (ris) message-body is chunked */
protected boolean inputIsChunked = false;

/** recording-input (ris) entity content-encoding (eg gzip, deflate), if any */
protected String contentEncoding = null;

private ReplayCharSequence replayCharSequence;
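The Javadoc on the charset field describes a three-way fallback rule. A minimal sketch of that rule follows. It assumes that "ISO_8859" means ISO-8859-1, and it uses a hypothetical CharsetChoice class, not Heritrix's own setter, whose code is not shown here.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper following the rule stated in the charset field's Javadoc.
// Assumption: the "ISO_8859" fallback mentioned there is ISO-8859-1.
class CharsetChoice {
    static Charset resolve(String characterEncoding) {
        if (characterEncoding == null) {
            return StandardCharsets.UTF_8;          // no encoding ever requested
        }
        try {
            return Charset.forName(characterEncoding.trim()); // matching Charset if possible
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return StandardCharsets.ISO_8859_1;     // requested encoding is unsatisfiable
        }
    }
}
```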
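The RecordingInputStream ris and RecordingOutputStream ros objects decorate the socket's input and output streams respectively. They record the data that flows through them: it is kept in an in-memory buffer and overflows to a backing file on local disk. This is an application of the Decorator pattern. I won't analyze their methods here; readers unfamiliar with this can consult material on Java input/output streams and the Decorator pattern.
The constructors create the wrapped input and output stream objects. The two backing files are named by appending the .ris and .ros suffixes to the absolute path of the backing file basename.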
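The Decorator idea can be illustrated without the Heritrix classes. In the sketch below, TeeRecordingInputStream is a hypothetical class. It extends java.io.FilterInputStream and copies every byte read through it into a side OutputStream. Unlike RecordingInputStream, it keeps the recording in memory only and never spills to a backing file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical decorator, not a Heritrix class: records everything read
// through it into a side OutputStream (memory only in this sketch).
class TeeRecordingInputStream extends FilterInputStream {
    private final OutputStream recording;

    TeeRecordingInputStream(InputStream in, OutputStream recording) {
        super(in);
        this.recording = recording;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            recording.write(b);            // keep a copy of the byte before returning it
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            recording.write(buf, off, n);  // keep a copy of exactly the bytes read
        }
        return n;
    }
}

class TeeDemo {
    // Usage: the caller reads through the decorator; the recording fills up on the side.
    static String roundTrip(String text) throws IOException {
        ByteArrayOutputStream recording = new ByteArrayOutputStream();
        try (InputStream in = new TeeRecordingInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)), recording)) {
            byte[] buf = new byte[4];
            while (in.read(buf) != -1) {
                // consume the data in small chunks
            }
        }
        return new String(recording.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

FilterInputStream.skip() delegates straight to the wrapped stream, so skipped bytes are not recorded in this sketch.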
/**
 * Create an HttpRecorder.
 *
 * @param tempDir Directory into which we drop backing files for
 * recorded input and output.
 * @param backingFilenameBase Backing filename base to which we'll append
 * suffices <code>ris</code> for recorded input stream and
 * <code>ros</code> for recorded output stream.
 * @param outBufferSize Size of output buffer to use.
 * @param inBufferSize Size of input buffer to use.
 */
public Recorder(File tempDir, String backingFilenameBase,
        int outBufferSize, int inBufferSize) {
    this(new File(ensure(tempDir), backingFilenameBase),
            outBufferSize, inBufferSize);
}

private static File ensure(File tempDir) {
    try {
        org.archive.util.FileUtils.ensureWriteableDirectory(tempDir);
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
    return tempDir;
}

public Recorder(File file, int outBufferSize, int inBufferSize) {
    super();
    this.backingFileBasename = file.getAbsolutePath();
    this.ris = new RecordingInputStream(inBufferSize,
            this.backingFileBasename + RECORDING_INPUT_STREAM_SUFFIX);
    this.ros = new RecordingOutputStream(outBufferSize,
            this.backingFileBasename + RECORDING_OUTPUT_STREAM_SUFFIX);
}
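The constructor chain comes down to two steps. First it makes sure the temp directory is usable, and then it derives both backing-file names from one absolute basename. A rough JDK-only approximation follows. BackingFiles is a hypothetical name. Its ensureWriteable only approximates the intent of org.archive.util.FileUtils.ensureWriteableDirectory, whose exact behavior is not shown in this article.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper mirroring the constructor chain above. It is NOT
// Heritrix's FileUtils.ensureWriteableDirectory, only an approximation.
class BackingFiles {
    static final String RIS_SUFFIX = ".ris";
    static final String ROS_SUFFIX = ".ros";

    static File ensureWriteable(File dir) {
        try {
            Files.createDirectories(dir.toPath());   // no-op if the directory already exists
        } catch (IOException e) {
            throw new IllegalStateException(e);      // same wrapping as Recorder.ensure()
        }
        if (!dir.canWrite()) {
            throw new IllegalStateException("not writable: " + dir);
        }
        return dir;
    }

    // Returns {input backing file, output backing file} as absolute paths.
    static String[] names(File tempDir, String base) {
        String basename = new File(ensureWriteable(tempDir), base).getAbsolutePath();
        return new String[] { basename + RIS_SUFFIX, basename + ROS_SUFFIX };
    }
}
```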
The methods that wrap (decorate) the input and output streams are as follows:
/**
 * Wrap the provided stream with the internal RecordingInputStream
 *
 * open() throws an exception if RecordingInputStream is already open.
 *
 * @param is InputStream to wrap.
 *
 * @return The input stream wrapper which itself is an input stream.
 * Pass this in place of the passed stream so input can be recorded.
 *
 * @throws IOException
 */
public InputStream inputWrap(InputStream is) throws IOException {
    logger.fine(Thread.currentThread().getName() + " wrapping input");
    // discard any state from previously-recorded input
    this.characterEncoding = null;
    this.inputIsChunked = false;
    this.contentEncoding = null;
    this.ris.open(is);
    return this.ris;
}

/**
 * Wrap the provided stream with the internal RecordingOutputStream
 *
 * open() throws an exception if RecordingOutputStream is already open.
 *
 * @param os The output stream to wrap.
 *
 * @return The output stream wrapper which is itself an output stream.
 * Pass this in place of the passed stream so output can be recorded.
 *
 * @throws IOException
 */
public OutputStream outputWrap(OutputStream os) throws IOException {
    this.ros.open(os);
    return this.ros;
}
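Both wrap methods return the internal recording stream, and the caller uses it in place of the socket stream. Both also rely on open() refusing to reopen a stream that is still open. The hypothetical ReusableRecordingOutputStream below sketches that contract, keeping an in-memory copy only. It is not RecordingOutputStream's actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical reusable wrapper in the spirit of RecordingOutputStream.open():
// one instance is reopened around a new underlying stream for each fetch,
// and opening it while it is still open is an error.
class ReusableRecordingOutputStream extends OutputStream {
    private OutputStream out;   // current underlying stream; null when closed
    private final ByteArrayOutputStream recorded = new ByteArrayOutputStream();

    void open(OutputStream wrapped) throws IOException {
        if (out != null) {
            throw new IOException("already open");
        }
        recorded.reset();       // forget the previous fetch's bytes
        out = wrapped;
    }

    @Override
    public void write(int b) throws IOException {
        if (out == null) {
            throw new IOException("not open");
        }
        out.write(b);           // pass the byte through to the real stream...
        recorded.write(b);      // ...and keep a copy
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            try {
                out.close();
            } finally {
                out = null;     // allow a later open()
            }
        }
    }

    byte[] recordedBytes() {
        return recorded.toByteArray();
    }
}
```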
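The void close() method closes both streams. Each close is wrapped in its own try/catch, so an IOException while closing ris is logged and does not prevent ros from being closed.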
/**
 * Close all streams.
 */
public void close() {
    logger.fine(Thread.currentThread().getName() + " closing");
    try {
        this.ris.close();
    } catch (IOException e) {
        // TODO: Can we not let the exception out of here and report it
        // higher up in the caller?
        DevUtils.logger.log(Level.SEVERE, "close() ris" +
            DevUtils.extraInfo(), e);
    }
    try {
        this.ros.close();
    } catch (IOException e) {
        DevUtils.logger.log(Level.SEVERE, "close() ros" +
            DevUtils.extraInfo(), e);
    }
}
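The same close-each-independently idea can be generalized to any number of streams. QuietCloser is a hypothetical helper that uses java.util.logging in place of Heritrix's DevUtils logger:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper generalizing Recorder.close(): each stream gets its own
// try/catch, so one failing close() is logged and the rest are still closed.
class QuietCloser {
    private static final Logger LOGGER = Logger.getLogger(QuietCloser.class.getName());

    // Returns how many close() calls failed.
    static int closeAll(Closeable... streams) {
        int failures = 0;
        for (Closeable c : streams) {
            if (c == null) {
                continue;                   // nothing to close
            }
            try {
                c.close();
            } catch (IOException e) {
                failures++;
                LOGGER.log(Level.SEVERE, "close() failed for " + c, e);
            }
        }
        return failures;
    }
}
```

Returning a failure count instead of rethrowing keeps the Recorder-style behavior of logging and moving on. A caller that needs to react to failures can still check the result.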
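The following member and methods are very important. They set and get the current Recorder object on a per-thread basis: because currentRecorder is a ThreadLocal, each thread sees only the Recorder it set itself. (The HttpConnection object in the HttpClient component obtains the current Recorder this way.)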
static ThreadLocal<Recorder> currentRecorder = new ThreadLocal<Recorder>();

public static void setHttpRecorder(Recorder httpRecorder) {
    currentRecorder.set(httpRecorder);
}

/**
 * Get the current threads' HttpRecorder.
 *
 * @return This threads' HttpRecorder. Returns null if can't find a
 * HttpRecorder in current instance.
 */
public static Recorder getHttpRecorder() {
    return currentRecorder.get();
}
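A small self-contained sketch of the per-thread behavior follows. A String stands in for the Recorder, and PerThreadRecorder is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for Recorder's static ThreadLocal: each thread sees
// only the value it set itself, so code deep inside an HTTP client can find
// "its" recorder without it being passed down as a parameter.
class PerThreadRecorder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void set(String recorderName) { CURRENT.set(recorderName); }
    static String get() { return CURRENT.get(); }   // null if this thread never set one
    static void clear() { CURRENT.remove(); }

    // Sets a value on a fresh thread and reports what that thread observed.
    static String seenByNewThread(String value) throws InterruptedException {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread t = new Thread(() -> {
            set(value);
            seen.set(get());
            clear();
        });
        t.start();
        t.join();
        return seen.get();
    }
}
```

The clear() method (ThreadLocal.remove()) does not appear in the Recorder excerpt. It is included here because pooled threads are reused, and a value left behind would otherwise be visible to the next task on that thread.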
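The following methods all revolve around obtaining a ReplayCharSequence, an ordered, replayable character view of the recorded content that is used when parsing the content's characters. They form a chain: getEntityReplayInputStream() undoes chunked transfer-encoding, getContentReplayInputStream() undoes a content-encoding such as gzip, and getContentReplayCharSequence() decodes the result with the configured Charset.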
static Set<String> SUPPORTED_ENCODINGS = new HashSet<String>();
static {
    SUPPORTED_ENCODINGS.add("gzip");
    SUPPORTED_ENCODINGS.add("x-gzip");
    SUPPORTED_ENCODINGS.add("deflate");
    SUPPORTED_ENCODINGS.add("identity");
    SUPPORTED_ENCODINGS.add("none"); // unofficial but common
}

/**
 * @param contentEncoding declared content-encoding of input recording.
 */
public void setContentEncoding(String contentEncoding) {
    String lowerCoding = contentEncoding.toLowerCase();
    if(!SUPPORTED_ENCODINGS.contains(contentEncoding.toLowerCase())) {
        throw new IllegalArgumentException("contentEncoding unsupported: "+contentEncoding);
    }
    this.contentEncoding = lowerCoding;
}

/**
 * @return Returns the characterEncoding.
 */
public String getContentEncoding() {
    return this.contentEncoding;
}

/**
 * @return A ReplayCharSequence. Caller may call
 * {@link ReplayCharSequence#close()} when finished. However, in
 * heritrix, the ReplayCharSequence is closed automatically when url
 * processing has finished; in that context it's preferable not
 * to close, so that processors can reuse the same instance.
 * @throws IOException
 * @see {@link #endReplays()}
 */
public ReplayCharSequence getContentReplayCharSequence() throws IOException {
    if (replayCharSequence == null || !replayCharSequence.isOpen()
            || !replayCharSequence.getCharset().equals(charset)) {
        if(replayCharSequence!=null && replayCharSequence.isOpen()) {
            // existing sequence must not have matched now-configured Charset; close
            replayCharSequence.close();
        }
        replayCharSequence = getContentReplayCharSequence(this.charset);
    }
    return replayCharSequence;
}

/**
 * @param characterEncoding Encoding of recorded stream.
 * @return A ReplayCharSequence Will return null if an IOException. Call
 * close on returned RCS when done.
 * @throws IOException
 */
public ReplayCharSequence getContentReplayCharSequence(Charset requestedCharset) throws IOException {
    // raw data overflows to disk; use temp file
    InputStream ris = getContentReplayInputStream();
    ReplayCharSequence rcs = new GenericReplayCharSequence(
            ris,
            calcRecommendedCharBufferSize(this.getRecordedInput()),
            this.backingFileBasename + RECORDING_OUTPUT_STREAM_SUFFIX,
            requestedCharset);
    ris.close();
    return rcs;
}

/**
 * Calculate a recommended size for an in-memory decoded-character buffer
 * of this content. We seek a size that is itself no larger (in 2-byte chars)
 * than the memory already used by the RecordingInputStream's internal raw
 * byte buffer, and also no larger than likely necessary. So, we take the
 * minimum of the actual recorded byte size and the RecordingInputStream's
 * max buffer size.
 *
 * @param inStream
 * @return int length for in-memory decoded-character buffer
 */
static protected int calcRecommendedCharBufferSize(RecordingInputStream inStream) {
    return (int) Math.min(inStream.getRecordedBufferLength()/2, inStream.getSize());
}

/**
 * Get a raw replay of all recorded data (including, for example, HTTP
 * protocol headers)
 *
 * @return A replay input stream.
 * @throws IOException
 */
public ReplayInputStream getReplayInputStream() throws IOException {
    return getRecordedInput().getReplayInputStream();
}

/**
 * Get a raw replay of the 'message-body'. For the common case of
 * HTTP, this is the raw, possibly chunked-transfer-encoded message
 * contents not including the leading headers.
 *
 * @return A replay input stream.
 * @throws IOException
 */
public ReplayInputStream getMessageBodyReplayInputStream() throws IOException {
    return getRecordedInput().getMessageBodyReplayInputStream();
}

/**
 * Get a raw replay of the 'entity'. For the common case of
 * HTTP, this is the message-body after any (usually-unnecessary)
 * transfer-decoding but before any content-encoding (eg gzip) decoding
 *
 * @return A replay input stream.
 * @throws IOException
 */
public InputStream getEntityReplayInputStream() throws IOException {
    if(inputIsChunked) {
        return new ChunkedInputStream(getRecordedInput().getMessageBodyReplayInputStream());
    } else {
        return getRecordedInput().getMessageBodyReplayInputStream();
    }
}

/**
 * Get a replay cued up for the 'content' (after all leading headers)
 *
 * @return A replay input stream.
 * @throws IOException
 */
public InputStream getContentReplayInputStream() throws IOException {
    InputStream entityStream = getEntityReplayInputStream();
    if(StringUtils.isEmpty(contentEncoding)) {
        return entityStream;
    } else if ("gzip".equalsIgnoreCase(contentEncoding) || "x-gzip".equalsIgnoreCase(contentEncoding)) {
        try {
            return new GZIPInputStream(entityStream);
        } catch (IOException ioe) {
            logger.log(Level.WARNING,"gzip problem; using raw entity instead",ioe);
            IOUtils.closeQuietly(entityStream); // close partially-read stream
            return getEntityReplayInputStream();
        }
    } else if ("deflate".equalsIgnoreCase(contentEncoding)) {
        // Note: java.util.zip.DeflaterInputStream *compresses* the bytes it reads;
        // decoding a deflate-encoded body would call for InflaterInputStream.
        return new DeflaterInputStream(entityStream);
    } else if ("identity".equalsIgnoreCase(contentEncoding) || "none".equalsIgnoreCase(contentEncoding)) {
        return entityStream;
    } else {
        // shouldn't be reached given check on setContentEncoding
        logger.log(Level.INFO,"Unknown content-encoding '"+contentEncoding+"' declared; using raw entity instead");
        return entityStream;
    }
}
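Taken together, these methods peel the recorded response in layers: message-body, then entity (chunked transfer-coding removed), then content (content-coding removed), then characters. The JDK-only sketch below covers the last two layers. ContentDecoder is hypothetical, and it differs from the 3.1.0 code above in one respect. It decodes deflate with java.util.zip.InflaterInputStream, assuming the zlib-wrapped format that HTTP's "deflate" coding is specified to use. It also buffers everything in memory, whereas GenericReplayCharSequence is given a backing file name to fall back on.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical, JDK-only sketch of the entity -> content -> characters chain.
// Chunked transfer-decoding is assumed to have been done already.
class ContentDecoder {
    static InputStream contentStream(InputStream entity, String contentEncoding) throws IOException {
        if (contentEncoding == null || contentEncoding.isEmpty()) {
            return entity;                              // no content-coding declared
        }
        switch (contentEncoding.toLowerCase(Locale.ROOT)) {
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(entity);     // reads the gzip header eagerly
            case "deflate":
                return new InflaterInputStream(entity); // decompresses zlib-wrapped deflate
            case "identity":
            case "none":
                return entity;
            default:
                throw new IllegalArgumentException("unsupported: " + contentEncoding);
        }
    }

    // Decodes the content-coding, then turns the bytes into characters.
    static String decode(byte[] entity, String contentEncoding, Charset charset) throws IOException {
        try (InputStream in = contentStream(new ByteArrayInputStream(entity), contentEncoding)) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), charset);
        }
    }
}
```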
---------------------------------------------------------------------------
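This Heritrix 3.1.0 source code analysis series is my original work.
If you repost it, please credit the source: 博客园 (cnblogs), 刺猬的温驯
Link to this article: http://www.cnblogs.com/chenying99/archive/2013/04/28/3048392.html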