Problem 1:
Fetching a web page with webclient fails with the following error:
at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:264)
at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:254)
at java.util.zip.GZIPInputStream.readUInt(GZIPInputStream.java:246)
at java.util.zip.GZIPInputStream.readTrailer(GZIPInputStream.java:218)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:118)
at org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:73)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.Reader.read(Reader.java:140)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2001)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1980)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1957)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1907)
at org.apache.commons.io.IOUtils.toString(IOUtils.java:778)
at org.apache.commons.io.IOUtils.toString(IOUtils.java:803)
at core.downloader.HttpClientDownloader.getContent(HttpClientDownloader.java:283)
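Add the request header
addRequestHeader("Accept-Encoding", "");
or .addHeader("Accept-Encoding", "\t")
or .addHeader("Accept-Encoding", "\n")
With any of these in place, the EOFException is no longer thrown. This is presumably because HttpClient then does not advertise gzip support on its own, so the server sends an uncompressed body and the gzip decoding path is skipped.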
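The stack trace above ends in GZIPInputStream.readTrailer, which suggests the compressed body arrived without its complete trailer. The following is a minimal sketch of that failure using only java.util.zip. It is not the original downloader code. The class name TruncatedGzipDemo and its helper methods are hypothetical, and cutting off the last 8 bytes is only an assumption about how the response was damaged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TruncatedGzipDemo {

    // Compress a string into a complete gzip stream
    // (header + deflate data + 8-byte CRC/length trailer).
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    // Decompress everything; throws if the stream is damaged.
    static String gunzip(byte[] data) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Returns true only if decoding fails with EOFException.
    static boolean failsWithEof(byte[] data) {
        try {
            gunzip(data);
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] full = gzip("hello gzip");
        // Assumption for illustration: the response lost its 8-byte trailer.
        byte[] truncated = Arrays.copyOf(full, full.length - 8);

        System.out.println("complete stream decodes to: " + gunzip(full));
        System.out.println("truncated stream fails with EOFException: " + failsWithEof(truncated));
    }
}
```

If the server sends the body uncompressed, this decoding step never runs, which is consistent with the header workaround above.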
Problem 2:
java.util.zip.ZipException:
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:165)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
at org.apache.http.client.protocol.ResponseContentEncoding$1.create(ResponseContentEncoding.java:67)
at org.apache.http.client.entity.LazyDecompressingInputStream.initWrapper(LazyDecompressingInputStream.java:54)
at org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:66)
Setting the User-Agent header made the problem go away. Others report that adding the Accept and Accept-Encoding headers is what fixes it.
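This trace fails in GZIPInputStream.readHeader, the point where the gzip magic number is checked. One plausible reading, which is an assumption and not something the original note confirms, is that without a browser-like User-Agent the server returned a body that was not gzip data while the response was still treated as gzip. The sketch below reproduces that kind of failure with java.util.zip alone. The class name NotGzipDemo, its helpers, and the sample HTML body are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipException;

public class NotGzipDemo {

    // True if the bytes start with the gzip magic number 0x1f 0x8b.
    static boolean looksLikeGzip(byte[] body) {
        return body.length >= 2
                && (body[0] & 0xff) == 0x1f
                && (body[1] & 0xff) == 0x8b;
    }

    // True only if the GZIPInputStream constructor rejects the bytes
    // with ZipException while reading the gzip header.
    static boolean failsWithZipException(byte[] body) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return false;
        } catch (ZipException e) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical body: plain HTML served where gzip was expected.
        byte[] plain = "<html>blocked</html>".getBytes(StandardCharsets.UTF_8);

        System.out.println("looks like gzip: " + looksLikeGzip(plain));
        System.out.println("fails with ZipException: " + failsWithZipException(plain));
    }
}
```

A check like looksLikeGzip can help when debugging: log the first two bytes of the raw response to see whether the server really sent gzip data before changing headers.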