Reading Notes: Core Java (11th Edition, Volume II), Chapter 2: Input and Output
2.1 Input/Output Streams
These streams have no relationship with java.util.stream.
2.1.1-2.1.3 Reading and Writing Bytes
1) Easiest to use static methods from the java.nio.file.Files class:
Path path = Path.of(filenameString); // preferred over Paths.get(), which simply calls Path.of()
InputStream in = Files.newInputStream(path);
OutputStream out = Files.newOutputStream(path);
2) Get an input stream from any URL:
URL url = new URL("http://horstmann.com/index.html");
InputStream in = url.openStream();
3) Get an input stream from a byte[] array or write to a byte[] array:
// Get an input stream from a byte[] array:
byte[] bytes = ...;
InputStream in = new ByteArrayInputStream(bytes);
// Conversely, write to a ByteArrayOutputStream and then collect the bytes:
ByteArrayOutputStream out = new ByteArrayOutputStream();
// ... write to out ...
byte[] result = out.toByteArray();
4) The read method returns a single byte (as an int) or -1 at the end of input:
InputStream in = ...;
int b = in.read();
if (b != -1) { byte value = (byte) b; ... }
It is more common to read bytes in bulk:
byte[] bytes = ...;
int len = in.read(bytes); // number of bytes actually read, or -1 at the end of input
5) Before Java 9, there was no method for reading all bytes from a stream. Here is one solution:
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] bytes = new byte[1024];
int len;
while ((len = in.read(bytes)) != -1) { // -1 marks the end of the input stream
    out.write(bytes, 0, len); // write only the len bytes that were just read
}
bytes = out.toByteArray();
For files, just call:
byte[] bytes = Files.readAllBytes(path); // available since Java 7
6) You can write one byte or bytes from an array:
OutputStream out = ...;
int b = ...;
out.write(b); // one byte
byte[] bytes = ...;
out.write(bytes); // all bytes from an array
out.write(bytes, start, length); // part of an array
7) When writing to a stream, close it when you are done by calling out.close().
Or better, use a try-with-resources block (the resource is closed automatically):
try (OutputStream out = ...) {
    out.write(bytes);
}
8) To save an input stream to a file, call:
Files.copy(in, path, StandardCopyOption.REPLACE_EXISTING);
New features in Java 9/10:
- 1. There is finally a method to read all bytes from an input stream (this removes the limitation noted in 5) above): byte[] bytes = url.openStream().readAllBytes();
- There is also readNBytes.
- 2. InputStream.transferTo(OutputStream) transfers all bytes from an input stream to an output stream.
- 3. Java 10: Reader.transferTo(Writer)
- 4. Java 10: Character sets in PrintWriter, Scanner, etc. can be specified as a Charset instead of a String: new Scanner(path, StandardCharsets.UTF_8)
- 5. Scanner.tokens gets a stream of tokens, similar to Pattern.splitAsStream from Java 8: Stream<String> tokens = new Scanner(path).useDelimiter("\\s*,\\s*").tokens();
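The readAllBytes and transferTo additions listed above can be tried without touching the network or the disk. The following is a minimal sketch that assumes in-memory streams; the class name Java9StreamDemo and its two helper methods are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Java9StreamDemo {
    // Reads the whole stream in one call (InputStream.readAllBytes, Java 9+).
    static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Copies every byte of the input to an in-memory output (InputStream.transferTo, Java 9+).
    static byte[] copy(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            in.transferTo(out);
            return out.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello, streams".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(data)));
        System.out.println(copy(new ByteArrayInputStream(data)).length);
    }
}
```

The same two calls work on any InputStream, for example one obtained from Files.newInputStream or url.openStream().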
2.1.4 Reading and Writing Text Files
1. Summary:
- InputStream/OutputStream process bytes.
- Text files contain characters.
- Java uses Unicode for characters.
- Readers/Writers convert between bytes and characters.
- Always specify the character encoding. Use StandardCharsets.UTF_8 for Charset parameters, "UTF-8" for string parameters.
2. You can obtain a Reader for any input stream:
InputStream inStream = ...;
Reader in = new InputStreamReader(inStream, charset);
The read method reads a single char value, which is too low-level for most purposes.
1) You can read a short file into a string:
String content = new String(Files.readAllBytes(path), charset); // readAllBytes returns a byte[], which new String() decodes with the given charset
2) You can get all lines as a list or stream:
List<String> lines = Files.readAllLines(path, charset);

try (Stream<String> lineStream = Files.lines(path, charset)) {
    ...
}
3. Use a Scanner to split input into numbers, words, and so on:
Scanner in = new Scanner(path, "UTF-8");
while (in.hasNextDouble()) {
    double value = in.nextDouble();
    ...
}
// To read words, set the delimiter to any sequence of non-letters (sample in textFile\ScannerTest.java):
// Method 1: in.useDelimiter
in.useDelimiter("\\PL+");
while (in.hasNext()) {
    String word = in.next();
    ...
}
// Method 2: in.tokens()
Stream<String> words = in.tokens();
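Both ways of reading words can be compared on a string instead of a file. This is a sketch under that assumption; ScannerWordsDemo and its method names are hypothetical, and the sample sentence is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;

class ScannerWordsDemo {
    // Method 1: loop with hasNext/next.
    static List<String> wordsWithLoop(String text) {
        List<String> words = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            in.useDelimiter("\\PL+"); // any run of non-letters separates two words
            while (in.hasNext()) {
                words.add(in.next());
            }
        }
        return words;
    }

    // Method 2: Scanner.tokens (Java 9+), collected before the scanner is closed.
    static List<String> wordsWithTokens(String text) {
        try (Scanner in = new Scanner(text)) {
            return in.useDelimiter("\\PL+").tokens().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(wordsWithLoop("Hello, world! 123 foo"));   // prints [Hello, world, foo]
        System.out.println(wordsWithTokens("Hello, world! 123 foo")); // prints [Hello, world, foo]
    }
}
```

Digits and punctuation count as non-letters here, so "123" is swallowed by the delimiter rather than returned as a word.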
4. To write to a file, make one of the following calls. Then call out.print, out.println, or out.printf to produce output.
PrintWriter out = new PrintWriter(Files.newBufferedWriter(path, charset));

PrintWriter out = new PrintWriter(filenameString, charsetString);
// write data to the file
Remember to close the file: try (PrintWriter out = ...) {...}
If you already have the entire output in a string, or a collection of lines, call:
Files.write(path, contentString.getBytes(charset));
Files.write(path, lines, charset);
You can also append output to a file:
Files.write(path, lines, charset, StandardOpenOption.APPEND);
5. Sometimes, a library method wants a Writer object. Example:
Throwable.printStackTrace(PrintWriter out)
If you want to capture the output in a string, not a file, use a StringWriter:
StringWriter writer = new StringWriter(); // a StringWriter collects characters in a string rather than in a disk file; it has no print methods of its own, so wrap it in a PrintWriter
throwable.printStackTrace(new PrintWriter(writer));
Now you can process the stack trace as a string:
String stackTrace = writer.toString();
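Put together, the StringWriter steps above fit in a small helper. This is only a sketch; the class StackTraceDemo and the method stackTraceOf are names chosen for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class StackTraceDemo {
    // Captures the stack trace of t in a string instead of printing it.
    static String stackTraceOf(Throwable t) {
        StringWriter writer = new StringWriter();
        PrintWriter out = new PrintWriter(writer);
        t.printStackTrace(out);
        out.flush(); // make sure everything has reached the StringWriter
        return writer.toString();
    }

    public static void main(String[] args) {
        String trace = stackTraceOf(new IllegalStateException("boom"));
        // The first line is the exception's toString(); the "at ..." frames follow.
        System.out.println(trace.lines().findFirst().orElse(""));
    }
}
```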
- Related methods of the Files class: Files.readAllBytes(), Files.readString(), Files.readAllLines(), Files.writeString(), Files.write()
InputStream in = Files.newInputStream(path);
OutputStream out = Files.newOutputStream(path);
Reader reader = Files.newBufferedReader(path, charset); // returns a BufferedReader, which extends Reader
Writer writer = Files.newBufferedWriter(path, charset);
2.2/2.5 Reading and Writing Binary Data
1. Working with binary files
DataInput / DataOutput interfaces have methods readInt / writeInt, readDouble / writeDouble, and so on.
You can wrap any stream into a DataInputStream / DataOutputStream:
DataInput in = new DataInputStream(Files.newInputStream(path));
DataOutput out = new DataOutputStream(Files.newOutputStream(path));
Reading / writing stream data is sequential.
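Because the data streams are sequential, values must be read back in exactly the order in which they were written. A minimal sketch, assuming an in-memory byte array rather than a file; DataStreamDemo and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DataStreamDemo {
    // Writes an int, a double, and a string in binary form.
    static byte[] write(int n, double x, String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(n);    // 4 bytes
                out.writeDouble(x); // 8 bytes
                out.writeUTF(s);    // 2-byte length prefix followed by the encoded characters
            }
            return bytes.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Reads the three values back in the same order.
    static String read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            double x = in.readDouble();
            String s = in.readUTF();
            return n + " " + x + " " + s;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(42, 2.5, "hi");
        System.out.println(data.length); // prints 16
        System.out.println(read(data));  // prints 42 2.5 hi
    }
}
```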
2. Random-access files
2.1 Approach 1: RandomAccessFile (section 2.2.2)
"Random access file": You can jump to any file position and start reading or writing. Open with "r" for reading only or "rw" for reading and writing:
RandomAccessFile file = new RandomAccessFile(filenameString, "rw");
The getFilePointer method yields the current position (as a long).
The seek method moves to a new position. Example: Increment an integer that you just read:
int value = file.readInt();
file.seek(file.getFilePointer() - 4); // readInt advanced the position by 4 bytes (the size of an int), so step back 4 to return to where the integer starts
file.writeInt(value + 1);
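The increment-in-place idiom above can be run end to end against a temporary file. This is a sketch under that assumption; RandomAccessDemo and incrementStoredInt are illustrative names, not part of the book's sample code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class RandomAccessDemo {
    // Stores initial in a temporary file, increments it in place, and returns the stored result.
    static int incrementStoredInt(int initial) {
        try {
            Path path = Files.createTempFile("random-access", ".dat");
            try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
                file.writeInt(initial);                           // the position is now 4
                file.seek(0);                                     // jump back to the start
                int value = file.readInt();                       // the position is 4 again
                file.seek(file.getFilePointer() - Integer.BYTES); // step back over the int
                file.writeInt(value + 1);                         // overwrite it in place
                file.seek(0);
                return file.readInt();
            } finally {
                Files.deleteIfExists(path); // runs after the file has been closed
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(incrementStoredInt(41)); // prints 42
    }
}
```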
2.2 Approach 2: Memory-Mapped Files (section 2.5)
A memory-mapped file provides very efficient random access for large files. (It uses the operating system's virtual memory mechanism.)
// Step 1: Get a channel for the file:
FileChannel channel = FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.WRITE);
// Step 2: Map an area of the file (or all of it) into memory:
ByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
// Step 3: Use the methods get, getInt, getDouble, and so on to read, and the equivalent put methods to write:
int position = ...;
int value = buffer.getInt(position);
buffer.putInt(position, value + 1);
The file is updated at some point, and certainly when the channel is closed (you can use try-with-resources).
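The three mapping steps can be exercised on a small temporary file. The sketch below is an assumption-laden illustration: MappedFileDemo and incrementMappedInt are invented names, and the explicit force() call is added here only so that the read-back does not rely on when the operating system writes the page out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileDemo {
    // Stores initial in a temporary file, increments it through a memory mapping,
    // and returns the value that is then found in the file.
    static int incrementMappedInt(int initial) {
        try {
            Path path = Files.createTempFile("mapped", ".dat");
            path.toFile().deleteOnExit();
            Files.write(path, ByteBuffer.allocate(Integer.BYTES).putInt(initial).array());
            try (FileChannel channel = FileChannel.open(path,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
                int value = buffer.getInt(0);  // absolute read at position 0
                buffer.putInt(0, value + 1);   // absolute write at position 0
                buffer.force();                // push the change to the file now
            }
            return ByteBuffer.wrap(Files.readAllBytes(path)).getInt();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(incrementMappedInt(41)); // prints 42
    }
}
```

For a 4-byte file the mapping brings no speed benefit, of course; the point is only to show the get/put calls on the mapped buffer.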
2.4 Working with Files (creating, accessing, and deleting files and directories): Path, Files
1. Working with Path
Path objects specify abstract path names (which may not currently exist on disk). Sequence of directory names, optionally followed by a file name. First component may be a root component such as / or C:\.
Use Paths.get / Path.of to create paths:
Path absolute = Paths.get("/", "home", "cay"); // starts with the root
Path relative = Paths.get("myapp", "conf", "user.properties");
The path separator (/ or \) is supplied for the default file system. If you know which platform your program is running on, you can provide a string with separators:
Path homeDirectory = Paths.get("/home/cay");
1.1. The call p.resolve(q) computes "p then q". If q is absolute, the result is just q; otherwise, first follow p, then follow q:
Path workPath = homeDirectory.resolve("myapp/work");
1.2. The opposite of resolve is relativize, yielding "how to get from p to q".
Paths.get("/home/cay").relativize(Paths.get("/home/fred/myapp"))
// yields "../fred/myapp"
1.3. normalize removes . or directory/../ and other redundancies.
1.4. toAbsolutePath makes a path absolute.
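The path operations above never touch the disk, so they are easy to experiment with. A small sketch, assuming a Unix-like default file system for the comments; PathDemo and examples are hypothetical names.

```java
import java.nio.file.Path;

class PathDemo {
    // Returns the results of resolve, relativize, and normalize, in that order.
    static Path[] examples() {
        Path home = Path.of("/home/cay");
        Path work = home.resolve("myapp/work");                        // "home, then myapp/work"
        Path toFred = home.relativize(Path.of("/home/fred/myapp"));    // how to get from home to fred's app
        Path clean = Path.of("/home/cay/../fred/./myapp").normalize(); // drops "cay/.." and "."
        return new Path[] { work, toFred, clean };
    }

    public static void main(String[] args) {
        for (Path p : examples()) {
            System.out.println(p);
        }
        // toAbsolutePath resolves against the current working directory,
        // so its result depends on where the program is started.
        System.out.println(Path.of("myapp").toAbsolutePath());
    }
}
```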
2. Taking Paths Apart
Utility methods to get at the most important parts:
Path p = Paths.get("/home", "cay", "myapp.properties");
Path parent = p.getParent(); // the path /home/cay
Path file = p.getFileName(); // the last element, myapp.properties
Path root = p.getRoot(); // the initial segment / (null for a relative path)
Path first = p.getName(0); // the first element, home
Path dir = p.subpath(1, p.getNameCount()); // all but the first element, cay/myapp.properties
You can iterate over the components:
for (Path component : path) {
    ...
}
To interoperate with the legacy File class, use:
File file = path.toFile();
Path path = file.toPath();
3. Files
2.4.3 To create a new directory, call:
Files.createDirectory(path); // all but the last component must exist; only the final directory is created
Files.createDirectories(path); // missing intermediate components are created too, so several levels can be created at once
You can create an empty file. If the file already exists, an exception occurs. The existence check and the creation are atomic.
Files.createFile(path);
Convenience methods for creating temporary files:
Path tempFile = Files.createTempFile(dir, prefix, suffix);
Path tempFile = Files.createTempFile(prefix, suffix);

Path tempDir = Files.createTempDirectory(dir, prefix);
Path tempDir = Files.createTempDirectory(prefix);
Files.createTempFile(null, ".txt") might return a path such as /tmp/1234405522364837194.txt.
Files.exists(path) checks whether a path currently exists.
Use Files.isDirectory(path), Files.isRegularFile(path), Files.isSymbolicLink(path) to find out whether the path is a directory, a regular file, or a symbolic link. More information is available from the isHidden, isExecutable, isReadable, and isWritable methods of the Files class.
Files.size(path) reports the file size as a long value.
2.4.4 Use the copy or move method:
Files.copy(fromPath, toPath);
Files.move(fromPath, toPath);
You can define the behavior with copy options:
Files.copy(fromPath, toPath, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.COPY_ATTRIBUTES);
Files.move(fromPath, toPath, StandardCopyOption.ATOMIC_MOVE);
Delete a file like this:
Files.delete(path); // throws an exception if path doesn't exist
boolean deleted = Files.deleteIfExists(path);
2.4.6 Files.list(dirpath) yields a Stream<Path> of the directory entries. The directory is read lazily, which is efficient for huge directories. Be sure to close the stream. (Files.list does not descend into subdirectories.)
try (Stream<Path> entries = Files.list(pathToDirectory)) {...}
Call Files.walk(dirpath) to visit all descendants in subdirectories as well. Descendants are visited in depth-first order.
try (Stream<Path> entries = Files.walk(pathToRoot)) {
    entries.forEach(System.out::println);
}
If you filter results by file attributes (size, creation time, and so on), use find instead of walk for greater efficiency:
Files.find(path, maxDepth, (p, attr) -> attr.size() > 10000)
Use Files.walk to copy a directory tree (the JDK currently provides no method for copying a directory):
try (Stream<Path> entries = Files.walk(source)) {
    entries.forEach(p -> {
        try {
            Path q = target.resolve(source.relativize(p));
            if (Files.isDirectory(p)) Files.createDirectory(q);
            else Files.copy(p, q);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    });
}
Unfortunately, this approach doesn't work for deleting a directory tree, because the children must be visited before their parent is deleted. Use a FileVisitor instead:
// Delete the directory tree starting at root
Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
    }
    public FileVisitResult postVisitDirectory(Path dir, IOException ex) throws IOException {
        if (ex != null) throw ex;
        Files.delete(dir);
        return FileVisitResult.CONTINUE;
    }
});
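To see copying and deleting work together, the two techniques can be applied to a throwaway tree in the temporary directory. This is a sketch, not the book's sample: TreeDemo, copyTree, deleteTree, and roundTrip are invented names, and createDirectories is used in the copy so that an already existing target directory is tolerated.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.stream.Stream;

class TreeDemo {
    // Copies a tree; Files.walk visits a directory before its contents.
    static void copyTree(Path source, Path target) throws IOException {
        try (Stream<Path> entries = Files.walk(source)) {
            entries.forEach(p -> {
                try {
                    Path q = target.resolve(source.relativize(p));
                    if (Files.isDirectory(p)) Files.createDirectories(q);
                    else Files.copy(p, q);
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            });
        }
    }

    // Deletes a tree; files go first, each directory after its contents.
    static void deleteTree(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException ex) throws IOException {
                if (ex != null) throw ex;
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    // Builds a small tree, copies it, checks the copy, and removes both trees.
    static boolean roundTrip() {
        try {
            Path source = Files.createTempDirectory("tree-src");
            Path nested = Files.createDirectories(source.resolve("a").resolve("b"));
            Files.writeString(nested.resolve("hello.txt"), "hello");
            Path target = source.resolveSibling(source.getFileName() + "-copy");
            copyTree(source, target);
            Path copiedFile = target.resolve("a").resolve("b").resolve("hello.txt");
            boolean copied = Files.readString(copiedFile).equals("hello");
            deleteTree(source);
            deleteTree(target);
            return copied && Files.notExists(source) && Files.notExists(target);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```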
Paths class looks up paths in the default file system.
4. ZIP file system
You can create a file system for the files in a ZIP archive:
FileSystem zipfs = FileSystems.newFileSystem(Paths.get(zipname), (ClassLoader) null);
Copy out a file if you know its name:
Files.copy(zipfs.getPath(sourceName), targetPath);
To list all files in an archive, walk the file tree:
Files.walk(zipfs.getPath("/")).forEach(p -> { /* process p */ });
Here is the magic incantation for creating a ZIP file:
Path zipPath = Paths.get("myfile.zip");
URI uri = new URI("jar", zipPath.toUri().toString(), null); // constructs a jar:file:... URI, e.g. jar:file:///C:/Users/xxxxxx/IdeaProjects/trunk/lessonlearn_coreJava/1.zip
try (FileSystem zipfs = FileSystems.newFileSystem(uri, Collections.singletonMap("create", "true"))) {
    // To add files, copy them into the ZIP file system
    Files.copy(sourcePath, zipfs.getPath("/").resolve(targetPath));
}
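Creating an archive and reading it back can be combined into one round trip. The sketch below makes a few assumptions: ZipFsDemo and roundTrip are invented names, the archive lives in a fresh temporary directory (the ZIP file must not exist yet for "create" to work), and URI.create("jar:" + ...) is used as a shorter way to build the same kind of jar: URI as the constructor call above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

class ZipFsDemo {
    // Writes text into /hello.txt of a new ZIP archive and reads it back.
    static String roundTrip(String text) {
        try {
            Path dir = Files.createTempDirectory("zipfs-demo");
            Path zipPath = dir.resolve("demo.zip");
            URI uri = URI.create("jar:" + zipPath.toUri());
            // Step 1: create the archive and add one entry.
            try (FileSystem zipfs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
                Files.writeString(zipfs.getPath("/hello.txt"), text);
            }
            // Step 2: reopen the archive and read the entry.
            try (FileSystem zipfs = FileSystems.newFileSystem(zipPath, (ClassLoader) null)) {
                return Files.readString(zipfs.getPath("/hello.txt"));
            } finally {
                Files.deleteIfExists(zipPath); // clean up after the file system is closed
                Files.deleteIfExists(dir);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("zipped text"));
    }
}
```

The archive is only written out when the ZIP file system is closed, which is why each step uses its own try-with-resources block.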
5. New in Java 11
- String.lines yields a stream of all lines in a string;
- String.strip trims Unicode whitespace;
- Path.of does the same as Paths.get -- more consistent and shorter;
- Files.readString reads a file into a string;
- OutputStream.nullOutputStream() provides a null stream that discards everything written to it;
- Analogous methods for InputStream, Reader, Writer;
2.x Working with Data from the Internet
You can read data from a given URL. That gets you the contents of the URL (from the GET request).
URL url = new URL("http://horstmann.com/index.html");
InputStream in = url.openStream();
Sometimes, you need to use the URLConnection class for more complex cases:
- Making a POST request
- Setting request headers
- Reading response headers
// 1. Get a URLConnection object:
URLConnection connection = url.openConnection();
// 2. Set request properties:
connection.setRequestProperty("Accept-Charset", "UTF-8, ISO-8859-1");
// 3. Send data to the server:
connection.setDoOutput(true);
try (OutputStream out = connection.getOutputStream()) {
    // ... write to out ...
}
// 4. Read the response headers:
connection.connect(); // only needed if you skipped step 3
Map<String, List<String>> headers = connection.getHeaderFields();
// 5. Read the response:
try (InputStream in = connection.getInputStream()) {
    // ... read from in ...
}
When writing to an HttpURLConnection, the default content type is application/x-www-form-urlencoded, but you still need to encode the name/value pairs yourself.
Suppose the POST data are given in a map:
URLConnection connection = url.openConnection();
connection.setDoOutput(true);
try (Writer out = new OutputStreamWriter(connection.getOutputStream(), StandardCharsets.UTF_8)) {
    boolean first = true;
    for (Map.Entry<String, String> entry : postData.entrySet()) {
        if (first) first = false;
        else out.write("&");
        out.write(URLEncoder.encode(entry.getKey(), "UTF-8"));
        out.write("=");
        out.write(URLEncoder.encode(entry.getValue(), "UTF-8"));
    }
}
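The encoding step can be checked on its own, without opening a connection. A sketch, assuming the form data fits in memory; FormEncoder and encode are hypothetical names, and the sample keys and values are invented. It uses the URLEncoder.encode(String, Charset) overload from Java 10.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class FormEncoder {
    // Builds the application/x-www-form-urlencoded body: key=value pairs joined by '&'.
    static String encode(Map<String, String> postData) {
        return postData.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> postData = new LinkedHashMap<>(); // keeps insertion order
        postData.put("name", "Cay Horstmann");
        postData.put("lang", "C++");
        System.out.println(encode(postData)); // prints name=Cay+Horstmann&lang=C%2B%2B
    }
}
```

Spaces become + and reserved characters such as + become percent escapes, which is why the raw values must not be written to the connection unencoded.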
HttpClient (incubated in Java 9, standard since Java 11):
// Build a client:
HttpClient client = HttpClient.newBuilder()
    .followRedirects(HttpClient.Redirect.ALWAYS)
    .build();
// Build a request:
HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create("http://horstmann.com"))
    .GET()
    .build();
// Get and handle the response:
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
// Asynchronous processing:
client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
    .thenApply(HttpResponse::body) // extract the body so that the timeout fallback below can be a String
    .completeOnTimeout("<html></html>", 10, TimeUnit.SECONDS)
    .thenAccept(body -> { /* process body */ });
2.7 Regular Expressions
2.7.1 Basic Syntax
Regular expressions (regex) specify string patterns.
- The regex [Jj]e?a.+ matches Java and jealous but not Jim or ja
- Special characters . * + ? { | ( ) [ \ ^ $
- . matches any character, * is 0 or more, + 1 or more, ? 0 or 1 repetition
- Use braces for other multiplicities such as {2,4}
- | denotes alternatives: (Java|Scala)
- () are used for grouping
- [...] delimit character classes, such as [A-Za-z]
- Useful predefined character classes such as \s (whitespace), \pL (Unicode letters), and their complements \S, \PL (which match the opposite sets)
- ^ and $ match the beginning and end of input
- Escape special character with \ to match them literally
- Caution: Must double-escape \ in Java strings
Two principal ways to use a regex:
- Use 1: Find all matches within a string;
- Use 2: Find whether the entire string matches
Use 1: This loop iterates over all matches of a regex in a string:
Pattern pattern = Pattern.compile(regexString);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    String match = matcher.group();
    ...
}
Use matcher.start(), matcher.end() to get the position of the current match in the string.
Use 2: Use the matches method to check whether a string matches a regex:
String regex = "[12]?[0-9]:[0-5][0-9][ap]m";
if (Pattern.matches(regex, input)) { ... }
Compile the regex if you need it repeatedly:
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) ...
You can turn the pattern into a predicate:
Stream<String> result = streamOfStrings.filter(pattern.asPredicate());
Use groups to match subexpressions. Group index values start with 1.
// Example: Match records such as: Blackwell Toaster USD29.95
// 1. Regex with groups (note: \p{Alnum} is a predefined character class equivalent to [A-Za-z0-9]):
(\p{Alnum}+(\s+\p{Alnum}+)*)\s+([A-Z]{3})([0-9.]*)
// Use the group method to get at each group:
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
    item = matcher.group(1); // Blackwell Toaster
    currency = matcher.group(3); // USD
    price = matcher.group(4); // 29.95
}
// 2. Clearer with named groups:
(?<item>\p{Alnum}+(\s+\p{Alnum}+)*)\s+(?<currency>[A-Z]{3})(?<price>[0-9.]*)
// Then you retrieve items by name:
item = matcher.group("item");
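Written as a Java string literal, the named-group regex needs every backslash doubled. A minimal sketch of the record example; NamedGroupDemo and parse are names made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    private static final Pattern RECORD = Pattern.compile(
            "(?<item>\\p{Alnum}+(\\s+\\p{Alnum}+)*)\\s+(?<currency>[A-Z]{3})(?<price>[0-9.]*)");

    // Returns {item, currency, price}, or null if the whole input does not match.
    static String[] parse(String input) {
        Matcher matcher = RECORD.matcher(input);
        if (!matcher.matches()) return null;
        return new String[] {
            matcher.group("item"), matcher.group("currency"), matcher.group("price")
        };
    }

    public static void main(String[] args) {
        String[] parts = parse("Blackwell Toaster USD29.95");
        System.out.println(String.join(" | ", parts)); // prints Blackwell Toaster | USD | 29.95
    }
}
```

The matcher first tries to swallow "USD29" into the item group and then backtracks, which is how "Blackwell Toaster" ends up as the item.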
2.7.4 Splitting along Delimiters
// Specify the delimiter as a regex:
Pattern commas = Pattern.compile("\\s*,\\s*");
String[] tokens = commas.split(input); // the string "1, 2, 3" turns into the array ["1", "2", "3"]

// Fetch results lazily for large inputs:
Stream<String> tokenStream = commas.splitAsStream(input);

// If you don't care about efficiency, just use the String.split method:
String[] tokens = input.split("\\s*,\\s*");
2.7.5 Replacing Matches
// To replace all matches, call replaceAll on the matcher:
Matcher matcher = commas.matcher(input);
String result = matcher.replaceAll(", ");
// If you don't care about efficiency, just use the String.replaceAll method:
String result = input.replaceAll("\\s*,\\s*", ", ");
// Group numbers $n or names ${name} are replaced with the captured group:
String result = "3:45".replaceAll(
    "(\\d{1,2}):(?<minutes>\\d{2})",
    "$1 hours and ${minutes} minutes");
Regex improvements in Java 9/10:
1) Matcher.results and Scanner.findAll get a stream of match results:
Pattern pattern = Pattern.compile("[^,]");
Stream<String> matches = pattern.matcher(str).results().map(MatchResult::group);

matches = new Scanner(path).findAll(pattern).map(MatchResult::group);
2) Matcher.replaceFirst / replaceAll now have a version with a replacement function:
String result = Pattern.compile("\\pL{4,}")
    .matcher("Mary had a little lamb")
    .replaceAll(m -> m.group().toUpperCase());
// yields "MARY had a LITTLE LAMB"
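Both Java 9 additions can be tried on plain strings. In this sketch, RegexStreamDemo and its methods are invented names, and the pattern [^,]+ (with a +, unlike the snippet above) is an assumption made so that each match is a whole comma-separated field rather than a single character.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RegexStreamDemo {
    // Matcher.results (Java 9+): a stream of MatchResult objects, one per match.
    static List<String> fields(String input) {
        return Pattern.compile("[^,]+")
                .matcher(input)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    // Matcher.replaceAll with a replacement function (Java 9+).
    static String shoutLongWords(String input) {
        return Pattern.compile("\\pL{4,}")
                .matcher(input)
                .replaceAll(m -> m.group().toUpperCase());
    }

    public static void main(String[] args) {
        System.out.println(fields("a,bc,d"));                         // prints [a, bc, d]
        System.out.println(shoutLongWords("Mary had a little lamb")); // prints MARY had a LITTLE LAMB
    }
}
```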
2.3 Serialization
- Storing data of the same type => you can use a fixed-length record format (see the sample randomAccess\Employee.java, which needs fixed-length fields)
- Objects => serialization (see the sample objectStream\Employee.java, which must implement Serializable)
Serialization: an object -> a sequence of bytes. Deserialization: a sequence of bytes -> an object.
Useful for sending objects to a different computer and short-term storage (e.g. cache). Not intended for long-term storage.
Participating classes implement the Serializable marker interface:
public class Employee implements Serializable { ... }
// 1. Output stream
// 1.1 Construct an ObjectOutputStream object:
ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(path));
// 1.2 Call the writeObject method:
Employee peter = new Employee("Peter", 90000);
Employee paul = new Manager("Paul", 180000);
out.writeObject(peter);
out.writeObject(paul);
// 2. Input stream
// 2.1 Construct an ObjectInputStream object:
ObjectInputStream in = new ObjectInputStream(Files.newInputStream(path)); // the Employee class contains strings and floating-point numbers, all of which are serializable
// 2.2 Retrieve the objects in the same order as they were saved:
Employee e1 = (Employee) in.readObject();
Employee e2 = (Employee) in.readObject();
For writeObject to work on these objects, two conditions must hold:
- 1. The class must implement the Serializable interface;
- 2. All instance variables of the class must also be serializable;
// Consider this network of objects, in which one object is shared by several others; the whole network must be saved:
Employee peter = new Employee("Peter", 40000);
Manager paul = new Manager("Paul", 105000);
Manager mary = new Manager("Mary", 180000);
paul.setAdmin(peter);
mary.setAdmin(peter);
ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(path));
out.writeObject(peter);
out.writeObject(paul);
out.writeObject(mary);
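- Each object reference that is encountered is associated with a serial number;
- When an object is encountered for the first time, its data is saved to the output stream;
- If an object has been saved before, only a marker "same as the previously saved object with serial number x" is written;
- When reading, an object in the object input stream is constructed the first time its serial number is encountered and initialized with the data from the stream; the association between the serial number and the new object is then recorded;
- When the marker "same as the previously saved object with serial number x" is encountered, the object reference associated with that serial number is retrieved;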
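The effect of the serial numbers described above can be observed directly: after a round trip, two managers that shared one admin still share a single (new) object. A sketch with stripped-down stand-ins for the Employee and Manager classes; SharedReferenceDemo and adminStillShared are hypothetical names, and a byte array replaces the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SharedReferenceDemo {
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Employee(String name) { this.name = name; }
    }

    static class Manager extends Employee {
        private static final long serialVersionUID = 1L;
        Employee admin;
        Manager(String name) { super(name); }
    }

    // Saves two managers that share an admin and checks that the loaded copies share one too.
    static boolean adminStillShared() {
        Employee peter = new Employee("Peter");
        Manager paul = new Manager("Paul");
        Manager mary = new Manager("Mary");
        paul.admin = peter;
        mary.admin = peter;
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(paul); // writes paul and, on first encounter, peter
                out.writeObject(mary); // peter is written only as a back reference
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                Manager paul2 = (Manager) in.readObject();
                Manager mary2 = (Manager) in.readObject();
                // Same admin object for both copies, but not the original peter.
                return paul2.admin == mary2.admin && paul2.admin != peter;
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(adminStillShared()); // prints true
    }
}
```

The sharing is only preserved because both managers go through the same ObjectOutputStream; two separate streams would each save their own copy of the admin.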
Declare fields that shouldn't be serialized with the transient modifier.
You can take over serialization of fields by implementing the readObject / writeObject methods. (Useful for saving instances of non-serializable classes.)
You can delegate serialization and deserialization to a proxy by implementing the readResolve/writeReplace methods. (Useful in rare cases when object identity needs to be preserved.)
You can support multiple versions of a serialized class.
- The default serialVersionUID is obtained by hashing field names and types (among other class details).
- If the serialVersionUID changes, readObject throws an exception.
- You can declare your own version ID and implement deserialization to consider multiple versions:
private static final long serialVersionUID = 2L; // Version 2
- Complex and rarely useful.
Further reading on serialization: https://www.cnblogs.com/bruce-he/p/17098132.html