The original byte-oriented library was supplemented with char-oriented, Unicode-based I/O classes.
It’s rather important to understand the evolution of the I/O library.
The File class
"FilePath" would have been a better name for the class.
It can represent either the name of a particular file or the names of a set of files in a directory.
If you call list( ) with no arguments, you’ll get the full list that the File object contains.
If you want a restricted list, then you use a "directory filter," which is a class that tells how to select the File objects for display.
Anonymous inner classes make it easy to create such specific, one-off classes to solve a problem.
You can also use a File object to create a new directory or an entire directory path if it doesn’t exist.
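A minimal sketch of both ideas, assuming a directory of .java files (the path and extension are arbitrary):

import java.io.File;
import java.io.FilenameFilter;

public class DirListDemo {
  public static void main(String[] args) {
    File path = new File(".");
    // Anonymous inner class acting as a directory filter:
    String[] list = path.list(new FilenameFilter() {
      public boolean accept(File dir, String name) {
        return name.endsWith(".java");
      }
    });
    for (String name : list)
      System.out.println(name);
    // Create an entire directory path if it doesn't exist:
    new File("a/b/c").mkdirs();
  }
}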
Input and output
A stream represents any data source or sink as an object capable of producing or receiving pieces of data.
The stream hides the details of what happens to the data inside the actual I/O device.
You’ll rarely create your stream object by using a single class, but instead will layer multiple objects together to provide your desired functionality.
The fact that you create more than one object to produce a single stream is the primary reason that Java’s I/O library is confusing.
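A sketch of this layering, wrapping a FileInputStream in a buffering decorator and a data-reading decorator (the file name is arbitrary):

import java.io.*;

public class LayeringDemo {
  public static void main(String[] args) throws IOException {
    DataInputStream in = new DataInputStream(
      new BufferedInputStream(
        new FileInputStream("data.bin")));
    try {
      int first = in.readInt(); // each decorator adds a capability
      System.out.println(first);
    } finally {
      in.close();
    }
  }
}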
Types
InputStream’s job is to represent classes that produce input from different sources: an array of bytes, a String object, a file, a pipe, a sequence of other streams, or other sources such as an Internet connection.
OutputStream’s category includes the classes that decide where your output will go: an array of bytes, a file, or a pipe.
Adding attributes and useful interfaces
The "filter" classes exist in the Java I/O library because the decorator pattern needs a common base class: the abstract "filter" class is the base class for all the decorators.
The reason that the Java I/O library is awkward to use is that you must create many classes in order to get the single I/O object that you want.
The classes that provide the decorator interface to control a particular InputStream or OutputStream are FilterInputStream and FilterOutputStream.
Reading from an InputStream with FilterInputStream
DataInputStream allows you to read different types of primitive data as well as String objects. Together with its companion DataOutputStream, it allows you to move primitive data from one place to another via a stream.
The remaining FilterInputStream classes modify the way an InputStream behaves internally.
You’ll need to buffer your input almost every time, regardless of the I/O device you’re connecting to, so it would have made more sense for the I/O library to have a special case (or simply a method call) for unbuffered input rather than buffered input.
Writing to an OutputStream with FilterOutputStream
DataOutputStream formats each of the primitive types and String objects onto a stream in such a way that any DataInputStream, on any machine, can read them.
The original intent of PrintStream was to print all of the primitive data types and String objects in a viewable format.
PrintStream doesn’t internationalize properly and doesn’t handle line breaks in a platform-independent way.
You’ll probably always want to use BufferedOutputStream when doing output.
Readers & Writers
The InputStream and OutputStream classes still provide valuable functionality in the form of byte-oriented I/O, whereas the Reader and Writer classes provide Unicode-compliant, character-based I/O.
There are times when you must use classes from the "byte" hierarchy in combination with classes in the "character" hierarchy.
InputStreamReader converts an InputStream to a Reader, and OutputStreamWriter converts an OutputStream to a Writer.
Since Unicode is used for internationalization, the Reader and Writer hierarchies were added to support Unicode in all I/O operations.
Sources and sinks of data
There are some places where the byte-oriented InputStreams and OutputStreams are the correct solution.
The most sensible approach to take is to try to use the Reader and Writer classes whenever you can.
Off by itself: RandomAccessFile
RandomAccessFile is used for files containing records of known size so that you can move from one record to another using seek( ), then read or change the records.
RandomAccessFile is not part of the InputStream or OutputStream hierarchy.
It happens to implement the DataInput and DataOutput interfaces, which are also implemented by DataInputStream and DataOutputStream.
It’s a completely separate class, written from scratch, with all of its own (mostly native) methods.
The constructors require a second argument indicating whether you are just randomly reading ("r") or reading and writing ("rw").
Typical uses of I/O streams
Buffered input file
To read a text file you use a FileReader; for speed, you’ll want it buffered, so you give the resulting reference to the constructor for a BufferedReader.
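A minimal sketch that reads a file into a single String (the file name is arbitrary):

import java.io.*;

public class BufferedInputFile {
  public static String read(String filename) throws IOException {
    BufferedReader in = new BufferedReader(new FileReader(filename));
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line = in.readLine()) != null)
      sb.append(line).append("\n"); // readLine() strips the line terminator
    in.close();
    return sb.toString();
  }
  public static void main(String[] args) throws IOException {
    System.out.print(read("BufferedInputFile.java"));
  }
}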
Formatted memory input
To read "formatted" data you use a DataInputStream, which is byte-oriented, so you must use InputStream classes rather than Reader classes.
You can use the available( ) method to find out how many more bytes can be read without blocking.
Note that available( ) works differently depending on what sort of medium you’re reading from.
The use of exceptions for control flow is considered a misuse of that feature.
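A minimal sketch that reads formatted data from memory and uses available( ) to detect the end, rather than catching an exception (the source string is arbitrary):

import java.io.*;

public class FormattedMemoryInput {
  public static void main(String[] args) throws IOException {
    DataInputStream in = new DataInputStream(
      new ByteArrayInputStream("Some bytes to read".getBytes()));
    while (in.available() != 0)            // how many bytes remain
      System.out.print((char)in.readByte());
  }
}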
Basic file output
It’s trivial to keep track of your own line numbers.
If you don’t call close( ) for all your output files, you might discover that the buffers don’t get flushed, so the file will be incomplete.
Java SE5 added a helper constructor to PrintWriter so that you don’t have to do all the decoration by hand every time you want to create a text file and write to it.
Other commonly written tasks were not given shortcuts, so typical I/O will still involve a lot of redundant text.
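A minimal sketch using that helper constructor (the file name is arbitrary):

import java.io.*;

public class SimpleFileOutput {
  public static void main(String[] args) throws IOException {
    PrintWriter out = new PrintWriter("test.txt"); // Java SE5 helper constructor
    try {
      out.println("Hello, file");
      out.println("Line two");
    } finally {
      out.close(); // without close(), the buffer may not be flushed
    }
  }
}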
Storing and recovering data
To output data that another stream can recover (rather than data formatted for human viewing), you use a DataOutputStream to write it and a DataInputStream to recover it. If you use a DataOutputStream to write the data, then Java guarantees that you can accurately recover the data using a DataInputStream—regardless of which platform writes and which reads it.
If you read a string written with writeUTF( ) using a non-Java program, you must write special code in order to read the string properly.
For any of the reading methods to work correctly, you must know the exact placement of the data item in the stream.
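A minimal sketch of storing and recovering a double and a UTF string (the file name and values are arbitrary):

import java.io.*;

public class StoringAndRecovering {
  public static void main(String[] args) throws IOException {
    DataOutputStream out = new DataOutputStream(
      new BufferedOutputStream(new FileOutputStream("Data.txt")));
    out.writeDouble(3.14159);
    out.writeUTF("That was pi");
    out.close();
    DataInputStream in = new DataInputStream(
      new BufferedInputStream(new FileInputStream("Data.txt")));
    // Must read back in exactly the same order and types as written:
    System.out.println(in.readDouble());
    System.out.println(in.readUTF());
    in.close();
  }
}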
Reading and writing random-access files
RandomAccessFile has specific methods to read and write primitives and UTF-8 strings. Here’s an example.
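A minimal sketch (the file name and seek offset are arbitrary; a double is 8 bytes):

import java.io.*;

public class UsingRandomAccessFile {
  public static void main(String[] args) throws IOException {
    RandomAccessFile rf = new RandomAccessFile("rtest.dat", "rw");
    for (int i = 0; i < 7; i++)
      rf.writeDouble(i * 1.414);
    rf.writeUTF("The end of the file");
    rf.seek(5 * 8);              // jump to the sixth double (8 bytes each)
    System.out.println(rf.readDouble());
    rf.close();
  }
}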
It doesn’t support decoration, so you cannot combine it with any of the aspects of the InputStream and OutputStream subclasses.
You may want to consider using nio memory-mapped files instead of RandomAccessFile.
File reading & writing utilities
It makes sense to add helper classes to your library that will easily perform these basic tasks for you.
The java.util.Scanner class is primarily designed for creating programming-language scanners or "little languages."
Standard I/O
The term standard I/O refers to the Unix concept of a single stream of information that is used by a program.
The value of standard I/O is that programs can easily be chained together, and one program’s standard output can become the standard input for another program.
Reading from standard input
Following the standard I/O model, Java has System.in, System.out, and System.err.
System.in is a raw InputStream with no wrapping.
System.in must be wrapped before you can read from it.
Note that System.in should usually be buffered, as with most streams.
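A minimal sketch that wraps and buffers System.in, echoing each line until an empty line is entered:

import java.io.*;

public class Echo {
  public static void main(String[] args) throws IOException {
    BufferedReader stdin = new BufferedReader(
      new InputStreamReader(System.in));
    String s;
    while ((s = stdin.readLine()) != null && s.length() != 0)
      System.out.println(s);
  }
}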
Changing System.out to a PrintWriter
PrintWriter has a constructor that takes an OutputStream as an argument. You can convert System.out into a PrintWriter using that constructor.
It’s important to use the two-argument version of the PrintWriter constructor and to set the second argument to true in order to enable automatic flushing.
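A minimal sketch; note the second argument of true for automatic flushing:

import java.io.*;

public class ChangeSystemOut {
  public static void main(String[] args) {
    PrintWriter out = new PrintWriter(System.out, true);
    out.println("Hello, world");
  }
}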
Redirecting standard I/O
Redirecting output is especially useful if you suddenly start creating a large amount of output on your screen, and it’s scrolling past faster than you can read it.
Redirecting input is valuable for a command-line program in which you want to test a particular user-input sequence repeatedly.
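A sketch using System.setIn( ), setOut( ), and setErr( ); the input and output file names are arbitrary:

import java.io.*;

public class Redirecting {
  public static void main(String[] args) throws IOException {
    PrintStream console = System.out;           // save to restore later
    BufferedInputStream in = new BufferedInputStream(
      new FileInputStream("Redirecting.java"));
    PrintStream out = new PrintStream(
      new BufferedOutputStream(new FileOutputStream("redirected.txt")));
    System.setIn(in);
    System.setOut(out);
    System.setErr(out);
    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    String s;
    while ((s = br.readLine()) != null)
      System.out.println(s);                     // goes to redirected.txt
    out.close();                                 // remember this!
    System.setOut(console);
  }
}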
Process control
A common task is to run a program and send the resulting output to the console.
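A sketch that runs an OS command and prints its output; the command here ("java -version") is an assumption—substitute one that exists on your system:

import java.io.*;

public class OSExecute {
  public static void main(String[] args) throws Exception {
    Process process = new ProcessBuilder("java", "-version")
      .redirectErrorStream(true)   // merge stderr into stdout
      .start();
    BufferedReader results = new BufferedReader(
      new InputStreamReader(process.getInputStream()));
    String s;
    while ((s = results.readLine()) != null)
      System.out.println(s);
    process.waitFor();
  }
}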
New I/O
The "old" I/O packages have been reimplemented using nio to take advantage of its speed, so you will benefit even if you don’t explicitly write code with nio.
The speed comes from using structures that are closer to the operating system’s way of performing I/O.
You don’t interact directly with the channel; you interact with the buffer and send the buffer into the channel.
The only kind of buffer that communicates directly with a channel is a ByteBuffer—that is, a buffer that holds raw bytes.
It’s fairly low-level, precisely because this makes a more efficient mapping with most operating systems.
A channel is fairly basic: You can hand it a ByteBuffer for reading or writing, and you can lock regions of the file for exclusive access.
The goal of nio is to rapidly move large amounts of data, so the size of the ByteBuffer should be significant.
You must experiment with your working application to discover whether direct buffers will buy you any advantage in speed.
Special methods transferTo( ) and transferFrom( ) allow you to connect one channel directly to another.
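A sketch of getting a FileChannel, writing through a ByteBuffer, and copying one file to another with transferTo( ); the file names are arbitrary:

import java.io.*;
import java.nio.*;
import java.nio.channels.*;

public class ChannelDemo {
  public static void main(String[] args) throws IOException {
    // Write some bytes through a channel:
    FileChannel fc = new FileOutputStream("data.txt").getChannel();
    fc.write(ByteBuffer.wrap("Some text".getBytes()));
    fc.close();
    // Connect two channels directly to copy the file:
    FileChannel in = new FileInputStream("data.txt").getChannel(),
                out = new FileOutputStream("copy.txt").getChannel();
    in.transferTo(0, in.size(), out);
    in.close();
    out.close();
  }
}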
Converting data
A ByteBuffer can be viewed as a CharBuffer with the asCharBuffer( ) method.
The buffer contains plain bytes, and to turn these into characters, we must either encode them as we put them in or decode them as they come out of the buffer.
Fetching primitives
Although a ByteBuffer only holds bytes, it contains methods to produce each of the different types of primitive values from the bytes it contains.
View buffers
A "view buffer" allows you to look at an underlying ByteBuffer through the window of a particular primitive type.
Any changes you make to the view are reflected in modifications to the data in the ByteBuffer.
Once the underlying ByteBuffer is filled with ints or some other primitive type via a view buffer, then that ByteBuffer can be written directly to a channel.
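A minimal sketch of an IntBuffer view over a ByteBuffer (the values are arbitrary):

import java.nio.*;

public class IntBufferDemo {
  public static void main(String[] args) {
    ByteBuffer bb = ByteBuffer.allocate(1024);
    IntBuffer ib = bb.asIntBuffer();     // view the bytes as ints
    ib.put(new int[]{ 11, 42, 47, 99, 143, 811, 1016 });
    ib.flip();
    while (ib.hasRemaining())
      System.out.println(ib.get());
    // Changes through the view modify the underlying ByteBuffer,
    // which can then be written directly to a channel.
  }
}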
Endians
"Big endian" places the most significant byte in the lowest memory address.
"little endian" places the most significant byte in the highest memory address.
A ByteBuffer stores data in big endian form by default, and data sent over a network conventionally uses big endian order.
Data manipulation with buffers
Note that ByteBuffer is the only way to move data into and out of channels, and that you can only create a standalone primitive-typed buffer, or get one from a ByteBuffer using an "as" method.
Buffer details
A Buffer consists of data and four indexes to access and manipulate this data efficiently: mark, position, limit and capacity.
The goal is always to manipulate a ByteBuffer, since that is what interacts with a channel.
Memory-mapped files
With a memory-mapped file, you can pretend that the entire file is in memory and that you can access it by simply treating it as a very large array.
The file appears to be accessible all at once because only portions of it are brought into memory, and other parts are swapped out.
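A sketch that maps a file and treats it like a very large array; the file name and length are arbitrary:

import java.io.*;
import java.nio.*;
import java.nio.channels.*;

public class MappedDemo {
  static final int LENGTH = 0x100000; // 1 MB; size is arbitrary
  public static void main(String[] args) throws IOException {
    MappedByteBuffer out = new RandomAccessFile("test.dat", "rw")
      .getChannel()
      .map(FileChannel.MapMode.READ_WRITE, 0, LENGTH);
    for (int i = 0; i < LENGTH; i++)
      out.put((byte)'x');                  // looks like ordinary array access
    for (int i = LENGTH / 2; i < LENGTH / 2 + 6; i++)
      System.out.print((char)out.get(i));  // read from the middle of the "array"
  }
}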
File locking
File locking allows you to synchronize access to a file as a shared resource.
Two threads that contend for the same file may be in different JVMs, or one may be a Java thread and the other some native thread in the operating system.
The file locks are visible to other operating system processes because Java file locking maps directly to the native operating system locking facility.
Although the zero-argument locking methods adapt to changes in the size of a file, locks with a fixed size do not change if the file size changes.
The zero-argument locking methods lock the entire file, even if it grows.
Support for exclusive or shared locks must be provided by the underlying operating system.
File mapping is typically used for very large files. You may need to lock portions of such a large file so that other processes may modify unlocked parts of the file.
A lock is automatically released when the JVM exits or when the channel on which it was acquired is closed, but you can also explicitly call release( ).
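A minimal sketch of tryLock( ) on a FileChannel (the file name is arbitrary):

import java.io.*;
import java.nio.channels.*;

public class FileLocking {
  public static void main(String[] args) throws Exception {
    FileOutputStream fos = new FileOutputStream("file.txt");
    FileLock fl = fos.getChannel().tryLock(); // non-blocking; lock( ) blocks
    if (fl != null) {
      System.out.println("Locked file");
      Thread.sleep(100);
      fl.release();                           // or let it go at close/JVM exit
      System.out.println("Released lock");
    }
    fos.close();
  }
}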
Compression
The compression library works with bytes, not characters.
You can use InputStreamReader and OutputStreamWriter to provide easy conversion between characters and bytes.
Simple compression with GZIP
You simply wrap your output stream in a GZIPOutputStream or ZipOutputStream, and your input stream in a GZIPInputStream or ZipInputStream.
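A minimal sketch of GZIP compression and decompression; the input and output file names are arbitrary:

import java.io.*;
import java.util.zip.*;

public class GZIPDemo {
  public static void main(String[] args) throws IOException {
    // Compress:
    BufferedReader in = new BufferedReader(new FileReader("GZIPDemo.java"));
    BufferedOutputStream out = new BufferedOutputStream(
      new GZIPOutputStream(new FileOutputStream("test.gz")));
    int c;
    while ((c = in.read()) != -1)
      out.write(c);
    in.close();
    out.close();
    // Decompress:
    BufferedReader in2 = new BufferedReader(new InputStreamReader(
      new GZIPInputStream(new FileInputStream("test.gz"))));
    String s;
    while ((s = in2.readLine()) != null)
      System.out.println(s);
    in2.close();
  }
}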
Multifile storage with Zip
There’s even a separate class to make the process of reading a Zip file easy.
The library uses the standard Zip format so that it works seamlessly with all the Zip tools currently downloadable on the Internet.
However, even though the Zip format has a way to set a password, this is not supported in Java’s Zip library.
You are not limited to files when using the GZIP or Zip libraries— you can compress anything, including data to be sent through a network connection.
Java ARchives (JARs)
JAR files are cross-platform, so you don’t need to worry about platform issues.
You can also include audio and image files as well as class files.
By combining all of the files for a particular applet into a single JAR file, only one server request is necessary and the transfer is faster because of compression.
A JAR file consists of a single file containing a collection of zipped files along with a "manifest" that describes them.
You can’t add or update files to an existing JAR file; you can create JAR files only from scratch. Also, you can’t move files into a JAR file, erasing them as they are moved.
Object serialization
There are situations in which it would be incredibly useful if an object could exist and hold its information even while the program wasn’t running.
Java’s object serialization allows you to take any object that implements the Serializable interface and turn it into a sequence of bytes that can later be fully restored to regenerate the original object.
The serialization mechanism automatically compensates for differences in operating systems.
You must explicitly serialize and deserialize the objects in your program.
Java’s Remote Method Invocation (RMI) allows objects that live on other machines to behave as if they live on your machine.
Object serialization is also necessary for JavaBeans.
Even Class objects can be serialized.
Object serialization is byte-oriented, and thus uses the InputStream and OutputStream hierarchies.
Once the ObjectOutputStream is created from some other stream, writeObject( ) serializes the object.
You can also write all the primitive data types using the same methods as DataOutputStream.
Note that no constructor, not even the default constructor, is called in the process of deserializing a Serializable object.
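A minimal sketch of serializing and recovering an object; the class and file name are arbitrary:

import java.io.*;

public class Worm implements Serializable {
  private int segments;
  public Worm(int segments) { this.segments = segments; }
  public String toString() { return "Worm with " + segments + " segments"; }

  public static void main(String[] args) throws Exception {
    Worm w = new Worm(5);
    ObjectOutputStream out = new ObjectOutputStream(
      new FileOutputStream("worm.out"));
    out.writeObject(w);              // serialize the whole object
    out.close();
    ObjectInputStream in = new ObjectInputStream(
      new FileInputStream("worm.out"));
    Worm w2 = (Worm)in.readObject(); // no constructor is called here
    in.close();
    System.out.println(w2);
  }
}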
Controlling serialization
Sometimes you don’t want to serialize portions of your object, or it may not make sense for a subobject to be serialized if that part needs to be created anew when the object is recovered.
Implementing Externalizable instead of Serializable gives you this control. With an Externalizable object, all the normal default construction behavior occurs, and then readExternal( ) is called; this is different from recovering a Serializable object, in which the object is constructed entirely from its stored bits, with no constructor calls.
If you are inheriting from an Externalizable object, you’ll typically call the base-class versions of writeExternal( ) and readExternal( ) to provide proper storage and retrieval of the base-class components.
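A minimal sketch of an Externalizable class; the public no-arg constructor is required because it is called during recovery (the class and fields are arbitrary):

import java.io.*;

public class Blip implements Externalizable {
  private int i;
  private String s;
  public Blip() {                      // public default constructor required
    System.out.println("Blip default constructor");
  }
  public Blip(String s, int i) { this.s = s; this.i = i; }
  public void writeExternal(ObjectOutput out) throws IOException {
    // With an Externalizable base class, call super.writeExternal(out) first
    out.writeObject(s);
    out.writeInt(i);
  }
  public void readExternal(ObjectInput in)
      throws IOException, ClassNotFoundException {
    // With an Externalizable base class, call super.readExternal(in) first
    s = (String)in.readObject();
    i = in.readInt();
  }
  public String toString() { return s + i; }
}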
The transient keyword
There might be a particular subobject that you don’t want Java’s serialization mechanism to automatically save and restore.
One way to prevent sensitive parts of your object from being serialized is to implement your class as Externalizable.
Then nothing is automatically serialized, and you can explicitly serialize only the necessary parts inside writeExternal( ).
You can turn off serialization on a field-by-field basis using the transient keyword.
Since Externalizable objects do not store any of their fields by default, the transient keyword is for use with Serializable objects only.
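A minimal sketch using transient to keep a field out of the serialized form; the field names are arbitrary:

import java.io.*;
import java.util.*;

public class Login implements Serializable {
  private Date date = new Date();
  private String username;
  private transient String password;   // not saved or restored
  public Login(String name, String pwd) {
    username = name;
    password = pwd;
  }
  public String toString() {
    return "login date: " + date + "\n" +
           "username: " + username + "\n" +
           "password: " + password;    // prints "null" after recovery
  }
}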
An alternative to Externalizable
You can implement the Serializable interface and add methods called writeObject( ) and readObject( ) that will automatically be called when the object is serialized and deserialized, respectively.
They are defined as private, which means they can be called only by other members of the class; however, you don’t actually call them yourself. Instead, the writeObject( ) and readObject( ) methods of ObjectOutputStream and ObjectInputStream call your object’s writeObject( ) and readObject( ) methods.
If you use the default mechanism to write the non-transient parts of your object, you must call defaultWriteObject( ) as the first operation in writeObject( ), and defaultReadObject( ) as the first operation in readObject( ).
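A minimal sketch of the private writeObject( )/readObject( ) hooks combined with the default mechanism (the class and fields are arbitrary):

import java.io.*;

public class SerialCtl implements Serializable {
  private String a = "Not Transient";
  private transient String b = "Transient";

  private void writeObject(ObjectOutputStream stream) throws IOException {
    stream.defaultWriteObject();   // first: writes the non-transient fields
    stream.writeObject(b);         // then handle the transient field by hand
  }
  private void readObject(ObjectInputStream stream)
      throws IOException, ClassNotFoundException {
    stream.defaultReadObject();    // first: reads the non-transient fields
    b = (String)stream.readObject();
  }
  public String toString() { return a + ", " + b; }
}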
Using persistence
It’s possible to use object serialization to and from a byte array as a way of doing a "deep copy" of any object that’s Serializable.
As long as you’re serializing everything to a single stream, you’ll recover the same web of objects that you wrote, with no accidental duplication of objects.
The objects will be written in whatever state they are in at the time you serialize them.
The safest thing to do if you want to save the state of a system is to serialize as an "atomic" operation.
Class is Serializable, so it seems it should be easy to store static fields by simply serializing the Class object; however, even though class Class is Serializable, this doesn’t do what you expect. So if you want to serialize statics, you must do it yourself.
XML
An important limitation of object serialization is that it is a Java-only solution: Only Java programs can deserialize such objects.
A more interoperable solution is to convert data to XML format, which allows it to be consumed by a large variety of platforms and languages.
This requires that you know ahead of time the exact structure of your XML file, but this is often true with these kinds of problems.
It’s also possible for you to write more complex code that will explore the XML document rather than making assumptions about it, for cases when you have less concrete information about the incoming XML structure.
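The original examples use a third-party XML library; as a rough equivalent, here is a minimal sketch using the JDK’s built-in DOM API. The file name and element names are assumptions:

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;

public class ReadXML {
  public static void main(String[] args) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new File("people.xml"));
    NodeList people = doc.getElementsByTagName("person");
    for (int i = 0; i < people.getLength(); i++) {
      Element p = (Element)people.item(i);
      System.out.println(
        p.getElementsByTagName("first").item(0).getTextContent() + " " +
        p.getElementsByTagName("last").item(0).getTextContent());
    }
  }
}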
Preferences
The Preferences API is designed to store and retrieve user preferences and program-configuration settings.
Its use is restricted to small, limited data sets—you can hold only primitives and Strings, and the length of each stored String can’t be longer than 8K.
Preferences are key-value sets (like Maps) stored in a hierarchy of nodes.
You don’t need to use the current class as the node identifier, but that’s the usual practice.
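A minimal sketch of the Preferences API; the keys and values are arbitrary:

import java.util.prefs.*;

public class PreferencesDemo {
  public static void main(String[] args) throws Exception {
    Preferences prefs = Preferences.userNodeForPackage(PreferencesDemo.class);
    prefs.put("Location", "Oz");
    prefs.putInt("Companions", 4);
    prefs.putBoolean("Are there witches?", true);
    int usageCount = prefs.getInt("UsageCount", 0);  // default if not present
    prefs.putInt("UsageCount", ++usageCount);
    for (String key : prefs.keys())
      System.out.println(key + ": " + prefs.get(key, null));
  }
}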