- The basics: InputStream and OutputStream
- Working with text: Reader and Writer
- Diversion: Unicode
- Back to Reader and Writer
- When to use InputStream versus Reader
- Wrappers: BufferedReaders, PrintWriters, et al.
- Exceptions
The basics: InputStream and OutputStream
The basic I/O types in Java are java.io.InputStream and java.io.OutputStream. These are abstract classes that represent, respectively, something you can read bytes from and something you can write bytes to. Specifically, the read() method of InputStream returns a single byte (as an int, or -1 once the end of the stream is reached), and the write(int b) method of OutputStream writes a single byte. There are also variants of read() and write() that handle multiple bytes at once, using a byte[], but those can be defined in terms of the single-byte versions.
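To illustrate how the multi-byte variants build on the single-byte one, here is a minimal sketch of a custom InputStream. The class name CountdownStream and its behavior are made up for this example; the only method it implements is read(), and the read(byte[]) variant is inherited from InputStream.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stream that yields the bytes start, start-1, ..., 1, 0.
// Assumes start is between 0 and 255.
public class CountdownStream extends InputStream {
    private int next;

    public CountdownStream(int start) {
        this.next = start;
    }

    // The only method a subclass has to implement: return one byte as an
    // int in the range 0..255, or -1 once the stream is exhausted.
    @Override
    public int read() {
        if (next < 0) {
            return -1;
        }
        return next--;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new CountdownStream(4);
        byte[] buf = new byte[10];
        // Inherited from InputStream: the default implementation fills the
        // array by calling read() repeatedly.
        int n = in.read(buf);
        System.out.println(n + " bytes read"); // prints "5 bytes read"
        in.close();
    }
}
```

In practice, subclasses such as FileInputStream override the array variants too, since reading many bytes in one call is usually much faster than reading them one by one.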
Since InputStream and OutputStream are abstract, you can’t directly create an object like this:
// Error: Can't instantiate abstract class InputStream
InputStream is = new InputStream();
Instead, you create an instance of one of the concrete subtypes of InputStream. For example, to read from a file, you can use java.io.FileInputStream:
// This will work, if myfile.txt exists, in the same folder where you call
// `java MyMainClass` from
InputStream is = new FileInputStream("myfile.txt");
There’s also java.io.FileOutputStream, which is similar:
// This creates a new file called "hello.txt" in the current directory.
OutputStream os = new FileOutputStream("hello.txt");
os.write(new byte[] {
    'H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '\n' });
os.close();
The last line is important: you should always close an InputStream or OutputStream when you’re done with it. Failing to close a stream can lead to lost data. Either call close() yourself, or (if you’re so inclined) read about the try-with-resources construct, which calls close() for you.
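For reference, here is a small sketch of what try-with-resources looks like for the "hello.txt" example. The class and method names (HelloFile, writeHello) are invented for illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class HelloFile {
    // Writes "Hello, world\n" to the given path. The stream declared in the
    // parentheses after `try` is closed automatically when the block exits,
    // even if write() throws an exception.
    static void writeHello(String path) throws IOException {
        try (OutputStream os = new FileOutputStream(path)) {
            os.write(new byte[] {
                'H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '\n' });
        }
    }

    public static void main(String[] args) throws IOException {
        writeHello("hello.txt");
    }
}
```

Compared with calling close() by hand, this form also closes the stream when an exception interrupts the block, which is easy to forget otherwise.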
Working with text: Reader and Writer
The above example shows a really painful way to create a “Hello, world” file. You might wonder why I didn’t write
os.write("Hello, world\n");
Go try it, and see what happens. To make sense of the error message, note that String is not like byte[]; it’s closer to char[]. In Java, a char is two bytes (unlike C++, where char is generally one byte).
As long as we only use ASCII text (so: English letters, basic punctuation, no accents), we can use one byte per character (in fact, ASCII strings only need 7 bits per character, not the full 8 bits per byte). But there are many other characters: accented letters (common in European languages), ideographs (such as in Japanese or Chinese text), more advanced punctuation (like “smart” quotes or the ellipsis: ‘…’), emoji (like ☺), symbols (like ∫ or ♩), and so on.
Diversion: Unicode
If you’d rather skip the diversion on Unicode and character encodings, jump ahead to “Back to Reader and Writer”.
The modern way of handling character encoding is Unicode. According to Wikipedia, there are currently about 137 thousand Unicode characters; the Unicode standard can potentially encode about 1 million characters.
Java uses an encoding called UTF-16, which encodes the 137 thousand Unicode characters into two-byte chars. Two bytes can only distinguish 2^16 = 65,536 values, so it’s not possible to represent 137 thousand characters with a single char each; some Unicode characters require two chars. For example, the symbol “𐀀” (Unicode character U+10000) requires two chars: '\uD800' and '\uDC00', which you can see on the linked site if you scroll down to the “C/C++/Java source code” entry in the table.
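The two-char representation of U+10000 can be checked directly from Java. This is just an illustrative sketch; the class name SurrogateDemo and the helper charsNeeded are made up, while the String and Character methods are from the standard library.

```java
public class SurrogateDemo {
    // Number of Java chars needed to represent the given Unicode code point.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        // U+10000, written as the two chars '\uD800' and '\uDC00'.
        String s = "\uD800\uDC00";
        System.out.println(s.length());                            // prints 2 (chars)
        System.out.println(s.codePointCount(0, s.length()));       // prints 1 (character)
        System.out.println(Integer.toHexString(s.codePointAt(0))); // prints 10000
        System.out.println(charsNeeded('A'));                      // prints 1
        System.out.println(charsNeeded(0x10000));                  // prints 2
    }
}
```

The practical consequence is that length() counts chars, not characters, so the two can differ for text outside the first 65,536 code points.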
All things considered, Unicode is very nice. Before Unicode, there were many different character encodings in use. Many of them started with ASCII (which defines characters 0 through 127) and added up to 128 additional characters. For example, the encoding ISO-8859-1 added a number of accented Latin characters (e.g. á) and a few extra letters (like the German ß), and this encoding worked pretty well for European languages. But it wasn’t the only encoding in use in Europe, and naturally (for example) Japanese text was written in a different encoding. The result was that if you had a text file, you had to know (or guess) the right encoding to be able to read it. Nowadays, when most things are Unicode (usually encoded as UTF-8, which is backwards-compatible with ASCII), you don’t have to worry nearly as much about knowing which encoding is in use.
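To make the "you had to know the right encoding" problem concrete, here is a small sketch that encodes the letter á with UTF-8 and then decodes the bytes as if they were ISO-8859-1. The class and method names (EncodingDemo, misreadUtf8AsLatin1) are invented for this example.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encodes the text as UTF-8, then decodes those bytes with the wrong
    // encoding (ISO-8859-1), as a program guessing incorrectly would.
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "\u00E1"; // the letter á
        // One byte in ISO-8859-1, two bytes in UTF-8.
        System.out.println(text.getBytes(StandardCharsets.ISO_8859_1).length); // prints 1
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);      // prints 2
        // The wrong guess turns one character into two.
        System.out.println(misreadUtf8AsLatin1(text).length());                // prints 2
        // Pure ASCII text survives, since both encodings agree on 0 through 127.
        System.out.println(misreadUtf8AsLatin1("abc"));                        // prints abc
    }
}
```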
Back to Reader and Writer
InputStream and OutputStream read/write one byte at a time. When you’re dealing with text in Java, you should use String and char, instead of working directly with bytes. So, instead of using InputStream and OutputStream, use java.io.Reader and java.io.Writer. These are very similar to InputStream and OutputStream, except that now read() and write(int c) handle char values (2 bytes, 0 to 2^16-1) instead of byte values (0 to 2^8-1). So you can, for example, write and execute this code:
Writer w = new FileWriter("hello.txt");
w.write("Hello, world\n");
w.close();
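Reading works the same way in the other direction. The following sketch writes a file with a FileWriter and reads it back one char at a time with a FileReader; the class and method names (TextRoundTrip, writeText, readText) are hypothetical. Note that FileWriter and FileReader created this way use a default character encoding, which is fine for the ASCII text used here.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

public class TextRoundTrip {
    // Writes the given text to a file, replacing any existing content.
    static void writeText(String path, String text) throws IOException {
        try (Writer w = new FileWriter(path)) {
            w.write(text);
        }
    }

    // Reads the whole file one char at a time. Like InputStream.read(),
    // Reader.read() returns an int, with -1 signalling the end of the input.
    static String readText(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new FileReader(path)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        writeText("hello.txt", "Hello, world\n");
        System.out.print(readText("hello.txt")); // prints "Hello, world"
    }
}
```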
When to use InputStream versus Reader
The decision criterion for InputStream versus Reader or OutputStream versus Writer is pretty simple: if you’re dealing with text, use Reader or Writer. If you’re dealing with inherently binary data (like a JPEG image, an MP4 video, executable code, a ZIP file), then use InputStream or OutputStream.
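As an example of the binary case, here is a sketch of a byte-for-byte copy between two streams. The names ByteCopy and copy are made up, and the in-memory ByteArrayInputStream and ByteArrayOutputStream are stand-ins for real files so that the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ByteCopy {
    // Copies everything from in to out and returns the number of bytes copied.
    // No character decoding happens, so arbitrary byte values pass through
    // unchanged.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        // read(byte[]) returns how many bytes it stored, or -1 at the end.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Some bytes that are not valid text in most encodings.
        byte[] data = {(byte) 0xFF, (byte) 0xD8, 0x00, 0x7F};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = copy(new ByteArrayInputStream(data), out);
        System.out.println(n + " bytes copied"); // prints "4 bytes copied"
    }
}
```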
Wrappers: BufferedReaders, PrintWriters, et al.
We’ve actually been using a special OutputStream the whole time: java.lang.System.out is an instance of the class java.io.PrintStream, which is a sub-type of OutputStream. PrintStream is an example of a wrapper: given an existing OutputStream (e.g. a FileOutputStream that you opened), you can wrap the existing object in a PrintStream like so:
// Open a basic OutputStream, somehow...
OutputStream os = new FileOutputStream("hello.txt");
// Wrap it in a PrintStream
PrintStream ps = new PrintStream(os);
// Now, you can call the familiar `print()` or `println()` (or `printf()`, even)
ps.println("Hello, world");
// We only need to close the wrapper ps; this will automatically close the
// underlying stream os.
ps.close();
Another common type of stream wrapper is the family of BufferedInputStream, BufferedOutputStream, BufferedReader, and BufferedWriter. These wrappers add a buffer. When you read from a BufferedReader, for example, the Java runtime makes a call to your operating system to read a few kilobytes worth of text, and saves the result in a buffer. Then it returns the characters you asked for. Next time you call read(), the BufferedReader can return a character directly from its buffer, instead of having to go back to the operating system (which is slow!). If you’re reading a large file in small pieces, it’s common to open it like so:
BufferedReader br = new BufferedReader(new FileReader("my-long-file.txt"));
If you want to write to a file instead of to System.out, here’s a good way of doing it: open the file with a FileWriter, wrap that in a BufferedWriter, then wrap that in a PrintWriter. The resulting object works like System.out (so you can call println()), but it will efficiently write to a file instead of to the console. For example,
FileWriter myFile = new FileWriter("myFile.txt");
BufferedWriter bufFile = new BufferedWriter(myFile);
PrintWriter printFile = new PrintWriter(bufFile);
printFile.println("Hello, world!");
printFile.printf("%d + %d = %d\n", 2, 4, 2+4);
// Closing the outermost wrapper flushes the buffer and closes the file.
printFile.close();
By the way, a nice bonus of using BufferedReader is the readLine() method. As you might guess from the name, this reads a single line of text: it consumes everything up to and including the line terminator (generally a '\n' character) and returns the line without the terminator, or null once the end of the input is reached. This is very useful for parsing text files, since text files often have a line-based structure.
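A typical line-by-line loop looks roughly like the sketch below. The names LineReading and readLines are invented for this example, and a StringReader (a Reader over an in-memory String) stands in for a FileReader so the example runs without a file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineReading {
    // Wraps the given Reader in a BufferedReader and collects all its lines.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            // readLine() returns null when there is nothing left to read.
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = readLines(new StringReader("first\nsecond\n"));
        System.out.println(lines); // prints [first, second]
    }
}
```

To read an actual file, you would pass new FileReader("my-long-file.txt") instead of the StringReader.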
Exceptions
Most of the IO methods discussed above can throw exceptions (specifically, they can throw java.io.IOException or one of its subtypes, like FileNotFoundException). These are checked exceptions: you have to handle them explicitly, either via a try/catch block, or by adding IOException to the throws clause of the function you’re writing. See the slides on exceptions for more details.
Generally, IOException means IO failed, so you have to give up on what you were trying to do. For example, if the user entered a file name but you got FileNotFoundException when trying to open it, you should give up on reading that file (and if you’re nice, print an error message and ask the user to try again).
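One way this could look in code is sketched below. The class and method names (OpenOrReport, printFile) and the wording of the error messages are made up for illustration; the method reports failure through its return value so the caller can decide whether to ask for another file name.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class OpenOrReport {
    // Prints the file's contents to System.out. Returns false (after
    // printing an error message) if the file could not be opened or read.
    static boolean printFile(String fileName) {
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
            return true;
        } catch (FileNotFoundException e) {
            // The more specific subtype has to be caught before IOException.
            System.err.println("Could not open " + fileName + ": " + e.getMessage());
            return false;
        } catch (IOException e) {
            System.err.println("Error while reading " + fileName + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        if (!printFile("myfile.txt")) {
            System.err.println("Please try a different file name.");
        }
    }
}
```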