Tabs versus Spaces: An Eternal Holy War.
© 2000 Jamie Zawinski http://www.jwz.org/contact.html
The last time the tabs-versus-spaces argument flared up in my presence, I wrote this. Gasoline for the fire? Maybe.
I think a big part of these interminable arguments about tabs is based on people using the same words to mean different things.
In what follows, I'm trying to avoid espousing my personal religion; I just thought it would be good to try to explain the various sects.
Anyway. People care (vehemently) about a few different things:
- When reading code, and when they're done writing new code, they care about how many screen columns the code tends to be indented by when a new scope (or sexpr, or whatever) opens.
- When there is some random file on disk that contains ASCII byte #9, the TAB character, they care about how their software reacts to that byte, display-wise.
- When writing code, they care about what happens when they press the TAB key on their keyboard.
Note that I make a distinction between the TAB character (which is a byte which can occur in a disk file) and the TAB key (which is that plastic bump on your keyboard, which when hit causes your computer to do something.)
As to point #1:
- A lot of people like that distance to be two columns, and a lot of people like that distance to be four columns, and a smaller number of people like to have somewhat more complicated and context-dependent rules than that.
As to point #2, the tab character: there is a lot of history here.
- On defaultly-configured Unix systems, and on ancient dumb terminals and teletypes, the tradition has been for the TAB character to mean ``move to the right until the current column is a multiple of 8.'' (As it happens, this is how Netscape interprets TAB inside <PRE> as well.) This is also the default in the two most popular Unix editors, Emacs and vi.
In many Windows and Mac editors, the default interpretation is the same, except that multiples of 4 are used instead of multiples of 8.
However, some people configure vi to make TAB be mod-2 instead of mod-4 (see below.)
With these three interpretations, the ASCII TAB character is essentially being used as a compression mechanism, to make sequences of SPACE-characters take up less room in the file.
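To make the mod-N rule concrete, here is a small Python sketch. The helper name `next_tab_column` and the sample line are mine, purely for illustration; `str.expandtabs` is the standard Python method that applies the same "advance to the next multiple of N" rule. It shows the same bytes on disk rendering differently under the mod-8, mod-4, and mod-2 interpretations.

```python
def next_tab_column(column: int, tab_width: int = 8) -> int:
    """Return the 0-based column the cursor lands on after a TAB at `column`.

    Hypothetical helper: a TAB always advances at least one column, to the
    next multiple of `tab_width`.
    """
    return (column // tab_width + 1) * tab_width


# One line as stored on disk: two TAB bytes, no runs of spaces.
line = "\tif (x)\tfoo();"

# The same bytes, displayed under three interpretations of the TAB character.
rendered = {width: line.expandtabs(width) for width in (8, 4, 2)}

if __name__ == "__main__":
    for width, text in rendered.items():
        print(f"mod-{width}: {text!r}")
```

The stored line is shorter than any of its renderings, which is the "compression" in question; the price is that the reader has to guess the width the writer had in mind.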
Both Emacs and vi let you customize the number of columns used. Unix terminals and shell-windows are usually customizable away from their default of 8, but sometimes not, and often it's difficult.
Yet another interpretation is for the ASCII TAB character to mean ``indent to the next tab stop,'' where the tab stops are set arbitrarily: they might not necessarily be equally distanced from each other. Most word processors can do this; Emacs can do this. I don't think vi can do this, but I'm not sure.
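As a rough sketch of what arbitrary tab stops mean, the following Python function finds the stop a TAB would move to. The name `next_stop`, the example stops, and the fallback to fixed-width stops past the last explicit one are all assumptions of this sketch, not the behavior of any particular editor or word processor.

```python
from bisect import bisect_right


def next_stop(column: int, stops: list[int], fallback_width: int = 8) -> int:
    """Return the first tab stop strictly to the right of `column`.

    `stops` is a sorted list of 0-based columns. Past the last explicit stop
    this sketch falls back to multiples of `fallback_width`; that fallback is
    an assumption made here for illustration.
    """
    i = bisect_right(stops, column)
    if i < len(stops):
        return stops[i]
    return (column // fallback_width + 1) * fallback_width


# Unevenly spaced stops, as a word processor's ruler might have them.
example_stops = [4, 10, 25]
```

With equally spaced stops this reduces to the mod-N rule, so the fixed-width interpretations are just a special case of this one.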
On the Mac, BBedit defaults to 4-column tabs, but the tabstops can be set anywhere. It also has ``entab'' and ``detab'' commands, for converting from spaces to tabs and vice versa (just like Emacs's M-x tabify and M-x untabify.)
As to point #3, the tab key: this is an editor user interface issue.
- Some editors (like vi) treat TAB as being exactly like X, Y, and Z: when you type it, it gets inserted into the file, end of story. (It then gets displayed on the screen according to point #2.)
With editors like this, the interpretation of point #2 is what really matters: since TAB is just a self-inserting character, the way that one changes the semantics of hitting the TAB key on the keyboard is by changing the semantics of the display of the TAB character.
- Some editors (like Emacs) treat TAB as being a command which means ``indent this line.'' And by indent, it means, ``cause the first non-whitespace character on this line to occur at column N.''
To editors like this, it doesn't matter much what kind of interpretation is assigned to point #2: the TAB character in a file could be interpreted as being mod-2 columns, mod-4 columns, or mod-8 columns. The only thing that matters is that the editor realize which interpretation of the TAB character is being used, so that it knows how to properly put the file characters on the screen. The decision of how many columns an expression should be indented by (point #1) and the decision of how those columns should be encoded in the file using the TAB character (point #2) are completely orthogonal.
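The orthogonality can be sketched in a few lines of Python. The function names `encode_indent` and `displayed_width` are hypothetical, and this is only an illustration of the idea, not how any editor actually implements it: the desired indentation is a number of columns, and the TAB width only decides how that number is spelled in bytes.

```python
def encode_indent(columns: int, tab_width: int = 8, use_tabs: bool = True) -> str:
    """Return leading whitespace that displays as `columns` columns wide.

    With `use_tabs`, as many TABs as fit are used and the remainder is
    padded with spaces; otherwise only spaces are used.
    """
    if not use_tabs:
        return " " * columns
    tabs, spaces = divmod(columns, tab_width)
    return "\t" * tabs + " " * spaces


def displayed_width(whitespace: str, tab_width: int = 8) -> int:
    """Return how many columns `whitespace` occupies at the given TAB width."""
    return len(whitespace.expandtabs(tab_width))
```

Ten columns of indentation come out as one TAB and two spaces under mod-8, or two TABs and two spaces under mod-4, and both display as ten columns when read back with the width they were written for. Read the mod-8 encoding with a mod-4 display, though, and it shows up as six columns, which is why both parties need to agree on the interpretation.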
So, the real religious war here is point #1.
Points #2 and #3 are technical issues about interoperability.
My opinion is that the best way to solve the technical issues is to mandate that the ASCII #9 TAB character never appear in disk files: program your editor to expand TABs to an appropriate number of spaces before writing the lines to disk. That simplifies matters greatly, by separating the technical issues of #2 and #3 from the religious issue of #1.
As a data point, my personal setup is the same as the default Emacs configuration: the TAB character is interpreted as mod-8 indentation; but my code is indented by mod-2.
I prefer this setup, but I don't care deeply about it.
I just care that two people editing the same file use the same interpretations, and that it's possible to look at a file and know what interpretation of the TAB character was used, because otherwise it's just impossible to read.
In Emacs, to set the mod-N indentation used when you hit the TAB key, do this:
- (setq c-basic-offset 2)
or
- (setq c-basic-offset 4)
To cause the TAB file-character to be interpreted as mod-N indentation, do this:
- (setq tab-width 4)
or
- (setq tab-width 8)
To cause TAB characters to not be used in the file for compression, and for only spaces to be used, do this:
- (setq indent-tabs-mode nil)
You can also do this stuff on a per-file basis. The very first line of a file can contain a comment which contains variable settings. For the XP code in the client, you'll see many files that begin with
- /* -*- Mode: C; tab-width: 4 -*- */
The stuff between -*-, on the very first line of the file, is interpreted as a list of file-local variable/value pairs. A hairier example:
- /* -*- mode: java; c-basic-offset: 2; indent-tabs-mode: nil -*- */
If you have different groups of people with different customs, the presence of these kinds of explicit settings is really handy.
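For tools other than Emacs that want to honor these settings, a simplified reader might look like the Python sketch below. The function name `parse_file_variables` is hypothetical, and this handles only the plain `name: value; name: value` form shown above; Emacs itself accepts more variations than this does, and lowercasing the names is a simplification of this sketch.

```python
import re


def parse_file_variables(first_line: str) -> dict[str, str]:
    """Extract `name: value` pairs from a `-*- ... -*-` marker on one line.

    Simplified sketch: returns an empty dict when no marker is present and
    silently skips any entry that has no colon.
    """
    match = re.search(r"-\*-(.*?)-\*-", first_line)
    if match is None:
        return {}
    settings: dict[str, str] = {}
    for pair in match.group(1).split(";"):
        name, sep, value = pair.partition(":")
        if sep:
            settings[name.strip().lower()] = value.strip()
    return settings
```

Values come back as strings, so a caller that wants the TAB width as a number would still have to convert `settings["tab-width"]` itself.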
I believe vi has a mechanism for doing this sort of thing too, but I don't know how it works.
To keep myself honest (that is, to ensure that no tabs ever end up in source files that I am editing) I also do this in my .emacs file:
(defun java-mode-untabify ()
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "[ \t]+$" nil t)
      (delete-region (match-beginning 0) (match-end 0)))
    (goto-char (point-min))
    (if (search-forward "\t" nil t)
        (untabify (1- (point)) (point-max))))
  nil)

(add-hook 'java-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-hooks)
             (add-hook 'write-contents-hooks 'java-mode-untabify)))
That ensures that, even if I happened to insert a literal tab in the file by hand (or if someone else did when editing this file earlier), those tabs get expanded to spaces when I save. This assumes that you never use tabs in places where they are actually significant, like in string or character constants, but I never do that: when it matters that it is a tab, I always use '\t' instead.
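For anyone who wants the same cleanup outside of Emacs, here is a rough Python equivalent of that hook. The function name `untabify_text` is mine, and the sketch assumes "\n" line endings and a caller that knows which TAB width the file was written with. Like the Emacs version, it expands every tab, including any inside string constants.

```python
def untabify_text(text: str, tab_width: int = 8) -> str:
    """Strip trailing spaces and tabs from each line, then expand tabs.

    Sketch only: assumes "\n" line endings, and `tab_width` must match the
    interpretation the file was written with or the indentation will shift.
    """
    cleaned = []
    for line in text.split("\n"):
        line = line.rstrip(" \t")                    # drop trailing whitespace
        cleaned.append(line.expandtabs(tab_width))   # TABs become spaces
    return "\n".join(cleaned)
```

Running it over a file's contents before writing them back gives a file with no ASCII #9 bytes in it at all, so the question of how to display them never comes up.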
Here are some details on vi, courtesy of Woody Thrower:
Standard vi interprets the tab key literally, but there are popular vi-derived alternatives that are smarter, like vim. To get vim to interpret tab as an ``indent'' command instead of an insert-a-tab command, do this:
- set softtabstop=2
To set the mod-N indentation used when you hit the tab key in vim (what Emacs calls c-basic-offset), do this:
- set shiftwidth=2
To cause the TAB file-character to be displayed as mod-N in vi and vim (what Emacs calls tab-width), do this:
- set tabstop=4
To cause TAB characters to not be used in the file for compression, and for only spaces to be used (what Emacs calls indent-tabs-mode), do this:
- set expandtab
In vi (and vim), you can do this stuff on a per-file basis using ``modelines,'' magic comments at the top of the file, similarly to how it works in Emacs:
- /* ex: set tabstop=8 expandtab: */
So go forth and untabify!