Oxford University Press's
Academic Insights for the Thinking World

On The Dot

Alexander Humez has written over ten books and lives in Somerville, Massachusetts. Nicholas Humez is a freelance writer and silversmith who lives outside Cleveland, Ohio. Together they wrote On The Dot: The Speck That Changed the World, which examines one of the most versatile players in the history of written communication. The book looks at the dot in all its various forms, from sentence stopper, to musical notation, to Morse code. In the excerpt below we learn briefly how the dot came to be used with computers, as the marker between a file name and its extension.

How is it possible, given the variety of manufacturers of computers and software, to have a computer-literate society functioning as a unified system that allows Mary on her Mac to talk to Dennis on his Dell? This is the result of a nesting of languages within languages: The deep structure of any computer’s functioning lies ultimately in a machine language of binary digits (“bits” for short)—that is, strings of zeros and ones—that any computer actually “reads.” A programmer can give any given device its input as machine language (and in the earliest days of the “big iron” computers such as UNIVAC I and ENIAC there was no choice), but a computer’s assembly language is a metalanguage that allows the programmer to say consequential things about machine language, including instructions for common operations, with code names such as ADD or MULT. This saves a good deal of writing out binary numbers (and lessens the opportunity for error).
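
To make the relationship concrete, here is a minimal sketch in Python of what an assembler does: it swaps human-readable code names for the bit patterns the machine actually reads. The mnemonics ADD and MULT come from the passage above; the four-bit opcodes and the two-register instruction format are invented purely for illustration and correspond to no real machine.

    OPCODES = {"ADD": "0001", "MULT": "0010"}   # hypothetical 4-bit opcodes

    def assemble(mnemonic, reg_a, reg_b):
        # Build a 12-bit "machine word": opcode, then two 4-bit register numbers.
        return OPCODES[mnemonic] + format(reg_a, "04b") + format(reg_b, "04b")

    print(assemble("ADD", 1, 2))    # 000100010010
    print(assemble("MULT", 3, 7))   # 001000110111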

Machine language and assembly language are specific to a type of computer; you cannot input the assembly language for your Gateway PC into a Macintosh and expect results, at least not meaningful ones. (The first axiom of computing is GIGO: garbage in, garbage out.) And in any case, assembly language is still pretty tedious stuff and not particularly user-friendly. Higher-level languages solve this problem: one writes a program in COBOL or FORTRAN II or BASIC or C++ and then runs another program called a compiler, which translates the higher-level language into machine language (sometimes, though not always, via assembly language) for one’s particular device. This makes for a much less tedious and more programmer-friendly programming environment and gets everyone through the day without having to crumple up too many sheets of wonky code and throw them in the wastebasket with a colorful oath.
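
In the same hypothetical spirit, the sketch below mimics a single, drastically simplified compiler step: turning a line of high-level source such as x = a + b into the mnemonic form an assembler could then translate. Real compilers for COBOL, FORTRAN, or C++ do enormously more; the point is only the direction of translation, from programmer-friendly source toward machine language.

    def compile_statement(stmt):
        # "x = a + b"  ->  "ADD a, b   ; result goes to x"
        target, expr = [s.strip() for s in stmt.split("=")]
        for symbol, mnemonic in (("+", "ADD"), ("*", "MULT")):
            if symbol in expr:
                a, b = [s.strip() for s in expr.split(symbol)]
                return f"{mnemonic} {a}, {b}   ; result goes to {target}"
        raise ValueError("only + and * are supported in this toy")

    print(compile_statement("x = a + b"))   # ADD a, b   ; result goes to x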

More important is the fact that high-level languages have made possible the sorts of operating systems and accompanying software that are user-friendly to the point of requiring very little sophistication on the user’s part as to how a computer actually works. An operating system is control software; it manages the way operations stipulated by other programs are sequenced and mediates access to peripheral devices such as printers and disk drives. The IBM PC, which broke onto the personal computer market at the start of the 1980s, hard on the heels of the Apple II (1977) and the Hewlett-Packard HP-85, came with an operating system called MS-DOS (or just DOS for short). Acquired by Microsoft from Seattle Computer Products (its original name was QDOS, for “Quick and Dirty Operating System”), it was retooled to work on IBM’s PC. DOS-based software for the PC burgeoned as independent developers saw the potential for business applications given IBM’s already strong position in this market. Apple’s Macintosh offered a more user-friendly graphic interface, with icons and mouse clicks in place of command-line instructions and carriage returns; nevertheless, the business applications for DOS far outnumbered those for the Mac (many banks would still be using them well into the ’90s), and Microsoft further widened the market-share gap with the introduction of its first Windows operating system in 1985, offering a look and feel similar to the Mac environment while retaining the option to go into DOS for operation of its existing software.

Fundamental to the operation of DOS and Windows was the file system: up to 8 characters + dot + an extension of up to three characters was the SFN (short file name) convention in DOS prior to support for LFNs (long file names), which eases this constraint. The up-to-8 part is the base file name, and the up-to-three part is the file extension, e.g., autoexec.bat, myfile.txt, or whatsup.doc. The 8+3 file name is a feature of the FAT (file allocation table) system design, a file allocation table being a structure supported by various operating systems (including DOS) that contains a variety of information about the files you’ve written to some storage medium—such as your hard drive, a floppy disk, a CD—and that allows the system to locate the file on that medium. Think of an entry in the table as a set of fixed-length fields: for example, in DOS/Windows, these fields include name, size, file type, and modification date and time. The length of the name field in the SFN FAT system is 11 characters (the dot doesn’t count as a character). Long file names can have more than 8 characters in the base file name and more than 3 characters in the file extension.
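
A rough Python sketch of that 11-character name field, assuming nothing beyond what the passage says: up to 8 characters of base name and up to 3 of extension, each padded out with spaces, and no dot actually stored. The function names are invented for illustration, and all the other fields of a real directory entry are omitted.

    def pack_sfn(filename):
        # "myfile.txt" -> "MYFILE  TXT" (8 + 3 = 11 characters, no dot stored)
        base, _, ext = filename.upper().partition(".")
        if len(base) > 8 or len(ext) > 3:
            raise ValueError("not a valid 8.3 short file name")
        return base.ljust(8) + ext.ljust(3)

    def unpack_sfn(field):
        base, ext = field[:8].rstrip(), field[8:].rstrip()
        return f"{base}.{ext}" if ext else base

    packed = pack_sfn("myfile.txt")
    print(repr(packed))        # 'MYFILE  TXT'
    print(unpack_sfn(packed))  # MYFILE.TXT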

So, why 8.3? This constraint on file names is a function of the format in which information is represented—stored, accessed, and processed—internally by a computer, the smallest unit being the binary digit (bit), access usually being by the byte (a block of binary digits) or by the word. Nowadays, a word is typically 16, 32, or 64 bits, which may be subdivided into 8-bit bytes. (The term bit was credited by Claude E. Shannon in a 1948 paper to John Tukey of Bell Labs, who apparently coined the term in 1947; byte was coined by Dr. Werner Buchholz of IBM in 1956, a prudent respelling of the originally proposed bite, which looked too much like bit.) The characters that make up a file name (or any other piece of text) are represented in bits, the conventional encoding system being the American Standard Code for Information Interchange (ASCII); this system’s canonical inventory runs from binary 100000 (decimal 32), the space character, to 11111111 (decimal 255, i.e., 2^8 – 1), the character ÿ—where, reckoning right to left, each digit represents an increasing power of 2, starting with 2^0, which has a decimal value of 1. (The character inventory has subsequently been extended above decimal 255 to accommodate additional characters. For example, the extended code 100000000 (decimal 256) is used to represent Ā.)
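
The ASCII arithmetic in the previous paragraph can be checked directly; the short Python snippet below prints the decimal and binary codes for the space character, the letter A, and ÿ, and confirms that eight ones add up to 2^8 – 1.

    # Each character has a numeric code; the binary form of that code is what
    # the machine stores. (Codes shown padded to 8 bits for alignment.)
    for ch in (" ", "A", "ÿ"):
        code = ord(ch)
        print(f"{ch!r}: decimal {code}, binary {code:08b}")
    # ' ': decimal 32, binary 00100000
    # 'A': decimal 65, binary 01000001
    # 'ÿ': decimal 255, binary 11111111

    # Right to left, each digit is worth an increasing power of 2:
    # 11111111 = 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255 = 2**8 - 1.
    print(sum(2**i for i in range(8)))   # 255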

In the days before IBM’s System/360 computer promulgated the 8-bit byte and 32-bit word, and the 8.3 file name length became standard, different computer architectures had supported a variety of word lengths and, as a result, a variety of permissible file name lengths. So a system whose native word length was 36 bits could accommodate base file names of five 7-bit characters (with a bit left over), or base file names of six 6-bit characters plus a 3-character extension. The 6-bit solution entailed some electronic legerdemain: The trick was to subtract 32 from the 7-bit ASCII value before processing (and put it back when the processing was done); this was fine for the uppercase characters (1000001 to 1011010—decimal 65 to 90—minus 32 yields 100001 to 111010—decimal 33 to 58) and the characters with lower ASCII values (space, !, $, and so on), but not workable for the lowercase letters, which run from 1100001 to 1111010 (decimal 97 to 122) and which still need 7 bits even when you subtract 32. But since printers of this era could print only uppercase characters and some marks of punctuation, this limitation wasn’t really a hardship…
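
As a hedged illustration of that “subtract 32” trick, the Python sketch below squeezes six uppercase characters into one 36-bit word and unpacks them again. The particular packing order is invented, since real 36-bit machines differed in detail, but the arithmetic is the one the passage describes, including why lowercase letters will not fit.

    def pack_sixbit(text):
        if len(text) != 6:
            raise ValueError("exactly six characters per 36-bit word")
        word = 0
        for ch in text:
            code = ord(ch) - 32                  # e.g. 'A' (65) -> 33, fits in 6 bits
            if not 0 <= code < 64:
                raise ValueError(f"{ch!r} cannot be encoded in 6 bits")
            word = (word << 6) | code
        return word

    def unpack_sixbit(word):
        # Undo the trick: take each 6-bit slice and add the 32 back.
        return "".join(chr(((word >> shift) & 0b111111) + 32)
                       for shift in range(30, -1, -6))

    w = pack_sixbit("MYFILE")
    print(f"{w:036b}")          # the 36-bit word as zeros and ones
    print(unpack_sixbit(w))     # MYFILE
    # Lowercase letters (decimal 97-122) fail the range check above:
    # 97 - 32 = 65, which still needs 7 bits.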
