Good Math, Bad Math : World's Greatest Pathological Language: TECO

World's Greatest Pathological Language: TECO

Category: pathological programming
Posted on: September 22, 2006 9:14 AM, by Mark C. Chu-Carroll

I've got a real treat for you pathological programming fans! Today, we're going to take a quick look at the world's most useful pathological programming language: TECO.

TECO is one of the most influential pieces of software ever written. If, by chance, you've ever heard of a little editor called "emacs", well, that was originally a set of editor macros for TECO (EMACS = Editor MACroS).

As a language, it's both wonderful and awful. On the good side, the central concept of the language is wonderful: it's a powerful language for processing text, which works by repeatedly finding text that matches some kind of pattern, taking some kind of action when it finds it, and then selecting the next pattern to look for. That's a very natural, easy-to-understand way of writing programs to do text processing. On the bad side, it's got the most god-awful hideous syntax ever imagined.

History

TECO deserves a discussion of its history, because that history is basically the history of how programmers' editors developed. This is a very short version of it, but it's good enough for this post.

In the early days, PDP computers used paper tape for entering programs. (Mainframes mostly used punched cards; minis like the PDPs used paper tape.) The big problem with paper tape is that if there's an error, you need to either create a whole new tape containing the correction, or carefully cut and splice the tape together with new segments to create a new tape (and splicing was very error-prone).

This was bad. And so, TECO was born. TECO was the "Tape Editor and COrrector". It was a Turing-complete programming language in which you could write programs to make your corrections. So you'd feed the TECO program into the computer first, and then feed the original tape (with errors) into the machine; the TECO program would do the edits you specified, and then you'd feed the corrected program to the compiler. It needed to be Turing-complete, because you were writing a program to find the stuff that needed to be changed.

A language designed to live in the paper-tape world had to have some major constraints. First, paper tape is slow. Really slow. And punching tape is a miserable process. So you really wanted to keep things as short as possible. So the syntax of TECO is, to put it mildly, absolutely mind-boggling. Every character is a command. And I don't mean "every punctuation character", or "every letter". Every character is a command. Letters, numbers, punctuation, line feeds, control characters... Everything.

But despite the utterly cryptic nature of it, it was good. It was very good. So when people started to use interactive teletypes (at 110 baud), they still wanted to use TECO. And so it evolved. But that basic tape-based syntax remained.

When screen-addressable terminals came along - VT52s and such - suddenly, you could write programs that used cursor control! The idea of a full-screen editor came along. Of course, TECO lovers wanted their full-screen editor to be TECO. For VAXes, one of the very first full-screen editors was a version of TECO that displayed a screen full of text, and did commands as you typed them; and for commands that actually needed extra input (like search), it used a mode-line on the bottom of the screen (exactly the way that emacs does now).

Not too long after that, Richard Stallman and Guy Steele wrote emacs - the editor macros for TECO. Originally, it was nothing but editor macros for TECO to make the full-screen editor easier to use. But eventually, it was rewritten from scratch to be Lisp-based. And not long after that, TECO faded away, only to be remembered by a bunch of aging geeks. The syntax of TECO killed it; the simple fact is, if you have an alternative to the mind-boggling hideousness that is TECO syntax, you're willing to put up with a less powerful language that you can actually read. So almost everyone would rather write their programs in Emacs Lisp than in TECO, even if TECO was the better language.

The shame of TECO's death is that it was actually a really nice programming language. To this day, I still come across things that I need to do that are better suited to TECO than to any modern day programming language that I know. The problem though, and the reason that it's disappeared so thoroughly, is that the syntax of TECO is so mind-bogglingly awful that no one, not even someone as insane as I am, would try to write code in it when there are other options available.

A Taste of TECO Programming

Before jumping in and explaining the basics of TECO in some detail, let's take a quick look at a really simple TECO program. This program is absolutely remarkably clear and readable for TECO source. It even uses a trick to allow it to do comments. The things that look like comments are actually "goto" targets.

0uz                             ! clear repeat flag !
<j 0aua l                       ! load 1st char into register A !
<0aub                           ! load 1st char of next line into B !
qa-qb"g xa k -l ga -1uz '       ! if A>B, switch lines and set flag !
qbua                            ! load B into A !
l .-z;>                         ! loop back if another line in buffer !
qz;>                            ! repeat if a switch was made last pass !

The basic idea of TECO programming is pretty simple: search for something that matches some kind of pattern; perform some kind of edit operation on the location you found; and then choose a new search to find the next thing to do. The example program above finds the beginnings of lines, and does a swap-sort on them, comparing the first character of each line. So it finds each sequential pair of lines; if they're not in the right order, it swaps them, and sets a flag indicating that another pass is needed.
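For anyone who'd rather not decode the TECO, here is a rough sketch of the same swap-sort idea in Python. It is a model of the description above, not a line-by-line translation of the TECO program: the function name is made up for illustration, and it works on a list of lines rather than on an edit buffer.

```python
def swap_sort_by_first_char(lines):
    """Sweep over adjacent pairs of lines, swapping any pair whose first
    characters are out of order, until a full pass makes no swap."""
    lines = list(lines)          # work on a copy, leave the caller's list alone
    swapped = True               # plays the role of the "repeat" flag
    while swapped:
        swapped = False
        for i in range(len(lines) - 1):
            a = lines[i][:1]         # first char of this line ("" if empty)
            b = lines[i + 1][:1]     # first char of the next line
            if a > b:
                lines[i], lines[i + 1] = lines[i + 1], lines[i]
                swapped = True       # another pass is needed
    return lines


print(swap_sort_by_first_char(["pear", "apple", "zebra", "mango"]))
# prints ['apple', 'mango', 'pear', 'zebra']
```

Note that only the first character of each line is compared, so lines that start with the same character keep their original relative order.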

TECO programs work by editing text in a buffer. Every buffer has a pointer which represents the location where any edit operations will be performed. The pointer always sits between two characters.

The first thing that most TECO programs do is specify what it is that they want to edit - that is, what they want to read into the buffer. The command to do that is "ER". So to edit a file foo.txt, you'd write "ERfoo.txt", and then hit the escape key twice to tell it to execute the command; then the file would be loaded into the buffer.

TECO Commands

TECO commands are generally single characters. But there is some additional structure to allow arguments. There are two types of arguments: numeric arguments, and text arguments. Numeric arguments come before the command; text arguments come after the command. Numeric values used as arguments can be either literal numbers, commands that return numeric values, "." (for the index of the buffer pointer), or numeric values joined by arithmetic operators like "+", "-", etc.

So, for example, the C command moves the pointer forward one character. If it's preceded by a numeric argument N, it will move forward N characters. The J command jumps the pointer to a specific location in the buffer: the numeric argument is the offset from the beginning of the buffer to the location where the pointer should be placed.

String arguments come after the command. Each string argument can be delimited in one of two ways. By default, a string argument continues until it sees an Escape character, which marks the end of the string. Alternatively (and easier to read), if the command is prefixed by an "@" character, then the first character after the command is the delimiter, and the string will continue until the next instance of that character.

So, for example, we said "ER" reads a file into the buffer. So normally, you'd use "ERfoo.txt<ESC>". Alternatively, you could use "@ER'foo.txt'". Or "@ER$foo.txt$". Or "@ERqfoo.txtq". Or even "@ER foo.txt ".
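The two delimiting rules are easy to model. The Python sketch below is only an illustration of the rules as described here: the function name is hypothetical, it has to be told which command it is looking at, and it ignores everything else a real TECO would do with the command.

```python
ESC = "\x1b"  # the Escape character that ends a default-style string argument


def read_text_argument(source, command):
    """Return the text argument that follows `command` in `source`.

    Models two forms only:
      ERfoo.txt<ESC>   -> argument runs up to the next Escape
      @ER'foo.txt'     -> first char after the command is the delimiter
    """
    if source.startswith("@" + command):
        body = source[len(command) + 1:]
        if not body:
            raise ValueError("missing delimiter after command")
        delimiter = body[0]
        end = body.find(delimiter, 1)     # next instance of the delimiter
        if end == -1:
            raise ValueError("unterminated text argument")
        return body[1:end]
    if source.startswith(command):
        body = source[len(command):]
        end = body.find(ESC)
        if end == -1:
            raise ValueError("text argument not terminated by Escape")
        return body[:end]
    raise ValueError("source does not start with the given command")


for form in ["ERfoo.txt" + ESC, "@ER'foo.txt'", "@ER$foo.txt$", "@ERqfoo.txtq", "@ER foo.txt "]:
    print(repr(read_text_argument(form, "ER")))
```

Every one of the five forms yields the same argument, foo.txt, which is the whole point of the "@" prefix: you pick whatever delimiter doesn't appear in your string.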

Commands can also be modified by placing a ":" in front of them. For most commands, ":" makes them return either a 0 (to indicate that the command failed), or a -1 (to indicate that the command succeeded). For others, the colon does something else. The only way to know is to know the command.

TECO has variables; in its own inimitable fashion, they're not called variables; they're called Q-registers. There are 36 global Q-registers, named "A" through "Z" and "0"-"9". There are also 36 local Q-registers (local to a particular macro, aka subroutine), which have a "." character in front of their name.

Q-registers are used for two things. First, you can use them as variables: each Q-register stores a string and an integer. Second, any string stored in a Q-register can be used as a subroutine; in fact, that's the only way to create a subroutine. The commands to work with Q-registers include:

"nUq": "n" is a numeric argument; "q" is a register name. This stores the value "n" as the numeric value of the register "q".
"m,nUq": both "m" and "n" are numeric arguments, and "q" is a register name. This stores "n" as the numeric value of register "q", and then returns "m" as a parameter for the next command.
"n%q": add the number "n" to the numeric value stored in register "q".
"^Uqstring": Store the string as the string value of register "q".
":^Uqstring": Append the string parameter to the string value of register "q".
"nXq": clear the text value of register "q", and copy the next "n" lines into its string value.
"m,nXq": copy the character range from position "m" to position "n" into register "q".
".,.+nXq": copy "n" characters following the current buffer pointer into register "q".
"Qq": use the integer value of register "q" as the parameter to the next command.
"nQq": use the ascii value of the Nth character of register "q" as the parameter to the next command.
":Qq": use the length of the text stored in register "q" as the parameter to the next command.
"Gq": copy the text contents of register "q" to the current location of the buffer pointer.
"Mq": invoke the contents of register "q" as a subroutine.

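To make the "each register holds a string and an integer" idea concrete, here is a small Python model of a few of the Q-register commands listed above. It is a sketch under assumptions: the class and method names are invented for illustration, only the global registers are modelled, and "nQq" is assumed to count characters from zero.

```python
class QRegister:
    """One Q-register: an integer and a string, stored side by side."""

    def __init__(self):
        self.number = 0
        self.text = ""


class QRegisters:
    """The 36 global Q-registers, "A"-"Z" and "0"-"9"."""

    NAMES = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

    def __init__(self):
        self.regs = {name: QRegister() for name in self.NAMES}

    def u(self, q, n):                 # nUq: store n as the numeric value
        self.regs[q].number = n

    def percent(self, q, n):           # n%q: add n to the numeric value
        self.regs[q].number += n

    def ctrl_u(self, q, string, append=False):
        # ^Uqstring stores the string; :^Uqstring (append=True) appends it
        if append:
            self.regs[q].text += string
        else:
            self.regs[q].text = string

    def q(self, q):                    # Qq: the integer value
        return self.regs[q].number

    def nq(self, q, n):                # nQq: character code at index n
        return ord(self.regs[q].text[n])   # zero-based indexing is an assumption

    def colon_q(self, q):              # :Qq: length of the stored text
        return len(self.regs[q].text)


regs = QRegisters()
regs.u("A", 5)
regs.percent("A", 3)
regs.ctrl_u("A", "hi")
regs.ctrl_u("A", " there", append=True)
print(regs.q("A"), regs.colon_q("A"), regs.regs["A"].text)
```

The number and the text of a register are independent: setting one with "U" or "%" leaves the other alone, which is why the same register can act as a counter and as a subroutine at the same time.
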
There are also a bunch of commands for printing out some part of the buffer. For example, "T" prints the current line. The print command to print a string is control-A; so the TECO hello world program is: "^AHello world^A<ESC><ESC>". Is that pathological enough?

Commands to remove text include things like "D" to delete the character after the pointer; "FD", which takes a string argument, finds the next instance of that argument, and deletes it; "K" to delete the rest of the line after the pointer, and "HK" to delete the entire buffer.

To insert text, you can either use "I" with a string argument, or <TAB> with a string argument. If you use the tab version, then the tab character is part of the text to insert.

There are, of course, a ton of commands for moving the pointer around the buffer. The basic ones are:

"C" moves the pointer forward one character if no argument is supplied; if it gets a numeric argument N, it moves forwards N characters. C can be preceded by a ":" to return a success value.
"J" jumps the pointer to a location specified by its numeric argument. If there is no location specified, it jumps to location 0. J can be preceded by a ":" to see if it succeeded.
"ZJ" jumps to the position after the last character in the file.
"L" is pretty much like "C", except that it moves by lines instead of characters.
"R" moves backwards one character - it's basically the same as "C" with a negative argument.
"S" searches for its argument string, and positions the cursor after the last character of the search string it found, or at position 0 if the string isn't found.
"number,numberFB" searches for its argument string between the buffer positions specified by the numeric arguments.

Search strings can include something almost like regular expressions, but with a much worse syntax. I don't want to hurt your brain too much, so I won't go into detail.
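If the pointer-between-characters model is hard to picture, this Python sketch may help. It models a handful of the movement commands described above ("C", "R", "J", "ZJ", and a plain-text "S") on a string buffer. The class and method names are made up, the return values imitate the ":"-modified forms, and none of TECO's search-pattern syntax is attempted.

```python
class Buffer:
    """A text buffer with a pointer ("dot") that sits between characters."""

    def __init__(self, text=""):
        self.text = text
        self.dot = 0                       # 0 means "before the first character"

    def j(self, n=0):
        """J: jump to absolute position n (0 if no argument is given)."""
        if 0 <= n <= len(self.text):
            self.dot = n
            return True
        return False                       # out of range: pointer unchanged

    def zj(self):
        """ZJ: jump to the position after the last character."""
        self.dot = len(self.text)

    def c(self, n=1):
        """C: move forward n characters (one by default)."""
        return self.j(self.dot + n)

    def r(self, n=1):
        """R: move backwards, the same as C with a negative argument."""
        return self.c(-n)

    def s(self, string):
        """S: search forward; leave the pointer just after the match,
        or at position 0 if the string isn't found."""
        index = self.text.find(string, self.dot)
        if index == -1:
            self.dot = 0
            return False
        self.dot = index + len(string)
        return True


buf = Buffer("hello world")
buf.c(5)
buf.r()
print(buf.dot)        # prints 4
buf.j()
buf.s("wor")
print(buf.dot)        # prints 9
```

Because the pointer sits between characters, a buffer of 11 characters has 12 legal pointer positions, 0 through 11, and "ZJ" lands on the last of them.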

And last, but definitely not least, there's control flow.

First, there are loops. A loop is "n<commands>", which executes the text between the left angle bracket and the right angle bracket "n" times. Within the loop, ";" branches out of the loop if the last search command failed; "n;" exits the loop if the value of "n" is greater than or equal to zero; and ":;" exits the loop if the last search succeeded. "F>" jumps to the loop's closing bracket (think C continue), and "F<" jumps back to the beginning of the loop.

Conditionals are generally written "n"Xthen-command-string|else-command-string'". (Watch out for the quotes in there; there's no particularly good way to quote it, since it uses both of the normal quote characters. The double-quote character introduces the conditional, and the single-quote marks the end.) In this command, the "X" is one of a list of conditional tests, which define how the numeric argument "n" is to be tested. Some possible values of "X" include:

"A" means "if n is the character code for an alphabetic character".
"D" means "if n is the character code of a digit".
"E" means "if n is zero or false".
"G" means "if n is greater than zero".
"N" means "if n is not equal to zero".
"L" means "if n is a numeric value meaning that the last command succeeded" (success is reported as -1, so this is a test for n less than zero).

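The conditional tests are really just a table of predicates on a number. Here is that table as a Python sketch. The names are hypothetical, the "A" and "D" tests are restricted to ASCII as an assumption, and "L" is modelled as "n is negative" because success is reported as -1.

```python
def _is_ascii(n):
    return 0 <= n < 128


# One predicate per test letter, following the list above.
TESTS = {
    "A": lambda n: _is_ascii(n) and chr(n).isalpha(),   # alphabetic character code
    "D": lambda n: _is_ascii(n) and chr(n).isdigit(),   # digit character code
    "E": lambda n: n == 0,                              # zero or false
    "G": lambda n: n > 0,                               # greater than zero
    "N": lambda n: n != 0,                              # not equal to zero
    "L": lambda n: n < 0,                               # success flag (-1) or any negative
}


def conditional(n, test, then_branch, else_branch=None):
    """Model of n"X then | else ': run one branch depending on the test.

    The branches are zero-argument callables; a missing else branch
    means "do nothing" and the function returns None.
    """
    chosen = then_branch if TESTS[test](n) else else_branch
    return chosen() if chosen is not None else None


print(conditional(ord("a"), "A", lambda: "letter", lambda: "not a letter"))
print(conditional(-1, "L", lambda: "succeeded", lambda: "failed"))
```
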
Example TECO Code

This little ditty reads a file, and converts tabs to spaces assuming that tab stops are every 8 spaces:

  FEB  :XF27: F H M Y<:N   ;'.U 0L.UAQB-QAUC<QC-9"L1;'-8%C>9-QCUD
S    DQD<I >>EX

That's perfectly clear now, isn't it?
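For contrast, here is roughly the same job sketched in Python: replace each tab with enough spaces to reach the next tab stop, with stops every 8 columns. This is not a translation of the TECO above, just the same task; the function name is made up, and it handles one line at a time.

```python
def expand_tabs(line, tab_width=8):
    """Replace each tab with spaces up to the next multiple of tab_width."""
    out = []
    column = 0                                   # current output column
    for ch in line:
        if ch == "\t":
            pad = tab_width - (column % tab_width)   # distance to the next stop
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += 1
    return "".join(out)


print(repr(expand_tabs("a\tb")))
# prints 'a       b'
```

Python's built-in str.expandtabs(8) does the same thing; the loop is spelled out here only to show the column arithmetic that the TECO version has to do with Q-registers.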

Ok, since that was so easy, how about something challenging? This little baby takes a buffer, and executes its contents as a BrainFuck program. Yes, it's a BrainFuck interpreter in TECO!

@^UB#@S/{^EQQ,/#@^UC#@S/,^EQQ}/@-1S/{/#@^UR#.U1ZJQZ\^SC.,.+-^SXQ-^SDQ1J#
@^U9/[]-+<>.,/<@:-FD/^N^EG9/;>J30000<0@I//>ZJZUL30000J0U10U20U30U60U7
@^U4/[]/@^U5#<@:S/^EG4/U7Q7; -AU3(Q3-91)"=%1|Q1"=.U6ZJ@i/{/Q2\@i/,/Q6\@i/}
/Q6J0;'-1%1'>#<@:S/[/UT.U210^T13^TQT;QT"NM5Q2J'>0UP30000J.US.UI
<(0A-43)"=QPJ0AUTDQT+1@I//QIJ@O/end/'(0A-45)"=QPJ0AUTDQT-1@I/
/QIJ@O/end/'(0A-60)"=QP-1UP@O/end/'(0A-62)"=QP+1UP@O/end/'(0A-46)"=-.+QPA
^T(-.+QPA-10)"=13^T'@O/end/'(0A-44)"=^TUT8^TQPJDQT@I//QIJ@O/end/'(0A-91)
"=-.+QPA"=QI+1UZQLJMRMB\    -1J.UI'@O
/end/'(0A-93)"=-.+QPA"NQI+1UZQLJMRMC\-1J.UI'@O/end/'
!end!QI+1UI(.-Z)"=.=@^a/END/^c^c'C>
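If you want to see what that wall of characters is actually computing, here is a conventional BrainFuck interpreter sketched in Python. It is not derived from the TECO source. It only shares the 30000-cell tape that the TECO version appears to set up; the 8-bit wrapping cells and the choice to read 0 at end of input are assumptions, and the function name is invented.

```python
def run_brainfuck(program, input_text="", tape_size=30000):
    """Run a BrainFuck program and return whatever it prints as a string."""
    # Match up the brackets first so jumps are a dictionary lookup.
    jumps = {}
    stack = []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            if not stack:
                raise ValueError("unmatched ]")
            start = stack.pop()
            jumps[start] = pos
            jumps[pos] = start
    if stack:
        raise ValueError("unmatched [")

    tape = [0] * tape_size
    ptr = 0                      # data pointer
    pc = 0                       # program counter
    out = []
    inp = iter(input_text)
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256    # assumed 8-bit cells
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            tape[ptr] = ord(next(inp, "\0"))     # assumed: 0 at end of input
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]                       # skip past the matching ]
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]                       # loop back to the matching [
        pc += 1                                  # any other character is ignored
    return "".join(out)


print(run_brainfuck("++++++++[>++++++++<-]>+."))
# prints A
```

The example program sets one cell to 8, uses it as a loop counter to add 8 to the next cell eight times, adds one more, and prints character 65.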

If you're actually insane enough to want to try this masochistic monstrosity, you can get a TECO interpreter, with documentation and example programs, from here.

Comments

Wow.

My brain just OD-ed on syntax insanity.

Posted by: Elia Diodati | September 22, 2006 10:25 AM

And people complain about python's love of whitespace... which isn't that big of a deal (and is fairly useful) after your first non-trivial program.

Posted by: Dan R. | September 22, 2006 10:37 AM

I have fond memories of hacking teco both for emacs libraries and standalone programs. But I'm glad I'm not doing that any more. I still have some of my own code -- the emacs library stuff can be a little more readable due to long English names in subroutine calls (m.m) and variable names (m.v), but in general it's pretty much line noise.

Posted by: D. Eppstein | September 22, 2006 10:54 AM

Ah, this brings back fond memories of youth.

Posted by: Ron Avitzur | September 22, 2006 12:53 PM

Whoa! It's been a long time since I've seen the words "TECO" and "VT52" in the same paragraph. I believe that happened in one of DEC's mill buildings.

I enjoy your blog.

Peter

Posted by: Peter | September 22, 2006 01:03 PM

Sweet! Now I can run the BrainFuck implementation of DeCSS through a TECO implementation of BrainFuck!

FORTRAN looks good by comparison.

This gives me a wicked idea: I should write a description of BASIC in the style of a "pathological programming" post. Umpteen features of each dialect I encountered — from 1983-vintage BASIC-XL to Visual Basic 6.0 — could qualify them for pathological status.

Posted by: Blake Stacey | September 22, 2006 01:14 PM

My favorite languages are TECO and APL, although J is almost like a mongrel between APL and TECO ...

Posted by: _Arthur | September 22, 2006 02:13 PM

The series of posts could easily swell out to be about general pathological issues. This one reminds me of meeting the -vi editor first time on a Sun workstation in Dallas. IIRC it has three main states. I was there to work on microelectronic processes, not computers, so I hated the threshold factor at the time.

(But actually the real pathology IIRC was that the default account was set up to use the three button mouse (before gaming) and the manual was written for two buttons usage. Or maybe I thought I could adapt the computer to my habits, instead of vice versa. Anyhoo, the setting to change was hidden away under the moved extra button functions. Sick, I tell you!)

Posted by: Torbjörn Larsson | September 22, 2006 03:06 PM

Blake:

I thought *I* was insane... But actually *using* TECO to run a brainfuck interpreter to run DeCSS... That's just twisted.

Posted by: Mark C. Chu-Carroll | September 22, 2006 03:41 PM

I just missed the TECO era myself, but I remember older hackers playing a game of "what does your name do in TECO?" In any case, that totally looks like line noise. ;-)

Posted by: David Harmon | September 22, 2006 03:52 PM

The real successor to TECO isn't emacs, it's vim. See towers of hanoi and mandelbrot in vim.

Posted by: beza1e1 | September 22, 2006 04:08 PM

beza1e1:

Maybe in *spirit*, vim is the successor to teco. But historically, emacs is the successor. (And personally, I think that linguistically, I'd have to call Perl the child of teco. Perl has much of the same general sensibility of teco as a language; and it's almost as hard to read. :-) )

Also, I'm not sure, but I don't think that vim is actually turing complete. Certainly expressive, but I don't think that it quite makes it all the way.

Posted by: Mark C. Chu-Carroll | September 22, 2006 04:18 PM

Wow, what a blast from the past. I might still have a program for creating a table of contents written in TECO back in the late 70's. Of course, it might be archived on punch cards. Debugging, now that was a fun job in TECO.

Posted by: Steve Lewin-Berlin | September 22, 2006 05:31 PM

it may have been implicit in the background, but the architecture of the ITS OS (not TOPS, rather the "Incompatible Timesharing System", a kind of an *alter* ego to CTSS, with *unusual* security features) which ran on the MIT-AI machines contributed to TECO's power. in particular, this OS received characters from each and every terminal when a key was depressed, not merely when, as was common in display terminals at the time, a CRLF or Return was entered. this meant the program could respond after each and every key. that meant that when using TECO, a skillful operator could massage code or text as they went along, and minimize the number of keystrokes required to achieve a goal. the effect was especially mesmerizing on the AI Lab's Knight displays, in the dark.

as bad as TECO seems, two points.

first, writing TECO macros was and is as deep an intellectual activity as writing APL, J, or perhaps as studying wei-qi, and was fun. indeed, people often got teased when they showed off their collection of macros because they were essentially arguing "My thesis will be so much easier to do and faster if I have just the right set of TECO macros to do it with."

second, Data General ripped TECO off from DEC, although they called it something else entirely. (i believe DG was founded by DEC refugees.) i had to use a DG machine for a project at IBM Federal Systems and colleagues were astonished when i sat down at this brand new DG machine and began editing without much trouble using their editor: it was TECO under a new name.

Posted by: ekzept | September 22, 2006 08:11 PM

David Harmon is right about the game of using your name as TECO input. By a strange coincidence, "Robert G. Munck" is an Ada compiler.

Unfortunately, it's Ada 82.

Posted by: Bob Munck | September 22, 2006 09:23 PM

I've heard rumors that somebody I knew briefly in school had written a BASIC interpreter in vi (without the m) macros. I didn't attempt to verify this, and probably wouldn't've been able to follow what was going on if I had tried.
Apparently the macro language does have two stacks, which along with some sort of decision-making gives you turing-completeness.

Posted by: dave | September 23, 2006 01:14 AM

Q: and what happens if you make a mistake with teco?

A: you Type Every Character Over

Posted by: piedoggie | September 23, 2006 08:53 AM
