1 COMPRESS
 File compression a la IEEE Computer, June 1984.

 Authors: Spencer W. Thomas  (decvax!harpo!utah-cs!utah-gr!thomas)
          Jim McKie          (decvax!mcvax!jim)
          Steve Davies       (decvax!vax135!petsd!peora!srd)
          Ken Turkowski      (decvax!decwrl!turtlevax!ken)
          James A. Woods     (decvax!ihnp4!ames!jaw)
          Joe Orost          (decvax!vax135!petsd!joe)
          Mark Pizzolato     (uunet!lupine!infopiz!mark)
          Alain Fauconnet    (fauconne@frsim51.bitnet)

 Algorithm from "A Technique for High Performance Data Compression",
 Terry A. Welch, IEEE Computer Vol 17, No 6 (June 1984), pp 8-19.

 Usage: compress [-cdfivVpz] [-b bits] [file ...]
2 Inputs:
 -c:        Write output on stdout, don't remove original.

 -d:        If given, decompression is done instead.

 -f:        Forces output file to be generated, even if one already
            exists, and even if no space is saved by compressing.
            If -f is not used, the user is prompted when stdin is a
            tty; otherwise the output file is not overwritten.

 -i:        Image mode (defined only under MS-DOS and VMS). Prevents
            conversion between UNIX text representation (LF line
            termination) in compressed form and MS-DOS text
            representation (CR-LF line termination) in uncompressed
            form. Useful with non-text files.

 -r rsize:  Output file record size (defined only under VMS). Defines
            the RMS record size for the output file when image mode is
            specified. Output will be a fixed record size file with no
            record attributes. Implies -i.

 -v:        Write compression statistics.

 -V:        Write version and compilation options.

 -b bits:   Limits the maximum number of bits per code.

 -p:        When uncompressing, require that the file begin with
            "#! cunbatch\n", which is stripped off before
            uncompression; when compressing, write it at the
            beginning of the output file.

 -z:        Leave name unchanged and don't impose or require ".Z"
            file names.

 file ...:  Files to be compressed. If none specified, stdin is used.
2 Outputs:
 file.Z:    Compressed form of file with the same mode, owner, and
            utimes, or stdout (if stdin is used as input).
2 Assumptions:
 When filenames are given, each file is replaced with its compressed
 version (.Z suffix) only if the file decreases in size.
2 Algorithm:
 Modified Lempel-Ziv method (LZW). It finds common substrings and
 replaces them with variable-size codes. The process is deterministic
 and can be done on the fly, so the string table never has to be
 stored with the compressed data: the decompression procedure
 rebuilds it by tracking the way the compressor built it.
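
 The table-tracking idea can be illustrated with a minimal sketch of
 textbook LZW. This is not the code used by compress: the function
 names lzw_compress and lzw_decompress are hypothetical, and the
 sketch emits plain integer codes, whereas compress packs codes into
 variable-width bit fields capped by -b. It only shows that the
 decoder can rebuild the same table from the code stream alone.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Encode bytes as a list of integer codes (illustrative only)."""
    # Codes 0..255 stand for the single bytes; new strings get 256 and up.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    out = []
    w = b""
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc                      # keep extending the current match
        else:
            out.append(table[w])        # emit code for the longest match
            table[wc] = next_code       # remember the new, longer string
            next_code += 1
            w = bytes([b])
    if w:
        out.append(table[w])
    return out


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the original bytes; no table is passed in."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The encoder used a string it had only just defined.
            entry = prev + prev[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        # Mirror the encoder: previous string plus first byte of this one.
        table[next_code] = prev + entry[:1]
        next_code += 1
        prev = entry
    return b"".join(out)


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(sample)
    assert lzw_decompress(codes) == sample
    print(len(sample), "bytes ->", len(codes), "codes")
```

 Each table entry the decoder adds is the previously decoded string
 plus the first byte of the current one, which is exactly the string
 the encoder added one step earlier. That is why the two tables stay
 in step without any table being transmitted.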