Return-Path: <fdc@watsun.cc.columbia.edu>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA05724; Thu, 2 Mar 89 01:18:38 EST
Received: from watsun.cc.columbia.edu by cunixc.cc.columbia.edu (5.54/5.10) id AA03005; Thu, 2 Mar 89 01:16:06 EST
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA05568; Thu, 2 Mar 89 01:02:25 EST
Date: Thu, 2 Mar 1989 1:02:25 EST
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: Joe Doupnik <jrd@usu.bitnet>, Paul Placeway <paul@cis.ohio-state.edu>,
        Andre Pirard <A-PIRARD@bliulg11.bitnet>,
        Baruch Cochavy <baruchc@techunix.bitnet>,
        Johan Van Wingen <MOSGLA@hlerul2.bitnet>,
        Ken-ichiro Murakami <MURAKAMI%NTT-20.NTT.JP@relay.cs.net>,
        Kohichi Nishimoto <s153380%tkov02.DEC@decwrl.dec.com>,
        Hirofumi Fujii <KEIBUN@jpnkekvm.bitnet>,
        Gisbert W.Selke <RECK@dbnuama1.bitnet>,
        Kurt Enulf <UPSKE@seguc11.bitnet>,
        Jacob Palme <jacob_palme_qz@qzcom.bitnet>,
        Per Lindberg <Per_Lindberg_ZQ@qzcom.bitnet>,
        "Bjørn Larsen" <x_larsen_b%use.uio.uninett@tor.nta.no>,
        "Hans A. Ålien" <hans%ifi.uio.no@tor.nta.no>,
        Kai U.Leppamaki <LK-KLE@finhut.bitnet>,
        Steve Jenkins <pdsoft%uk.ac.lancs.cent1@nss.cs.ucl.ac.uk>,
        Jean Dutertre <dutertre%padis1.DEC@decwrl.dec.com>,
        Gerard Gaye <GAYE@frsac11.bitnet>,
        David Guerlet <KERMIT@czheth5a.bitnet>,
        Bernie Eiben <eiben@tops20.dec.com>,
        Volker Edelhoff <edelhoff@unido.bitnet>
Cc: Christine M Gianone <cmg@cunixc.cc.columbia.edu>
Subject: Kermit International Character Set Proposal
Message-Id: <CMM.0.88.604821745.fdc@watsun.cc.columbia.edu>

	 A KERMIT PROTOCOL EXTENSION FOR INTERNATIONAL CHARACTER SETS

		     Christine Gianone and Frank da Cruz
	     Columbia University Center for Computing Activities
				   New York
				March 1, 1989


PREFACE

This is a ROUGH FIRST DRAFT of a proposal, based upon a reading of the current
ISO and ECMA character-set standards, some familiarity with the issues
involved, and limited testing with devices that claim to implement these
standards (such as the DEC VT340 terminal).  Readers are urged to correct us
if we have misinterpreted the standards, to fill in missing information, and
to make any comments or criticisms they desire.  Readers with knowledge of
real-world multi-alphabet applications and file formats are especially urged
to comment on how this proposal meshes with these particular file formats.

This first draft is being sent to a selected list of people who we know to be
familiar with both Kermit and the character set issues discussed here, so your
comments will be especially helpful before we circulate this proposal among a
wider audience, most likely by mailing it to Info-Kermit and the ISO8859
discussion group.


INTRODUCTION

The Kermit protocol makes a distinction between text and binary files, and it
defines a particular transfer syntax for text files, namely ASCII characters
with carriage return and linefeed (CRLF) after each line, so that text may be
stored in useful fashion on any computer it is transferred to.  Each Kermit
program knows how to translate from the local storage conventions to Kermit's
transfer syntax, and vice versa.  In this way, text files can be transferred
between unlike systems (say, an EBCDIC card oriented system and an ASCII
stream file system) and remain useful after transfer.

US ASCII is no longer sufficient for the world's computer users.  ISO, ECMA,
and others are adopting standard codes for the world's other alphabets;
vendors like IBM, DEC, and Apple have begun to make these characters available
on their displays (albeit in different positions); and people are producing
increasing numbers of multilingual documents.  Kermit's text file transfer
syntax therefore needs to be extended to allow for texts in a mixture of
alphabets.

It is best if this can be done in line with currently existing and evolving
standards.  Here are the standards we believe are pertinent:

ANSI X3.4 (1986), "Coded Character Sets - 7-bit American Standard Code for
  Information Interchange" (US ASCII), is the 7-bit code currently used by
  Kermit for transferring text files. 

ISO 646 (1983), "Information Processing - ISO 7-bit Coded Character Sets for
  Information Interchange", gives us a 7-bit character set equivalent to
  ASCII, and says we can substitute "national characters" for ASCII
  characters #$@[\]^`{|}.  Different languages put different characters in
  these positions, and there's no mechanism defined to specify which language
  is being used. 

ISO 4873 (1986) (= ECMA-43), "Information Processing - ISO 8-bit Code for
  Information Interchange - Structure and Rules for Implementation", defines
  8-bit character sets, their graphic and control regions, and how to extend
  an 8-bit character set by using multiple graphics sets.

ISO 2022 (1986) (= ECMA-35), "Information Processing - ISO 7-bit and 8-bit
  Coded Character Sets - Code Extension Techniques", describes how to use
  8-bit character sets in both 7-bit and 8-bit environments, and how to switch
  among different character sets and alphabets.

ANSI X3.41-1974, "Code Extension Techniques for Use with the 7-Bit Coded
  Character Set of the American National Standard Code for Information
  Interchange", describes 7- and 8-bit codes and extension techniques in
  approximately the same manner as ISO 4873 and ISO 2022.

ISO 8859 (1987-present) (see below for ECMA equivalents), "Information
  Processing - 8-Bit Single-Byte Coded Graphic Character Sets", defines the
  actual 8-bit character sets to be used for many of the world's languages.
  The "lower half" (C0 + G0) of each of these is the same as ASCII and ISO
  646.

ISO is the International Organization for Standardization, ANSI is the
American National Standards Institute, and ECMA is the European Computer
Manufacturers Association.


HOW THE STANDARDS WORK

ASCII and ISO 646 give us a 128-character 7-bit character set.  This set is
divided into several parts:

  1. 32 control characters (characters 0 through 31).
  2. Space (SP, character 32).
  3. 94 graphic, or printing, characters (33-126).
  4. DEL (rubout, character 127), considered a control character.

The control characters except DEL compose the C0 part of ASCII, and the
graphic characters plus SP and DEL compose the G0 part.  If the ASCII alphabet
is written in a table of 16 rows and 8 columns, then the left 2 columns are the
C0 set, and the right 6 columns are the G0 set:

     <--C0--> <---------G0---------->
      00  01  02  03  04  05  06  07
     +---+---+---+---+---+---+---+---+
  00 |NUL DLE| SP  0   @   P   `   p |
  01 |SOH DC1| !   1   A   Q   a   q |
  02 |STX DC2| "   2   B   R   b   r |
  03 |ETX DC3| #   3   C   S   c   s |
  04 |EOT DC4| $   4   D   T   d   t |
  05 |ENQ NAK| %   5   E   U   e   u |
  06 |ACK SYN| &   6   F   V   f   v |
  07 |BEL ETB| '   7   G   W   g   w |
  08 |BS  CAN| (   8   H   X   h   x |
  09 |HT  EM | )   9   I   Y   i   y |
  10 |LF  SUB| *   :   J   Z   j   z |
  11 |VT  ESC| +   ;   K   [   k   { |
  12 |FF  FS | ,   <   L   \   l   | |
  13 |CR  GS | -   =   M   ]   m   } |
  14 |SO  RS | .   >   N   ^   n   ~ |
  15 |SI  US | /   ?   O   _   o  DEL|
     +---+---+---+---+---+---+---+---+
     <--C0--> <---------G0---------->
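
The column and row of a character in this table follow directly from its
7-bit code.  Here is a minimal sketch in Python; the function name position()
is ours, chosen only for illustration:

```python
def position(code):
    """Return (column, row, region) for a 7-bit code as laid out in the table."""
    if not 0 <= code <= 127:
        raise ValueError("not a 7-bit code")
    column, row = code >> 4, code & 0x0F
    # Columns 0-1 hold C0; columns 2-7 hold G0 (including SP and DEL).
    region = "C0" if column < 2 else "G0"
    return column, row, region

print(position(ord("A")))   # (4, 1, 'G0')
print(position(27))         # (1, 11, 'C0'), i.e. ESC
```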

Many vendors are now using the full 8 bits available within the computer byte
(and on the transmission line in some cases) for character representation.
At first there were ad-hoc character assignments (e.g. IBM PC 8-bit ASCII,
Apple Macintosh ASCII, etc), but standards are beginning to emerge.

8-bit character sets are described in ISO 4873 and ANSI X3.41.  An 8-bit
character set has two halves.  The "left half" or "lower half" corresponds to
ASCII (and ISO 646).  All the characters in the left half have their
high-order, or 8th, bit set to zero, and are therefore representable in 7
bits.  The "right half" or "upper half" mirrors the left half in that the
first 32 characters are control characters, and the remaining 94 or 96
characters are graphics, but all characters in the right half have their high
order bits set to one.  The right-half controls are called C1, and the
right-half graphics are called G1:

     <--C0--> <---------G0---------->  <--C1--> <---------G1---------->
       00  01  02  03  04  05  06  07    08  09  10  11  12  13  14  15
     +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
  00 |NUL DLE| SP  0   @   P   `   p | |    DCS|---+                   |
  01 |SOH DC1| !   1   A   Q   a   q | |    PU1|                       |
  02 |STX DC2| "   2   B   R   b   r | |    PU2|                       |
  03 |ETX DC3| #   3   C   S   c   s | |    STS|                       |
  04 |EOT DC4| $   4   D   T   d   t | |IND CCH|                       |
  05 |ENQ NAK| %   5   E   U   e   u | |NEL MW |                       |
  06 |ACK SYN| &   6   F   V   f   v | |SSA SPA|                       |
  07 |BEL ETB| '   7   G   W   g   w | |ESA EPA|                       |
  08 |BS  CAN| (   8   H   X   h   x | |HTS    |      (special         |
  09 |HT  EM | )   9   I   Y   i   y | |HTJ    |       graphics)       |
  10 |LF  SUB| *   :   J   Z   j   z | |VTS    |                       |
  11 |VT  ESC| +   ;   K   [   k   { | |PLD CSI|                       |
  12 |FF  FS | ,   <   L   \   l   | | |PLU ST |                       |
  13 |CR  GS | -   =   M   ]   m   } | |RI  OSC|                       |
  14 |SO  RS | .   >   N   ^   n   ~ | |SS2 PM |                       |
  15 |SI  US | /   ?   O   _   o  DEL| |SS3 APC|                   +---|
     +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
     <--C0--> <---------G0---------->  <--C1--> <---------G1---------->

G1 character sets can have either 94 or 96 characters.  A 94-character G1 set
has Space (SP) in its first position and DEL in its last, just like G0 (the
notches shown in G1 in the diagram).  A 96-character set has graphic
characters in all 96 positions.
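
The four regions of an 8-bit code can be told apart by the high-order bit and
the low seven bits.  The following Python sketch illustrates this; region() is
a hypothetical helper, not part of any Kermit program:

```python
def region(code):
    """Name the region (C0, G0, C1, G1) that an 8-bit code falls in."""
    if not 0 <= code <= 255:
        raise ValueError("not an 8-bit code")
    right_half = bool(code & 0x80)      # high-order (8th) bit set to one
    is_control = (code & 0x7F) < 32     # first 32 positions of either half
    if is_control:
        return "C1" if right_half else "C0"
    return "G1" if right_half else "G0"
```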

ISO 4873 allows up to four sets of printable graphic characters, of either 94
or 96 characters each (G0, G1, G2, and G3), "active" at one time, plus the
control sets (C0 and C1).  Therefore there can be up to

  2 x 32 + 4 x 96 = 448

characters simultaneously within the repertoire of a given device.  But in
today's world, 7- or 8-bit character i/o is the norm imposed by our terminals,
computer architectures, and (most of all) the nature of our asynchronous
communication devices and transmission systems.  Therefore, a terminal or
computing device will receive at most a 7- or 8-bit code capable of denoting
only 128 or 256 different characters, out of the possible 448.  How can the
additional characters be accessed in the 8-bit environment?  In the 7-bit
environment?

Switching among character sets is accomplished using special control
characters or escape sequences embedded within the data stream.  These control
characters and escape sequences are specified in ISO 2022.  In the following
discussion, we use this notation (numbers are in decimal unless otherwise
noted):

  <ESC> Escape (ASCII 27)
  <SP>  Space  (ASCII 32)
  <SO>  Shift Out (Ctrl-N, ASCII 14)
  <SI>  Shift In  (Ctrl-O, ASCII 15)

ISO 2022 provides two separate mechanisms for handling multiple character
sets.  The first is a set of escape sequences for assigning a particular
alphabet (such as Cyrillic, Hebrew, Arabic, etc) to a particular character set
(G0, G1, G2, or G3).  The second is a set of functions for shifting among the
currently active sets.  Here are the alphabet selectors and shift functions:

  Escape Sequence                                         Shift Function

  <ESC>(F - assigns 94-character graphics set "F" to G0.  Invoke by SI
  <ESC>)F - assigns 94-character graphics set "F" to G1.  Invoke by SO
  <ESC>*F - assigns 94-character graphics set "F" to G2.  Invoke by SS2 or LS2
  <ESC>+F - assigns 94-character graphics set "F" to G3.  Invoke by SS3 or LS3
  <ESC>-F - assigns 96-character graphics set "F" to G1.  Invoke by SO
  <ESC>.F - assigns 96-character graphics set "F" to G2.  Invoke by SS2 or LS2
  <ESC>/F - assigns 96-character graphics set "F" to G3.  Invoke by SS3 or LS3
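
The list above amounts to a small lookup table keyed by set size and G-set.
A sketch in Python, with designate() and INTERMEDIATE as illustrative names
of our own:

```python
ESC = "\x1b"

# Intermediate characters from the list above, keyed by (set size, G-set).
INTERMEDIATE = {
    (94, "G0"): "(", (94, "G1"): ")", (94, "G2"): "*", (94, "G3"): "+",
    (96, "G1"): "-", (96, "G2"): ".", (96, "G3"): "/",
}

def designate(size, gset, final):
    """Build the escape sequence assigning graphics set `final` to `gset`."""
    key = (size, gset)
    if key not in INTERMEDIATE:
        # The list has no entry for a 96-character set in G0.
        raise ValueError(f"no sequence for a {size}-character set in {gset}")
    return ESC + INTERMEDIATE[key] + final
```

For example, designate(96, "G1", "A") yields <ESC>-A.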

The values for "F" are discussed below.  The shift functions are:

  SO  (Ctrl-N) - Shift Out:       select G1 (locking shift)
  SI  (Ctrl-O) - Shift In:        select G0 (locking shift)
  LS2 (<ESC>n) - Locking Shift 2: select G2 (locking shift)
  LS3 (<ESC>o) - Locking Shift 3: select G3 (locking shift)
  SS2 (<ESC>N) - Single Shift 2:  select G2 (single character shift)
  SS3 (<ESC>O) - Single Shift 3:  select G3 (single character shift)

"Locking shift" is like shift-lock on a typewriter.  It means that all
subsequent characters until the next shift character are to be taken from the
designated character set.  "Single shift" applies only to the character that
follows it immediately, but single shift functions are only available for
the G2 and G3 character sets.  Locking shift functions remain in effect across
alphabet changes.
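
The difference between locking and single shifts can be illustrated with a
small Python sketch that labels each character with the set it is taken from.
It handles only the six shift functions listed above (not designations), and
tag_characters() is a name invented for this example:

```python
ESC, SO, SI = "\x1b", "\x0e", "\x0f"
LOCKING = {SO: "G1", SI: "G0", ESC + "n": "G2", ESC + "o": "G3"}
SINGLE = {ESC + "N": "G2", ESC + "O": "G3"}

def tag_characters(stream):
    """Return (gset, char) pairs showing which set each character comes from."""
    out, locked, single, i = [], "G0", None, 0
    while i < len(stream):
        # A shift is either one control character or <ESC> plus one letter.
        token = stream[i:i + 2] if stream[i] == ESC else stream[i]
        i += len(token)
        if token in LOCKING:
            locked = LOCKING[token]     # stays in effect until the next one
        elif token in SINGLE:
            single = SINGLE[token]      # applies to the next character only
        else:
            out.append((single or locked, token))
            single = None
    return out
```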

There are many possible ways to use these code extension facilities within
both 7-bit and 8-bit environments.  In any particular data transfer, the
facilities that are actually used can be announced using <ESC><SP>F, where the
possibilities for F are listed in ISO 2022.  This "announcer" escape sequence
should be sent at the beginning of the data transfer.  <ESC><SP>B means the G0
and G1 sets will be used, where <SI> invokes G0 in the left half and <SO>
invokes G1 in the left half.  <ESC><SP>C means the full 8-bit set shall be
used, with no shifting.  There are many other possibilities, which need not
concern us here.

ISO 8859 defines a series of 8-bit character sets.  In each of these, the left
half (G0) is the ISO 646 set, i.e. 7-bit ASCII.  Because of this, the left
half of any ISO 8859 character set may be used to represent English or any
other Latin-alphabet language that can make do without diacritical marks (e.g.
German without umlauts or ess-zet).

By convention, the G0 set can be selected with <ESC>(B.  When we say "by
convention" we mean that each of the ISO 8859 standards says to select the G0
set using this sequence, even if the G1 set is selected using some other
letter, like A, C, L, etc (see below).  Theoretically, <ESC>(A could also be
used to select the G0 set of "alphabet A", <ESC>(L could select the G0 set of
"alphabet L", etc.

Languages with special characters must use specific ISO 8859 G1 sets.  These
sets are specified (to date) in ISO 8859-1 through 8859-9:

ISO 8859-1 is Latin Alphabet No. 1.  The right half (G1) contains all the
  special characters needed for Dutch, Faeroese, Finnish, French, German,
  Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish.
  Select G1 with <ESC>-A. 

ISO 8859-2 is Latin Alphabet No. 2.  G1 contains special characters for
  Albanian, Czech, German, Hungarian, Polish, Romanian, Serbo-Croatian, Slovak,
  and Slovene.  Select G1 with <ESC>-B.

ISO 8859-3 is Latin Alphabet No. 3, for Afrikaans, Catalan, Esperanto,
  French, Galician, German, Italian, Maltese, and Turkish.  Select G1 with
  <ESC>-C.

ISO 8859-4 is Latin Alphabet No. 4, for Danish, Estonian, Finnish, German,
  Greenlandic, Lappish, Latvian, Lithuanian, Norwegian, and Swedish.  Select G1
  with <ESC>-D.

ISO 8859-5 is the Latin/Cyrillic Alphabet, for Bulgarian, Byelorussian,
  Macedonian, Russian, Serbo-Croatian, and Ukrainian.  Select G1 with <ESC>-L.
  (Compatible with USSR GOST Standard 19768-1987 and ECMA-113).

ISO 8859-6 is the Latin/Arabic Alphabet.  *** Selection sequence unknown ***.

ISO 8859-7 is the Latin/Greek Alphabet.  Select G1 with <ESC>-F.

ISO 8859-8 is the Latin/Hebrew Alphabet.  Select G1 with <ESC>-H.

ISO DIS 8859-9 is Latin Alphabet No. 5, in which six Icelandic letters from
  Latin Alphabet No. 1 were replaced by six letters needed for Turkish.  Select
  G1 with <ESC>-M.

The alphabet selection escape sequences are registered in the International
Register of Coded Character Sets under the provisions of ISO 2375, "Data
Processing - Procedure for Registration of Escape Sequences".  The
registration authority is the ECMA, which periodically issues updates.  The
reference number for this register is ISBN 2-12-953907-0.

There may also be "private alphabets", such as those found on DEC terminals.
In the DEC environment only, these may be selected using escape sequences
listed in the DEC manuals, e.g. <ESC>)> to select the DEC Technical
94-character set and assign it to G1.

Alphabet Summary Table:

  Esc Seq   Alphabet Name             ISO Number     ECMA Number

  <ESC>(B   ASCII (ANSI X3.4-1986)    ISO 646        ECMA-6
  <ESC>-A   Latin Alphabet No. 1      ISO 8859-1     ECMA-94
  <ESC>-B   Latin Alphabet No. 2      ISO 8859-2     ECMA-94
  <ESC>-C   Latin Alphabet No. 3      ISO 8859-3     ECMA-94
  <ESC>-D   Latin Alphabet No. 4      ISO 8859-4     ECMA-94
  <ESC>-L   Latin/Cyrillic            ISO 8859-5     ECMA-113
  <ESC>-*** Latin/Arabic              ISO 8859-6     ECMA-114
  <ESC>-F   Latin/Greek               ISO 8859-7     ECMA-118
  <ESC>-H   Latin/Hebrew              ISO 8859-8     ECMA-121
  <ESC>-M   Latin Alphabet No. 5      ISO 8859-9     ECMA-128

*** Unassigned as of June 1986
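
A receiving program could hold the G1 rows of this table as a simple lookup.
The Python sketch below is illustrative only; the names G1_ALPHABETS and
alphabet_for() are ours, and Latin/Arabic is left out because the table gives
no final character for it:

```python
# Final characters for the 96-character G1 sets in the summary table.
G1_ALPHABETS = {
    "A": "Latin Alphabet No. 1 (ISO 8859-1)",
    "B": "Latin Alphabet No. 2 (ISO 8859-2)",
    "C": "Latin Alphabet No. 3 (ISO 8859-3)",
    "D": "Latin Alphabet No. 4 (ISO 8859-4)",
    "L": "Latin/Cyrillic (ISO 8859-5)",
    "F": "Latin/Greek (ISO 8859-7)",
    "H": "Latin/Hebrew (ISO 8859-8)",
    "M": "Latin Alphabet No. 5 (ISO 8859-9)",
}

def alphabet_for(sequence):
    """Name the alphabet designated to G1 by an <ESC>-F sequence, or None."""
    if len(sequence) == 3 and sequence.startswith("\x1b-"):
        return G1_ALPHABETS.get(sequence[2])
    return None
```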


KERMIT FILE TRANSFER

Different computer systems and software packages have different conventions
for representing, storing, and displaying mixed-alphabet textual data.  Such
data can be transferred in binary mode by Kermit, but it will only make sense
when transferred to a system that uses the same representational conventions.

To transfer mixed-alphabet textual data between systems that use different
conventions, a new mechanism is required.  Currently, Kermit defines the
"common intermediate representation", or "transfer syntax", for textual data
(before encoding) to be ASCII characters arranged in lines or records
terminated by ASCII Carriage Return and Linefeed (CRLF).  Henceforth, this
will be known as the Normal Kermit transfer syntax.

The extension proposed here will allow a Kermit program that has specific
knowledge of the local file format (or formats) for storing multilingual or
multi-alphabet text to translate between these system- and
application-specific formats and a new format to be used during file transfer.
This will be called ISO-2022 Kermit transfer syntax.  Like all extensions to
the original Kermit protocol, this will be an optional feature of any Kermit
program.


SELECTING ISO-2022 TRANSFER SYNTAX

The proposed extension to the Kermit protocol follows a subset of ISO 2022, in
which a single ISO-8859 alphabet, comprising a C0, G0, C1, and G1 set, may
be active at one time, and in which:

 - the C1 and G1 sets are transmitted using ISO 2022's 7-bit code extension
   techniques,

 - escape sequences can be used to switch among different alphabets,

 - the C0, G0, and C1 sets are assumed to be identical for all alphabets.

Kermit's default transfer syntax is Normal.  Kermit's ISO-2022 transfer syntax
must therefore be enabled in some way, either automatically or explicitly by
the user.  In the automatic case, the Kermit program recognizes (somehow) that
it is to transfer a multi-alphabet text file.  In the manual case, the user
issues a SET command.

The sending Kermit may inform the receiving Kermit of the selected transfer
syntax by means of the Kermit File Attribute (A) packet, whose use is
negotiated in the Kermit Initialization exchange.  There is an attribute "*"
(ASCII 42) which represents "encoding", with values like "A" for Normal Kermit
ASCII encoding, "E" for EBCDIC (so far, never used).  The proposed new value
is "I8", for "ISO 8-bit character sets".  The receiver can agree to accept the
file or refuse it using Kermit's attribute reply mechanism.  If the receiver
does not do attribute packets, then the sender may still elect to send the
file (with a warning to the user), as either a binary file or an 8-bit text
file, for storing (and perhaps forwarding) purposes only.
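
As an illustration, here is how the proposed value might look inside an
Attribute packet's data field.  This Python sketch assumes the attribute
layout given in the Kermit protocol manual (attribute character, then the
length made printable by adding 32, then the data); the function names are
ours:

```python
def tochar(n):
    """Kermit's tochar(): make a small number printable by adding 32."""
    return chr(n + 32)

def attribute(kind, value):
    """Encode one attribute as type, printable length, then data."""
    if len(value) > 94:
        raise ValueError("attribute value too long for a one-character length")
    return kind + tochar(len(value)) + value

print(attribute("*", "I8"))   # *"I8
```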

It should also be possible for the user to select ISO-2022 transfer syntax
using an explicit SET command.  This command would have to be given to both
Kermits in order for the ISO transfer syntax to have its desired effect.  The
suggested command is:

  SET TRANSFER-SYNTAX ISO8

This denotes the use of ISO 8-bit alphabets.

(By the way, if the user gives this command to the sender, but not to the
receiver, then the received file will be stored in ISO 2022 format, with the
escape codes mixed with the file characters on disk; if Attribute packets are
not being used, then the receiver will get no warning).

The advantage of using Attribute packets is that the sending Kermit can
automatically inform the receiving Kermit of the file transfer syntax, so that
the user does not have to type a SET command to both Kermits.  On a computer
system where the Kermit program can recognize the attributes and encoding of a
file automatically, this mechanism will allow files of different types (text,
binary, multi-alphabet) to be sent together as a group, even between unlike
systems.  The drawback is that the attribute mechanism must be programmed into
a Kermit program that doesn't already have it.

There should be a way for the user to disable the use of ISO-2022 transfer
syntax.  The recommended command is SET TRANSFER-SYNTAX NORMAL.


DESCRIPTION OF ISO-2022 TRANSFER SYNTAX

Transfer of a multi-character-set text file in ISO-2022 transfer syntax is the
same as transfer of a 7-bit ASCII text file, except that it may contain
embedded escape sequences to switch between character sets.  The file sender
translates the file's characters (if necessary) into one or more selected ISO
8859 alphabets, and terminates lines of text (records) with CRLF.  The file
receiver translates from ISO-2022 transfer syntax into the format demanded by
the local system or application.  The current alphabet is designated by an
escape sequence, and locking shift functions switch between its G0 and G1 sets.

The mechanism described in ISO 646 for building composite graphic characters
by overprinting using Backspace or Carriage Return should not be used; this
practice is prohibited by ISO 8859.

ISO-2022 transfer syntax uses only 7-bit data.  If any character arrives
with its high-order (8th) bit set to one (after stripping of parity and Kermit
decoding), there has been an error.

ISO 2022 states that "at the beginning of information interchange, except
where the interchanging parties have agreed otherwise, all designations shall
be defined by use of the appropriate escape sequences, and the shift status
shall be defined by the use of the appropriate locking-shift functions."
Kermit programs should "agree otherwise" that the default character set is the
US ASCII / ISO-646 / ECMA-6 7-bit set; thus ISO-2022 transfer syntax can be
identical to Normal Kermit transfer syntax when transferring 7-bit text files.
There is no default G1 set, in the interest of fairness to all countries and
peoples.

When the text contains characters outside the ASCII alphabet, an escape
sequence must be used to identify which other alphabet these characters belong
to.  This sequence is <ESC>-F, where F is the officially registered letter for
that alphabet, e.g.  A-D for Latin Alphabets 1-4, L for Cyrillic, etc.  This
sequence assigns the designated alphabet to the active G1 set.

The G1 set is transmitted in its 7-bit form to eliminate Kermit's 8th-bit
prefix overhead on 7-bit connections.  Once a G1 set is selected, it remains in
effect until another G1 set is selected.  Switching between the G0 (ASCII) set
and the G1 (extended) set is done using the ISO-2022 "locking shifts":

  SO (Ctrl-N) - select G1 (the extended set)
  SI (Ctrl-O) - select G0 (the ASCII set)

If a particular set is already invoked, use of the corresponding shift has no
effect.
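
The sending side of this scheme might be sketched as follows in Python.  The
sketch assumes the local file is already stored as 8-bit bytes in a single
96-character ISO 8859 alphabet and contains no <ESC>, <SO>, or <SI> of its
own; C1 controls are left out, and encode() is a hypothetical name:

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

def encode(data, final):
    """Convert 8-bit text in one 96-character alphabet to 7-bit form."""
    out = bytearray()
    designated = False      # has <ESC>-F been sent yet?
    in_g1 = False           # G0 is assumed to be invoked at the start
    for byte in data:
        if 0x80 <= byte <= 0x9F:
            raise ValueError("C1 controls are outside this sketch")
        if byte >= 0x20:                    # C0 controls are never shifted
            want_g1 = byte >= 0xA0
            if want_g1 and not designated:
                out += bytes([ESC, ord("-"), ord(final)])
                designated = True
            if want_g1 != in_g1:
                out.append(SO if want_g1 else SI)   # locking shift
                in_g1 = want_g1
        out.append(byte & 0x7F)             # G1 goes out in its 7-bit form
    return bytes(out)
```

A file containing only ASCII passes through unchanged, so the result is then
identical to Normal Kermit transfer syntax.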

During file transfer, an <ESC>-F or <ESC>)F sequence must be given before the
first occurrence of an extended character from a 96-character or 94-character
set, respectively.  If no such sequence is given, then all characters are
treated as ASCII data, including <ESC>, <SI>, and <SO>.  In other words, the
file transfer behaves in the normal Kermit fashion for text files.
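
The receiving side might look like the following Python sketch.  It handles
only <ESC>-F designations of a single 96-character alphabet, treats <ESC>,
<SO>, and <SI> as ordinary data until a designation has been seen, and
rejects any byte with its 8th bit set.  The name decode() and its return
convention are assumptions made for the example:

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

def decode(data):
    """Convert 7-bit transfer syntax back to 8-bit bytes.

    Returns (bytes, final), where final is the G1 alphabet letter last
    designated, or None if no designation was seen.
    """
    out = bytearray()
    final = None
    in_g1 = False
    i = 0
    while i < len(data):
        byte = data[i]
        if byte & 0x80:
            raise ValueError("8th bit set in 7-bit transfer syntax")
        if byte == ESC and data[i + 1:i + 2] == b"-" and i + 2 < len(data):
            final = chr(data[i + 2])        # <ESC>-F designates G1
            i += 3
            continue
        if final is not None and byte in (SO, SI):
            in_g1 = byte == SO              # locking shift, not data
        elif in_g1 and byte >= 0x20:
            out.append(byte | 0x80)         # restore the right-half code
        else:
            out.append(byte)
        i += 1
    return bytes(out), final
```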

The C0 and C1 sets, i.e. the two sets of control characters, are not subject
to shifting.  Control characters from the C1 set must be transmitted using
2-character escape sequences, as described in ISO 2022: <ESC>@, <ESC>A,
<ESC>B, etc, stand for 10000000, 10000001, 10000010, etc (binary).  This
method results in less Kermit encoding overhead on 7-bit connections than
would sending these characters "bare" (which is not allowed).
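
The mapping between a C1 control and its 2-character escape sequence is a
fixed offset of 64 (hexadecimal 40).  A Python sketch, with function names of
our own choosing:

```python
ESC = 0x1B

def c1_to_escape(code):
    """Return the 7-bit <ESC> Fe pair that stands for a C1 control."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError("not a C1 control")
    return bytes([ESC, code - 0x40])        # 10000000 -> <ESC>@, etc.

def escape_to_c1(pair):
    """Return the C1 control that a 2-character escape sequence stands for."""
    if len(pair) != 2 or pair[0] != ESC or not 0x40 <= pair[1] <= 0x5F:
        raise ValueError("not a C1 escape sequence")
    return pair[1] + 0x40
```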

All the escaping and shifting operations specified here take place before
normal Kermit packet encoding, and are subject to Kermit's control-character
and repeat-count prefixing.  For example, <ESC>-A<SO>x<SI>y becomes #[-A#Nx#Oy
according to Kermit's normal rules for control character prefixing.
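
Kermit's control-character prefixing replaces each control character by the
prefix "#" followed by the character with its 64 bit inverted (exclusive-OR
with 64).  The Python sketch below shows only this step; repeat-count and
8th-bit prefixing are omitted, and prefix_controls() is a name invented for
the example:

```python
def ctl(code):
    """Kermit's ctl(): toggle the 64 bit to make a control printable."""
    return code ^ 64

def prefix_controls(text, prefix="#"):
    """Apply control-character prefixing to a string of 7-bit characters."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 32 or code == 127:
            out.append(prefix + chr(ctl(code)))
        elif ch == prefix:
            out.append(prefix + ch)         # the prefix quotes itself
        else:
            out.append(ch)
    return "".join(out)
```

Since <ESC> is ASCII 27, and 27 XOR 64 is 91, <ESC> is sent as "#[";
likewise <SO> becomes "#N" and <SI> becomes "#O".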

ISO-2022 transfer syntax may be used in conjunction with even, odd, mark, or
space parity, or with no parity at all.  8-bit data is never transferred in
this mode, so 8th-bit prefixing will never occur.


ADDITIONAL ESCAPES

The preceding mode of operation is the one described in ISO-2022 under
"Announcer 4/2" for the 7-bit environment, which is selected by the escape
sequence <ESC><SP>B.  This means that the G0 and G1 sets are used, both in
their 7-bit forms, with <SO> and <SI> used to shift between them.  "Announcer
4/10" <ESC><SP>J specifies that a 7-bit code is used, even in an 8-bit
environment.  The use of 2-character escape sequences for C1 characters can be
announced using <ESC><SP>F (the "F" in this case is really an F).  For
clarity, these escape sequences may be sent at the beginning of the file
transfer, but they are not required.

Similarly, the ISO-2022 Coding Method Delimiter, <ESC>d, may be transmitted at
the end of the file, or at any point within the file after which this coding
method is no longer used.

Since ISO 8859 character sets are subject to revision from time to time, an
alphabet selector may be preceded by <ESC>&F, where F is the revision number
(@ = 1, A = 2, B = 3, etc).  For example, <ESC>&@<ESC>-A means Latin Alphabet
Number One, Revision One.


TRANSFER SYNTAX SUMMARY

All characters are 7-bit.  All sequences are optional, except that once an
extended alphabet is selected, <SI> and <SO> are required to shift between its
G0 and G1 sets.

Preamble:
  <ESC><SP>J<ESC><SP>B<ESC><SP>F (before first file characters):

    <ESC><SP>J - Using 7-bit code.
    <ESC><SP>B - Map both G0 and G1 into the left half.
    <ESC><SP>F - Using 2-character escape sequences for C1 set.

Alphabet selector:
  <ESC>(B<ESC>&@<ESC>-F (before first use of extended characters):

    <ESC>(B - Designate the normal (ASCII, ISO 646) G0 character set.
    <ESC>&@ - Specify the alphabet revision number, if any (@=1, A=2, etc)
    <ESC>-F - Designate the alphabet for G1 (substitute the appropriate F)

Alphabet shifts:
  <SO> - Select G1 set (extended characters)
  <SI> - Select G0 set (ASCII, ISO 646) (default)

Postamble:
  <ESC>d - Coding method delimiter (terminator), at end of file.
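
The fixed parts of this summary can be written down as constants, with the
alphabet selector built from the alphabet letter and an optional revision
number.  A Python sketch; all names here are illustrative:

```python
ESC, SP = "\x1b", " "

PREAMBLE = ESC + SP + "J" + ESC + SP + "B" + ESC + SP + "F"
POSTAMBLE = ESC + "d"

def alphabet_selector(final, revision=None):
    """Build <ESC>(B, an optional <ESC>&F revision, then <ESC>-F for G1."""
    seq = ESC + "(B"
    if revision is not None:
        if not 1 <= revision <= 63:
            raise ValueError("revision number out of range for this sketch")
        seq += ESC + "&" + chr(ord("@") + revision - 1)   # @=1, A=2, ...
    return seq + ESC + "-" + final
```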


LOCAL FILE REPRESENTATION

This proposal assumes nothing about the representation of the file on the
local storage medium.  It may be ASCII, EBCDIC, a proprietary word processor
format, IBM code page, or anything else.  It is an implementation "detail" for
Kermit programmer to convert between the local file representation for
multi-alphabet text files, and Kermit's file transfer syntax.

In some cases, the file itself (or its directory entry) might contain the
necessary identifying information, in which case the sending Kermit program
can automatically emit the appropriate escape sequences during file transfer.
In others, the user will have to tell the sending program how the file is
encoded.  If file attribute packets are not used, the user will also have to
tell the receiving Kermit that the transfer syntax is ISO-2022, and in what
format to store the file upon receipt.

The suggested command is SET FILE TYPE <xxx>, where <xxx> specifies how the
file is (or when receiving, is to be) encoded on disk.  This will necessarily
be highly dependent on the system's conventions, or the conventions of the
applications to be used with the file (e.g. a multi-language word processing
program).  Possibilities for <xxx> might include application names like
WORDPERFECT, XYWRITE, NOTA-BENE, MACWRITE, or system-specific names like
IBM-CODE-PAGE-437 (the IBM PC US character set), IBM-CODE-PAGE-850
(multilingual), IBM-CODE-PAGE-865 (Norway), etc.
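
To give a feeling for the translation involved, here is a Python sketch that
converts IBM code page 437 text to Latin Alphabet No. 1.  It uses Python's
built-in "cp437" and "iso8859-1" codecs as stand-ins for the translation
tables a Kermit program would carry; that choice, and the function name, are
assumptions of this example only:

```python
def cp437_to_latin1(data):
    """Translate IBM code page 437 bytes to ISO 8859-1 bytes."""
    text = data.decode("cp437")
    # Characters with no Latin-1 equivalent (e.g. the PC line-drawing
    # characters) raise UnicodeEncodeError rather than being guessed at.
    return text.encode("iso8859-1")
```

For instance, u-umlaut is 129 in code page 437 but 252 in Latin Alphabet
No. 1.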

It may be that a file is encoded entirely in a single ISO-8859 alphabet, e.g.
Latin Alphabet No. 1, or Latin/Cyrillic, but the file itself contains no
information to that effect.  Therefore, it should be possible for the user to
specify the alphabet in the SET FILE TYPE command, where the possibilities
are:

  LATIN1-ISO8      ARABIC-ISO8
  LATIN2-ISO8      CYRILLIC-ISO8
  LATIN3-ISO8      GREEK-ISO8
  LATIN4-ISO8      HEBREW-ISO8
  LATIN5-ISO8

The part before the dash is the name of the alphabet, and the "-ISO8"
says that the alphabet belongs to the ISO family of 8-bit character sets.
This allows for the possibility of other encoding methods for the same
languages, e.g. GREEK-DEC, where the Greek letters are taken from the DEC
technical character set.
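
These names split naturally at the last dash.  The Python sketch below parses
a name and, for the ISO8 family only, maps it to one of Python's built-in
codec names; the mapping to Python codecs and the function names are our own
illustration, not part of the proposal:

```python
ISO8_CODECS = {
    "LATIN1": "iso8859-1", "LATIN2": "iso8859-2", "LATIN3": "iso8859-3",
    "LATIN4": "iso8859-4", "LATIN5": "iso8859-9", "CYRILLIC": "iso8859-5",
    "ARABIC": "iso8859-6", "GREEK": "iso8859-7", "HEBREW": "iso8859-8",
}

def parse_file_type(name):
    """Split e.g. 'GREEK-ISO8' into ('GREEK', 'ISO8')."""
    alphabet, dash, family = name.upper().rpartition("-")
    if not dash or not alphabet or not family:
        raise ValueError("expected ALPHABET-FAMILY")
    return alphabet, family

def codec_for(name):
    """Return a Python codec name for an ISO8 file type."""
    alphabet, family = parse_file_type(name)
    if family != "ISO8" or alphabet not in ISO8_CODECS:
        raise ValueError(f"no codec known for {name}")
    return ISO8_CODECS[alphabet]
```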

If the local file is not encoded according to ISO 2022 rules, it may contain
<ESC>, <SO>, and <SI> characters.  It is up to the Kermit program to know
what these characters mean in the context of the file's format, and to either
strip them from the file or translate them to something else.  The ISO 2022
rules forbid the use of these characters as data to be transferred.


SPECIAL EFFECTS

Today, most multi-alphabet files are produced by proprietary text processing
programs.  These programs have many functions besides switching among
alphabets.  They may also endow text with special attributes such as boldface,
italic, underline, super- or subscript, color, etc, and render characters in a
variety of type styles and sizes.  Each text processing program may have its
own unique formats and conventions.

These special effects are not addressed by this proposal.  Nevertheless, it is
likely that a multi-alphabet file produced by a text processing program also
contains special effects.  In order for a Kermit program to send a
multi-alphabet file, it must have detailed knowledge of the file's format and
coding conventions.  Therefore, the Kermit program should be able to strip out
the special effects, and send only the text.  Otherwise the result would be
meaningless when received on an unlike system or for use with a different
application.  (When transferring such files between like systems or compatible
applications, Kermit binary mode transfers will suffice.)

At some future time, it might be possible to adapt one of the popular document
description languages to Kermit, so that Kermit will be able to transfer
formatted documents between unlike systems and applications.  Presently, there
are many competing would-be standards including IBM DCA and DIA, DEC DDIF, US
Navy DIF, ISO ODA and ODIF, and PostScript.  Kermit should wait for the dust to
settle and then pick a relatively simple, stable alternative.  (Comments
welcome!)


ARCHIVING

The Kermit protocol includes a so-far little-used archiving function.  In this
mode, Kermit stores incoming file data together with the attribute packets
that precede it, so that the file can be retrieved and reconstituted on
another system at a later time.  In archive mode, the alphabet escapes and
shifts should not be interpreted by the receiving Kermit, but simply stored as
data.


MULTIBYTE ALPHABETS

This proposal does not address alphabets such as Japanese, Chinese, and Korean
that do not fit into 8-bit character sets.  A new standard, ISO 10646, is in
preparation.  This standard will define a universal 3-byte character code to
cover all the world's written languages, providing for 1- and 2-byte shortcuts
within a given language environment.  All designation, invocation and shifting
as in ISO 2022 will be avoided.  When and if this standard becomes relatively
stable, it too can be added as a Kermit file transfer syntax option, perhaps
ISO24.

In the meantime, national versions of Kermit can (and do) use SET FILE TYPE
commands to identify the encoding or standard used for a multibyte alphabet.
For example, some Japanese Kermit programs have the command SET FILE TYPE
TEXT, BINARY, or KANJI, and add a further command to specify the local Kanji
encoding: SET KANJI VAX, JIS, or SHIFTJIS (JIS is the Japan Industrial
Standard, JIS X 0208; SHIFTJIS is JIS X 0202 which differs from JIS X 0208 by
the introduction of escape sequences to shift between Kanji and ASCII; VAX is
the encoding used on Japanese VAX/VMS systems).  These Kermit programs use
SHIFTJIS as the transfer syntax, and the Kermit program maps between it and
the local format, which may be VAX, JIS, or SHIFTJIS.  To better mesh with the
current proposal, however, these programs should make a distinction between
the file format and the transfer syntax by adding a command like SET
TRANSFER-SYNTAX SHIFTJIS.

In this connection, a "rider" to this proposal is that "JS" (for SHIFTJIS)
be added to the list of Kermit encodings under Attribute "*".
Designations for Chinese, Korean, and other multibyte-character-set languages
are welcome, as are alternative designations for Japanese.


TERMINAL EMULATION

While not part of the Kermit file transfer protocol, terminal emulation is a
feature of many Kermit programs.  It is hoped that these terminal emulators
will evolve along the lines of the ISO standards mentioned above.  In some
cases, this is already a fact, insofar as DEC VT200 and 300 series terminals
already follow these standards.

In this regard, it is important to note that not all languages are written
from left to right, top to bottom.  Hebrew and Arabic are two examples of
right-to-left languages, and Japanese and Chinese may be written top to
bottom.  The order of the text characters on disk or on the transmission line
does not necessarily reflect their order on the screen or the printed page.


FILE TRANSFER SYNTAX EXAMPLES

A simple 7-bit ASCII text file can be transmitted in the normal Kermit manner
for text files, without any escapes or shifts, even in ISO8 mode.

A text file containing characters from a language or languages covered by a
single ISO 8859 alphabet will require an <ESC>-F sequence to identify the
alphabet.  <SO> and <SI> are used to shift between the G0 and G1 sets.  The
following lines all produce the same result:

  A dangerous German word is "gef<ESC>-A<SO>d<SI>hrlich".
  <ESC>-AA dangerous German word is "gef<SO>d<SI>hrlich".
  <ESC>-A<SI>A dangerous German word is "gef<SO>d<SI>hrlich".
  <ESC>&@<ESC>-A<ESC>(B<SI>A dangerous German word is "gef<SO>d<SI>hrlich".

In this case, the only extended character is the umlaut-a in "gefaehrlich"
(where ae is a way of writing umlaut-a without an umlaut).

For clarity and consistency with the ISO-2022 recommendations, the last form
is preferred: the text begins with an announcement of the G0 and G1 sets in
use, including the version number, and then explicitly shifts into the G0 set,
rather than defaulting to it.  Similarly, use of the preamble at the beginning
of the file and the postamble at the end is also recommended.
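
As an illustration only, the sending side of the preferred form might be
sketched in Python as follows.  The name encode_7bit, its parameters, and the
use of Python's iso8859_1 codec are assumptions made for this sketch; they are
not part of the proposal.  The sketch emits the G1 and G0 designations and an
explicit <SI>, then the text with locking shifts.  It leaves out the <ESC>&@
announcement, the preamble and postamble, and C1 controls, and it assumes that
space and C0 controls may be passed through in either shift state.

```python
ESC, SO, SI = b"\x1b", b"\x0e", b"\x0f"


def encode_7bit(text: str, codec: str = "iso8859_1", final: bytes = b"A") -> bytes:
    """Encode text as 7-bit data with locking shifts (hypothetical helper).

    codec is the local 8-bit character set; final is the final character of
    the <ESC>-F sequence that designates the same alphabet as G1.
    """
    # Announce G1 (<ESC>-F) and G0 (<ESC>(B), then shift explicitly into G0.
    out = bytearray(ESC + b"-" + final + ESC + b"(B" + SI)
    in_g1 = False
    for byte in text.encode(codec):
        if 0x80 <= byte <= 0xA0:
            raise ValueError("C1 controls and 0xA0 are not handled in this sketch")
        if byte > 0x20:
            # Graphic character: make sure the right set is shifted in.
            want_g1 = byte > 0xA0
            if want_g1 != in_g1:
                out += SO if want_g1 else SI
                in_g1 = want_g1
        # Space and C0 controls fall through unchanged (assumption).
        out.append(byte & 0x7F)
    if in_g1:
        out += SI  # leave the stream in G0
    return bytes(out)
```

For the word "gefährlich" this yields the designations and <SI> followed by
gef<SO>d<SI>hrlich, as in the last of the four lines above.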

A text file containing characters from multiple ISO 8859 alphabets requires an
<ESC>-F sequence to identify each alphabet.  SO and SI can be used to shift
between G0 and G1 of the current alphabet, and <ESC>(B can be used to select
G0 of any of the alphabets, since these are all the same.  For example, the
following text contains the same word in English, French, and Russian:

  <ESC>-A<SI>Disappointed, d<SO>ig<SI>u, <ESC>-L<SO>`PW^gP`^RP]]kY<SI>.

The first escape sequence assigns Latin Alphabet No. 1 to G1, and the
subsequent <SO> and <SI> shifts apply to its G0 and G1 sets, which are used to
form the English and French words.  The second escape sequence assigns the
Latin/Cyrillic 96-character set to G1, and the subsequent shifts apply to this
new set.

A final example, in which the same word is repeated in English, Russian, and
German, shows how a locking shift remains in effect when the alphabet
is changed.  We begin in Latin/Cyrillic, start with an English word from G0,
shift to G1 for the Russian word, and while still in G1 switch to Latin
Alphabet No. 1 for German to get the umlaut-A at the beginning of Aenderung
(where Ae = umlaut-uppercase-A), and shift back to G0 for the rest of the word:

  <ESC>-LAlteration <SO>_U`UTU[ZP <ESC>-AD<SI>nderung.
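
The examples in this section can be checked with a small decoder.  The
following Python sketch is not a definitive implementation: the names
decode_7bit and G1_CODECS are hypothetical, only the finals "A" and "L" are
known to it, and Python's iso8859_1 and iso8859_5 codecs stand in for the G1
tables.  It also assumes that a space received while shifted into G1 is still
a space, which is how the last example uses it; a strict reading of a
96-character set might treat that position differently.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

# Finals of <ESC>-F mapped to Python codecs (only those used in the examples).
G1_CODECS = {"A": "iso8859_1", "L": "iso8859_5"}


def decode_7bit(line: str) -> str:
    """Decode a line written in this section's <ESC>/<SO>/<SI> notation."""
    data = (line.replace("<ESC>", "\x1b")
                .replace("<SO>", "\x0e")
                .replace("<SI>", "\x0f")).encode("ascii")
    out = []
    g1 = None        # codec currently designated as G1
    in_g1 = False    # locking-shift state; survives a change of alphabet
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            # ESC, zero or more intermediates (0x20-0x2F), then one final.
            j = i + 1
            while j < len(data) and 0x20 <= data[j] <= 0x2F:
                j += 1
            if j >= len(data):
                raise ValueError("truncated escape sequence")
            if data[i + 1:j] == b"-":
                final = chr(data[j])
                if final not in G1_CODECS:
                    raise ValueError("unknown G1 set: " + final)
                g1 = G1_CODECS[final]
            # Other sequences, such as <ESC>(B and <ESC>&@, are skipped.
            i = j + 1
            continue
        if b == SO:
            in_g1 = True
        elif b == SI:
            in_g1 = False
        elif in_g1 and b > 0x20:
            if g1 is None:
                raise ValueError("shifted into G1 before any designation")
            out.append(bytes([b | 0x80]).decode(g1))
        else:
            out.append(chr(b))   # G0, C0 controls, and space (assumption)
        i += 1
    return "".join(out)
```

Under these assumptions the four German lines decode to the same sentence,
and the last example decodes to "Alteration переделка Änderung.".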


PERFORMANCE

For each file, the preamble and postamble add from 0 to 11 characters.  There
are an additional 3 characters per alphabet change, for instance when
switching between Finnish and Russian, and an additional shift character for
every shift between G0 and G1, and finally 2-character escape sequences used
in place of the C1 control characters.

For files of any length at all, the preamble/postamble overhead is negligible.
It is recommended that the "ambles" be included for compatibility with other
ISO-2022-conformant applications.

The restriction of data to 7 bits during transmission should not incur a high
transmission penalty, since the locking shift mechanism will tend to add fewer
characters to the transmission stream than would 8th-bit prefixing of
characters from the G1 set (although in the worst case -- a file composed of
characters alternating between the G0 and G1 sets -- the overhead of shifting
would actually be higher).  The use of two-character escape sequences for the
C1 control set should also have small impact; the overhead will be the same as
for 8th-bit prefixing, but these characters should appear rarely in text
files.

Hence, the transmission overhead of ISO-2022 transfer syntax should not be
significantly different from that of normal Kermit, and in some cases (e.g.
for texts completely in Russian, Greek, Hebrew, or Arabic) the overhead is far
lower.
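
The comparison can be made concrete with a rough count.  The Python sketch
below is only an estimate under stated assumptions: the function names are
hypothetical, 8th-bit prefixing is charged one extra character per character
with the high bit set, locking shifts are charged one character per change of
set plus a closing <SI>, space and C0 controls are treated as neutral, and
the escape sequences, the "ambles", and C1 controls are ignored.

```python
def prefix_overhead(data: bytes) -> int:
    """Extra characters under 8th-bit prefixing: one per 8-bit character."""
    return sum(1 for b in data if b >= 0x80)


def shift_overhead(data: bytes) -> int:
    """Extra characters under locking shifts: one SO or SI per change of set."""
    shifts = 0
    in_g1 = False
    for b in data:
        if b <= 0x20:
            continue             # space and controls: no shift (assumption)
        want_g1 = b >= 0x80
        if want_g1 != in_g1:
            shifts += 1
            in_g1 = want_g1
    return shifts + (1 if in_g1 else 0)   # closing SI if we end in G1


russian = "разочарованный".encode("iso8859_5")   # entirely G1
worst = "aäaäaä".encode("iso8859_1")             # alternating G0 and G1

print(prefix_overhead(russian), shift_overhead(russian))
print(prefix_overhead(worst), shift_overhead(worst))
```

On these two samples the count favors locking shifts for the Russian word (14
prefixes against 2 shifts) and favors prefixing for the alternating text (3
prefixes against 6 shifts), in line with the argument above.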


WHERE TO GET STANDARDS

The ISO/ECMA standards discussed in this proposal may be obtained free of
charge in their ECMA form by writing to:

  ECMA Headquarters
  Rue du Rhone 114
  CH-1204 Geneva
  SWITZERLAND

Be sure to specify the title and the ECMA number of each standard requested.
We tried this ourselves, and got delivery within about two weeks.

ISO standards can also be ordered from the UN bookstore, but not for free:

  CCITT
  United Nations Bookstore
  United Nations Building
  New York, NY  10017

ANSI standards may be ordered, for a fee, from:

  Sales Department
  American National Standards Institute
  1430 Broadway
  New York, NY  10018


SUMMARY

We hope that this attempt to blend Kermit text file transfer with the ISO
international character set standards is in keeping with the intended use of
those standards.  Anyone who can offer insights as to whether we are using
the standards appropriately is encouraged to comment.

 2-Mar-89 17:20:40-GMT,2351;000000000001
Return-Path: <fdc@cunixc.cc.columbia.edu>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA10260; Thu, 2 Mar 89 12:20:30 EST
Received: by cunixc.cc.columbia.edu (5.54/5.10) id AA01174; Thu, 2 Mar 89 12:16:20 EST
Date: Thu, 2 Mar 1989 12:16:19 EST
From: Frank da Cruz <fdc@cunixc.cc.columbia.edu>
To: Joe Doupnik <jrd@usu.bitnet>, Paul Placeway <paul@cis.ohio-state.edu>,
        Andre Pirard <A-PIRARD@bliulg11.bitnet>,
        Baruch Cochavy <baruchc@techunix.bitnet>,
        Johan Van Wingen <MOSGLA@hlerul2.bitnet>,
        Ken-ichiro Murakami <MURAKAMI%NTT-20.NTT.JP@relay.cs.net>,
        Kohichi Nishimoto <s153380%tkov02.DEC@decwrl.dec.com>,
        Hirofumi Fujii <KEIBUN@jpnkekvm.bitnet>,
        Gisbert W.Selke <RECK@dbnuama1.bitnet>,
        Kurt Enulf <UPSKE@seguc11.bitnet>,
        Jacob Palme <jacob_palme_qz@qzcom.bitnet>,
        Per Lindberg <Per_Lindberg_ZQ@qzcom.bitnet>,
        "Bj|rn Larsen" <x_larsen_b%use.uio.uninett@tor.nta.no>,
        "Hans A. ]lien" <hans%ifi.uio.no@tor.nta.no>,
        Kai U.Leppamaki <LK-KLE@finhut.bitnet>,
        Steve Jenkins <pdsoft%uk.ac.lancs.cent1@nss.cs.ucl.ac.uk>,
        Jean Dutertre <dutertre%padis1.DEC@decwrl.dec.com>,
        Gerard Gaye <GAYE@frsac11.bitnet>,
        David Guerlet <KERMIT@czheth5a.bitnet>,
        Bernie Eiben <eiben@tops20.dec.com>,
        Volker Edelhoff <edelhoff@unido.bitnet>
Subject: Kermit/ISO proposal
Cc: Christine M Gianone <cmg@cunixc.cc.columbia.edu>
Message-Id: <CMM.0.88.604862179.fdc@cunixc.cc.columbia.edu>

It occurs to me that since the proposal was sent from a brand-new computer,
some of you might not be able to reply to the message.  You can also mail
to us as cmg@cunixc.cc.columbia.edu and fdc@cunixc.cc.columbia.edu, or simply
(but less efficiently) as cmg@columbia.edu and fdc@columbia.edu.  And on
BITNET/EARN you can send direct to KERMIT@CUVMA or FDCCU@CUVMA.  If you don't
know what I'm talking about (i.e. if you didn't receive the proposal) please
let me know and I'll get it to you somehow.

Meanwhile, I'd also appreciate any comments on how it meshes with X.400 and
FTAM and other ISO application protocols in their current incarnations.  I
have some several-year-old drafts of these standards, and as far as I can
tell, the only character set they talk about is ISO 646.

Thanks!  - Frank

 3-Mar-89 19:59:09-GMT,5320;000000000011
Return-Path: <@cuvmb.cc.columbia.edu:A-PIRARD@BLIULG11.BITNET>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA01921; Fri, 3 Mar 89 14:59:04 EST
Message-Id: <8903031959.AA01921@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA27239; Fri, 3 Mar 89 14:58:52 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 3521; Fri, 03 Mar 89 14:55:23 EST
Received: from VM1.EARN-ULG.AC.BE by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with
 BSMTP id 1035; Fri, 03 Mar 89 14:55:21 EST
Received: by BLIULG11 (Mailer R2.02) id 8393; Fri, 03 Mar 89 18:53:57 +0100
Date:         Fri, 03 Mar 89 16:59:40 +0100
From: Andre' PIRARD <A-PIRARD%BLIULG11@cuvmb.cc.columbia.edu>
Subject:      Re: MacKermit and national characters
To: Paul Placeway <paul@cis.ohio-state.edu>
Cc: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Your message of Mon, 27 Feb 89 22:59:57 EST

Paul,

Well, for one thing we're not on strike, but there was a lot of talking.

About the "fonts", Frank is right. The 80-9F range (128-159, really)
is "forbidden" in ISO. I suspect a reason is they would map to control
characters when sent on 7-bit lines and could upset some nodes.
About switching G sets, I include below a message that once appeared
on the ISO8859 list.
I still wonder how multiple fonts are used by MacKermit. Are they in the
code or does MacKermit refer to external font files?
Please pardon my ignorance about Macintosh internals.

As to the keyboard, one thing I should explain is that our keyboards are
the other way round. We key in some accented symbols directly, form others
with dead keys overstrikes, but are missing the @ and some others of ASCII.
But these missing symbols are still available thru the Alt key (the one
next to the Apple one, with some kind of sleigh on it) combined with many
keys yielding apparently the whole sets of Apple characters. I expect the
same should hold for US keyboards with slightly different assignments.
In that case, we shouldn't touch the user interface by changing
a well defined Apple convention and the best is to translate them to ISO
to keep keyboard independence.  We can leave the Apple-missing characters
to the user's taste by using the keyboard macros that would have to yield
the ISO codes I guess.
Any key code readily comes to Kermit in a keyboard independent way since
Mathias's version, so we get those missing ASCII characters back to life.
But even in "no parity" mode I can't get the special characters to be echoed
to the screen. They get the bell sound. I think the problem is being 8-bit.

In all but "no-parity" mode, the final operation would be converting
A0-FF to 20-7F and embedding it between SO/SI for sending to the line.
One thing to be careful about is how the keyboard is echoed to the screen
(although it must not be a very frequent mode of operation, yet is still
useful for checking).
Tell me if I can be of some help.
As soon as you get SO/SI working, I can undertake live tests.

The network has been bad this last week.
My latest notes from you are of the 27th + Frank's reply.

Andre'.

Date:         Fri, 27 May 88 18:35:47 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859@JHUVM>
Sender:       ASCII/EBCDIC character set related issues <ISO8859@JHUVM>
From:         John Kesich <KESICH@NYUCIMSA>
Subject:      Re: Extended ASCII with Kermit
To:           Andre' Pirard <A-PIRARD@BLIULG11>
In-Reply-To:  Message of Fri, 27 May 88 14:44:05 +0200 from <A-PIRARD@BLIULG11>

>From my reading of ISO's 646, 2022, 4873, 8859-1 & 8859-2 I have come
to the conclusion that there is a fairly widespread misunderstanding of
ISO8859.  If I'm the one who has misunderstood I hope someone will take
the trouble to correct me.
People seem to think that you pick one of the ISO8859-x sets and then
those 256 characters are the only ones used.  However, ISO's 2022 & 4873
define a number of escape sequences for switching among different
versions (as they term character sets which conform to the standards).
What this means is that simple translation table mappings are not enough
to translate ISO to other code sets, one must also change translation
tables 'on the fly' as the escape sequences are encountered.  A somewhat
simplified example may help to illustrate the problem:

data stream
(ISO notation)   hex       comments
--------------   ---       --------
ESC 02/00 04/12  1B 20 4C  select level 1 of ISO4873
ESC 02/13 04/01  1B 2D 41  designate (and invoke) ISO8859-1's G1 set
12/00            C0        1st 'real' character - capital A, grave accent
ESC 02/13 04/02  1B 2D 42  designate (and invoke) ISO8859-2's G1 set
12/00            C0        2nd 'real' character - capital R, acute accent

Does an implementation which uses a single set of ISO8859-x characters
conform to the standard?
Even if it does, would it make any sense to standardize on a particular
ISO8859-x to the exclusion of others?
Finally, if one were to do so, how would the 2 character text in my
example be transmitted?

Any implementation which doesn't include the ISO escape sequences will
eventually have to incorporate some such mechanism.  I think the ISO
escape sequences should be a part of any standard which is adopted.
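
The point about changing translation tables "on the fly" can be illustrated
with the five-item data stream from the table above.  This Python sketch is
an assumption-laden illustration, not a conforming ISO 2022 or ISO 4873
decoder: the names decode_8bit and G1_TABLES are hypothetical, Python's
iso8859_1 and iso8859_2 codecs stand in for the two G1 tables, and any escape
sequence other than a G1 designation is simply skipped.

```python
ESC = 0x1B

# Final bytes of ESC 02/13 F mapped to Python codecs (the two in the table).
G1_TABLES = {0x41: "iso8859_1", 0x42: "iso8859_2"}


def decode_8bit(data: bytes) -> str:
    """Decode 8-bit data, switching the G1 table at each designation."""
    out = []
    table = None
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            # ESC, zero or more intermediates (0x20-0x2F), then one final.
            j = i + 1
            while j < len(data) and 0x20 <= data[j] <= 0x2F:
                j += 1
            if j >= len(data):
                raise ValueError("truncated escape sequence")
            if data[i + 1:j] == b"-":
                if data[j] not in G1_TABLES:
                    raise ValueError("unknown G1 set")
                table = G1_TABLES[data[j]]
            i = j + 1            # e.g. ESC 02/00 04/12 is skipped
        elif b >= 0xA0:
            if table is None:
                raise ValueError("no G1 set designated")
            out.append(bytes([b]).decode(table))
            i += 1
        else:
            out.append(chr(b))   # ASCII (and C1 controls, passed through)
            i += 1
    return "".join(out)


stream = bytes.fromhex("1B204C 1B2D41 C0 1B2D42 C0")
print(decode_8bit(stream))
```

The same byte, C0, comes out as two different letters because the table in
force changed between its two occurrences; a single fixed translation table
could not produce both.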

 5-Mar-89  6:38:30-GMT,6160;000000000411
Return-Path: <MAILER@cuvmb.cc.columbia.edu>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA18531; Sun, 5 Mar 89 01:38:28 EST
Resent-Message-Id: <8903050638.AA18531@watsun.cc.columbia.edu>
Message-Id: <8903050638.AA18531@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA08526; Sun, 5 Mar 89 01:38:20 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 4127; Sun, 05 Mar 89 01:34:46 EST
Received: by CUVMB (Mailer X1.25) id 3524; Sun, 05 Mar 89 01:34:45 EST
Received: from JPNKEKVM by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 3523; Sun, 05 Mar 89 01:34:44 EST
Received: by JPNKEKVM (Mailer R2.02) id 2408; Sun, 05 Mar 89 15:37:52 JST
Date:         SUN, 05 MAR 89 15:35:06 JST
From: Hirofumi Fujii <KEIBUN%JPNKEKVM@cuvmb.cc.columbia.edu>
Subject:      RE:Kermit/ISO proposal
To: Frank da Cruz <FDCCU@cuvmb.cc.columbia.edu>, Joe Doupnik <JRD@usu>,
        Ken-Ichirou Murakami <murakami%ntt-20.ntt.jp@jpntsuku>,
        Hirohide Mikami <mikami%ntt-20.ntt.jp@jpntsuku>,
        Hirohide Mikami <mikami%Scandium.NTT.junet@jpntsuku>,
        Masamichi Ute <UTE@jpnsut30>
Resent-Date: Sun, 05 Mar 89 01:34:44 EST
Resent-From: Network Mailer <MAILER@cuvmb.cc.columbia.edu>
Resent-To: fdc@cunixc.cc.columbia.edu

Dear Frank,

Thank you very much for informing us about the proposal of Kermit extension.

1. My understanding of your proposal is like following figures; Is it
   correct ?  If it is correct, I agree with you.

     +------------------------[ Sender machine ]--------------------------+
     |                                                                    |
     |  Read file in internal (local) representation                      |
     |       |                                                            |
     |       v                                                            |
     |  Internal-to-ISO converter (machine dependent)                     |
     |  ( ESC-sequence + GL/GR character sets according to the ISO-2022 ) |
     |       |                                                            |
     |       v                                                            |
     |  Traditional Kermit SEND routine                                   |
     +--------------------------------------------------------------------+
             ||
    ( communication line )
             ||
             \/
     +--------------------------------------------------------------------+
     |  Traditional Kermit RECEIVE routine                                |
     |       |                                                            |
     |       v                                                            |
     |  ISO-to-internal converter (machine dependent)                     |
     |  ( Interpret the ISO-2022 ESC-sequence )                           |
     |       |                                                            |
     |       v                                                            |
     |  Write file in internal (local) representation                     |
     |                                                                    |
     +-----------------------[ Receiver machine ]-------------------------+

2.About the Japanese character sets
  Your description of MULTIBYTE ALPHABETS is not correct.
(1) SHIFTJIS is NOT the Japanese standard (the name is quite misleading).
    It is the internal code of the Japanese MS-DOS like EBCDIC.
(2) JIS X 0202 and X 0208 are different kinds of standards.
    The title of the JIS X 0202 is
      "Code Extension Techniques for Use with the Code for
       Information Interchange",
    and of the JIS X 0208 is
      "Code of the Japanese Graphic Character Set for Information
       Interchange".

    JIS X 0202 corresponds to the ISO-2022.
    JIS X 0208 is the table of the code and its graphical representation
    (like ASCII table).  This is so called JIS-code table.
(3) It is possible to send Kanji file within the ISO-2022 scheme (therefore,
    it is not necessary to prepare some attribute like 'JS' for Japanese
    character sets).  JIS X 0202 (I am not sure the followings are ISO-2022
    or not) defines

    <ESC>$F  and <ESC>$,F   designates multi-byte character set "F" to G0
    <ESC>$)F and <ESC>$-F   designates multi-byte character set "F" to G1
    <ESC>$*F and <ESC>$.F   designates multi-byte character set "F" to G2
    <ESC>$+F and <ESC>$/F   designates multi-byte character set "F" to G3

    and Invocation of these character sets to GL or GR is the same as
    ISO-2022 (including single- and locking-shifts).

    JIS X 0208 is the 2-byte character set for Japanese (Symbol:147+
    Number:10+Roman:52+Hirakana:83+Katakana:86+Greek:48+Russian:66+
    Kanji:6353+Rule:32 characters!).  The above "F" for JIS X 0208
    character set is assigned to "B(4/2)"  (I am not sure it is ISO-registered
    or not, but it is described in JIS X 0208).  For example,
      <ESC>$B
    designates JIS X 0208 character set to G0.

    Therefore, we can send Kanji file using this scheme; for example,
    send Kanji-file from MS-DOS machine (SHIFTJIS) to VAX (DEC-KANJI),

        read file in SHIFTJIS
        convert SHIFTJIS to JIS X 0202 form
        send packet
           |
           v
        receive packet
        convert JIS X 0202 form to DEC-KANJI
        write file

    And I think this is compatible with your proposal if my understanding
    is correct.
(4) I don't know about the new standard, ISO 10646.  However, many of the
    Japanese people have already used the above method (by hand). So, I think
    we would be very happy if the above scheme were included in Kermit.
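
    The MS-DOS to VAX path in item (3) can be tried out in Python.  This is
    only a sketch under loud assumptions: Python's iso2022_jp codec is used
    as a stand-in for the "JIS X 0202 form" on the line, and its euc_jp codec
    as a stand-in for DEC-KANJI, without any claim that either is identical
    to what a real Kermit or a real VAX would use.  The function names are
    hypothetical.

```python
def sjis_to_wire(local: bytes) -> bytes:
    """Sender: convert SHIFTJIS file data to a 7-bit form with escapes."""
    return local.decode("shift_jis").encode("iso2022_jp")


def wire_to_euc(wire: bytes) -> bytes:
    """Receiver: convert the 7-bit form to the local code (EUC stand-in)."""
    return wire.decode("iso2022_jp").encode("euc_jp")


text = "漢字 Kermit"
local = text.encode("shift_jis")     # file as stored on the MS-DOS side
wire = sjis_to_wire(local)           # what would go into the packets
remote = wire_to_euc(wire)           # file as written on the receiving side

print(wire)
print(remote.decode("euc_jp") == text)
```

    With this codec the data on the line begins with <ESC>$B, the designation
    of JIS X 0208 to G0 described above, and every byte stays within 7 bits.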

                                                 05-Mar-1989
                                                Hirofumi Fujii
                                   National Lab. for High-energy Physics (KEK)
                                                   JAPAN
                                         KEIBUN@JPNKEKVM.BITNET
                                         KEKVAX::KEIBUN (HEPNET)

 6-Mar-89  4:17:33-GMT,7127;000000000001
Return-Path: <JRD@cc.usu.edu>
Received: from cc.usu.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA27782; Sun, 5 Mar 89 23:17:15 EST
Message-Id: <8903060417.AA27782@watsun.cc.columbia.edu>
Date: Sun, 5 Mar 89 21:16 MDT
From: Joe Doupnik <JRD@cc.usu.edu>
Subject: Reply.ISO-2022 (pretty neat)
To: fdc@watsun.cc.columbia.edu
X-Vms-To: IN%"fdc@watsun.cc.columbia.edu",JRD

Frank and the group,

	The description of ISO 2022 appears to be clear enough to use, and
I appreciate being part of the initial discussion group. I would like to add
some small comments to the discussion, however.

        It appears that the ISO suggestion is directed at two objectives,
terminal emulation and file transfer. Set aside terminal emulation for a
moment since it is well adapted to this method, and let's examine the file
transfer case.

Background
----------

	At the bottom of the protocol stack Kermit implements some encoding
techniques such as control quoting, eight bit prefixing, run length encoding,
and so forth. These are intended to be transparent operations to alleviate
many communications channel difficulties. They are understood only by matching
Kermits and operate on an unstructured stream of characters; they are Kermit
to Kermit protocol items.

	Above that we have the highly useful but nevertheless sticky area
of using CR/LF in packets to indicate file system record delimiters. Who
knows what a record is? The best we can do is cope with the two common display
control commands, CR and LF, and use CR/LF in the character stream where the
local operating system would do an equivalent if displaying the stream on a
very simple terminal device. We even treat Horizontal Tabs as literal text.

        The CR/LF item means two things to me. First, we are transferring
flat files, sequentially. Second, we are attempting to map an important piece
of file system architecture from one side to the other via an in-line
message (CR/LF). Thus, CR/LF is not literal file data unless we force it to be
so by blinding the receiver. This is a file system to file system message
about record demarcation.

	We recognize that much work needs to be done to work with files
which are not simple sequential objects. This includes indexed files, file
descriptor blocks, resource forks, and other structured or multicomponent
"objects". And we wish to be careful in distinguishing the object itself from
access methods. I think that even the most elaborate object can be reduced
to one or more flat files, with reconstruction rules attached, since we do
just that when making backups and patching disks. Reconstruction might not
be much fun, but it is possible.

Commentary
----------

      Now we come to the ISO parts of the proposal. It is suggested that we
use the ISO 2022 conventions to encode the contents of files. I interpret this
to mean that a local Kermit needs to understand the contents or "meaning" of a
file (as distinct from the file system architecture). The contents are
controlled by applications programs rather than being firmly rooted in the
host file system design. This is really an applications program to
applications program protocol, or in ISO networking terms a Presentation Layer
service.  In the case at hand the destination application is a model terminal.
Again, the two sides must cooperate to achieve error free transmission and
that requires a decoder to match (i.e., understand) the encoder. Existing
Kermit negotiation mechanisms can easily achieve that match, though some short
replies from Kermit Servers are outside files and cannot be encoded this way.

        It also means there are major difficulties with Kermit discovering how
to interpret the file/object without help from either the user or hopefully
from the host's file system. The SET FILE TYPE command illustrates the point.
What the proposal is saying is that each Kermit would have a set of filter
procedures to understand the file's contents and communicate the information
in-line via ISO 2022 conventions.  Needless to say, word processor formats,
spread sheets, and other popular applications program data are not 100%
convertible between systems (and barely from version to version of the same
program on the same machine). Ref: SPECIAL EFFECTS section of the proposal.
My view is Kermit ought to stay well away from making any selection of
"useful" versus "special effects" material in such files.

        Underlying the whole ISO 2022 discussion is the concept of displayable
characters, with no reference to file systems. It is a terminal communications
mechanism: a sequential stream of characters, with hints to the display
hardware to select the symbols shown for given (heavily overloaded) data
codes. It presumes the capabilities of the display hardware to provide those
symbols and to position them appropriately. Thus for terminal emulation I
strongly support the ISO 2022 convention, and follow ons, in Kermits.

        Regarding ISO 2022 and file transfer I think that the two are not
necessarily related. For example, a text file composed in two or more mixed
languages needs internal codes to indicate how data items are to be displayed;
the editor or other applications program uses some kind of system for this.
If the system is private then it becomes burdensome for production Kermits to
support that filter. The filter would best be done as either a stand alone
program or a special loadable filter to Kermit (not an easy thing to
accomplish if the filter needs to recognize strings of characters at a time). 
Such a filter would be like <your favorite word processor> to ISO 2022 and the
matching ISO 2022 to display at the other end, but word processors use far
more elaborate display formatting methods than ISO 2022 understands since the
target is usually not a simple character based terminal.

        Matters make sense when the file is homogeneous, say ISO 2022, or
Kanji, or Spanish.  In principle no filter is needed for the communications
channel aside from squeezing 8-bit data into 7-bit form via shift locks. (I
omit consideration of non-8-bit systems). Stand alone filter programs could
convert local forms to one more generally understood, and back again on the
other side and do so for even complicated multi-character representations.  To
me this means in-built filters are tailored to specific kinds of homogeneous
files and a single Kermit implementation has either a small embedded set
and/or a convenient way to load new procedures at run time (I vote for both).
Loadable ones allow local enhancements and are the pathway to transferring
heterogeneous documents, even though the implementation details will be a
real headache if it's code rather than a data table.

        In summary, I think the ISO 2022 approach has much merit and I support
it. At the same time we should be sensitive to the fact that we are discussing
some, and only some, terminal based display attributes but not file system
differences nor, for the most part, applications programs.

	This is not much to add to the discussion really, except I think it
is a good concept in its area.

	Joe Doupnik


 6-Mar-89 13:57:25-GMT,2024;000000000001
Return-Path: <MAILER@cuvmb.cc.columbia.edu>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA01156; Mon, 6 Mar 89 08:57:17 EST
Resent-Message-Id: <8903061357.AA01156@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA05330; Mon, 6 Mar 89 08:56:21 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 4307; Mon, 06 Mar 89 08:53:32 EST
Received: by CUVMB (Mailer X1.25) id 4208; Mon, 06 Mar 89 08:53:31 EST
Received: from CUVMB by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 4207; Mon, 06 Mar 89 08:53:30 EST
Received: from watsun by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with TCP; Mon, 06
 Mar 89 08:53:29 EST
Received: by watsun (4.0/SMI-4.0)
        id AA01110; Mon, 6 Mar 89 08:55:48 EST
Date: Mon, 6 Mar 1989 8:52:47 EST
From: Frank da Cruz <fdc%watsun.cc.columbia.edu@cuvmb.cc.columbia.edu>
To: Hirofumi Fujii <KEIBUN%JPNKEKVM@cuvmb.cc.columbia.edu>
Cc: Frank da Cruz <FDCCU@cuvmb.cc.columbia.edu>, Joe Doupnik <JRD@usu.>,
        Ken-Ichirou Murakami <murakami%ntt-20.ntt.jp@jpntsuku.>,
        Hirohide Mikami <mikami%ntt-20.ntt.jp@jpntsuku.>,
        Hirohide Mikami <mikami%Scandium.NTT.junet@jpntsuku.>,
        Masamichi Ute <UTE@jpnsut30.>,
        Christine M Gianone <cmg@cunixc.cc.columbia.edu>
Subject: RE:Kermit/ISO proposal
In-Reply-To: Your message of SUN, 05 MAR 89 15:35:06 JST
Message-Id: <CMM.0.88.605195567.fdc@watsun.cc.columbia.edu>
Resent-Date: Mon, 06 Mar 89 08:53:30 EST
Resent-From: Network Mailer <MAILER@cuvmb.cc.columbia.edu>
Resent-To: fdc@cunixc.cc.columbia.edu

Thanks very much for your explanation of the Japanese standards.  Your
understanding of our proposal was completely correct, and we're very glad
to see that the same mechanism can be used for the present-day Japanese
codes.  We will change our proposal to reflect what you have said.  Too bad
we didn't get to meet you when we were in Japan.  Thanks again!  - Frank

 6-Mar-89 22:41:21-GMT,2987;000000000001
Return-Path: <fdc>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA05998; Mon, 6 Mar 89 17:41:18 EST
Date: Mon, 6 Mar 1989 17:41:18 EST
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: Joe Doupnik <jrd@usu.bitnet>
Subject: Proposal
Message-Id: <CMM.0.88.605227278.fdc@watsun.cc.columbia.edu>

Joe, thanks for the comments.  I take it the main discomfort you have with the
proposal is that a terminal-oriented mechanism is being bent into a file
transfer application.  And you're right.  And as you point out, Kermit has
always done that (with the CRLF line terminators, etc).  So the proposal
carries along shamelessly in this tradition.  Anyway, if you want to be able
to unambiguously flag different alphabets in a file that's being transferred,
what other mechanism is there?  I don't think there is one, except maybe
certain proprietary schemes cooked up by Xerox, etc.

It's also true that by following the terminal model, we seem to restrict
ourselves to flat, sequential files.  Simple Kermit programs have never
claimed to be able to transfer anything else.  There is some
not-very-well-thought-out mumbling in the Attributes section of KtB intended
to address this problem (it doesn't, really).

Do you think Kermit -- or any other nonproprietary file transfer protocol --
will ever be able to handle complex record-oriented files (e.g. ISAM or other
kinds of databases containing strings, integers, floats, bit flags, etc)
between unlike systems, without resorting to tricks (like Kermit-11 sending
the FAB in the Attribute packet)?  I doubt that even ISO FTAM with full-blown
ASN.1 encoding could do it (I may be wrong).  Even if it could, it probably
should not be a goal of Kermit to do everything that FTAM can do (even though
it can pretty much do everything that FTP can!).

So anyway, insofar as the proposal confines itself to TEXT (and it does),
we're in pretty good shape.  As to implementation -- we have the choice of
putting the conversions into Kermit (either statically or dynamically) or
forcing the user to run pre- and postprocessors.  Naturally, I'd rather see
Kermit do the work whenever possible to save users headaches, confusion, and
disk space.  The question is, how hard will it be on the programmer?  Probably
MS-DOS is an extreme case, with hundreds of mutually-incompatible word
processors, every user of each clamoring for JRD to put support into
MS-Kermit.  At the other extreme are the national versions of VMS, where all
files are encoded in a single ISO 8859 alphabet (e.g. Roman/Hebrew).

What I hope will come out of this is some incentive for makers of
multi-alphabet software to get together and come up with some common file
formats, preferably in line with existing standards.  Speaking of which, do
you have the name and number of the Wordperfect guy I passed along to you?
Maybe I can grill him about Wordperfect formats to get a better idea of what
goes on in a typical "real-world" application...

Thanks again!  - Frank

 8-Mar-89  3:24:57-GMT,1064;000000000411
Return-Path: <MURAKAMI@ntt-20.ntt.jp>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA20052; Tue, 7 Mar 89 22:24:55 EST
Received: from ntt-sh.ntt.jp ([129.60.57.1]) by cunixc.cc.columbia.edu (5.54/5.10) id AA25613; Tue, 7 Mar 89 22:23:01 EST
Received: by ntt-sh.ntt.jp (3.2/ntt-sh-03c) with TCP; Wed, 8 Mar 89 12:24:22 JST
Date: Wed, 8 Mar 89 12:23:03 I
From: ken-ichiro murakami <MURAKAMI@ntt-20.ntt.jp>
Subject: Re: ISO Kermit Proposal
To: fdc@watsun.cc.columbia.edu
In-Reply-To: <CMM.0.88.605325018.fdc@watsun.cc.columbia.edu>
Message-Id: <12476256355.16.MURAKAMI@NTT-20.NTT.JP>

Hi Frank!

    I've got the messages from you and from Dr. Fujii. We at DECUS
Japan are now considering holding a meeting to discuss your
proposal. Many users may have opinions of their own, as Dr. Fujii does. So
Mr. Nishimoto at DECUS Japan is preparing to send postal mail to Kermit
users. The meeting will be held in the last week of March. Would you
please wait for the result? I've already sent mail to Dr. Fujii about
it.

-Ken
-------

 8-Mar-89 15:22:28-GMT,1288;000000000001
Return-Path: <fdc>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA24513; Wed, 8 Mar 89 10:22:24 EST
Date: Wed, 8 Mar 1989 10:22:23 EST
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: Andre' PIRARD <A-PIRARD%BLIULG11@cuvmb.cc.columbia.edu>
Subject: Re: Kermit International Character Set Proposal
In-Reply-To: Your message of Wed, 08 Mar 89 13:16:05 +0100
Message-Id: <CMM.0.88.605373743.fdc@watsun.cc.columbia.edu>

For now, I think the discussion group should be confined to the people
in the message header, except for the Finnish guy, who apparently has
disappeared.  If you "reply all" that should do the trick.  If the discussion
becomes lively and detailed, then maybe I'll set up a mailing list.  I've
already received two substantive replies.  One from Joe Doupnik (MS-Kermit)
who likes it, but says we should stress that we're still following the
terminal model -- all these escape sequences are really designed for host-
terminal interaction (but what else is there that we can apply to the
problem at hand?).  The other from the people in Japan, who told me that
the ISO 2022 scheme also applies to Japanese codes, and they don't need to
be a special case -- just a multibyte application of ISO 2022 and 4873.  So
far nothing back from anyone else yet.  - Frank

 8-Mar-89 23:13:42-GMT,2877;000000000011
Return-Path: <@cuvmb.cc.columbia.edu:PEPMNT@CFAAMP.BITNET>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA28557; Wed, 8 Mar 89 18:13:41 EST
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA16757; Wed, 8 Mar 89 18:11:43 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 5586; Wed, 08 Mar 89 18:09:50 EST
Received: from CFAAMP.BITNET (PEPMNT) by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25)
 with BSMTP id 8824; Wed, 08 Mar 89 18:09:49 EST
Date: Wed, 1989 Mar 8   14:48:24 EST
From: (John F. Chandler)   PEPMNT@cfaamp.bitnet
To: (Frank da Cruz)   fdc@cunixc.cc.columbia.edu
Subject: Kermit alphabets
Message-Id: <PEPMNT.890308.144824.B0@CFAAMP.BITNET>

Frank,
   I have read through the draft protocol extension, and I have a few
comments.

1. It wasn't clear until rather late in the document that the proposal
   was for pure 7-bit transmission.  Since my thinking about character
   sets had been mostly in terms of 256-to-256 translation tables, I
   kept looking for 8-bit features.

2. In the section "DESCRIPTION OF ISO-2022 TRANSFER SYNTAX" at line 437
   it says:

to shifting.  Control characters from the C1 set must be transmitted using
2-character escape sequences, as described in ISO 2022: <ESC>@, <ESC>A,
<ESC>B, etc, stand for 10000000, 10000001, 10000010, etc (binary).  This
method results in less Kermit encoding overhead on 7-bit connections than
would sending these characters "bare" (which is not allowed).

   However, the overhead should be the same, since <ESC> gets encoded to
two characters.  By the way, the next paragraph gives the encoding for
<ESC> as #$, rather than #[.

3. In the 3rd paragraph of that section, by the way, I would say
   "parity stripping", instead of "stripping of parity" -- it's a
   matter of style and also to avoid the impression that you meant to
   say "stripping off".

4. Line 617 (in the section "MULTIBYTE ALPHABETS") contains the phrase
   "Kermit Kermit encodings" -- should that be "Kermit alphabet
   encodings" or perhaps just "encodings"?

5. I presume the very last paragraph ("SUMMARY") will be dropped from
   the ultimate draft.  If not, the last sentence should be amended
   from "Anyone who has can offer" by dropping the "has".

6. The section "ADDITIONAL ESCAPES" is a little unclear.  In view of the
   quote above, which says 2-byte escapes *must* be used for C1, it seems
   superfluous to require <ESC><SP>F -- presumably one or the other of
   these statements is incorrect.  Perhaps the description of what *may*
   be sent should be more of a *prescription* -- in particular, I think
   <ESC>d should not be permitted anywhere except at the end of a file
   and should be required there (in Kermit transfers, that is).
                                         John
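
The arithmetic behind point 2 can be checked with a short calculation.  What
follows is only a minimal sketch, assuming the basic Kermit prefixing rules
(control prefix "#", 8th-bit prefix "&", control characters sent as the prefix
followed by the character XOR 64).  The function name kermit_encode is
hypothetical, and repeat counts and packet framing are ignored.

```python
CTL_PREFIX = ord("#")   # Kermit control-character prefix
EBQ_PREFIX = ord("&")   # Kermit 8th-bit prefix (used on 7-bit connections)

def kermit_encode(data: bytes, eighth_bit_prefix: bool = True) -> bytes:
    """Apply basic Kermit prefixing to data (hypothetical helper, a sketch)."""
    out = bytearray()
    for b in data:
        low = b & 0x7F
        if eighth_bit_prefix and b & 0x80:
            out.append(EBQ_PREFIX)      # flag the high bit, then send the low 7 bits
            b = low
        if low < 32 or low == 127:
            out.append(CTL_PREFIX)      # control character: prefix, then XOR 64
            out.append(b ^ 0x40)
        elif low == CTL_PREFIX or (eighth_bit_prefix and low == EBQ_PREFIX):
            out.append(CTL_PREFIX)      # a literal prefix character must itself be quoted
            out.append(b)
        else:
            out.append(b)
    return bytes(out)

escaped = kermit_encode(b"\x1b@")   # C1 character 10000000 sent as <ESC>@
bare = kermit_encode(b"\x80")       # the same character sent "bare" with 8th-bit prefixing
print(escaped, len(escaped))
print(bare, len(bare))
```

Under these assumptions <ESC> is encoded as #[ and both forms cost three
characters on a 7-bit connection, which is the point made in item 2.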

 9-Mar-89  1:20:40-GMT,1818;000000000001
Return-Path: <fdc>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA29372; Wed, 8 Mar 89 20:20:29 EST
Date: Wed, 8 Mar 1989 20:20:28 EST
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
In-Reply-To: Your message of 03/08 19:21:24
Cc: Christine M Gianone <cmg@cunixc.cc.columbia.edu>
To: Gisbert W.Selke <RECK@dbnuama1.bitnet>
Subject: Re: ISO / Kermit Proposal
Message-Id: <CMM.0.88.605409628.fdc@watsun.cc.columbia.edu>

Actually, we have a vested interest in keeping the world from blowing itself
up, so any little bit we can do to help people and nations communicate
with each other...  Also (let's be honest) maybe we'll get some more trips out
of it...

Your specific comments were very good.  We're not sure what to do about the
overloaded ISO-646 characters...  Maybe there's a list somewhere of what
"national" characters are used in these positions in each country, so that the
ISO-8859 equivalents can be identified.  And yes, the problem of
non-language-related escapes within multilanguage files is a conundrum.
Presumably, it "shouldn't happen", but in real life, who knows?

We agree about the silly single character shifts.  In this case Western
Europeans lose (having to shift between ASCII and special characters all the
time), whereas Russians, Israelis, Greeks, and Arabs win -- they can stay in
the "right half" all the time.  The shifts could be avoided by using all 8
bits, but then we'd get even more overhead on 7-bit connections due to
Kermit's 8th-bit-prefixing mechanism.

Extended negotiations probably won't happen.  A single, two-sided exchange is
imbedded too deeply in the protocol.  On some level, the user is simply going
to have to know something about the alphabet codes in use, and which ones are
supported by the Kermit programs.

Thanks again!  - Chris and Frank

 9-Mar-89  2:04:17-GMT,2289;000000000001
Return-Path: <FDCCU@cuvmb.cc.columbia.edu>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA29634; Wed, 8 Mar 89 21:04:16 EST
Message-Id: <8903090204.AA29634@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA05259; Wed, 8 Mar 89 21:02:16 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 5657; Wed, 08 Mar 89 21:00:26 EST
Received: by CUVMB (Mailer X1.25) id 9131; Wed, 08 Mar 89 21:00:25 EST
Date: 03/08 20:44:16
From: FDCCU@cuvmb.cc.columbia.edu
Subject: RECK NOTE - PUN file from RSCS
To: FDC@cunixc.cc.columbia.edu
Reply-To: RSCS@cuvmb.cc.columbia.edu

Date: 9 March 1989, 02:29:11 SET
From: Gisbert W.Selke           +49 228 225888       <RECK@DBNUAMA1.BITNET>
To:   FDCCU at CUVMA
Re:   Caught in the act

Frank and Chris,
                 Yes, I'm quite sympathetic to that healthy blend of
philanthropism and hedonism that drives you in the making of Kermit...

A few more quick comments on comments on comments:
(i) In Germany, the ISO 646 standard is quite rigorously the following:
left square bracket: A umlaut        left curly brace: a umlaut
right "       "    : U umlaut        right "      "  : u umlaut
backslash          : O umlaut        vertical bar    : o umlaut
tilde              : ess-zet
So, even if you can't tell from looking at the file itself, a semi-
knowledgeable user will know which overloading is used. (Or am I being too
optimistic?) Currently, I am using a filter when going from the PC
(IBM extended ASCII) to the host (ISO 646) or vice versa.
(ii) Other-purpose escape sequences: at least on the PC, it's a common thing
to have, say, batch files, or user screens, using ANSI features (colouring,
highlighting, positioning,...) *and* umlaute. So it would of course be nice
to have this catered for, but we've been living without. - There should be
some warning, though, if extraneous escape sequences are encountered in
an ISO file, instead of stealthily doing some wild alphabet switching.
(iii) Shiftin/out: why not use 8-bit data where available? Binaries work like
that, too.
(iv) Extended negotiation: I agree... if it ain't broken, don't fix it.
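
The kind of filter mentioned in (i) can be sketched as a pair of translation
tables.  This is an illustration only: it uses the German ISO 646 assignments
listed in (i), takes Unicode strings as a stand-in for the PC's extended ASCII,
and the names GERMAN_646, from_german_646 and to_german_646 are hypothetical.

```python
# German ISO 646 overloading, exactly as listed in point (i).
GERMAN_646 = {
    "[": "\u00c4",   # A umlaut
    "\\": "\u00d6",  # O umlaut
    "]": "\u00dc",   # U umlaut
    "{": "\u00e4",   # a umlaut
    "|": "\u00f6",   # o umlaut
    "}": "\u00fc",   # u umlaut
    "~": "\u00df",   # ess-zet
}

TO_UNICODE = str.maketrans(GERMAN_646)
FROM_UNICODE = str.maketrans({v: k for k, v in GERMAN_646.items()})

def from_german_646(text: str) -> str:
    """Interpret the overloaded ISO 646 positions as German letters."""
    return text.translate(TO_UNICODE)

def to_german_646(text: str) -> str:
    """Map German letters back onto the overloaded ISO 646 positions."""
    return text.translate(FROM_UNICODE)

print(from_german_646("Gr|~e"))
```

Note that the filter cannot tell whether a bracket in the host file was meant
as a bracket or as a letter; as (i) says, the user has to know which
overloading is in use.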

On my way to extended feasting,

\Gisbert

10-Mar-89 14:04:23-GMT,3706;000000000611
Return-Path: <FDCCU@cuvmb.cc.columbia.edu>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA15480; Fri, 10 Mar 89 09:04:21 EST
Message-Id: <8903101404.AA15480@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA18990; Fri, 10 Mar 89 09:01:24 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 6202; Fri, 10 Mar 89 09:00:27 EST
Received: by CUVMB (Mailer X1.25) id 1342; Fri, 10 Mar 89 09:00:25 EST
Date: 03/10 08:51:16
From: FDCCU@cuvmb.cc.columbia.edu
Subject: PUN file from RSCS - MOSGLA.MAIL
X-Tag: FILE (9356) ORIGIN HLERUL2  MAILER    3/10/89  3:52:41 E.S.T.
To: FDC@cunixc.cc.columbia.edu
Reply-To: MAILER%HLERUL2@cuvmb.cc.columbia.edu

Date:    Fri, 10 Mar 89 13:42 CET
From:    "Johan van Wingen"                          <MOSGLA@HLERUL2>
To:      "F. da Cruz"                         <FDCCU@CUVMA>
Subject: Kermit/ISO


Dear Christine and Frank
I read your proposal with great interest, although I am not a Kermit
expert, nor even a network expert. Congratulations on your tutorial on ISO
standards, parts of which I would like to copy in my own documents in
the future (source stated of course). My present comments are very
provisional, and do not cover the Kermit part in detail.

ISO is strictly Int. Organization for Standardization.

ISO 4873 contains an important feature: "levels". With Level 1 no shifts
are allowed, with Level 2 only single shifts, with Level 3 all the rest.
(I keep the documents at home, not here, so I cannot quote literally
now.) Thus the generality of ISO 2022 is somewhat restricted here. It
must be said that there are no known implementations of ISO 2022 in data
processing whatsoever, so one should be careful not to raise too high
expectations of its use.  But it is good as a specification method.
It still remains one of the most impenetrable ISO standards.

It is not so much ISO 8859 that defines a series of 8-bit character
sets, but ISO 4873. The left half will become after revision identical
to ASCII, that is ISO 646 International Reference Version Revised (not
plain ISO 646). There is an ISO-XYZ in development where switching the
right hand part is defined.

Another standard is now being proposed, a non-extensible 8-bit set with
NULL, ESC and 254 other characters, graphic or control. I submitted a
first draft, a copy of which I may send you. It includes only HT (tab),
CR and LF. Both Western and Eastern Europe are covered, and Turkish.

As for Kermit I think it important to indicate the Level of ISO 4873
used. Most environments do not allow midstream code table switching, and
it is only fair to tell when that is not intended. For program texts
only Level 1 will be permitted.

There is a strong tendency with SC2, and still more with SC22, to do
away with all 7-bit processing. Then ISO 646 will only be kept for the CCITT
Telematic services.

At the meeting of the SC22 Ad Hoc Group on Character Handling in
Programming Languages earlier this week in Paris, there was a strong
request for giving names to specific coded character sets, like LATIN1,
LATIN2. This could also be used for SET FILE TYPE, after it has been
standardized (by SC2).

As for document description languages there is now Standard Generalized
Mark-up Language (SGML) for which there is an ISO standard, and a very
active users group. No characters other than found in ISO standards are
used. Not even adapting Kermit will be required for its use in
transferring files.

Be sure this is not meant to be my final and last reaction.

FROM  J. W. van Wingen    MOSGLA@HLERUL2.BITNET
Mail to
P. O. Box 486,  2300AL Leiden, Netherlands

13-Mar-89 13:04:31-GMT,4408;000000000011
Return-Path: <FDCCU@cuvmb.cc.columbia.edu>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA11432; Mon, 13 Mar 89 08:04:29 EST
Message-Id: <8903131304.AA11432@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA12680; Mon, 13 Mar 89 08:04:17 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 6847; Mon, 13 Mar 89 08:00:28 EST
Received: by CUVMB (Mailer X1.25) id 3703; Mon, 13 Mar 89 08:00:26 EST
Date: 03/13 07:22:11
From: FDCCU@cuvmb.cc.columbia.edu
Subject: PUN file from RSCS - MOSGLA.MAIL
X-Tag: FILE (2245) ORIGIN HLERUL2  MAILER    3/13/89  2:25:25 E.S.T.
To: FDC@cunixc.cc.columbia.edu
Reply-To: MAILER%HLERUL2@cuvmb.cc.columbia.edu

Date:    Mon, 13 Mar 89 13:23 CET
From:    "Johan van Wingen"                          <MOSGLA@HLERUL2>
To:      "F. da Cruz"                         <FDCCU@CUVMA>
Subject: Kermit/ISO


Dear Frank
Here are some additional comments. You can order the International
Register of Coded Character Sets from ECMA free, on official paper,
stating your name and address as the recipient. Ask also for ECMA
Memento 1989, a nice mandarin-coloured booklet which includes a list of
all ECMA standards.
The final character for Arabic (Part 6 ) is G. Then we can update:

Alphabet Summary Table:

  Esc Seq   Alphabet Name             ISO Number     ECMA Number Regist.
                                                                     nr.
  <ESC>(B   ASCII (ANSI X3.4-1986)    ISO 646        ECMA-6
  <ESC>-A   Latin Alphabet No. 1      ISO 8859-1     ECMA-94     100
  <ESC>-B   Latin Alphabet No. 2      ISO 8859-2     ECMA-94     101
  <ESC>-C   Latin Alphabet No. 3      ISO 8859-3     ECMA-94     109
  <ESC>-D   Latin Alphabet No. 4      ISO 8859-4     ECMA-94     110
  <ESC>-L   Latin/Cyrillic            ISO 8859-5     ECMA-113    144
  <ESC>-G   Latin/Arabic              ISO 8859-6     ECMA-114    127
  <ESC>-F   Latin/Greek               ISO 8859-7     ECMA-118    126
  <ESC>-H   Latin/Hebrew              ISO 8859-8     ECMA-121    138
  <ESC>-M   Latin Alphabet No. 5      ISO 8859-9     ECMA-128    148

  Other registered sets (with also a 96-char. G1)

  <ESC>-I   Czech Standard ($ <-> currency sign)                 139
  <ESC>-J   Right Half of ISO 6937-2                 ECMA- ?     142
  <ESC>-K   Mathematical/ Technical set              ECMA- ?     143
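
For illustration, the G1 rows of the table above can be held in a small lookup
keyed on the final character of the <ESC>-F designation.  This is a sketch
only; the names G1_SETS and identify_g1 are hypothetical, and the entries are
copied from the table rather than from the register itself.

```python
ESC = 0x1B

# final character -> (alphabet name, ISO number or None, registration number)
G1_SETS = {
    "A": ("Latin Alphabet No. 1", "ISO 8859-1", 100),
    "B": ("Latin Alphabet No. 2", "ISO 8859-2", 101),
    "C": ("Latin Alphabet No. 3", "ISO 8859-3", 109),
    "D": ("Latin Alphabet No. 4", "ISO 8859-4", 110),
    "L": ("Latin/Cyrillic", "ISO 8859-5", 144),
    "G": ("Latin/Arabic", "ISO 8859-6", 127),
    "F": ("Latin/Greek", "ISO 8859-7", 126),
    "H": ("Latin/Hebrew", "ISO 8859-8", 138),
    "M": ("Latin Alphabet No. 5", "ISO 8859-9", 148),
    "I": ("Czech Standard", None, 139),
    "J": ("Right Half of ISO 6937-2", None, 142),
    "K": ("Mathematical/Technical set", None, 143),
}

def identify_g1(seq: bytes):
    """Return the table entry for a 3-byte <ESC>-F sequence, or None."""
    if len(seq) == 3 and seq[0] == ESC and seq[1:2] == b"-":
        return G1_SETS.get(chr(seq[2]))
    return None

print(identify_g1(b"\x1b-H"))
```

Sequences that are not of the <ESC>-F form, such as <ESC>(B for ASCII in G0,
are simply not recognized by this sketch.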

A large lot has been registered as a G0. These also include many of the
national versions of ISO 646, but not all. I'll send a table as soon as I
have typed one. In Dutch the "ij" and the "IJ" are sometimes handled as
a separate character, and sorted with "y" (as in the list in the Railway
timetable), but even if two letters, "ij" is always capitalized as "IJ".
As a single letter both figure in ISO 6937-2 and in the National Coded
Character Standard for Korean (!!!).

The SGML documents are:
  ISO 8879 SGML
  ISO TR 9573 SGML Users Guide
  ISO 9069 SDIF (SGML Document Interchange Format)
The ODA (Office Document Architecture) standard is:
  ISO 8613 Parts 1,2,4-8 (there is no 3) (630 pages !)
I only receive SC2, SC18 and SC22 documents, not those from SC6, SC21,
(which would be too much for a single person to understand). Thus I have
not got details about FTAM. As far as I know X.400 is MOTIS which is about
to be approved as ISO 10021. I received the DIS, but did not study it.
It is under SC18.

As for naming and identifying entities in data transfer increasing use
is being made of ISO 8824,8825 ASN.1 Abstract Syntax Notation 1. This
may even prove an alternative for ISO 2022.
In the document SC2/WG3 N 48, Revision of ISO 4873, it is stated:
"A proposal is now under review in SC21/WG6 for extension of ASN.1 in
the area of character coding identification. That proposal uses
Registration Numbers to identify character sets. It also assumes that a
single identification number is sufficient to identify coding structure,
and that SC2 will maintain a list of such numbers. For simplicity, the
proposal assumes that such a number can be derived directly from the
final character of the announcer ESC sequence. This feature of the
proposal must be clarified before the proposal is finally approved."

More in my next contribution.

FROM  J. W. van Wingen    MOSGLA@HLERUL2.BITNET
Mail to
P. O. Box 486,  2300AL Leiden, Netherlands

17-Mar-89 15:05:28-GMT,1774;000000000001
Return-Path: <FDCCU@cuvmb.cc.columbia.edu>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA09434; Fri, 17 Mar 89 10:05:25 EST
Message-Id: <8903171505.AA09434@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA09299; Fri, 17 Mar 89 10:03:39 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 9051; Fri, 17 Mar 89 10:00:25 EST
Received: by CUVMB (Mailer X1.25) id 1922; Fri, 17 Mar 89 10:00:24 EST
Date: 03/17 09:28:17
From: FDCCU@cuvmb.cc.columbia.edu
Subject: PUN file from RSCS - MOSGLA.MAIL
X-Tag: FILE (9635) ORIGIN HLERUL2  MAILER    3/17/89  4:32:15 E.S.T.
To: fdc@cunixc.cc.columbia.edu
Reply-To: MAILER%HLERUL2@cuvmb.cc.columbia.edu

Date:    Fri, 17 Mar 89 15:18 CET
From:    "Johan van Wingen"                          <MOSGLA@HLERUL2>
To:      "F. da Cruz"                         <FDCCU@CUVMA>
Subject: More on char. sets


Dear Frank
To continue, SGML is Standard Generalized Mark-up Language. A precursor,
GML, runs on IBM systems under DCF, with MVS or VM.
The East-Asian standards are:
China: (ISO Reg.  58)  GB 2312-80
Japan: (ISO Reg.  87)  JIS X 0208 (formerly JIS C 6226-1983)
Korea: (ISO Reg. 149)  KS C 5601-1987
I located a VT340 in the Computing Centre, and got it demonstrated.
I was quite impressed. It is possible to show all the otherwise
unprintable characters on the screen, and all the escape sequences, and
to change the "mode" for displaying what you would see on paper.
In the manual EK-VT3XX-TP-001 you find on page 25 the list of ISO 646
versions that you wanted (except Hungarian).

FROM  J. W. van Wingen    MOSGLA@HLERUL2.BITNET
Mail to
P. O. Box 486,  2300AL Leiden, Netherlands

18-Mar-89 15:13:32-GMT,6449;000000000011
Return-Path: <MURAKAMI@ntt-20.ntt.jp>
Received: from ntt-20.NTT.JP by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA24177; Sat, 18 Mar 89 10:12:40 EST
Date: Sun, 19 Mar 89 00:09:45 I
From: ken-ichiro murakami <MURAKAMI@ntt-20.ntt.jp>
Subject: Re: ISO Kermit Proposal
To: fdc@watsun.cc.columbia.edu
Cc: cmg@cunixc.cc.columbia.edu, KEIBUN%JPNKEKVM.BITNET@ume.cc.tsukuba.junet,
        murakami@ntt-20.ntt.jp
In-Reply-To: <CMM.0.88.605332970.fdc@watsun.cc.columbia.edu>
Message-Id: <12479006446.14.MURAKAMI@NTT-20.NTT.JP>

Frank and Chris,

I talked with Mr. Hirofumi Fujii at KEK about your proposal. We
confirmed that we need a facility to convert character codes, and that we
have the same idea except for our implementation models. However, we
have not yet come to a conclusion. It will take a few weeks to find a
solution. So, I'll give you my comments on your original proposal.

Before giving my comments, I must explain our complex situation
regarding Kanji codes. As Mr. Fujii pointed out, we have several kinds of
Kanji code, as follows:

(1) SHIFTJIS --- 2 byte length, MSB is used, mainly used in micro
		 computer OS such as MS-DOS and CP/M
(2) EUC      --- 2 byte length, MSB is used, mainly used in mini
		 computer and workstation OS such as SUN, DEC and ELIS
		 (NTT's AI workstation)
		 EUC stands for Extended Unix Code. It's equivalent to
		 VAX code.
(3) JIS-7    --- 2 byte length, MSB is unused, standard Kanji code on
		 UUCP and TCP/IP.
                 This might be equivalent to ISO-2022(JIS X 0202).
(4) JIS-8    --- 2 byte length, MSB is used. I don't know in detail.
(5) vendor specific Kanji code such as IBM, XEROX, etc
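
The difference between codes (1) to (3) can be seen by encoding one Kanji
three ways.  This is a sketch only, and it assumes that Python's standard
codecs shift_jis, euc_jp and iso2022_jp are reasonable stand-ins for
SHIFTJIS, EUC and JIS-7 as described above; the helper name uses_msb is
hypothetical.

```python
KANJI = "\u6f22"   # the first character of the word "kanji"

def uses_msb(data: bytes) -> bool:
    """True if any byte has its most significant bit set."""
    return any(b & 0x80 for b in data)

sjis = KANJI.encode("shift_jis")     # (1) SHIFTJIS: 2 bytes, MSB used
euc = KANJI.encode("euc_jp")         # (2) EUC: 2 bytes, MSB used
jis7 = KANJI.encode("iso2022_jp")    # (3) JIS-7: 7-bit bytes framed by escape sequences

for name, data in (("SHIFTJIS", sjis), ("EUC", euc), ("JIS-7", jis7)):
    print(name, data.hex(), "MSB used" if uses_msb(data) else "7-bit only")
```

In this example the two EUC bytes are the two JIS-7 code bytes with the high
bit set, which is why conversion between those two is mechanical, while
SHIFTJIS needs its own arithmetic.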

Since there is no de facto standard Kanji code, we are often confused
by the inconsistency. This also affects the terminal emulation facility in
Kermit. We must support more than three Kanji codes.

   In our (NTT's) implementation, we provided a SET KANJI
{JIS-7|VAX(EUC)|SHIFTJIS} command to inform BOTH the emulator AND the file
transfer module of the Kanji code. The problem is that this creates an
inconsistency between your proposal and our requirement. If we adopt
your command (SET FILE TYPE), we must provide another command (SET
TERMINAL) for the terminal emulator to specify the Kanji code. That is
inconvenient, since we must specify the Kanji code twice.  It's a dilemma. ;-(

   In contrast with our implementation, Mr. Fujii has yet another
idea. In his implementation, he provided a SET TERMINAL KANJI command to
specify the Kanji code only for the terminal emulator. To support local Kanji
conversion, he will provide SET LOCAL TRANSLATION {ON|OFF|EUC|JIS8|JIS7}.
This command resembles your SET FILE TYPE command. If the user specifies ON,
the file transfer module will convert Kanji based on the Kanji code
specified in the SET TERMINAL KANJI command. If another code (EUC, JIS8, or
JIS7) is specified, Kanji is converted based on the specified Kanji
code type.
   
   Basically, we agree with your proposal. It's a good idea to have a
standard character code on the transmission channel. However, it will take
a long time for ALL Kermit implementations to have a code conversion
facility. For the present, we would also like to allow local Kanji
conversion from the local Kanji code to the remote Kanji code (non-ISO code).
This is our common requirement.

I hope we can find a common solution for Kanji conversion implementation.

The following are my comments on your proposal.

>In the meantime, national versions of Kermit can (and do) use SET FILE TYPE
>commands to identify the encoding or standard used for a multibyte alphabet.
>For example, some Japanese Kermit programs have the command SET FILE TYPE
>TEXT, BINARY, or KANJI, and add a further command to specify the local Kanji
                                                                  ~~~~~
>encoding: SET KANJI VAX, JIS, or SHIFTJIS (JIS is the Japan Industrial
>Standard, JIS X 0208; SHIFTJIS is JIS X 0202 which differs from JIS X 0208 by
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>the introduction of escape sequences to shift between Kanji and ASCII; VAX is
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>the encoding used on Japanese VAX/VMS systems).  These Kermit programs use
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>SHIFTJIS as the transfer syntax, and the Kermit program maps between it and
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>the local format, which may be VAX, JIS, or SHIFTJIS.  To better mesh with the
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>current proposal, however, these programs should make a distinction between
>the file format and the transfer syntax by adding a command like SET
>TRANSFER-SYNTAX SHIFTJIS.

>In this connection, a "rider" to this proposal is that "JS" (for SHIFTJIS)
                                                              ~~~~~~~~~~~~
>be added to the list of Kermit Kermit encodings under Attribute "*".
>Designations for Chinese, Korean, and other multibyte-character-set languages
>are welcome, as are alternative designations for Japanese.

[corrected sentences]

In the meantime, national versions of Kermit can (and do) use SET FILE TYPE
commands to identify the encoding or standard used for a multibyte alphabet.
For example, some Japanese Kermit programs have the command SET FILE TYPE
TEXT, BINARY, or KANJI, and add a further command to specify the remote Kanji
encoding: SET KANJI VAX(EUC), JIS, or SHIFTJIS (JIS is the Japan Industrial
Standard, JIS X 0202 and JIS X 0208; SHIFTJIS and VAX(EUC) are the encodings
used on Japanese MS-DOS systems and VAX/VMS systems respectively).
These Kermit programs use the specified Kanji encoding
as the transfer syntax, and the Kermit program maps between the remote format
and the local one, which may be VAX(EUC), JIS, or SHIFTJIS.  To better mesh with the
current proposal, however, these programs should make a distinction between
the file format and the transfer syntax by adding a command like SET
TRANSFER-SYNTAX JIS.

In this connection, a "rider" to this proposal is that "JS" (for JIS)
be added to the list of Kermit Kermit encodings under Attribute "*".
Designations for Chinese, Korean, and other multibyte-character-set languages
are welcome, as are alternative designations for Japanese.


-Ken

P.S. I posted to fj.kermit (the Kermit newsgroup in Japan) asking about Korean
and Chinese character sets, but nobody has contacted me yet.

-------

19-Mar-89 22:07:43-GMT,10386;000000000401
Return-Path: <fdc>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA07279; Sun, 19 Mar 89 17:05:18 EST
Date: Sun, 19 Mar 1989 17:05:17 EST
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: ken-ichiro murakami <MURAKAMI@ntt-20.ntt.jp>
Cc: Hirofumi Fujii <KEIBUN@jpnkekvm.bitnet>,
        Christine M Gianone <cmg@watsun.cc.columbia.edu>
Subject: Re: ISO Kermit Proposal
In-Reply-To: Your message of Sun, 19 Mar 89 00:09:45 I
Message-Id: <CMM.0.88.606348317.fdc@watsun.cc.columbia.edu>

Ken, many thanks for your comments, and your careful reading of the proposal.
You have clarified the situation for us a lot.  In particular, it seems that
we did not devote enough attention to terminal emulation in our proposal.  We
hope that the following comments will be useful in your discussions with Hiro
and in the meeting which you mentioned previously.

First, we have several questions:

  1. Do the Japanese codes JIS-7, JIS-8, EUC (VAX, DEC), and SHIFTJIS also
     include the ASCII alphabet?  Or must you use two different alphabets to
     switch between ASCII and Kanji?  If so, how?

  2. Do the Japanese codes include Kana?

  3. On a particular system, such as the VAX, the MS-DOS PC, or the ELIS
     workstation, is it normal to use only one character code internally?

  4. Is JIS C 6228 equivalent to ISO 2022?  That is, does it specify the
     same mechanisms for transmitting 8-bit data over a 7-bit connection --
     Shift-In and Shift-Out to switch between the G0 and G1 sets?

Kermit should be as easy to use as possible, but should still give the user
the ability to specify exactly what character codes are in use for both
terminal emulation and file transfer.  There should also be a consistent set
of commands for all Kermit programs.

TERMINAL EMULATION

The following command should specify what character set is sent and received
on the transmission medium during terminal emulation.  The Kermit program must
translate between this character set and the one that is used locally.

SET TERMINAL CHARACTER-SET <name>
  This command already exists, but is currently used only in MS-DOS Kermit, and
  only to switch between US and UK ASCII.  We should extend this command to
  select any character code, and we should have a standard set of <name>'s
  including the currently defined ISO 8-bit alphabets:

    LATIN1-ISO, ..., LATIN5-ISO, CYRILLIC-ISO, GREEK-ISO,
    HEBREW-ISO, ARABIC-ISO, etc.

  7-bit ASCII and its national variants (ISO-646):

    ASCII-US, ASCII-UK, ASCII-FR, ASCII-DE, ASCII-IT, ASCII-NL, ASCII-ES,
    ASCII-DK, ASCII-FI, ASCII-IS, ASCII-SE, ASCII-NO, ASCII-TR, etc.

  And for Japanese:

    KANJI-JIS, KANJI-SHIFTJIS, KANJI-EUC (this is the same as VAX or DEC?).

For example, an MS-DOS computer might use SHIFTJIS locally, but a VAX
communicates using EUC, so the MS-DOS Kermit user would give the command SET
TERM CHAR KANJI-EUC.

We assume that a Kermit terminal emulator may be used to connect to a variety
of computers -- DEC, IBM, Fujitsu, Hitachi, etc -- which probably use
different character codes for communicating with terminals.  So unfortunately,
the Japanese user who logs in to more than one kind of computer will have to
issue the appropriate SET TERMINAL CHARACTER-SET command each time.

You may have noticed that we did not define separate names for 7-bit and 8-bit
versions of the same alphabet.  We think that the actual method used for
transmitting these alphabets should be governed by the SET PARITY setting.
That is, if parity is EVEN, ODD, MARK, or SPACE, then the 7-bit code extension
techniques described in ISO 2022 (and JIS C 6228?) should be used. If parity
is NONE, then 8-bit codes may be sent and received.
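
The parity rule above can be sketched as follows.  This is only an
illustration, assuming that the right half is reached on a 7-bit line with
Shift-Out and Shift-In as in ISO 2022; the function name encode_for_line is
hypothetical, and C1 control characters are rejected because they would have
to be sent as escape sequences instead.

```python
SO, SI = 0x0E, 0x0F   # Shift-Out selects G1 (right half), Shift-In returns to G0

def encode_for_line(data: bytes, parity: str) -> bytes:
    """Prepare 8-bit text for the line according to the parity setting (a sketch)."""
    if parity.upper() == "NONE":
        return data                      # 8-bit clean line: send the codes as they are
    out = bytearray()
    shifted = False
    for b in data:
        if 0x80 <= b <= 0x9F:
            raise ValueError("C1 control characters need escape sequences")
        high = bool(b & 0x80)
        if high != shifted:
            out.append(SO if high else SI)   # change halves only when necessary
            shifted = high
        out.append(b & 0x7F)
    if shifted:
        out.append(SI)                   # leave the line in the unshifted state
    return bytes(out)

print(encode_for_line(b"caf\xe9", "EVEN"))
print(encode_for_line(b"caf\xe9", "NONE"))
```

With any parity other than NONE, every change between left and right halves
costs one extra character, which is the overhead discussed elsewhere in this
correspondence.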

FILE TRANSFER

Now, what about file transfer?  Here we must answer three questions.  First,
is the file text, binary, or some special application?  Second, what character
code is used in the local file?  Third, what character set is used inside the
Kermit packets?  These are specified in separate commands:

  SET FILE TYPE {TEXT, BINARY, WORDSTAR, ...}
  SET FILE CHARACTER-SET <name>
  SET TRANSFER-SYNTAX <standard>

FILE TYPE BINARY means that data is transmitted and received without any
translation or conversion at all.  SET FILE TYPE TEXT means that alphabet and
record format conversions are done.  SET FILE TYPE <application> means that
some application-specific conversions are done between the disk file and the
transfer syntax.  This would be used with word processors, spreadsheets,
databases, etc.  A lot of design work needs to be done in this area!

SET FILE CHARACTER-SET can be any of the alphabet names listed above, or also
some system-dependent codes like EBCDIC or IBM-CODEPAGE-xxx (IBM mainframes),
CDC-SIXBIT (CDC mainframes), etc.  This applies to files of type TEXT, but not
BINARY, and may or may not apply to application-specific file types, depending
on the application.

The possibilities for TRANSFER-SYNTAX should be much more restricted.  So far
we have NORMAL -- the old Kermit syntax (which follows TEXT or BINARY).  We
have proposed adding ISO8 for the European 8-bit alphabets.  We should now
also add names for the common Asian codes, such as KANJI-JIS, KANJI-EUC, or
KANJI-SHIFTJIS.  Ideally, Japanese Kermit programmers would agree upon only
one transfer syntax for Kanji, and preferably this would be a code that also
included ASCII as a subset.

We believe that the terminal emulation character set should not be linked to
the file transfer syntax.  There are potentially hundreds of different
terminal character sets in the world, but we don't want the Kermit protocol
to have to know about them, otherwise we will have a situation in which each
Kermit program would have to know the codes of hundreds of other systems.
This is the kind of combinatorial problem that data communication protocols
are designed to avoid.  And we are in the lucky position of being able to
design the Kermit protocol in the best possible way right now.

So far, we have separated the 8-bit European alphabets from Kanji in this
discussion.  What mechanism can be used to allow Kanji to coexist with French,
Hebrew, Russian, Greek, and other language codes?  We hope that the answer to
this question is that JIS C 6228 uses the same mechanisms as ISO 2022 to
identify and switch between alphabets.  Therefore, we hope it will be possible
to use an escape sequence to identify Kanji code, and therefore to switch
between Japanese and ISO alphabets in the same data stream.

SIMPLICITY

So now the poor user is faced with several confusing commands: SET TERMINAL
CHARACTER-SET, SET FILE CHARACTER-SET, SET FILE TYPE, and SET TRANSFER-SYNTAX.
If a Kermit program has all these commands, how can we make it easy to use?
Can we supply each command with a useful default?

TERMINAL CHARACTER-SET:
  It is not possible to specify a default special terminal character
  set for a particular Kermit program, because it depends on what kind of
  computer is on the other end of the connection.  Therefore, the default
  must remain what it has always been -- ASCII.

FILE TYPE:
  The default is, as always, TEXT.

FILE CHARACTER-SET:
  The default here should be the local system's normal encoding of
  text (ASCII, EBCDIC, LATIN1-ISO, KANJI-JIS, etc).

TRANSFER-SYNTAX:
  The default is, as always, NORMAL (that is, ASCII text or binary, depending
  on SET FILE TYPE).  Other possibilities like ISO8 or JIS must be specified.

We cannot change these basic defaults, because these are already used by
hundreds of different Kermit programs all over the world.  So how can we make
Kermit easy to use by the Japanese (or Korean, or Russian, or ...) user?

These four commands give all the information necessary to perform the required
translations during both terminal emulation and file transfer.  But there may
be thousands of different combinations of these four commands with all their
possible parameters.  The best way to simplify the user interface is to define
macros for the combinations that are commonly used at each site.  For example,
in MS-DOS Kermit the Japanese user could have the following definitions in her
or his MSKERMIT.INI file:

  define vax set parity even, set term char kanji-euc, -
     set transfer-syntax kanji-euc, set file character-set kanji-shiftjis

  define fujitsu set parity none, set term char kanji-jis, -
     set transfer-syntax kanji-jis, set file character-set kanji-shiftjis

  define pdp11 set parity even, set term char ascii-us, -
     set transfer-syntax normal, set file type text

and then just give "vax" or "fujitsu" commands.  Japanese Kermit programs
could even be distributed with a commonly-useful set of macros pre-defined and
documented.

Kermit programs that do not have command macros can define special new
commands that are equivalent to specific combinations of the four
character-set-related commands.

SUMMARY

We have attempted to specify the simplest set of commands that can be used
in all Kermit programs all over the world.  Unfortunately, they are not as
simple as we would want them to be because the character code used for
terminal emulation might be different from the local file system's character
code, and also from the Kermit file transfer syntax.  And we also must have
a way to identify the local file format (text, binary, word processor,
database, etc).

Obviously, many Japanese Kermit programs that are in operation today do not
have these same commands, or do not use them in the same way.  That's OK.
They should still be able to interoperate with "new" Kermit programs in any of
several ways: (1) using their current SET KANJI, SET TERMINAL KANJI, SET LOCAL
TRANSLATION, and similar commands; (2) using binary-mode file transfer between
systems that use the same code; or (3) making sure that the file transfer
syntax of "old" and "new" Kanji-capable Kermit programs is compatible.

We hope that (3) is possible.  But we think it is more important that Kanji
file transfer syntax be compatible with ISO8 transfer syntax -- that is, use
the same kinds of escape sequences -- so that Japanese is not treated
differently from all the other languages, and so that Japanese text can be
mixed with text in any other language.

Thank you once again for your careful attention and your valuable insights.
We hope that agreement can be reached very soon.

- Christine and Frank

20-Mar-89 23:05:09-GMT,5910;000000000011
Return-Path: <@cuvmb.cc.columbia.edu:A-PIRARD@BLIULG11.BITNET>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA18673; Mon, 20 Mar 89 18:01:42 EST
Message-Id: <8903202301.AA18673@watsun.cc.columbia.edu>
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 0266; Mon, 20 Mar 89 17:56:20 EST
Received: from VM1.EARN-ULG.AC.BE by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with
 BSMTP id 5632; Mon, 20 Mar 89 17:56:19 EST
Received: by BLIULG11 (Mailer R2.02) id 3540; Mon, 20 Mar 89 23:50:55 +0100
Date:         Mon, 20 Mar 89 23:46:41 +0100
From: Andre' PIRARD <A-PIRARD%BLIULG11@cuvmb.cc.columbia.edu>
Subject:      Re: Kermit International Character Set Proposal
To: Frank da Cruz <fdc@watsun.cc.columbia.edu>, Joe Doupnik <jrd@usu>,
        Paul Placeway <paul@cis.ohio-state.edu>,
        Andre Pirard <A-PIRARD@bliulg11>, Baruch Cochavy <baruchc@techunix>,
        Johan Van Wingen <MOSGLA@hlerul2>,
        Ken-ichiro Murakami <MURAKAMI%NTT-20.NTT.JP@relay.cs.net>,
        Kohichi Nishimoto <s153380%tkov02.DEC@decwrl.dec.com>,
        Hirofumi Fujii <KEIBUN@jpnkekvm>, "Gisbert W.Selke" <RECK@dbnuama1>,
        Kurt Enulf <UPSKE@seguc11>, Jacob Palme <jacob_palme_qz@qzcom>,
        Per Lindberg <Per_Lindberg_ZQ@qzcom>,
        Bj|rn Larsen <x_larsen_b%use.uio.uninett@tor.nta.no>,
        "Hans A. ]lien" <hans%ifi.uio.no@tor.nta.no>,
        "Kai U.Leppamaki" <LK-KLE@finhut>,
        Steve Jenkins <pdsoft%uk.ac.lancs.cent1@nss.cs.ucl.ac.uk>,
        Jean Dutertre <dutertre%padis1.DEC@decwrl.dec.com>,
        Gerard Gaye <GAYE@frsac11>, David Guerlet <KERMIT@czheth5a>,
        Bernie Eiben <eiben@tops20.dec.com>, Volker Edelhoff <edelhoff@unido>
Cc: Christine M Gianone <cmg@cunixc.cc.columbia.edu>
In-Reply-To:  Message of Thu, 2 Mar 1989 1:02:25 EST from
 <fdc@watsun.cc.columbia.edu>

>         A KERMIT PROTOCOL EXTENSION FOR INTERNATIONAL CHARACTER SETS

Well, we are busy moving to new premises and it's going to last
probably at least one week after Easter. But I take some extra
work time for a short reply to the main points. Please excuse if
my English is not very "careful" this time.

First of all, many thanks to Frank for considering the problem
and for a neat summary of the standards.

Now, I think some history of our problem will help.
ASCII (ANSI X 3.4) is a well settled standard, but only covers
some languages. 7-bit communication is a well encrusted habit
too. For those reasons, the least-shaking way to support other
languages was to invent ISO 646 which redefines some characters
of ASCII. But, in addition to the inherent inconvenience, the
amount redefined is not enough for many languages and we had to
compose additional characters (circumflex and trema in French) by
means of <letter-backspace-accent> (or the other way round
according to occasional taste). It allowed some word processing,
but is a nightmare for data processing. Imagine that in DBASE
fixed fields, then trying to sort it! The only easy way out was
to uppercase everything, an excuse to drop the accents.

This is why 8-bit extended sets, and especially an ISO 8859
standard, are being applauded here (but why we much regret some
software insist on screening out the 8th bit on the IBM PC).

But, sorrily, that charm holds only when confining into a single
version of ISO 8859. These standards define how to transmit data,
not how to store it. It probably comes from the evidence of how
to store ASCII (text files are coherent within a given system and
almost across all of them) and that this fact extends to a single
8859 version. Thus, every software knows what to store. Thinking
of storing multi-ISO data by using 2022 would render it even less
manageable than with 646. Leaving it up to anybody's whim is
starting the same story all over again. And this time, the 8 bits
are exhausted and I really don't know where the wind blows from.
Maybe it is too soon to say.

So, my opinion is that while ISO 8859 + 2022 is OK to instruct a
terminal or printer how to switch ISO versions and is quite
suitable for a Kermit's terminal mode, the lack of definition of
how to store the data makes Kermit's file transfer a real
problem, mostly because it will not know how a particular
software would store it. I am not sure there is a present way out
this dead end, but I hope to be wrong because I sure hope for
one.

The problem I raised is that, even when restricting to a single
version of ISO, one cannot switch a Macintosh for an IBM PC on a
communication line without telling the other end that it
happened. That's because they use different code points for the
upper half of what are roughly equivalent codes. So, the "other
end" has to be aware of the Mac, the PC, the Amiga, the Atari
etc... And the PC has to be aware of the Mac, the Atari etc...
Translation tables for terminal mode and file transfer are a real
plus for any purpose, easy to do and not committing. The Kermit
protocol does well for 8-bit transfer, but SI/SO is needed for 7-bit
wide terminal mode. I didn't want to further bother people
with our problems and suggested that these tables be patchable
internals, but my idea is that each machine talking the best of
its ISO8859 on the communication line is the suggestion to
include in an apparent program's code support option to make each
machine understand the other at best.

Before going to more details, I can say I loved the idea of
Kermit being able to transmit an emerging standard file structure
(being able to tell text from pictures). But why drop 8th bit
quoting? It's Kermit's own right to superimpose its own data
encoding and, on the contrary, I can tell Kermit 8th bit quoting
is more efficient than SI/SO would be (2 to 3, accented letters
are most often isolated in French).

Thanks once more, Frank. And hoping to hear from others' opinion.
Gee, it's late!

Andre'.

21-Mar-89  0:13:58-GMT,2413;000000000001
Return-Path: <JRD@cc.usu.edu>
Received: from cc.usu.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA19362; Mon, 20 Mar 89 19:13:00 EST
Message-Id: <8903210013.AA19362@watsun.cc.columbia.edu>
Date: Mon, 20 Mar 89 17:10 MDT
From: Joe Doupnik <JRD@cc.usu.edu>
Subject: Andre's ISO comments.
To: fdc@watsun.cc.columbia.edu
X-Vms-To: IN%"fdc@watsun.cc.columbia.edu"

From:	USU::JRD          "Joe Doupnik" 20-MAR-1989 17:09
To:	IN%"A-PIRARD@BLIULG11.BITNET",JRD         
Subj:	RE: Re: Kermit International Character Set Proposal

Frank,
	I think that Andre is making the case FOR using ISO 2022, or
relatives. Terminal communications is one part, where the host emits
characters using a standard of one kind or another. In that case the
PC has to translate comms line codes to displayable characters via a
table (as present) or through in-line shift locks (ISO style or similar).
If Macs and PCs do their job properly then the host always sends the
same bytes for the same text and the screens appear much alike. In fact,
terminal emulation has the messier task of needing to convert from more
host formats than file transfers.
	File transfers need language translation at both ends and a small
number of comms line forms.
	The proposal is directed at finding those small number of comms
line formats, and at the same time satisfying some or most of the terminal
emulation aspects as a consequence. If we do select a set then a particular
Kermit need only understand the set <--> local conversion for file transfer,
plus any optional terminal emulation problems (the poor PC's need to do
the most work here, alas). If the sets are well selected they will include
the widely used "local" forms, such as ISO xxxx and straight ASCII and
Kermit privately encoded but otherwise transparent Binary (omitting the
byte ordering problems) stream i/o.
	Eight vs seven bits is always a worry. It affects terminal emulation
more than file transfer but Andre's point of shift vs prefixing overhead
on some languages is well taken. However, the same file ought to be
transportable through either channel width automatically (by knowing whether
parity is used or if the channel is seven bits wide regardless, as on VAX
VMS systems). Personally, I think that extra comms line characters might
offset extra program execution time for some encoding methods and thus make
throughput a difficult quantity to estimate.
	Regards,
	Joe D.

21-Mar-89  0:47:02-GMT,3127;000000000401
Return-Path: <fdc>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA19083; Mon, 20 Mar 89 18:43:50 EST
Date: Mon, 20 Mar 1989 18:43:43 EST
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: Andre' PIRARD <A-PIRARD%BLIULG11@cuvmb.cc.columbia.edu>
Cc: Joe Doupnik <jrd@usu>, Paul Placeway <paul@cis.ohio-state.edu>,
        Andre Pirard <A-PIRARD@bliulg11>, Baruch Cochavy <baruchc@techunix>,
        Johan Van Wingen <MOSGLA@hlerul2>,
        Ken-ichiro Murakami <MURAKAMI%NTT-20.NTT.JP@relay.cs.net>,
        Kohichi Nishimoto <s153380%tkov02.DEC@decwrl.dec.com>,
        Hirofumi Fujii <KEIBUN@jpnkekvm>, "Gisbert W.Selke" <RECK@dbnuama1>,
        Kurt Enulf <UPSKE@seguc11>, Jacob Palme <jacob_palme_qz@qzcom>,
        Per Lindberg <Per_Lindberg_ZQ@qzcom>,
        Bj|rn Larsen <x_larsen_b%use.uio.uninett@tor.nta.no>,
        "Hans A. ]lien" <hans%ifi.uio.no@tor.nta.no>,
        "Kai U.Leppamaki" <LK-KLE@finhut>,
        Steve Jenkins <pdsoft%uk.ac.lancs.cent1@nss.cs.ucl.ac.uk>,
        Jean Dutertre <dutertre%padis1.DEC@decwrl.dec.com>,
        Gerard Gaye <GAYE@frsac11>, David Guerlet <KERMIT@czheth5a>,
        Bernie Eiben <eiben@tops20.dec.com>, Volker Edelhoff <edelhoff@unido>,
        Christine M Gianone <cmg@cunixc.cc.columbia.edu>,
        Frank da Cruz <fdc@watsun.cc.columbia.edu>
Subject: Re: Kermit International Character Set Proposal
In-Reply-To: Your message of Mon, 20 Mar 89 23:46:41 +0100
Message-Id: <CMM.0.88.606440623.fdc@watsun.cc.columbia.edu>

Brief response to Andre's message...  First of all, Christine Gianone
is the principal author of the ISO / Kermit proposal, I only helped!

Second, Andre is absolutely correct: the proposal begs the question of local
file storage.  Yes, there is no standard for storing mixed alphabets within
files.  But by using ISO 2022 as the file transfer syntax, we are able to
represent mixed alphabets unambiguously on the communication line.  It is up
to the Kermit programs to convert between this syntax and the local storage
formats.  We recognize that there are many application-specific formats for
mixed alphabets, so it is up to the Kermit programmer to learn these formats
and make the conversions.  We hope that the introduction of this extension to
the Kermit protocol will in some small way provide an incentive for computer
and software makers and standards organizations to speed up their efforts to
define storage formats for mixed alphabet files.

Third, others have complained about the lack of attention to terminal
emulation in this proposal.  This deficiency will be corrected in the next
draft of the proposal.

Finally, others have also suggested that there is no reason (other than
complexity) to restrict the ISO / Kermit file transfer syntax to the 7-bit
environment with locking shifts (similar to ISO 4873 Level 3).  If this is
the general opinion, then the proposal will be amended to allow for 8-bit data
transfer without shifts (similar to ISO 4873 Level 1).  Level 2 is not
considered practical, because too much complexity is required if we are to
keep G2 and G3 sets active.

Further opinions?  - Chris and Frank


21-Mar-89 14:55:32-GMT,2886;000000000001
Return-Path: <paul@cis.ohio-state.edu>
Received: from tut.cis.ohio-state.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA25944; Tue, 21 Mar 89 09:53:48 EST
Received: from morganucodon.cis.ohio-state.edu by tut.cis.ohio-state.edu (5.61/3.890314)
	id AA25428; Tue, 21 Mar 89 09:52:54 -0500
Received: by morganucodon.cis.ohio-state.edu (3.2/2.890120)
	id AA07368; Tue, 21 Mar 89 09:50:09 EST
Date: Tue, 21 Mar 89 09:50:09 EST
From: Paul W. Placeway <paul@cis.ohio-state.edu>
Message-Id: <8903211450.AA07368@morganucodon.cis.ohio-state.edu>
To: A-PIRARD%BLIULG11.BITNET@cunyvm.cuny.edu
Cc: Frank da Cruz <fdc@watsun.cc.columbia.edu>,
        Christine M Gianone <cmg@cunixc.cc.columbia.edu>,
        Joe Doupnik <jrd@usu.bitnet>, Andre Pirard <A-PIRARD@bliulg11.bitnet>,
        Baruch Cochavy <baruchc@techunix.bitnet>,
        Johan Van Wingen <MOSGLA@hlerul2.bitnet>,
        Ken-ichiro Murakami <MURAKAMI%NTT-20.NTT.JP@relay.cs.net>,
        Kohichi Nishimoto <s153380%tkov02.DEC@decwrl.dec.com>,
        Hirofumi Fujii <KEIBUN@jpnkekvm.bitnet>,
        Gisbert W.Selke <RECK@dbnuama1.bitnet>,
        Kurt Enulf <UPSKE@seguc11.bitnet>,
        Jacob Palme <jacob_palme_qz@qzcom.bitnet>,
        Per Lindberg <Per_Lindberg_ZQ@qzcom.bitnet>,
        "Bj|rn Larsen" <x_larsen_b%use.uio.uninett@tor.nta.no>,
        "Hans A. ]lien" <hans%ifi.uio.no@tor.nta.no>,
        Kai U.Leppamaki <LK-KLE@finhut.bitnet>,
        Steve Jenkins <pdsoft%uk.ac.lancs.cent1@nss.cs.ucl.ac.uk>,
        Jean Dutertre <dutertre%padis1.DEC@decwrl.dec.com>,
        Gerard Gaye <GAYE@frsac11.bitnet>,
        David Guerlet <KERMIT@czheth5a.bitnet>,
        Bernie Eiben <eiben@tops20.dec.com>,
        Volker Edelhoff <edelhoff@unido.bitnet>
In-Reply-To: Andre' PIRARD's message of Mon, 20 Mar 89 23:46:41 +0100 <8903202303.AA07885@cheops.cis.ohio-state.edu>
Subject: Kermit International Character Set Proposal
Reply-To: paul@cis.ohio-state.edu

I have to agree with Andre': since Kermit already has a transport
layer capable of 8-bit wide transmission, why further confuse things by
making the text translation layer only 7 bits wide?  One of the
strengths of the Kermit protocol is a reasonable layering.  As Andre'
said, the Kermit 8-bit-quote is more efficient than the locking shifts
for Latin character based languages, and if the actual communication
path is 8 bits wide, then there is no penalty for using the G1
characters at all.

I like the idea of a standard international text transfer protocol for
Kermit, and think the preamble definition of character sets used, and
the ability to switch them back and forth is a good thing, but we
should use the whole 8 bit channel, and let lower layers deal with
shoving the data through 7 bit hardware.  If the local machine cannot
store 8 bit text, fine; the local Kermit already is responsible for
translating into the local text format.

			-- Paul

22-Mar-89 15:26:29-GMT,8463;000000000401
Return-Path: <MAILER@cuvmb.cc.columbia.edu>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA03833; Wed, 22 Mar 89 10:23:24 EST
Resent-Message-Id: <8903221523.AA03833@watsun.cc.columbia.edu>
Message-Id: <8903221523.AA03833@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA23238; Wed, 22 Mar 89 10:23:11 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 1030; Wed, 22 Mar 89 10:19:06 EST
Received: by CUVMB (Mailer X1.25) id 8160; Wed, 22 Mar 89 10:19:04 EST
Received: from JPNKEKVM by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 8159; Wed, 22 Mar 89 10:19:03 EST
Received: by JPNKEKVM (Mailer R2.02) id 2231; Thu, 23 Mar 89 00:22:03 JST
Date:         THU, 23 MAR 89 00:21:28 JST
From: Hirofumi Fujii <KEIBUN%JPNKEKVM@cuvmb.cc.columbia.edu>
Subject:      Re:Kermit/ISO proposal
To: Frank da Cruz <FDCCU@cuvmb.cc.columbia.edu>, Joe Doupnik <JRD@usu>,
        Andre Pirard <A-PIRARD@bliulg11>,
        Ken-Ichirou Murakami <murakami%ntt-20.ntt.jp@jpntsuku>,
        Kohichi Nishimoto <s153380%tkov02.DEC@decwr1.dec.com>
Resent-Date: Wed, 22 Mar 89 10:19:03 EST
Resent-From: Network Mailer <MAILER@cuvmb.cc.columbia.edu>
Resent-To: fdc@cunixc.cc.columbia.edu

                          - Summary -
I agree to the proposal of the Kermit ISO 2022 extension.  And I also agree
to allow the 8-bit data for ISO/Kermit file transfer.  The overhead to switch
the character set in 7-bit environment is very high for Japanese text files.
I propose to use the Kermit A-packet with ISO 2022 announcer to negotiate
the extension protocol.
                       - End of summary -

I do not know ISO 2022 itself.  My only knowledge of this standard comes
from JIS X 0202 (JIS = Japanese Industrial Standard), which corresponds
to ISO 2022.  There may be differences between JIS X 0202 and ISO 2022,
so I will briefly describe my understanding of this standard at the bottom
of this mail.  Please point out if my understanding is wrong.
In the following, it is assumed that JIS X 0202 is equivalent to
ISO 2022.

Many Japanese computer systems use at least 3 character sets:
Roman (almost equivalent to the US national set, i.e. so-called ASCII),
Katakana (1-byte code) and Kanji (2-byte code).  The internal (local)
representation of these characters is system (OS) dependent.  For example,

      SYSTEM          Roman        Katakana       Kanji
      ------      ------------ --------------- -------------------
      MS-DOS      JIS-Roman    JIS-Katakana    MS-Kanji(Shift-JIS)
      VAX/VMS     JIS-Roman    (see note 1)    DEC-Kanji (see note 2)
      IBM/VM/CMS  EBCDIC-Roman EBCDIC-Katakana IBM-Kanji
      Unix        JIS-Roman    (see note 1)    EUC(Extended Unix code)

(Note)
1.Katakana is invoked by Locking-shift mechanism on VAX/VMS, and by
  Single-shift mechanism on Unix.
2.DEC-Kanji code and EUC are almost equivalent.

Usually, a Japanese text file contains all three of the above character
sets.  In this case, switching the character set in the 7-bit environment
is very expensive.  Let me show you an example

            1234567890123456789012345678
            ----------------------------
            This is an English sentence.
            ----------------------------
            NNNNRKKRNNRRRRRRRRRNNNNNNNNR

where N is Kanji, K is Katakana and R is Roman (of course this is not
a real Japanese sentence, but the character sets in a sentence look
like this).  In the 7-bit environment, we usually assign
            G0:Roman, G1:Katakana
so the above sentence is translated as

<ESC>$BThis<ESC>(J <SO>is<SI> <ESC>$Ban<ESC>(J English <ESC>$Bsentence<ESC>(J.

The 28-byte text needs an additional 20 bytes in this case!
In the 8-bit environment, we usually assign at the beginning,
           GL=G0:Roman
              G1:Katakana
           GR=G3:Kanji
so the above sentence becomes

           This <LS1>is<LS0> an English sentence.
           ^^^^              ^^         ^^^^^^^^

where ^ means 8th bit ON (GR character set).  In this case, only 2 bytes
are required to switch the character set. (Note that the Locking-shift
mechanism is required even in 8-bit environment.)
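
The two byte counts above can be checked with a small sketch.  It treats each
letter of the N/K/R pattern as one byte position, as the example does, and
assumes the starting assignments stated in the text (7-bit: G0 = Roman,
G1 = Katakana; 8-bit: GL = G0 = Roman, G1 = Katakana, GR = Kanji).  The
function names are made up for this illustration.

```python
PATTERN = "NNNNRKKRNNRRRRRRRRRNNNNNNNNR"  # N = Kanji, K = Katakana, R = Roman


def overhead_7bit(pattern: str) -> int:
    """Extra bytes in the 7-bit environment, starting with Roman in G0."""
    g0, shifted_out, extra = "R", False, 0
    for ch in pattern:
        if ch == "K":
            if not shifted_out:
                extra += 1          # SO: invoke G1 (Katakana)
                shifted_out = True
        else:
            if shifted_out:
                extra += 1          # SI: return to G0
                shifted_out = False
            if g0 != ch:
                extra += 3          # ESC $ B (Kanji) or ESC ( J (Roman)
                g0 = ch
    return extra


def overhead_8bit(pattern: str) -> int:
    """Extra bytes in the 8-bit environment, with Kanji permanently in GR."""
    gl_is_g1, extra = False, 0
    for ch in pattern:
        if ch == "N":
            continue                # Kanji uses GR (8th bit on), no shift
        want_g1 = ch == "K"
        if want_g1 != gl_is_g1:
            extra += 1              # LS1 (to Katakana) or LS0 (to Roman)
            gl_is_g1 = want_g1
    return extra


if __name__ == "__main__":
    print(len(PATTERN), overhead_7bit(PATTERN), overhead_8bit(PATTERN))
```

For the pattern in this message the sketch reports 20 extra bytes in the 7-bit
case and 2 in the 8-bit case, matching the figures given above.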

This discussion is restricted to Japanese, but I think the situation
is the same for other countries that use more than three character sets.

The 8-bit environment is better than the 7-bit one.  However, a full 8-bit
implementation requires a lot of effort.  Therefore, I propose to
use the announcer to negotiate the extension protocol.  The form of the
ISO 2022 announcer is <ESC><SP>F.  The sending Kermit informs the receiver of
the 'F' of the announcer by using the Kermit A-packet (e.g., with encoding (*)
T{xxx}, where xxx is the combination of 'F').  The receiver accepts or refuses
by using the Kermit reply mechanism.  I think this may be a great help for
implementation.

............................................................................
The following is my understanding of JIS-X0202 (ISO-2022).

1. This standard defines code extension techniques for information interchange
   in BOTH 7bit AND 8bit environments.
      ^^^^^^^^^^^^^^^^^^
2. There are four intermediate character sets called G0, G1, G2 and G3.
             ^^^^                                    ^^^^^^^^^^^^^^^^^
3. In 7bit environment, only one character set can be activated at one
   time.  The active character set can be selected from the above intermediate
   character set by issuing the following control codes
         SI  (Shift in)            invoke G0 character set
         SO  (Shift out)           invoke G1 character set
         LS2 (locking-shift two)   invoke G2 character set
         LS3 (locking-shift three) invoke G3 character set

4. In 8bit environment, two character sets GL and GR can be activated
   at one time. GL character set is selected if the 8th bit is OFF and GR
   is selected if the 8th bit is ON.  The active character sets are
   selected by
         LS0  (Locking-shift zero)        invoke G0 character set to GL
         LS1  (Locking-shift one)         invoke G1 character set to GL
         LS2  (Locking-shift two)         invoke G2 character set to GL
         LS3  (Locking-shift three)       invoke G3 character set to GL
         LS1R (Locking-shift one right)   invoke G1 character set to GR
         LS2R (Locking-shift two right)   invoke G2 character set to GR
         LS3R (Locking-shift three right) invoke G3 character set to GR

5. In both 7bit and 8bit environments, a single-byte character set is
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^
   designated to the intermediate character set by issuing the following
   ESC sequences
         ESC 2/8 F  or  ESC 2/12 F   designate character set F to G0
         ESC 2/9 F  or  ESC 2/13 F   designate character set F to G1
         ESC 2/10 F or  ESC 2/14 F   designate character set F to G2
         ESC 2/11 F or  ESC 2/15 F   designate character set F to G3

         where F is
           A:UK, B:US,..., I:JIS-Katakana, J:JIS-Roman etc.

   A multi-byte character set is designated to the intermediate character
     ^^^^^^^^^^^^^^^^^^^^^^^^
   set by
         ESC 2/4 F      or ESC 2/4 2/12 F  designate character set F to G0
         ESC 2/4 2/9 F  or ESC 2/4 2/13 F  designate character set F to G1
         ESC 2/4 2/10 F or ESC 2/4 2/14 F  designate character set F to G2
         ESC 2/4 2/11 F or ESC 2/4 2/15 F  designate character set F to G3

         where F is
           A:Chinese-Kanji, B:Japanese-Kanji etc.,

6. One character in the G2 or G3 character set can be invoked by
   issuing the
         SS2   invoke next one character from G2
         SS3   invoke next one character from G3
   In 8bit environment, the character is invoked to GL character set.

7. At the beginning of the information interchange, extension method
   used in the subsequent data stream is announced by
         ESC 2/0 F
   where F is one of the 4/1, 4/2, 4/3, 4/4, 4/5, 4/6, 4/7,
   5/0, 5/2, 5/3, 5/4, 5/5, 5/6, 5/7, 5/10, 5/11.
   For example,  4/1  G0 only. No LS. GR is not used.
                 4/2  G0 and G1. SI and SO (LS0 and LS1) are used.
                 4/3  G0 and G1 only in 8bit env. LS's are not used.
                      GL = G0, GR = G1.
                 etc.
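
The column/row notation used in points 5 and 7 can be turned into actual bytes
as follows.  This sketch follows only the description given in this message
(the first form listed for each designation); it has not been checked against
the text of ISO 2022 or JIS X 0202, and the helper names are invented for the
illustration.

```python
ESC = b"\x1b"


def code(col: int, row: int) -> bytes:
    """Convert column/row notation such as 2/8 into a single byte."""
    return bytes([col * 16 + row])


def designate_single(g: int, final: str) -> bytes:
    """ESC 2/8 F .. ESC 2/11 F: designate a single-byte set to G0..G3."""
    if g not in range(4):
        raise ValueError("g must be 0, 1, 2 or 3")
    return ESC + code(2, 8 + g) + final.encode("ascii")


def designate_multi(g: int, final: str) -> bytes:
    """ESC 2/4 F for G0; ESC 2/4 2/9 F .. ESC 2/4 2/11 F for G1..G3."""
    if g not in range(4):
        raise ValueError("g must be 0, 1, 2 or 3")
    middle = b"" if g == 0 else code(2, 8 + g)
    return ESC + code(2, 4) + middle + final.encode("ascii")


def announce(col: int, row: int) -> bytes:
    """ESC 2/0 F: announce the extension method, F given as column/row."""
    return ESC + code(2, 0) + code(col, row)


if __name__ == "__main__":
    print(designate_single(0, "J"))   # JIS-Roman to G0
    print(designate_multi(0, "B"))    # Japanese Kanji to G0
    print(announce(4, 3))             # announcer with F = 4/3
```

With these helpers, JIS-Roman to G0 comes out as ESC ( J and Japanese Kanji
to G0 as ESC $ B, the same sequences used in the 7-bit example earlier in
this message.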

--------------
Hirofumi Fujii
National Laboratory for High Energy Physics (KEK)
KEIBUN@JPNKEKVM.BITNET

22-Mar-89 16:28:01-GMT,3543;000000000001
Return-Path: <fdc>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA04632; Wed, 22 Mar 89 11:19:30 EST
Date: Wed, 22 Mar 1989 11:19:29 EST
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: Joe Doupnik <jrd@usu.bitnet>, Paul Placeway <paul@cis.ohio-state.edu>,
        Andre Pirard <A-PIRARD@bliulg11.bitnet>,
        Baruch Cochavy <baruchc@techunix.bitnet>,
        Johan Van Wingen <MOSGLA@hlerul2.bitnet>,
        Ken-ichiro Murakami <MURAKAMI%NTT-20.NTT.JP@relay.cs.net>,
        Kohichi Nishimoto <s153380%tkov02.DEC@decwrl.dec.com>,
        Hirofumi Fujii <KEIBUN@jpnkekvm.bitnet>,
        Gisbert W.Selke <RECK@dbnuama1.bitnet>,
        Kurt Enulf <UPSKE@seguc11.bitnet>,
        Jacob Palme <jacob_palme_qz@qzcom.bitnet>,
        Per Lindberg <Per_Lindberg_ZQ@qzcom.bitnet>,
        "Bj|rn Larsen" <x_larsen_b%use.uio.uninett@tor.nta.no>,
        "Hans A. ]lien" <hans%ifi.uio.no@tor.nta.no>,
        Steve Jenkins <pdsoft%uk.ac.lancs.cent1@nss.cs.ucl.ac.uk>,
        Jean Dutertre <dutertre%padis1.DEC@decwrl.dec.com>,
        Gerard Gaye <GAYE@frsac11.bitnet>,
        David Guerlet <KERMIT@czheth5a.bitnet>,
        Bernie Eiben <eiben@tops20.dec.com>,
        Volker Edelhoff <edelhoff@unido.bitnet>,
        John Chandler <pepmnt@cfaamp.bitnet>
Subject: Kermit / ISO Transfer Syntax: 7-bit vs 8-bit
Cc: Frank da Cruz <fdc@watsun.cc.columbia.edu>,
        Christine M Gianone <cmg@watsun.cc.columbia.edu>
Message-Id: <CMM.0.88.606586769.fdc@watsun.cc.columbia.edu>

The prevailing sentiment seems to be to allow 8-bit data transfer, a`la ISO
4873 Level 1, and let Kermit's packet encoding do all of the transformations
necessary to transfer 8-bit characters in the 7-bit environment.  That means
that whenever a character with its 8th bit set to 1 is transmitted on a
7-bit connection (i.e. when PARITY is not NONE), it will be prefixed by the
Kermit 8th-bit-prefix character (normally '&').  This is equivalent to the
Single Shift that is used in ISO 4873 Level 2.  This mode of operation
depends upon the Kermit program having the 8th-bit-prefixing option.  This
is an OPTIONAL feature of the Kermit protocol, negotiated between the two
Kermits.  In practice, most widely-used Kermit programs do have this
feature.

The main advantage of allowing 8-bit ISO text transfer is that the overhead
is lower for languages like French and German that shift frequently between
the G0 and G1 sets.  A disadvantage is that languages like Russian, Greek,
Hebrew, and Arabic that tend to stay in the G1 set will have a very high
prefixing overhead in the 7-bit environment.
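
The trade-off can be made concrete with a simplified count.  The sketch below
assumes one extra byte per 8-bit character for prefixing, and one extra byte
(SO or SI) each time the text crosses between G0 and G1 for locking shifts.
It ignores the other Kermit quoting rules (control characters, a literal
prefix character) and any shift back at the end of the text.  The function
names and sample strings are illustrative only.

```python
def prefix_overhead(data: bytes) -> int:
    """Extra bytes if every character with the 8th bit set gets a prefix."""
    return sum(1 for b in data if b & 0x80)


def locking_shift_overhead(data: bytes) -> int:
    """Extra bytes if SO/SI is sent at every change between G0 and G1."""
    shifts, in_g1 = 0, False
    for b in data:
        high = bool(b & 0x80)
        if high != in_g1:
            shifts += 1             # SO when entering G1, SI when leaving
            in_g1 = high
    return shifts


# "Liege" with a grave accent on the first e, in ISO 8859-1.
FRENCH = "Li\u00e8ge".encode("latin-1")
# Two Russian words separated by a space, in ISO 8859-5.
RUSSIAN = ("\u043f\u0440\u0438\u0432\u0435\u0442 "
           "\u043c\u0438\u0440").encode("iso8859_5")

if __name__ == "__main__":
    for name, text in (("French", FRENCH), ("Russian", RUSSIAN)):
        print(name, prefix_overhead(text), locking_shift_overhead(text))
```

For the French sample, prefixing costs 1 extra byte against 2 for the shifts;
for the Russian sample, prefixing costs 9 against 3.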

The question now becomes: should Kermit continue to allow ISO 7-bit text
transfer with locking shifts, as originally proposed?  If we do not, then
Kermit programs that do not implement 8th-bit prefixing will not be able to
transfer mixed-alphabet texts.  But maybe that's OK -- we can simply state
that 8th-bit prefixing is a PREREQUISITE for mixed-alphabet text transfer.
The advantage here is simplicity.  The Kermit program will be simple, and
the protocol specification will be simple.  This increases the chance that
programmers will actually want to -- and be able to -- do the work.

Allowing the full range of ISO 4873 / 2022 code extension techniques would
give us the greatest flexibility (e.g. efficiency for both French and
Cyrillic), but would make the Kermit mixed-alphabet text protocol
specification nearly as complicated as the ISO standards themselves, and
about as likely to be implemented.

Shall we take a vote?  - Christine and Frank

22-Mar-89 17:59:22-GMT,12197;000000000401
Return-Path: <MAILER@cuvmb.cc.columbia.edu>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA05922; Wed, 22 Mar 89 12:59:17 EST
Resent-Message-Id: <8903221759.AA05922@watsun.cc.columbia.edu>
Message-Id: <8903221759.AA05922@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA04140; Wed, 22 Mar 89 12:59:06 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 1170; Wed, 22 Mar 89 12:55:01 EST
Received: by CUVMB (Mailer X1.25) id 8560; Wed, 22 Mar 89 12:55:00 EST
Received: from JPNKEKVM by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 8559; Wed, 22 Mar 89 12:54:59 EST
Received: by JPNKEKVM (Mailer R2.02) id 2301; Thu, 23 Mar 89 02:57:39 JST
Date:         THU, 23 MAR 89 02:57:09 JST
From: Hirofumi Fujii <KEIBUN%JPNKEKVM@cuvmb.cc.columbia.edu>
Subject:      Japanese character sets
To: Frank da Cruz <FDCCU@cuvmb.cc.columbia.edu>, Joe Doupnik <JRD@usu>,
        Ken-Ichirou Murakami <murakami%ntt-20.ntt.jp@jpntsuku>,
        Kohichi Nishimoto <s153380%tkov02.DEC@decwrl.dec.com>
Resent-Date: Wed, 22 Mar 89 12:54:59 EST
Resent-From: Network Mailer <MAILER@cuvmb.cc.columbia.edu>
Resent-To: fdc@cunixc.cc.columbia.edu

Dear Frank and Ken,

I think I can answer some of Frank's questions.  I have also put a
description of JIS at the bottom of this mail (Appendix A and
Appendix B).

1. Japanese code systems:
   The answer is NO.
   Japanese computer systems have at least three character sets:
   Roman (almost ASCII), Katakana (1-byte code), and Kanji
   (2-byte code).  The Kanji set also includes the Roman characters,
   but their character face is double width.  Therefore they
   should be considered different characters.
   The local representations of these character sets are

       OS           Roman        Katakana          Kanji
      ----------   -----------  ----------------  ----------------------
      MS-DOS       JIS X 0201   JIS X 0201 in GR  MS-Kanji (SHIFTJIS)
      VAX/VMS      US-national  (see note 1)      DEC-Kanji (see note 2)
      IBM/VM/CMS   EBCDIC       EBCDIC-Katakana   IBM-Kanji
      UNIX         JIS X 0201   EUC (see note 3)  EUC (see note 4)
      Elis         JIS X 0201   EUC (see note 3)  EUC (see note 4)

  (Note 1) Invoked by LS2 (Locking shift two).
  (Note 2) JIS X 0208 in GR (i.e., 8th bit on).
  (Note 3) Invoked by SS2 (Single shift two) and 8th bit on.
  (Note 4) JIS X 0208 in GR (i.e., 8th bit on).

   To switch among Roman, Katakana and Kanji, both a shift mechanism and
 the GR extension in the 8-bit environment are used.  MS-DOS uses GL as Roman
 (ASCII) and GR as Katakana, and the 1st byte of a Kanji is mapped to the C1
 area and the undefined Katakana area.  Therefore MS-DOS does not need a shift
 mechanism, but it violates the standard (C1 is used for visible characters).
 VAX/VMS, UNIX and Elis use GL as Roman and GR as Kanji, and Katakana is
 invoked by a shift mechanism.  IBM/VM/CMS uses its own code system, in which
 Kanji is invoked by a shift-like mechanism.

2. Kana
  The answer is YES.
  Kana is the Japanese phonetic character set.  There are two types of Kana,
  Hirakana (Hiragana) and Katakana.  Both character sets are included in
  JIS X 0208 (Kanji set).  However, all the characters in JIS X 0208 have
  a double-width character face.  JIS X 0201 also defines Katakana, and its
  character face is single width. (See Appendix A).

3. Internal character set
  The answer is NO.  As described in 1, we normally use at least three
  character sets.

4. JIS X 0202 (old name is JIS C 6228) and ISO 2022
  I'm not sure.  It is written in a footnote of JIS X 0202 that
  JIS X 0202 corresponds to ISO 2022 (see APPENDIX B).


TERMINAL EMULATION

SET TERMINAL CHARACTER-SET <name>

My Kermit (MSVP98) has another command, SET TERMINAL KANJI CODE <name>.
SET TERMINAL CHARACTER-SET specifies the GL character set.  However,
Kanji is mainly used in the GR character set, as described above.  This is
why we need another command to specify the Kanji code.  There is one
more reason we need another command: the code for key input.
SET TERMINAL KANJI CODE is also used for key input character conversion.
To unify these commands, I propose

  SET TERMINAL CHARACTER-SET <name> [as {GL,GR}]

where the default is GL.  And for keyinput

  SET KEYINPUT CHARACTER-SET <name> [as {GL,GR}]

                                           Hirofumi Fujii
                           National Laboratory for High Energy Physics (KEK)

<<<<<<<<<<<<<<<<<<<<< Appendix A >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Japanese code for information interchange

JIS X 0201 Code for Information Interchange
JIS X 0208 Code of the Japanese Graphic Character Set for Information
           Interchange

   First, I will explain JIS X 0208 (old name is JIS C6226).

   This is a code system for Japanese characters.  All Japanese characters
   are represented by a 2-byte code.  Each byte is in the range '21'x - '7E'x,
   i.e., the same range as the ASCII character set.  This is called the JIS
   Kanji code, but the set contains not only Kanji but also English, Greek,
   Russian, etc.  At present, this character set contains the following
   characters.
       -  147 special symbols (+,-,square_root,arrows,etc.)
       -   10 numeric characters (0,1,2,...,9)
       -   52 Roman characters (A,B,C,..,Z,a,b,c,...,z)
       -   83 Hira-kana characters
       -   86 Kata-kana characters
       -   48 Greek characters (Upper alpha, beta,..., Lower alpha,..)
       -   66 Russian characters
       - 6353 Kanji characters (1st level 2965 + 2nd level 3388)
       -   32 Rules (ruled-line elements for drawing boxes)

   The code table of the JIS X 0208 looks like

              |                    2nd byte                       |
              +--+--+--+----+--+--+--+---+--+--+---------+--+--+--+
              |21|22|23 ....|30|31|32|   |41|42|......... 7C|7D|7E|
   -----------+--+--+--+----+--+--+--+---+--+--+---------+--+--+--+
        1  |21|SP|..|..|.................................|..|..|..|
        s  +--+--+--+--+.................................+--+--+--+
        t  |22|..|..|.......................................|..|..|
           +--+--+--+--.....+--+--+--+-..+--+--+-.........--+--+--+
        b  |23|..|..|.......| 0| 1| 2|...| A| B|............|..|..|
        y  +--+--+--+--.....+--+--+--+-..+--+--+-.........--+--+--+
        t  | :|...................................................|
        e  +--+--+--+--+.................................+--+--+--+
           |7E|..|..|..|.................................|..|..|..|
   -----------+--+--+--+---------------------------------+--+--+--+

   For example, the code of Roman 'A' is '2341'x (1st byte is '23'x
   and 2nd byte is '41'x).  Therefore, if you use a simple English
   terminal, it is displayed as '#A'.  The difference between
   '2341'x and '41'x (ASCII 'A') is the width of the character face.
   All the JIS X 0208 characters are double width because
   Kanji are so complex to display.
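
   The relation between a JIS X 0208 code such as '2341'x and the local
   forms listed in the table of item 1 can be sketched as follows.  This
   is only an illustration: the function names are hypothetical, the EUC
   form simply turns on the 8th bit of both bytes ("JIS X 0208 in GR"),
   and the MS-Kanji arithmetic is the commonly published conversion, not
   something taken from the standards text.

```python
def jis_to_euc(j1: int, j2: int) -> bytes:
    """JIS X 0208 invoked in GR: set the 8th bit of both bytes."""
    return bytes([j1 | 0x80, j2 | 0x80])


def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Commonly published JIS X 0208 -> MS-Kanji (Shift-JIS) arithmetic."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("both bytes must be in '21'x - '7E'x")
    if j1 % 2:                                  # odd 1st byte
        s2 = j2 + (0x1F if j2 < 0x60 else 0x20)  # skips '7F'x
    else:                                       # even 1st byte
        s2 = j2 + 0x7E
    # Two 1st-byte values share one lead byte; lead bytes avoid 'A0'x - 'DF'x,
    # which stays free for single-width Katakana.
    s1 = (j1 + 1) // 2 + (0x70 if j1 <= 0x5E else 0xB0)
    return bytes([s1, s2])


# Roman 'A' in the Kanji set, '2341'x:
print(jis_to_euc(0x23, 0x41).hex())   # a3c1
print(jis_to_sjis(0x23, 0x41).hex())  # 8260
```

   Note that the MS-Kanji lead byte '82'x falls in the C1 area, which is
   the point made in item 1 about MS-DOS.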


   There is another character set, JIS X 0201 (old name is JIS C6220).
   This is a 1-byte code system like ASCII. The character face is single width.
   This set contains Roman characters in the range '21'x - '7E'x, and
   Katakana in the range 'A1'x - 'DF'x.

        +--+--+--+--+--+--+--+--++--+--+--+--+--+--+--+--+
        |00|10|20|30|40|50|60|70||80|90|A0|B0|C0|D0|E0|F0|
        +--+--+--+--+--+--+--+--++--+--+--+--+--+--+--+--+
      00|     |SP|              ||        |        |     |
      01|     |                 ||     |           |     |
      02|  C  |                 ||  U  |           |  U  |
      03|  O  |                 ||  N  |           |  N  |
      04|  N  |                 ||  D  |           |  D  |
      05|  T  |                 ||  E  |           |  E  |
      06|  R  |     Roman       ||  F  | Katakana  |  F  |
      07|  O  |                 ||  I  |           |  I  |
      08|  L  |                 ||  N  |           |  N  |
      09|  S  |                 ||  E  |           |  E  |
      0A|     |                 ||  D  |           |  D  |
      0B|     |                 ||     |           |     |
      0C|     |                 ||     |           |     |
      0D|     |                 ||     |           |     |
      0E|     |                 ||     |           |     |
      0F|     |              DEL||     |           |     |
        +--+--+--+--+--+--+--+--++--+--+--+--+--+--+--+--+

                  JIS X0201 looks like this

   The code for Roman characters is almost equivalent to ASCII.


<<<<<<<<<<<<<<<<<<<<< Appendix B >>>>>>>>>>>>>>>>>>>>>>>>>>>

The following is my understanding of JIS X 0202.

JIS X 0202  Code Extension Techniques for Use with the Code
            for Information Interchange

1. This standard defines code extension techniques for information interchange
   in BOTH 7bit AND 8bit environments.
      ^^^^^^^^^^^^^^^^^^
2. There are four intermediate character sets called G0, G1, G2 and G3.
             ^^^^                                    ^^^^^^^^^^^^^^^^^
3. In 7bit environment, only one character set can be activated at one
   time.  The active character set can be selected from the above intermediate
   character set by issuing the following control codes
         SI  (Shift in)            invoke G0 character set
         SO  (Shift out)           invoke G1 character set
         LS2 (locking-shift two)   invoke G2 character set
         LS3 (locking-shift three) invoke G3 character set

4. In 8bit environment, two character sets GL and GR can be activated
   at one time. GL character set is selected if the 8th bit is OFF and GR
   is selected if the 8th bit is ON.  The active character sets are
   selected by
         LS0  (Locking-shift zero)        invoke G0 character set to GL
         LS1  (Locking-shift one)         invoke G1 character set to GL
         LS2  (Locking-shift two)         invoke G2 character set to GL
         LS3  (Locking-shift three)       invoke G3 character set to GL
         LS1R (Locking-shift one right)   invoke G1 character set to GR
         LS2R (Locking-shift two right)   invoke G2 character set to GR
         LS3R (Locking-shift three right) invoke G3 character set to GR

5. In both 7bit and 8bit environments, a single-byte character set is
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^
   designated to the intermediate character set by issuing the following
   ESC sequences
         ESC 2/8 F  or  ESC 2/12 F   designate character set F to G0
         ESC 2/9 F  or  ESC 2/13 F   designate character set F to G1
         ESC 2/10 F or  ESC 2/14 F   designate character set F to G2
         ESC 2/11 F or  ESC 2/15 F   designate character set F to G3

         where F is
           A:UK, B:US,..., I:JIS-Katakana, J:JIS-Roman etc.

   A multi-byte character set is designated to the intermediate character
     ^^^^^^^^^^^^^^^^^^^^^^^^
   set by
         ESC 2/4 F      or ESC 2/4 2/12 F  designate character set F to G0
         ESC 2/4 2/9 F  or ESC 2/4 2/13 F  designate character set F to G1
         ESC 2/4 2/10 F or ESC 2/4 2/14 F  designate character set F to G2
         ESC 2/4 2/11 F or ESC 2/4 2/15 F  designate character set F to G3

         where F is
           A:Chinese-Kanji, B:Japanese-Kanji etc.,

6. One character in the G2 or G3 character set can be invoked by
   issuing
         SS2   invoke next one character from G2
         SS3   invoke next one character from G3
   In 8bit environment, the character is invoked to GL character set.

7. At the beginning of the information interchange, extension method
   used in the subsequent data stream is announced by
         ESC 2/0 F
   where F is one of the 4/1, 4/2, 4/3, 4/4, 4/5, 4/6, 4/7,
   5/0, 5/2, 5/3, 5/4, 5/5, 5/6, 5/7, 5/10, 5/11.
   For example,  4/1  G0 only. No LS. GR is not used.
                 4/2  G0 and G1. SI and SO (LS0 and LS1) are used.
                 4/3  G0 and G1 only in 8bit env. LS's are not used.
                      GL = G0, GR = G1.
                 etc.
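
The designation sequences of item 5 can be built mechanically from the
column/row notation.  The sketch below is only an illustration of that
arithmetic: the function names are hypothetical, and only the first form
of each sequence in item 5 (2/8 to 2/11) is generated.

```python
ESC = 0x1B


def position(col: int, row: int) -> int:
    """Convert column/row notation such as 2/8 to a byte value."""
    return col * 16 + row


def designate(g: int, final: str, multibyte: bool = False) -> bytes:
    """Build the sequence that designates set `final` to G0..G3."""
    if g not in (0, 1, 2, 3):
        raise ValueError("g must be 0, 1, 2 or 3")
    intermediate = position(2, 8) + g       # 2/8, 2/9, 2/10, 2/11
    seq = bytearray([ESC])
    if multibyte:
        seq.append(position(2, 4))          # 2/4 marks a multi-byte set
        if g != 0:                          # G0 form is just ESC 2/4 F
            seq.append(intermediate)
    else:
        seq.append(intermediate)
    seq += final.encode("ascii")
    return bytes(seq)


print(designate(0, "J"))                  # JIS-Roman to G0:     b'\x1b(J'
print(designate(1, "I"))                  # JIS-Katakana to G1:  b'\x1b)I'
print(designate(0, "B", multibyte=True))  # Japanese Kanji to G0: b'\x1b$B'
```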

-------------------------------------
Hirofumi Fujii
National Laboratory for High Energy Physics (KEK)
KEIBUN@JPNKEKVM.BITNET


22-Mar-89 23:40:10-GMT,3611;000000000011
Return-Path: <@cuvmb.cc.columbia.edu:PEPMNT@CFAAMP.BITNET>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA09707; Wed, 22 Mar 89 18:40:08 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 1374; Wed, 22 Mar 89 18:35:57 EST
Received: by CUVMB (Mailer X1.25) id 9382; Wed, 22 Mar 89 18:35:56 EST
Date: Wed, 1989 Mar 22   17:16 EST
From: "John F. Chandler"   <PEPMNT%CFAAMP@cuvmb.cc.columbia.edu>
Subject: Re: Kermit / ISO Transfer Syntax: 7-bit vs 8-bit
In-Reply-To: fdc@watsun.cc.columbia.edu message
  <CMM.0.88.606586769.fdc@watsun.cc.columbia.edu> of Wed, 22 Mar 1989 11:19:29
  EST
Message-Id: <PEPMNT.890322.171627.C0@CFAAMP.BITNET>
To: Frank da Cruz <fdc@watsun.cc.columbia.edu>

>                  A disadvantage is that languages like Russian, Greek,
> Hebrew, and Arabic that tend to stay in the G1 set will have a very high
> prefixing overhead in the 7-bit environment.
>
> The question now becomes: should Kermit continue to allow ISO 7-bit text
> transfer with locking shifts, as originally proposed?  If we do not, then
> Kermit programs that do not implement 8th-bit prefixing will not be able to
> transfer mixed-alphabet texts.

I'm not sure the term mixed-alphabet is quite the right one here.
Wouldn't the languages cited above normally be encoded in 8-bit
alphabets that are "mixed" only in the sense that the character sets
place the Latin alphabet in the G0 slots?  I would imagine that the
typical use would consist of un-mixed Cyrillic or Hebrew or whatever.

>   the full range of ISO 4873 / 2022 code extension techniques would
> give us the greatest flexibility (e.g. efficiency for both French and
> Cyrillic), but...

I have another suggestion that just occurred to me.  First, let me
state what I am assuming about the nature of text files:

1. Truly mixed-alphabet stuff is rare, that is, stuff requiring more
   than a single 256-entry character set.  I realize that ideographic
   text representation requires more than 256 distinct characters, but
   I think the solution to that difficulty is to represent ideograms by
   strings of bytes, rather than to define universal escape sequences
   for switching among alphabets.

2. In non-ideographic representations, the language will either be
   written entirely in G0 (or in G1) or will switch back and
   forth frequently between G0 and G1.  This certainly fits all the
   languages represented by the various ISO 8859 alphabets mentioned
   in the draft protocol extension.  I confess that I don't know the
   situation for the various Japanese syllabaries, nor whether anyone
   would choose to use them if given a chance to use the standard
   combination of kana and kanji.

Given the above, I think it makes sense to offer a single new feature:
8th-bit-complement mode.  In that mode, ALL codes 0-127 would be swapped
with those 128-255 as the first step of Kermit encoding (and the last
step of decoding).  Such a mode could be selected via an Attribute (or
perhaps by a Capability flag).  The implementation would be simple and
would entail no overhead associated with scanning data streams for
locking shifts.  8BC mode would be the method of choice for all the
languages that use G1 exclusively, but could also be useful for
transferring certain kinds of binary files -- there's nothing about it
that need be restricted to text files.  Being new to this discussion,
I don't know whether this idea has been suggested before, but it seems
to me to merit consideration.
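
A minimal sketch of the idea, assuming the mode is applied as a plain
XOR of the 8th bit before Kermit encoding; the function names are
hypothetical, and the sample text is ISO 8859-5 Cyrillic:

```python
def complement_8th_bit(data: bytes) -> bytes:
    """Swap codes 0-127 with 128-255 (the operation is its own inverse)."""
    return bytes(b ^ 0x80 for b in data)


def prefixes_needed(data: bytes) -> int:
    """Number of 8th-bit prefixes a 7-bit transfer would need."""
    return sum(1 for b in data if b & 0x80)


# "Privet mir" in Cyrillic letters: nine G1 letters and one space.
text = "\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440".encode("iso8859_5")

print(prefixes_needed(text))                      # 9 without 8BC mode
print(prefixes_needed(complement_8th_bit(text)))  # 1 with 8BC mode
```

Only the space needs a prefix once the bit is complemented, since it is
the one character that was in G0 to begin with.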
                                     John

23-Mar-89  7:02:41-GMT,5061;000000000401
Return-Path: <JRD@cc.usu.edu>
Received: from cc.usu.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA15639; Thu, 23 Mar 89 02:02:37 EST
Message-Id: <8903230702.AA15639@watsun.cc.columbia.edu>
Date: Thu, 23 Mar 89 00:01 MDT
From: Joe Doupnik <JRD@cc.usu.edu>
Subject: ISO commentary
To: fdc@watsun.cc.columbia.edu
X-Vms-To: IN%"fdc@watsun.cc.columbia.edu",JRD

Chris, Frank, and the group:

	A little sense is dawning on me about the ISO 7 vs 8 bit issue. It seems
that ISO 2022 is nearly the superset of the various specifications which have
been mentioned. It includes the availability of four graphics sets, G0-G3,
shifts between them, allowance for display symbols composed of two or more
characters, and the shift mechanisms to operate these tables via either 7
or 8 bit communications paths.

	ISO 4873, as I read it, is similar to 2022, but omits 7 bit codes,
is more stringent about the control codes (C0, C1), and uses essentially the
same shift mechanisms, except that SI/SO are replaced with full escape
sequences. 4873 also provides levels of capability for having only two, three,
or all four alphabet sets (G0-G3) whereas 2022 presumes presence of all four.
ISO 2022 has all the features, yet retains more conventional use of control
codes; 2022 seems to be the standard of choice.

	Since the underlying available alphabets are the same in these cases,
there being normally four, the communications part becomes concerned with the
number of bytes needed to select one or the other either for a single character
or for a string of characters. ISO 4873 prefers to use escape sequences to
toggle between the two active alphabets but 2022 retains the historical SI/SO
short form. In a 7 bit environment only one alphabet is active at any instant
so that either SI/SO or escape sequences are required to access any one of
the other three. In other words, the 8 bit scheme lets two alphabets be
"on line" but the 7 bit scheme is restricted to one "on line" (but the swap
from one to a second can be accomplished via SI/SO, a whole byte versus
one bit in a character).

	As Hirofumi demonstrated so clearly, there is a considerable difference
when three alphabets are needed in rapid succession. Without question, the
8-bit channel lets the high order bit in a byte select one of two and then
the shifting codes reload one alphabet upon demand, similar to paging memory.

	All of the above standards use an escape sequence to select which
particular "language" is to be loaded into G0-G3. Thus this is no longer
a communications performance issue.

	One communications issue which still is not clear to me is the ordinary
Kermit 8-bit quoting mechanism. If ISO-XXXX were used I suspect that it would
be applied "above" the ordinary Kermit methods (meaning before packet encoding
and after packet decoding). In that case ordinary Kermit provides the illusion
of an 8-bit channel, as it must for many systems.

	When the channel is 8-bits wide, it seems clear to me that an 8-bit
ISO style code is the shortest method for all the work.

	When the channel is 7-bits wide we would need to run tests to
determine whether ISO 7-bit shifts or Kermit 8-bit quoting is more efficient.
ISO wins on long strings in one alphabet by stating a lock-shift escape
sequence, and Kermit wins on most per-character alphabet swaps by needing only
one quote byte rather than an escape sequence (it is almost a draw when SI/SO
can be used to swap, but ISO loses by requiring a Kermit control-quote prefix
on SI/SO).
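
	A rough way to run such a test is sketched below.  It is a
simplification, not a measurement of any real Kermit: the function names
are hypothetical, every byte with its 8th bit set is treated as a G1
character, each SI or SO is charged two bytes (the control-quote prefix
plus the shifted character), each 8th-bit prefix is charged one byte, and
all other quoting is ignored.

```python
def prefix_cost(data: bytes) -> int:
    """Bytes sent when each 8th-bit character gets a one-byte prefix."""
    return len(data) + sum(1 for b in data if b & 0x80)


def locking_shift_cost(data: bytes) -> int:
    """Bytes sent when SO/SI bracket runs of G1 text, starting in G0."""
    shifts = 0
    in_g1 = False
    for b in data:
        wants_g1 = bool(b & 0x80)
        if wants_g1 != in_g1:
            shifts += 1          # one SO or SI is needed here
            in_g1 = wants_g1
    return len(data) + 2 * shifts  # each shift is quoted: 2 bytes


# French sample in Latin-1 and Cyrillic sample in ISO 8859-5.
french = "\u00e9t\u00e9 \u00e0 Paris".encode("latin-1")
cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440".encode("iso8859_5")

for name, sample in (("French", french), ("Cyrillic", cyrillic)):
    print(name, len(sample), prefix_cost(sample), locking_shift_cost(sample))
```

	On these two tiny samples the French text costs 14 bytes with
prefixing against 23 with quoted SI/SO, and the Cyrillic text costs 19
against 16; longer Cyrillic runs would widen the gap.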

	I think I've talked myself into saying that ISO 2022, or similar,
has about all the features we need for both Western and Eastern languages
and that the 8-bit version is faster by letting Kermit's 8-bit quoting
mechanism provide the channel width (if necessary). The ISO 2022 locking
shift sequences allow any of the four alphabets to be loaded into the left
(low order) character codes and any of G1-G3 to be loaded into the right
half. This flexibility can drastically reduce the need for high bits on
characters when a single Western (one byte yields one graphical symbol)
language is used.

	The last doubt in my mind then relates to terminal emulation in a
7 bit environment. Here the Kermit 8 bit quoting mechanism is not available.
I think that it is not a difficult task to allow both 7 and 8 bit ISO shifts
to be available in a terminal emulator, selected automatically by presence of
parity and overridable by existing Kermit commands. People writing terminal
emulators also will need to face the direction of writing problem; it's not
so simple. [I know I should not say much about this now; but, you know,
keyboards generate a lot of these symbols. At some point we will need to think
more about translating keystrokes to communications line characters.]

	Finally, I'd like to express my appreciation to Hirofumi for explaining
the complicated typography environment in Japan using terms that even I can
understand.

	Joe Doupnik

P.S. Frank and Chris: I've not posted this to the group because my typing
skills would collapse about half way through!


23-Mar-89 20:52:12-GMT,5316;000000000001
Return-Path: <@cuvmb.cc.columbia.edu:PEPMNT@CFAAMP.BITNET>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA22919; Thu, 23 Mar 89 15:52:05 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 1882; Thu, 23 Mar 89 15:47:50 EST
Received: from CFAAMP.BITNET (PEPMNT) by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25)
 with BSMTP id 1139; Thu, 23 Mar 89 15:47:40 EST
Date: Thu, 1989 Mar 23   15:40:14 EST
From: (John F. Chandler)   PEPMNT@cfaamp
To: Joe Doupnik <jrd@usu.bitnet>, Paul Placeway <paul@cis.ohio-state.edu>,
        Andre Pirard <A-PIRARD@bliulg11.bitnet>,
        Baruch Cochavy <baruchc@techunix.bitnet>,
        Johan Van Wingen  <MOSGLA@hlerul2.bitnet>,
        Ken-ichiro Murakami   <MURAKAMI%NTT-20.NTT.JP@relay.cs.net>,
        Kohichi Nishimoto <s153380%tkov02.DEC@decwrl.dec.com>,
        Hirofumi Fujii <KEIBUN@jpnkekvm.bitnet>,
        "Gisbert W. Selke"   <RECK@dbnuama1.bitnet>,
        Kurt Enulf <UPSKE@seguc11.bitnet>,
        Jacob Palme <jacob_palme_qz@qzcom.bitnet>,
        Per Lindberg <Per_Lindberg_ZQ@qzcom.bitnet>,
        "Bj|rn Larsen" <x_larsen_b@use.uio.uninett>,
        "Hans A. ]lien" <hans%ifi.uio.no@tor.nta.no>,
        Steve Jenkins <pdsoft@central1.lancaster.ac.uk>,
        Jean Dutertre <dutertre@padis1.dec>, Gerard Gaye <GAYE@frsac11.bitnet>,
        David Guerlet <KERMIT@czheth5a.bitnet>,
        Bernie Eiben <eiben@tops20.dec.com>,
        Volker Edelhoff   <edelhoff@unido.bitnet>,
        John Chandler <pepmnt@cfaamp.bitnet>,
        Frank da Cruz <fdc@watsun.cc.columbia.edu>
Subject: Re: Kermit / ISO Transfer Syntax: 7-bit vs 8-bit
In-Reply-To: fdc@watsun.cc.columbia.edu message
  <CMM.0.88.606586769.fdc@watsun.cc.columbia.edu> of Wed, 22 Mar 1989 11:19:29
  EST
Message-Id: <PEPMNT.890322.171627.C0@CFAAMP.BITNET>

Pardon the resending of this message -- the first time I tried, I used
the LISTSERV DISTRIBUTE feature, and at least one LISTSERV in the chain
rejected the message, so that an unknown number of copies were lost.
Rather than guess who got it, I'm just sending to everyone again...

-----------------------------------------------------------------
Date: Wed, 1989 Mar 22   17:16 EST
Subject: Re: Kermit / ISO Transfer Syntax: 7-bit vs 8-bit
In-reply-to: fdc@watsun.cc.columbia.edu message
  <CMM.0.88.606586769.fdc@watsun.cc.columbia.edu> of Wed, 22 Mar 1989 11:19:29
  EST
Message-id: <PEPMNT.890322.171627.C0@CFAAMP.BITNET>

>                  A disadvantage is that languages like Russian, Greek,
> Hebrew, and Arabic that tend to stay in the G1 set will have a very high
> prefixing overhead in the 7-bit environment.
>
> The question now becomes: should Kermit continue to allow ISO 7-bit text
> transfer with locking shifts, as originally proposed?  If we do not, then
> Kermit programs that do not implement 8th-bit prefixing will not be able to
> transfer mixed-alphabet texts.

I'm not sure the term mixed-alphabet is quite the right one here.
Wouldn't the languages cited above normally be encoded in 8-bit
alphabets that are "mixed" only in the sense that the character sets
place the Latin alphabet in the G0 slots?  I would imagine that the
typical use would consist of un-mixed Cyrillic or Hebrew or whatever.

>   the full range of ISO 4873 / 2022 code extension techniques would
> give us the greatest flexibility (e.g. efficiency for both French and
> Cyrillic), but...

I have another suggestion that just occurred to me.  First, let me
state what I am assuming about the nature of text files:

1. Truly mixed-alphabet stuff is rare, that is, stuff requiring more
   than a single 256-entry character set.  I realize that ideographic
   text representation requires more than 256 distinct characters, but
   I think the solution to that difficulty is to represent ideograms by
   strings of bytes, rather than to define universal escape sequences
   for switching among alphabets.

2. In non-ideographic representations, the language will either be
   written entirely in G0 (or in G1) or will switch back and
   forth frequently between G0 and G1.  This certainly fits all the
   languages represented by the various ISO 8859 alphabets mentioned
   in the draft protocol extension.  I confess that I don't know the
   situation for the various Japanese syllabaries, nor whether anyone
   would choose to use them if given a chance to use the standard
   combination of kana and kanji.

Given the above, I think it makes sense to offer a single new feature:
8th-bit-complement mode.  In that mode, ALL codes 0-127 would be swapped
with those 128-255 as the first step of Kermit encoding (and the last
step of decoding).  Such a mode could be selected via an Attribute (or
perhaps by a Capability flag).  The implementation would be simple and
would entail no overhead associated with scanning data streams for
locking shifts.  8BC mode would be the method of choice for all the
languages that use G1 exclusively, but could also be useful for
transferring certain kinds of binary files -- there's nothing about it
that need be restricted to text files.  Being new to this discussion,
I don't know whether this idea has been suggested before, but it seems
to me to merit consideration.
                                     John

24-Mar-89  0:20:05-GMT,1408;000000000001
Return-Path: <fdc>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA25079; Thu, 23 Mar 89 19:20:03 EST
Date: Thu, 23 Mar 1989 19:20:02 EST
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: "John F. Chandler" <PEPMNT%CFAAMP@cuvmb.cc.columbia.edu>
Subject: Re: Kermit / ISO Transfer Syntax: 7-bit vs 8-bit
In-Reply-To: Your message of Wed, 1989 Mar 22 17:16 EST
Message-Id: <CMM.0.88.606702002.fdc@watsun.cc.columbia.edu>

We were just rereading your suggestion about 8th-bit-complement mode.
The underlying idea seems to be that if in a given environment (like
Hebrew or Cyrillic) most characters are from GR, then complementing the
8th bit would give greater efficiency.  So this would have to be on a
per-file (or per-language) basis.  It wouldn't help out the Germans or
French, who must switch between GL and GR frequently.  And it certainly
would not bring much benefit to the Japanese (see Hirofumi Fujii's message
that I forwarded to you a few minutes ago, even though this one might
arrive first).  We're beginning to think the only way to satisfy everybody
is to allow a full implementation of ISO 2022, perhaps (as Hiro suggested)
with an announcer in the attribute packet to specify what facilities are
being used -- full 8-bit transfer with no shifts, 7-bit transfer with
locking shifts, and (sigh) even allowing for G2 and G3 sets and the
corresponding single shifts.  - Chris & Frank

24-Mar-89  3:41:59-GMT,6351;000000000001
Return-Path: <@cuvmb.cc.columbia.edu:PEPMNT@CFAAMP.BITNET>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA27154; Thu, 23 Mar 89 22:41:56 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 2055; Thu, 23 Mar 89 22:37:41 EST
Received: from CFAAMP.BITNET (PEPMNT) by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25)
 with BSMTP id 1745; Thu, 23 Mar 89 22:37:34 EST
Date: Thu, 1989 Mar 23   20:10:47 EST
From: (John F. Chandler)   PEPMNT@cfaamp.bitnet
To: Joe Doupnik <jrd@usu.bitnet>, Paul Placeway <paul@cis.ohio-state.edu>,
        Andre Pirard <A-PIRARD@bliulg11.bitnet>,
        Baruch Cochavy <baruchc@techunix.bitnet>,
        Johan Van Wingen  <MOSGLA@hlerul2.bitnet>,
        Ken-ichiro Murakami   <MURAKAMI%NTT-20.NTT.JP@relay.cs.net>,
        Kohichi Nishimoto <s153380%tkov02.DEC@decwrl.dec.com>,
        Hirofumi Fujii <KEIBUN@jpnkekvm.bitnet>,
        "Gisbert W. Selke"   <RECK@dbnuama1.bitnet>,
        Kurt Enulf <UPSKE@seguc11.bitnet>,
        Jacob Palme <jacob_palme_qz@qzcom.bitnet>,
        Per Lindberg <Per_Lindberg_ZQ@qzcom.bitnet>,
        "Bj|rn Larsen" <x_larsen_b@use.uio.uninett>,
        "Hans A. ]lien" <hans%ifi.uio.no@tor.nta.no>,
        Steve Jenkins <pdsoft@central1.lancaster.ac.uk>,
        Jean Dutertre <dutertre@padis1.dec>, Gerard Gaye <GAYE@frsac11.bitnet>,
        David Guerlet <KERMIT@czheth5a.bitnet>,
        Bernie Eiben <eiben@tops20.dec.com>,
        Volker Edelhoff   <edelhoff@unido.bitnet>,
        John Chandler <pepmnt@cfaamp.bitnet>,
        Frank da Cruz <fdc@watsun.cc.columbia.edu>
Subject: Re: Kermit / ISO Transfer Syntax: 7-bit vs 8-bit
In-Reply-To: fdc@watsun.cc.columbia.edu message
  <CMM.0.88.606702002.fdc@watsun.cc.columbia.edu> of Thu, 23 Mar 1989 19:20:02
  EST
Message-Id: <PEPMNT.890323.201047.C0@CFAAMP.BITNET>

> The underlying idea seems to be that if in a given environment (like
> Hebrew or Cyrillic) most characters are from GR, then complementing the
> 8th bit would give greater efficiency.  So this would have to be on a
> per-file (or per-language) basis.

Right.  As I pointed out, this need not even be restricted to text files,
since some binaries would have a preponderance of bytes with the 8th bit
set and could profit from the same efficiency.

>                                    It wouldn't help out the Germans or
> French, who must switch between GL and GR frequently.

I think you can easily prove that any scheme without a carefully
tailored compression will not help the French, Germans, Swedes, etc.
One possibility, of course, is to adopt a transfer coding that swaps
the essential extra characters for little-used symbols -- German, in
particular, requires only 7 extra letters, and though French technically
requires 18, a common practice is to omit diacritical marks on upper-case
vowels, thereby reducing the need to 10.  Such a scheme would obviously
have to be worked out for each language and would therefore entail a lot
of work, but it would certainly increase the "efficiency" of transfers.
I doubt, though, that the overhead involved in managing code selections
would be worthwhile, and the complexity introduced by having an extra,
generalized translation step would certainly give everyone headaches --
it's bad enough in TTY-mode IBM mainframe transfers with disk -(ETOA)->
-(TATOE)-> -(system E to A with parity)-> receiver and vice versa.

>                                                        And it certainly
> would not bring much benefit to the Japanese

Well, that depends.  Hirofumi Fujii's example was something of an
extreme case, and there is certainly one mode of operation that would
fit in very well, namely, using Kana to the exclusion of Kanji.  The
occasional word in Roman text would, in 8th-bit-complement mode, have to
get 8th-bit prefixing, but the bulk of the text would go through with
one byte transmitted per character.  The difficulty, of course, is that
Japanese has lots of homonyms, so that Kana-only text can be ambiguous.
Thus, that mode is somewhat less than desirable.

This brings me to the final point.  I agree that the exigencies of
Japanese text require more than a single 256-character alphabet, but
they may be unique in that respect -- Chinese, for example, does not
have a syllabary and so might use an entirely 2-byte representation for
ideogram-only text (I don't know what they would do with foreign words
or how frequently they would come up).  The question I would raise is
whether there is a need for a translation between the stored text and
the transmission medium.  Note the distinction: hard copy of a text file
may appear to have a mixture of fonts and alphabets, but the underlying
disk file has everything encoded into 1-byte units (that is, in all the
schemes we seem to be considering).  Kermit can transfer the bytes
without knowing how to decode them and can also transfer an attribute or
two informing the receiver how, in principle, to decode them.
Therefore, there is only one reason for Kermit to define a standard
transmission protocol that includes decoding the stored disk file and
re-encoding in the transmission protocol, namely, the fear that a
receiving machine will (A) need to use the text file locally in some
form *other* than the original, (B) have a Kermit smart enough to decode
the transmission protocol and re-encode in that *other* form, and (C)
*not* have other software smart enough to translate the coding scheme
found on the originating machine.  One possible scenario I can imagine
is the case of an IBM mainframe with one ETOA mapping for a combined
Roman+Kana set from "EBCDIC+Kana" to ASCII+Kana and a completely
different mapping (perhaps idempotent) from the mainframe Kanji codes to
the JIS ones.  If such a thing can happen, and if the file is to be
transmitted usefully to unlike machines, then it is clearly necessary
for the mainframe Kermit to decode the file before transmitting.  In
that case, I agree that an implementation of something like the full ISO
2022 is necessary.  On the other hand, it may be that a single mapping
suffices, or perhaps nobody would want to process mixed-alphabet text
files on an IBM mainframe anyway.  I think a fuller discussion of the
actual situation would be useful here.
                                      John

24-Mar-89 13:59:44-GMT,14064;000000000001
Return-Path: <@cuvmb.cc.columbia.edu:A-PIRARD@BLIULG11.BITNET>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA00903; Fri, 24 Mar 89 08:59:40 EST
Message-Id: <8903241359.AA00903@watsun.cc.columbia.edu>
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 2135; Fri, 24 Mar 89 08:55:25 EST
Received: from VM1.EARN-ULG.AC.BE by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with
 BSMTP id 2027; Fri, 24 Mar 89 08:55:23 EST
Received: by BLIULG11 (Mailer R2.02) id 1684; Fri, 24 Mar 89 14:48:29 +0100
Date:         Fri, 24 Mar 89 14:37:42 +0100
From: Andre' PIRARD <A-PIRARD%BLIULG11@cuvmb.cc.columbia.edu>
Subject:      Re: ISO/Kermit 7 vs 8 bit transfer syntax
To: Frank da Cruz <fdc@watsun.cc.columbia.edu>,
        Paul Placeway <paul@cis.ohio-state.edu>,
        Andre Pirard <A-PIRARD@bliulg11>, Baruch Cochavy <baruchc@techunix>,
        Johan Van Wingen <MOSGLA@hlerul2>,
        Ken-ichiro Murakami <MURAKAMI%NTT-20.NTT.JP@relay.cs.net>,
        Kohichi Nishimoto <s153380%tkov02.DEC@decwrl.dec.com>,
        Hirofumi Fujii <KEIBUN@jpnkekvm>, "Gisbert W.Selke" <RECK@dbnuama1>,
        Kurt Enulf <UPSKE@seguc11>, Jacob Palme <jacob_palme_qz@qzcom>,
        Per Lindberg <Per_Lindberg_ZQ@qzcom>,
        Bj|rn Larsen <x_larsen_b%use.uio.uninett@tor.nta.no>,
        "Hans A. ]lien" <hans%ifi.uio.no@tor.nta.no>,
        Steve Jenkins <pdsoft%uk.ac.lancs.cent1@nss.cs.ucl.ac.uk>,
        Jean Dutertre <dutertre%padis1.DEC@decwrl.dec.com>,
        Gerard Gaye <GAYE@frsac11>, David Guerlet <KERMIT@czheth5a>,
        Bernie Eiben <eiben@tops20.dec.com>, Volker Edelhoff <edelhoff@unido>,
        John Chandler <pepmnt@cfaamp>, Joe Doupnik <JRD@usu>
In-Reply-To:  Message of Thu, 23 Mar 1989 10:24:16 EST from
 <fdc@watsun.cc.columbia.edu>


     Although I am sitting in a desolate, empty room because of our
move, I am interleaving some comments with packing for as long as I am
still able to use the network.

     You will find below a document I find essential to our
discussion. Thanks to Johan van Wingen for the information. It is
an answer to my first note: "how would systems store multilingual
character sets". In other words, "what would Kermit have to
transfer". I see this ISO 10646 project as responding to the
fundamental needs of leading software developers with
international scope and I guess they must be longing for that.
As we say, "Il n'y a pas de fumée sans feu".
The task of Kermit would be much easier on these more efficient
grounds (I guess translating between the 4 "forms-of-use" and
between them and single-byte ISO 8859 versions).

     But I think that, despite their evident lack of performance,
the present standards will continue to hold for terminal mode
tied to 7-bit or 8-bit lines because they keep in line with
present hardware in working by "adding more fonts". It's probably
a thing to do on machines with a graphic screen even though many
users will be satisfied with a single ISO version. But for that
reason, the hosts driving the terminal might forget to specify
which version it uses and the default one should be customizable.

     I hope this unifying standard will come very soon to solve
those intricate problems of Asian languages. The fundamental
question is whether to wait, or to do something in the meantime that
might have to be discarded in the end. These Asian people have a
strong voting weight. My own testimony is that we will use our
single ISO version while waiting, and that we are interested in a mere
hidden byte-to-byte translation capability in file transfer in
addition to the terminal mode extensions.

     Hear from you all in one week's time.

André.

Date:         Fri, 10 Feb 89 15:27:00 CET
From:         Johan van Wingen <MOSGLA@HLERUL2>
Subject:      Informal Introduction to ISO 10646
To:           Andre' Pirard <A-PIRARD@BLIULG11>



  INTERNATIONAL ORGANIZATION FOR STANDARDIZATION    ISO/IEC JTC1/SC2/WG2
  INTERNATIONAL ELECTROTECHNICAL COMMISSION                       N 274

  Joint Technical Committee 1
  Subcommittee 2 Characters and Information Coding, Working Group 2


  ======================================================================
  Introduction to ISO 10646 - Multiple-Octet Coded Character Set
  ======================================================================

  A new standard is being developed within Working Group 2 of ISO/IEC
  JTC1/SC2 for the multiple-octet coded character set. Formal drafts
  will be issued during 1989.

  Its purpose is to provide a single character code which will permit
  the written form of all present-day languages throughout the world to
  be used within computers, to be processed and interchanged. All types
  of text written in character form will be provided for, from simple
  commercial documents to publication of technical reports etc. Also the
  bibliographic requirements of librarians will be met.

  The structure of the whole code may be illustrated thus, with an octet
  of bits for each dimension:


                                           +-------------------+
                                      +-------------------+    |
                                 +-------------------+    |    |
                            +-------------------+    |    |    |
     Plane             +-------------------+    |    |    |    |
    /             +-------------------+    |    |    |    |    |
   /         +-------------------+    |    |    |    |    |    |
  +-->  +-------------------+    |    |    |    |    |    |    |
  |Cell |                   |    |    |    |    |    |    |    |
  |     |  +------+  +------+    |    |    |    |    |    |    |
  V     |  |  A00 |  |  A01 |    |    |    |    |    |    |    |
  Row   |  +------+  +------+    |    |    |    |    |    |    |
        |  |      |  |      |    |    |    |    |    |    |    |
        |  |  J1  |  |  --  |    |    |    |    |    |    |    |
        |  |      |  |      |    |    |    |    |    |    |    |
        |  +------+  +------+    |    |    |    |    |    |    |
        |                   |    |    |    |    |    |    |----+
        |  +------+  +------+    |    |    |    |    |----+
        |  |  A10 |  |  A11 |    |    |    |    |----+ (future
        |  +------+  +------+    |    |    |----+   standardization)
        |  |      |  |      |    |    |----+ (Korean)
        |  |  C1  |  |  K1  |    |----+ (Japanese)
        |  |      |  |      |----+ (Chinese)
        +--+------+--+------+ (bibliographic)

    Basic multi-lingual plane                  Supplementary planes


  The basic multi-lingual plane will contain four segments for graphic
  characters, each holding 96 * 96 characters.

  Each segment will be divided into two zones: an alphabetic zone of
  16 * 96 characters, and another zone either for the most-frequently
  used characters of the Chinese, Japanese and Korean ideographic
  scripts, or for certain special purposes.

  The shaded area outside the graphic quadrants will be used for control
  functions. All those of ISO 6429, ISO 6937 and ISO 8613 will be
  available, with the same coding.

  The supplementary planes will accommodate characters that overflow from
  the basic multi-lingual plane.

  A coded character anywhere in the code may be uniquely identified by
  means of three octets:

   m-s  +--------------+--------------+--------------+  l-s
        | Plane-octet  | Row-octet    | Cell-octet   |
        +--------------+--------------+--------------+

    NOTE: Sequences of characters run horizontally along the rows, not
          vertically as in previous code tables.

  The code may be used in different forms-of-use:

    a) A four-octet form, in which the three octets for the character
       are preceded by one for systems use. Three octet coding will
       never be used.

    b) A two-octet form, restricted exclusively to a single plane.
       Especially for users with alphabetic scripts, this will
       probably accommodate 99% of their applications.

    c) A two-octet form with extension using occasional four-octets.

    d) A compacted form, permitting strings of related characters to be
       used as single-octets.
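
  The first two forms-of-use can be sketched in a few lines. This is only
  an illustration of the plane/row/cell addressing described above: the
  function names, the value 0 for the systems-use octet, and the plane
  number used in the example are all assumptions, since the draft text
  quoted here does not give them.

```python
def _check_octet(value: int) -> int:
    """Reject anything that does not fit in a single octet."""
    if not 0 <= value <= 255:
        raise ValueError(f"not an octet: {value}")
    return value


def four_octet(plane: int, row: int, cell: int, system_octet: int = 0) -> bytes:
    """Form (a): one systems-use octet followed by plane, row and cell."""
    return bytes(_check_octet(v) for v in (system_octet, plane, row, cell))


def two_octet(row: int, cell: int) -> bytes:
    """Form (b): row and cell only; the plane is implied by context."""
    return bytes(_check_octet(v) for v in (row, cell))


def split_four_octet(data: bytes) -> list:
    """Recover (plane, row, cell) triples from a four-octet string."""
    if len(data) % 4:
        raise ValueError("length is not a multiple of four octets")
    return [tuple(data[i + 1:i + 4]) for i in range(0, len(data), 4)]


# Hypothetical character in plane 32, row 040 (Greek), cell 65.
print(list(four_octet(32, 40, 65)))  # [0, 32, 40, 65]
print(list(two_octet(40, 65)))       # [40, 65]
```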

  The basic multi-lingual plane is being designed to permit easy
  inter-working with existing 8-bit codes. Generally, conversion will be
  by the table look-up technique; however, conversion with ISO 8859
  parts 1,2,5,6,7,8 may use a simple algorithm.

  All designation, invocation and shifting as in ISO 2022 will be
  avoided.

  It is considered that the consequent simplification of software,
  especially for generalized applications in the OSI environment, will
  make this code economically attractive despite the relatively
  extravagant use of bits.

  The layout of the basic multi-lingual plane may be illustrated in
  FIGURE 1 (next page), the axes not being drawn linearly.

    NOTE: The value of any octet is shown in simple decimal notation,
          e.g.  032, 255.

  The contents of any of the rows are set out in detailed code tables.
  These are drawn on a pro-forma which shows a complete row in twelve
  strips, each of 16 graphic characters.

  Because the code is designed to be used as a whole, especially the
  basic multi-lingual plane, no significance attaches to whether certain
  characters are in the left hand or right-hand halves of a row, or
  early or late in the code table.

  A character once included in the code table is not duplicated
  elsewhere. Therefore for any particular application characters will
  be taken from many different places in the code table. For example
  users within Greece will find Greek letters in row 040, the equivalent
  Latin letters they use for transliteration in row 032, and some
  symbols they use in row 034.

  It will be trivially easy to adapt any equipment designed for the
  Japanese or Chinese scripts to provide all the characters of the basic
  multi-lingual plane. Therefore it is expected that suitable
  cost-effective equipment will become readily available.

  The feature of fixed length coding, especially in the two-octet
  mode-of-use, will make this code very easy to use in high-level
  programming languages and other software as employed for OSI and ODA.


  Hugh McG Ross, editor.                        Revised  Oct.  1988




  FIGURE 1    ISO 10646  Structure of the basic multi-lingual plane


        /   /                      /  /                       /
  Row. /000/032   Cell-octet   126/  /160                 255/
  oct.+-----------------------------------------------------+
   000|                                                     |
      |  +-----------------------+  +-----------------------+
   032|  |   Latin script for    |  | European languages    | \
   033|  |   ISO 8859-1 and -2   |  | and ISO 6937-2        |  \
      |  +-----------------------+  +-----------------------+   \
   034|  |   Extended symbols    |  | from ISO 8879         |    \
      |  +-----------------------+  +-----------------------+     \
   035|  |   Extended Latin      |  | script for            |      \
      |  |     all world         |  | languages             |       \
      |  +-----------------------+  +-----------------------+        \
   037|  |   Special African and |  | phonetic letters      |         \
      |  +-----------------------+  +-----------------------+ Alphabetic
   038|  |   Cyrillic script for |  | major languages       |
      |  |     Cyrillic for all  |  | minority languages    |   scripts
      |  +-----------------------+  +-----------------------+          /
   040|  |       Greek script    |  | for all               |         /
      |  |          forms of     |  |  writing              |        /
      |  +-----------------------+  +-----------------------+       /
   042|  |   Arabic script for   |  | all languages         |      /
      |  +-----------------------+  +-----------------------+     /
   043|  |            Hebrew     |  | script                |    /
      |  +-----------------------+  +-----------------------+   /
   044|  |             Other     |  | scripts               |  /
      |  |                       |  |                       | /
      |  +-----------------------+  +-----------------------+
   048|  |     Japanese          |  |    Special Purpose    | Ideographs
      |  |     JIS X 0208        |  |                       |
   126|  |                       |  |                       |
      |  +-----------------------+  +-----------------------+
      |                                                     |
      |  +-----------------------+  +-----------------------+ \
   160|  |                       |  |                       |  \
      |  |             Indian    |  | scripts               |   \
      |  |                       |  |                       |
      |  +-----------------------+  +-----------------------+ Alphabetic
      |  |         Mathematical  |  | symbols               |   /
      |  +-----------------------+  +-----------------------+  /
      |  |           Oriental    |  | scripts               | /
      |  +-----------------------+  +-----------------------+
   176|  |      Chinese          |  |    Korean             | Ideographs
      |  |      GB 2312          |  |   KS C 5601           |
   255|  |                       |  |                       |
      +--+-----------------------+--+-----------------------+

24-Mar-89 17:35:49-GMT,2564;000000000001
Return-Path: <fdc>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA02622; Fri, 24 Mar 89 12:30:49 EST
Date: Fri, 24 Mar 1989 12:30:49 EST
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: ISO/Kermit Discussion Group <isokermit@watsun.cc.columbia.edu>
Subject: New ISO/Kermit Mailing List
Message-Id: <CMM.0.88.606763849.fdc@watsun.cc.columbia.edu>

A new mailing list has been set up at Columbia for the benefit of those who
are interested in taking part in the discussion of adding mechanisms for
transfer of multi-language text to the Kermit file transfer protocol.  You
can send e-mail to everyone in this group by using any of the following
addresses:

    isokermit@watsun.cc.columbia.edu
    isokermit@cunixc.cc.columbia.edu
    ISOKERM@CUVMA.BITNET

(It might take a few days for the BITNET/EARN address to be established but
the others are working now.)  The isokermit mailing list currently has the
following members:

    Christine M Gianone <cmg@watsun.cc.columbia.edu>
    Joe Doupnik <jrd@usu.bitnet>
    Paul Placeway <paul@cis.ohio-state.edu>
    Andre Pirard <A-PIRARD@BLIULG11.BITNET>
    Gisbert W. Selke <RECK@DBNUAMA1.BITNET>
    Ken-ichiro Murakami <MURAKAMI%NTT-20.NTT.JP@RELAY.CS.NET>
    Hirofumi Fujii <KEIBUN@JPNKEKVM.BITNET>
    Kohichi Nishimoto <s153380%tkov02.DEC@decwrl.dec.com>
    Dvorah <DVORAH@HUJIAGRI.BITNET>
    Baruch Cochavy <baruchc@techunix.bitnet>
    Johan Van Wingen <MOSGLA@HLERUL2.BITNET>
    Kurt Enulf <UPSKE@SEGUC11.BITNET>
    Jacob Palme <jacob_palme_qz@QZCOM.BITNET>
    Per Lindberg <Per_Lindberg_ZQ@QZCOM.BITNET>
    Frithjov Iversen <FI@NORUNIT.BITNET>
    "Bj|rn Larsen" <x_larsen_b%use.uio.uninett@TOR.NTA.NO>
    "Hans A. ]lien" <hans%ifi.uio.no@TOR.NTA.NO>
    Steve Jenkins <pdsoft%uk.ac.lancs.cent1@nss.cs.ucl.ac.uk>
    Jean Dutertre <dutertre%padis1.DEC@decwrl.dec.com>
    Gerard Gaye <GAYE@FRSAC11.BITNET>
    David Guerlet <KERMIT@CZHETH5A.BITNET>
    Volker Edelhoff <edelhoff@UNIDO.BITNET>
    Frank da Cruz <fdc@watsun.cc.columbia.edu>

This group was selected to provide a cross-section of our international
Kermit correspondents, and also to get some expertise from active members
of the ISO8859 discussion group.  Once the members of this group have
come to a reasonable consensus, the proposal will be placed before the
(much) larger Kermit and ISO8859 discussion groups.

Please respond to this message, so we will know if the mailing list is
working.  Let us know if you want to be removed, or if you know of someone
else who should be added.  Thanks!

- Christine and Frank

25-Mar-89  2:58:09-GMT,1696;000000000001
Return-Path: <MAILER@cuvmb.cc.columbia.edu>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA08103; Fri, 24 Mar 89 21:58:08 EST
Resent-Message-Id: <8903250258.AA08103@watsun.cc.columbia.edu>
Message-Id: <8903250258.AA08103@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA10105; Fri, 24 Mar 89 21:57:59 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 2491; Fri, 24 Mar 89 21:53:49 EST
Received: by CUVMB (Mailer X1.25) id 3215; Fri, 24 Mar 89 21:53:48 EST
Received: from JPNKEKVM by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 3214; Fri, 24 Mar 89 21:53:47 EST
Received: by JPNKEKVM (Mailer R2.02) id 4600; Sat, 25 Mar 89 11:57:14 JST
Date:         SAT, 25 MAR 89 11:56:41 JST
From: Hirofumi Fujii <KEIBUN%JPNKEKVM@cuvmb.cc.columbia.edu>
Subject:      Kermit 8th bit quoting
To: Joe Doupnik <JRD@usu>, Frank da Cruz <FDCCU@cuvmb.cc.columbia.edu>,
        Ken-Ichirou Murakami <murakami%ntt-20.ntt.jp@jpntsuku>
Resent-Date: Fri, 24 Mar 89 21:53:47 EST
Resent-From: Network Mailer <MAILER@cuvmb.cc.columbia.edu>
Resent-To: fdc@cunixc.cc.columbia.edu


Sorry, I did not know that the 8th-bit quoting of Kermit is OPTIONAL.

Please give me a few days to consider the transfer method for Japanese text
files in a 7-bit environment (of course it is possible if we use ISO-2022
for the 7-bit environment... but I want to check the efficiency etc.).

                                         Hirofumi Fujii
                            National Laboratory for High Energy Physics
                                             (KEK)

26-Mar-89 22:19:24-GMT,3916;000000000001
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA00995; Sun, 26 Mar 89 17:14:32 EST
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA27610; Sun, 26 Mar 89 17:14:19 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 2782; Sun, 26 Mar 89 17:10:11 EST
Received: from techunix.bitnet (BARUCHC) by CUVMB.CC.COLUMBIA.EDU (Mailer
 X1.25) with BSMTP id 4155; Sun, 26 Mar 89 17:10:09 EST
Return-Path: <baruchc%techunix.bitnet@CUVMB.COLUMBIA.EDU>
Date: Mon, 27 Mar 89 00:13:26 +0200
From: Baruch Cochavy <baruchc%techunix.bitnet@cuvmb.cc.columbia.edu>
Comments:  Domain style address is "baruchc@techunix.technion.ac.il"
Message-Id: <8903262213.AA26074@techunix.bitnet>
To: isokermit@cunixc.cc.columbia.edu
Subject: ISO/Kermit - some remarks.

Here is a comment I had sent to Joe Doupnik and his reply.

Hello Joe,

        I mail this query to you and not to the list, since my view would
be a bit controversial.

        First, let me see if I get the situation right:

     1. We all have local file formats.
     2. We wish there was a way Kermit could support local file format
        exchange.
     3. So, ISO (whatever ..) is considered as a common ground.

        Now, this all means that the local Kermit would have to have some
knowledge of the local file format and local character set
representation. Since no common file structure or data representation
exists, this means that each and every Kermit would have to know not
only local file formats, but also remote file formats.
        Take MS-Kermit and C-Kermit, for example. If I transfer an
Alef-Bet (a Hebrew word processor) file, then *both* the sender and the
receiver must be aware of the file format and representation, for this
file represents Hebrew characters in the range 80h-9Ah, per IBM code page
972. Otherwise, the receiving side would have to store the data in some
common format and representation, hoping that it would be usable on its side.
        Now, given the number of different file formats available, and
local data representation, I can see no way we can produce anything
meaningful.

        I'm sure I got things wrong somewhere along the line. Could you
please enlighten me ?

        Many thanks,
        Baruch Cochavy

        baruchc@techunix.BITNET
        baruchc@techunix.technion.ac.il

>From jrd@usu.bitnet Sun Mar 26 23:31:35 1989

Baruch,
        Yes, you are quite correct. If there are zillions of local file
formats then the local Kermit would need to know about some of them, on
a local basis. The idea is to convert them, as much as possible, to some
"standard" format for transmission. Needless to say, we both agree that
the local formats are application-specific and, unless there is a
simple-minded way of writing a filter program for use at run time, the
Kermits would be, ah, rather large!
        I'll forward one of my messages to Frank on this particular item. We
see eye to eye on this one.
        Columbia's view is, I think: yes, this is true right now. However, in
the near future vendors might shift to the "standard" forms and the Kermit
project will be one of the forces being applied. In addition, local filter
programs could be written as standalone items, to convert applications output
to ISO XXXX or another acceptable format.
        The terminal emulation part is not bad for many situations, but even
that does not accommodate some programs now in existence (such as Alef-Bet).
        You may want to repeat your comments for the group since the
consequences are substantial.
        Meanwhile, I'm preparing to add ISO ???? support in the terminal
emulator as an extension of the current material (won't lose anything).
I want to make any new mechanism more pleasant for your situation at the
same time.
        Thanks for the comment,
        Joe D.

-----


From fdc@watsun.cc.columbia.edu  Sun Mar 26 17:38:32 1989
Return-Path: <fdc@watsun.cc.columbia.edu>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA01322; Sun, 26 Mar 89 17:38:32 EST
Received: from watsun.cc.columbia.edu by cunixc.cc.columbia.edu (5.54/5.10) id AA28224; Sun, 26 Mar 89 17:38:20 EST
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA01319; Sun, 26 Mar 89 17:38:27 EST
Date: Sun, 26 Mar 1989 17:38:26 EST
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: Baruch Cochavy <baruchc%techunix.bitnet@cuvmb.cc.columbia.edu>
Cc: isokermit@cunixc.cc.columbia.edu
Subject: Re: ISO/Kermit - some remarks.
In-Reply-To: Your message of Mon, 27 Mar 89 00:13:26 +0200
Message-Id: <CMM.0.88.606955106.fdc@watsun.cc.columbia.edu>

In response to Baruch's message...  We had hoped it would be clearer that we
are proposing a "common intermediate representation" or "transfer syntax" for
transfer of multi-language (multi-alphabet, multi-character-set) text files.
Therefore, any particular Kermit program will only have to know one or more
file formats for its own local computer, plus the standard Kermit transfer
syntax.  No Kermit program will have to know another computer's file formats.
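
The point can be sketched with a pair of table-driven conversions: each side only translates between its own local character set and the agreed transfer set. In the sketch below the function names are invented, and the choice of Python's "cp862" codec (an IBM PC Hebrew code page that places the Hebrew letters at 80h-9Ah) as the local set and "iso8859_8" as the transfer set is an assumption made for illustration; the actual transfer syntax is still under discussion.

```python
def local_to_transfer(data: bytes, local_charset: str, transfer_charset: str) -> bytes:
    """Sender side: re-encode a local text file into the transfer character set."""
    return data.decode(local_charset).encode(transfer_charset)


def transfer_to_local(data: bytes, transfer_charset: str, local_charset: str) -> bytes:
    """Receiver side: re-encode the transfer character set into the local one."""
    return data.decode(transfer_charset).encode(local_charset)


# Alef and Tav as stored on the PC (80h and 9Ah in the assumed code page).
pc_text = bytes([0x80, 0x9A])
on_the_wire = local_to_transfer(pc_text, "cp862", "iso8859_8")
print(on_the_wire.hex())  # e0fa
print(transfer_to_local(on_the_wire, "iso8859_8", "cp862") == pc_text)  # True
```

With N local character sets this needs N conversions to and from the transfer set, rather than one conversion for every pair of machines.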

The situation for terminal emulation is obviously a little bit trickier.
However, terminal emulation is not part of the Kermit file transfer protocol,
but rather a feature of many Kermit programs.  It is presumed (and hoped) that
any PC-based Kermit program that supports the new multi-language text file
transfer syntax (whatever it finally turns out to be) will also support
terminal emulation in the same languages.  - Chris & Frank

26-Mar-89 23:48:48-GMT,3216;000000000001
Return-Path: <fdc>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA01645; Sun, 26 Mar 89 18:39:46 EST
Date: Sun, 26 Mar 1989 18:39:45 EST
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: protocols@rutgers.edu
Subject: Multi-alphabet text files
Message-Id: <CMM.0.88.606958785.fdc@watsun.cc.columbia.edu>

We are looking for information on any standards -- corporate, de-facto,
national, international, or lack of any of these -- for storage (as opposed to
transmission) of textual data that contains a mixture of alphabets, for
example, Roman, Hebrew, Arabic, Cyrillic, Greek, Japanese, Chinese, Korean,
Cherokee, ...

Commonly-used computer alphabets today include the well-known 7-bit US ASCII
and its "national" variations (UK ASCII, ISO-646 with various national
characters substituted for ASCII brackets, etc), the ISO 8859 family of 8-bit
alphabets (Latin 1-5, Cyrillic, Hebrew, Arabic, Greek, etc), the several
Japanese alphabets (JIS X 0201, JIS X 0208, etc), and so on.

For transmission of text composed of more than one alphabet, we convert from
local storage conventions to the international standard alphabets (e.g. ISO
or JIS) and then use the mechanisms and escape sequences defined in ISO 4873
and ISO 2022 (or JIS X 0202) for switching between them.

But for storing mixed-alphabet text within a computer file, what do we have?
We have the "corporate standard" alphabets, such as the EBCDIC and ASCII "code
pages" used on IBM mainframes and PCs, DEC Kanji, the Xerox character sets,
the Macintosh character sets, and so on...  Does anyone know anything about
"8-bit UNIX" -- the extension of UNIX to languages other than English?  How
about national versions of VAX/VMS, like French, German, or Hebrew VMS?

Is it true that most multi-language text files are those created by word
processing programs, and are therefore in special proprietary or private
formats, which include not only mechanisms for alphabet switching, but also
special effects like font selection, highlighting, page formatting, etc?
What are some popular multi-language word processing programs (for the PC,
PS/2, Macintosh, etc), and what do their file formats look like?  How
difficult is it to separate the alphabet selection from the page formatting?

This query is connected with an effort to extend the Kermit file transfer
protocol to include a transfer syntax for multi-language text.  This transfer
syntax will probably wind up using the ISO 4873 and 2022 mechanisms for
switching among ISO 8859 alphabets, with similar mechanisms applied to
Japanese and other multi-byte character sets.  Meanwhile, real-world examples
of multi-language file formats are needed to test the proposed (and evolving)
Kermit file transfer syntax against.

Please respond to any of the following addresses:

  cmg@watsun.cc.columbia.edu
  KERMIT@CUVMA.BITNET
  fdc@watsun.cc.columbia.edu
  FDCCU@CUVMA.BITNET

If you are interested in participating in the ensuing discussion, also ask
to be added to the "isokermit" mailing list.  Thanks for your help!

  Christine Gianone                   Frank da Cruz
  cmg@watsun.cc.columbia.edu          fdc@watsun.cc.columbia.edu
  KERMIT@CUVMA.BITNET                 FDCCU@CUVMA.BITNET

27-Mar-89 12:25:13-GMT,1925;000000000001
Return-Path: <campbell@redsox.bsw.com>
Received: from rutgers.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA03044; Mon, 27 Mar 89 07:25:11 EST
Received: from think.UUCP by rutgers.edu (5.59/SMI4.0/RU1.1/3.04) with UUCP 
	id AA27232; Mon, 27 Mar 89 07:25:00 EST
Received: by news.think.com; Mon, 27 Mar 89 06:53:59 EST
Received: by redsox.bsw.com (smail2.5)
	id AA13748; 27 Mar 89 06:37:51 EST (Mon)
Received: by redsox.bsw.com (5.51/smail2.5/09-10-88)
	id AA13744; Mon, 27 Mar 89 06:37:50 EST
Date: Mon, 27 Mar 89 06:37:50 EST
From: campbell@redsox.bsw.com (Larry Campbell)
Message-Id: <8903271137.AA13744@redsox.bsw.com>
To: fdc@watsun.cc.columbia.edu
Subject: Re: Multi-alphabet text files
Newsgroups: comp.protocols.misc
In-Reply-To: <CMM.0.88.606958785.fdc@watsun.cc.columbia.edu>
Organization: The Boston Software Works, Inc.

You didn't mention T.61.  That's what we use in our email gateway products
(Wang/DEC/UNIX), mainly because it's specified in X.400.  We would have
preferred ISO 8859 because it's simpler, but the ISO document specifically
says ISO 8859 is *not* to be used in any "CCITT telematic application".
T.61 does have the advantage of allowing you to apply diacritic marks
to *any* character, but the disadvantage that characters with diacritics
take two bytes, so when you translate from a typical character set into
T.61 the output can be longer than the input.
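
The growth in size is easy to demonstrate. The sketch below uses Unicode decomposition from Python's standard unicodedata module as a stand-in for a real T.61 table, so the code values are not those of T.61; only the structure (a non-spacing diacritic sent ahead of its base letter) and the resulting length are meant to be illustrative. The function name is invented for the example.

```python
import unicodedata


def diacritic_first(text: str) -> str:
    """Split each accented letter into diacritic mark(s) followed by the base letter."""
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        out.append(marks + base)  # the mark travels ahead of the base letter
    return "".join(out)


word = "café"
print(len(word))                   # 4
print(len(diacritic_first(word)))  # 5
```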

I'm not familiar with PC word processors, but in the DEC and Wang word
processors with which I am familiar, alphabet selection is completely
orthogonal (as it should be) to font and style selection.

Anyway, I have a considerable interest in this whole question, so I would
appreciate being included in any mailing list you might construct.
-- 
Larry Campbell                          The Boston Software Works, Inc.
campbell@bsw.com                        120 Fulton Street
wjh12!redsox!campbell                   Boston, MA 02146

28-Mar-89  6:10:14-GMT,5077;000000000401
Return-Path: <@cuvmb.cc.columbia.edu:KEIBUN@JPNKEKVM.BITNET>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA08095; Tue, 28 Mar 89 01:10:13 EST
Message-Id: <8903280610.AA08095@watsun.cc.columbia.edu>
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 3343; Tue, 28 Mar 89 01:05:52 EST
Received: from JPNKEKVM by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6085; Tue, 28 Mar 89 01:05:50 EST
Received: by JPNKEKVM (Mailer R2.02) id 1574; Tue, 28 Mar 89 15:02:28 JST
Date:         TUE, 28 MAR 89 15:01:22 JST
From: Hirofumi Fujii <KEIBUN%JPNKEKVM@cuvmb.cc.columbia.edu>
Subject:      Re: ISO / Kermit
To: Frank da Cruz <fdc@watsun.cc.columbia.edu>
In-Reply-To:  Your message of Mon, 27 Mar 1989 10:48:23 EST

Dear Frank

Yes, I agree to the 8bit proposal.
In an 8-bit environment, it is obvious that 8-bit data transfer is more
efficient than 7-bit transfer.
However, as Joe mentioned, it is also true that in a 7-bit environment,
Kermit's 8-bit quoting mechanism may not be efficient (it is equivalent to
the single shift of ISO-2022 at best).  For Japanese, the locking shift
mechanism (or SI/SO) is more efficient than single shift (8-bit quoting)
because characters appear in word units in most cases.

So, how about the following?
-----------------(beginning of my proposal)----------------------

  Instead of 'I8' of the original proposal, use
    I<protocol>
  where <protocol> is the final character used in the announcer of the
  ISO-2022.  For example,

                <protocol>                   Meaning
                     A                  Only G0 is used.  All characters are
                                        mapped into left half.  Shift mechanism
                                        is not used.

                     B                  Map both G0 and G1 into the left half.
                                        SI and SO are used to switch between
                                        G0 and G1.
                                        (original proposal).

                     D                  In 7bit environment, map both G0 and G1
                                        into GL.  SI and SO are used to switch
                                        between G0 and G1.
                                        In 8bit environment, both GL (left
                                        half) and GR (right half) are used.
                                        Locking shift is not used.

                  etc.,

           in addition to the above

                     8 (or something)   Full ISO-2022 is used. (G0, G1, G2,
                                        G3, Locking-shift, Single-shift etc.)

                     X (or something)   ISO-10646 ?!

                  etc., etc., etc.

  (note)
     'A', 'B' and 'D' are the final characters of the announcer, <ESC><SP>F.

The above transfer protocol is initiated by the sender.  If the receiving
Kermit does not support the protocol requested by the sender, but supports
one of the others above, it returns a Y packet with that protocol as its
data.  If the receiving Kermit does not support any of the above protocols,
it returns an N packet.

A Kermit which supports international character sets, MUST SUPPORT AT LEAST
PROTOCOL 'D'.
-----------------------( end of my proposal )----------------------
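
As a rough illustration of the negotiation just described, the receiver's
choice might be sketched as below.  This is only a sketch under stated
assumptions: the function name, the tuple it returns, and the rule of
preferring 'D' when making a counter-offer are all hypothetical; the proposal
itself says only that the receiver answers with a Y packet naming a protocol
it supports, or with an N packet.

```python
# Hypothetical sketch of the receiver's reply; names are illustrative only.

def receiver_reply(requested, supported):
    """Return (packet_type, data) for the receiver's response.

    requested -- protocol letter asked for by the sender, e.g. "B"
    supported -- set of protocol letters this receiver implements
    """
    if requested in supported:
        # The receiver can do exactly what the sender asked for.
        return ("Y", requested)
    if supported:
        # Counter-offer.  Preferring "D" is an assumption, chosen because the
        # proposal makes "D" the mandatory minimum; otherwise pick any one.
        choice = "D" if "D" in supported else sorted(supported)[0]
        return ("Y", choice)
    # No international character set protocol is supported at all.
    return ("N", "")
```

For example, a receiver that implements only 'A' and 'D' and is asked for 'B'
would answer with a Y packet carrying 'D'.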

Locking-shift vs Single-shift
-----------------------------
In the case of Japanese, if the communication line is 7-bit, it is
more efficient to use 'B' even if Kermit supports 8-bit quoting, because
many of the characters appear in word units, i.e., locking shift
(or SI/SO) is more efficient than single shift (8-bit quoting).

G2 and G3 character sets
------------------------
It is also better to use G2 or G3 for Japanese.  However, that requires a
more complicated shift mechanism, so I think it should be optional.  Actually,
I have checked my Japanese mail, and found that the Katakana character set
(Hankaku katakana) is used very rarely.  Therefore, I think G0 and G1
are enough in many cases.

ISO-10646
---------
I have no opinion about ISO-10646.  The above 'D' in an 8-bit environment
uses both GL and GR, and does not use the locking-shift mechanism.  This is
very similar to ISO-10646.  The only difference is that ISO-10646
is a multi-byte code.  Therefore, I think it is easy to add ISO-10646
support within the above scheme.  This is one of the reasons
why I chose the 'D' protocol as the default.

Terminal Emulator
-----------------
I think that the Kermit protocol does not need to say anything about the
terminal emulator.  It is machine dependent and can be handled within the
local routines.
Actually, my Kermit (MSVP98) already has ISO-2022 features (it supports the
G0, G1, G2 and G3 character sets, and all locking-shift and single-shift
mechanisms) within the scheme of MS-Kermit.  I have not modified any
machine-independent routines of MS-Kermit.  Joe has separated the Kermit
modules very nicely and clearly.

--------------
Hirofumi Fujii
National Laboratory for High Energy Physics (KEK)
KEIBUN@JPNKEKVM.BITNET

From @cuvmb.cc.columbia.edu:A-PIRARD@BLIULG11.BITNET  Thu Mar 30 10:25:51 1989
Return-Path: <@cuvmb.cc.columbia.edu:A-PIRARD@BLIULG11.BITNET>
Received: from columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA18213; Thu, 30 Mar 89 10:25:51 EST
Received: from cuvmb.cc.columbia.edu by columbia.edu (5.59++/0.3) with SMTP 
	id AA16472; Thu, 30 Mar 89 10:24:07 EST
Resent-Message-Id: <8903301524.AA16472@columbia.edu>
Message-Id: <8903301524.AA16472@columbia.edu>
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 4455; Thu, 30 Mar 89 10:19:38 EST
Received: from VM1.EARN-ULG.AC.BE by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with
 BSMTP id 0380; Thu, 30 Mar 89 10:19:36 EST
Received: by BLIULG11 (Mailer R2.03B) id 0757; Thu, 30 Mar 89 17:21:37 +0200
Resent-Date:  Thu, 30 Mar 89 17:16:27 +0200
Resent-From: Andre' PIRARD <A-PIRARD%BLIULG11@cuvmb.cc.columbia.edu>
Resent-To: ISO/Kermit Discussion Group <ISOKERMIT@watsun.cc.columbia.edu>
Date:         Fri, 24 Mar 89 14:37:42 +0100
From: Andre' PIRARD <A-PIRARD%BLIULG11@cuvmb.cc.columbia.edu>
Subject:      Re: ISO/Kermit 7 vs 8 bit transfer syntax
To: Frank da Cruz <fdc@watsun.cc.columbia.edu>, Joe Doupnik <JRD@usu>
In-Reply-To:  Message of Thu, 23 Mar 1989 10:24:16 EST from
 <fdc@watsun.cc.columbia.edu>

Well, I am back to the network on and off and I see a mailing list setup.
I am not sure this message I sent before being disconnected made it through.
It is the test reply asked for anyway.
Andre.
----------------------------Original message----------------------------

     Despite my being in a desolated empty room because of our
moving, I interleave some comments with packing as long as I am
still able to use the network.

     You will find below a document I find essential to our
discussion. Thanks to Johan van Wingen for the information. It is
an answer to my first note: "how would systems store multilingual
character sets". In other words, "what would Kermit have to
transfer". I see this ISO 10646 project as responding to the
fundamental needs of leading software developers with
international scope and I guess they must be longing for that.
As we say, "Il n'y a pas de fumée sans feu".
The task of Kermit would be much easier on these more efficient
grounds (I guess translating between the 4 "forms-of-use" and
between them and single-byte ISO 8859 versions).

     But I think that, despite their evident lack of performance,
the present standards will continue to hold for terminal mode
tied to 7-bit or 8-bit lines because they keep in line with
present hardware in working by "adding more fonts". It's probably
a thing to do on machines with a graphic screen even though many
users will be satisfied with a single ISO version. But for that
reason, the hosts driving the terminal might forget to specify
which version it uses and the default one should be customizable.

     I hope this unifying standard will come very soon to solve
those intricate problems of Asian languages. The fundamental
question is whether to wait or do something in the meantime that
could have to be erased in the end? These Asian people have a
strong vote weight. My own testimony is that we will use our
single ISO version while waiting and are interested by a mere
hidden byte-to-byte translation capability in file transfer in
addition to the terminal mode extensions.

     Hear you all in one week time.

André.

Date:         Fri, 10 Feb 89 15:27:00 CET
From:         Johan van Wingen <MOSGLA@HLERUL2>
Subject:      Informal Introduction to ISO 10646
To:           Andre' Pirard <A-PIRARD@BLIULG11>



  INTERNATIONAL ORGANIZATION FOR STANDARDIZATION    ISO/IEC JTC1/SC2/WG2
  INTERNATIONAL ELECTROTECHNICAL COMMISSION                       N 274

  Joint Technical Committee 1
  Subcommittee 2 Characters and Information Coding, Working Group 2


  ======================================================================
  Introduction to ISO 10646 - Multiple-Octet Coded Character Set
  ======================================================================

  A new standard is being developed within Working Group 2 of ISO/IEC
  JTC1/SC2 for the multiple-octet coded character set. Formal drafts
  will be issued during 1989.

  Its purpose is to provide a single character code which will permit
  the written form of all present-day languages throughout the world to
  be used within computers, to be processed and interchanged. All types
  of text written in character form will be provided for, from simple
  commercial documents to publication of technical reports etc. Also the
  bibliographic requirements of librarians will be met.

  The structure of the whole code may be illustrated thus, with an octet
  of bits for each dimension:


                                           +-------------------+
                                      +-------------------+    |
                                 +-------------------+    |    |
                            +-------------------+    |    |    |
     Plane             +-------------------+    |    |    |    |
    /             +-------------------+    |    |    |    |    |
   /         +-------------------+    |    |    |    |    |    |
  +-->  +-------------------+    |    |    |    |    |    |    |
  |Cell |                   |    |    |    |    |    |    |    |
  |     |  +------+  +------+    |    |    |    |    |    |    |
  V     |  |  A00 |  |  A01 |    |    |    |    |    |    |    |
  Row   |  +------+  +------+    |    |    |    |    |    |    |
        |  |      |  |      |    |    |    |    |    |    |    |
        |  |  J1  |  |  --  |    |    |    |    |    |    |    |
        |  |      |  |      |    |    |    |    |    |    |    |
        |  +------+  +------+    |    |    |    |    |    |    |
        |                   |    |    |    |    |    |    |----+
        |  +------+  +------+    |    |    |    |    |----+
        |  |  A10 |  |  A11 |    |    |    |    |----+ (future
        |  +------+  +------+    |    |    |----+   standardization)
        |  |      |  |      |    |    |----+ (Korean)
        |  |  C1  |  |  K1  |    |----+ (Japanese)
        |  |      |  |      |----+ (Chinese)
        +--+------+--+------+ (bibliographic)

    Basic multi-lingual plane                  Supplementary planes


  The basic multi-lingual plane will contain four segments for graphic
  characters, each holding 96 * 96 characters.

  Each segment will be divided into two zones: an alphabetic zone of
  16 * 96 characters, and another zone either for the most-frequently
  used characters of the Chinese, Japanese and Korean ideographic
  scripts, or for certain special purposes.

  The shaded area outside the graphic quadrants will be used for control
  functions. All those of ISO 6429, ISO 6937 and ISO 8613 will be
  available, with the same coding.

  The supplementary planes will accommodate characters that overflow from
  the basic multi-lingual plane.

  A coded character anywhere in the code may be uniquely identified by
  means of three octets:

   m-s  +--------------+--------------+--------------+  l-s
        | Plane-octet  | Row-octet    | Cell-octet   |
        +--------------+--------------+--------------+

    NOTE: Sequences of characters run horizontally along the rows, not
          vertically as in previous code tables.

  The code may be used in different forms-of-use:

    a) A four-octet form, in which the three octets for the character
       are preceded by one for systems use. Three octet coding will
       never be used.

    b) A two-octet form, restricted exclusively to a single plane.
       Especially for users with alphabetic scripts, this will
       accommodate probably 99% of their applications.

    c) A two-octet form with extension using occasional four-octets.

    d) A compacted form, permitting strings of related characters to be
       used as single-octets.
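
  A minimal sketch of forms (a) and (b) follows, purely to make the octet
  layout concrete.  Everything in it beyond the octet order is an
  assumption: the function names are hypothetical, the value 0 used for the
  leading "systems use" octet is a placeholder (the note does not say what
  that octet contains), and the plane, row and cell values in the example
  are arbitrary.  Forms (c) and (d) are not attempted, since the note gives
  no detail on them.

```python
# Hypothetical sketch of the four-octet and two-octet forms-of-use.

def _check_octet(name, value):
    """Reject values that do not fit in a single octet."""
    if not 0 <= value <= 255:
        raise ValueError("%s must fit in one octet: %r" % (name, value))

def four_octet_form(plane, row, cell, system_octet=0):
    """Form (a): one systems-use octet, then plane, row and cell octets.

    system_octet=0 is a placeholder, not a value taken from the standard.
    """
    fields = (("system", system_octet), ("plane", plane),
              ("row", row), ("cell", cell))
    for name, value in fields:
        _check_octet(name, value)
    return bytes([system_octet, plane, row, cell])

def two_octet_form(row, cell):
    """Form (b): row and cell only; the plane is implied by context."""
    _check_octet("row", row)
    _check_octet("cell", cell)
    return bytes([row, cell])
```

  With these definitions, four_octet_form(32, 40, 65) gives the four octets
  0, 32, 40, 65, and two_octet_form(40, 65) gives the two octets 40, 65.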

  The basic multi-lingual plane is being designed to permit easy
  inter-working with existing 8-bit codes. Generally, conversion will be
  by the table look-up technique; however, conversion with ISO 8859
  parts 1,2,5,6,7,8 may use a simple algorithm.

  All designation, invocation and shifting as in ISO 2022 will be
  avoided.

  It is considered that the consequent simplification of software,
  especially for generalized applications in the OSI environment, will
  make this code economically attractive despite the relatively
  extravagant use of bits.

  The layout of the basic multi-lingual plane may be illustrated in
  FIGURE 1 (next page), the axes being not drawn linearly.

    NOTE: The value of any octet is shown in simple decimal notation,
          e.g.  032, 255.

  The contents of any of the rows are set out in detailed code tables.
  These are drawn on a pro-forma which shows a complete row in twelve
  strips, each of 16 graphic characters.

  Because the code is designed to be used as a whole, especially the
  basic multi-lingual plane, no significance attaches to whether certain
  characters are in the left hand or right-hand halves of a row, or
  early or late in the code table.

  A character once included in the code table is not duplicated
  elsewhere. Therefore for any particular application characters will
  be taken from many different places in the code table. For example
  users within Greece will find Greek letters in row 040, the equivalent
  Latin letters they use for transliteration in row 032, and some
  symbols they use in row 034.

  It will be trivially easy to adapt any equipment designed for the
  Japanese or Chinese scripts to provide all the characters of the basic
  multi-lingual plane. Therefore it is expected that suitable
  cost-effective equipment will become readily available.

  The feature of fixed length coding, especially in the two-octet
  mode-of-use, will make this code very easy to use in high-level
  programming languages and other software as employed for OSI and ODA.


  Hugh McG Ross, editor.                        Revised  Oct.  1988




  FIGURE 1    ISO 10646  Structure of the basic multi-lingual plane


        /   /                      /  /                       /
  Row. /000/032   Cell-octet   126/  /160                 255/
  oct.+-----------------------------------------------------+
   000|                                                     |
      |  +-----------------------+  +-----------------------+
   032|  |   Latin script for    |  | European languages    | \
   033|  |   ISO 8859-1 and -2   |  | and ISO 6937-2        |  \
      |  +-----------------------+  +-----------------------+   \
   034|  |   Extended symbols    |  | from ISO 8879         |    \
      |  +-----------------------+  +-----------------------+     \
   035|  |   Extended Latin      |  | script for            |      \
      |  |     all world         |  | languages             |       \
      |  +-----------------------+  +-----------------------+        \
   037|  |   Special African and |  | phonetic letters      |         \
      |  +-----------------------+  +-----------------------+ Alphabetic
   038|  |   Cyrillic script for |  | major languages       |
      |  |     Cyrillic for all  |  | minority languages    |   scripts
      |  +-----------------------+  +-----------------------+          /
   040|  |       Greek script    |  | for all               |         /
      |  |          forms of     |  |  writing              |        /
      |  +-----------------------+  +-----------------------+       /
   042|  |   Arabic script for   |  | all languages         |      /
      |  +-----------------------+  +-----------------------+     /
   043|  |            Hebrew     |  | script                |    /
      |  +-----------------------+  +-----------------------+   /
   044|  |             Other     |  | scripts               |  /
      |  |                       |  |                       | /
      |  +-----------------------+  +-----------------------+
   048|  |     Japanese          |  |    Special Purpose    | Ideographs
      |  |     JIS X 0208        |  |                       |
   126|  |                       |  |                       |
      |  +-----------------------+  +-----------------------+
      |                                                     |
      |  +-----------------------+  +-----------------------+ \
   160|  |                       |  |                       |  \
      |  |             Indian    |  | scripts               |   \
      |  |                       |  |                       |
      |  +-----------------------+  +-----------------------+ Alphabetic
      |  |         Mathematical  |  | symbols               |   /
      |  +-----------------------+  +-----------------------+  /
      |  |           Oriental    |  | scripts               | /
      |  +-----------------------+  +-----------------------+
   176|  |      Chinese          |  |    Korean             | Ideographs
      |  |      GB 2312          |  |   KS C 5601           |
   255|  |                       |  |                       |
      +--+-----------------------+--+-----------------------+

30-Mar-89 22:51:43-GMT,18689;000000000001
Return-Path: <cmg>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA19916; Thu, 30 Mar 89 17:40:00 EST
Date: Thu, 30 Mar 1989 17:39:59 EST
From: Christine M Gianone <cmg@watsun.cc.columbia.edu>
To: isokermit
Subject: ISO/Kermit Proposal Draft #2, Part 1 of 4
Message-Id: <CMM.0.88.607300799.cmg@watsun.cc.columbia.edu>


	 A KERMIT PROTOCOL EXTENSION FOR INTERNATIONAL CHARACTER SETS

		     Christine Gianone and Frank da Cruz
	     Columbia University Center for Computing Activities
			    612 West 115th Street
			   New York, NY 10025, USA

				DRAFT NUMBER 2
				March 30, 1989

ABSTRACT

An extension to the Kermit file transfer protocol is proposed to allow
transfer of multi-language text files between unlike computer systems.  The
new transfer syntax uses the 8-bit character sets defined in the ISO 8859 and
similar standards, and mechanisms for switching among them defined in ISO
2022.  Japanese and other multi-byte character sets are handled by similar
mechanisms.


SUMMARY OF CHANGES SINCE DRAFT #1, March 1, 1989

 - Summary of current standards expanded and clarified.
 - Kermit file transfer syntax expanded to allow full range of ISO 2022
   mechanisms in both 7-bit and 8-bit environments.
 - ISO 2022 announcers added to attribute packet.
 - Incorporation of Japanese character sets.


ACKNOWLEDGEMENTS

Many thanks to these people for their helpful and constructive comments on
the first draft.  In most cases, their suggestions or the information they
provided have been incorporated into the second draft.

  John Chandler (Harvard/Smithsonian Center for Astrophysics, USA)
  Joe Doupnik (Utah State University, USA)
  Hirofumi Fujii (Japan National Laboratory of High Energy Physics)
  Ken-ichiro Murakami (Nippon Telephone and Telegraph Research Labs, Tokyo)
  Jacob Palme (Stockholm University, Sweden)
  Andre Pirard (University of Liege, Belgium)
  Paul Placeway (Ohio State University, USA)
  Gisbert W. Selke (University of Bonn, West Germany)
  Johan van Wingen (Leiden, Netherlands)


PREFACE

This is a DRAFT proposal, based upon a reading of current character-set
standards, some familiarity with the issues involved, and limited testing with
devices that claim to implement these standards (such as the DEC VT340
terminal).  Readers are urged to correct us if we have misinterpreted the
standards, to fill in missing information, and to make any comments or
criticisms they desire.  Readers with knowledge of real-world multi-alphabet
applications and file formats are especially urged to comment on the
suitability of this proposal.

THIS IS NOT A FINAL REVISION.  ANY AND ALL PROPOSED TECHNIQUES AND MECHANISMS
MAY CHANGE, BASED UPON FURTHER DISCUSSION.  In fact, this draft is still
quite rough and may contain inconsistencies and mistakes -- please point them
out!

Even after the draft has reached the "final" stage, it is fully expected that
changes will be necessary once programmers start to actually transform the
protocol description into working code.


INTRODUCTION

The Kermit file transfer protocol makes a distinction between text and binary
files, and it defines a particular transfer syntax for text files, namely
7-bit ASCII characters with carriage return and linefeed (CRLF) after each
line, so that text may be stored in useful fashion on any computer to which it
is transferred.  Each Kermit program knows how to translate from the local
text-file storage conventions to Kermit's transfer syntax, and vice versa.  In
this way, text files can be transferred between unlike systems (say, an EBCDIC
card-oriented system and an ASCII stream file system) and remain useful after
transfer.

Now that the world's computer users have begun to find US ASCII insufficient
for their uses, and standards organizations are adopting standard codes for
the world's other alphabets, and vendors like IBM, DEC, and Apple have begun
to make these characters available on their displays (albeit in different
positions), and people are beginning to produce increasing numbers of
multilingual documents, Kermit's text file transfer syntax must be extended to
allow for texts in a mixture of alphabets.

It is best if this can be done in line with currently existing and evolving
information interchange standards, including ANSI X3.4 (ASCII), ISO 646, ISO
4873, ISO 2022, ISO 8859, etc.  These and other standards which we believe to
be pertinent are listed in Appendix A.

To transfer text files containing a mixture of alphabets, we propose to treat
Kermit data transfer in the same manner as ISO 2022 treats a terminal-to-host
data transmission, by embedding specific escape sequences and control
characters in the data stream for the purposes of alphabet identification and
switching.  Any of the world's standard registered alphabets (Table 5) may be
included in this scheme, no matter whether they are single-byte codes (such as
ASCII or ISO Latin Alphabet 1) or multi-byte codes (such as Japanese Kanji),
and any number of alphabets may be used within a single text file.

This extension to the Kermit protocol will be called ISO-2022 Transfer Syntax.
Like all other Kermit protocol extensions, this one will be optional.  In a
Kermit program that supports ISO-2022 Transfer Syntax, commands will be
included for the user to enable and disable this feature.

When ISO-2022 Transfer Syntax is enabled, the sending Kermit program will
translate from the local storage formats and conventions for multi-language
text into ISO-2022 Transfer Syntax, and the receiving Kermit will translate
from the transfer syntax into its own local storage formats and conventions
(or it may elect not to do so).  Therefore, each Kermit program will have to
know only about the transfer syntax and its own computer's local formats.


WHY ISO 2022?

Many different multi-alphabet transfer syntaxes are imaginable.  Our aim has
been to settle on a single syntax that achieves the best balance among the
following requirements:

  1. The ability to represent any character in any coded character set
  2. The ability to uniquely identify each coded character set
  3. The ability to switch among different coded character sets
  4. The ability to work in both the 7-bit and 8-bit transmission environments
  5. Minimization of transmission overhead within the Kermit encoding scheme
  6. Compatibility with existing applicable standards
  7. Fairness to all nationalities

ISO 2022, when used in conjunction with the ISO 8859 and other registered
single-byte and multi-byte character sets, would seem to fulfill all the
listed requirements.  Any character in any registered character set (see
Appendix C) can be transmitted unambiguously, any number of character sets can
be used in a single transmission, mechanisms are available for both 7-bit and
8-bit transmission, and transmission overhead can be minimized by selection of
single and locking shifts.  And finally, the flexibility of ISO 2022 can
result in fair treatment for each alphabet.  For example, a language like
Russian can be transmitted efficiently in the 7-bit environment because
shifting between G0 and G1 is relatively infrequent, whereas a language like
French shifts very frequently between G0 and G1 to get at the accented
characters.  ISO 2022 allows locking shifts in the Russian case and single
shifts in the French case.
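
The trade-off can be made concrete by counting shift overhead for a run of
characters, each tagged with the set it comes from (0 for G0, 1 for G1).  The
sketch below is illustrative only: the function names are hypothetical, it
assumes G0 is the set in effect at the start, it charges one overhead
character per single shift and one per locking shift, and the two sample
sequences are invented stand-ins for Russian-like and French-like text rather
than real data.

```python
# Hypothetical cost model for the 7-bit environment; not part of the protocol.

def single_shift_overhead(sets):
    """One shift character is spent on every character taken from G1."""
    return sum(1 for s in sets if s == 1)

def locking_shift_overhead(sets):
    """One shift character is spent each time the active set changes."""
    overhead = 0
    current = 0  # assumption: G0 is in effect when the text begins
    for s in sets:
        if s != current:
            overhead += 1
            current = s
    return overhead

# Two six- and five-letter words from G1 separated by a space from G0.
russian_like = [1] * 6 + [0] + [1] * 5
# Mostly G0 letters with an occasional accented letter from G1.
french_like = [0, 0, 1, 0, 0, 0, 1, 0]

print(single_shift_overhead(russian_like), locking_shift_overhead(russian_like))
print(single_shift_overhead(french_like), locking_shift_overhead(french_like))
```

Under these assumptions the Russian-like sample costs 11 overhead characters
with single shifts but only 3 with locking shifts, while the French-like
sample costs 2 with single shifts and 4 with locking shifts.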

And since ISO 2022 allows for both single-byte and multi-byte character sets,
and because it has compatible counterparts in Asia, the same scheme can apply
to Asian character sets.  This will allow Kermit to transfer computer text
containing virtually any mixture of languages.


HOW THE STANDARDS WORK

ASCII and ISO 646 give us a 128-character 7-bit character set.  This set is
divided into two parts:

  1. 33 "control characters" (characters 0 through 31, and character 127).
  2. 95 "graphic characters" (32-126).

ISO 646 allows for national variations (explained later), but an International
Reference Version (IRV) is defined, which is identical to US ASCII except in
the appearance of the graphic used for character 36 ("$" in US ASCII and
currency sign in the IRV) and for character 126 (tilde "~" in US ASCII,
overline in the IRV).  "Graphics" means printing characters -- characters that
make ink appear on the page or phosphor glow on the screen (as opposed to
pixel- or line-oriented picture graphics).  The ASCII / IRV character set is
shown in Figure 1, arranged in a table of 16 rows and 8 columns.

_____________________________________________________________________________

      00  01  02  03  04  05  06  07
     +---+---+---+---+---+---+---+---+
  00 |NUL DLE| SP  0   @   P   `   p |
  01 |SOH DC1| !   1   A   Q   a   q |
  02 |STX DC2| "   2   B   R   b   r |
  03 |ETX DC3| #   3   C   S   c   s |
  04 |EOT DC4| $   4   D   T   d   t |
  05 |ENQ NAK| %   5   E   U   e   u |
  06 |ACK SYN| &   6   F   V   f   v |
  07 |BEL ETB| '   7   G   W   g   w |
  08 |BS  CAN| (   8   H   X   h   x |
  09 |HT  EM | )   9   I   Y   i   y |
  10 |LF  SUB| *   :   J   Z   j   z |
  11 |VT  ESC| +   ;   K   [   k   { |
  12 |FF  FS | ,   <   L   \   l   | |
  13 |CR  GS | -   =   M   ]   m   } |
  14 |SO  RS | .   >   N   ^   n   ~ |
  15 |SI  US | /   ?   O   _   o  DEL|
     +---+---+---+---+---+---+---+---+

  Figure 1: The ASCII / ISO-646 International
      Reference Version Character Set
_____________________________________________________________________________

Characters are often referred to by their column and row position in this type
of table.  For example, character 05/08 in Figure 1 is "X".  Columns 00-01,
plus character 07/15, comprise the control set.  Columns 02-07, minus
character 07/15, comprise the graphics.
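
The column/row notation is simply the character code split into its high and
low four bits.  A small sketch of the conversion, using hypothetical helper
names that do not come from any standard:

```python
# Hypothetical helpers for the column/row notation used in the figures.

def from_col_row(col, row):
    """Convert a column/row position such as 05/08 to a character code."""
    if not (0 <= col <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must each be in the range 0-15")
    return col * 16 + row

def to_col_row(code):
    """Convert a character code (0-255) to 'cc/rr' notation."""
    if not 0 <= code <= 255:
        raise ValueError("code must be in the range 0-255")
    return "%02d/%02d" % divmod(code, 16)

print(from_col_row(5, 8), chr(from_col_row(5, 8)))  # 88 X
print(to_col_row(127))                              # 07/15
```

So position 05/08 is code 88, which is "X", and code 127 (DEL) is position
07/15, in agreement with Figure 1.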

8-bit character sets are described in ISO 4873 and ANSI X3.41 (see Appendix
A).  An 8-bit character set has two sides.  Each side has a control set and a
graphics set.  The "left half" consists of the control set C0 and the graphics
set GL (Graphics Left).  GL has 94 characters, and corresponds to ASCII (and
ISO 646) positions 02/01-07/14.  SP (space) and DEL are not considered part of
GL.  All the characters in the left half have their high-order, or 8th, bit
set to zero, and are therefore representable in 7 bits.  The "right half"
consists of the control set C1 and the graphics set GR (Graphics Right).  All
characters in the right half have their 8th bits set to one.  Figure 2 shows
the layout of an 8-bit character set.

_____________________________________________________________________________

     <--C0--> <---------GL---------->  <--C1--> <---------GR---------->
       00  01  02  03  04  05  06  07    08  09  10  11  12  13  14  15
     +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
  00 |NUL DLE| SP  0   @   P   `   p | |    DCS|---+                   |
  01 |SOH DC1| !   1   A   Q   a   q | |    PU1|                       |
  02 |STX DC2| "   2   B   R   b   r | |    PU2|                       |
  03 |ETX DC3| #   3   C   S   c   s | |    STS|                       |
  04 |EOT DC4| $   4   D   T   d   t | |IND CCH|                       |
  05 |ENQ NAK| %   5   E   U   e   u | |NEL MW |                       |
  06 |ACK SYN| &   6   F   V   f   v | |SSA SPA|                       |
  07 |BEL ETB| '   7   G   W   g   w | |ESA EPA|                       |
  08 |BS  CAN| (   8   H   X   h   x | |HTS    |      (special         |
  09 |HT  EM | )   9   I   Y   i   y | |HTJ    |       graphics)       |
  10 |LF  SUB| *   :   J   Z   j   z | |VTS    |                       |
  11 |VT  ESC| +   ;   K   [   k   { | |PLD CSI|                       |
  12 |FF  FS | ,   <   L   \   l   | | |PLU ST |                       |
  13 |CR  GS | -   =   M   ]   m   } | |RI  OSC|                       |
  14 |SO  RS | .   >   N   ^   n   ~ | |SS2 PM |                       |
  15 |SI  US | /   ?   O   _   o  DEL| |SS3 APC|                   +---|
     +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
     <--C0--> <---------GL---------->  <--C1--> <---------GR---------->

		      Figure 2: An 8-Bit Character Set
_____________________________________________________________________________

GR character sets can have either 94 or 96 characters.  A 94-character GR set
begins in position 10/01 and ends in position 15/14, with Space (SP) occupying
position 10/00 and DEL in position 15/15, just like G0 (the corners shown in
GR in the diagram).  A 96-character set has graphic characters in all 96
positions, 10/00 through 15/15.
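
The layout in Figure 2 can be summarized as a classification of byte values.
The sketch below is only an illustration; the function name and the labels it
returns are assumptions, and it treats all of columns 10-15 as GR, which is
exact for a 96-character set (for a 94-character set, positions 10/00 and
15/15 are the corner positions discussed above).

```python
# Hypothetical classifier for the regions of an 8-bit character set.

def region(byte):
    """Name the region of Figure 2 that a byte value falls into."""
    if not 0 <= byte <= 255:
        raise ValueError("not an 8-bit value: %r" % (byte,))
    if byte < 0x20:
        return "C0"    # columns 00-01
    if byte == 0x20:
        return "SP"    # 02/00, not counted as part of GL
    if byte < 0x7F:
        return "GL"    # 02/01 through 07/14
    if byte == 0x7F:
        return "DEL"   # 07/15, not counted as part of GL
    if byte < 0xA0:
        return "C1"    # columns 08-09
    return "GR"        # columns 10-15

print(region(0x41), region(0x1B), region(0x9B), region(0xE9))
```

Here 0x41 (04/01) falls in GL, ESC (01/11) in C0, CSI (09/11) in C1, and 0xE9
(14/09) in GR.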

An 8-bit alphabet, therefore, has up to 94 + 96 = 190 graphic characters.
This number is sufficient to represent the characters in many of the world's
written languages, but not necessarily sufficient to represent all the graphic
symbols required in a given application, for instance a multi-language
document.

To represent a greater number of graphic characters, ISO 4873 defines four
"intermediate sets" of graphic characters, of either 94 or 96 characters each.
These are called G0, G1, G2, and G3.  The G0 set never has more than 94
graphic characters, and G1-G3 can have up to 96 each.  Therefore, counting the
two 32-character control sets C0 and C1 as well, there can be up to:

  (2 x 32) + 94 + (3 x 96) = 446

characters simultaneously within the repertoire of a given device.

These intermediate graphics sets are kept in tables in the memory of the
terminal or computer.  One of the intermediate sets is assigned to GL, and (in
the 8-bit communications environment) another may be assigned to GR.  When the
terminal or computer receives a data byte, the numeric value of its bits
denotes the position of the character in GL or GR.  For example, the byte
01000001 binary = 65 decimal = 04/01 = uppercase A in ASCII.  In the 8-bit
environment, any byte with its 8th bit set to zero is from GL, and any byte
with its 8th bit set to one is from GR.
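
The column/row notation and the GL/GR split just described can be sketched in
Python as follows.  This is only an illustration; the function names col_row
and region are ours and do not come from ISO 2022 or from any Kermit program.

```python
def col_row(byte):
    """Return the column/row notation (e.g. '04/01') for a byte value 0-255."""
    return "%02d/%02d" % (byte >> 4, byte & 0x0F)

def region(byte):
    """Classify a byte of an 8-bit code by the area of Figure 2 it falls in.

    Columns 0-1 are C0, 2-7 are GL, 8-9 are C1, and 10-15 are GR.
    """
    if byte < 0x20:
        return "C0"
    if byte < 0x80:
        return "GL"
    if byte < 0xA0:
        return "C1"
    return "GR"

print(col_row(65), region(65))        # uppercase A
print(col_row(0xE9), region(0xE9))    # a right-half character
```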

A language like English can be represented adequately in GL, because all the
required characters fit there.  When a language has more than 94 characters,
two techniques are used to represent all the characters:

  1. For Roman-alphabet languages, put ASCII (or the ISO-646 IRV) in GL and
     the special characters (like accented letters) in GR.  French and German
     are examples.

  2. For languages with many symbols (e.g. where a symbol is assigned
     to each word, rather than to each sound), represent each character
     with multiple bytes rather than one byte.  Japanese Kanji, for example,
     uses a 2-byte code.  A multibyte code may be assigned to G0, G1, G2, or
     G3, just like a single-byte code. 

So far we have a terminal or computer with an "active" GL/GR character set,
and four intermediate character sets G0, G1, G2, and G3.  How do we assign
actual character sets to G0-G3, and how do we associate the intermediate
character sets with the active character set?

Selection of character sets is accomplished using special control characters
or escape sequences embedded within the data stream as described in ISO 2022.
An escape sequence is used to DESIGNATE a particular alphabet (such as Roman,
Cyrillic, Hebrew, Arabic, Kanji, etc) to a particular intermediate graphics
set (G0, G1, G2, or G3).  A shift function is used to INVOKE a particular
intermediate graphics set into GL or GR.

In our discussion, we use the following notation (numbers are decimal unless
otherwise noted):

  <ESC> Escape (ASCII 27, character 01/11)
  <SP>  Space  (ASCII 32, character 02/00)
  <SO>  Shift Out (Ctrl-N, ASCII 14, character 00/14)
  <SI>  Shift In  (Ctrl-O, ASCII 15, character 00/15)

Table 1 shows the alphabet designators and shift functions for single-byte and
multi-byte character sets.  The same escape sequences are used for character
set designation in both the 7-bit and 8-bit environments.

The character which is substituted for "F" identifies the actual character set
to be used; these are listed in Table 5.  The shift functions may be either
locking or single.  "Locking shift" is like shift-lock on a typewriter.  It
means that all subsequent characters until the next shift are to be taken from
the designated intermediate character set.  "Single shift" applies only to the
character (either single or multibyte) that follows it immediately, but single
shift functions are only available for the G2 and G3 sets.  Locking shift
functions remain in effect across alphabet changes.

_____________________________________________________________________________

  Escape            
 Sequence     Function                                         Invoked By

  <ESC>(F     assigns 94-character graphics set "F" to G0.     SI or LS0
  <ESC>)F     assigns 94-character graphics set "F" to G1.     SO or LS1
  <ESC>*F     assigns 94-character graphics set "F" to G2.     SS2 or LS2
  <ESC>+F     assigns 94-character graphics set "F" to G3.     SS3 or LS3
  <ESC>-F     assigns 96-character graphics set "F" to G1.     SO or LS1
  <ESC>.F     assigns 96-character graphics set "F" to G2.     SS2 or LS2
  <ESC>/F     assigns 96-character graphics set "F" to G3.     SS3 or LS3
  <ESC>$(F    assigns multibyte character set "F" to G0.       SI or LS0
  <ESC>$)F    assigns multibyte character set "F" to G1.       SO or LS1
  <ESC>$*F    assigns multibyte character set "F" to G2.       SS2 or LS2
  <ESC>$+F    assigns multibyte character set "F" to G3.       SS3 or LS3

	     Table 1: Escape Sequences for Alphabet Designation

(Note, <ESC>$F was used in earlier versions of ISO 2022 to assign a multibyte
character set to G0, and no provisions were made to assign multibyte character
sets to G1-G3.)
_____________________________________________________________________________
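
As a rough sketch of how Table 1 might be used by a program, the following
Python fragment builds a designating escape sequence from the kind of set, the
intermediate set number, and the final character "F".  The names INTERMEDIATES
and designate are hypothetical and chosen only for this illustration.

```python
ESC = "\x1b"

# Intermediate characters from Table 1, by kind of set and G-set number.
INTERMEDIATES = {
    "94": {0: "(", 1: ")", 2: "*", 3: "+"},
    "96": {1: "-", 2: ".", 3: "/"},      # Table 1 has no 96-character G0
    "multibyte": {0: "$(", 1: "$)", 2: "$*", 3: "$+"},
}

def designate(kind, gset, final):
    """Build the escape sequence that designates set `final` to G<gset>."""
    try:
        intermediate = INTERMEDIATES[kind][gset]
    except KeyError:
        raise ValueError("Table 1 has no designator for this combination")
    return ESC + intermediate + final

print(repr(designate("94", 0, "B")))         # ASCII to G0
print(repr(designate("96", 1, "A")))         # Latin 1 right half to G1
print(repr(designate("multibyte", 1, "B")))  # JIS X 0208 to G1
```

The three calls reproduce the sequences <ESC>(B, <ESC>-A, and <ESC>$)B.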


In the 7-bit environment, only one character set, GL, can be active at a time.
The active character set can be selected from among the intermediate sets
G0-G3 by the shifts shown in Table 2.  Control characters from C0 are
transmitted as is, and those from the C1 set are sent prefixed by <ESC>
followed by the character value, minus 64.  For example, the C1 character
10000001 binary (129 decimal) becomes <ESC>A (129 - 64 = 65 = "A").
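
The C1 conversion rule can be written out as a small Python sketch.  The
function names are ours; only the arithmetic (subtract 64, prefix with <ESC>)
comes from the text above.

```python
ESC = 0x1B

def c1_to_7bit(byte):
    """Represent a C1 control (128-159) as <ESC> followed by value - 64."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError("not a C1 control character")
    return bytes([ESC, byte - 64])

def c1_from_7bit(seq):
    """Inverse operation: recover the C1 value from its 2-byte 7-bit form."""
    if len(seq) != 2 or seq[0] != ESC or not 0x40 <= seq[1] <= 0x5F:
        raise ValueError("not a 7-bit representation of a C1 control")
    return seq[1] + 64

print(c1_to_7bit(129))     # the example from the text
print(c1_to_7bit(0x8E))    # SS2 (08/14)
```

Applied to SS2 (08/14) and SS3 (08/15), the same rule gives <ESC>N and <ESC>O.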

_____________________________________________________________________________

 Shift  Representation  Name              Function

  SI       Ctrl-O       Shift In          invoke G0 into GL
  SO       Ctrl-N       Shift Out         invoke G1 into GL
  LS2      <ESC>n       Locking Shift 2   invoke G2 into GL
  LS3      <ESC>o       Locking Shift 3   invoke G3 into GL
  SS2      <ESC>N       Single Shift 2    select single character from G2
  SS3      <ESC>O       Single Shift 3    select single character from G3

	       Table 2: Shifts Used in the 7-Bit Environment
_____________________________________________________________________________

30-Mar-89 22:48:55-GMT,19888;000000000001
Return-Path: <cmg>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA19919; Thu, 30 Mar 89 17:40:55 EST
Date: Thu, 30 Mar 1989 17:40:55 EST
From: Christine M Gianone <cmg@watsun.cc.columbia.edu>
To: isokermit
Subject: ISO/Kermit Proposal Draft #2, Part 2 of 4
Message-Id: <CMM.0.88.607300855.cmg@watsun.cc.columbia.edu>


In the 8-bit environment two character sets, GL and GR, can be active at once.
A GL character is selected by a byte whose 8th bit is zero, and a GR character
by a byte whose 8th bit is one.  The actual character sets assigned to GL
and GR are selected by the shifts shown in Table 3.  Control characters from
both the C0 and C1 sets are sent as is.

_____________________________________________________________________________

 Shift  Representation  Name                   Function

  LS0      Ctrl-O       Locking Shift 0        invoke G0 into GL
  LS1      Ctrl-N       Locking Shift 1        invoke G1 into GL
  LS2      <ESC>n       Locking Shift 2        invoke G2 into GL
  LS3      <ESC>o       Locking Shift 3        invoke G3 into GL
  LS1R     <ESC>~       Locking Shift 1 Right  invoke G1 into GR
  LS2R     <ESC>}       Locking Shift 2 Right  invoke G2 into GR
  LS3R     <ESC>|       Locking Shift 3 Right  invoke G3 into GR
  SS2       08/14       Single Shift 2         select single character from G2
  SS3       08/15       Single Shift 3         select single character from G3

	       Table 3: Shifts Used in the 8-Bit Environment
_____________________________________________________________________________
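
To make the effect of the shifts in Table 3 concrete, here is a minimal Python
sketch that follows the shift state through a stream of bytes and reports
which intermediate set each graphic byte is taken from.  It makes several
assumptions of our own: the initial state is G0 in GL and G1 in GR, only
single-byte sets are in use, and designation and announcer sequences have
already been removed from the data.  The name track_shifts is hypothetical.

```python
ESC, SO, SI, SS2, SS3 = 0x1B, 0x0E, 0x0F, 0x8E, 0x8F

# Final bytes of the <ESC> locking shifts in Table 3: (half, G-set number).
LOCKING = {
    ord("n"): ("GL", 2), ord("o"): ("GL", 3),
    ord("~"): ("GR", 1), ord("}"): ("GR", 2), ord("|"): ("GR", 3),
}

def track_shifts(data, gl=0, gr=1):
    """Return a list of (G-set number, 7-bit code) for each graphic byte."""
    result = []
    single = None          # G-set selected by a pending single shift
    i = 0
    while i < len(data):
        b = data[i]
        if b == SI:        # LS0
            gl = 0
        elif b == SO:      # LS1
            gl = 1
        elif b == ESC and i + 1 < len(data) and data[i + 1] in LOCKING:
            half, g = LOCKING[data[i + 1]]
            if half == "GL":
                gl = g
            else:
                gr = g
            i += 1         # consume the final byte of the escape sequence
        elif b == SS2:
            single = 2
        elif b == SS3:
            single = 3
        elif (b & 0x7F) >= 0x20:     # a graphic position, not C0 or C1
            g = single if single is not None else (gr if b & 0x80 else gl)
            result.append((g, b & 0x7F))
            single = None  # a single shift covers one character only
        i += 1
    return result

print(track_shifts(b"A\xe9"))      # one byte from GL, one from GR
print(track_shifts(b"\x8eAB"))     # SS2 affects only the first character
```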

So we have a 3-tiered system.  At the bottom tier lie all the world's coded
character sets.  We can designate up to four of them at a time, one to each
of the intermediate graphics sets G0, G1, G2, and G3, using the escape
sequences shown in Table 1.  The terminal or computer keeps each of the
selected intermediate sets in memory.  The terminal or computer also has one
active set, composed of
GR and GL.  The intermediate sets are invoked to GL or GR (one at a time) by
the shifts SO, SI, LS0, LS1, etc.  A simplified diagram for the 8-bit
environment is shown in Figure 3 (see ISO 2022 for detailed diagrams of both
the 7-bit and 8-bit environments).  On a more sophisticated output device,
Figure 3 would contain numerous arrows pointing upwards to demonstrate the
operation of the designators and shifts.

_____________________________________________________________________________

                   +--+--------+  +--+--------+
                   |C0|   GL   |  |C1|   GR   |
                   |  |        |  |  |        |                  8-Bit
                   |  |        |  |  |        |                  Code
                   |  |        |  |  |        |                  In Use
                   +--+--------+  +--+--------+
                     
                   
         LS0          LS1,LS1R      LS2,LS2R      LS3,LS3R       Shifts
                                      SS2           SS3
       +--------+    +--------+    +--------+    +--------+      Intermediate
       |        |    |        |    |        |    |        |      Graphics
       |   G0   |    |   G1   |    |   G2   |    |   G3   |      Sets
       |        |    |        |    |        |    |        |
       +--------+    +--------+    +--------+    +--------+
                                                                 Alphabet
                                                                 Designation
 <ESC>(B      <ESC>-A      <ESC>-B      <ESC>-L      <ESC>$)B    Sequences
                                                    +---------+
+--------+   +--------+   +--------+   +--------+  +--------+ |  The world's
| ISO    |   | ISO    |   |  ISO   |   |  ISO   |  | JIS X  | |  registered
| 646    |   | Latin  |   |  Latin |   |  Latin |  | 0208   | |  character
|(ASCII) |   | 1      |   |  2     |   |Cyrillic|  | Kanji  | +  sets
+--------+   +--------+   +--------+   +--------+  +--------+

	  Figure 3: The ISO 2022 Character Set Selection Mechanisms
_____________________________________________________________________________


To understand the three-tiered design of ISO 2022, imagine a computer
programmed to display a mixture of character sets on its screen.  A large
collection of fonts might be stored on the disk, one font per file.  These are
the character sets of the bottom tier.  When a font is needed, it will be read
from the disk and stored in memory in an array, for rapid access.  If several
fonts are needed, they will be stored in several arrays.  These arrays are the
intermediate character sets, G0-G3.  When a data byte arrives to be displayed,
the actual graphic representation is taken from GL or GR (depending on the
byte's 8th bit).  GL is associated with one of the intermediate graphic sets,
and GR with another.  If no more than four character sets are used, then each
one needs to be read from the disk only once, and display is rapid and
efficient thereafter.


ANNOUNCING ISO 2022 FACILITIES

A large portion of ISO 2022 is devoted to describing how 8-bit characters may
be transmitted on a 7-bit communication path, for example when parity is in
use.  In the 7-bit environment, there is only GL -- no GR.  Therefore, all
characters are transmitted with their 8th bit removed, and shifts are used to
specify which intermediate set they belong to.

In fact, there are many possible ways to use the ISO 2022 code extension
facilities within both 7-bit and 8-bit environments.  For example, the sender
may inform the receiver in advance whether G1, G2, or G3 will be used, etc.
At the beginning of any particular data transfer, the facilities that actually
will be used can be announced using a sequence of the form <ESC><SP>F.  These
sequences are listed in Appendix B.


CHARACTER SETS

Many coded character sets are used in the transmission of textual data between
computers and terminals, in telegraphy, in the international Teletex system,
and in many other telecommunications applications.  From our point of view,
these fall into two categories: those that are standardized, well-defined, and
registered in the International Register of Coded Character Sets under the
provisions of ISO 2375, "Data Processing - Procedure for Registration of
Escape Sequences", under the authority of the ECMA (European Computer
Manufacturers Association), and those that are not.

Kermit's ISO 2022 Transfer Syntax will be able to transfer files containing
REGISTERED 8-bit single-byte or multi-byte character sets.  Registration
implies that international standards bodies -- which typically include
representatives of major computer companies -- have agreed upon the character
set, so that variations are unlikely to occur.  There must be an unambiguous
way for the sender to identify each alphabet for the receiver, and
registration of an approved character set results in a unique designating
escape sequence.

Currently, there are several major categories of registered 8-bit character
sets: the ISO 8859 family of 8-bit alphabets, the CCITT telegraphy alphabets,
and the Asian multibyte alphabets.  These are summarized in Appendix C.


KERMIT FILE TRANSFER

The Kermit file transfer protocol currently defines two syntaxes for data
transfer:

TEXT, in which characters are represented in ASCII, and records are
  terminated by Carriage-Return Linefeed.  The sending program translates
  from local text storage format (e.g. EBCDIC card images) to CRLF-terminated
  ASCII records, and the receiving program translates from CRLF-terminated
  ASCII records into its own local text file storage format (e.g.
  LF-terminated ASCII records).  This is Kermit's default file transfer
  syntax, and it may also be selected by the Kermit command SET FILE TYPE
  TEXT.

BINARY, in which no transformations are done at all.  This mode must be
  requested explicitly by the Kermit user with the command SET FILE TYPE
  BINARY.

The original assumption was that the file transfer syntax would never change,
and would therefore be a function of the file type, text or binary.
Henceforth, this mode of operation will be called Kermit's NORMAL TRANSFER
SYNTAX.

Different computer systems and software packages have different conventions
for representing, storing, and displaying mixed-alphabet textual data.  Such
data can be transferred by Kermit using normal transfer syntax, but it will
only make sense when transferred to a system that uses the same
representational conventions (analogous to binary file transfer).

To transfer mixed-alphabet textual data between systems that use different
conventions, a new mechanism is required.  By specifying ISO-2022 TRANSFER
SYNTAX as the common intermediate representation, we ensure that any
particular Kermit program will only have to know about its own local file
formats and the standard transfer syntax -- it will never have to know
anything about file formats on other kinds of computers (in the same way that
Kermit's normal text file syntax works now).

The extension proposed here will allow a Kermit program that has specific
knowledge of the local file format (or formats) for storing multilingual or
multi-alphabet text to translate between these system- and
application-specific formats and the new syntax to be used during file
transfer.


SELECTING ISO-2022 TRANSFER SYNTAX

Kermit's default transfer syntax is NORMAL (meaning either ASCII text, or
binary, according to SET FILE TYPE).  Kermit's ISO-2022 transfer syntax
must therefore be enabled in some way, either automatically or explicitly by
the user.  In the automatic case, the Kermit program recognizes (somehow) that
it is to transfer a multi-alphabet text file.  In the manual case, the user
issues a SET command:

  SET TRANSFER-SYNTAX ISO-2022

It must also be possible to override the automatic use of ISO-2022 syntax
via the command:

  SET TRANSFER-SYNTAX NORMAL

The sending Kermit may inform the receiving Kermit of the selected transfer
syntax by means of the Kermit File Attribute (A) packet, whose use is
negotiated in the Kermit Initialization exchange.  There is an attribute "*"
(ASCII 42) which represents "encoding", with values like "A" for Normal
Kermit ASCII encoding, "B" for binary, "E" for EBCDIC (so far, never used).

The proposed new value for this attribute is "I", followed by one or more ISO
2022 announcer letters (the letters after <ESC><SP> shown in Table 4), for
example IA, IB, IC, etc.  The receiver can agree to accept the file or refuse
it using Kermit's attribute reply mechanism.  Refusal could occur because the
receiving Kermit does not support the ISO 2022 facilities announced by the
sender, or because the receiver does not support the ISO 2022 transfer syntax
at all.  To refuse, the receiver puts the character "N" in the data field
of its acknowledgement to the A packet, followed by the character "*" (along
with any other Kermit attribute designators it objects to).
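
The proposed attribute value and the receiver's refusal might be sketched in
Python as below.  Only the strings themselves ("I" plus announcer letters, and
"N" followed by "*") come from this proposal.  The function names are
hypothetical, the attribute's length field and all packet framing are omitted,
and representing acceptance by an empty data field is our assumption.

```python
def encoding_attribute(announcers):
    """Value of the "*" attribute: "I" followed by the announcer letters."""
    return "I" + "".join(announcers)

def attribute_reply(value, supported):
    """Data field of the receiver's acknowledgement to the A packet.

    `supported` is the set of announcer letters the receiver can handle.
    """
    if value.startswith("I") and all(a in supported for a in value[1:]):
        return ""      # assumption: an empty data field means acceptance
    return "N*"        # refuse, naming the "*" (encoding) attribute

value = encoding_attribute("C")
print(value, repr(attribute_reply(value, {"A", "C"})))
print(value, repr(attribute_reply(value, {"A"})))
```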

If the receiver does not do attribute packets, then the sender may still elect
to send the file (with a warning to the user), as either a binary file or an
8-bit text file, for storing (and perhaps forwarding) purposes only.  In this
case, the file will be stored on the receiving computer in ISO-2022 transfer
syntax.

The advantage of using Attribute packets is that the sending Kermit can
automatically inform the receiving Kermit of the file transfer syntax, so that
the user does not have to type a SET command to both Kermits.  On a computer
system where the Kermit program can recognize the attributes and encoding of a
file automatically, this mechanism will allow files of different types (ASCII
text, binary, multi-alphabet text) to be sent together as a group, even
between unlike systems.  The drawback is that the attribute mechanism must be
programmed into a Kermit program that doesn't already have it.


DESCRIPTION OF KERMIT'S ISO-2022 TRANSFER SYNTAX

Transfer of a multi-character-set text file in ISO-2022 transfer syntax by
Kermit is similar to transfer of a 7-bit ASCII text file, except that it may
contain embedded control characters and escape sequences to identify and
switch between character sets.  The file sender translates the file's
characters (if necessary) into one or more registered alphabets, and
terminates lines of text (records) with CRLF, as in ASCII text mode.  The
file receiver translates from ISO-2022 transfer syntax into the format
demanded by the local system or application.  All of this occurs before
Kermit packet encoding by the sender, and after Kermit packet decoding by the
receiver, as shown in Figure 4.

_____________________________________________________________________________

                  +----------------------------------+
		  |  File data                       |
		  |     |                            |
Sending Kermit	  |  Conversion to transfer syntax   |
		  |     |                            |
	  	  |  Kermit encoding                 |
                  +----------------------------------+
		        |
		     Transmission of Kermit packets
		        |
                  +----------------------------------+
		  |  Kermit decoding                 |
		  |     |                            |
Receiving Kermit  |  Conversion from transfer syntax |
		  |     |                            |
		  |  File data                       |
                  +----------------------------------+

	Figure 4: ISO 2022 Transfer Syntax and Kermit Packet Encoding
_____________________________________________________________________________


ISO 2022 states that "at the beginning of information interchange, except
where the interchanging parties have agreed otherwise, all designations shall
be defined by use of the appropriate escape sequences, and the shift status
shall be defined by the use of the appropriate locking-shift functions."
Kermit programs should "agree otherwise" that the default G0 character set is
the US ASCII / ISO-646 / ECMA-6 7-bit set; thus ISO-2022 transfer syntax can
be identical to Normal Kermit transfer syntax when transferring 7-bit text
files.  There are no defaults for G1, G2, or G3, in the interest of fairness
to all countries and peoples.

When the text contains characters outside the ASCII alphabet, an escape
sequence from Table 1 must be issued, designating the alphabet to which they
belong (using the identification letters shown in Table 5) to the desired
intermediate character set G0, G1, G2, or G3.  This sequence must be given
before the first occurrence of a character in that alphabet.  If no such
sequence is given, then all characters are treated as ASCII data, including
<ESC>, the shift characters, and bytes with their 8th bits set to one.  In
other words, the file transfer behaves in the normal Kermit fashion for text
files.

ISO 2022 escape sequences are inserted into the data, and are
indistinguishable by the Kermit packet encoder/decoder from the data itself.
Therefore these escape sequences may be broken across packets, just as any
other data may be.


CHOOSING THE APPROPRIATE ISO 2022 FACILITIES

This proposal allows Kermit programs to use the full range of ISO 2022 code
extension techniques, including use of G0, G1, G2, and G3 in both the 7-bit
and 8-bit environments, with both single-byte and multibyte character sets.
In the general case, G0 will be used for ASCII and English, G1 for the "native
language" of the local country or region, G2 for a third language, and G3 for
a fourth.  Additional character sets may be swapped in and out of G2 and G3 as
required.

Transmission of 8-bit data in the 7-bit environment is accomplished by Kermit
using 8th-bit prefixing, which is an optional feature of the Kermit protocol.
However, most popular implementations of Kermit do include this feature.

If a Kermit program cannot do 8th-bit prefixing, then it must operate in
the ISO 2022 7-bit environment, shifting GL among the intermediate graphics
sets G0-G3.

If the Kermit program can do 8th-bit prefixing, the choice of the ISO 2022
7-bit or 8-bit environment is entirely independent of the communication
channel.  8-bit communication may be used on a 7-bit channel (in which case
Kermit does the required 8th-bit prefixing), or 7-bit communication can be
done on an 8-bit channel.  Or any other combination.  Selection of the ISO
2022 7-bit or 8-bit environment should be made on other grounds, such as
transmission efficiency or program simplicity.  For example, if the ISO 2022
8-bit environment is used on a 7-bit channel, then Kermit will have to do
8th-bit prefixing, which can be much less efficient than locking shifts.

Taken in their entirety, the ISO 2022 facilities are quite complex and may be
"overkill" for many applications.  Let's look at some specific examples in
which a subset will do.

1. ASCII text

Only a single G0 character set is active.  Normal Kermit ASCII text file
transfer syntax is used, in conjunction with normal Kermit packet encoding,
in which control characters are translated to printable characters (e.g.
Ctrl-A becomes #A).  In the 7-bit environment (when parity must be used on
the communication line), characters which have their 8th bit set to one are
transmitted with the 8th bit replaced by a parity bit, and prefixed by the
character "&" (ASCII 38).  In the 8-bit environment (no parity), the 8th bit
of the transmitted character (after control prefixing) is the same as the 8th
bit of the original data character.  No ISO 2022 escape sequences are
necessary, but if ASCII files are to be transferred when using ISO-2022
syntax, they may be prefixed with the announcer <ESC><SP>A to specify that
only the G0 set is used and <ESC>(B to designate ASCII (the left half of the
ISO 8859 alphabets) to the G0 set, and the attribute packet may contain "*IA".
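
The control and 8th-bit prefixing mentioned above can be sketched in Python as
follows.  This is a simplified illustration of Kermit's packet data encoding,
assuming the usual prefix characters "#" and "&" and leaving out repeat-count
prefixing; the function name kermit_encode is ours.

```python
CTL_PREFIX = ord("#")
BIT8_PREFIX = ord("&")

def kermit_encode(data, eight_bit_prefixing):
    """Encode bytes for a Kermit data field (no repeat counts)."""
    out = bytearray()
    for b in data:
        if eight_bit_prefixing and b & 0x80:
            out.append(BIT8_PREFIX)      # flag the 8th bit, then drop it
            b &= 0x7F
        low = b & 0x7F
        if low < 0x20 or low == 0x7F:    # control character or DEL
            out.append(CTL_PREFIX)
            out.append(b ^ 0x40)         # e.g. Ctrl-A (1) becomes "A" (65)
        elif low == CTL_PREFIX or (eight_bit_prefixing and low == BIT8_PREFIX):
            out.append(CTL_PREFIX)       # quote a literal prefix character
            out.append(b)
        else:
            out.append(b)
    return bytes(out)

print(kermit_encode(b"\x01", False))     # Ctrl-A
print(kermit_encode(b"\xe9", True))      # 8th bit set, 7-bit line
print(kermit_encode(b"\xe9", False))     # 8th bit set, 8-bit line
```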

2. A single ISO 8859 alphabet

This method will be quite common in countries like France, Germany, Italy,
Poland, the USSR, or Greece, where a single alphabet (such as ISO Latin 1,
ISO Latin/Cyrillic, ISO Latin/Greek) can be used to represent all text in the
native language (plus, in most cases, also English, and most computer
programming languages).

The choice of ISO 2022 facilities depends upon two factors: whether
transmission is to be in the 7-bit or 8-bit environment, and whether the
language is predominantly "one-sided".  A one-sided language confines itself
mostly to either the G0 set (like English) or the G1 set (like Russian,
Greek, Hebrew, or Arabic).  A two-sided language jumps frequently between the
two sets (like French).

In the 8-bit environment, the G0 and G1 sets are used without shifts.  That
is, the alphabet should be transmitted in its full 8-bit form.  C0 and C1
characters are also transmitted as-is.

In the 7-bit environment, we must choose between Kermit's 8th-bit prefixing
and ISO 2022 locking shifts.  For left-sided languages (like English), the
question is largely irrelevant since few G1 characters will be encountered.
For right-sided languages like Russian, it is clear that locking shifts will
result in far less transmission overhead than Kermit's per-character
8th-bit-prefixing.  For two-sided languages like French, Kermit's
8th-bit-prefixing is equivalent to an ISO 2022 single shift function, and will
probably result in less overhead than locking shifts.  The situation is
summarized in Table 6.

_____________________________________________________________________________

                      8-bit Environment           7-bit Environment
 Language Type   Assignments     Announcer    Assignments     Announcer

  Left-sided     G0->GL, G1->GR; <ESC><SP>C   G0->GL, G1->GR; <ESC><SP>C
  Right-sided    G0->GL, G1->GR; <ESC><SP>C   G0->GL, G1->GL; <ESC><SP>B
  Two-sided      G0->GL, G1->GR; <ESC><SP>C   G0->GL, G1->GR; <ESC><SP>C

 Table 6: ISO 2022 Facilities for Single Alphabet Text Transfer with Kermit
_____________________________________________________________________________
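
The overhead argument for the 7-bit environment can be checked with a rough
count, sketched here in Python.  The model is deliberately crude and is our
own: every byte below 128 is assumed to need G0 in GL and every byte from 128
up to need G1 in GL, with no special treatment of Space or control characters;
each SO or SI is counted as two transmitted bytes because Kermit sends control
characters prefixed (#N, #O).

```python
def prefix_overhead(data):
    """Extra bytes for 8th-bit prefixing: one "&" per byte with bit 8 set."""
    return sum(1 for b in data if b & 0x80)

def locking_shift_overhead(data):
    """Extra bytes for SO/SI locking shifts, starting with G0 in GL."""
    shifts = 0
    in_g1 = False
    for b in data:
        want_g1 = bool(b & 0x80)
        if want_g1 != in_g1:
            shifts += 1
            in_g1 = want_g1
    return 2 * shifts      # each shift is sent control-prefixed

right_sided = bytes([0xC0] * 20)     # 20 consecutive right-half characters
two_sided = b"a\xe9" * 10            # strict alternation between halves

print(prefix_overhead(right_sided), locking_shift_overhead(right_sided))
print(prefix_overhead(two_sided), locking_shift_overhead(two_sided))
```

Under these assumptions a long right-half run favors locking shifts, while
text that alternates between the two halves favors 8th-bit prefixing.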


30-Mar-89 22:49:59-GMT,17722;000000000001
Return-Path: <cmg>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA19922; Thu, 30 Mar 89 17:42:48 EST
Date: Thu, 30 Mar 1989 17:42:48 EST
From: Christine M Gianone <cmg@watsun.cc.columbia.edu>
To: isokermit
Subject: ISO/Kermit Proposal Draft #3, Part 3 of 4
Message-Id: <CMM.0.88.607300968.cmg@watsun.cc.columbia.edu>


3. ISO 646 text

ISO 646 has many national variations, in which national characters are
substituted for ASCII brackets, etc.  Some examples are shown in Table 7.
When transferring such text between systems that use the same encoding,
normal ASCII text file syntax may be used (as is commonly done today).

_____________________________________________________________________________

              ASCII
Column/Row   Graphic  German      Finnish     Norwegian     French

  04/00         @     section     @           @             a grave
  05/11         [     A umlaut    A umlaut    AE diphthong  degree
  05/12         \     O umlaut    O umlaut    O slash       c cedilla
  05/13         ]     U umlaut    A circle    A circle      section  
  06/00         `     `           e acute     `             `
  07/11         {     a umlaut    a umlaut    ae diphthong  e acute
  07/12         |     o umlaut    o umlaut    o slash       u grave
  07/13         }     u umlaut    a circle    a circle      e grave
  07/14         ~     ess-zet     u umlaut    ~             diaeresis

		Table 7: ISO 646 Usage in Selected Countries
_____________________________________________________________________________

When transferring ISO 646 text with a system that uses a different encoding,
ISO 2022 transfer syntax should be used, along with the appropriate ISO 8859
alphabet, for instance ISO Latin 1 for German, Finnish, Norwegian, or French.
The special characters from Table 7 should be translated into the ISO 8859
alphabet equivalents.  In this case, the comments about the "sidedness" of
the language vs the 7-bit environment also apply.
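
As an illustration of this translation step, the following Python sketch maps
the German column of Table 7 onto the corresponding ISO Latin 1 code points.
The table and function names are ours, and the sketch covers only the German
national variant.

```python
# German ISO 646 positions (Table 7) -> ISO Latin 1 code points.
GERMAN_646_TO_LATIN1 = {
    0x40: 0xA7,   # 04/00  @  -> section sign
    0x5B: 0xC4,   # 05/11  [  -> A umlaut
    0x5C: 0xD6,   # 05/12  \  -> O umlaut
    0x5D: 0xDC,   # 05/13  ]  -> U umlaut
    0x7B: 0xE4,   # 07/11  {  -> a umlaut
    0x7C: 0xF6,   # 07/12  |  -> o umlaut
    0x7D: 0xFC,   # 07/13  }  -> u umlaut
    0x7E: 0xDF,   # 07/14  ~  -> ess-zet
}

def german_646_to_latin1(data):
    """Translate German ISO 646 text to ISO Latin 1, byte for byte."""
    return bytes(GERMAN_646_TO_LATIN1.get(b, b) for b in data)

text = german_646_to_latin1(b"Gr}~e")
print(text.decode("latin-1"))
```

The same table applied to program source would turn brackets and braces into
umlauts, which is why the caution in the next paragraph matters.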

Users should be cautioned to make a distinction between text documents and
computer program source.  For program source, normal Kermit ASCII text syntax
should be used (SET TRANSFER-SYNTAX NORMAL), otherwise programs in C, Pascal,
etc, will have their brackets, braces, not-signs, and logical ORs appear on
the target system as accented letters, etc.

4. Japan

Many Japanese computer systems use at least three character sets, Roman
(close to ASCII), Katakana (a 1-byte code), and Kanji (a 2-byte code).  Kanji
is specified in JIS X 0208, which also includes Roman, Hiragana, Katakana, and
some other character sets, but these are double width and not normally used.
Roman characters are usually taken from the left half of JIS X 0201, and
Katakana from the right half.  Japanese text frequently shifts between Roman,
Kana, and Kanji, and therefore requires three active character sets, for
example G0 (Roman), G1 (Kana), and G2 or G3 (Kanji).  In the 8-bit
environment, data transfer can be quite efficient: locking shifts are used to
shift GL between Roman and Kana, and any bytes with the 8th bit set to one
automatically invoke Kanji in GR as a multi-byte character set.  In the 7-bit
environment, locking shifts would also be used to select Kanji.  Note that
locking shifts are more efficient in this case than Kermit 8th-bit prefixing
because Kanji characters consist of more than one byte.

It is an open matter whether the ISO 2022 7-bit or 8-bit environment should
be chosen by the programmer, based on the language, or whether the user
should be given the option of choosing the environment using a possibly
confusing SET command option.


ADDITIONAL ESCAPES

Since ISO 8859 character sets are subject to revision from time to time, an
alphabet selector may be preceded by <ESC>&F, where F is the revision number
(@ = 1, A = 2, B = 3, etc).  For example, <ESC>&@<ESC>-A means Latin Alphabet
Number One, Revision One.
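
A minimal Python sketch of the revision prefix, following the numbering given
above (@ = 1, A = 2, B = 3), might look like this.  The function name is
hypothetical, and no upper limit on the revision number is checked because the
text does not state one.

```python
ESC = "\x1b"

def revision_prefix(revision):
    """Return <ESC>&F, where F is "@" for revision 1, "A" for 2, and so on."""
    if revision < 1:
        raise ValueError("revision numbers start at 1")
    return ESC + "&" + chr(ord("@") + revision - 1)

# Latin Alphabet Number One, Revision One, designated to G1:
print(repr(revision_prefix(1) + ESC + "-A"))
```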


LOCAL FILE REPRESENTATION

This proposal assumes nothing about the representation of the file on the
local storage medium.  It may be ASCII, EBCDIC, a proprietary word processor
format, IBM code page, or anything else.  It is an implementation "detail" for
the Kermit programmer to convert between the local file representation for
multi-alphabet text files and Kermit's file transfer syntax.

In some cases, the file itself (or its directory entry) might contain the
necessary identifying information, in which case the sending Kermit program
can automatically emit the appropriate escape sequences during file transfer.
In others, the user will have to tell the sending program how the file is
encoded.  The suggested command is:

  SET FILE TYPE <xxx>

where <xxx> specifies how the file is (or when receiving, is to be) encoded on
disk.  This will necessarily be highly dependent on the system's conventions,
or the conventions of the applications to be used with the file (e.g. a
multi-language word processing program).  Possibilities for <xxx> might
include application names like WORDPERFECT, XYWRITE, NOTA-BENE, MACWRITE,
ALEPH-BET, PC-HANGUL.

It may be that a file is encoded entirely in a single ISO-8859 alphabet, e.g.
Latin Alphabet No. 1, or Latin/Cyrillic, but the file itself contains no
information to that effect.  Therefore, it must be possible for the user to
specify the alphabet, independent of the application, using the new command:

  SET FILE CHARACTER-SET <xxx>

where <xxx> might be one of the following:

  LATIN1-ISO8      ARABIC-ISO8      IBM-CODE-PAGE-437
  LATIN2-ISO8      CYRILLIC-ISO8    IBM-CODE-PAGE-850
  LATIN3-ISO8      GREEK-ISO8       IBM-CODE-PAGE-865
  LATIN4-ISO8      HEBREW-ISO8      KANJI-SHIFTJIS
  LATIN5-ISO8      KANJI-JIS        KANJI-EUC

The part before the dash is the name of the alphabet, and the "-ISO8"
says that the alphabet belongs to the ISO family of 8-bit character sets.
This allows for the possibility of other encoding methods for the same
languages, e.g. GREEK-DEC, where the Greek letters are taken from the DEC
technical character set.

If the local file is not encoded according to ISO 2022 rules, it may contain
<ESC>, <SO>, and <SI> characters.  It is up to the Kermit program to know
what these characters mean in the context of the file's format, and to either
strip them from the file or translate them to something else.  The ISO 2022
rules forbid the use of these characters as data to be transferred.


MISMATCHED CAPABILITIES

Each Kermit program should be informed -- either by SET command or through an
Attribute packet -- that ISO 2022 transfer syntax is in use.

Once the two Kermit programs have been instructed to use ISO 2022 syntax, it
is still possible that the sender will announce an ISO 2022 facility that the
receiver does not support, or will designate an alphabet that the receiver is
not familiar with.

If this happens, the receiver can cancel the file transfer by putting an "X"
in the data field of its acknowledgement to the data packet which contained
the unknown announcer or designator, and the sender will stop sending.  At
that point, the user can find some workaround like sending the file using
normal transfer syntax, etc.
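
The receiver's decision might be sketched in Python as below.  Everything
named here is hypothetical: the function ack_data_for, the sets of supported
final characters, and the assumption that the packet's data field has already
been decoded (control-character prefixing removed) before it is examined.

```python
import re

ESC = "\x1b"

# What this imaginary receiver supports, keyed by the final character.
SUPPORTED_ANNOUNCERS = {"A", "B", "C", "D"}  # <ESC><SP>F
SUPPORTED_G1_SETS = {"A", "L"}               # <ESC>-F: Latin-1, Latin/Cyrillic
SUPPORTED_G0_SETS = {"B"}                    # <ESC>(F: ASCII

# Matches <ESC><SP>F, <ESC>-F and <ESC>(F, capturing the kind and the final.
_SEQUENCE = re.compile(ESC + r"([ \-(])(.)")


def ack_data_for(decoded_data):
    """Return the ACK data field: "X" to cancel the file, "" to continue."""
    for kind, final in _SEQUENCE.findall(decoded_data):
        supported = {
            " ": SUPPORTED_ANNOUNCERS,
            "-": SUPPORTED_G1_SETS,
            "(": SUPPORTED_G0_SETS,
        }[kind]
        if final not in supported:
            return "X"
    return ""
```

A real receiver would also have to cope with a sequence that is split across
two packets, which this sketch ignores.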

To prevent useless data transfer, it is recommended that all announcers and
alphabet designators be transmitted at the beginning of the file, so that
cancellation can take place as early as possible.  The announcers should be
included in the data transfer even though they appear in the Attribute packet,
for two reasons:

  1. The receiver might not support Attribute packets.

  2. The receiver might want to store the file in the ISO 2022 transfer
     syntax, e.g. for display on a terminal, or for postprocessing by
     another program.


SPECIAL EFFECTS

Today, most multi-alphabet files are produced by proprietary text processing
programs.  These programs have many functions besides switching among
alphabets.  They may also endow text with special attributes such as boldface,
italic, underline, super- or subscript, color, etc, and render characters in a
variety of type styles and sizes.  Each text processing program may have its
own unique formats and conventions.

These special effects are not addressed by this proposal.  Nevertheless, it is
likely that a multi-alphabet file produced by a text processing program also
contains special effects.  In order for a Kermit program to send a
multi-alphabet file, it must have detailed knowledge of the file's format and
coding conventions.  Therefore, the Kermit program should be able to strip out
the special effects, and send only the text.  Otherwise the result would be
meaningless when received on an unlike system or for use with a different
application.  (When transferring such files between like systems or compatible
applications, Kermit binary mode transfers will suffice.)

At some future time, it might be possible to adapt one of the popular document
description languages to the Kermit protocol, so that Kermit will be able to
transfer formatted documents between unlike systems and applications.
Presently, there are many competing would-be standards including IBM DCA and
DIA, DEC DDIF, US Navy DIF, and PostScript.  There are also two ISO standards
emerging in this area: Standard Generalized Markup Language (ISO 8879, 9069,
and 9573), and Office Document Architecture (ISO 8613).  This is an area for
further study.


ARCHIVING

The Kermit protocol includes a so-far little-used archiving function.  In this
mode, Kermit stores incoming file data together with the attribute packets
that precede it, so that the file can be retrieved and reconstituted on
another system at a later time.  In archive mode, the alphabet escapes and
shifts should not be interpreted by the receiving Kermit, but simply stored as
data.


FILE TRANSFER SYNTAX EXAMPLES

A simple 7-bit ASCII text file can be transmitted in the normal Kermit manner
for text files, without any escapes or shifts, even in ISO 2022 mode.

A text file containing characters from a language or languages covered by a
single ISO 8859 alphabet will require an <ESC>-F sequence to identify the
alphabet.  In the 7-bit environment, <SO> and <SI> are used to shift between
the G0 and G1 sets.  The following lines all produce the same result:

  A dangerous German word is "gef<ESC>-A<SO>d<SI>hrlich".
  <ESC><SP>B<ESC>-AA dangerous German word is "gef<SO>d<SI>hrlich".
  <ESC><SP>B<ESC>-A<SI>A dangerous German word is "gef<SO>d<SI>hrlich".
  <ESC>&@<ESC>-A<ESC>(B<SI>A dangerous German word is "gef<SO>d<SI>hrlich".

In this case, the only extended character is the umlaut-a in "gefaehrlich"
(where ae is a way of writing umlaut-a without an umlaut).

For clarity and consistency with the ISO-2022 recommendations, the last form
is preferred: the text begins with an announcement of the G0 and G1 sets in
use, including the version number, and then explicitly shifts into the G0 set,
rather than defaulting to it.
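
A sender could generate the last of the four forms along the following lines.
This is a minimal Python sketch under stated assumptions: the function name
encode_7bit_latin1 is hypothetical, only Latin Alphabet No. 1 is handled, and
the opening sequences are simply copied from the example above.

```python
ESC, SO, SI = "\x1b", "\x0e", "\x0f"


def encode_7bit_latin1(text):
    """Encode text for a 7-bit line with Latin Alphabet No. 1 as G1."""
    # Prefix copied from the fourth example line, ending with an explicit <SI>.
    out = [ESC + "&@", ESC + "-A", ESC + "(B", SI]
    shifted = False  # True while G1 is invoked (after <SO>)
    for ch in text:
        byte = ch.encode("iso8859_1")[0]  # fails for non-Latin-1 characters
        if 0x80 <= byte < 0xA0:
            raise ValueError("C1 control characters are not handled here")
        if byte >= 0xA0 and not shifted:
            out.append(SO)
            shifted = True
        elif byte < 0x80 and shifted:
            out.append(SI)
            shifted = False
        out.append(chr(byte & 0x7F))  # send only the low 7 bits
    if shifted:
        out.append(SI)  # leave the line in the G0 state
    return "".join(out)
```

Applied to the sentence with the real umlaut-a, this yields exactly the fourth
line shown above.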

A text file containing characters from multiple ISO 8859 alphabets requires
an <ESC>-F sequence to identify each alphabet.  In the 7-bit environment, SO
and SI can be used to shift between G0 and G1 of the current alphabet, and
<ESC>(B can be used to select G0 of any of the alphabets, since these are all
the same.  For example, the following text contains the same word in English,
French, and Russian:

  <ESC>-A<SI>Disappointed, d<SO>ig<SI>u, <ESC>-L<SO>`PW^gP`^RP]]kY<SI>.

The first escape sequence assigns Latin Alphabet No. 1 to G1, and the
subsequent <SO> and <SI> shifts apply to its G0 and G1 sets, which are used to
form the English and French words.  The second escape sequence assigns the
Latin/Cyrillic 96-character set to G1, and the subsequent shifts apply to this
new set.
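
The receiving side of the 7-bit examples above can be sketched in Python as
follows.  The function name decode_7bit and the use of Python codecs as
translation tables are assumptions made for illustration.  The sketch also
assumes that G0 is always ASCII, that every escape sequence is exactly three
characters long (true of the examples here), and that a space is passed
through unchanged even while G1 is shifted in.

```python
ESC, SO, SI = "\x1b", "\x0e", "\x0f"

# Final characters of the <ESC>-F designators, paired with Python codecs.
G1_CODECS = {
    "A": "iso8859_1", "B": "iso8859_2", "C": "iso8859_3", "D": "iso8859_4",
    "L": "iso8859_5", "G": "iso8859_6", "F": "iso8859_7", "H": "iso8859_8",
    "M": "iso8859_9",
}


def decode_7bit(data):
    """Translate 7-bit ISO 2022 text (G0 = ASCII, G1 = an 8859 right half)."""
    out = []
    g1 = None        # codec of the alphabet currently designated as G1
    shifted = False  # True between <SO> and <SI>, i.e. while G1 is in GL
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == ESC:
            # Only <ESC>-F changes state; announcers and <ESC>(B are skipped.
            if data[i + 1] == "-":
                g1 = G1_CODECS[data[i + 2]]
            i += 3
            continue
        if ch == SO:
            shifted = True
        elif ch == SI:
            shifted = False
        elif shifted and ch > " ":
            # Set the 8th bit to reach the right half of the alphabet.
            out.append(bytes([ord(ch) | 0x80]).decode(g1))
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

Because the shift state and the G1 designation are kept in separate variables,
a new <ESC>-F that arrives while G1 is shifted in takes effect immediately,
without another <SO>.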

Another 7-bit example, in which the same word is repeated in English,
Russian, and German, shows how a locking shift remains in effect when the
alphabet is changed.  We begin in Latin/Cyrillic, start with an English word
from G0, shift to G1 for the Russian word, and while still in G1 switch to
Latin Alphabet No. 1 for German to get the umlaut-A at the beginning of
Aenderung (where Ae = umlaut-uppercase-A), and shift back to G0 for the rest
of the word:

  <ESC>-LAlteration <SO>_U`UTU[ZP <ESC>-AD<SI>nderung.

The following example (contributed by Hirofumi Fujii) illustrates why 8-bit
operation is desirable when there are more than two character sets:

Usually, Japanese text files contain Roman, Kana, and Kanji characters.  In
this case, switching the character set in the 7-bit environment is very
expensive.  For example, suppose the following sentence contains characters
from all three sets:

            1234567890123456789012345678
            ----------------------------
            This is an English sentence.
            ----------------------------
            NNNNRKKRNNRRRRRRRRRNNNNNNNNR

where N is Kanji, K is Katakana and R is Roman (of course this is not a
real Japanese sentence, but the sequence of character sets in it looks
like this).  In the 7-bit environment, we usually assign:

            G0:Roman, G1:Katakana

so the above sentence is translated as

<ESC>$BThis<ESC>(J <SO>is<SI> <ESC>$Ban<ESC>(J English <ESC>$Bsentence<ESC>(J.

The 28-byte text needs an additional 20 bytes in this case!  In the 8-bit
environment, we usually assign at the beginning:

           GL=G0:Roman
              G1:Katakana
           GR=G3:Kanji

so the above sentence becomes

           This <LS1>is<LS0> an English sentence.
           ^^^^              ^^         ^^^^^^^^

where ^ means 8th bit = one (GR character set).  In this case, only 2 bytes
are required to switch the character set. (Note that the locking-shift
mechanism is required even in the 8-bit environment.)
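
The byte counts quoted above can be checked with a few lines of Python.  This
is only an arithmetic sketch: as in the example, each English letter stands in
for one character (real Kanji occupy two bytes each), the 8th-bit markers are
not modeled, and LS0 and LS1 are assumed to be the same single-byte codes as
<SI> and <SO>.

```python
ESC, SO, SI = "\x1b", "\x0e", "\x0f"
LS0, LS1 = SI, SO  # assumed: same one-byte codes in the 8-bit environment

PLAIN = "This is an English sentence."

# The 7-bit form from the example: Kanji and Roman are swapped in and out of
# G0 by escape sequences, and Katakana is reached through G1 with <SO>/<SI>.
SEVEN_BIT = (
    ESC + "$B" + "This" + ESC + "(J" + " "
    + SO + "is" + SI + " "
    + ESC + "$B" + "an" + ESC + "(J" + " English "
    + ESC + "$B" + "sentence" + ESC + "(J" + "."
)

# The 8-bit form: Kanji travel with the 8th bit set, so only the Katakana
# word needs a pair of locking shifts.
EIGHT_BIT = "This " + LS1 + "is" + LS0 + " an English sentence."


def overhead(encoded, plain=PLAIN):
    """Number of bytes spent on escape sequences and shifts."""
    return len(encoded) - len(plain)
```

Here overhead(SEVEN_BIT) comes to 20 and overhead(EIGHT_BIT) to 2, matching
the figures in the text.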


TERMINAL EMULATION

While not part of the Kermit file transfer protocol, terminal emulation is a
feature of many Kermit programs.  It is hoped that these terminal emulators
will evolve along the lines of the ISO standards mentioned above.  In some
cases, this is already a fact, insofar as DEC VT200 and 300 series terminals
already follow these standards.

In this regard, it is important to note that not all languages are written
from left to right, top to bottom.  Hebrew and Arabic are two examples of
right-to-left languages, and Japanese and Chinese may be written top to
bottom.  The order of the text characters on disk or on the transmission line
does not necessarily reflect their order on the screen or the printed page.

Kermit should be as easy to use as possible, but should still give the user
the ability to specify exactly what character codes are in use for both
terminal emulation and file transfer.  There should also be a consistent set
of commands for all Kermit programs.

The following command should specify what character set is sent and received
on the transmission medium during terminal emulation.  The Kermit program must
translate between this character set and the one that is used locally.

SET TERMINAL CHARACTER-SET <name> [{GL, GR}]
  This command already exists, but is currently used only in MS-DOS Kermit, and
  only to switch between US and UK ASCII.  We should extend this command to
  select any character code, and to assign it to GL (default) or GR, and we
  should have a standard set of <name>'s including the currently defined ISO
  8-bit alphabets: 

    LATIN1-ISO8, ..., LATIN5-ISO8, CYRILLIC-ISO8, GREEK-ISO8,
    HEBREW-ISO8, ARABIC-ISO8, etc.

  7-bit ASCII and its national variants (ISO-646):

    ASCII-US, ASCII-UK, ASCII-FR, ASCII-DE, ASCII-IT, ASCII-NL, ASCII-ES,
    ASCII-DK, ASCII-FI, ASCII-IS, ASCII-SE, ASCII-NO, ASCII-TR, etc.

  And for Japanese:

    KANJI-JIS, KANJI-SHIFTJIS, KANJI-EUC, etc.

For example, an MS-DOS computer might use SHIFTJIS locally, but a VAX
communicates using EUC, so the MS-DOS Kermit user would give the command SET
TERM CHAR KANJI-EUC.
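
The translation implied by this example might look as follows in Python.  The
class name TerminalTranslator is hypothetical, and Python's euc_jp and
shift_jis codecs are assumed to be acceptable stand-ins for KANJI-EUC and
KANJI-SHIFTJIS.  Incremental decoders are used because a two-byte character
can be split across two reads from the communication line.

```python
import codecs


class TerminalTranslator:
    """Translate between the line character set and the local one."""

    def __init__(self, line="euc_jp", local="shift_jis"):
        self._from_line = codecs.getincrementaldecoder(line)()
        self._from_local = codecs.getincrementaldecoder(local)()
        self._line = line
        self._local = local

    def incoming(self, chunk):
        """Bytes received from the host -> bytes for the local display."""
        return self._from_line.decode(chunk).encode(self._local)

    def outgoing(self, chunk):
        """Bytes typed locally -> bytes to send to the host."""
        return self._from_local.decode(chunk).encode(self._line)
```

An incomplete character at the end of a chunk is held back and completed by
the next call, so the caller can pass along whatever the line delivers.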

For keyboard character input, in addition to the current per-key SET KEY
mechanism, there should be a way to assign an entire translation table to the
entire keyboard.  This command would be:

  SET TERMINAL KEYMAP <name> [{GL, GR}]


THE IBM PC

How can ISO 2022 transfer syntax be used on the IBM PC (USA version)?  It
happens that the original IBM PC (without graphics adapter) contains many of
the characters needed for Latin Alphabet 1 in its character ROM.  GL is
equivalent to ASCII, and GR has the accented vowels, etc, but in nonstandard
positions.  For example, the PC has A-umlaut in 08/14 (a C1 position!),
whereas Latin 1 has it in 12/04.  Therefore, translation tables must be
written to convert from Latin 1 to IBM PC, and vice versa.  The PC's character
ROM does not contain letters from other sets, so the PC would only be able to
handle ISO Latin 1.
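
Such translation tables could be generated rather than typed in by hand.  The
Python sketch below is illustrative only: build_table is a hypothetical
helper, Python's cp437 codec is assumed to describe the PC's character ROM,
and characters with no counterpart in the other set are replaced by "?".

```python
def build_table(source, target, fallback=b"?"):
    """Return a 256-byte table translating source codes to target codes."""
    table = bytearray(range(256))  # the ASCII half maps to itself
    for code in range(0x80, 0x100):
        try:
            char = bytes([code]).decode(source)
            table[code] = char.encode(target)[0]
        except UnicodeError:
            table[code] = fallback[0]  # no equivalent in the other set
    return bytes(table)


PC_TO_LATIN1 = build_table("cp437", "iso8859_1")
LATIN1_TO_PC = build_table("iso8859_1", "cp437")
```

With these tables, bytes([0x8E]).translate(PC_TO_LATIN1) gives the single
byte 0xC4, the 08/14 to 12/04 move for A-umlaut described above.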

PCs with certain graphics adapters (and all PS/2's), on the other hand, can
load different character sets from disk files into their character generators.
IBM calls these files "code pages".  USA Code Page 437 (the one used on the
original PC) was capable of supporting 5 languages, whereas the new
Multinational Code Page 850 can support 11 (according to the DOS 3.3 manual).
There are also special code pages for Portuguese, French-Canadian, and
Norwegian.  A file can be created or displayed using any one of these code
pages.  However, the file itself contains no information about which code page
was used, so it's up to the user to switch to the appropriate code page before
accessing the file.  For this reason, the Kermit program would need a SET FILE
CHARACTER-SET command.

There is no mechanism defined in IBM PC-DOS for switching code pages within a
file.  Therefore, mixed alphabet files are only possible within the private
environment of proprietary PC-based multilingual word processors.  A Kermit
program would need to know the details...

30-Mar-89 22:50:04-GMT,15421;000000000001
Return-Path: <cmg>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA19925; Thu, 30 Mar 89 17:43:37 EST
Date: Thu, 30 Mar 1989 17:43:37 EST
From: Christine M Gianone <cmg@watsun.cc.columbia.edu>
To: isokermit
Subject: ISO/Kermit Proposal Draft #2, Part 4 of 4
Message-Id: <CMM.0.88.607301017.cmg@watsun.cc.columbia.edu>


APPENDIX A: STANDARDS

(Also see Appendix C)

ANSI X3.4 (1986), "Coded Character Sets - 7-bit American Standard Code for
  Information Interchange" (US ASCII), is the 7-bit code currently used by
  Kermit for transferring text files. 

ISO 646 (1983) (= ECMA-6), "Information Processing - ISO 7-bit Coded Character 
  Sets for Information Interchange", gives us a 7-bit character set equivalent
  to ASCII with provision for substituting "national characters" in selected
  positions.

ISO 4873 (1986) (= ECMA-43), "Information Processing - ISO 8-bit Code for
  Information Interchange - Structure and Rules for Implementation", defines
  8-bit character sets, their graphic and control regions, and how to extend
  an 8-bit character set by using multiple graphics sets.

ISO 2022 (1986) (= ECMA-35), "Information Processing - ISO 7-bit and 8-bit
  Coded Character Sets - Code Extension Techniques", describes how to use
  8-bit character sets in both 7-bit and 8-bit environments, and how to switch
  among different character sets and alphabets.

JIS X 0202, "Code Extension Techniques for Use with the Code for Information
  Interchange", the Japanese counterpart of ISO 2022.

ANSI X3.41-1974, "Code Extension Techniques for Use with the 7-Bit Coded
  Character Set of the American National Standard Code for Information
  Interchange", describes 7- and 8-bit codes and extension techniques in
  approximately the same manner as ISO 4873 and ISO 2022.

ISO 8859 (1987-present) (see Table 5 for ECMA equivalents), "Information
  Processing - 8-Bit Single-Byte Coded Graphic Character Sets", defines the
  actual 8-bit character sets to be used for many of the world's languages.
  The left half of each of these is the same as ASCII and ISO 646.  Each
  character, including those with diacritics, is represented by a single byte.


ISO is the International Organization for Standardization, ANSI is the American
National Standards Institute, ECMA is the European Computer Manufacturers
Association.

The ISO/ECMA standards discussed in this proposal may be obtained free of
charge in their ECMA form by writing to:

  ECMA Headquarters
  Rue du Rhone 114
  CH-1204 Geneva
  SWITZERLAND

Be sure to specify the title and the ECMA number of each standard requested.
We tried this ourselves, and got delivery within about two weeks.

ISO standards can also be ordered from the UN bookstore, but not for free:

  CCITT
  United Nations Bookstore
  United Nations Building
  New York, NY  10017

ANSI standards may be ordered, for a fee, from:

  Sales Department
  American National Standards Institute
  1430 Broadway
  New York, NY  10018


APPENDIX B:  ISO 2022 ANNOUNCERS

At the beginning of data transfer, the actual ISO 2022 facilities that will
be used may be announced by means of escape sequences.  Several of the most
important ones are described here.  Table 4 lists all the defined announcers
in summary form.  For details, see ISO 2022.

<ESC><SP>A means that only the G0 set will be used, invoked into GL.  No
  shift functions will be used.  In the 8-bit environment, GR is not used.
  In other words, only a single 7-bit character set is used.

<ESC><SP>B means the G0 and G1 sets will be used with locking shifts.  In the
  7-bit environment <SI> invokes G0 into GL, <SO> invokes G1 into GL.  In the
  8-bit environment, LS0 invokes G0 into GL, LS1 invokes G1 into GL.  In other
  words, two character sets are used, with characters from both sets always
  sent as 7-bit values, with locking shifts used to specify the 8th bit.

<ESC><SP>C means that G0 and G1 will be used in the 8-bit environment, with G0
  invoked in GL and G1 in GR.  No locking shift functions are used.  In other
  words, a single 8-bit character set is used, with all 8 bits transmitted as
  data.  GL is selected when the character's 8th bit is zero, GR is selected
  when the 8th bit is one.

<ESC><SP>D means that G0 and G1 will be used with locking shifts.  In the
  7-bit environment, <SI> invokes G0 into GL and <SO> invokes G1 into GL.  In
  the 8-bit environment, all 8 bits of each character are transmitted with no
  shifts.

<ESC><SP>L means that Level 1 of ISO 4873 will be used.  That is, a single
  8-bit character set with C0, G0, C1, and G1, with no shift functions.
  This is like <ESC><SP>C.

<ESC><SP>M means that Level 2 of ISO 4873 will be used.  This is equivalent
  to Level 1, with the addition of G2 and G3.  Characters from G2 and G3 are
  invoked only by the single-shift functions SS2 and SS3.

<ESC><SP>N means that Level 3 of ISO 4873 will be used.  This is equivalent
  to Level 2 with the addition of the locking shift functions LS1R, LS2R, and
  LS3R. (Note that ISO 4873 does not concern itself with the 7-bit
  environment, and therefore does not discuss the use of LS0, LS1, LS2, or
  LS3.) 

_____________________________________________________________________________

Esc Sequence  7-Bit Environment          8-Bit Environment 

<ESC><SP>A    G0->GL                     G0->GL
<ESC><SP>B    G0-(SI)->GL, G1-(SO)->GL   G0-(LS0)->GL, G1-(LS1)->GL
<ESC><SP>C    (not used)                 G0->GL, G1->GR
<ESC><SP>D    G0-(SI)->GL, G1-(SO)->GL   G0->GL, G1->GR
<ESC><SP>E    Full preservation of shift functions in 7 & 8 bit environments
<ESC><SP>F    C1 represented as <ESC>F   C1 represented as <ESC>F
<ESC><SP>G    C1 represented as <ESC>F   C1 represented as 8-bit quantity
<ESC><SP>H    All graphic character sets have 94 characters
<ESC><SP>I    All graphic character sets have 94 or 96 characters
<ESC><SP>J    In a 7 or 8 bit environment, a 7 bit code is used
<ESC><SP>K    In an 8 bit environment, an 8 bit code is used
<ESC><SP>L    Level 1 of ISO 4873 is used
<ESC><SP>M    Level 2 of ISO 4873 is used
<ESC><SP>N    Level 3 of ISO 4873 is used
<ESC><SP>P    G0 is used in addition to any other sets:
              G0 -(SI)-> GL              G0 -(LS0)-> GL
<ESC><SP>R    G1 is used in addition to any other sets:
              G1 -(SO)-> GL              G1 -(LS1)-> GL
<ESC><SP>S    G1 is used in addition to any other sets:
              G1 -(SO)-> GL              G1 -(LS1R)-> GR
<ESC><SP>T    G2 is used in addition to any other sets:
              G2 -(LS2)-> GL             G2 -(LS2)-> GL
<ESC><SP>U    G2 is used in addition to any other sets:
              G2 -(LS2)-> GL             G2 -(LS2R)-> GR
<ESC><SP>V    G3 is used in addition to any other sets:
              G3 -(LS3)-> GL             G3 -(LS3)-> GL
<ESC><SP>W    G3 is used in addition to any other sets:
              G3 -(LS3)-> GL             G3 -(LS3R)-> GR
<ESC><SP>Z    G2 is used in addition to any other sets:
              SS2 invokes a single character from G2
<ESC><SP>[    G3 is used in addition to any other sets:
              SS3 invokes a single character from G3

		     Table 4: ISO 2022 Announcer Summary
_____________________________________________________________________________


APPENDIX C:  CHARACTER SET STANDARDS AND DESIGNATION SEQUENCES


ISO 8859 defines a series of 8-bit character sets.  In each of these, the left
half (called G0 in this appendix) is the same as US 7-bit ASCII.  Because of
this, the left half of any ISO 8859 character set may be used to represent
English or any other Latin-alphabet language that can make do without
diacritical marks (e.g.  German without umlauts or ess-zet, Dutch with ij
considered two letters, etc.).

By convention, the G0 set can be selected with <ESC>(B.  When we say "by
convention" we mean that each of the ISO 8859 standards says to select the G0
set using this sequence, even if the G1 set (right half) is selected using
some other letter, like A, C, L, etc (see below).  Theoretically, <ESC>(A
could also be used to select the G0 set of "alphabet A", <ESC>(L could select
the G0 set of "alphabet L", etc.

Languages with special characters (i.e. non-ASCII graphics) must use specific
ISO 8859 G1 sets.  These sets are specified (to date) in ISO 8859-1 through
8859-9:

ISO 8859-1 is Latin Alphabet No. 1.  The right half (G1) contains all the
  special characters needed for Dutch, Faeroese, Finnish, French, German,
  Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish.
  Select G1 with <ESC>-A. 

ISO 8859-2 is Latin Alphabet No. 2.  G1 contains special characters for
  Albanian, Czech, German, Hungarian, Polish, Romanian, Serbocroatian, Slovak,
  and Slovene.  Select G1 with <ESC>-B.

ISO 8859-3 is Latin Alphabet No. 3, for Afrikaans, Catalan, Esperanto,
  French, Galician, German, Italian, Maltese, and Turkish.  Select G1 with
  <ESC>-C.

ISO 8859-4 is Latin Alphabet No. 4, for Danish, Estonian, Finnish, German,
  Greenlandic, Lappish, Latvian, Lithuanian, Norwegian, and Swedish.  Select
  G1 with <ESC>-D.

ISO 8859-5 is the Latin/Cyrillic Alphabet, for Bulgarian, Byelorussian,
  Macedonian, Russian, Serbocroatian, and Ukrainian (compatible with USSR GOST
  Standard 19768-1987 and ECMA-113).  Select G1 with <ESC>-L.

ISO 8859-6 is the Latin/Arabic Alphabet.  Select G1 with <ESC>-G.

ISO 8859-7 is the Latin/Greek Alphabet.  Select G1 with <ESC>-F.

ISO 8859-8 is the Latin/Hebrew Alphabet.  Select G1 with <ESC>-H.

ISO DIS 8859-9 is Latin Alphabet No. 5, in which six Icelandic letters from
  Latin Alphabet No. 1 were replaced by 6 letters needed for Turkish.  Select
  G1 with <ESC>-M.

OTHER CHARACTER SET STANDARDS:

ISO 646 (1983), "Information Processing - ISO 7-bit Coded Character Sets for
  Information Interchange", gives us a 7-bit character set equivalent to
  ASCII, and says we can substitute "national characters" for ASCII
  characters #$@[\]^`{|}.  Different languages put different characters in
  these positions, and there's no mechanism defined to specify which language
  is being used.  ISO 646 is commonly used in Europe, and much confusion
  results from the substitution of national characters for brackets and other
  symbols that are used in programming languages like C.

CCITT T.61 (1984), "Character repertoire and coded character sets for the
  international Teletex service".  This is an extension of ISO-646 into the
  8-bit arena, but unlike ISO 8859, T.61 uses character combinations to
  represent letters with diacritical marks.  For example, the 2-byte sequence
  ^o would represent the single character o-circumflex.  The left half of this
  set is equivalent to ASCII and ISO 646, except that the following characters
  are left undefined: 05/12 (ASCII "\"), 05/14 (ASCII "^"), 07/11 (ASCII "{"),
  07/13 (ASCII "}"), and 07/14 (ASCII "~").  The right half contains currency
  signs, mathematical symbols, diacritical marks, and characters used in
  roman-alphabet languages that cannot be formed by combining A-Z with a
  diacritical mark (like Dutch "ij", Icelandic thorn, German Ess-Zet, etc).

ISO 6937, "Coded character sets for text communication".  ISO 6937/2-1987,
  "Latin alphabetic and non-alphabetic graphic characters" is the ISO
  equivalent of CCITT T.61.  The right half of this set may be selected using
  <ESC>-J.  Note that when this alphabet is used, special procedures must
  be used to translate between its two-byte sequences for accented letters and
  the single-byte representation of these characters in other sets.
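
One way to picture such a procedure is the Python sketch below.  It follows
the notation used above, in which an ASCII stand-in such as ^ precedes the
base letter; the real T.61 and ISO 6937 codes for the diacritical marks are
not modeled.  The names COMBINING, compose and to_single_byte are
hypothetical.

```python
import unicodedata

# ASCII stand-ins for the diacritical marks, paired with combining characters.
COMBINING = {
    "^": "\u0302",  # circumflex
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    '"': "\u0308",  # umlaut (diaeresis)
    "~": "\u0303",  # tilde
}


def compose(pair):
    """Turn a mark-then-letter pair such as "^o" into one accented letter."""
    mark, base = pair
    return unicodedata.normalize("NFC", base + COMBINING[mark])


def to_single_byte(pair, codec="iso8859_1"):
    """Return the single-byte code of the composed letter in the given set."""
    return compose(pair).encode(codec)
```

If the target set lacks the accented letter, the encode step raises an error,
and some fallback policy would be needed at that point.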

JIS X 0201, "Code for Information Interchange", a 1-byte code containing
  ASCII in the left half and Japanese Katakana in the right half.  Select
  G0 with <ESC>(J, and G1 with <ESC>)I (See POSSIBLE PROBLEM, below).

JIS X 0208, "Code of the Japanese Graphic Character Set for Information
  Interchange", a 2-byte code containing Japanese Kanji, Katakana, Hiragana,
  Roman, Greek, and Russian characters, plus special symbols, etc.  Select
  with <ESC>$)B.

CAS GB 2312-80, Chinese.  ISO Reg 58.

KS C 5601 (1987), Korean.  ISO Reg 149.

ISO DIS 10646, "Multiple-Octet Coded Character Set".  This is a new standard
under development by Working Group 2 of ISO/IEC JTC1/SC2.  Formal drafts have
not yet been issued.  Its purpose is to provide a single character code for
all present-day languages throughout the world, with provision also for
technical and bibliographic documents.  According to a preliminary
description, the basic multilingual plane will permit easy interworking with
existing 8-bit codes, but all designation, invocation and shifting as in ISO
2022 will be avoided.  When this standard becomes formalized, it can be
incorporated into Kermit as a new transfer syntax.

CCITT is the International Telephone and Telegraph Consultative Committee,
GOST is the USSR standards organization, and JIS means Japan Industrial
Standard.

The alphabet selection escape sequences are registered in the International
Register of Coded Character Sets under the provisions of ISO 2375, "Data
Processing - Procedure for Registration of Escape Sequences".  The
registration authority is the ECMA, which periodically issues updates.  Some
registered character sets are shown in Table 5; the ISO Number is the number
of the ISO standard, ECMA Ref is the corresponding ECMA standard number, and
ECMA Registration is the ECMA character set registration number (currently
unused, but which will be incorporated into future revisions of ISO 8824,8825:
ASN.1).  The escape sequences shown (except in the ASCII entry) assign the
given set to G1.

There may also be "private alphabets", such as those found on DEC terminals.
In the DEC environment only, these may be selected using escape sequences
listed in the DEC manuals, e.g. <ESC>)> to select the DEC Technical
94-character set and assign it to G1.

_____________________________________________________________________________

  Alphabet Name             Esc Seq   ISO Number   ECMA Ref    ECMA Registration

  ASCII (ANSI X3.4-1986)    <ESC>(B   ISO 646      ECMA-6      ?
  Latin Alphabet No. 1      <ESC>-A   ISO 8859-1   ECMA-94     100
  Latin Alphabet No. 2      <ESC>-B   ISO 8859-2   ECMA-94     101
  Latin Alphabet No. 3      <ESC>-C   ISO 8859-3   ECMA-94     109
  Latin Alphabet No. 4      <ESC>-D   ISO 8859-4   ECMA-94     110
  Latin/Cyrillic            <ESC>-L   ISO 8859-5   ECMA-113    144
  Latin/Arabic              <ESC>-G   ISO 8859-6   ECMA-114    127
  Latin/Greek               <ESC>-F   ISO 8859-7   ECMA-118    126
  Latin/Hebrew              <ESC>-H   ISO 8859-8   ECMA-121    138
  Latin Alphabet No. 5      <ESC>-M   ISO 8859-9   ECMA-128    148
  Czech Standard            <ESC>-I   ?            ?           139
  Right Half, ISO 6937-2    <ESC>-J   ISO 6937-2   ?           142
  Math/Technical Set        <ESC>-K   ?            ?           143
  Chinese (CAS GB 2312-80)  <ESC>$)A  ?            ?           ?
  Japanese (JIS 0208)       <ESC>$)B  ?            ?           ?
  Korean (KS C 5601-1987)   ?         ?            ?           ?

     Table 5: Alphabets, Selectors, Standards, and Registration Numbers
_____________________________________________________________________________


POSSIBLE PROBLEM: There seems to be a conflict between ISO/ECMA alphabet codes
and the codes used in Japan:

  Letter  Europe   Japan  
    I     Czech    JIS-Katakana
    J     ISO6937  JIS-Roman

THE END

30-Mar-89 20:32:22-GMT,6876;000000000411
Return-Path: <@cuvmb.cc.columbia.edu:ISO8859@JHUVM.BITNET>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA19127; Thu, 30 Mar 89 15:32:15 EST
Message-Id: <8903302032.AA19127@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA22424; Thu, 30 Mar 89 15:29:26 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 4600; Thu, 30 Mar 89 15:27:44 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 0924; Thu, 30 Mar 89 15:27:43 EST
Received: by BITNIC (Mailer X1.25) id 0137; Thu, 30 Mar 89 15:28:38 EST
Date:         Thu, 30 Mar 89 12:47:24 CST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Michael Sperberg-McQueen <U18189%UICVM@cuvmb.cc.columbia.edu>
Subject:      query about overstruck characters in ISO 8859
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>

Johan van Wingen has pointed out several times in this forum that in ISO
8859, as opposed to ISO 6937, 646, and other earlier coded character
sets, it is illegal to use backspaces to overstrike two characters as a
method of obtaining a new character.  At least, that's what I understood
him to say.  ISO 8859-1 : 1987 (E) says (paragraph 7) "The use of
control functions, such as BACKSPACE or CARRIAGE RETURN for the coded
representation of composite characters is prohibited by ISO 8859."

I have two questions:  (1) just what sorts of activities are supposed to
be forbidden here? and (2) why?

To be more specific:  if I need to print a Serbo-Croatian word
containing a 'c' with an acute accent, I could probably do any of the
following things (depending on my system environment).  Which of them
are legal, and which illegal?  And can we construct a rationale for the
legality and illegality of each?  (= *should* they be legal?)

(a) embed the sequence 'c' BACKSPACE &acute. (hex 63 08 B4) in my file
(if I'm using an editor that allows me to embed backspace characters, as
some do and some don't) and let the printer, the display, and other
devices deal with it as best they can.  The display will probably show
me the acute, and the printer will do an overstrike, unless it's a line
printer, in which case I may get a variety of things but almost
certainly not what I want.

(b) use a Script command like ".dc bs <" and then use the combination
'c<&acute.' in my file.  Script will arrange to have the acute and the
'c' overstruck, either by issuing a backspace or by doing something
else.

(c) use the same Script command, and also define a Script symbol with
".sr cacute = 'c<&acute.'" or ".sr cacute = 'c&sysbs.&acute."  and then
in my file use "&cacute." instead of "c<&acute."

(d) use some relevant system facility (either in Script or in a
microcomputer word processor) to define the width of hex B4 as 0.  Then
send the sequence hex B4 63 to the printer.

(e) use the editor or some (imaginary) Script facilities to embed a
sequence like ESC '-' 'B' (hex 1B 2D 42) at the beginning of my file to
set up ISO 8859-2 as my G1 character set, and then in my file embed
SHIFT-IN X'B6' SHIFT-OUT (hex 0F B6 0E) for the acute-accented 'c'

(f) embed the ESC '-' 'B' sequence in some way, use Script's symbol
facility to define ".sr cacute = &x'0FB60E' " and then use "&cacute."
in my file as usual.

If I understand the text of paragraph 7, approach (a) is clearly in
violation of the spirit and letter of the standard.  What about approach
(b)?  In my file, I'm not using any control characters to create
composite characters:  only graphics.  I don't expect any editor to
resolve the multi-character encoding for me and display an accented 'c'.
But I am, I admit, using backspace or CR in the printer stream (or if
the printer is more sophisticated, maybe something even more devious).
Or perhaps I'm not.  I don't know what Script97 does with the Xerox
9700; all I know is that the ".sr" command given should give me
something resembling the character I want on my output.

Approach (c) is much the same as (b), except that a lot of these symbols
are already defined at installation.  Is it a violation of the standard
to use them, if they produce backspaces in the printer data stream?

Approach (d) avoids the backspace in the data stream, but probably
violates another part of paragraph 7:  "None of these characters are
'non-spacing'."

Approach (e) and (f) sound as though they are what the standards
committee expects us to do.  But given that very few pieces of software
will handle such escape sequences, I am not sure what paragraph 7 can
mean or is supposed to mean for sites, developers, or end users.  If I
cannot use character 11/4 (acute accent) to form composite characters,
why is it there?  For use in mathematics to distinguish symbols (K and
K' = K-prime)?  In that case it would be far better to use slots 11/4,
10/8, 11/8, and 10/15 to include Turkish, and define another single
character set for all sorts of mathematical symbols.  ("Lead us
not into temptation.")

I imagine the point of paragraph 7 must be to say that extension of the
character set to handle things like accented 'c' should be done through
the extension techniques defined by other ISO standards, and not by
overstriking characters of the ISO 8859 sets.  In an ideal world, all
the equipment would support ISO 8859-1 through -9, and ISO 2022 and so
on.  But in the real world -- is it considered a violation of ISO 8859
to use non-standard code extension techniques in order to make
non-conforming equipment produce appropriate results?  Our printer
probably doesn't have a-umlaut as a separate character.  Is it a
violation of paragraph 7 to write a printer driver that reads character
14/4 from a file and sends an overstrike sequence including BACKSPACE to
the printer?  Would it be a violation if the printer driver translated
from ISO 8859 to ISO 6937?
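
To make the printer-driver question concrete, here is a minimal Python
sketch of such a filter.  It assumes the file is ISO 8859-1 and that the
printer overstrikes on BACKSPACE.  The table OVERSTRIKE and the function
to_printer are hypothetical names, and the use of '"' and "'" as stand-in
accent glyphs is an assumption, not a description of any real printer.

```python
BS = b"\x08"  # BACKSPACE, the control character paragraph 7 is concerned with

# Hypothetical overstrike table: ISO 8859-1 code -> (base letter, accent glyph).
OVERSTRIKE = {
    0xE4: (b"a", b'"'),  # 14/4, a-umlaut
    0xF6: (b"o", b'"'),  # 15/6, o-umlaut
    0xFC: (b"u", b'"'),  # 15/12, u-umlaut
    0xE9: (b"e", b"'"),  # 14/9, e-acute
}


def to_printer(data: bytes) -> bytes:
    """Replace accented letters with base + BACKSPACE + accent sequences."""
    out = bytearray()
    for byte in data:
        if byte in OVERSTRIKE:
            base, accent = OVERSTRIKE[byte]
            out += base + BS + accent
        else:
            out.append(byte)  # everything else passes through unchanged
    return bytes(out)
```

Note that the stored file contains only single ISO 8859-1 characters; the
overstrike sequence exists only in the printer data stream.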

Frankly, I find the blanket prohibition against use of BACKSPACE and CR
in paragraph 7 a bit confusing and don't believe I understand the logic
behind it.

I am involved in a large international project to formulate methods for
encoding literary and linguistic data in machine-readable form.  It is
important that we be able to recommend sound practice for encoding
diacritics.  To me, that means practice which agrees with relevant
standards.  But it is also essential that the recommended practice be
something that people can actually work with using the software that
exists.  So I am particularly interested in finding out what the
character set committee had in mind when they wrote paragraph 7.

-Michael Sperberg-McQueen
 Editor in Chief, ACH / ACL / ALLC Text Encoding Initiative
 University of Illinois at Chicago

30-Mar-89 23:54:00-GMT,1277;000000000001
Return-Path: <fdc>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA20163; Thu, 30 Mar 89 18:53:57 EST
Date: Thu, 30 Mar 1989 18:53:57 EST
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Cc: Christine M Gianone <cmg@watsun.cc.columbia.edu>
Subject: Re: query about overstruck characters in ISO 8859
In-Reply-To: Your message of Thu, 30 Mar 89 12:47:24 CST
Message-Id: <CMM.0.88.607305237.fdc@watsun.cc.columbia.edu>

We share your curiosity about the ISO8859 prohibition on composite
characters.  Not that it doesn't make sense -- ISO 8859 wants a character
to be a character, so that it is possible for character and string
oriented software to deal with text in a uniform way.  Hence ISO 8859
shuns the composite "character building" allowed by ISO 646, and *required*
by CCITT T.61.  Our curiosity, like yours, is about how mixed-alphabet
data is to be stored on disk.  This relates closely to an extension to the
Kermit file transfer protocol that we're working on, for transferring text
in mixed alphabets between unlike systems.  If you'd like to read & comment
on it, or want to be added to the "isokermit" discussion group, let us
know.  - Christine Gianone and Frank da Cruz

From @cunyvm.cuny.edu:RECK@DBNUAMA1.BITNET  Fri Mar 31 20:01:29 1989
Return-Path: <@cunyvm.cuny.edu:RECK@DBNUAMA1.BITNET>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA24487; Fri, 31 Mar 89 20:01:29 EST
Message-Id: <8904010101.AA24487@watsun.cc.columbia.edu>
Received: from CUNYVM.CUNY.EDU ([128.228.1.2]) by cunixc.cc.columbia.edu (5.54/5.10) id AA09964; Fri, 31 Mar 89 19:57:35 EST
Received: from DBNUAMA1.BITNET by CUNYVM.CUNY.EDU (IBM VM SMTP R1.1) with BSMTP id 2534; Fri, 31 Mar 89 19:59:46 EST
Date: Sat, 01 Apr 89 01:49:08 SET
To: isokermit@cunixc.cc.columbia.edu
From: RECK%DBNUAMA1.BITNET@cunyvm.cuny.edu
Comment: CROSSNET mail via SMTP@INTERBIT
Subject: Comments on ISO/Kermit proposal, 2nd draft

Date: 1 April 1989, 00:37:03 SET
From: Gisbert W.Selke           +49 228 225888       <RECK@DBNUAMA1.BITNET>
To:   ISOKERM at CUVMA
Re:   ISO/Kermit proposal, 2nd draft

Here are a few musings after reading the second draft; as usual, let me
start with what I'm uneasy about:

(i) In the less enthusiastic moments, I ask myself, "How likely is all this
ever to be implemented? Isn't it too much to expect an aspiring, or even an
accomplished, Kermit implementor to know about a whole bunch of proprietary
word processor formats?" With the advent of SGML, if ever, that may become
less of an obstacle, but I don't think we're near now... - So, why not do
the whole thing via extraneous filters, converting, on one side, from
BlunderWrite 4.F format to ISO-something, transmit that via ordinary Kermit,
and re-encode to WonderType 0.07 on the other side? That would only require
the standardization of multi-lingual text file transfer - which, as we see
right now, is a non-trivial task in itself -, but wouldn't put any further
burden on Kermit programmers. Writing such filters, on the other hand, is
not so hard a task for any moderately skilled programmer who has access to
the word processor specs; hence, the scheme is easily extended to any and all
word processors in the world. And, with many Kermits allowing users to write
macros and/or scripts and run programmes from within, running such a filter may
be virtually transparent to the end user.
OK, so that gets us back in time, when we had to boo/uu/hexify binary files
in order to mail them. So what.
(Well, I'm not particularly sure if I really mean what I have written. Don't
crucify me.)

(ii) If, contrary to what I seem to suggest in (i), the translation
mechanism is included into Kermit implementations (as I expect it to happen),
should there be some standard syntax for telling Kermit to employ a
user-specified translation table that helps it cope with a particular text
format that the implementor didn't know about? I'm not at all sure that
such a reasonably flexible mechanism could be specified; I'm thinking vaguely
of something like the 'input translation'/'set key ...' feature of MS-Kermit,
which has been of great help to us here in Germany (thanks, Joe!). The
input translation works on a character-per-character basis only, though;
that wouldn't be complex enough for text files, and that's where the problems
start. Anyway, giving the local wizards a chance to customize Kermit would
probably help.

(iii) The draft mentions the problem of mis-matched Kermit implementations,
i.e., two Kermits knowing about different subsets of ISO. If the subsets
are disjoint, then one can but fall back on plain ASCII (or forsake
readability on an unlike target system); but, in the case of partially
different subsets, we can do better. Imagine an originating Kermit knowing
about all of ISO, and a receiving Kermit not knowing about Latin-4 and
Latin-9. Then, if the sender tries to negotiate to send a, say, Turkish
(or German) text encoded as Latin-9, the receiver will not accept, and the
whole transfer has to fall back on plain ASCII, losing all the national
characters on the way. - As an alternative, if, in the initial exchange
(say, the A packet) the sender lists all the pertaining variants it knows
about, then the receiver may choose one of those variants that it in turn
knows about, and, in the ACK, tells the sender which variant it actually
should use. So this mechanism is quite similar to the matching of
capabilities as negotiated in the CAPAS byte(s) - essentially, there should
be a way for the receiver to transmit information on its abilities back to
the sender. (This remains within the one-step negotiation scheme of Kermit -
no extended prattling is involved.)
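
The selection step suggested in (iii) could be sketched roughly as follows
in Python.  This is only an illustration under stated assumptions: the
function name choose_charset and the character set names are hypothetical,
and the actual packet encoding of the offer and the ACK is not shown.

```python
def choose_charset(sender_offers, receiver_knows, fallback="ASCII"):
    """Pick the first character set offered by the sender that the receiver
    also knows; fall back to plain ASCII when the two lists are disjoint.

    sender_offers  -- names in the sender's order of preference
    receiver_knows -- collection of names the receiver can handle
    """
    for name in sender_offers:
        if name in receiver_knows:
            return name
    return fallback


# A German text could be sent as ISO 8859-9 or ISO 8859-1; the receiver
# does not know 8859-9, so it answers with 8859-1 in its ACK.
offer = ["ISO8859-9", "ISO8859-1"]
known = {"ISO8859-1", "ISO8859-2"}
chosen = choose_charset(offer, known)
```

The sender lists its variants once and the receiver replies once, so the
exchange stays within a single request and acknowledgement.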

(iv) On what grounds should a Kermit choose 7 or 8 bit environment: I don't
think it should be left to the programmer. MS-Kermit (and most others) is
written in the US, but is used in Western Europe, in Israel, in Japan,...
It should be left to the user, really - probably by tying it to some
'set language' command, maybe with additional options to override this
standard (which I will use if and when I start writing dadaist poems
consisting mainly of umlauts).

(v) A minor point on the wording of the draft: the term 'n-bit environment'
seems to me to be used inconsistently in various places; e.g., in and near
to table 6, it refers to the properties of the communications path (table 6
talks about using <Esc><Sp>C in a 7-bit environment), whereas appendix B
uses the term with respect to the subsection of ISO 2022 that is being
employed (and, consequently, states that <Esc><Sp>C must not be used in
a 7-bit environment). This had me confused for a while; maybe it should be
made clearer that these are slightly different concepts, and that Kermit,
under certain conditions, provides a mock-8-bit environment as seen from
the ISO level.

(vi) Another, even lesser point: in appendix C, the use of 'left (rsp. right)
half' for G0 (rsp. G1) left me briefly puzzled, too, given that G1 may
reside in GL, etc. It would seem clearer to me if all talk about 'left'
and 'right' is dropped in this context.

In spite of the evidence I have just given, let me tell you that I like the
whole ISO/Kermit idea, and I did enjoy reading the second draft, which is
a big improvement on the first version (and I liked that one, too). Taking
the full power of ISO 2022 is certainly the right thing to do - it can be
done, and that way, we're less likely to outgrow the restrictions we put
on ourselves now. I appreciate your work, Chris and Frank!

\Gisbert          <reck@dbnuama1.bitnet>


From MURAKAMI@ntt-20.ntt.jp  Tue Apr  4 23:40:10 1989
Return-Path: <MURAKAMI@ntt-20.ntt.jp>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA13049; Tue, 4 Apr 89 23:40:10 EDT
Received: from ntt-sh.ntt.jp ([129.60.57.1]) by cunixc.cc.columbia.edu (5.54/5.10) id AA14324; Tue, 4 Apr 89 23:32:46 EDT
Received: by ntt-sh.ntt.jp (3.2/ntt-sh-03c) with TCP; Wed, 5 Apr 89 12:36:16 JST
Date: Wed, 5 Apr 89 12:36:13 I
From: ken-ichiro murakami <MURAKAMI@ntt-20.ntt.jp>
Subject: Re: ISO/Kermit Proposal Draft #2
To: isokermit@watsun.cc.columbia.edu
Cc: murakami@ntt-20.ntt.jp
In-Reply-To: <CMM.0.88.607300799.cmg@watsun.cc.columbia.edu>
Message-Id: <12483598785.15.MURAKAMI@NTT-20.NTT.JP>

Chris, Frank, Fujii-san and kermit wizards,

Your revised draft greatly helped me to understand ISO 2022 facilities.
Thank you. I think it's very important to clarify the background
technology and the reason why the Kermit standard representation is
adopted. I believe the draft is also helpful to extend other protocols
such as FTP.

Here are my opinions on the second draft and my comments on the previous
messages on the mailing list. My opinion is based on the policy that the
Kermit protocol and its commands should be clearly structured (layered)
and simple.  I believe a well-structured system helps users understand
how to use it.

My opinion is summarized as follows:

 (1) simplify terminal emulation commands
 (2) separate coding into two categories, generic coding and
     application specific coding.
 (3) proposal for new command SET FILE FILTER to offer flexible
     interface for application specific coding
 (4) compatibility with conventional kermit program
 (5) 8bit transparency(8th-bit quoting) which should be ensured NOT as
     optional but as mandatory facility.

When I was writing my opinion, I asked myself a simple question.  "It
seems the new facilities make Kermit complex and cost it its simplicity
and beauty. Is it possible to reduce the number of new commands?"
If users only need to convert files created by applications such as
MACWRITE and WORDPERFECT, the SET FILE FILTER command might do
everything.  How about non-standard Japanese Kanji coding?
It seems the SET FILE FILTER command can handle that too. If ISO2022 is
adopted in Kermit, we have to modify both server and client Kermits. Do we
have enough wizards in Japan who can modify them? .......
I'm still confused by these questions. I hope I can make my idea clear
in the next mail. Sorry.

I expect you to correct my proposal and idea.

-Ken

	Ken-ichiro Murakami
	NTT Laboratories
	Tokyo, Japan


1. TERMINAL EMULATION

>From: Hirofumi Fujii <KEIBUN@JPNKEKVM.bitnet>
>
>1. Japanese code systems:
>   Japanese computer system has at least three character sets,
>   Roman (almost ASCII), Katakana(1byte code), and Kanji
>   (2byte code).  Kanji set also includes the Roman characters
>   but the face of the character is double width.  Therefore it
>   should be considered as different characters.
>   The local representation for these character sets are
>
>       OS           Roman       Katakana          Kanji
>      ------       ----------  -------------    -------------------
>      MS-DOS       JIS X 0201  JIS X 0201 in GR MS-Kanji (SHIFTJIS)
>      VAX/VMS      US-national  (see note 1)    DEC-Kanji (see note 2)
>      IBM/VM/CMS   EBCDIC      EBCDIC-Katakana  IBM-Kanji
>      UNIX         JIS X0201   EUC (see note 3) EUC (see note 4)
>                                                ^^^^^^^^^^^^^^^^
>      Elis         JIS X0201   EUC (see note 3) EUC (see note 4)
>
>  (Note 1) Invoked by LS2 (Locking shift two).
>  (Note 2) JIS X 0208 in GR (i.e., 8th bit on).
>  (Note 3) Invoked by SS2 (Single shift two) and 8th bit on.
>  (Note 4) JIS X 0208 in GR (i.e., 8th bit on).

In Japan, each vendor extended the original UNIX with a different Kanji
code. There are now three kinds of Kanji UNIX systems, and there is no
standard Kanji code among them.

       OS           Roman       Katakana          Kanji
      ------       ----------  -------------    -------------------
      MS-DOS       JIS X 0201  JIS X 0201 in GR MS-Kanji (SHIFTJIS)
      VAX/VMS      US-national  (see note 1)    DEC-Kanji (see note 2)
      IBM/VM/CMS   EBCDIC      EBCDIC-Katakana  IBM-Kanji
      UNIX         JIS X0201   EUC (see note 3) EUC (see note 4)
                               JIS X 0201 in GR MS-Kanji (SHIFTJIS)
                               JIS X0208(7bit)  JIS X0208(7bit code)
      Elis         JIS X0201   EUC (see note 3) EUC (see note 4)


>From: Hirofumi Fujii <KEIBUN@JPNKEKVM.bitnet>
>
>TERMINAL EMULATION
>
>SET TERMINAL CHARACTER-SET <name>
>
>My Kermit (MSVP98) have another command, SET TERMINAL KANJI CODE <name>.
>The SET TERMINAL CHARACTER-SET specifies GL character set.  However,
>Kanji is mainly used in GR character set as described above.  This is
>because we need another command to specify the Kanji code.  There is one
>more reason we need another command.  It is the code for keyinput.
>SET TERMINAL KANJI CODE also used for keyinput character conversion.
>To unify these command, I propose
>
>  SET TERMINAL CHARACTER-SET <name> [as {GL,GR}]
>
>where the default is GL.  And for keyinput
>
>  SET KEYINPUT CHARACTER-SET <name> [as {GL,GR}]

Is it necessary to specify the GL and GR parameters separately?
This makes it complex to specify the character code for terminal emulation.
Of course, macros may ease the problem. However, FOR SIMPLICITY, we
should not add a new option that is not used very often.
How about treating the CHARACTER-SET as a set of codes consisting of
Roman, Katakana and Kanji? For example, when we interact
with UNIX, we issue the SET TERMINAL CHARACTER-SET EUC command. The
parameter EUC means Roman=JIS X0201, Katakana=EUC and Kanji=EUC.

Usually, the remote host uses the same character code for input and
output.  In addition, we don't type Kanji codes directly.  In Japan, we
usually use a Front End Processor, which is a device driver or a
resident program on the PC that converts Roman or Katakana input to Kanji
code.  The converted Kanji code is passed to the operating system, such as
MS-DOS.  This means it's not necessary to have a SET KEYINPUT (KEYMAP)
command.  So, SET TERMINAL CHARACTER-SET should also apply to output from
the PC. Even if you want to redirect and transmit characters not from the
keyboard but from a file (with the TRANSMIT command), this approach could
work well.

In our environment, we are satisfied with the conventional SET
KEY command and we don't need yet another command to specify keyboard
character mapping.


>From: Joe Doupnik <JRD@cc.usu.edu>
>
>	The last doubt in my mind then relates to terminal emulation in a
>7 bit environment. Here the Kermit 8 bit quoting mechanism is not available.
>I think that it is not a difficult task to allow both 7 and 8 bit ISO shifts
>to be available in a terminal emulator, selected automatically by presence of
>parity and overridable by existing Kermit commands.

In Japan, we have the character encoding standard JIS X0208. However, it
is NOT the standard in practice; it is only one of several Kanji codes.
Actually, we have more than three non-standard Kanji encodings, and it's
difficult to find a JIS X0208 oriented machine.  So, we have to specify
the Kanji encoding explicitly in TERMINAL EMULATION as well as in FILE
TRANSFER.  This means the standard representation on the communication
channel concerns BOTH terminal emulation AND file transfer. Japanese
Kermit lovers are especially eager for this standardization.  Of course,
it's possible to use the SET TRANSFER-SYNTAX command to specify the
character coding for the terminal emulator as well as for file transfer,
to reduce the number of Kermit commands. But Frank thinks that two
commands (SET TERMINAL CHARACTER-SET and SET TRANSFER SYNTAX) should be
provided for terminal emulation and file transfer respectively.

>From: Hirofumi Fujii <KEIBUN@JPNKEKVM.bitnet>
>
>Terminal Emulator
>-----------------
>I think, in Kermit protocol, it is not necessary to say about the terminal
>emulator.  It is machine dependent and can be handled within the local
>routines.
>Actually, my Kermit (MSVP98) already has ISO-2022 features (supports
>G0, G1, G2 and G3 character sets, all locking-shift and single-shift
>mechanisms) within the scheme of MS-Kermit.  I have not modified any
>machine-independent routines of the MS-Kermit.  Joe has separated Kermit
>modules very nicely and clearly.

I know terminal emulator can support both ISO-2022 and EUC as Kanji.
However, we cannot specify SHIFTJIS code without SET TERMINAL
CHARACTER-SET.  Actually, we cannot use UNIX manufactured by SONY
without the command. So, it's necessary to implement SET TERMINAL
CHARACTER-SET command. We cannot do without it!


>The second DRAFT says:
>
>TERMINAL EMULATION
>
>In this regard, it is important to note that not all languages are written
>from left to right, top to bottom.  Hebrew and Arabic are two examples of
>right-to-left languages, and Japanese and Chinese may be written top to
>bottom.  The order of the text characters on disk or on the transmission line
>do not necessarily reflect their order on the screen or the printed page.

In our(Japanese) case, we usually write in left-to-right manner. So,
the order of text character will reflect their order. I don't know
the order in other languages.

>The following command should specify what character set is sent and received
>on the transmission medium during terminal emulation.  The Kermit program must
>translate between this character set and the one that is used locally.
>
>SET TERMINAL CHARACTER-SET <name> [{GL, GR}]
> This command already exists, but is currently used only in MS-DOS Kermit, and
>  only to switch between US and UK ASCII.  We should extend this command to
>  select any character code, and to assign it to GL (default) or GR, and we
>  should have a standard set of <name>'s including the currently defined ISO
>  8-bit alphabets: 
>
>    LATIN1-ISO8, ..., LATIN5-ISO8, CYRILLIC-ISO8, GREEK-ISO8,
>    HEBREW-ISO8, ARABIC-ISO8, etc.
>
>  7-bit ASCII and its national variants (ISO-646):
>
>   ASCII-US, ASCII-UK, ASCII-FR, ASCII-DE, ASCII-IT, ASCII-NL, ASCII-ES,
>    ASCII-DK, ASCII-FI, ASCII-IS, ASCII-SE, ASCII-NO, ASCII-TR, etc.
>
>  And for Japanese:
>
>    KANJI-JIS, KANJI-SHIFTJIS, KANJI-EUC, etc.
>
>For example, an MS-DOS computer might use SHIFTJIS locally, but a VAX
>communicates using EUC, so the MS-DOS Kermit user would give the command SET
>TERM CHAR KANJI-EUC.

As I pointed out, it's better not to have [{GL, GR}]. Instead, we
should consider the specified parameter as a set of Roman, Katakana
and Kanji.

> The second DRAFT says;
>
>For keyboard character input, in addition to the current per-key SET KEY
>mechanism, there should be a way to assign an entire translation table to the
>entire keyboard.  This command would be:
>
>  SET TERMINAL KEYMAP <name> [{GL, GR}]

As I pointed out, we can do without the KEYMAP command. If a user wants to
change the keyboard mapping, the user should use a macro that includes
several SET KEY commands.

2. PROPOSAL for NEW or EXTENDED KERMIT COMMANDS

In the DRAFT, several new commands are defined. In addition, there are
some extended conventional commands related to character encoding.
They are;

	SET FILE TYPE {TEXT|BINARY|WORDPERFECT......}
	SET FILE CHARACTER-SET {KANJI-EUC|KANJI-SHIFTJIS|....}
	SET TRANSFER-SYNTAX {NORMAL|ISO-2022}
	SET TERMINAL CHARACTER-SET {KANJI-EUC|KANJI-SHIFTJIS|....}
        ( I think we don't need SET TERMINAL KEYMAP command)

I'm confused by these commands, because it seems that two levels of
representation (coding) are mixed. I think there are two protocol
layers in the proposal: (1) generic character coding and (2)
application oriented coding. We should distinguish these two layers
clearly. (1) is a problem common to all files and is related to the
communication channel. However, (2) is specific to a word
processor such as WORDPERFECT and is considered an upper layer on
(1). (1) is stable and clearly defined, but (2) is unstable since it's
application defined, and new software may use other codings.  As for
(2), Kermit should be flexible. So, it's better to translate the
presentation with a filter in batch manner, as Mr. Gisbert W. Selke said
in his mail. The SET FILE TYPE command in the draft confused me, since
it includes both (1) and (2) specifications in one command.

So, I think we don't have to extend SET FILE TYPE command. Rather,
we propose new SET FILE FILTER command to convert application oriented
coding every time a file is transferred. Our proposal is as follows;


	SET FILE TYPE {TEXT|BINARY} specifies local file translation
		If TEXT is specified, character may be converted.
		If BINARY is specified, character is not converted.
	SET FILE FILTER {NONE|conversion-program-name}
		The specified program converts application oriented coding
		in batch manner before the file is transferred, if SET FILE
		TYPE TEXT is specified.
	SET FILE CHARACTER-SET {KANJI-EUC|KANJI-SHIFTJIS|....}
		specifies local file coding
	SET TRANSFER-SYNTAX {NORMAL|ISO-2022}
		specifies communication channel coding in file transfer
	SET TERMINAL CHARACTER-SET {KANJI-EUC|KANJI-SHIFTJIS|....}
		specifies communication channel coding in terminal emulation

We believe SET FILE FILTER command offers flexible interface to
application oriented coding. If two programs are necessary for receive
and transmit, we may need argument {RECEIVE|TRANSMIT} after the
program name. The following figure shows layers related to character
coding.


	+-------------+--------------------+
	| WORDPERFECT | XYWRITE | MACWRITE |
	|  (application specific coding)   |
	+-------------+---------+----------+--------------------------+
	| TEXT file (coding is converted)  | BINARY (never converted) |
	+=============================================================+
	|                                                             |
	| 8bit-transparent channel compensated by 8th-bit quoting     |
	|                                                             |
	+-------------------------------------------------------------+
	|                                                             |
	|     raw transmission channel (non-transparent channel)      |
	|                                                             |
	+-------------------------------------------------------------+

		Kermit presentation layer
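
The order of operations implied by these layers might be sketched in Python
as below.  All names here (prepare_for_send, file_filter, charset_converter)
are hypothetical illustrations of the proposed SET FILE TYPE, SET FILE
FILTER and SET FILE CHARACTER-SET settings, not existing Kermit code, and
the sample filter and the CP437 local code page are assumptions.

```python
def prepare_for_send(data, file_type="TEXT", file_filter=None,
                     charset_converter=None):
    """Apply the upper layers to file data before it reaches the channel.

    BINARY data is never converted.  TEXT data first passes through the
    optional application-specific filter (SET FILE FILTER), then through
    the generic character-set conversion (SET FILE CHARACTER-SET).
    """
    if file_type == "BINARY":
        return data
    if file_filter is not None:
        data = file_filter(data)        # layer (2): application coding
    if charset_converter is not None:
        data = charset_converter(data)  # layer (1): generic coding
    return data


# Hypothetical filter that strips a made-up formatting code, and a
# converter from a local PC code page (CP437 assumed) to ISO 8859-1.
def strip_bold(d):
    return d.replace(b"<b>", b"").replace(b"</b>", b"")


def pc_to_latin1(d):
    return d.decode("cp437").encode("latin-1")


sent = prepare_for_send(b"<b>M\x84dchen</b>", "TEXT", strip_bold, pc_to_latin1)
```

Keeping the filter outside the generic conversion means a new word
processor format needs only a new filter program, not a change to Kermit.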


 As for SET TERMINAL CHARACTER-SET command, We could integrate SET
TRANSFER-SYNTAX command and SET TERMINAL CHARACTER-SET command if SET
TRANSFER-SYNTAX command is extended, since SET TERMINAL CHARACTER-SET
is considered as a coding specification in communication channel.
(This is described in detail in the following section.)


3. CONSIDERATION FOR THE CONVENTIONAL KERMIT PROGRAMS (COMPATIBILITY)

Indeed, we should make progress toward the common coding system described
in the draft. Currently, there is no such implementation. However, an
automatic conversion function for Kanji coding is strongly
requested in Japan. NTT has already developed a local Kanji coding
conversion facility and has distributed the improved Kermit in Japan.
This local conversion requires no modification on the server side.

It's impossible to implement the proposed function in all
Kermit servers immediately. So, for the present, we must consider how to
implement LOCAL Kanji coding translation and integrate its commands
with the proposed new commands.

Consider what character coding is adopted in conventional kermit
transmission channel. It's server's coding. If the remote host is IBM,
the coding is IBM-Kanji. If the remote host is VAX, the coding on
transmission channel is KANJI-EUC. If the remote host is UNIX, the
coding may be KANJI-JIS(equivalent to ISO-2022). So, we can consider
it as if we specified command SET TRANSFER-SYNTAX
{KANJI-EUC|KANJI-IBM|ISO-2022(JIS)}. Local kermit can convert the
coding if the remote host's coding is specified by SET TRANSFER-SYNTAX
command.  Therefore, we propose to extend the command for the present.
Of course, these extended options will be deleted in the future. We
must consider initial file attribute negotiation for extended
parameters. (But, I have no idea now.)

	SET TRANSFER-SYNTAX {NORMAL|ISO-2022(JIS)|KANJI-EUC|KANJI-IBM...}

If the remote host Kanji coding is specified by this command, we can
also use the information for terminal emulation. This is because
the same Kanji coding is used both in the file system and in interaction
with the remote host. So, we can offer a macro to set the coding for both
SET TRANSFER-SYNTAX and SET TERMINAL CHARACTER-SET. (Indeed, we can
integrate these two commands if file transfer and interaction use
the same coding in the future.  But Frank recommended that we consider
these codings as different functions. I understand his recommendation.
If it's impossible to modify the shell, we can use the same coding
for file transfer and terminal emulation. But it's hard to modify the shell.)
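
As an illustration of the kind of local Kanji conversion discussed in this
section, the following Python sketch converts one two-byte JIS X0208 code
(7-bit form) to its EUC and SHIFTJIS forms.  The function names are
hypothetical; the arithmetic is the commonly published conversion, shown
for valid JIS X0208 byte pairs only (0x21..0x7E each), with no handling of
Katakana, escape sequences, or vendor extensions.

```python
def jis_to_euc(j1, j2):
    """JIS X0208 in GR: set the 8th bit of both bytes."""
    return j1 | 0x80, j2 | 0x80


def jis_to_sjis(j1, j2):
    """Fold two JIS rows into one SHIFTJIS lead byte."""
    s1 = ((j1 + 1) >> 1) + 0x70
    if s1 >= 0xA0:          # skip the single-byte Katakana range
        s1 += 0x40
    if j1 & 1:              # odd row: first half of the trail-byte range
        s2 = j2 + 0x1F
        if s2 >= 0x7F:      # 0x7F is never used as a trail byte
            s2 += 1
    else:                   # even row: second half
        s2 = j2 + 0x7E
    return s1, s2


# JIS 0x3021 is the first Kanji in JIS X0208.
euc = jis_to_euc(0x30, 0x21)
sjis = jis_to_sjis(0x30, 0x21)
```

A local Kermit that knows the remote host's coding could apply such a
conversion per character while sending or receiving, without any change on
the server side.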


4. CONFLICT BETWEEN ISO/ECMA AND THE CODES IN JAPAN

In the second draft, the conflict is reported. According to an article
in a magazine in Japan, 4/7(G) and 4/8(H) are allocated to Swedish
Roman characters, and 4/9(I) and 4/10(J) are allocated to Japanese Roman
characters. I'll ask a scientist at NTT who is a member of one of the ISO
committees about this conflict and report back on this mailing list.


5. 8bit transparency

An important function that Kermit offers is 8bit transparency
on any transmission channel. ISO 2022 also offers the same mechanism.
In the proposal, 7bit coding is allowed, to keep transparency for
Kermits that have no 8th-bit quoting facility.  In my opinion, Kermit
should offer 8th-bit quoting by default.  This makes the Kermit layer
model simple. The 8bit transparent channel is considered as
a common base for both ISO2022 coded TEXT data and binary file data.
If 7bit coding is employed for transparency, it violates the natural
Kermit layer model. We should separate transmission channel and data
encoding clearly. Some people have already pointed out that 8bit coding
makes file transfer fast.  So, I think 8bit coding is suitable for
Kermit. However, 8th-bit quoting must be defined NOT as an optional BUT
as a mandatory function for 8bit encoding.

	+-------------------------------------+
	|   application oriented coding       |
	+-----------------------+-------------+
	| 8bit(7bit) ISO coding | binary data |
	+-------------------------------------+
	|      8bit transparent channel       |
	+-------------------------------------+
        |        non-transparent line         |
	+-------------------------------------+
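
The "8bit transparent channel" layer can be illustrated with a simplified
Python sketch of Kermit-style prefixing.  It assumes the usual default
prefix characters ('#' for control characters, '&' for the 8th bit) and
leaves out repeat counts, packet framing, and negotiation; the function
names are hypothetical.

```python
CTL_PREFIX = ord("#")   # control-character prefix (assumed default)
BIT8_PREFIX = ord("&")  # 8th-bit prefix (assumed default)


def ctl(c):
    """Map a control character to a printable one and back."""
    return c ^ 0x40


def encode(data: bytes) -> bytes:
    """Turn arbitrary 8-bit data into printable 7-bit characters."""
    out = bytearray()
    for b in data:
        if b & 0x80:                       # 8th bit set: prefix and strip it
            out.append(BIT8_PREFIX)
            b &= 0x7F
        if b < 0x20 or b == 0x7F:          # control character
            out += bytes([CTL_PREFIX, ctl(b)])
        elif b in (CTL_PREFIX, BIT8_PREFIX):
            out += bytes([CTL_PREFIX, b])  # a literal prefix character
        else:
            out.append(b)
    return bytes(out)


def decode(data: bytes) -> bytes:
    """Reverse encode(), restoring the original 8-bit data."""
    out = bytearray()
    i = 0
    while i < len(data):
        high = 0
        b = data[i]; i += 1
        if b == BIT8_PREFIX:
            high = 0x80
            b = data[i]; i += 1
        if b == CTL_PREFIX:
            b = data[i]; i += 1
            if b not in (CTL_PREFIX, BIT8_PREFIX):
                b = ctl(b)                 # it was a quoted control character
        out.append(b | high)
    return bytes(out)
```

With this layer in place, an ISO 8859-1 a-umlaut (14/4) crosses a 7-bit
line as the two characters "&d", and the layers above never see the
difference.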

6. Announcer letter in attribute packet

I agree with this beautiful negotiation. 
-------
-------

From MURAKAMI@ntt-20.ntt.jp  Fri Apr  7 09:43:04 1989
Return-Path: <MURAKAMI@ntt-20.ntt.jp>
Received: from ntt-20.NTT.JP by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA23205; Fri, 7 Apr 89 09:43:04 EDT
Date: Fri, 7 Apr 89 22:42:20 I
From: ken-ichiro murakami <MURAKAMI@ntt-20.ntt.jp>
Subject: conflict between ECMA/ISO alphabet codes and the codes in Japan
To: isokermit@watsun.cc.columbia.edu
Cc: murakami@ntt-20.ntt.jp
Message-Id: <12484233414.26.MURAKAMI@NTT-20.NTT.JP>

Chris and Frank,

In the second DRAFT, conflict between ISO/ECMA alphabet codes and the
codes used in Japan is pointed out. I asked Mr.Kouichi Suzuki in NTT,
who knows ISO specification very well, about it and confirmed that
there is NO conflict.

-Ken

	Ken-ichiro Murakami
	NTT Laboratories
	Tokyo, Japan

> 
> POSSIBLE PROBLEM: There seems to be conflict between ISO/ECMA alphabet codes
> and the codes used in Japan:
> 
>   Letter  Europe   Japan  
>     I     Czech    JIS-Katakana
>     J     ISO6937  JIS-Roman


(1) What is assigned by ISO/ECMA?

ECMA assigns both the alphabet codes and the escape sequences for them.
Therefore, even if the same alphabet code number is assigned to two code
sets, the different escape sequences ensure uniqueness.

(2) 94 character set and 96 character set

ISO2022 defines two kinds of character set. One is the 94 character
set, which doesn't include 2/0 and 7/15, such as JIS X0201 and
ISO646(IRV); the other is the 96 character set. Different escape
sequences are used to designate 94 and 96 character sets, as follows;

	<ESC> 2/8 F	; designate 94 character set to G0
	<ESC> 2/9 F	; designate 94 character set to G1
	<ESC> 2/10 F	; designate 94 character set to G2
	<ESC> 2/11 F	; designate 94 character set to G3
	<ESC> 2/12 F	; designate 96 character set to G0
	<ESC> 2/13 F	; designate 96 character set to G1
	<ESC> 2/14 F	; designate 96 character set to G2
	<ESC> 2/15 F	; designate 96 character set to G3
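
The pattern in this list can be expressed as a small Python sketch.  The
function names designate and column_row are hypothetical, and the sketch
simply follows the list above, including the 2/12 entry for G0.

```python
ESC = 0x1B


def designate(final: str, g: int, size: int) -> bytes:
    """Build the escape sequence that designates a character set to G0..G3.

    final -- the final character F identifying the set (e.g. "B" for ASCII)
    g     -- 0, 1, 2 or 3
    size  -- 94 or 96
    """
    if g not in (0, 1, 2, 3):
        raise ValueError("g must be 0..3")
    if size == 94:
        intermediate = 0x28 + g   # 2/8 .. 2/11
    elif size == 96:
        intermediate = 0x2C + g   # 2/12 .. 2/15
    else:
        raise ValueError("size must be 94 or 96")
    return bytes([ESC, intermediate, ord(final)])


def column_row(seq: bytes) -> str:
    """Show a sequence in the column/row notation used in this discussion."""
    return " ".join("<ESC>" if b == ESC else f"{b >> 4}/{b & 0x0F}"
                    for b in seq)


ascii_g0 = designate("B", 0, 94)    # ESC ( B
latin1_g1 = designate("A", 1, 96)   # ESC - A
```

Because the intermediate character differs between 94 and 96 character
sets, the same final character can safely identify two different sets.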

Therefore, the designation is corrected as follows;

> Alphabet Name             Esc Seq   ISO Number   ECMA Ref   ECMA Registration
>
>  ASCII (ANSI X3.4-1986)    <ESC>(B   ISO 646      ECMA-6      ?
	Registration No. 6
		G0 set: <ESC> 2/8 4/2
		G1 set: <ESC> 2/9 4/2
		G2 set: <ESC> 2/10 4/2
		G3 set: <ESC> 2/11 4/2
>  Latin Alphabet No. 1      <ESC>-A   ISO 8859-1   ECMA-94     100
	Registration No. 100
		G0 set:  --- (not defined because this set is specified not to
			      be invoked into G0; see ISO 8859 part 1)
		G1 set: <ESC> 2/13 4/1
		G2 set: <ESC> 2/14 4/1
		G3 set: <ESC> 2/15 4/1
>  Latin Alphabet No. 2      <ESC>-B   ISO 8859-2   ECMA-94     101
	Registration No. 101
		G0 set:  --- (not defined because this set is specified not to
			      be invoked into G0; see ISO 8859 part 2)
		G1 set: <ESC> 2/13 4/2
		G2 set: <ESC> 2/14 4/2
		G3 set: <ESC> 2/15 4/2
>  Latin Alphabet No. 3      <ESC>-C   ISO 8859-3   ECMA-94     109
	Registration No. 109
		G0 set:  --- (not defined because this set is specified not to
			      be invoked into G0; see ISO 8859 part 3)
		G1 set: <ESC> 2/13 4/3
		G2 set: <ESC> 2/14 4/3
		G3 set: <ESC> 2/15 4/3
>  Latin Alphabet No. 4      <ESC>-D   ISO 8859-4   ECMA-94     110
	Registration No. 110
		G0 set:  --- (not defined because this set is specified not to
			      be invoked into G0; see ISO 8859 part 4)
		G1 set: <ESC> 2/13 4/4
		G2 set: <ESC> 2/14 4/4
		G3 set: <ESC> 2/15 4/4
>  Latin/Cyrillic            <ESC>-L   ISO 8859-5   ECMA-113    144
>  Latin/Arabic              <ESC>-G   ISO 8859-6   ECMA-114    127
	Registration No. 127
		G0 set:  --- (not defined because this set is specified not to
			      be invoked into G0; see ISO 8859 part 6)
		G1 set: <ESC> 2/13 4/7
		G2 set: <ESC> 2/14 4/7
		G3 set: <ESC> 2/15 4/7
>  Latin/Greek               <ESC>-F   ISO 8859-7   ECMA-118    126
	Registration No. 126
		G0 set:  --- (not defined because this set is specified not to
			      be invoked into G0; see ISO 8859 part 7)
		G1 set: <ESC> 2/13 4/5
		G2 set: <ESC> 2/14 4/5
		G3 set: <ESC> 2/15 4/5
>  Latin/Hebrew              <ESC>-H   ISO 8859-8   ECMA-121    138
	Registration No. 138
		G0 set:  --- (not defined because this set is specified not to
			      be invoked into G0)
		G1 set: <ESC> 2/13 4/8
		G2 set: <ESC> 2/14 4/8
		G3 set: <ESC> 2/15 4/8
>  Latin Alphabet No. 5      <ESC>-M   ISO 8859-9   ECMA-128    148
>  Czech Standard            <ESC>-I   ?            ?           139
	Registration No. 139
		G0 set:  --- 
		G1 set: <ESC> 2/13 4/9
		G2 set: <ESC> 2/14 4/9
		G3 set: <ESC> 2/15 4/9
>  JIS-Roman                 <ESC>-I   ?            ?            14
	Registration No. 14
		G0 set: <ESC> 2/8  4/10
		G1 set: <ESC> 2/9  4/10
		G2 set: <ESC> 2/10 4/10
		G3 set: <ESC> 2/11 4/10
>  Right Half, ISO 6937-2    <ESC>-J   ISO 6937-2   ?           142
	Registration No. 142
		G0 set:  --- (not defined because this set is specified not to
			      be invoked into G0; see ISO 6937 part 2)
		G1 set: <ESC> 2/13 4/10
		G2 set: <ESC> 2/14 4/10
		G3 set: <ESC> 2/15 4/10
>  JIS-Katakana              <ESC>-I   ?            ?            13
	Registration No. 13
		G0 set: <ESC> 2/8  4/9
		G1 set: <ESC> 2/9  4/9
		G2 set: <ESC> 2/10 4/9
		G3 set: <ESC> 2/11 4/9
>  Math/Technical Set        <ESC>-K   ?            ?           143
>  Chinese (CAS GB 2312-80)  <ESC>$)A  ?            ?           ?
>  Chinese (CAS GB 2312-80)  <ESC>$)A  ?            ?           58
>  Japanese (JIS 0208)       <ESC>$)B  ?            ?           ?
>  Japanese (JIS 0208)       <ESC>$)B  ?            ?           87
>  Korean (KS C 5601-1987)   ?         ?            ?           ?
> 
>      Table 5: Alphabets, Selectors, Standards, and Registration Numbers
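
The column/row notation used in the entries above (for example 2/13 4/1)
names a byte by its position in the code table: column times 16, plus row.
The following Python sketch is an illustration only; the helper names
code() and designate() are hypothetical and not part of any Kermit program.
It shows how a designating escape sequence can be built from the notation,
assuming the usual ISO 2022 intermediate bytes for 94- and 96-character sets.

```python
ESC = 0x1B

# Intermediate bytes, in column/row notation, that designate a set into
# G0..G3.  96-character sets cannot be designated into G0.
INTERMEDIATE_94 = {"G0": (2, 8), "G1": (2, 9), "G2": (2, 10), "G3": (2, 11)}
INTERMEDIATE_96 = {"G1": (2, 13), "G2": (2, 14), "G3": (2, 15)}


def code(col, row):
    """Convert column/row notation such as 4/1 to a byte value."""
    return col * 16 + row


def designate(gset, final, size=96):
    """Build ESC + intermediate + final for the given G-set."""
    table = INTERMEDIATE_96 if size == 96 else INTERMEDIATE_94
    return bytes([ESC, code(*table[gset]), code(*final)])


# Latin Alphabet No. 1 (final byte 4/1) designated into G1: ESC 2/13 4/1
latin1_g1 = designate("G1", (4, 1))
```

Here latin1_g1 holds the three bytes ESC, "-", "A", which is the <ESC>-A
selector shown in the table.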

-------

From @cuvmb.cc.columbia.edu:JPALME@COM.QZ.SE  Sat Apr  8 08:35:17 1989
Return-Path: <@cuvmb.cc.columbia.edu:JPALME@COM.QZ.SE>
Received: from columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA27194; Sat, 8 Apr 89 08:35:17 EDT
Received: from cuvmb.cc.columbia.edu by columbia.edu (5.59++/0.3) with SMTP 
	id AA06855; Sat, 8 Apr 89 08:35:13 EDT
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 8506; Sat, 08 Apr 89 08:35:23 EDT
Received: from SEARN.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 5206; Sat, 08 Apr 89 08:34:19 EDT
Received: from QZCOM by SEARN.BITNET (Mailer X1.25) with BSMTP id 1868; Sat, 08
 Apr 89 13:34:00 EDT
Message-Id:  <409118@QZCOM>
Date:        08 Apr 89 12:45 +0200
From: "Jacob Palme QZ" <JPALME%COM.QZ.SE@cuvmb.cc.columbia.edu>
Reply-To: "Jacob Palme QZ" <JPALME%COM.QZ.SE@cuvmb.cc.columbia.edu>
To: "ISO/Kermit Discussion Group"
              <isokermit@watsun.cc.columbia.edu>
Subject:     ODA/ODIF

The ISO standard for exchange of data between word processors
is called ODA/ODIF. If KERMIT is to include a facility for such
transfer, the natural way would be to send the text in ODA/ODIF,
and translate at either end to ODA/ODIF format.

From MURAKAMI@ntt-20.ntt.jp  Mon Apr 10 09:56:04 1989
Return-Path: <MURAKAMI@ntt-20.ntt.jp>
Received: from ntt-20.NTT.JP by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA09406; Mon, 10 Apr 89 09:56:04 EDT
Date: Mon, 10 Apr 89 19:44:55 I
From: ken-ichiro murakami <MURAKAMI@ntt-20.ntt.jp>
Subject: Kermit meeting report from Tokyo
To: isokermit@watsun.cc.columbia.edu
Message-Id: <12484987549.15.MURAKAMI@NTT-20.NTT.JP>


		KERMIT MEETING REPORT from TOKYO

						April 7, 1989

						Ken-ichiro Murakami
						NTT laboratories
						Tokyo, Japan

Kermit experts in Japan had a meeting in Tokyo on April 4 and discussed
the Kermit Extension for International Character Sets.  The meeting was
sponsored by DECUS Japan.  About 10 people were present.  This short
report summarizes their opinions and the controversial points.


DATE

	Tuesday, April 4, 1989, 15:20 - 17:20
	DEC Japan Executive Room, Tokyo

MEMBERS
	
	Akira Itoh		(University of Tokyo)
	Hirofumi Fujii		(National Lab. for High-energy Physics)
	Hideaki Mikami		(NTT Laboratories)
	Hideki Nakakita		(NICON Systems)
	Kazuhisa Ohta		(Nihon UNISYS)
	Koichi Nishimoto	(DECUS Japan)
	Kenji Rikitake		(University of Tokyo)
	Ken-ichiro Murakami	(NTT Laboratories)
	Mamoru Ushimaru		(University of Tokyo)
	Yutaka Ogawa		(NTT Laboratories)
	Youichi Kazama		(NTT Laboratories)


SUMMARY

In Japan, we have many non-standard Kanji codes and we are always bothered
by this confusion.  Therefore, we think it is important to have a standard
representation like ISO-2022 for file transfer.  In particular, this
standardization will bring us a convenient function, that is, automatic
Kanji code conversion.  To realize this function, we should consider
further the user interface (command set) and implementation.


CONTROVERSIAL POINTS

1. NEW COMMANDS for TERMINAL EMULATION

	Is it necessary to have a new SET TERMINAL KEYMAP command?
	Is it necessary to have an argument {GR|GL} for SET TERMINAL
        CHARACTER-SET?

    A: The user interface should be as simple as possible.  This means we
must minimize the number of new commands.  Usually, the same Kanji code is
used for both input and output.  Therefore, we don't need the KEYMAP command.

    B: Some UNIX machines use 8-bit EUC code as the internal Kanji
encoding.  However, these machines cannot receive EUC because of 8th-bit
non-transparency.  For such systems, it's necessary to have the KEYMAP
command.

    A: Even if the system uses both EUC and JIS as input and output
respectively, the terminal emulator can display both EUC and JIS
simultaneously.  If you issue SET TERMINAL CHARACTER-SET JIS, you can
receive both EUC and JIS.  Therefore, we don't need the KEYMAP command.

    B: How about setting both CHARACTER-SET and KEYMAP automatically when
the user issues the SET TERMINAL CHARACTER-SET command?

    A: You can receive both EUC and JIS simultaneously.  Would you really
use the KEYMAP command?  We should not add a new Kermit command if we
don't need it.

2. SET FILE FILTER is a good idea. But, is it allowed to use non-ISO-2022
code?

   Using ISO-2022 on the communication channel is a very good idea and we
should adopt ISO-2022 transfer syntax.  However, we also need the SET FILE
FILTER command until the ISO-2022 facility is implemented in the popular
Kermit programs.  We can convert Kanji code locally by this command for the
present.
   For local conversion, we have to specify the Kanji code used on the
remote host.  For this purpose, SET TRANSFER-SYNTAX will be used.  The
current draft specification only allows NORMAL and ISO-2022 as the argument.
We may need additional arguments for other non-standard Kanji codes such as
EUC and SHIFTJIS.


OPINIONS

1. We should keep Kermit commands simple.

Novice users are often confused by a bunch of Kermit commands.  (Since
experts have a mental model for Kermit, they tend not to notice novice
users' problems.)  So, we should keep the set of Kermit commands as small
as possible.  For example, the SET TERMINAL KEYMAP command should be
deleted, if we have an alternative way.

2. Is it possible for Japanese to modify Kermit for ISO2022?

We Japanese have only a few Kermit experts and contributors.  Even if the
ISO-2022 transfer standard is adopted, we cannot expect somebody to modify
Kermit for Japanese.  This means we cannot use ISO-2022 right away.  We
should also consider another way to convert Kanji code locally.  How can
we merge ISO-2022 and the conventional local Kanji conversion?

3. We have the same requirement to convert special files created by
   application programs.

The popular Japanese word-processor ICHITARO creates special files like
WORDPERFECT.  Many users want to transfer these files and share them on
mainframes or workstations.  However, these files contain special format
control characters and require conversion prior to transfer.  For this
purpose, an automatic conversion facility is desirable.  The SET FILE FILTER
command proposed by NTT will be convenient for this purpose.


CONCLUSION

We have not reached a conclusion.  We must consider how we can integrate
the SET FILE FILTER command with the SET TRANSFER-SYNTAX command and
Attribute negotiation.  ISO 2022 might bring us automatic Kanji conversion
in the future.  We have to find another way to do local Kanji conversion
that can be merged with the ISO-2022 mechanism.


ACKNOWLEDGMENT

We would like to express our appreciation to Ms. Christine Gianone and
Mr. Frank da Cruz for their help and consideration for the Japanese Kanji
inconsistency problem.  We also express special thanks to Mr. Koichi
Nishimoto, Administrator of DECUS Japan, for supporting this meeting.


 ---< cut here >---


<< Questions and Comments from Chris and Frank. >>

Prior to delivering this report, I asked Chris and Frank to correct
the English in my report. Thank you very much.  Chris and Frank also
gave me comments and questions, so I'll answer them. Please note that
the answers are MY OWN opinion. Other members may have different
opinions.

>CONTROVERSIAL POINTS
>
>1. NEW COMMANDS for TERMINAL EMULATION
>
>	Is it necessary to have a new SET TERMINAL KEYMAP command?
>	Is it necessary to have an argument {GR|GL} for SET TERMINAL
>        CHARACTER-SET?
>
>    A: The user interface should be as simple as possible.  This means we
>must minimize the number of new commands.  Usually, the same Kanji code is
>used for both input and output.  Therefore, we don't need the KEYMAP command.

    [But the KEYMAP command might be useful for other reasons, too.  For
    example, in MS-DOS Kermit, you must enter many SET KEY commands to
    change the key map.  Then, if you want to switch to another language,
    you must enter many more SET KEY commands.  To switch back and forth
    between languages, you must have big macro definitions or TAKE files
    full of SET KEY commands, and you must execute them frequently.  A
    more convenient approach for language switching is to build several
    complete keymaps into the Kermit program, assign names to them, and
    give a command to conveniently select an entire keymap.  For example,
    SET TERMINAL KEYMAP EUC, SET TERMINAL KEYMAP NORWEGIAN, SET TERMINAL
    KEYMAP FRENCH...]

OK. Some Kermit users may need this function. As Frank said before,
we need a macro to set the character code in both the emulator and the
keyboard, since the same code is usually used for input and output.


>    B: Some UNIX machines use 8-bit EUC code as the internal Kanji
>encoding.  However, these machines cannot receive EUC because of 8th-bit
>non-transparency.  For such systems, it's necessary to have the KEYMAP
>command.

    [We don't understand.  How does the terminal transmit 8-bit EUC
    keystrokes to the UNIX system in the 7-bit environment?  Does it use
    shifts like <SI> and <SO>?]

It's impossible to transmit 8-bit EUC keystrokes to the UNIX system in
the 7-bit environment. Therefore, we have already adopted 7-bit ISO2022 on
communication channels such as TCP/IP and UUCP. However, I heard that some
CRAZY machines offer an inconsistent environment, that is, 7-bit for
input and 8-bit for output. I don't know these systems in detail.
Mr. Fujii at KEK might explain this strange story to us. :-)


>    A: Even if the system uses both EUC and JIS as input and output
>respectively, the terminal emulator can display both EUC and JIS
>simultaneously.  If you issue SET TERMINAL CHARACTER-SET JIS, you can
>receive both EUC and JIS.  Therefore, we don't need the KEYMAP command.

    [How can the terminal display two character sets simultaneously?  How
    does it know which set an incoming character belongs to?  Are the
    codes compatible?]

It's easy to distinguish between EUC Kanji and JIS Kanji (JIS means
ISO2022 in a 7-bit environment), because EUC Kanji doesn't overlap with JIS
Kanji. As for Katakana and Roman ASCII, they use the same code.
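
A minimal Python sketch of why the two Kanji codes can be told apart in one
incoming stream is given here as an illustration only.  It assumes that JIS
Kanji arrive as 7-bit byte pairs between the designators ESC $ B (or ESC $ @)
and ESC ( B (or ESC ( J), and that EUC Kanji arrive as byte pairs with the
8th bit set (0xA1-0xFE).  The function name classify() is hypothetical, and
single-byte Katakana shifts are ignored.

```python
KANJI_IN = (b"\x1b$B", b"\x1b$@")    # designate JIS Kanji into G0
KANJI_OUT = (b"\x1b(B", b"\x1b(J")   # return to ASCII or JIS-Roman


def is_euc_byte(b):
    """True if b lies in the range used by both bytes of an EUC Kanji."""
    return 0xA1 <= b <= 0xFE


def classify(data):
    """Yield (kind, bytes) for each character of a mixed EUC/JIS stream."""
    i, in_jis = 0, False
    while i < len(data):
        if data[i:i + 3] in KANJI_IN:
            in_jis = True
            i += 3
        elif data[i:i + 3] in KANJI_OUT:
            in_jis = False
            i += 3
        elif (is_euc_byte(data[i]) and i + 1 < len(data)
              and is_euc_byte(data[i + 1])):
            yield ("euc-kanji", data[i:i + 2])
            i += 2
        elif in_jis and 0x21 <= data[i] <= 0x7E and i + 1 < len(data):
            yield ("jis-kanji", data[i:i + 2])
            i += 2
        else:
            yield ("single", data[i:i + 1])
            i += 1


# The same Kanji sent both ways, using Python's standard codecs.
stream = b"A" + "漢".encode("euc_jp") + "漢".encode("iso2022_jp")
kinds = [kind for kind, _ in classify(stream)]
```

Setting the 8th bit on each byte of a JIS Kanji pair gives the EUC pair for
the same character, so one display table can serve both.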

>    B: How about setting both CHARACTER-SET and KEYMAP automatically when
>the user issues the SET TERMINAL CHARACTER-SET command?

    [You mean that SET TERMINAL CHARACTER-SET should do two things: (1)
    assign a table that maps communication line input bytes to screen
    graphics, and (2) assign a table of transmission codes to the
    keyboard.  This is OK, so long as the user still has a way to change
    the keyboard layout to correspond to her or his typing preferences.]

Yes. We need a macro to set both SET TERMINAL CHARACTER-SET and SET
KEYMAP simultaneously.

>2. SET FILE FILTER is a good idea. But, is it allowed to use non-ISO-2022
>code?
>
>   Using ISO-2022 on the communication channel is a very good idea and we
>should adopt ISO-2022 transfer syntax.  However, we also need the SET FILE
>FILTER command until the ISO-2022 facility is implemented in the popular
>Kermit programs.  We can convert Kanji code locally by this command for the
>present.

    [Do we understand the SET FILE FILTER command?  It seems to mean that
    a separate program must run to translate between the file format and
    the transfer syntax.  Of course, this command can only be used on
    multiprocessing computer systems like UNIX, where one program can run
    another one, and their input and output can be piped together.  On
    systems that can't do this, the user must run a preprocessor program
    before running Kermit, and then use Kermit to transfer the file in the
    normal way (TRANSFER-SYNTAX NORMAL).  If the file is very big, then
    this can be most inconvenient -- disks will fill up, processing time
    will triple, etc.  But on systems like UNIX, the SET FILE FILTER command
    is definitely a good idea.  It can be used not only for international
    characters, but for compression, etc.]

The filter program runs IN BATCH MANNER before or after the file transfer
on single-process systems such as MS-DOS. As you pointed out, this may be
inconvenient -- disks will fill up, processing time will triple, etc. In
this case, the user should keep enough room for code conversion.
   After the meeting, we considered this function further and noticed
that a powerful macro facility and the TAKE command might enable us to
implement the same function as SET FILE FILTER. (MS-DOS Kermit
supports a powerful macro facility.) This will not affect ISO2022
standardization. I'm trying to write such a code conversion scenario
using the MACRO and TAKE commands.

>   For local conversion, we have to specify the Kanji code used on the
>remote host.  For this purpose, SET TRANSFER-SYNTAX will be used.  The
>current draft specification only allows NORMAL and ISO-2022 as the argument.
>We may need additional arguments for other non-standard Kanji codes such as
>EUC and SHIFTJIS.

    [We did not intend to imply that NORMAL and ISO-2022 were the only
    allowable transfer syntaxes.  Any REGISTERED character set should
    be transferred using ISO-2022.  Any UNREGISTERED character set,
    such as EUC or SHIFTJIS (or even HEXADECIMAL or EBCDIC), can be
    used as a transfer syntax by itself, without any inline control
    codes for alphabet selection or shifting.  So yes, SET
    TRANSFER-SYNTAX {EUC, SHIFTJIS} should also be allowed so that
    existing Japanese Kermit programs can interoperate with new ones.]

If we can implement local Kanji code conversion using macros and the TAKE
command, our requirement will not affect ISO2022 standardization.


>OPINIONS

>2. Is it possible for Japanese to modify Kermit for ISO2022?
>
>We Japanese have only a few Kermit experts and contributors.  Even if the
>ISO-2022 transfer standard is adopted, we cannot expect somebody to modify
>Kermit for Japanese.  This means we cannot use ISO-2022 right away.  We
>should also consider another way to convert Kanji code locally.  How can
>we merge ISO-2022 and the conventional local Kanji conversion?

    [Do this by allowing new Japanese Kermit programs to support SET
    TRANSFER-SYNTAX {EUC, SHIFTJIS} so they can talk to old Japanese
    Kermit programs that support only these transfer syntaxes.]

The TAKE command and macros may enable us to convert Kanji code locally.
Until ISO2022 is implemented for Japanese, we may be able to use this
local conversion facility.


>ACKNOWLEDGMENT
>
>We would like to express our appreciation to Ms. Christine Gianone and
>Mr. Frank da Cruz for their help and consideration for the Japanese Kanji
>inconsistency problem.  We also express special thanks to Mr. Koichi
>Nishimoto, Administrator of DECUS Japan, for supporting this meeting.

    [And we appreciate the efforts of the Japanese contingent, and their
    valuable contributions to this proposal.  The Japanese, with their
    multiplicity of character sets and their extensive practical
    experience in this area, are the "acid test" of any attempt to extend
    Kermit or any other data communications protocol to encompass multiple
    character sets.   - Christine and Frank]

Thank you very much for correcting my English. It's good practice
for me.

-Ken (murakami%ntt-20.ntt.jp@relay.cs.net or murakami@ntt-20.ntt.jp)

-------

From jrd  Mon Apr 10 17:54:32 1989
Return-Path: <jrd>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA13825; Mon, 10 Apr 89 17:54:32 EDT
Date: Mon, 10 Apr 1989 17:54:31 EDT
From: "Joe R. Doupnik" <jrd@watsun.cc.columbia.edu>
To: isokermit
Cc: jrd
Subject: ISO-2022 end results
Message-Id: <CMM.0.88.608248471.jrd@watsun.cc.columbia.edu>

Chris, Frank, and the group:

	I think it is worth remembering that multi-language documents or
terminal sessions presume the ability to maintain and display two, and
sometimes more, alphabets. For printed output three or more alphabets is
an implementation detail and is similar to font changes for word processed
documents. For computer displays it is a more complicated problem because
those with which I am familiar have at most 256 bytes (GL and GR) of pattern
information (code byte + bit map) and we can't add a third ON THE SAME SCREEN
without hardware games or going to bitmapped graphics mode. Hopefully the
Japanese manufacturers have done better than 256 bytes. In the US some of the
major word processor vendors have decided that the only sensible way of
displaying fonts is to use graphics mode, a good stack of bitmap files, and
then ask for patience by the users as the screen is updated slowly.

	While ISO 2022 provides the tools to change alphabets at any point
there is a practical aspect of viewing files. The latter normally restricts
the display to a maximum of two fonts/languages, one each in GL and GR tables.
Many dot matrix or laser printers are freed from this limitation, at the cost
of downloading new fonts when needed. A rather smart printer program is needed
to accomplish this; mine is called WordPerfect.

	One conclusion to be drawn from this is taken from word processing
programs: when a given "font" is not available then substitute the nearest
equivalent character (by heuristics particular to each vendor). Adopting such
a strategy for file transfer is dangerous because the file is not being
reproduced precisely, and may not have warning messages about substitutions.
It can be useful for terminal emulation however.

	The second conclusion is that we might focus on two character sets
at a (long) time for terminal emulation, with substitutions, and let file
transfers employ the full range of ISO-2022 as required.

	Finally, with terminal emulation it is convenient for echoes of what
we send to appear on the screen in the expected form rather than sending one
form and receiving a different mapping. Thus, keyboard output needs to track
the received language.

	Joe Doupnik

From JRD@cc.usu.edu  Mon Apr 10 18:15:11 1989
Return-Path: <JRD@cc.usu.edu>
Received: from cc.usu.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA14048; Mon, 10 Apr 89 18:15:11 EDT
Message-Id: <8904102215.AA14048@watsun.cc.columbia.edu>
Date: Thu, 6 Apr 89 21:03 MDT
From: Joe Doupnik <JRD@cc.usu.edu>
To: isokermit@watsun.cc.columbia.edu
X-Vms-To: IN%"isokermit@watsun.cc.columbia.edu",JRD


Chris, Frank, and the group:

	Ken-Ichiro Murakami <MURAKAMI@ntt-20.ntt.jp> brought up some important
points on the command set of Kermit. I agree that there should be separate
commands for file transfer and terminal emulation. I think we are trying to
arrange file transfer commands so that SET FILE TYPE manages record delimiters
and Columbia would like this to be extended to manage more general file
constructions such as found in applications programs; both uses are file
system items. It is appropriate to include applications program names in this
command because the file format is of concern, even though it does strongly
involve elements of alphabet (or font) changing.

	The purpose of the command SET TRANSFER-SYNTAX is to manually inform
the other machine that a polyalphabet file is being transferred using ISO-2022
methods and thus allow the host to be aware of and to translate ISO-2022 to
perhaps another storage form. It is a user override since the file reader may
not be able to decide independently to deliver characters literally or in ISO
forms. Clearly, normal Kermit quoting mechanisms can manage both forms on the
communications link, but the receiving host may need advance notification
about how to store text in its own local format. The command can also be used
to enable the local Kermit's file reader to use ISO conventions.

An excerpt from draft proposal #2:
> SELECTING ISO-2022 TRANSFER SYNTAX
>     
> Kermit's default transfer syntax is NORMAL (meaning either ASCII text, or
> binary, according to SET FILE TYPE).  Kermit's ISO-2022 transfer syntax
> must therefore be enabled in some way, either automatically or explicitly by
> the user.  In the automatic case, the Kermit program recognizes (somehow) that
> it is to transfer a multi-alphabet text file.  In the manual case, the user
> issues a SET command:
>      
>   SET TRANSFER-SYNTAX ISO-2022
>      
> It must also be possible to override the automatic use of ISO-2022 syntax
> via the command:
>      
>   SET TRANSFER-SYNTAX NORMAL

	Personally, I think we should be very cautious about adding more than
NORMAL or ISO-2022 to the list of methods. ISO is supposed to allow conversion
from almost any to almost any other representation and thus reduce the
"N by N" problem (where each side understands how to convert all the forms
of the other side).
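
	The arithmetic behind the "N by N" remark can be sketched in a few
lines of Python.  This is an illustration only; the function name is
hypothetical, and it simply counts one-way converters with and without a
common intermediate form.

```python
def converters_needed(n_sets, via_common_form):
    """Count the one-way converters needed among n_sets character sets."""
    if via_common_form:
        # One encoder and one decoder per set, to and from the common form.
        return 2 * n_sets
    # A direct converter for every ordered pair of distinct sets.
    return n_sets * (n_sets - 1)


direct = converters_needed(10, via_common_form=False)
pivot = converters_needed(10, via_common_form=True)
```

For ten character sets this gives 90 direct converters against 20 that go
through a common form.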

	On mandatory eight bit quoting capabilities: it would certainly
simplify matters. Tough on long departed developers.

	And I tend to agree with Gisbert that files from applications programs
are best managed with standalone filters. As a practical matter, adding
executable code to a Kermit program during operation of the program is not an
easy thing to accomplish. We should not overlook the chance to understand a
few widely accepted file formats, so each Kermit implementation may include
some or let the user do the filtering externally.

	Terminal emulation ought to be separated from file transfer, in my
opinion. A diverse collection of files may be transferred via ISO methods
while the terminal communications language might be unrelated to the files.
We do need some method of permitting the local user to define the contents of
the display adapter (or equivalent) for both Left and Right tables. Thus GL
and GR qualifiers in SET TERMINAL CHARACTER-SET <name> are convenient options.

	Translation of keystrokes is still a problem for me and requires more
information from persons most affected. One concept I try to keep in mind
is that an echo of what we transmit ought to appear "correctly" on the local
screen. Curiously, ISO-2022 says a great deal about displays but nothing about
keyboards. Somehow we need to transmit character codes representing both GR
and GL tables (without memorizing lots of escape codes). Defining a few
specialized keys to transmit the ISO table shift codes is possible and would
be similar to operating in an outgoing 7-bit environment and an incoming 8-bit
one. After all, the keyboard maps to only one table at a time. Keyboard
questions should not impact the proposal before us, however.

	Joe Doupnik

From lts!amanda@uunet.uu.net  Tue Apr 11 09:47:23 1989
Return-Path: <lts!amanda@uunet.uu.net>
Received: from uunet.UU.NET by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA20330; Tue, 11 Apr 89 09:47:23 EDT
Received: from lts.UUCP by uunet.UU.NET (5.61/1.14) with UUCP 
	id AA01767; Tue, 11 Apr 89 09:47:18 -0400
Received: by lts.UUCP (4.12/2.881128)
	id AA01856; Tue, 11 Apr 89 08:08:47 est
Date: Tue, 11 Apr 89 08:08:47 est
From: Amanda Walker <lts!amanda@uunet.uu.net>
Message-Id: <8904111308.AA01856@lts.UUCP>
To: isokermit@watsun.cc.columbia.edu
Subject: Display vs. File Transfer

Joe,

While I think you have made some excellent points, I'm not so sure that we
should limit ourselves to two-character-sets-at-a-time.  There seem to me
to be three major classes of displays out there that we need to worry about:

 1. Hardware character generator, perhaps with some non-vanilla ASCII
    characters available.  This includes most terminals, the IBM MDA & CGA,
    and most terminal emulators for window systems.

 2. Software programmable character generator with a limited number of
    slots.  This includes the IBM EGA & VGA, the Hercules RAMfont boards,
    the DEC VT220/240/320/340, and so on.

 3. True graphics displays.  This includes the Macintosh, Amiga, a PC running
    Windows or the Presentation Manager, and most workstations (in theory,
    anyway...).

On class one machines, you do the best you can.  On, say, an IBM MDA/CGA,
the ROM character generator does have a fair amount of ISO 8859/1 in it;
it's not complete, but you do get most of the characters with diacritical
marks, which makes it more useful than nothing.

I think it would be a mistake to cripple Kermit implementations on machines
that fall into the latter two classes in order to accommodate machines of the
first one.  There will always be implementation limits--for example, very
few machines will be able to display the full JIS Kanji set--but I think these
should be left up to the implementation, rather than being part of the spec.

There are also tricks you can play with machines in class 2, which are the
ones you seemed concerned about.  On a machine with an EGA/VGA/MCGA/Hercules
card, you get two full 256-slot character sets.  Nothing says that each
of these must be a single ISO 2022 character set...  For example, I wrote
the EGA support for my company's PC telnet package, which emulates a VT220.
It leaves the default character set alone, and loads a secondary character
set with only those characters in the VT100 line-drawing set and ISO 8859/1
that don't appear in the ROM font.  So far it handles VT100 graphics, DMCS
(DEC Multinational character set, which is almost but not quite ISO 8859/1),
and ISO 8859/1, and there's still a fair amount of room left.  If things
get cramped (which they will as we add character sets), I may end up
loading a new set into slot 0 as well, but so far that's been unnecessary.

On class 3 machines, of course, it's not a problem.  This is where things
seem to be going, as well.  I, for one, would be happy to run my IBM PC/AT
in graphics mode if I needed to see all of the characters I was using.
No, it's not as speedy as text mode, but it's not all that shabby (esp.
if you're not running under Windows :-)).

One of the nice things about Kermit so far has been that it has taken more
of a "Greatest Common Factor" philosophy than a "Least Common Denominator"
one, and I don't think we should stop now.


Amanda Walker
InterCon Systems Corporation

P.S.  I can't donate any code to the project, but I'd be happy to send
the PC folks a copy of the supplementary font I mentioned above, if they're
interested.  Just let me know where I should mail it...

From @cunyvm.cuny.edu:FI@NORUNIT.BITNET  Wed Apr 12 13:56:08 1989
Return-Path: <@cunyvm.cuny.edu:FI@NORUNIT.BITNET>
Received: from CUNYVM.CUNY.EDU by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA04421; Wed, 12 Apr 89 13:56:08 EDT
Message-Id: <8904121756.AA04421@watsun.cc.columbia.edu>
Received: from NORUNIT.BITNET by CUNYVM.CUNY.EDU (IBM VM SMTP R1.1) with BSMTP id 0575; Wed, 12 Apr 89 13:55:57 EDT
Date: Wed, 12 Apr 89 16:10:44 ECT
To: isokermit@watsun.cc.columbia.edu
From: FI%NORUNIT.BITNET@cunyvm.cuny.edu
Comment: CROSSNET mail via SMTP@INTERBIT

To the ISO-kermit list administration:

   This list seems to be lacking a '-request' address, which forces me
   to distribute this message to everybody.  My apologies for this.

   Please remove me from the list until further notice.

Frithjov Iversen


From mcvax!krafla!frisk@uunet.uu.net  Wed Apr 12 21:14:36 1989
Return-Path: <mcvax!krafla!frisk@uunet.uu.net>
Received: from uunet.UU.NET by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA08362; Wed, 12 Apr 89 21:14:36 EDT
Received: from mcvax.UUCP by uunet.UU.NET (5.61/1.14) with UUCP 
	id AA24988; Wed, 12 Apr 89 21:14:29 -0400
Received: by mcvax.cwi.nl via EUnet; Wed, 12 Apr 89 20:14:16 +0200 (MET)
Received: by hafro.is (5.57/smail2.5/08-08-88)
	id AA24627; Wed, 12 Apr 89 13:09:08 GMT
Received: by rhi.hi.is (13.1/smail2.5/03-10-88)
	id AA28781; Wed, 12 Apr 89 13:02:31 gmt
From: mcvax!rhi.hi.is!frisk@uunet.uu.net (Fridrik Skulason)
Message-Id: <8904121302.AA28781@rhi.hi.is>
Subject: 8859/1 kermit in Iceland
To: isokermit@watsun.cc.columbia.edu
Date: Wed, 12 Apr 89 13:02:29 GMT
X-Mailer: Elm [version 2.1 PL1]


Just a few random thoughts...

   Here in Iceland we have been using kermit with ISO 8859/1 translation
   for almost three years now. This was done since our national language
   contains 10 characters outside the 7-bit ASCII character set, and we
   wanted to simplify the file transfer process.

   The changes:

   A) File transfer:

	The command SET FILE TYPE {BINARY|TEXT} was added for Kermit
 	versions that did not support it before (like IBM PC).

	No changes were made in the BINARY case, but when transmitting
	TEXT files, automatic translation to ISO 8859/1 was done. Also,
	when receiving TEXT files, translation from ISO 8859/1 to the
	native character set was done automatically.

	So, when transmitting text files from an IBM PC using Code Page 861
	(861 is the Icelandic PC character set) to a VAX using the DEC
	Multinational character set (which is almost, but not quite, ISO 8859/1),
	the PC translated the file to ISO 8859/1 and the VAX translated
	the file to DEC Multinational.

	Native character set A ----> ISO 8859/1 ----> Native character set B

	We also changed the Macintosh Kermit in a similar way, but in other
	cases (HP 9000 and ATARI ST) no changes were needed, since those
	machines use the ISO 8859/1 character set here in Iceland.

        The only problem that we have run into is that some characters can not
        be represented in ISO 8859/1. The most common problem is when people
        try to transmit files containing PC line/box drawing characters. This,
        however has not been a serious problem.
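
        The two-step translation described above can be sketched in Python
        as follows.  This is an illustration only, not the code used in the
        Icelandic Kermits: the function names are hypothetical, and Code
        Page 850 stands in for the receiving machine because Python has no
        DEC Multinational codec.  Characters with no ISO 8859/1 equivalent,
        such as PC box-drawing characters, are replaced by "?" here.

```python
def to_transfer(data, local_charset):
    """Sender side: local character set -> ISO 8859/1 on the wire."""
    text = data.decode(local_charset)
    # Characters that ISO 8859/1 cannot represent are sent as "?".
    return text.encode("latin-1", errors="replace")


def from_transfer(wire, local_charset):
    """Receiver side: ISO 8859/1 on the wire -> local character set."""
    text = wire.decode("latin-1")
    return text.encode(local_charset, errors="replace")


# An Icelandic PC file (Code Page 861) containing one box-drawing character.
pc_file = "Það er gott ─".encode("cp861")
wire = to_transfer(pc_file, "cp861")
received = from_transfer(wire, "cp850")
```

        The Icelandic letters survive both translations; only the
        box-drawing character is lost.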

   B) Terminal emulation.

	A new command:

	    SET TRANSLATION {NONE | ISO | CP850 | DEC-MULTI | ROMAN-8}

	was added to the IBM PC Kermit.

	It was used to specify what sort of translation should be applied
	to incoming/outgoing characters while in terminal emulation mode.

	An important thing to note is that this translation command is
	totally independent of the file translation.

	All incoming characters that could not be translated to Code Page 861
	were displayed as character 168 (upside-down question mark).
	

From MURAKAMI@ntt-20.ntt.jp  Thu Apr 13 02:41:30 1989
Return-Path: <MURAKAMI@ntt-20.ntt.jp>
Received: from ntt-20.NTT.JP by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA10524; Thu, 13 Apr 89 02:41:30 EDT
Date: Thu, 13 Apr 89 15:37:37 I
From: ken-ichiro murakami <MURAKAMI@ntt-20.ntt.jp>
Subject: Simple Macro commands for local (Kanji) code conversion
To: isokermit@watsun.cc.columbia.edu
Cc: murakami@ntt-20.ntt.jp
Message-Id: <12485718037.27.MURAKAMI@NTT-20.NTT.JP>

Hi.

   I wrote macros to convert Kanji in batch manner. This was my first
experience writing Kermit macro definitions. I found this facility
powerful enough to handle a single file. However, it's impossible
to handle multiple files using the conventional macro facility.
Therefore, I tried in vain to use MS-DOS batch commands such as FOR.
The FOR command allows only a single command as its argument, so
I was compelled to invoke yet another batch command file. This caused a
problem: after the other batch file is executed, the variable used
in the FOR command is cleared. So, I gave up on the FOR batch
command.
   To handle multiple files specified by wild card, we need the
following new macro commands:
	(1)  FOR command like MS-DOS batch command
	     For example,

		define send-all FOR \%f IN (*.ASM) ksend \%f
		define ksend send \%1

	     Of course, a file name expansion facility such as (*.ASM) is
	     necessary in the FOR command.
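
The requested behaviour can be sketched in Python. This is only a sketch of
the idea, not a proposal for Kermit syntax: the function name for_each_match
is hypothetical, matching is done against a supplied list of names rather
than a real directory, and names are compared case-insensitively as on MS-DOS.

```python
from fnmatch import fnmatchcase


def for_each_match(pattern, names, action):
    """Apply action to every name matching the wild card pattern.

    Mirrors FOR \\%f IN (pattern) command: expand first, then run the
    command once per matching file.
    """
    results = []
    for name in names:
        # MS-DOS file names are case-insensitive, so compare in upper case.
        if fnmatchcase(name.upper(), pattern.upper()):
            results.append(action(name))
    return results


commands = for_each_match("*.ASM", ["A.ASM", "B.TXT", "c.asm"],
                          lambda name: "ksend " + name)
```

Here commands holds one "ksend" line per matching file, which is what the
send-all macro above would execute.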

	(2) A STRING-SEARCH command to check whether the specified file
            name contains a wild card.
	    For example,

		define check-wild if string-search \%1 * goto check,-
			          if string-search \%1 ? goto check,-
				  goto ok,-
				  :check, if defined \%2 goto error,-


            This is used to check parameters specified in commands such as
            (K)SEND and (K)GET. If the first argument contains a wild card,
            the second parameter should not be specified.
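
The check that the check-wild macro is meant to perform can be sketched in
Python as follows. The names has_wildcard and check_args are hypothetical,
and only the two wild card characters named above (* and ?) are considered.

```python
def has_wildcard(filespec):
    """Return True if the file specification contains * or ?."""
    return any(ch in filespec for ch in "*?")


def check_args(source, destination=None):
    """Return "error" if a wild card source is paired with a destination.

    A wild card may match many files, so a single destination name
    makes no sense; every other combination is accepted.
    """
    if has_wildcard(source) and destination is not None:
        return "error"
    return "ok"
```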

	(3) A way to clear the variables \%1 .. \%9
	    In the conventional Kermit macro facility, macro parameters
	    such as \%1 are not cleared. This causes a problem:
            it's impossible to decide whether the user specified a macro
            parameter or not. For example, the following command sequence
	    shows the problem.

		MS-KERMIT>KSEND foo.bar qwe.asd
		MS-KERMIT>KGET kkk.kkk

	    The second command will rename the file kkk.kkk to qwe.asd.
	    If it's impossible to clear these variables automatically, we
	    must have a convention to clear them. I defined a macro that
	    clears the variables after every macro execution.
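
The stale-parameter problem can be modelled in a few lines of Python. This
is a toy model, not a description of MS-Kermit internals: the class
MacroVars and its clear_unspecified flag are hypothetical.

```python
class MacroVars:
    """Toy model of the macro argument variables \\%1 .. \\%9."""

    def __init__(self):
        self.slots = {}

    def invoke(self, args, clear_unspecified):
        """Bind args to slots 1..n and return the slots the macro sees."""
        if clear_unspecified:
            # Requested behaviour: forget arguments left by earlier calls.
            self.slots = {}
        for position, value in enumerate(args, start=1):
            self.slots[position] = value
        return dict(self.slots)


stale = MacroVars()
stale.invoke(["foo.bar", "qwe.asd"], clear_unspecified=False)
seen_by_kget = stale.invoke(["kkk.kkk"], clear_unspecified=False)

fresh = MacroVars()
fresh.invoke(["foo.bar", "qwe.asd"], clear_unspecified=True)
seen_after_fix = fresh.invoke(["kkk.kkk"], clear_unspecified=True)
```

Without clearing, the second call still sees qwe.asd in slot 2, which is
the KSEND/KGET mishap described above; with clearing, slot 2 is absent.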

   My experience shows that it's possible to write macro commands for
code conversion. They can also convert special characters created by
specific application programs such as WORDPERFECT (provided
that (1), (2) and (3) are supported).
   This makes it possible to consider the ISO2022 transfer convention and
application-oriented character coding conversion separately. So I
take back my proposal to extend the SET TRANSFER-SYNTAX
argument. Only {NORMAL|ISO2022} is OK. As for the SET FILE
{TEXT|BINARY|WORDPERFECT..} command, it's desirable to separate out
application-oriented file coding such as WORDPERFECT. If these
application names are included in the command, we must modify the
command every time a new program appears. Rather, we should process
these special files by macro. So I propose not to extend the SET FILE
argument. {TEXT|BINARY} is enough.

-Ken


---< cut here >---

echo Kermit, AUTOMATIC KANJI CONVERSION MACROS  version -1     April 12, 1989\13
echo \9                                               by  Ken-ichiro Murakami\13
echo USAGE:\13
echo \9 KANJI {JIS|EUC|SJIS} specifies Kanji code in remote host\13
echo \9 KSEND source [destination] sends a source file after conversion\13
echo \9 KTRANSMIT source transmits a source file after conversion\13
echo \9 KGET source [destination] gets a file and converts Kanji code\13
echo \9 KRECEIVE source [destination] gets a file and converts Kanji code\13
echo NOTE:\13
echo \9 NO WILD CARD is allowed in the file specification.\13
echo \9 Reserved variables are %r, %s and %x. Don't overwrite them.\13
echo \9 Reserved file name is SYS9999.TMP.\13


; ******* CAUTION! ******
; Because MS-DOS batch commands are limited, it's impossible to process multiple
; files specified by wild card. Therefore, I gave up on supporting wild cards.
; I would like to request that MS-Kermit support the following facilities
; to compensate for the limited MS-DOS batch commands.

;(1) FOR command like MS-DOS FOR batch command 
;(2) File name expansion facility in FOR command like (*.ASM) in MS-DOS batch
;    command. This will be used for wild card.  
;(3) String-search predicate for IF command argument. This will be used
;    to inspect wild card specification in variable.
;(4) If a macro argument is not specified, the unspecified arguments (variables)
;    should be cleared. It's impossible to decide whether an argument was
;    specified or not in the conventional MS-KERMIT macro, because the variables
;    for arguments are not cleared.

; ******* definitions for KANJI macro command *******

define kanji if equal \%1 jis do set-kanji-jis,-
if equal \%1 euc do set-kanji-euc, if equal \%1 sjis do set-kanji-sjis,-
if equal \%x ok goto done,-
:no-arg, do kanji-usage,-
:done, define \%x, clr-var

define kanji-usage echo argument is JIS\44 EUC or SJIS\13

; Set register according to remote host Kanji code
; This value is used as a parameter for program CONVERT
; Note that MS-DOS uses SHIFTJIS as local Kanji code
;    -1   =    convert SHIFTJIS to JIS
;    -2   =    convert SHIFTJIS to EUC
;    -3   =    convert JIS to SHIFTJIS
;    -4   =    convert EUC to SHIFTJIS


define set-kanji-jis define \%s -1, define \%r -3, echo remote host is JIS,-
set terminal kanji-code jis-7, define \%x ok
define set-kanji-euc define \%s -2, define \%r -4, echo remote host is EUC,-
set terminal kanji-code DEC-code, define \%x ok
define set-kanji-sjis define \%s, define \%r, echo remote host is SHIFTJIS\13,-
echo Use SEND\44 RECEIVE\44 GET and TRANSMIT instead of Kanji macros\13,-
set terminal kanji-code Shift-JIS, define \%x ok


; ****** macro definitions for KSEND macro command *******

define ksend if not defined \%1 goto ksend-usage,-
if not defined \%s goto ksend-error,-
ksend1, clr-var,stop,-
:ksend-usage, echo need local file name,clr-var,stop,-
:ksend-error, echo specify remote Kanji code,clr-var

define ksend1 if not defined \%2 define \%2 \%1,-
if exist SYS9999.TMP del SYS9999.TMP,-
run convert \%s \%1 SYS9999.TMP,-
if exist SYS9999.TMP send SYS9999.TMP \%2,-
if exist SYS9999.TMP del SYS9999.TMP

define clr-var define \%1, define \%2, define \%3

; ****** macro definitions for KGET macro command *******

define kget if not defined \%1 goto kget-usage,-
if not defined \%s goto kget-error,-
kget1, clr-var,stop,-
:kget-usage, echo need remote file name,clr-var,stop,-
:kget-error, echo specify remote Kanji code,clr-var

define kget1 if not defined \%2 define \%2 \%1,-
if exist SYS9999.TMP del SYS9999.TMP,-
get \%1 SYS9999.TMP,if not exist SYS9999.TMP goto nop,-
run convert \%r SYS9999.TMP \%2,del SYS9999.TMP,-
:nop

; ****** macro definitions for KTRANSMIT macro command *******

define ktransmit if not defined \%1 goto ktransmit-usage,-
if not defined \%s goto ktransmit-error,-
ktransmit1, clr-var,stop,-
:ktransmit-usage, echo need local file name,clr-var,stop,-
:ktransmit-error, echo specify remote Kanji code,clr-var

define ktransmit1 if exist SYS9999.TMP del SYS9999.TMP,-
run convert \%s \%1 SYS9999.TMP,-
if exist SYS9999.TMP transmit SYS9999.TMP,-
if exist SYS9999.TMP del SYS9999.TMP

; ****** macro definitions for KRECEIVE macro command *******

define kreceive if not defined \%1 goto kreceive-usage,-
if not defined \%s goto kreceive-error,-
kreceive1, clr-var,stop,-
:kreceive-usage, echo need remote file name,clr-var,stop,-
:kreceive-error, echo specify remote Kanji code,clr-var

define kreceive1 if not defined \%2 define \%2 \%1,-
if exist SYS9999.TMP del SYS9999.TMP,-
receive \%1 SYS9999.TMP,if not exist SYS9999.TMP goto nop,-
run convert \%r SYS9999.TMP \%2,del SYS9999.TMP,-
:nop
-------
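
The macros above hand the actual code conversion to an external CONVERT
program selected by the switches -1 to -4. The following Python sketch shows
what such a converter might do. It is an assumption-laden stand-in, not the
CONVERT program itself: it uses Python's built-in codecs, treats "JIS" as
the "iso2022_jp" codec, and the names CODECS and convert are hypothetical.

```python
# Switch values follow the table in the macro file:
#   -1 SHIFTJIS -> JIS, -2 SHIFTJIS -> EUC, -3 JIS -> SHIFTJIS, -4 EUC -> SHIFTJIS
CODECS = {
    "-1": ("shift_jis", "iso2022_jp"),
    "-2": ("shift_jis", "euc_jp"),
    "-3": ("iso2022_jp", "shift_jis"),
    "-4": ("euc_jp", "shift_jis"),
}


def convert(switch, data: bytes) -> bytes:
    """Re-encode data according to one of the four conversion switches."""
    source, target = CODECS[switch]
    return data.decode(source).encode(target)
```

KSEND would correspond to convert with the value held in \%s before sending,
and KGET to convert with the value held in \%r after receiving.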

From @cuvmb.cc.columbia.edu:MOSGLA@HLERUL2.BITNET  Thu Apr 13 08:52:14 1989
Return-Path: <@cuvmb.cc.columbia.edu:MOSGLA@HLERUL2.BITNET>
Received: from columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA12668; Thu, 13 Apr 89 08:52:14 EDT
Received: from cuvmb.cc.columbia.edu by columbia.edu (5.59++/0.3) with SMTP 
	id AA24454; Thu, 13 Apr 89 08:52:03 EDT
Message-Id: <8904131252.AA24454@columbia.edu>
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 0750; Thu, 13 Apr 89 08:52:03 EDT
Received: from HLERUL2.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with
 BSMTP id 2807; Thu, 13 Apr 89 08:52:02 EDT
Date:    Thu, 13 Apr 89 14:35 CET
From: "Johan van Wingen"                          <MOSGLA%HLERUL2.BITNET@cuvmb.cc.columbia.edu>
To: isokermit@watsun.cc.columbia.edu
Subject: First comments on 2nd draft

Dear Kermit listers
My mailing of 6 April of the following text seems not to have arrived.
########################################################################
I have not had the time to study the new draft closely. Two general
remarks should be made first.
1. If a mechanism is required for including code extension techniques in
Kermit, ISO 2022 will create a Paradise for you. It is certainly to be
preferred over other methods made ad hoc by manufacturers.
2. ISO 2022 is implemented only very rarely in the data processing
world, and for good reasons. Thus what is offered to you is in fact
a Fata Morgana.
The main service ISO 2022 did, is that it acts as an ordering principle.
The new developments in ISO JTC1/SC2 go in a different direction. While
the main framework of ISO 2022 will be kept, with its announcing
sequences, the C and G selecting system will remain only a cumbersome
alternative to the multiple-octet standard (10646, at DP stage, not yet
at DIS!), and the 254 graphic character code, now proposed.

As for Appendix C, ISO 6937 does not present a usable alternative to
ISO 8859. Mr. Palme's comments are quite misleading. (There are NO
national variants of ISO 8859, Icelandic is in ISO 8859-1!)  It
requires special hardware for dealing with diacritics, because accented
letters are being coded with TWO bytes, instead of ONE, as in ISO 8859.
In the opinion of ISO SC2/WG3 members, who maintain it, ISO 6937 is
almost dead now, and will only be continued for the sake of CCITT.

ISO and ECMA registrations are the same thing. Thus for Chinese and
Korean put 58, 149. Japan is 87. Replace the other "?" by "none". The
"final letter problem" I have to verify in my files at home.

DP 10646 (140 pages) is now circulated for voting (ending 30 May). The
Netherlands voted already (yesterday) NO, in order to have several
things changed. It may be possible to get copies of the document from
ANSI, or from the institutes in your own country (DIN, AFNOR, BSI etc.).
########################################################################

For additional comments, I support Mr. Palme's suggestion to consider
ODA/ODIF. It is ISO 8613, in 8 Parts, more than 600 pages. I have got
it, but it is no easy matter.  Good luck with it.

FROM  J. W. van Wingen    MOSGLA@HLERUL2.BITNET
Mail to
P. O. Box 486,  2300AL Leiden, Netherlands

From KLENSIN@infoods.mit.edu  Thu Apr 13 13:30:28 1989
Return-Path: <KLENSIN@infoods.mit.edu>
Received: from INFOODS.MIT.EDU by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA14942; Thu, 13 Apr 89 13:30:28 EDT
Received: by INFOODS id <0000026B061@INFOODS.MIT.EDU> ;
       Thu, 13 Apr 89 13:28:28 EDT
Date: Thu, 13 Apr 89 13:27:11 EDT
From: John C Klensin <KLENSIN@infoods.mit.edu>
Subject: Draft Number 2 comments -- Quite long.
To: isokermit@watsun.cc.columbia.edu
X-Vms-Mail-To: EXOS%"isokermit@watsun.cc.columbia.edu"
Message-Id: <890413132711.0000026B061@INFOODS.MIT.EDU>

Three prefatory comments:
 (1) I want to apologize in advance for the length of these comments and
the high likelihood that they cover old ground: I'm an inadvertent
latecomer to these proceedings.
 (2) I think it is appropriate to identify myself and my perspective for
those who don't know them.  I don't claim to speak for any of the groups
that follow, only to have absorbed some perspective from my interactions
with them: (i) I have been technical director for a series of projects
over the last few decades that have, among other things, been concerned
with data interchange among machines with different internal
representations.  The most recent of these is a UN-sponsored activity in
the international interchange of data about the nutrient composition of
foods.  In that context, we have had to deal with representation of
food names in local language and character sets in files that will be used
in many countries and environments.  (ii) In conjunction with another
project, I was involved in some of the early discussions and design of the
Internet FTP that led to the TYPE, MODE, STRU, and SITE commands, which
have never done what it was hoped that they would do, and which were
intended to address some of the issues raised in the draft.  (iii) I have a
certain amount of standards development experience, both in the US and
internationally.  I chair the US committee on PL/I (X3J1) and have been
convenor of the corresponding ISO working group.  PL/I was the first
widely-available programming language with dialects that contain provision
for simultaneous use of single- and multiple-byte character sets.  Those
provisions were rejected for inclusion in the standard at the
recommendation of the company that developed them.  The PL/I standards are
also the only language standards in the ISO arena that have been defined
by precise semi-formal methods, in which it has not been possible to paper
over some of the issues with which the draft "iso kermit" document deals.
(iv) Finally, I chair the Standards Committee of ACM and, largely as a
consequence, am a member of ANSI/ISSB.  For readers from outside the USA,
ISSB acts as a "board of directors" for policy issues in Information
System standardization within the USA.  Among other things, it approves
and, when needed, coordinates, all of the US Member Body TAGs to the ISO
working groups and subcommittees.
  (3) Johan van Wingen's note of 13 Apr arrived just as I was about to
send this off and it covers some of the same ground.  I have not removed
the redundancies, but wish to reinforce what I see as his three main
points that are relevant to my argument:
    (i) there are few, if any, serious implementations of ISO2022.  At
best, vendors use a little bit of it for a few things.
    (ii) the industry (and standards) trends are toward eight-bit
character sets, and then toward multiple-octet character sets.  If devices
support multiple of those sets, they are more likely to do it by some
major and semi-permanent activity (switches, setup modes, etc) than by
dynamic ISO2022 switching.
     (iii) The ISO 8859-n components are the only variations there are.
There are no national variants *within* a registered ISO8859 set.  That is
the point of ISO 8859, for better or for worse.  And, as mentioned above,
that point reflects industry and hardware realities.  ISO 6937 is, or was,
a reasonable *communications* standard (hence the CCITT connection), but a
terrible file-storage and processing standard (because the concept "length
of a character string" and the concept "number of characters in a
character string" are either intimately related or all of the programming
languages and database languages/systems that deal with these things go
crazy).   DP10646 is in DP vote.  Given a single "no", that means
realistically that it is at least a year to 18 months away from being an
ISO standard.  And, despite SC2's optimism, it is not yet clear who (among
hardware and software vendors) are going to pay any attention to it in
practice. 
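
The "length versus number of characters" point about ISO 6937 can be made
concrete with a small Python sketch. It assumes that the non-spacing
diacritical marks occupy byte values 0xC1 through 0xCF and that each one
combines with the byte that follows it; the names NON_SPACING and
count_characters are hypothetical.

```python
NON_SPACING = range(0xC1, 0xD0)  # assumed range of non-spacing diacritics


def count_characters(data: bytes) -> int:
    """Count displayed characters in an ISO 6937 style byte string.

    A non-spacing diacritic byte is part of the following letter, so it
    adds to the length in bytes but not to the number of characters.
    """
    return sum(1 for byte in data if byte not in NON_SPACING)


sample = b"caf\xc2e"  # c, a, f, then acute accent + e
byte_length = len(sample)
character_count = count_characters(sample)
```

For this sample the byte length is 5 but the character count is 4, which is
exactly the mismatch that troubles fixed-length string handling.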

SUMMARY OF WHAT FOLLOWS
  Having studied "Draft 2", I think it launches us, and kermit, down the
proverbial slippery slope.  While the capabilities suggested are, without
question, useful, they go well beyond the near-term capabilities of
practical devices.  Even the devices cited as compliant to these standards
are not, in fact, compliant at the level anticipated and required by the
draft.  Similarly, the draft implies the availability of translations in
each kermit that are certainly hard, may not be well-defined, and that are
certainly not defined in the standards referenced.

I suspect that, partially as a symptom of this slippery slope
phenomenon, the draft wanders off into areas that don't really have much
to do with character sets.

I will outline some small difficulties with the draft, then suggest an
alternative way of looking at the problem(s).  Some of the nit-picking is
included because it sets the foundation for the suggestions that follow.

QUIBBLES AND EDITORIAL REMARKS

1) Just below figure 2, paragraph starting "An 8-bit alphabet...":  As far
as I know, it would be correct to say that the number is adequate to
represent the characters in all of the world's "alphabetic" languages.
The one-graphic-per-word character sets are not alphabetic in the sense
that term is typically used.

2) Three paragraphs below, starting "A language like English...":
"adequately GL" should be "adequately in GL".  Note that this is not
strictly true as soon as punctuation and special characters are
considered.  Note that the mapping of dollar-sign-symbol (in ASCII) to
universal-currency-symbol (in ISO 646) to pound-sterling-symbol (in the UK
(BSI) version of ISO 646 whose name I don't recall at the moment) has been
the source of considerable confusion as people try to figure out whether
amounts should be multiplied by appropriate conversion factors.  It is
claimed that "English" is used in both countries, but communication
between them requires/suggests at least two distinct currency symbols,
with distinct and agreed-upon code points.

3) This may or may not be appropriate here, but it seems to me to be
intimately related to the current draft proposal.  As kermit moves into
eight-bit character sets, it may be appropriate to modify the protocol
somewhat to understand that combinations such as "8 data bits, parity, one
stop" are as well defined as "7 data bits, parity, one stop" and "8 data
bits, no parity, one stop".  The latter two are supported by the existing
protocols, the former is not.  The only arguments I know of against
supporting parity along with eight data bits are (i) that terminals and
modems don't handle it, which is no longer true: many do, and (ii) that
the trends are toward OSI-like transmission, with length encoding and ECC.
On the other hand, if we have that, we don't need kermit (or at least a
lot of what kermit has been most important for in the past).

4) Under "CHARACTER SETS" is a discussion of the registration activity.
Note that ECMA is "simply" the registration authority under ISO2375.  As
such, they are acting as ISO's agent, and have obligations to register (or
not register) things independent of the desires and plans of their
membership.   More important, while coordination has been steadily
improving, there is no requirement that CCITT register its character sets
with ECMA or anyone else.  There are many respects in which CCITT
"Recommendations" bear more resemblance to treaties than they do to
voluntary standards: the members of CCITT are governments and PTTs, while
ISO is a voluntary organization whose member bodies may be private and
voluntary organizations (in the USA, the formal representation to CCITT is
in the State Department; the ISO member body is ANSI, a private
organization with no ties to the government other than government
representation on many of its boards and committees).
  The material in appendix B is better in this regard.
  On this same topic, note that it is very easy, although time-consuming,
to register a character set under ISO2375, as long as the characters in it
are part of a pre-defined character repertoire.  The ISO standard that
contains the identifications of all known Latin-based alphabet characters
should probably be on the reference list--I don't have the number handy,
but can look it up if needed.
  Since, in principle, more variations can be registered every month, a
scheme that assumes that a given kermit will be able to accept alphabet
designation sequence and do the "right" mapping of graphics onto the local
preference implies continual updating of tables, etc.
  Note that registration applications and draft international standards
don't count.  They are subject to change, and sometimes do.  In
particular, there is a strong argument for removing all of the "Esc Seq"s
from Table 5 that do not have corresponding Registration numbers in the
last column.  The effect of that on the table may call out another phrase
that periodically occurs when thinking about standards: "premature for
standardization".   We really don't want kermit in the middle of this
rapidly-changing situation if we can avoid it.
  Referring to the "possible problem" called out at the bottom of the
document, the theory according to ISO is that Japan will have to get in
line.  JIS had the option of taking significant exception to those
assignments and apparently did not, which may speak for their intentions.
I await Ken-ichiro Murakami's second round of comments on this, after his
expert advice arrives.

5) In at least most of the kermit implementations I have worked with, the
TEXT/ BINARY distinction does not affect file transfer and is often
required by one of the transferring kermits but not the other.  With a few
trivial exceptions, and the less trivial ASCII->EBCDIC translation going
to IBM hosts, the choice of FILE TYPE tends to impact storage
representations, rather than data conversion.

6) The fourth paragraph under "SELECTING ISO-2022 TRANSFER SYNTAX" points
to "table 4".  Table 4 is very well hidden in appendix B and should be
pointed to there. 

7) In the paragraph immediately under Figure 4, ISO 646 is referred to.
When discussing these types of issues, it is important to distinguish
between ISO 646 IRV (i.e., "ASCII" with "universal currency symbol"
substituted for "dollar sign") and ISO 646 BV (which is where the
national variations on special character positions show up).  CCITT IA4
should also be on that list as another almost-ISO 646 character set.

8) Under TERMINAL EMULATION, the DEC VT200 and VT300 series terminals are
identified as ones that "already follow these standards".  This is not
strictly true: these terminals do implement GL-GR switching.  They
implement a very limited number of character sets that can be bound to G0
and G1.  They will ignore any ISO2022 announcer or designation sequence
that they don't understand, which is not the sort of thing that is usually
considered a satisfactory implementation (although it is typical of ISO
2022 "implementations").  It would be more accurate to describe the VT200
(which does not even support Latin Alphabet 1/ ISO 8859-1) and VT300 as
fairly dumb terminals, with limited character set switching capability,
that implement what capability they do have by using ISO 2022 control
sequences.
  I would favor support for eight bit characters in seven bit environments
in the kermit terminal emulators.  For that purpose, ISO2022 shifts should
be supported, rather than the eighth-bit-quoting of the file transfer
protocol, since the terminal emulators are talking to hosts, not to kermit
servers.  That said, please understand that very few hosts implement this
stuff: Digital's model with VAX/VMS is fairly typical, with most of the
operating system supporting GR graphics (high-bit-on) and ISO8859-1 and
DEC's "multinational" variations only if the terminal is operating in
eight-data-bit mode.  C1 controls are, however, supported with escape
sequences if the terminal is operating in seven-data-bit mode only.
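
To make the "ISO2022 shifts in a seven bit environment" idea concrete, here
is a minimal Python sketch using the locking shifts SO and SI. It is a
sketch under stated assumptions, not a full ISO 2022 implementation: it
assumes the right half of the eight-bit set has already been designated as
G1, that the data contains only graphic characters (so SO and SI never occur
as data), and the function names are hypothetical.

```python
SO = 0x0E  # Shift Out: invoke G1 (the right half) into the 7-bit code table
SI = 0x0F  # Shift In: return to G0 (the left half)


def to_seven_bit(data: bytes) -> bytes:
    """Encode 8-bit data for a 7-bit line, emitting a shift on each change."""
    out = bytearray()
    shifted = False
    for byte in data:
        high = byte >= 0x80
        if high != shifted:
            out.append(SO if high else SI)
            shifted = high
        out.append(byte & 0x7F)
    if shifted:
        out.append(SI)  # leave the line in the unshifted state
    return bytes(out)


def from_seven_bit(data: bytes) -> bytes:
    """Reverse to_seven_bit by tracking the current shift state."""
    out = bytearray()
    shifted = False
    for byte in data:
        if byte == SO:
            shifted = True
        elif byte == SI:
            shifted = False
        else:
            out.append(byte | 0x80 if shifted else byte)
    return bytes(out)
```

Because the shifts are locking, a run of accented text costs only two extra
bytes, unlike per-character eighth-bit quoting.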

9) In Appendix A, availability of ISO standards is listed.  It may be that
some text was left out.  But, the correct statement as I understand it is
more or less as follows:
 - Each of the national ISO member bodies is ISO's official sales agent in
that country.  Consequently, ISO standards should be ordered from ANSI in
the USA, from BSI in the UK, from AFNOR in France, from DIN in Germany,
etc.  The UN Bookstore may also carry some ISO standards.  ISO standards
are never free; the ISO Central Secretariat derives important operating
income from the sale of standards by its agents.
 - CCITT is part of the International Telecommunications Union (ITU) and
hence part of the UN system.  Its recommendations are available from the
ITU secretariat in Geneva, from the UN Bookstore(s), and through the CCITT
national committees and/or PTTs in many countries.  Some national
standards bodies, including ANSI, also carry CCITT colored books (sets of
recommendations).  The cost of CCITT colored books depends on where you
get them: sometimes costs are absorbed in indirect ways.  ANSI, which
derives a significant fraction of its annual expenses from the sale of
publications, charges for them, for its own Standards, and for ISO
Standards.

ANOTHER WAY TO THINK ABOUT THE PROBLEM

In the general case, we can expect a given system to actively support only
a small fraction of the character sets that it is possible to register.
Except for high-performance bit mapped devices, it is likely that a large
fraction of the characters in the repertoires from which the registered
character sets are and will be drawn will not be available.  Conversion among
word processor formats is extremely complex and typically involves the
loss of information, since different processors support different
capabilities (and, hence, capabilities for specification).  To provide one
specific example, on MSDOS, a barely adequate WordStar 5.0 to WordPerfect
5.0 converter is a larger program than MSKERMIT, not counting
user-supplied tables that drive the heuristics.

I suggest that, terminal emulation aside, the problem of sending special
character sets ("special" is anything but ISO 646 BV, with no national-use
character positions included) is really a problem of telling the receiving
system *what kind* of "binary" file is being sent, and reaching
appropriate agreement.  Anything else involves very complex conversion
issues that don't belong in kermit, that will be little used in practice,
and that probably can't be made to work.

Two historical kermit assumptions are key in figuring out what should be
done.  If either is relaxed, other options are possible.  The first of
these assumptions is that options should be agreed to on a single exchange
only: the Telnet negotiation arrangement, in which "DO" and "WONT"
requests are exchanged until some agreement is reached, has not been
considered acceptable.  Second, although less explicit, is the assumption
that error-reporting behavior as to what can be handled (as distinct from
actual transmission errors) should occur during send/receive negotiation,
not be discovered midway in a transmission.

Sending the ISO2022 announcer <ESC><SP>C (or D) tells the receiving kermit
that it should expect alphabet selection escape sequences, but gives no
clue as to what alphabets might be selected.  That is undesirable: given
limits of devices, what is wanted is to know *what* alphabets might be
specified *before* the transfer is begun.  Referring to the "MISMATCHED
CAPABILITIES" section of the draft, an "X" earlier is clearly better than
an "X" later, but an early "NAK" with "I'm not going to deal with that" is
far more acceptable.  If the "X" after transmission starts is the agreed
option, *please* let's specify exactly what goes into the [rest of the]
data field when this situation is encountered.
  Parenthetically, reason 1 there is not relevant: if the receiver does
not know enough about attribute packets, then the control sequences will
end up in the file, with whatever conversions are in effect.  I think this
implies that SET TRANSFER-SYNTAX ISO-2022, or any of its relatives, means
"I can handle attribute packets, and, if you can't, we are going to call
this off".  "Call this off", here, might mean "sender (user, not software)
will convert to ASCII and retry" or "send binary, will fix up at receiver
end".   The protocol should not care.

The overwhelming number of useful transfers will occur among parties that
support the same character sets, whatever they might be.  For both files
and terminal emulation, the "how do I translate from what is coming in to
what I have" question is, in practice, less interesting than the "how do I
figure out what they are sending, so I can match it" question. 

Let me ignore for a moment the way that the attributes are encoded.  What
is needed is a way for me to send to the remote kermit "I am about to send
eight-bit characters.  The whole transfer is going to be in ISO8859-6,
with *no* embedded ISO2022 controls.  Can you cope with it?".  Now, the
answer to that question is either "yes" or "no", and, as a sender, I don't
care whether "yes" means "we have a device that can display it" or "we
have local capability to convert to something else".  In principle, if we
devise a general enough way to say that, then I can also say "I am about
to send WordPerfect 5.0, can you cope?".  Again, the receiver's answer is
either "yes" or "no", and "yes" might mean "we support WordPerfect around
here" or "we can accept that and have a conversion program that does a
plausible job".
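
The single-exchange inquiry described above might look like this in Python.
It is only a sketch of the decision, with no packet encoding: the function
can_cope and its two capability sets are hypothetical names.

```python
def can_cope(announced, native, convertible):
    """Answer the sender's "can you cope?" inquiry with "yes" or "no".

    The sender does not learn, and does not need to learn, whether "yes"
    means native support or a local conversion program.
    """
    if announced in native or announced in convertible:
        return "yes"
    return "no"


native = {"ISO8859-1"}
convertible = {"ISO8859-6", "WordPerfect 5.0"}
reply_arabic = can_cope("ISO8859-6", native, convertible)
reply_hebrew = can_cope("ISO8859-8", native, convertible)
```

The same call covers character sets and application formats alike, which is
the generality argued for above.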

To a considerable extent, ISO8859 came into being to avoid data transfers
and files with embedded ISO2022 controls.  The 8859 theory is, more or
less, "if we can agree which character set we will use among ourselves and
be clear about it, then we are in a modern version of 'the good old
days'".  I'd recommend a look at the ways message character sets are (or
used to be; I'm still working from the Red Books) specified in the X.400
suite to reinforce this.

On the other hand, one of the things that I should be able to put in that
inquiry packet is "I want to send you a file with all sorts of embedded
ISO2022 controls, including alphabet switching".  Then, and only then,
does much of the complexity of the existing draft come into play.  And, if
the receiver says "yes", an ability to handle any alphabet registered up
to that day is presumably assumed.  Since that is not realistic, I want to
repeat here a variation of a suggestion that I saw go by a few days ago (I
don't still have it, or would acknowledge the idea more specifically):
that acceptance of ISO2022 sending should be followed by a packet that
specifies what I intend to bind onto G0-G3, and/or a list of what I would
*like* to use, in descending order.  In the former case, the receiver
would say "yes" or "no"; in the latter, it would send back its list of
preferences, leaving off anything it could not handle.  Both of those are
consistent with the general kermit model.  Or, one could break the rules
and negotiate back and forth.
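
The preference-list variant can also be sketched in Python. Again this shows
only the logic, not a packet format: reply_preferences and choose are
hypothetical names, and the assumption is that the receiver keeps the
sender's ordering when it strikes out what it cannot handle.

```python
def reply_preferences(sender_wishes, receiver_supported):
    """Receiver's reply: the sender's list minus anything unsupported."""
    return [name for name in sender_wishes if name in receiver_supported]


def choose(reply):
    """Sender's decision: the first surviving entry, or None to give up."""
    return reply[0] if reply else None


wishes = ["ISO8859-6", "ISO8859-1", "ASCII"]  # descending order of preference
reply = reply_preferences(wishes, {"ASCII", "ISO8859-1"})
chosen = choose(reply)
```

One request and one reply settle the matter, so the exchange stays within
the single-exchange rule discussed earlier.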

This is a game, incidentally, that a VT300 emulator could play without
problems, since the emulator could know what character sets were
supported, what was available for downloading from the attached computer,
and could give an orderly reply as to what it was prepared to cope with.

Another observation about word processors and similar programs: The notion
of word processor format conversion is basically unworkable, even more
unworkable than "how do I deal with the GR of ISO8859-6 on a VT52", since
there are ISO recommendations for Arabic-Latin transliteration.  I am
singling out 8859-6, incidentally, not because of any prejudice, but since
it is the one of the 8859 sets for which I haven't seen an emulator (even on
a high-performance workstation) within a few miles of MIT.  I doubt that
emulators are readily available within a similar distance of most of the other
contributors.  And shifting from 8859-1 to 8859-6 or 8859-8 and back again
with announcer sequences, but within the same document or line, raises
some *hard* "terminal" problems.
  The problem is the information loss mentioned above, or, more important,
the need to make up information.  These things have gotten complicated
enough that there are very complex conversion programs on the market.
Even those programs, or at least the best of them, work with user-supplied
tables that specify parameters for the heuristics that understand certain
constructs.  And people will want to send Postscript and similar files as
well (conversion from Postscript back to "word processor" is analogous to
decompiling a program or to optical character recognition).  Keep in mind
that the current versions of WordStar and WordPerfect (at least) permit
imbedding bit-mapped graphic files in text documents, so Postscript is not
a big stretch.  While ODA and SGML have been suggested as alternatives,
each has special properties.  ODA is basically just another word processor
format, with the advantage that it is an international standard and the
disadvantage that it has few implementations.  And SGML has its own
mechanisms for dealing with "funny" characters, which are usually not
expected to appear directly in files to be transferred: I would rather
know that an incoming file is SGML, not just "text", but the protocol does
not need to do anything different, certainly not try to convert it to,
e.g., PostScript.

PROPOSAL
  Having denounced everything in sight, let me try to make a simplifying
proposal.

I.  Terminal emulation
  There is, at the moment, no protocol for a remote host to tell terminals
what they should, or must, support.  If such a protocol is defined, it
will be by the host and terminal vendors, we hope in conjunction with
appropriate International Standards.  But we will still be emulating real
terminals, with small variations.  That is why we call it "terminal
emulation", after all.  Just as kermit terminal emulators gradually moved
from Z19/VT52 support to VT100 support, we should anticipate and encourage
the emulation of devices that are able to support some of the ISO8859-n
sets (e.g., the VT3xx) and the subsets of ISO2022 controls provided by
those terminals.  If those emulators can move beyond the subsets, so much
the better, but ISO8859-n support, for a small set of n's, is much more
important than general ISO2022 support including midstream character set
switching.  I am very encouraged in this regard by what I have been
able to infer about the work in Japan, and would like to hear more about
it.  In no event does this issue have much to do with the transfer
protocol, except for the implications of SET DEFAULT-DEVICE SCREEN, where
that is implemented.
  I am at least sympathetic with Joe Doupnik's 10 April comments relating
to terminals and terminal emulation.  Similar thinking motivated the 
radical suggestions made here.
  I agree with Ken-ichiro that SET KEY should be adequate.  If SET
TERMINAL KEYMAP is needed (for reasons I don't understand), provision
should be made for assigning a different keymap every time the character
set changes.  I.e., you probably want to permit binding keymaps to G0,
G1,..., rather than, or in addition to, GL and GR.  I could easily be
wrong about this; I haven't thought about it very much.

II. File transfer
 1. Unless both sender and receiver support attribute packets, only
"binary" is going to work for any of this.
 2. Define, and acquire, an attribute packet something like the following.
The command breakdown may be wrong, as may the keywords.  However,
whatever is done, it is important to define things so that the default
preferred character set and file information ("transfer syntax") can be
specified in a start-up macro and retained, while the "extended"
characteristic is turned on and off.
 2a. Support SET FILE-TYPE EXTENDED, as well as "text" and "binary".
"Extended" implies attribute handling, and attribute "*X".
 2b. Support SET TRANSFER-SYNTAX <keyword> <value>
  where <keyword> is a list that gets registered and built into the
protocol on the same basis as attribute packets.  I would suggest that the
following concepts are good candidates for <keyword> (note that these are
the concepts, not the specific keywords).  Each keyword is associated with
its own definition of a value field.  Value fields, however, are specified
by generating rules, such as those outlined below, not by the kermit
protocol manual.
 - ISO8859.  Value field is a part number and a date. (WARNING: these
things get revised, not always consistently).
 - ISO2022.  Value field is a date, followed by the proposed announcer
sequences, followed by a list of the character sets that will be
referenced. 
 - ISO ODA.  Value field is a date, and maybe some other stuff (I haven't
studied ODA in enough detail).
 - ISO SGML. Value field is a date.
 - Proprietary word processor.  Value field is a name and a version, and
some conventions are needed about how those names are spelled.  Note that
"WordPerfect 5.0" file formats are not equivalent to "WordPerfect 4.2"
file formats, and that this is a symptom of a general problem: more
capability typically means file format revisions; significant increases in
capability imply non-upward-compatible file format revisions.  
 - Proprietary 'picture' or 'page description'.  Values are things like
"Postscript", "HP-PCL", MSP, etc.  It is not clear to me why this category
and the previous one should be separate, but my intuition says that they
should be.
 - CCITT recommendation.  Number and date (or color).
 - ISO standard (not covered above).  Number and date.
 - National standard (not covered above).  National standards body acronym
(these are listed/specified in ISO documents), number, and date.
 There may be some others, and I would encourage something that a server
and user can agree upon, without assuring uniqueness, such as:
 - private.  some value field.

 3.  The associated attribute packet contains *X and an encoding of the
keyword and value string.  When it arrives, the receiving kermit either
agrees or rejects, as usual with attribute packets.  And, as suggested
above, "agree" implies an ability to deal with the result, not what that
ability is going to be.
     An important issue not addressed in the draft is that we (at least)
increasingly use kermit as an element in multi-hop transfers, e.g., PC or
workstation to host via kermit, to other host via FTP, to other host via
some encoding with a mail envelope, to PC or workstation via kermit.  Even
if both workstations can handle, say, ISO8859-5, that is no guarantee that
the intermediate hosts can.  No real problem here, but the files will
either have to arrive as binary and be post-decoded (as below) or the
kermit on the receiving workstation will need to be given a *local* SET
RECEIVE TRANSFER-SYNTAX ... command that deals with the file in the way
that files of that type are dealt with, independent of any attribute
negotiations. 
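
A minimal sketch of the agree-or-reject step in item 3, in Python.  The
keyword spellings, the TransferSyntax type, and the SUPPORTED table are
all hypothetical; the proposal deliberately leaves the actual keywords
and the *X encoding open.  "Agree" here only means the receiver can cope
with the file, not that it will translate or display it.

```python
from typing import NamedTuple


class TransferSyntax(NamedTuple):
    keyword: str  # e.g. "ISO8859", "SGML" (hypothetical spellings)
    value: str    # keyword-specific value field, e.g. "5 1988"


# What this hypothetical receiver is prepared to store.
# A set lists the acceptable part numbers; None accepts any value field.
SUPPORTED = {
    "ISO8859": {"1", "5"},
    "SGML": None,
}


def parse_transfer_syntax(field: str) -> TransferSyntax:
    """Split 'KEYWORD value...' into a keyword and its value field."""
    keyword, _, value = field.strip().partition(" ")
    return TransferSyntax(keyword.upper(), value.strip())


def receiver_agrees(ts: TransferSyntax) -> bool:
    """Return True to accept the attribute, False to reject it."""
    if ts.keyword not in SUPPORTED:
        return False
    allowed = SUPPORTED[ts.keyword]
    if allowed is None:
        return True
    # The first token of the value field is treated as the part number.
    part = ts.value.split()[0] if ts.value else ""
    return part in allowed
```

A receiver built this way would agree to "ISO8859 5 1988" and reject
"ISO8859 6 1987" before any data is sent.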

 4. For operating systems for which the capability is appropriate, a
kermit command that looks something like:
  SET DATA-CONVERTER <routine> <keyword> <value>
would permit user specification of a routine that could be invoked on the
fly to convert those particular types of files.  The semantics of
"routine" would have to be local-system-dependent.  For the record, I
don't think that this is a good idea, but it is a way to incorporate the
capability that some of you seem to want within kermit.  Note that, as
soon as you move either into word processor land (or even things like
SGML), "conversion" is not a matter for tables, but for programs.
   Some similar capability that *does* permit user-specified remapping
would be appropriate.  If ISO8859-6 is going to be "converted" into
ISO8859-1 so that it can be displayed on an ISO8859-1 machine with no
8859-6 capability, then the specification of the actions should be under
user control.  The ISO Romanization of Arabic, incidentally, runs left-to-
right.

  But the main point is that 2-3 above get a "binary" file delivered, and
delivered with sufficient information to decode it.  And they enforce
agreement on the file organization and content before the file is
transferred.  That decoding is probably best done post-transfer, rather
than on-the-fly, if for no other reason than that one is likely to want to have
the file -- as transferred -- available when the decoding produces
something unexpected.  If you decode or translate on the fly, a new set of
debugging options that save the transferred data would be helpful.

The logical flow chart now looks like:
     Open transfer connection
               |
     Tell receiver what you are going to send
               |
     Receiver agrees to accept that form
               |
     Send data in that form
               |
     Close transfer connection

Note that notions of "convert to transfer syntax" and "kermit encoding"
don't appear here, at either end.  That is reasonable, practical, and
probably The Right Thing.  At the same time, things are not being
transferred as "text" that are not "text" as that is traditionally
understood by kermit implementations.
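
The flow chart above can be sketched as follows, in Python.  The Receiver
class, the form strings, and the decoder table are assumptions made for
illustration, and the open and close steps are elided.  The point of the
sketch is that the data is stored exactly as transferred, together with
its declared form, and that any decoding happens afterwards on a copy.

```python
class Receiver:
    """Hypothetical receiving side: stores files untouched, with their label."""

    def __init__(self, accepted_forms):
        self.accepted_forms = set(accepted_forms)
        self.files = {}  # name -> (declared form, raw bytes as transferred)

    def agrees_to(self, form):
        return form in self.accepted_forms

    def store(self, name, form, data):
        self.files[name] = (form, bytes(data))


def transfer(name, form, data, receiver):
    """Follow the flow chart; return True if the file was delivered."""
    # 1. Open transfer connection (elided in this sketch).
    # 2. Tell receiver what you are going to send.
    # 3. Receiver agrees to accept that form, or the transfer stops here.
    if not receiver.agrees_to(form):
        return False
    # 4. Send data in that form: no conversion at either end.
    receiver.store(name, form, data)
    # 5. Close transfer connection (elided).
    return True


def decode_later(receiver, name, decoders):
    """Post-transfer decoding; the transferred bytes remain available."""
    form, raw = receiver.files[name]
    return decoders[form](raw)
```

If decode_later produces something unexpected, the bytes in
receiver.files are still there to be examined.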

  John Klensin
  Klensin@INFOODS.MIT.EDU


From @cuvmb.cc.columbia.edu:A-PIRARD@BLIULG11.BITNET  Fri Apr 14 11:48:13 1989
Return-Path: <@cuvmb.cc.columbia.edu:A-PIRARD@BLIULG11.BITNET>
Received: from columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA24542; Fri, 14 Apr 89 11:48:13 EDT
Received: from cuvmb.cc.columbia.edu by columbia.edu (5.59++/0.3) with SMTP 
	id AA07620; Fri, 14 Apr 89 11:47:45 EDT
Message-Id: <8904141547.AA07620@columbia.edu>
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 1479; Fri, 14 Apr 89 11:47:43 EDT
Received: from VM1.EARN-ULG.AC.BE by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with
 BSMTP id 5383; Fri, 14 Apr 89 11:47:42 EDT
Received: by BLIULG11 (Mailer R2.03B) id 3958; Fri, 14 Apr 89 17:46:36 +0200
Date:         Fri, 14 Apr 89 14:45:26 +0200
From: Andr'e PIRARD <A-PIRARD%BLIULG11@cuvmb.cc.columbia.edu>
Subject:      Re: 8859/1 kermit in Iceland
To: Fridrik Skulason <mcvax!rhi.hi.is!frisk@uunet.uu.net>,
        ISO/Kermit Discussion Group <isokermit@watsun.cc.columbia.edu>
In-Reply-To:  Your message of Wed, 12 Apr 89 13:02:29 GMT

Abridged, Fridrik says:
>   Here in Iceland we have been using kermit with ISO 8859/1 translation
>   for almost three years now. This was done since our national language
>   contains 10 characters outside the 7-bit ASCII character set, and we
>   wanted to simplify the file transfer process.
>
>        No changes were made in the BINARY case, but when transmitting
>        TEXT files, automatic translation to ISO 8859/1 was done. Also,
>        when receiving TEXT files, translation from ISO 8859/1 to the
>        native character set was done automatically.
>
>        Native character set A ----> ISO 8859/1 ----> Native character set B
>
>        We also changed the Macintosh Kermit in a similar way, but in other
>        cases (HP 9000 and ATARI ST) no changes were needed, since those
>        machines use the ISO 8859/1 character set here in Iceland.
>
>            SET TRANSLATION {NONE | ISO | CP850 | DEC-MULTI | ROMAN-8}
>        It was used to specify what sort of translation should be applied
>        to incoming/outgoing characters while in terminal emulation mode.

I quote this note because it just tells me that what Fridrik has done
is exactly what I have done between CMS Kermit and the IBM PC
(in our own program), what I am longing to have for the Macintosh and others
and what I proposed to Frank as a very low cost to value addition
to Kermit implementations: allowing byte to byte generalized translation in
both text transfer and terminal mode, and recommending to use this translation
so that a common code ISO 8859-x be talked on the line to remove the
NxN problem. I said even hidden patchable translation tables are useful,
but the more user interface (SETs or the like) the better.
I am sure Fridrik is not the only one. On the contrary, this solution
will satisfy the majority of those who can do with a single version
of ISO 8859 (and who maybe do not know what ISO 8859 is), a useful solution
*now*, because it is usable with today's software.
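
A minimal sketch of such byte-to-byte translation through a common line
code, in Python.  It is not the code of either implementation mentioned
above: the table builder, the choice of CP850 and Macintosh Icelandic as
the two native sets, and the "?" fallback are assumptions for
illustration.  With N native sets, direct translation needs N*(N-1)
tables, while going through one common code needs only 2N.

```python
def build_table(src_codec, dst_codec, fallback=b"?"):
    """Build a 256-entry byte-to-byte table from src_codec to dst_codec."""
    table = bytearray(256)
    for code in range(256):
        try:
            char = bytes([code]).decode(src_codec)
            out = char.encode(dst_codec)
        except UnicodeError:
            out = fallback  # no equivalent character on the other side
        table[code] = out[0] if len(out) == 1 else fallback[0]
    return bytes(table)


# Each machine needs only its own tables to and from the line code.
PC_TO_LINE = build_table("cp850", "latin-1")         # sending PC
LINE_TO_MAC = build_table("latin-1", "mac_iceland")  # receiving Macintosh


def send_text(native):
    """Translate native bytes to ISO 8859-1 for the line."""
    return native.translate(PC_TO_LINE)


def receive_text(line):
    """Translate ISO 8859-1 bytes from the line to the native set."""
    return line.translate(LINE_TO_MAC)
```

Characters with no counterpart in the target set are replaced by the
fallback, so the mapping loses information in those cases; a patchable
table lets the user choose a different substitute.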

Now I quite understand Frank's reaction that being restricted to a single -x
is a pity and that my proposition is no use to e. g. the Eastern languages.
And I sure praise him for having raised the debate to higher ground.
But, given the size of his document and that of the comments it already
raised, I am afraid this proposition is very difficult to implement
and to use.

So, I suggest that international characters be supported on two levels:

1) restricted, within a single version of ISO8859, in the proposition terms
no announcers, switchers etc... In fact, having it work with to-day's
Kermits to which translation and maybe simple commands are added.
That's in fact coordinating Fridrik's work, mine and maybe others' and
is straightforward in its definition.

2) general, across multiple codes or ISO versions, in which usage I am not
interested, but my heart is with implementers and will comment:
If a multibyte standard existed, most of the guessing at how to store
and transmit the data would not exist. ISO 10646 is sure the kind, but
Johan van Wingen tells me it is not sure we should wait for it.
But, even if we don't want to bet on 10646, the multibyte
idea is there. Why insist on sticking to existing standards for what
is our own right to define under the Kermit protocol? If ISO 2022 is used
just to switch among several codes, why not transmit double bytes ccdd where
cc is the code and dd the data? The only objection would be performance,
but is it really so, or that important, weighed against simplicity? (Of course,
cc being constant would be transmitted under the restricted case).
Or, even though 10646 is not waited for, trying to be as close as
possible to its definition could ease yet another conversion when/if
it comes to reality.
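
A minimal sketch of the ccdd idea, in Python.  Using the ISO 8859 part
number as the cc value is an assumption made only for this example; the
suggestion above does not say how codes would be numbered.

```python
def encode_ccdd(runs):
    """Encode (set_code, data) runs as a stream of cc dd byte pairs."""
    out = bytearray()
    for set_code, data in runs:
        for byte in data:
            out += bytes([set_code, byte])
    return bytes(out)


def decode_ccdd(stream):
    """Decode a cc dd stream back into runs of (set_code, data)."""
    if len(stream) % 2:
        raise ValueError("ccdd stream must have an even number of bytes")
    runs = []
    for i in range(0, len(stream), 2):
        cc, dd = stream[i], stream[i + 1]
        if runs and runs[-1][0] == cc:
            runs[-1][1].append(dd)   # same code as before: extend the run
        else:
            runs.append((cc, bytearray([dd])))
    return [(cc, bytes(data)) for cc, data in runs]
```

The stream is exactly twice the size of the data, which is the
performance cost being weighed against the simplicity of having no
shift state to track.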

Now I'll sure contact Fridrik for some software and ideas exchange soon.

Hoping the best move for Kermit.

André.

From lts!amanda@uunet.uu.net  Fri Apr 14 14:45:59 1989
Return-Path: <lts!amanda@uunet.uu.net>
Received: from uunet.UU.NET by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA25822; Fri, 14 Apr 89 14:45:59 EDT
Received: from lts.UUCP by uunet.UU.NET (5.61/1.14) with UUCP 
	id AA20648; Fri, 14 Apr 89 14:45:55 -0400
Received: by lts.UUCP (4.12/2.881128)
	id AA09736; Fri, 14 Apr 89 13:40:06 est
Date: Fri, 14 Apr 89 13:40:06 est
From: Amanda Walker <lts!amanda@uunet.uu.net>
Message-Id: <8904141840.AA09736@lts.UUCP>
To: isokermit@watsun.cc.columbia.edu
Subject: Are we getting off track, or am I just confused...?

I've only been on this list for a week or so, but I have read the
draft proposals, and am pretty familiar with the ISO standards we've
been talking about.  It seems to me that some of the debate going on
is ranging pretty far from the original goal of this effort, at least
the way I understand it.  Now, I may be covering familiar ground in
this message, but it evidently seems simpler to me than it does to
some others. 

Looking at things from the point of view of a long-time Kermit user,
Kermit does two things: file transfer and terminal emulation.  In the
case of many implementations, the terminal emulation function amounts
to acting as a "virtual cable," reducing it to basically a means of
transferring files.  Historically, one of Kermit's main strengths was
the fact that it allows us to transfer a text file from one machine to
another, even (or especially) when they have widely differing internal
formats for storing such files.  It does this by defining the
representation taken by such files (i.e. printable ASCII delimited by
CR-LF pairs) and a way to encode this representation into packets
that can be sent over almost any communications channel.  As I see it,
one of the main points of the ISO-Kermit idea is to extend this
representation of a "Kermit text stream" to include polyalphabetic
text.  As several people have pointed out, once there is a common
representation, each implementation only has to deal with its own
native formats.  I think that ISO 2022 is very appropriate for such
a representation.  By adding a few more control characters to the
representation, we gain the ability to send polyalphabetic text, while
remaining compatible with pre-existing kermits.  The only "protocol"
extension I see any need for is a flag in the initial negotiation
saying "I can send/handle ISO 2022", much in the way long packets are
handled now.  If one side can't handle it, the current kermit text
format is used.

Now, if a given implementation is smart enough to do other kinds of
translation (such as handling complexly formatted documents or,
say, TeX notation for diacriticals and non-ASCII characters), that's
fine, but it's an implementation feature that the user can ask for,
not part of the representation "on the wire."  One of the advantages
of using ISO 2022 controls is that if all else fails, a file can
be transmitted as an 8-bit stream, thus preserving the information
even though one end of the connection may not be able to interpret it.

Given that both sides of a connection can handle translating from
their native character set to and from an ISO 2022 stream, I think
that we've accomplished what we set out to.  The fact that most
machines can only handle one character set at a time (at least
for simple text documents) is a red herring, I think, as is the
fact that most of these character sets are only partially-intersecting
subsets of what can be represented using ISO 2022.  It still lets us
preserve as much information as possible, which, once again, has
been one of the biggest strengths of Kermit.

Once we go beyond simple text file transfer into the realm of being
able to interchange arbitrarily formatted documents, it's time to
look at ODA or full ANSI X3.64 or something, but that seems to me
to be a separate issue.  I am personally interested in it, but I
think we should take this one step at a time.  ISO 2022 is a way
to represent a polyalphabetic text stream, and if that's what we
want, it'll do the job quite well.  It's straightforward and will
bring immediate benefit to Kermit users.

The second major issue is terminal emulation, where it would be very
nice to be able to view polyalphabetic text.  Some existing terminals
(such as the DEC VT340) are a start, but microcomputer implementations
of terminal emulators are an excellent testbed for doing a much better
job.  I think that using ISO 2022 is also a good way to start on this,
since so far, most hosts talk to their terminals over a text stream.
However, I still would like to keep file transfer and terminal emulation
separate, despite the fact that in many implementations they may well
share code.  As I mentioned in a previous message, I think it would be
a mistake to cripple one machine because of another's shortcomings.
I think that implementors should be encouraged to put as many of the
registered character sets into their emulators as is practical.

I hope this is making sense--it's been a long week.

On to some implementation details....

For machines (such as an IBM with an EGA) where there is a
programmable character generator with a limited number of slots, one
way to make the most of the hardware would be to treat the CG as a
cache, and only keep it loaded with the characters that are used on a
given screenful.  Aside from test patterns :-), I can't think of many
times when you would actually have more than 512 different alphabetic
glyphs on the screen at once (I'm not counting Kanji/Hanzi as
alphabetic), even for multilingual text.  I have used a similar
technique in printer drivers for printers with very limited download
capacity (anyone remember the Xerox 2700?).  The only problem I can
think of for the EGA/VGA is that you'd have to be real careful to
avoid screen flicker when updating the character set table...
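
A minimal sketch of treating the character generator as a cache, in
Python.  The GlyphCache class and its slot bookkeeping are hypothetical,
and the upload counter merely stands in for writing a bitmap to the
hardware.  It does not address the flicker problem.

```python
class GlyphCache:
    """Keep only the glyphs needed for the current screen loaded."""

    def __init__(self, slots):
        self.slots = slots
        self.slot_of = {}               # glyph id -> slot number
        self.free = list(range(slots))  # slot numbers not yet used
        self.uploads = 0                # how many glyphs were (re)loaded

    def show_screen(self, glyphs_on_screen):
        """Load what the screen needs; return the slot for each position."""
        needed = set(glyphs_on_screen)
        if len(needed) > self.slots:
            raise ValueError("screen needs more glyphs than the generator holds")
        # Glyph ids must be sortable; sorting keeps the result deterministic.
        for glyph in sorted(needed.difference(self.slot_of)):
            if self.free:
                slot = self.free.pop()
            else:
                # Evict a loaded glyph that this screen does not use.
                victim = next(g for g in self.slot_of if g not in needed)
                slot = self.slot_of.pop(victim)
            self.slot_of[glyph] = slot
            self.uploads += 1
        return [self.slot_of[g] for g in glyphs_on_screen]
```

Glyphs already loaded are never reloaded, so a screen that reuses the
previous screen's characters costs no uploads at all.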

On a related note, I have seen reference to a freely available JIS-
sponsored dot matrix (20x20 dot?) font set that has glyphs for
the entire JIS Kanji/Roman/Cyrillic/Greek set.  Does anyone have
any information about how to obtain this?  If it truly is freely
available, it could save a lot of work in implementing Kanji versions
of Kermit...

Amanda Walker
InterCon Systems Corporation
amanda@lts.UUCP

From KLENSIN@infoods.mit.edu  Fri Apr 14 17:02:43 1989
Return-Path: <KLENSIN@infoods.mit.edu>
Received: from INFOODS.MIT.EDU by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA26944; Fri, 14 Apr 89 17:02:43 EDT
Received: by INFOODS id <0000047A071@INFOODS.MIT.EDU> ;
       Fri, 14 Apr 89 16:59:45 EDT
Date: Fri, 14 Apr 89 16:36:17 EDT
From: John C Klensin <KLENSIN@infoods.mit.edu>
Subject: RE:  Are we getting off track, or am I just confused...?
To: Amanda Walker <lts!amanda@uunet.uu.net>
X-Vms-Mail-To: EXOS%"Amanda Walker <lts!amanda@uunet.uu.net>"
Message-Id: <890414163617.0000047A071@INFOODS.MIT.EDU>
Cc: isokermit@watsun.cc.columbia.edu

Amanda,
  It depends on how one defines the problem, and I'm not sure, after 
reading your note, how you define it.
  If the notion is "take my internal character set, transform it to
ISO2022 sequences and an arbitrary collection of registered character
sets, and send it to a remote host with the expectation that it will
translate to its nearest approximation to what was sent" then I think
you create chaos.  The problem is that the list of "registered character
sets" is not a closed set.  In principle, each week can bring a new one,
and, with the introduction of non-Latin characters into ISO 8859, you
can't predict from already-known ones what the new one will be.   
On-the-fly translation from an unknown character set to a local one so 
that local characters can be stored in files (much like ASCII->EBCDIC 
translation now occurs) impresses me as a difficult job.
  Alternately, the idea might be "take my internal character set,
transform it into ISO 2022 form, including the correct announcers and
identifiers to tell the remote kermit what my character set is, send it
to the remote, and expect that it will place it in a file, ISO2022
controls and all".  That seems to me to be perfectly sensible and
consistent with my model of what is plausible.  If that--perfectly
canonical--ISO 2022 code containing file is then to be converted to
something else on the
target machine (either a single character set or a different 
canonicalization or even a different set of ISO2022 introducers and 
announcers), then that is quite reasonable, too, but it is not part of 
the kermit problem.
  The third interpretation is that one really wants to have a single
canonical character set into which all "text" other than ASCII is
translated in the hope that a local system can map it into the 
characters it understands and can display locally and that it can map
characters it does not understand into some local convention.   If that 
is the intent, then the right solution is some universal multi-octet 
character set--"universal" in the sense that all conceivable characters 
are in it--for kermits to use in transmission.  Unfortunately, such a 
Standard is probably some time off, and, unless you want to eliminate a 
large fraction of the world's population, it may be three or more bytes
rather than two.  A factor of three in file transmission size is a
pretty high price to pay if all I want to do is to send a file in, say, 
German or French.
  And, while I agree about the number of characters one would have to 
download to an EGA or VGA "most of the time", one of the things we 
realized looking at some related issues for PL/I is that almost any 
scheme that limits the number of character sets on a screen can be 
broken by text that uses parallel translation or a multilingual 
dictionary.  Those are precisely the things that some of us would like 
to send but, for most purposes that do not involve Kanji or Hanzi, ISO
8859-n, for small values of n, tends to suffice without a great deal of 
complexity, in-line character set changes, or multiple octet sets.

  John Klensin   Klensin@INFOODS.MIT.EDU


From lts!amanda@uunet.uu.net  Fri Apr 14 19:47:02 1989
Return-Path: <lts!amanda@uunet.uu.net>
Received: from uunet.UU.NET by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA29112; Fri, 14 Apr 89 19:47:02 EDT
Received: from lts.UUCP by uunet.UU.NET (5.61/1.14) with UUCP 
	id AA15025; Fri, 14 Apr 89 19:46:58 -0400
Received: by lts.UUCP (4.12/2.881128)
	id AA10357; Fri, 14 Apr 89 18:36:26 est
Date: Fri, 14 Apr 89 18:36:26 est
From: Amanda Walker <lts!amanda@uunet.uu.net>
Message-Id: <8904142336.AA10357@lts.UUCP>
To: isokermit@watsun.cc.columbia.edu
Subject: A difference in scope, I think.

John,

I apologize if I was a little vague in my previous message;  I think that
my interpretation of the scope of the current proposal (based on my first
reading of Draft #2) was quite a bit more limited than yours, which caused
my confusion at your previous message.  After rereading the draft and the
last couple of weeks' worth of traffic on this mailing list, I am less
confused about your position, but a little more confused about just what
the proposal is supposed to do (which, I suppose, is exactly what this
list is for...).

My initial interpretation of the draft, and one that I still think is
heavily implied by the introduction, is that the extension is indeed
meant to, as you put it:

	"take my internal character set, transform it to ISO2022
	sequences and an arbitrary collection of registered character 
	sets, and send it to a remote host with the expectation that it
	will translate to its nearest approximation to what was sent."

I agree that this is an open-ended problem, since the set of registered
character sets is open-ended.  That is, I believe, why the new attribute
field was introduced--so that two Kermits can determine what they
can interchange effectively.  Basically, I'd rephrase your quote above
as "I know what I can send, and you know what you can receive; if these
overlap, we now know what we can interchange without having to know anything
about each other's respective character sets."  This is a much more
restricted capability than the ability to transfer arbitrary polyalphabetic
text, and one I find quite plausible.  Fridrik's and Andre's experience
seem to bear this out.

The idea of "take my internal character set, transform it into ISO
2022 form, including the correct announcers and identifiers to tell
the remote kermit what my character set is, send it to the remote, and
expect that it will place it in a file, ISO2022 controls and all" is
trivial, given the ability to translate out of the native character
set; all the sending kermit has to do is do the translation but flag
the file as a binary file.

The wider problem, that of universal interchange of arbitrary
polyalphabetic text, is definitely a fascinating one, but I'm not
sure it's tractable in kermit in the short term, if only because
for most (especially small) machines that run kermit, we'd have to
basically write a whole text processing system to handle it.  I don't
think that should be part of this effort.

Of course, there are a few machines which can in fact represent almost
arbitrary polyalphabetic text as a subset of their generic text format.
The Macintosh is one, thanks to the Script Manager, but so far it's an
exception to the rule, and even it only handles horizontal-format text--so
much for Mongolian...

This brings up a question I had:  I was under the impression that the set
of registered character sets defined, in effect, a single mapping of multi-byte
codes to alphabetic glyphs, with ISO 2022 defining how to encode a stream
of these codes into a 7 or 8 bit data stream in a reasonably concise manner.
Is this in fact how it was intended to work?  If not, is it an unreasonable
way to look at it?

I guess the gist of what I am trying to say is that the draft proposal
seems to be intended to allow Kermit users to interchange files between
dissimilar systems with as little modification to Kermit (and therefore
as little modification to the text stream format) as possible.  I didn't
read it as a general solution by any means.  Frank and Christine, if this
was the wrong interpretation, please let me know :-).

I hope this clears things up a bit.


Amanda Walker
InterCon Systems Corporation
amanda@lts.UUCP

From KLENSIN@infoods.mit.edu  Sat Apr 15 10:47:45 1989
Return-Path: <KLENSIN@infoods.mit.edu>
Received: from INFOODS.MIT.EDU by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA04948; Sat, 15 Apr 89 10:47:45 EDT
Received: by INFOODS id <0000057B061@INFOODS.MIT.EDU> ;
       Sat, 15 Apr 89 10:45:49 EDT
Date: Sat, 15 Apr 89 09:55:27 EDT
From: John C Klensin <KLENSIN@infoods.mit.edu>
Subject: RE:  A difference in scope, I think.
To: Amanda Walker <lts!amanda@uunet.uu.net>
X-Vms-Mail-To: EXOS%"Amanda Walker <lts!amanda@uunet.uu.net>"
Message-Id: <890415095527.0000057B061@INFOODS.MIT.EDU>
Cc: isokermit@watsun.cc.columbia.edu

Amanda,
  I, too, may be confused about what was intended.  A few observations:

>	"take my internal character set, transform it to ISO2022
>	sequences and an arbitrary collection of registered character 
>	sets, and send it to a remote host with the expectation that it
>	will translate to its nearest approximation to what was sent."

>I agree that this is an open-ended problem, since the set of registered
>character sets is open-ended.  That is, I believe, why the new attribute
>field was introduced--so that two Kermits can determine what they
>can interchange effectively.  Basically, I'd rephrase your quote above
>as "I know what I can send, and you know what you can receive; if these
>overlap, we now know what we can interchange without having to know anything
>about each other's respective character sets."  This is a much more
>restricted capability than the ability to transfer arbitrary polyalphabetic
>text, and one I find quite plausible.  Fridrik's and Andre's experience
>seem to bear this out.
  I think this is what I'm trying to get to, but, for this purpose, the 
draft proposal is (slightly) inadequate.  To make it adequate, 
information needs to be exchanged, at attribute-packet evaluation time, 
about what character set(s) are going to be designated.  I don't think 
that finding out, halfway through a transfer, that you are including a 
reference to a character set that I've never heard of is a satisfactory 
way to proceed.
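
A minimal sketch of that exchange, in Python.  The function names and the
character set labels are hypothetical.  The sender lists every set the
file will designate before any data moves, and the receiver agrees only
if it knows all of them.

```python
def sets_to_declare(runs):
    """Sender side: list every character set the file will designate,
    in order of first use, before any data is sent."""
    declared = []
    for set_name, _data in runs:
        if set_name not in declared:
            declared.append(set_name)
    return declared


def evaluate_attribute(declared, known_sets):
    """Receiver side: agree only if every declared set is already known.
    Returns (agree, unknown) so a rejection can say what was missing."""
    unknown = [name for name in declared if name not in known_sets]
    return (not unknown, unknown)
```

A rejection then happens at attribute-packet time and names the
offending set, rather than surfacing halfway through the transfer.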

>The idea of "take my internal character set, transform it into ISO
>2022 form, including the correct announcers and identifiers to tell
>the remote kermit what my character set is, send it to the remote, and
>expect that it will place it in a file, ISO2022 controls and all" is
>trivial, given the ability to translate out of the native character
>set; all the sending kermit has to do is do the translation but flag
>the file as a binary file.
  Possibly trivial, but extremely useful.  This is *not* a binary file, 
it is a text file with certain embedded controls.  Knowing the latter 
makes it much easier to apply post-transmission processing.  It also 
makes ISO->EBCDIC translation possible if solid mappings from ISO-eight-
bit sets to extended EBCDIC ever solidify.  An EBCDIC analogue to 
ISO2022 is equally trivial; the problem is getting the code page mess to 
settle down.  So, when I say "place it in a file, ... and all", I don't 
necessarily mean "completely untransformed", which is the criterion we 
apply to "binary".

>The wider problem, that of universal interchange of arbitrary
>polyalphabetic text, is definitely a fascinating one, but I'm not
>sure it's tractable in kermit in the short term, if only because
>for most (especially small) machines that run kermit, we'd have to
>basically write a whole text processing system to handle it.  I don't
>think that should be part of this effort.
   Concur.  My concern was that the proposal included enough things that 
doing those in combination implied an effort large and complex enough 
that one might as well do this task.

>Of course, there are a few machines which can in fact represent almost
>arbitrary polyalphabetic text as a subset of their generic text format.
>The Macintosh is one, thanks to the Script Manager, but so far it's an
>exception to the rule, and even it only handles horizontal-format text--so
>much for Mongolian...
  And only if someone builds the appropriate tables.  I don't pay enough 
attention to the Macintosh to know, but has anyone done tables for Thai 
yet?  That is just one example; others are possible.

>This brings up a question I had:  I was under the impression that the set
>of registered character sets defined, in effect, a single mapping of multi-byte
>codes to alphabetic glyphs, with ISO 2022 defining how to encode a stream
>of these codes into a 7 or 8 bit data stream in a reasonably concise manner.
>Is this in fact how it was intended to work?  If not, is it an unreasonable
>way to look at it?
  Let me give an impression of the answer, since I'm not completely sure 
I understand the question, nor am I sure that I know the correct answer. 
Johan, please comment on this, since I think you are the expert.
  Simple answer (possibly wrong) to one version of your question (I'm 
here ignoring the "multi-byte" part): Yes, that was the way it was
intended, but then someone discovered non-Roman alphabets.  So, today
the answer is "no", regardless of what was intended.  And the reason why
it is "no" makes this "an unreasonable way to look at it". 
  Warning: in what follows, the term "character" is roughly the same as 
"graphic", although the character standards are quite careful to avoid 
specifying exactly how a character should be printed.  However, 
"Icelandic Lower-case Eth" is a "character" in this sense (one of those 
that appear in no other alphabet).  "Character" is independent of 
"alphabet", even though the name of the alphabet(s) in which it appears 
may be part of the character's identification.  More important, it is 
independent of any particular "character set" (really a code table) or 
position (column and row) of that table. "Code point" is a synonym for 
"position" as used in this sense.   To all practical intents and 
purposes, "code point" ("column and row") of a "character set" or "code
table" is isomorphic with the binary encoding of that code point in data 
transmission, but that is not strictly true either. 
  For the (extended) Roman alphabet characters, there is an
international standard that lists all of the characters and assigns
standard identifiers (names and sequence numbers) to them.  The
registration procedure says, more or less, "produce a list that
identifies those characters with code points and we will assign it a 
number".  So, for the Roman-based alphabets, I think the answer to your 
intended question is still "yes", with the "only" problem being that, 
while I can know what code tables I use, I can't know how to translate 
an arbitrary code page, given only its identifier, even if I somehow 
know that it is Roman-only.  The reference listing of characters 
provides an unambiguous way to map between (Roman) character sets, if 
mappings exist.  Of course, each character set contains only a few 
characters from the whole, so 'mappings' are possible only when one set 
is just a different coding permutation of the characters in another, or 
if you decide to map to "closest approximation in graphic form".  That 
is a disaster, since those mappings are often misleading and not
reversible. 
  But SC2, in its wisdom, decided to permit ISO8859 character sets that 
contain distinctly non-Roman characters and some characters that, 
because of right-to-left problems, I'm not sure what to do with even if 
I "understand" the character set and have the characters (see "Example" 
below).  For those non-Roman character assignments, there is, in at 
least some cases, no reference international standard.  Hence no 
universal anything that we are just mapping and remapping.  And, if the 
government of Thailand decides to register a Roman-Thai character set as 
ISO DP 8859-x, it will probably eventually be approved, and the answer 
to your question becomes, seriously, "no".

  Alternate answer, taking the "multi-byte" seriously:  Let me say this 
as strongly as I can: THERE IS NO SUCH THING AS A STANDARD, OR EVEN 
NEAR-STANDARD, MULTIBYTE CHARACTER SET.  ANYWHERE.  The closest 
approximation is ISO DP 10646, and it is a *PROPOSAL*, not a Standard, 
or even a Draft Standard.  It is also, in the opinion of The Netherlands 
and others, defective and it is quite probable that, if it is ever 
approved, it will be changed from the present form.  It is also not 
large enough to handle the number of Hanzi that China's standards bodies 
believe they need, since they claim they need at least three octets.  I 
have not studied 10646, so have no further opinions on it.  However, the 
point is that it is not suitable, today, as the base for anything.

> I guess the gist of what I am trying to say is that the draft proposal 
>seems to be intended to allow Kermit users to interchange files between
>dissimilar systems with as little modification to Kermit (and therefore
>as little modification to the text stream format) as possible.
  If this were the proposal, I would be in favor of it.  While I don't 
think the proposal modifies the text stream format very much, I read a 
demand for a lot of data conversion capability into things like Figure 4 
and the surrounding text.  Also read the paragraphs "LOCAL FILE 
REPRESENTATION".  This is not an "implementation detail" or, as we say 
here, a "small matter of programming".  It is an impossible task, 
requiring programs that automatically adapt to changing knowledge and 
conversions that can't occur, at least without significant loss of 
information, in real systems.  
  Or maybe I'm seriously misreading the intent, in which case someone 
should please correct me before I clutter the bandwidth up any further.  
But comments made by others about how nice it would be to have kermit 
perform automatic WordPerfect to ODA conversions imply that I'm not the 
only one reading things this way.

>I hope this clears things up a bit.
  I hope so, too.  I'm getting pretty confused, and hope that I haven't 
added to the general confusion.

  John Klensin
  Klensin@INFOODS.MIT.EDU


From jrd  Sat Apr 15 21:15:37 1989
Return-Path: <jrd>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA08824; Sat, 15 Apr 89 21:15:37 EDT
Date: Sat, 15 Apr 1989 21:15:36 EDT
From: "Joe R. Doupnik" <jrd@watsun.cc.columbia.edu>
To: isokermit
Cc: jrd
Subject: Short commentary
Message-Id: <CMM.0.88.608692536.jrd@watsun.cc.columbia.edu>

	Just a brief commentary on the most recent discussions between John
and Amanda, or "The practical side of affairs, as I perceive them."

        People already use statically defined glyph/token/symbol tables,
relating display glyphs and the corresponding octet value(s) or code point.
Some are fully registered, others such as the IBM-PC left table(s) are not. In
either case they are present and popular enough for us to pay attention in
Kermit.

        I think that disposes of the arguments concerning glyphs in isolation
versus existing in tables: only the tables are of concern to Kermit because
only the tables are used to prepare the original file. Concepts dealing with
isolated glyphs for all of the world's major languages need to count "our"
octets and budgets. I think it also obviates the discussion about how many
octets or whatever are needed to decide which glyph to "display" because that
is really a detail. After all, the escape sequences we have been discussing
are multi-character and some, CSI for example, can be one octet or two,
depending on whether 7 or 8 bit controls are used. While the etymology of
individual characters can be fascinating it has no bearing on our Kermit
discussion.

	To amplify the sentence above, we have been concerned about the
receiver not possessing a table matching that in the transmitter, and the
possibility of letting the receiver select "a close equivalent." To select
implies knowledge of the transmitter's table; in fact the table may exist at
the receiver. But the catch is: the receiver is promising that it can do
something constructive with the character, such as print, display, or
store it in recognizable form at the time the character arrives on the
communications circuit. It may not be able to do that, short of physically
drawing the thing, because the storage device (printer, screen, disk/file
system, etc) has no such ability.

       Consequently, the receiver either accepts text (thanks John) verbatim
without understanding code tables and such or it knows how to reformat the
information for local consumption. That's a simple choice yielding receiver
responses of "Yes, I can deal with that file" or "No, I can't", regardless of
how smart the Kermit receiver code might be (internal tables vs exporting the
information to the operating system). The number of tables, such as
ISO-8859-n, need not be huge in practice. How many are there now? Less than
two dozen I would guess if we don't count Eastern languages. At 256 bytes per
table that is not a huge storage problem; realities of equipment reduce the
quantity of active tables much more.

        Ok, the receiver has plenty of tables but Kermit has to decide where
the file output is going: printer (can it manage Greek etc?), disk (uh oh),
screen (so what kind of display is in use right now?). A mismatch at that
point means translation cannot occur. It also means that Kermit would need to
know a lot about the local computer system; too much, in my opinion. Thus, the
poor user needs to inform the Kermit receiver which tables are permitted with
which destinations; not very pleasant, but I see few alternatives.

        I do have a question for our non-US colleagues: how do your file
systems store mixed language text? I'm betting that almost none has the
slightest concept of language or mixed character sets, just raw octets or
equivalent storage units. If this were the case what should Kermit do?  My
thought is Kermit ought to store/deliver the file in ISO-2022 form and let
later utilities/hardware deal with any conversions, except when the local
"system" (not just Kermit) possesses the character set of the transmitter.

        ISO-2022 and similar are just mechanisms for heavily overloading
octets and thus reducing the number of them on a communications link; they are
intended to be transparent when the communications process completes.  They
are data compression methods, not translation methods. John and Amanda have
elaborated that point but it needs reinforcing. The only place where aspects
of language occur is in specifying the active character sets. I can easily
imagine using ISO-2022 to transmit digitized pictures effectively, as patterns
in a set of unregistered tables. (Let's not get distracted discussing
pictures of characters; FAX does that job well enough already).

        Summarizing: tables count, both sides need a 100% match as a system or
the output is considered verbatim, the number of tables is normally not large,
the number of octets describing a table entry is a detail, ISO-2022 represents
a fairly useful method of walking a hierarchy of such tables, the NxN problem
has not gone away.

	There must be some blind spots in the above discussion or I am missing
the whole point of this proposal.

	Joe D.

From MURAKAMI@ntt-20.ntt.jp  Mon Apr 17 11:23:58 1989
Return-Path: <MURAKAMI@ntt-20.ntt.jp>
Received: from ntt-20.NTT.JP by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA23362; Mon, 17 Apr 89 11:23:58 EDT
Date: Mon, 17 Apr 89 21:53:16 I
From: ken-ichiro murakami <MURAKAMI@ntt-20.ntt.jp>
Subject: my poinion for ISO2022
To: isokermit@watsun.cc.columbia.edu
Message-Id: <12486834997.27.MURAKAMI@NTT-20.NTT.JP>

Hi.

 Kermit> SET TRANSFER-SYNTAX ENGLISH (default is Japanese) :-)

Here are some opinions and comments for ISO2022 discussion.

First, I take back my proposal for a SET FILE FILTER command.  I think
a Kermit macro and an external program can do the same thing as a
filter.  (However, it's not on-the-fly but post- or pre-transfer.)  John
recommended that I use the MAKE program instead of MS-DOS batch
commands.  (Thanks, John.)  I have not finished writing an automatic
Kanji conversion macro using the make program.  But it seems the make
program gives us convenient multiple-file handling and a conditional
branch facility.

Second, I will explain the Japanese situation and give some comments on
the previous discussions.

(1)  John says;
>As kermit moves into
>eight-bit character sets, it may be appropriate to modify the protocol
>somewhat to understand that combinations such as "8 data bits, parity, one
>stop" are as well defined as "7 data bits, parity, one stop" and "8 data
>bits, no parity, one stop".  The latter two are supported by the existing
>protocols, the former is not.

Is this correct? I thought all these combinations were allowed in the
Kermit protocol. If both ends use the same combination, there is no
problem. I think it's out of scope. Frank, is this correct?

(2) John says;
>  Referring to the "possible problem" called out at the bottom of the
>document, the theory according to ISO is that Japan will have to get in
>line.  JIS had the option to taking significant exception to those
>assignments and apparently did not, which may speak for their intentions.
>I await Ken-ichiro Murakami's second round of comments on this, after his
>expert advice arrives.

We Japanese really have a lot of Kanji codes, and we are often annoyed
by them. As John said, JIS had the option of taking significant
exception. But that is only true for the interpretation of ISO2022. As I
reported before, THERE IS NO CONFLICT between the ISO/ECMA alphabet
codes and escape sequences and the codes used in Japan. Please refer to
my previous comment about it. As for the loose Japanese interpretation
of ISO2022, we seldom encounter a problem. It's possible to say there is
an ISO2022-like de facto standard. If this is incorrect, Dr. Fujii
will give us a comment.

(3) John said,

>that acceptance of ISO2022 sending should be followed by a packet that
>specifies what I intend to bind onto G0-G4, and/or a list of what I would
>*like* to use, in descending order.  In the former case, the receiver
>would say "yes" or "no"; in the latter, it would send back its list of
>preferences, leaving off anything it could not handle.  Both of those are
>consistent with the general kermit model.  Or, one could break the rules
>and negotiate back and forth.

As John pointed out, there are two viewpoints:
  (a) whether ISO2022 is supported or not
  (b) what character sets are supported

Consider two microcomputers that support ISO2022. One is only for
Japanese, the other only for Chinese. Since the latter cannot handle
Kanji, file transfer between these computers may be aborted because of
unexpected character sets. What should we do? The problem is that we
don't know whether the file on the former computer contains Kanji or
not. To find out, we must read through the file before the transfer.
I don't know what I should do in this case.

(4) John said;

>I.  Terminal emulation
	.
	.
>If those emulators can move beyond the subsets, so much
>the better, but ISO8859-n support, for a small set of n's, is much more
>important than general ISO2022 support including midstream character set
>switching. I am very encouraged in this regard by what I have been
>able to infer about the work in Japan, and would like to hear more about
>it. 

Most of the terminals around me support ANSI X3.32, ANSI X3.41, ANSI X3.4,
ANSI X3.64, ISO 646, ISO2022, ISO6429 and other Kanji codes such as
EUC and SHIFTJIS. These documents say nothing about ISO8859-n. Since
ordinary communication channels such as TCP/IP and UUCP ensure only
7-bit transparency, it's necessary for Japanese terminals to support
7-bit ISO2022. (Both EUC and SHIFTJIS need 8-bit transparency.)
Therefore, ISO2022 is very popular in Japan.

(5) John said;

>if both workstations can handle, say, ISO8859-5, that is no guarantee that
>the intermediate hosts can.  No real problem here, but the files will
>either have to arrive as binary and be post-decoded (as below) or the
>kermit on the receiving workstation will need to be given a *local* SET
>RECEIVE TRANSFER-SYNTAX ... command that deals with the file in the way
>that files of that type are dealt with, independent of any attribute
>negotiations. 

You can implement this facility with a macro, independent of any Kermit
command.  I think we should not add a new Kermit command if a macro can
do the job.

(6) John said;

> 4. For operating systems for which the capability is appropriate, a
>kermit command that looks something like:
>  SET DATA-CONVERTER <routine> <keyword> <value>
>would permit user specification of a routine that could be invoked on the
>fly to convert those particular types of files.  The semantics of
>"routine" would have to be local-system-dependent.  For the record, I
>don't think that this is a good idea, but it is a way to incorporate the
>capability that some of you seem to want within kermit. 

This command is the same as the SET FILE FILTER command which I proposed
before. I think a macro might do the same thing as the SET DATA-CONVERTER
command.

(7) John said;

>  But the main point is that 2-3 above get a "binary" file delivered, and
>delivered with sufficient information to decode it.  And they enforce
>agreement on the file organization and content before the file is
>transferred.  That decoding is probably best done post-transfer, rather
>than on-the-fly, if for no reason than that one is likely to like to have
>the file -- as transferred -- available when the decoding produces
>something unexpected.  If you decode or translate on the fly, a new set of
>debugging options that save the transferred data would be helpful.

This is an important point. It seems safe to adopt post-transfer
decoding.

(8) Andr'e said;

> So, I suggest that international characters be supported on two levels:
> 1) restricted, within a single version of ISO8859, in the.....
> 2) general, across multiple codes or ISO versions, in which usage ...

We Japanese need 2).  1) will bring us nothing. For Japanese,
standardization of Kanji (ISO2022) is important. Ordinarily, our files
contain various characters including Kanji, Kana, and Roman ASCII.
In addition, we have many Kanji character codes.


That's all. Since English is one of my weak points, I might have
misunderstood what others said. If so, please correct me. 

Our objective in adopting ISO 2022 is automatic Kanji conversion. It
seems slightly different from your objective.

-Ken

P.S. 

Since I'm very busy, I cannot afford to post any more opinions this
month. Reading and writing English is a time-consuming job for me.
Sorry.
-------

From fdc  Mon Apr 17 21:40:34 1989
Return-Path: <fdc>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA28863; Mon, 17 Apr 89 21:40:34 EDT
Date: Mon, 17 Apr 1989 21:40:33 EDT
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: isokermit
Subject: De nonnullis
Message-Id: <CMM.0.88.608866833.fdc@watsun.cc.columbia.edu>

 (is anyone else having trouble mailing to ISOKERM@CUVMA.BITNET?  It's
  supposed to work...)
                ---------------
Date: 18 April 1989, 00:54:22 SET
From: Gisbert W.Selke           +49 228 225888       <RECK@DBNUAMA1.BITNET>
To:   ISOKERM at CUVMA
Re:   De nonnullis

Mhm, I'm not sure I had anticipated discussions of this scope. Well.
Living in a mostly non-English speaking country and having to cope with
several (mis)representations of German characters, I think I'd be quite
happy with a standard  - a *Kermit* standard that not necessarily
coincides completely with any ISO or whatever standard or draft or proposal
- to be able to transfer German files from one system to the (un-like) other,
with at least the characters coming out the 'same'. Now, this means I'm not
interested mainly in converting from one word processor format to the other;
as long as there are many different formats living on the same machine, we
can't even dream of tackling that task in a general fashion. So that should
indeed be left as an implementation detail, or as mere programming,
preferably outside of Kermit proper.
What I *am* interested in is defining a standard so that I may reasonably
expect a file - containing, say, a maths text (in German, with Greek
characters) - that I send from my PC to arrive well and legible on my
friend's VAX in Turkey, say.
Basically, the ISO kermit draft proposal does exactly this. Apart from
details, having this would be a great thing for us - practically speaking.
And I think it's not too Eurocentristic either - we should indeed be able
to cope with Eastern languages within this scheme, at least to a great
extent.
The purpose of all this is, of course, *file* transfer - so the question
of how the receiving system might be able to represent physically whatever
it receives is of minor interest (oh yes, it should be able to store
incoming bits...). Of course I can write a TeX text on my vanilla PC, and
I'm not at all bothered that it doesn't have the hardware to display it
properly on screen, or even on my cheapo dot matrix printer - so, again,
sending such a file is not a question of Kermit (per se) or the receiver's
hardware but of the software that it runs. (Imagine someone sending me a
file in Hebrew - all I'd need is, say, an ISO-to-TeXXeT converter to
print it out or even view it on screen. No extraordinary hardware!)

To repeat: what we (well, I for one) need is a reasonable chance of two
arbitrary Kermits speaking the same language. Anything else should be left
to the respective Kermit or stand-alone 'mere programming', and shouldn't be
allowed to clobber the transmission standard.  So that does indeed reduce the
NxN problem to a 2x(N-1) problem. - As a matter of course, we may suggest a
standardized syntax for the user to specify what conversion to use (*if* a
particular Kermit implementation provides such conversion mechanisms);
but that's not really what is at stake here. Yes, the user must, in general,
know something about the file format E uses; still, it's much better to have
to say something like 'set file-type xyz' than to have no chance at all to
get a file through unharmed. And I guess that's all you can - practically -
really ask for - as long as we don't have The Universal Word Processor.

\Gisbert

From cmg  Tue Apr 18 11:40:38 1989
Return-Path: <cmg>
Received: by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA04967; Tue, 18 Apr 89 11:40:38 EDT
Date: Tue, 18 Apr 1989 11:40:37 EDT
From: Christine M Gianone <cmg@watsun.cc.columbia.edu>
To: isokermit
Subject: ISO/Kermit Draft Proposal....
Message-Id: <CMM.0.88.608917237.cmg@watsun.cc.columbia.edu>

Many thanks to John Klensin for his detailed, thoughtful, and probing comments
on the second ISO/Kermit draft proposal, and to Amanda Walker, Joe Doupnik, et
al, for their remarks also.

ISO 2022 TRANSFER SYNTAX

It is true that ISO 2022 is not widely implemented, except to some degree in
DEC VT300-series terminals and a few other places.  Nevertheless, we are
proposing to use it in Kermit as a file transfer syntax because it is the
only widely known and approved mechanism for designating and switching among
textual character sets.  Furthermore, the 7- and 8-bit single-byte and
multibyte character sets described in the proposal are in use today, and they
are standardized and registered.

Our concern with registration is that there be a unique identifier for each
character set.  We assume (perhaps naively) that these character sets -- which
are agreed upon after months or years of debate by national and international
standards organizations -- will not change very often, meaning not more than
once every 5 or 10 years (like, say, US ASCII or GOST Latin/Cyrillic), in
other words much less frequently than do Kermit programs themselves.  And when
they do, adjusting Kermit programs will be the least of our problems!  In any
case, the proposal allows for both the character set identifier AND the
version number.

It's true that other sets will be registered as time goes on, and that the
world may eventually settle on a universal multibyte code.  Neither of these
considerations should hold us back from trying to extend Kermit file transfer
to accommodate alphabets beyond ASCII.  Nor should the current proposal
preclude addition at a later time of a transfer syntax built around a
universal character code.

It is important that we agree that the use of ISO 2022 escape sequences and
shifts with registered alphabets provides a completely unambiguous
representation of the text being transferred.  This is the "common
intermediate representation" or "transfer syntax", crucial to any
communication protocol, that avoids the n x n problem in which every computer
must know about every other computer's formats.  So the most fundamental
question we can ask is: "Have we chosen the best transfer syntax?".  Once this
matter is settled, all other questions boil down to implementation --
conversion between local file format and the transfer syntax, matching
capabilities between two ISO-capable Kermits, and the design of the user
interface.
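
To make the n x n point concrete, here is a small sketch in Python.  The
function names are invented for illustration and the sample values of n are
arbitrary.  It only counts how many one-way converters are needed with and
without a common transfer syntax.

```python
def pairwise_converters(n):
    """One-way converters needed if each of n formats converts directly
    to every other format."""
    return n * (n - 1)


def transfer_syntax_converters(n):
    """One-way converters needed if each of n formats converts only to
    and from a single common transfer syntax."""
    return 2 * n


for n in (2, 5, 10):
    print(n, pairwise_converters(n), transfer_syntax_converters(n))
```

With a common intermediate form the count grows linearly with the number of
local formats instead of quadratically.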

LOCAL FILE FORMAT

As John points out, the proposal deliberately sidesteps the issue of local
storage format.  ISO 2022 is not used for this, and there is no other widely
accepted standard.  And in fact, mixed-language documents are often embedded
in some word-processor's complicated proprietary and version-dependent
formatting.  This issue has already prompted some lively debate, as well as
suggestions for a SET FILE FILTER command to pre- and postprocess such files,
and even more ambitious suggestions to incorporate ISO ODA or SGML in the
transfer syntax.  This latter suggestion must be deferred for further study,
but we must take care to ensure that the multinational text transfer syntax
that we decide upon does not preclude the later addition of a "document
description language" layer.  Anyone familiar with ODA or SGML is urged to
consider this question.

Nevertheless, if we are to extend the Kermit protocol to transfer
multinational text, the extension should be as general and unrestrictive as
possible.  So some day, when (if) multi-language files are common and easily
parsable, the Kermit protocol will be ready to transfer them.  In the
meantime, the proposal should also address more mundane tasks like
transferring, say, a French document from a PC to a Macintosh.

MATCHING CAPABILITIES BETWEEN KERMITS 

John makes the very good point that the attribute packet includes ISO 2022
announcers, but not the alphabet designators.  Therefore, a file transfer
could fail in midstream when the receiver sees an unknown alphabet designator.
This was a deliberate omission.  Do we want to turn Kermit into a 2-pass
compiler?  This could be tedious for large files, particular in multi-file
operations, where the entire operation could time out before the sender had
time to comb through a 9000-megabyte file collecting alphabets.

In a common case, we're transferring files between incompatible systems, but
the files contain only characters that have equivalents in (say) the ISO
Latin-1 alphabet.  No problem here, so long as both Kermits know this alphabet
and use it in the transfer syntax.  But as soon as we mix two or more
character sets, there is the chance that these might include one that the
receiving Kermit will not know about.  How does the sending Kermit tell the
receiving Kermit what alphabets will be used?

First, note that we have not required the use of attribute packets.  Therefore
any attribute-based notification will be optional.  Second, we have
recommended (but not required) that alphabet designating sequences be
transmitted at the head of the file data.  Perhaps this is a bad idea -- it
might cause a lot of unnecessary disk activity at the receiver -- premature
reading in of translation tables, etc.

Prenotification of alphabets should remain optional, so that the sender is not
compelled to read the entire file before sending it.  But assuming the sender
knows the alphabets in advance, then it makes sense to send a list of them in
an attribute packet.  This means we have to (a) define a new attribute, and
(b) settle upon the syntax of the alphabet designators.

The first unused attribute code is "2" (ASCII 50, 03/02 if you prefer).  Let's
call this one "Character Set".  One or more of these may appear in an
Attribute packet in any position.  The format is:

    +---+---+-----+
    | 2 | L | CSD |
    +---+---+-----+

where CSD is the "character set designator", and L is the length of the CSD.
We must still decide the format of the CSD.  It can't just be the final letter
of the designating escape sequence, because these are not necessarily unique
(see Table 5 of Draft 2 of the proposal).  There would seem to be several
choices for the CSD:

  1. Something we make up, like "A" for Latin 1, "B" for Latin-2, etc.

  2. Everything after the <ESC> in the designating sequence (as shown in
     Table 5 of the second draft).  The first character (such as "-", "(", or
     "$") would tell whether it  was a 94, 96, or multibyte character set.
     The problem here is that different characters are used in this position
     for the same character set, depending on whether it is to be designated
     as G0, G1, G2, or G3.  Also, the designators are different lengths,
     depending on whether the character set is single- or multi-byte.

  3. The ECMA registration number, as shown in Table 5.  This makes more
     sense, since we are simply identifying the character set and not making
     irrelevant assertions about where to assign it.  But in the latest
     version of Table 5, we are still missing registration numbers for
     JIS-Katakana and JIS-Roman.  Can anyone supply these?

OK, so now we have a way for the sender to notify the receiver about the
alphabets in advance, so the receiver can reject the file if it contains an
unknown alphabet and, because character sets now have their own attribute
type, the receiver can inform the sender exactly why the file has been
rejected.
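
As a rough sketch of how such a field might be built and parsed (in Python,
not part of the proposal): this assumes the length L is encoded the way
other Kermit attribute lengths are, as a printable character equal to the
length plus 32, and it assumes choice 3, a registration number written in
decimal digits, as the CSD.  The numbers 100 and 144 are sample designators
only, and the function names are invented.

```python
def tochar(n):
    """Kermit's printable encoding of a small number (0-94): add 32."""
    if not 0 <= n <= 94:
        raise ValueError("value out of range for tochar")
    return chr(n + 32)


def unchar(c):
    """Inverse of tochar."""
    return ord(c) - 32


def encode_charset_attr(csd):
    """Build one proposed '2' attribute field: type, length, designator."""
    return "2" + tochar(len(csd)) + csd


def decode_attributes(data):
    """Split an Attribute packet data field into (type, value) pairs."""
    fields = []
    i = 0
    while i < len(data):
        kind = data[i]
        length = unchar(data[i + 1])
        fields.append((kind, data[i + 2:i + 2 + length]))
        i += 2 + length
    return fields


# Sender announces two character sets; the receiver knows only one of them.
packet_data = encode_charset_attr("100") + encode_charset_attr("144")
known_sets = {"100"}
unknown = [value for kind, value in decode_attributes(packet_data)
           if kind == "2" and value not in known_sets]
print(unknown)
```

A non-empty list of unknown sets is what would let the receiver refuse the
file and name the "2" attribute as the reason.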

This can save the user time, but it doesn't get the file transferred.  The
user currently has one alternative: SET TRANSFER-SYNTAX NORMAL on the
receiving end, send the file with ISO-2022 syntax so that it will be stored
with embedded alphabet designators and shifts on the receiving end, and then
postprocess it after it is received.

A second alternative is now suggested to cover the case where a file contains
a mixture of alphabets, some known to the receiver, others not.  The receiver
has not been prenotified of the alphabets, and has not rejected the file.  At
some point, an alphabet designator arrives which the receiving Kermit does not
recognize.  We suggest that this designator be accepted and stored as data,
and that subsequent characters be stored untranslated.  When a designator for
a known alphabet arrives, the receiving Kermit stores the ISO-2022 Coding
Method Delimiter, <ESC>d, and resumes translation.  We suggest that this be
the default behavior when an unexpected, unknown alphabet arrives, but that
the behavior can be controlled by a new command, SET UNKNOWN-ALPHABET {KEEP,
DISCARD}.
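
The suggested behavior can be sketched as follows, in Python.  This is only
an illustration under stated assumptions: the input is modeled as an
already-parsed list of events, the translation is a toy one (upper-casing),
the two escape sequences are sample designators, and DISCARD is taken to
mean "drop the unknown designator and the text that follows it", which the
proposal does not spell out.

```python
ESC = "\x1b"
CODING_METHOD_DELIMITER = ESC + "d"   # <ESC>d, as named in the text above


def receive(events, known, unknown_alphabet="KEEP"):
    """Return what the receiver would store for a stream of events.

    events: list of ("designate", escape_sequence) or ("text", characters).
    known:  dict mapping a designating sequence to a translation function.
    """
    stored = []
    translate = lambda s: s     # before any designator: store text unchanged
    in_unknown = False
    for kind, value in events:
        if kind == "designate":
            if value in known:
                # Leaving an untranslated stretch: mark where it ends.
                if in_unknown and unknown_alphabet == "KEEP":
                    stored.append(CODING_METHOD_DELIMITER)
                translate = known[value]
                in_unknown = False
            else:
                in_unknown = True
                if unknown_alphabet == "KEEP":
                    stored.append(value)      # keep designator as data
        elif not in_unknown:
            stored.append(translate(value))
        elif unknown_alphabet == "KEEP":
            stored.append(value)              # store untranslated
    return "".join(stored)


LATIN1 = ESC + "-A"      # sample designator known to the receiver
CYRILLIC = ESC + "-L"    # sample designator unknown to the receiver
known = {LATIN1: str.upper}                   # toy "translation"
events = [("designate", LATIN1), ("text", "abc"),
          ("designate", CYRILLIC), ("text", "xyz"),
          ("designate", LATIN1), ("text", "def")]
kept = receive(events, known, "KEEP")
dropped = receive(events, known, "DISCARD")
```

With KEEP, the stored file contains the translated text plus the unknown
designator, its untranslated characters, and the delimiter, so a later
utility can still find and process that stretch.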

Now suppose the receiving computer has applications or devices which support a
character set which the receiving Kermit does not know about?  To cover this
case, we can define a standard format for translation tables, and provide a
LOAD CHARACTER-SET <filename> command to allow the user to add new character
sets to a Kermit program's repertoire.  What should such a file look like?
Here's a first attempt to design the format.  The file is written entirely in
printable ASCII, with line divisions as shown.  Numbers are represented as
ASCII decimal digits.

Line  Contents
 1.    Transfer Syntax Character Set Name (e.g. LATIN1-ISO)
 2.    Local Character Set Name (e.g. IBM-CODE-PAGE-437)
 3.    Number of bytes per character (1, 2, 3) = b
 4.    Number of Characters per plane (94, 96) = n
 5.    ISO/ECMA Registration Number of Transfer Character Set (0 if none)
 6.    Final Letter of Designating Sequence for Transfer Character Set
 7.    Version Number of Transfer Character Set (0 if none)
 8.    Direction of display (encoded to allow for any combination of
       left, right, upwards, downwards, boustrophedon, etc).
 9-16. Reserved.

Each line, 17 through (16 + n^b), contains a pair of characters in 8-bit form,
in ASCII decimal representation, with the two characters separated by a comma:

       <character from transfer set>, <character from local set>

The character pair is listed, rather than a single value (as in most
translation tables), so that the program may build two tables from it, one for
each direction of translation.

For a single-byte set, the numbers vary from 0 to 255.  Typical entries
might look like:

        43, 43
       243, 224
       
For a multibyte set, each byte is represented separately, for example:

       37 143, 255 10
       37 144, 255 142
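
To make the format concrete, here is a minimal Python sketch of a reader for
such a file.  The function name, the dictionary keys, and the sample header
values (96 characters per plane, final letter "A", direction "0") are
assumptions for illustration only; the direction field is kept as raw text
because its encoding has not been defined yet.

```python
def parse_table(text):
    """Parse the proposed table file; return (header, to_local, to_transfer)."""
    lines = text.splitlines()
    header = {
        "transfer_name": lines[0].strip(),    # line 1
        "local_name": lines[1].strip(),       # line 2
        "bytes_per_char": int(lines[2]),      # line 3 (b)
        "chars_per_plane": int(lines[3]),     # line 4 (n)
        "registration": int(lines[4]),        # line 5
        "final_letter": lines[5].strip(),     # line 6
        "version": int(lines[6]),             # line 7
        "direction": lines[7].strip(),        # line 8, kept as raw text
    }
    to_local, to_transfer = {}, {}
    for line in lines[16:]:                   # lines 9-16 are reserved
        if not line.strip():
            continue
        left, right = line.split(",")
        # Each side is one or more decimal byte values separated by spaces.
        transfer = bytes(int(x) for x in left.split())
        local = bytes(int(x) for x in right.split())
        to_local[transfer] = local            # receiving direction
        to_transfer[local] = transfer         # sending direction
    return header, to_local, to_transfer


# Illustrative file contents: 8 header lines, 8 reserved lines, then pairs.
sample = "\n".join(
    ["LATIN1-ISO", "IBM-CODE-PAGE-437", "1", "96", "100", "A", "0", "0"]
    + [""] * 8
    + [" 43, 43", "243, 224"]
)
header, to_local, to_transfer = parse_table(sample)
```

Keying both tables on byte strings rather than integers lets the same code
handle the single-byte and multibyte cases.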

The obvious limitation of this kind of loadable translation table is that
it is one-to-one.  It will not accommodate transfer syntaxes like T.61, which
would require some two-to-one mappings, nor local file formats in which
special characters might be represented by sequences.  Is it worth expanding
the syntax of the loadable translation table to allow for arbitrary
translations?

There are also two implications.  Character set names must be standardized
and registered, so that different Kermit versions will mean the same thing by
the same command and so that tables may be built up and shared by Kermit
users; and built-in translation tables could possibly be overlaid by
user-defined ones.  Perhaps most significant (and we're not sure if this is a
plus or a minus), user-loadable character sets would also let people use
ISO-2022 transfer syntax with nonstandard and/or unregistered character sets;
for instance, a Cherokee Indian organization that composes its newspaper on a
combination of PCs and Macintoshes that use different encodings for Cherokee
could use it.

John argued that character set standards change all the time.  We think it's
more likely that computer-system-specific character sets will change all the
time -- witness the hot debate over IBM EBCDIC Code Pages.  Either way, we
have an argument for user-loadable (or site-loadable) translation tables.

To complement the LOAD CHARACTER-SET command, there should also be a SHOW
CHARACTER-SET command, by which the Kermit program tells the user what
character sets it knows about, including the name, size, designator, version,
and registration number of each set.

A SINGLE ALPHABET

For the foreseeable future, few of the complications of the proposal will come
into play.  As many have pointed out, the overwhelming use of an
"ISO-Extended" Kermit will be within a single 8-bit character set, like ISO
8859-1.  Recall that the proposal says this can be done by sending the data
as-is, with no shifts, announcers, or designators, provided both Kermits know
what the transfer alphabet is.  However, the proposal is vague on exactly how
to accomplish this.  The recent messages from Fridrik and Andre amplify this
concern.  To address this common case, we suggest a command like this:

    SET TRANSFER-SYNTAX LATIN1-ISO

to specify that the sender will translate from local notation to (e.g.) ISO
Latin 1, or that the receiver will interpret incoming data as ISO Latin 1.
This command would differ from SET TRANSFER-SYNTAX ISO-2022 in that no
ISO-2022 announcers, designators, shifts, or other controls would be inserted
into the data stream.  

In order for the sender to inform the receiver of the transfer alphabet,
another new operand can be defined for the Encoding ("*") attribute.  This
could be "O" (uppercase O, for "Other alphabet"), meaning that the character
set is specified in the "2" attribute, of which there should be only one per
file.  Example (recall that Attribute packets are in Parameter-Length-Value
notation):

   +---+---+---+---+---+-----+
   | * | ! | O | 2 | # | 100 |
   +---+---+---+---+---+-----+
     P   L   V   P   L   V

Encoding = "O", Alphabet = 100 (ECMA Registration Number for Latin 1, assuming
this is the convention we adopt for alphabet identification).
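
The layout above can be sketched in a few lines of Python.  The length field
uses Kermit's usual printable encoding (the value plus 32), which is why a
length of 1 appears as "!" and a length of 3 as "#".  The function names are
made up for this example, and the "O" operand and the numeric value in the
"2" attribute are only the conventions proposed in this message, not an
adopted standard.

```python
def tochar(n):
    """Kermit printable encoding of a small number: add 32."""
    return chr(n + 32)


def unchar(c):
    """Inverse of tochar."""
    return ord(c) - 32


def encode_attributes(attrs):
    """Build an Attribute data field from (parameter, value) pairs."""
    return "".join(p + tochar(len(v)) + v for p, v in attrs)


def decode_attributes(field):
    """Split an Attribute data field back into (parameter, value) pairs."""
    attrs, i = [], 0
    while i < len(field):
        param = field[i]
        length = unchar(field[i + 1])
        attrs.append((param, field[i + 2:i + 2 + length]))
        i += 2 + length
    return attrs


# Encoding = "O", Alphabet = 100, as in the example above.
field = encode_attributes([("*", "O"), ("2", "100")])
```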

CONCLUSION

In light of recent comments, it would seem useful to break the ISO/Kermit
proposal into levels.  Level 0 (default) is the current "normal" file transfer
syntax.  Level 1: Allow user to specify transfer syntax to be some character
set other than ASCII, with no announcers, designators or shifts.  Level 2:
Full ISO-2022 transfer syntax, as previously proposed, amended to allow (a)
preannouncement of character sets, and (b) user-defined loadable character
sets.  Defer all discussion of "higher-level" presentation syntaxes
(wordprocessor/database/spreadsheet/graphics formats, etc, not to mention LZW
compression...) until MUCH LATER, but keep them in mind while designing
Kermit's international-alphabet extension.  Finally, note that Kermit Levels
0, 1, and 2 can be mixed, so that (for example) a PC can convert from its
international code page to ISO Latin 1, with or without announcers,
designators, and shifts, send it to another Kermit operating at Level 0, and
have it stored as transmitted for later postprocessing.
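
A minimal Python sketch of how a program might pick its level from the two
settings follows.  The function name and the string values are assumptions
for illustration; the rule that SET FILE TYPE BINARY overrides the transfer
syntax is taken from the flow diagram at the end of this message.

```python
def transfer_level(file_type, transfer_syntax="NORMAL"):
    """Return the level (0, 1, 2), or None for an unmodified binary transfer."""
    if file_type == "BINARY":
        return None            # binary overrides SET TRANSFER-SYNTAX
    if transfer_syntax == "NORMAL":
        return 0               # ASCII with CRLF, today's behavior
    if transfer_syntax == "ISO-2022":
        return 2               # announcers, designators, and shifts in use
    return 1                   # any other single named character set
```

Sender and receiver would each evaluate this from their own settings, which
is what allows the levels to be mixed as described above.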

For John's "quibbles" and editorial remarks, as well as corrections of fact,
we are grateful, and they will be reflected in the next draft.  We are also
indebted to Amanda, Joe, Ken, Hiro, Andre, Fridrik, Gisbert, and all others
who have contributed to the current round of discussions.

We enclose below a simplified flow diagram representing the operation of an
extended Kermit program.  Further discussion welcome!

- Christine and Frank


NOTE: The following does not take into account the FILE FILTER function which
has been proposed and withdrawn, and may be proposed again.  Also, an 8-bit
transmission channel is assumed.


  SET FILE TYPE BINARY (overrides SET TRANSFER-SYNTAX command)
  |    |
  N    Y-->  Transfer file unmodified.  END.
  |
  Text mode.  Three possibilities:
  SET TRANSFER-SYNTAX NORMAL (the default)
  |    |
  N    Y-->  LEVEL 0: Transfer syntax is ASCII with CRLF as line terminator.
  |          Sending program translates from local format to transfer syntax,
  |          Receiving program translates from transfer syntax to local format.
  |          END.
  |
  SET TRANSFER-SYNTAX LATIN1-ISO (or any other single character set)
  |    |
  N    Y-->  LEVEL 1: Transfer syntax is specified character set with CRLFs.
  |          Sender translates from local format to specified character set
  |          using (a) default built-in table, (b) built-in table selected by
  |          SET FILE CHARACTER-SET, or (c) user-defined table obtained via
  |          LOAD CHARACTER-SET and then selected by SET FILE CHARACTER-SET.
  |          Receiver may operate at Level 0 (file will be stored as sent),
  |          or Level 1 or 2 (user must issue {SET FILE, LOAD} CHARACTER-SET
  |          commands if this is not the program's default character set, and
  |          also the appropriate SET TRANSFER-SYNTAX command to have receiving
  |          Kermit program convert the file to local form).  END.
  |
  File composed of more than one character set:
  SET TRANSFER-SYNTAX ISO-2022
  |    |
  N    Y-->  LEVEL 2: Transfer syntax is ISO-2022.  Assumes that sender can
  |          identify the different character sets in the local file, and
  |          can translate them to registered character sets if necessary.
  |          |
  |          Do sender and receiver both support Attribute packets?
  |          |    |
  |          N    Y-->  Sender specifies encoding ("*") to be ISO-2022 ("I"),
  |          |          and lists ISO-2022 announcers.  Sender also optionally
  |          |          lists the alphabets to be used in Attribute "2".
  |          |          |
  |          |          Receiver agrees to these facilities and alphabets?
  |          |          |    |
  |          |          Y    N --> Receiver rejects the file, indicating "*"
  |          |          |          or "2" as the reason.  END.
  |          |          |
  |          |          |          NOTE: If receiver has been set to NORMAL
  |          |          |          transfer-syntax, it will always accept the 
  |          |          |          file.
  |          |          |
  |          |          Receiver accepts the file.
  |          |          |
  |          Transfer begins.  Sender translates from local file format to
  |          the character sets of the transfer syntax, using ISO-2022
  |          announcers, designators, and shifts to switch among them.
  |          |
  |          If the sender encounters an unknown alphabet while reading the
  |          file, it should send an EOF (Z) packet with the D (Discard) code 
  |          in the data field and proceed to the next file, if any.  Receiver 
  |          will discard or keep the file according to its SET INCOMPLETE 
  |          setting.
  |          |
  |          Receiver may operate at Level 0, Level 1, or Level 2.
  |          |          |          |
  |          |          |          0 -->  Receiver stores the data in the form
  |          |          |                 transmitted, retaining all 
  |          |          |                 announcers, designators, and shifts, 
  |          |          |                 but converting from ASCII/CRLF
  |          |          |                 format to local text format.
  |          |          |
  |          |          1 -->  Receiver stores the data as transmitted, 
  |          |                 retaining all announcers, designators, and 
  |          |                 shifts, but converting from the specified 
  |          |                 transfer character set to local text format.
  |          |
  |          2 --> Receiver heeds announcers, designators, and shifts, and
  |                translates from the indicated character set to local 
  |                representation.  Translations are according to built-in
  |                tables, or tables obtained via LOAD CHARACTER-SET.
  |                |
  |                If the receiver encounters an alphabet it does not know, it 
  |                will act according to the SET UNKNOWN-ALPHABET command:
  |                |      |
  |                KEEP   DISCARD --> Reject the file by putting an X (Cancel 
  |                |                  File) code in the data field of its 
  |                |                  Acknowledgement.  END.
  |                |
  |                (default) Continue to receive the file, but store the 
  |                designator for the unknown alphabet along with the 
  |                untranslated characters from that alphabet, until the next 
  |                known alphabet is encountered.  Mark the end of the 
  |                untranslated material with the ISO-2022 Coding Method
  |                Delimiter, <ESC>d.  Also, issue a warning to the user.
  |                |
  |                END.
  |
  Reserved for future (e.g. ISO 10646)...

(END)

From lts!amanda@uunet.uu.net  Tue Apr 18 12:46:53 1989
Return-Path: <lts!amanda@uunet.uu.net>
Received: from uunet.UU.NET by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA05678; Tue, 18 Apr 89 12:46:53 EDT
Received: from lts.UUCP by uunet.UU.NET (5.61/1.14) with UUCP 
	id AA06348; Tue, 18 Apr 89 12:45:24 -0400
Received: by lts.UUCP (4.12/2.881128)
	id AA15825; Tue, 18 Apr 89 11:40:23 est
Date: Tue, 18 Apr 89 11:40:23 est
From: Amanda Walker <lts!amanda@uunet.uu.net>
Message-Id: <8904181640.AA15825@lts.UUCP>
To: isokermit@watsun.cc.columbia.edu
Subject: Re: Frank & Christine's latest messages

I think the three-level idea is a good one, but I do want to bring up
one point that John made to me in email: as soon as alphabet shifts
are introduced, the encoding of a particular piece of text is no
longer unambiguous, since glyphs are duplicated across alphabets.
The JIS Kanji set contains roman, cyrillic, and greek characters;
ISO 8859/1 contains non-ASCII characters that are also in ISO 8859/2,
and so on.  This is not a problem for rendering the text, but it could
be for interpretation.

Christine's suggestion of mandating the ability to use a
dynamically-loaded translation table does alleviate this problem to
some extent.  If Machine A -> ISO 2022 is a one-to-many mapping
(which it will be in the general case), then ISO 2022 -> Machine B
needs to be able to be a many-to-one mapping.
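
A small Python sketch of the point, with made-up alphabet names and code
values that should be read as placeholders rather than real code points:

```python
# One local glyph may have several ISO 2022 encodings (one-to-many).
# Alphabet names and code values are illustrative placeholders.
a_to_iso2022 = {
    "A": [("ascii", 0x41), ("jis-kanji", 0x2341)],
    "alpha": [("greek", 0xE1), ("jis-kanji", 0x2641)],
}

# Inverting it yields the many-to-one table that Machine B needs.
iso2022_to_b = {
    encoding: glyph
    for glyph, encodings in a_to_iso2022.items()
    for encoding in encodings
}
```

A table that holds exactly one pair per character cannot express the first
mapping, but a receive-side lookup like the second one handles it naturally.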


--Amanda
