usefor-article-03 February 2000

4.4.  Characters and Character Sets

   Transmission paths for news articles MUST treat news articles as
   uninterpreted sequences of octets, excluding the values 0 (ASCII NUL)
   and 13 and 10 (ASCII CR and LF, which MUST ONLY appear in the
   combination CRLF which denotes a line separator).

        NOTE: this corresponds to the range of octets permitted for
        MIME "8bit data" [RFC 2045].  Thus raw binary data cannot be
        transmitted in an article body except by the use of a Content-
        Transfer-Encoding such as base64.
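
   The octet rule above can be checked mechanically.  The following is
   a minimal sketch, not part of the draft: the function names
   find_octet_violations and base64_body are hypothetical, and the
   sketch assumes the whole article is held in memory as a byte string.
   It flags NUL and any CR or LF outside a CRLF pair, and shows that
   base64 output with CRLF line endings passes the check.

```python
import base64


def find_octet_violations(article: bytes) -> list[tuple[int, str]]:
    """Return (offset, reason) pairs for octets the rule forbids."""
    problems = []
    for i, octet in enumerate(article):
        if octet == 0:
            problems.append((i, "NUL"))
        elif octet == 13:
            # CR is only legal when immediately followed by LF.
            if i + 1 >= len(article) or article[i + 1] != 10:
                problems.append((i, "bare CR"))
        elif octet == 10:
            # LF is only legal when immediately preceded by CR.
            if i == 0 or article[i - 1] != 13:
                problems.append((i, "bare LF"))
    return problems


def base64_body(raw: bytes) -> bytes:
    """Encode arbitrary binary data as CRLF-terminated base64 lines."""
    # base64.encodebytes emits LF-terminated lines of at most 76
    # characters; the line endings are rewritten to CRLF here.
    return base64.encodebytes(raw).replace(b"\n", b"\r\n")
```

   Octets above 127 are deliberately not flagged, since the rule
   excludes only the values 0, 13 and 10 (the latter two outside CRLF).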

   An octet, or a sequence of octets, may represent a character in some
   Coded Character Set (CCS) as determined by some Character Encoding
   Scheme (CES) [RFC 2130].
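
   As an illustration of why the encoding scheme matters, the sketch
   below (not part of the draft) encodes one character under two
   different schemes and then decodes the resulting octets.  The helper
   name decode_octets is hypothetical, and the charset label is assumed
   to be known from elsewhere, for example from a MIME charset
   parameter.

```python
def decode_octets(octets: bytes, charset: str) -> str:
    """Interpret octets as characters under the named charset."""
    return octets.decode(charset)


text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The same character becomes different octet sequences depending on
# the encoding scheme in use.
latin1_octets = text.encode("iso-8859-1")  # one octet: 0xE9
utf8_octets = text.encode("utf-8")         # two octets: 0xC3 0xA9

# Decoding with the matching scheme recovers the character; decoding
# with the wrong one silently yields different characters.
right = decode_octets(utf8_octets, "utf-8")
wrong = decode_octets(utf8_octets, "iso-8859-1")
```

   Here right equals the original character, while wrong is the
   two-character string U+00C3 U+00A9, which is why the octets alone
   do not determine the intended text.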

   If it comes to a relaying agent's attention that it is being asked to
   pass an article using the Content-Transfer-Encoding "8bit" to a
   relaying agent that does not support it, it SHOULD report this error
   to its administrator. It MUST refuse to pass the article and MUST NOT
   re-encode it with different MIME encodings.

        NOTE: This strategy will do little harm. The target relaying
        agent is unlikely to be able to make use of the article on its
        own servers, and the usual flooding algorithm will likely find
        some alternative route to get the article to destinations where
        it is needed.
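
   The relaying rule above might be sketched as follows.  This is an
   illustration under stated assumptions, not the draft's own
   procedure: the Peer class, its supports_8bit flag, and the report
   callback are all hypothetical, and how a relaying agent learns a
   peer's capabilities is not specified here.  Only the top-level
   Content-Transfer-Encoding header is examined, which is a
   simplification for multipart articles.

```python
from dataclasses import dataclass
from email import message_from_bytes


@dataclass
class Peer:
    """Hypothetical description of a downstream relaying agent."""
    name: str
    supports_8bit: bool


def uses_8bit(article: bytes) -> bool:
    """True if the top-level Content-Transfer-Encoding is "8bit"."""
    msg = message_from_bytes(article)
    cte = msg.get("Content-Transfer-Encoding", "")
    return str(cte).strip().lower() == "8bit"


def relay_decision(article: bytes, peer: Peer, report) -> bool:
    """Return True to pass the article on, False to refuse it.

    The article bytes are never modified: re-encoding is not an option.
    """
    if uses_8bit(article) and not peer.supports_8bit:
        # SHOULD report the error to the administrator.
        report(f"refused 8bit article for peer {peer.name}")
        # MUST refuse to pass the article.
        return False
    return True
```

   A caller would pass its own logging function as report, for example
   relay_decision(article, Peer("old-site", False), print).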

--- ../s-o-1036/Characters_And_Character_Sets.out          June 1994
+++ ../usefor-article-03/Characters_And_Character_Sets.out          February 2000
@@ -1,206 +1,28 @@
 4.4. Characters And Character Sets
 
-Header and body lines MAY contain any ASCII characters other
-than CR (ASCII 13), LF (ASCII 10), and NUL (ASCII 0).
-
-     NOTE:  CR  and  LF are excluded because they clash
-     with common  EOL  conventions.   NUL  is  excluded
-     because  it  clashes with the C end-of-string con-
-     vention, which is  significant  to  most  existing
-     news   software.    These   three  characters  are
-     unlikely to be transmitted successfully.
-
-However, posters SHOULD avoid using ASCII control characters
-except for tab (ASCII 9), formfeed (ASCII 12), and backspace
-(ASCII 8).  Tab signifies sufficient horizontal white  space
-to  reach  the next of a set of fixed positions; posters are
-warned that there is no standard set of positions,  so  tabs
-should be avoided if precise spacing is essential.  Formfeed
-signifies a point at which a reading agent SHOULD pause  and
-await  reader  interaction  before  displaying further text.
-Backspace SHOULD be used only for  underlining,  done  by  a
-sequence of underscores (ASCII 95) followed by an equal num-
-ber of backspaces, signifying that the same number  of  text
-characters  following  are  to  be  underlined.  Posters are
-warned that underlining  is  not  available  on  all  output
-devices  and  is  best  not relied on for essential meaning.
-Reading agents SHOULD recognize underlining and translate it
-to the appropriate commands for devices that support it.
-
-     NOTE: Interpretation of almost all control charac-
-     ters  is  device-specific  to  some  degree,   and
-     devices  differ.   Tabs  and  underlining are sup-
-     ported, to some extent, by most modern devices and
-     reading  agents, hence the cautious exemptions for
-
-INTERNET DRAFT to be        NEWS                    sec. 4.4
-
-
-     them.  The underlining method is specified because
-     the  inverse method, text and then underscores, is
-     tempting to the naive... but if sent unaltered  to
-     a  device  that shows only the most recent of sev-
-     eral overstruck characters rather than  a  compos-
-     ite, the result can be utterly unreadable.
-
-     NOTE: A common interpretation of tab is that it is
-     a request to space forward to  the  next  position
-     whose  number  is  one  more than a multiple of 8,
-     with positions numbered sequentially  starting  at
-     1.  (So tab positions are 9, 17, 25, ...)  Reading
-     agents not constrained by existing system  conven-
-     tions might wish to use this interpretation.
-
-     NOTE: It will typically be necessary for a reading
-     agent to catch and interpret  formfeed,  not  just
-     send  it  to  the output device.  The actions per-
-     formed by typical output devices  on  receiving  a
-     formfeed  are neither adequate for nor appropriate
-     to the pause-for-interaction meaning.
-
-Cooperating subnets which wish to employ non-ASCII character
-sets  by using escape sequences (employing, e.g., ESC (ASCII
-27), SO (ASCII 14), and SI (ASCII 15)) to alter the  meaning
-of  superficially-ASCII  characters  MAY do so, but MUST use
-MIME headers to alert reading agents to the particular char-
-acter  set(s)  and escape sequences in use.  A reading agent
-SHOULD not pass such an escape sequence through,  unaltered,
-to  the  output  device  unless  the agent confirms that the
-sequence is one used to affect character sets and has reason
-to  believe  that the device is capable of interpreting that
-particular sequence properly.
-
-     NOTE:  Cooperating-subnet  organizers  are  warned
-     that  some very old relayers strip certain control
-     characters out of articles they pass  along.   ESC
-     is known to be among the affected characters.
-
-     NOTE:  There  are  now standard Internet encodings
-     for Japanese [rrr] and Vietnamese [rrr] in partic-
-     ular.
-
-Articles  MUST  not  contain  any octet with value exceeding
-127, i.e. any octet that is not an ASCII character.
-
-     NOTE: This rule, like others, may  be  relaxed  by
-     unanimous  consent of the members of a cooperating
-     subnet, provided suitable precautions are taken to
-     ensure  that  rule-violating  articles do not leak
-     out of the subnet.  (This has already been done in
-     many  areas  where  ASCII  is not adequate for the
-     local language(s).)  Beware that articles contain-
-     ing non-ASCII octets in headers are a violation of
-
-INTERNET DRAFT to be        NEWS                    sec. 4.4
-
-
-     the MAIL specifications and  are  not  valid  MAIL
-     messages.   MIME  offers a way to encode non-ASCII
-     characters in ASCII for use in headers;  see  sec-
-     tion 4.5.
-
-     NOTE: While there is great interest in using 8-bit
-     character sets, not all software  can  yet  handle
-     them  correctly.  Hence the restriction to cooper-
-     ating subnets.  MIME  encodings  can  be  used  to
-     transmit  such  characters  while remaining within
-     the octet restriction.
-
-In anticipation of the day when it is possible to  use  non-
-ASCII  characters  safely  anywhere,  and to provide for the
-(substantial) cooperating subnets  that  are  already  using
-them, transmission paths SHOULD treat news articles as unin-
-terpreted sequences of octets (except perhaps for  transfor-
-mations  between  EOL  representations)  and relayers SHOULD
-treat non-ASCII characters in articles as  ordinary  charac-
-ters.
-
-     NOTE:  8-bit  enthusiasts  are warned that not all
-     software conforms to  these  recommendations  yet.
-     In particular, standard NNTP [rrr] is a 7-bit pro-
-     tocol, and  there  may  be  implementations  which
-     enforce  this rule.  Be warned, also, that it will
-     never be safe to send raw binary data in the  body
-     of news articles, because changes of EOL represen-
-     tation may (will!) corrupt it.
-
-Except  where  cooperating  subnets   permit   more   direct
-approaches,  MIME [rrr] headers and encodings SHOULD be used
-to transmit non-ASCII content using  ASCII  characters;  see
-section  4.5, appendix B, and the MIME RFCs for details.  If
-article content can be expressed in  ASCII,  it  SHOULD  be.
-Failing  that, the order of preference for character sets is
-that described in MIME [rrr].
-
-     NOTE: Using the MIME facilities, it is possible to
-     transmit ANY character set, and ANY form of binary
-     data, using only ASCII characters.  Equally impor-
-     tant,  such  articles  are self-describing and the
-     reading agent can tell which octet-to-symbol  map-
-     ping  is  intended!  Designation of some preferred
-     character sets is intended to minimize the  number
-     of character sets that a reading agent must under-
-     stand in order to display most articles  properly.
-
-Articles  containing  non-ASCII  characters,  articles using
-ASCII characters (values 0 through 127)  to  refer  to  non-
-ASCII  symbols, and articles using escape sequences to shift
-character sets SHOULD include MIME headers indicating  which
-character set(s) and conventions are being used, and MUST do
-so  unless  such  articles  are  strictly  confined   to   a
-
-INTERNET DRAFT to be        NEWS                    sec. 4.4
-
-
-cooperating subnet which has its own pre-agreed conventions.
-MIME encodings are preferred over all these techniques.   If
-it  comes to a relayer's attention that it is being asked to
-pass an article using such techniques outward across what it
-knows  to  be  the boundary of such a cooperating subnet, it
-MUST report this error to its administrator, and MAY  refuse
-to  pass the article beyond the subnet boundary.  If it does
-pass the article, it MUST re-encode it with  MIME  encodings
-to make it conform to this Draft.
-
-     NOTE:  Such re-encoding is a non-trivial task, due
-     to MIME rules such as the  prohibition  of  nested
-     encodings.   It's not just a matter of pouring the
-     body through a simple filter.
-
-Reading agents SHOULD note MIME headers and attempt to  show
-the   reader  the  closest  possible  approximation  to  the
-intended content.  They SHOULD not just send the  octets  of
-the  article to the output device unaltered, unless there is
-reason to believe that the output device will indeed  inter-
-pret  them  correctly.   Reading  agents MUST not pass ASCII
-control characters or escape sequences, other than  as  dis-
-cussed above, unaltered to the output device; only by chance
-would the result be the desired one, and  there  is  serious
-potential  for  harmful  side  effects, either accidental or
-malicious.
-
-     NOTE: Exactly what to  do  with  unwanted  control
-     characters/sequences  depends on the philosophy of
-     the reading agent, but passing  them  straight  to
-     the  output device is almost always wrong.  If the
-     reading agent wants to mark the presence of such a
-     character/sequence  in  circumstances  where  only
-     ASCII printable characters are  available,  trans-
-     lating  it  to "#" might be a suitable method; "#"
-     is a conspicuous character seldom used  in  normal
-     text.
-
-     NOTE: Reading agents should be aware that many old
-     output devices (or the transmission paths to them)
-     zero out the top bit of octets sent to them.  This
-     can transform non-ASCII characters into ASCII con-
-     trol characters.
-
-Followup  agents MUST be careful to apply appropriate trans-
-formations of representation to  the  outbound  followup  as
-well  as  the  inbound  precursor.  A followup to an article
-containing non-ASCII material is very likely to contain non-
-ASCII material itself.
-
-INTERNET DRAFT to be        NEWS                    sec. 4.5
+   Transmission paths for news articles MUST treat news articles as
+   uninterpreted sequences of octets, excluding the values 0 (ASCII NUL)
+   and 13 and 10 (ASCII CR and LF, which MUST ONLY appear in the
+   combination CRLF which denotes a line separator).
+
+        NOTE: this corresponds to the range of octets permitted for
+        MIME "8bit data" [RFC 2045].  Thus raw binary data cannot be
+        transmitted in an article body except by the use of a Content-
+        Transfer-Encoding such as base64.
+
+   An octet, or a sequence of octets, may represent a character in some
+   Coded Character Set (CCS) as determined by some Character Encoding
+   Scheme (CES) [RFC 2130].
+
+   If it comes to a relaying agent's attention that it is being asked to
+   pass an article using the Content-Transfer-Encoding "8bit" to a
+   relaying agent that does not support it, it SHOULD report this error
+   to its administrator. It MUST refuse to pass the article and MUST NOT
+   re-encode it with different MIME encodings.
+
+        NOTE: This strategy will do little harm. The target relaying
+        agent is unlikely to be able to make use of the article on its
+        own servers, and the usual flooding algorithm will likely find
+        some alternative route to get the article to destinations where
+        it is needed.
 

Documents were processed to this format by Forrest J. Cavalier III