usefor-article-04 April 2001

4.4.1.  Character Sets within Article Headers

   Within article headers, characters are represented as octets
   according to the UTF-8 encoding scheme [ISO 10646] or [RFC 2279] and
   hence all the characters in the Universal Multiple-Octet Coded
   Character Set (UCS) [ISO 10646] (which is essentially a superset of
   Unicode [UNICODE] and expected to remain so) are potentially
   available. However, interpreting the octets directly as US-ASCII
   characters should ensure correct behaviour in most situations.

        NOTE: UTF-8 is an encoding for 16-bit (and even 32-bit) character
        sets with the property that any octet less than 128 immediately
        represents the corresponding US-ASCII character, thus ensuring
        upwards compatibility with previous practice.  Non-ASCII
        characters from UCS are represented by sequences of octets
        satisfying the syntax of a UTF8-xtra-char (2.4).  Only those
        octet sequences explicitly permitted by [RFC 2279] shall be
        used.  UCS includes all characters from the ISO-8859 series of
        character sets [ISO 8859] (which includes all Greek and Arabic
        characters) as well as the more elaborate characters used in
        Japan and China. See the following section for the appropriate
        treatment of UCS characters by reading agents.
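
   The property described in the NOTE can be illustrated with a short
   sketch. The helper name octets and the sample characters are
   illustrative assumptions, not part of this draft; the sketch relies
   only on Python's built-in UTF-8 codec.

```python
# Illustrative sketch: the sample strings are assumptions, not from the draft.

def octets(text: str) -> list:
    """Return the UTF-8 octets of text as a list of integers."""
    return list(text.encode("utf-8"))

ascii_octets = octets("Subject")   # pure US-ASCII text
greek_octets = octets("\u03b1")    # GREEK SMALL LETTER ALPHA
cjk_octets = octets("\u65e5")      # a CJK ideograph (U+65E5)

# Each octet of US-ASCII text is below 128 and equals its US-ASCII code.
assert all(o < 128 for o in ascii_octets)

# A non-ASCII character becomes a sequence of octets that are all 128 or
# above, so it can never be mistaken for a US-ASCII character.
assert all(o >= 128 for o in greek_octets + cjk_octets)

print(ascii_octets, greek_octets, cjk_octets)
```

   Because no octet of a multi-octet sequence falls below 128, software
   that scans headers for US-ASCII delimiters such as ":" will not be
   misled by UTF-8 content.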

   Notwithstanding the great flexibility permitted by UTF-8, there is
   need for restraint in its use in order that the essential components
   of headers may be discerned using reading agents that cannot present
   the full UCS range. In particular, header-names and tokens MUST be in
   US-ASCII, and certain other components of headers, as defined
   elsewhere in this standard - notably msg-ids, date-times, dot-atoms,
   domains and path-identities - MUST be in US-ASCII.  Comments, phrases
   (as in addresses) and unstructureds (as in Subject headers) MAY use
   the full range of UTF-8 characters. For newsgroup-names see 5.5.
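
   A minimal sketch of how software might check part of this
   restriction follows. It tests only that the header-name is US-ASCII
   and that the whole line decodes as UTF-8; it does not parse
   structured headers, so msg-ids, date-times and the other components
   listed above are not examined. The function name check_header is a
   hypothetical choice, and Python's strict UTF-8 decoder is assumed to
   be an acceptable stand-in for the UTF8-xtra-char syntax.

```python
def check_header(raw: bytes) -> bool:
    """Return True if the header-name is US-ASCII and the line is valid UTF-8.

    Simplified, hypothetical check: the content after the colon is not
    parsed, so US-ASCII-only components within it are not verified.
    """
    name, sep, _content = raw.partition(b":")
    if not sep or not name:
        return False                      # no header-name present
    if any(o >= 128 for o in name):
        return False                      # header-names MUST be US-ASCII
    try:
        raw.decode("utf-8")               # rejects malformed octet sequences
    except UnicodeDecodeError:
        return False
    return True
```

   For example, a Subject header whose content contains UTF-8 encoded
   non-ASCII characters passes this check, whereas a header whose name
   contains such characters, or whose content holds a bare ISO-8859-1
   octet, does not.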

   Where the use of non-ASCII characters, encoded in UTF-8, is permitted
   as above, they MAY also be encoded using the MIME mechanism defined
   in [RFC 2047], but this usage is deprecated within news articles
   (even though it is required in mail messages) since it is less
   legible in older reading agents which support neither it nor UTF-8.
   Nevertheless, reading agents SHOULD support this usage, but only in
   those contexts explicitly mentioned in [RFC 2047].
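
   As an illustration of what supporting this usage can involve, the
   sketch below decodes [RFC 2047] encoded-words with Python's standard
   email.header module. The wrapper name decode_2047 and the sample
   values are assumptions made for the example. The sketch does not
   enforce the restriction to contexts permitted by [RFC 2047]; a
   reading agent would apply it only to components such as comments,
   phrases and unstructureds.

```python
from email.header import decode_header, make_header

def decode_2047(value: str) -> str:
    """Decode any RFC 2047 encoded-words in value to a Unicode string."""
    return str(make_header(decode_header(value)))

# Q-encoded word in ISO-8859-1 (sample value, not from the draft).
print(decode_2047("=?ISO-8859-1?Q?caf=E9?="))

# B-encoded (base64) word in UTF-8 (sample value, not from the draft).
print(decode_2047("=?UTF-8?B?zrE=?="))
```

   The first call yields the word "café" and the second the Greek
   letter alpha, the same characters that could have been carried
   directly as UTF-8 octets.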
Diff to first older

--- ../usefor-article-03/Character_Sets_within_Article_Headers.out          February 2000
+++ ../usefor-article-04/Character_Sets_within_Article_Headers.out          April 2001
@@ -1,11 +1,12 @@
 4.4.1.  Character Sets within Article Headers
 
-   Within article headers, the CES is UTF-8 [ISO 10646] or [RFC 2279]
-   and hence the CCS is the Universal Multiple-Octet Coded Character Set
-   (UCS) [ISO 10646] (which is essentially a superset of Unicode
-   [UNICODE] and expected to remain so). However, interpreting the
-   octets directly as US-ASCII characters should ensure correct
-   behaviour in most situations.
+   Within article headers, characters are represented as octets
+   according to the UTF-8 encoding scheme [ISO 10646] or [RFC 2279] and
+   hence all the characters in the Universal Multiple-Octet Coded
+   Character Set (UCS) [ISO 10646] (which is essentially a superset of
+   Unicode [UNICODE] and expected to remain so) are potentially
+   available. However, interpreting the octets directly as US-ASCII
+   characters should ensure correct behaviour in most situations.
 
         NOTE: UTF-8 is an encoding for 16bit (and even 32bit) character
         sets with the property that any octet less than 128 immediately


Documents were processed to this format by Forrest J. Cavalier III