usefor-article-09 February 2003

2.4.2.  Syntax adapted from Email and MIME

   Much of the syntax of Netnews Articles is based on the corresponding
   syntax defined in [RFC 2822] or in the MIME specifications [RFC 2045]
   et seq, which are deemed to have been incorporated into this standard
   as required. However, there are some important differences arising
   from the fact that [RFC 2822] does not recognize anything beyond US-
   ASCII characters, that it does not recognize the MIME headers [RFC
   2045], and that it includes much syntax described as "obsolete"
   (which is excluded from this standard, as detailed below).

        NOTE: Netnews parsers have historically been much less
        permissive than Email parsers, and this is reflected in the
        modifications described here and in some further specific rules.

   The following syntactic rules therefore supersede the corresponding
   rules given in [RFC 2822] and [RFC 2045], thus allowing UTF-8
   characters [RFC 2279] to appear in certain contexts (the five rules
   beginning with "strict-" reflect the corresponding original rules
   from [RFC 2822]).

      UTF8-2          = %xC2-DF UTF8-tail
      UTF8-3          = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2(UTF8-tail) /
                        %xED %x80-9F UTF8-tail / %xEE-EF 2(UTF8-tail)
      UTF8-4          = %xF0 %x90-BF 2(UTF8-tail) / %xF1-F7 3(UTF8-tail)
      UTF8-5          = %xF8 %x88-BF 3(UTF8-tail) / %xF9-FB 4(UTF8-tail)
      UTF8-6          = %xFC %x84-BF 4(UTF8-tail) / %xFD 5(UTF8-tail)
      UTF8-tail       = %x80-BF
      UTF8-xtra-char  = UTF8-2 / UTF8-3 / UTF8-4 / UTF8-5 / UTF8-6
      text            = %d1-9 /            ; all UTF-8 characters except
                        %d11-12 /          ; US-ASCII NUL, CR and LF
                        %d14-127 /
                        UTF8-xtra-char
      ctext           = NO-WS-CTL /        ; all of <text> except
                        %d33-39 /          ; SP, HTAB, "(", ")",
                        %d42-91 /          ; "\" and DEL
                        %d93-126 /
                        UTF8-xtra-char
      qtext           = NO-WS-CTL /        ; all of <text> except
                        %d33 /             ; SP, HTAB, "\", DQUOTE
                        %d35-91 /          ; and DEL
                        %d93-126 /
                        UTF8-xtra-char
      utext           = NO-WS-CTL /        ; non-whitespace controls
                        %d33-126 /         ; and the rest of UTF-8
                        UTF8-xtra-char
      strict-text     = %d1-9 /            ; text restricted to
                        %d11-12 /          ; US-ASCII
                        %d14-127
      strict-qtext    = NO-WS-CTL /        ; qtext restricted to
                        %d33 /             ; US-ASCII
                        %d35-91 /
                        %d93-126
      strict-quoted-pair
                      = "\" strict-text
      strict-qcontent = strict-qtext / strict-quoted-pair
      strict-quoted-string
                      = [CFWS] DQUOTE
                        *( [FWS] strict-qcontent ) [FWS]
                        DQUOTE [CFWS]
      unstructured    = 1*( [FWS] utext ) [FWS]
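
   The UTF8-2 through UTF8-6 rules can be transcribed octet range by
   octet range. The Python sketch below is illustrative only. The names
   `utf8_xtra_char_len` and `is_text_run` are hypothetical, and the input
   is assumed to be raw header octets. It is written by hand rather than
   with `bytes.decode("utf-8")`. Python's built-in codec follows the
   later, narrower RFC 3629 definition, so it rejects the five- and
   six-octet forms that this grammar admits, among others.

```python
from typing import Optional

# One entry per alternative in UTF8-2 ... UTF8-6 (illustrative transcription):
# (lead octet low, lead octet high, constrained 2nd-octet range or None, total octets)
_UTF8_XTRA_RULES = [
    (0xC2, 0xDF, None, 2),           # UTF8-2
    (0xE0, 0xE0, (0xA0, 0xBF), 3),   # UTF8-3
    (0xE1, 0xEC, None, 3),
    (0xED, 0xED, (0x80, 0x9F), 3),
    (0xEE, 0xEF, None, 3),
    (0xF0, 0xF0, (0x90, 0xBF), 4),   # UTF8-4
    (0xF1, 0xF7, None, 4),
    (0xF8, 0xF8, (0x88, 0xBF), 5),   # UTF8-5
    (0xF9, 0xFB, None, 5),
    (0xFC, 0xFC, (0x84, 0xBF), 6),   # UTF8-6
    (0xFD, 0xFD, None, 6),
]


def utf8_xtra_char_len(data: bytes, i: int) -> Optional[int]:
    """Length of the UTF8-xtra-char starting at data[i], or None if none matches."""
    if i >= len(data):
        return None
    lead = data[i]
    for lo, hi, second, length in _UTF8_XTRA_RULES:
        if lo <= lead <= hi:
            tail = data[i + 1:i + length]
            # Every following octet must be a UTF8-tail (%x80-BF) ...
            if len(tail) != length - 1 or not all(0x80 <= t <= 0xBF for t in tail):
                return None
            # ... and some lead octets narrow the range of the second octet.
            if second is not None and not second[0] <= tail[0] <= second[1]:
                return None
            return length
    return None  # ASCII, %x80-C1 and %xFE-FF never start a UTF8-xtra-char


def is_text_run(data: bytes) -> bool:
    """True if data is a (possibly empty) sequence of <text>."""
    i = 0
    while i < len(data):
        octet = data[i]
        if octet in (0x00, 0x0A, 0x0D):   # NUL, LF and CR are not <text>
            return False
        if octet <= 0x7F:                  # %d1-9 / %d11-12 / %d14-127
            i += 1
            continue
        n = utf8_xtra_char_len(data, i)
        if n is None:
            return False
        i += n
    return True
```

   For instance, `utf8_xtra_char_len(b"\xed\xa0\x80", 0)` returns `None`
   because %xED may only be followed by %x80-9F. The five-octet sequence
   `b"\xf8\x88\x80\x80\x80"`, by contrast, is accepted as UTF8-5.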

   The syntax for UTF8-xtra-char excludes those sequences of octets
   which cannot occur in UTF-8 as defined by [RFC 2279]. Such sequences
   are excluded either because they would be redundant (not the shortest
   possible encoding of some UCS character [ISO/IEC 10646]), or because
   they would represent one of the characters U+D800 through U+DFFF.
   Those characters are disallowed in UCS because they are reserved as
   surrogates in the UTF-16 encoding.  These sequences MUST NOT be
   generated by posting agents. Where they occur inadvertently, they
   SHOULD be passed on untouched by other agents, but attempts to
   interpret them as malformed UTF-8 MUST NOT be made. However, if there
   is reason to suppose they are representations of some other character
   set, they MAY, as suggested in section 4.4.1, be interpreted as such.
   For completeness, the syntax also includes the cases UTF8-5 and
   UTF8-6. These cannot in fact arise in [UNICODE 3.2], though they
   might conceivably arise in some future extension.
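
   The second-octet ranges in the grammar (%xC2 as the first UTF8-2
   lead, then %xE0 %xA0, %xED %x80-9F, %xF0 %x90, %xF8 %x88 and
   %xFC %x84) mark exactly where the excluded sequences begin. The sketch
   below checks this by decoding a sequence with the RFC 2279 bit layout.
   It then compares the result against the smallest code point that
   genuinely needs that many octets. The names `decode_rfc2279`,
   `classify` and `MIN_CODE_POINT` are hypothetical, and the decoder
   handles only a single multi-octet sequence.

```python
# Smallest code point that needs n octets, from the RFC 2279 encoding table.
MIN_CODE_POINT = {2: 0x80, 3: 0x800, 4: 0x10000, 5: 0x200000, 6: 0x4000000}


def decode_rfc2279(seq: bytes) -> int:
    """Decode one 2- to 6-octet sequence using the RFC 2279 bit layout."""
    n = len(seq)
    if n not in MIN_CODE_POINT:
        raise ValueError("expected a 2- to 6-octet sequence")
    lead_mask = (0xFF << (7 - n)) & 0xFF   # covers the n+1 prefix bits
    lead_value = (0xFF << (8 - n)) & 0xFF  # n one-bits followed by a zero bit
    if (seq[0] & lead_mask) != lead_value:
        raise ValueError("lead octet does not announce %d octets" % n)
    cp = seq[0] & ~lead_mask & 0xFF        # the 7-n payload bits of the lead
    for octet in seq[1:]:
        if (octet & 0xC0) != 0x80:
            raise ValueError("not a continuation octet")
        cp = (cp << 6) | (octet & 0x3F)    # append 6 payload bits
    return cp


def classify(seq: bytes) -> str:
    """Say why a sequence is or is not admitted by UTF8-xtra-char."""
    cp = decode_rfc2279(seq)
    if cp < MIN_CODE_POINT[len(seq)]:
        return "overlong"    # a shorter encoding of cp exists
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"   # reserved for UTF-16 surrogates
    return "ok"
```

   For example, %xC1 %xBF decodes to U+007F, which fits in a single
   octet, so UTF8-2 starts at %xC2. Similarly, %xED %xA0 %x80 decodes to
   U+D800, which is why %xED is restricted to %x80-9F.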

   Observe that, unlike in [RFC 2822], an unstructured header MUST
   contain at least one non-whitespace character (see also the remarks
   about empty headers in section 4.2.6).
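
   The effect of the "1*" in the unstructured rule can be sketched as a
   simple check. This is a minimal illustration under several
   assumptions. The header body (everything after the colon) is assumed
   to be already decoded from UTF-8 into a Python `str`, so any non-ASCII
   character stands in for UTF8-xtra-char. The function name
   `is_unstructured` is hypothetical. Because obsolete syntax is excluded,
   obs-FWS is deliberately not accepted.

```python
def is_unstructured(value: str) -> bool:
    """Check value against unstructured = 1*( [FWS] utext ) [FWS] (no obs-FWS)."""
    seen_utext = False
    crlf_in_run = False        # FWS permits at most one CRLF per whitespace run
    i, n = 0, len(value)
    while i < n:
        c = value[i]
        if c in " \t":         # WSP
            i += 1
        elif value.startswith("\r\n", i):
            # A CRLF must be followed by WSP and may occur only once per FWS
            if crlf_in_run or i + 2 >= n or value[i + 2] not in " \t":
                return False
            crlf_in_run = True
            i += 3
        elif c in "\x00\r\n":
            return False       # NUL, bare CR and bare LF are never utext
        else:
            seen_utext = True  # NO-WS-CTL, %d33-126 or a non-ASCII character
            crlf_in_run = False
            i += 1
    return seen_utext          # the "1*" demands at least one utext
```

   Under this sketch, a body consisting only of spaces, or an empty
   body, is rejected. [RFC 2822]'s "*( [FWS] utext ) [FWS]" would
   accept both.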

   Wherever this standard states that syntax is taken from [RFC
   2822], that syntax is to be understood as defined by [RFC 2822]
   after making the above changes, but NOT including any syntax defined
   in section 4 ("Obsolete syntax") of [RFC 2822].  Software compliant
   with this standard MUST NOT generate any of the syntactic forms
   defined in that section, although it MAY accept them. Certain syntax
   from the MIME specifications [RFC 2045] et seq is also considered
   part of this standard (see section 6.21).

Documents were processed to this format by Forrest J. Cavalier III