usefor-article-07 May 2002
2.4.2. Syntax adapted from Email and MIME
Much of the syntax of Netnews Articles is based on the corresponding
syntax defined in [RFC 2822] or in the MIME specifications [RFC 2045]
et seq., which are deemed to have been incorporated into this standard
as required. However, there are some important differences arising
from the fact that [RFC 2822] does not recognize anything other than
US-ASCII characters, that it does not recognize the MIME headers [RFC
2045], and that it includes much syntax described as "obsolete"
(which is excluded from this standard, as detailed below).
NOTE: Netnews parsers historically have been much less
permissive than Email parsers, and this is reflected in the
modifications referred to, and in some further specific rules.
The following syntactic rules therefore supersede the corresponding
rules given in [RFC 2822] and [RFC 2045], thus allowing UTF-8
characters [RFC 2279] to appear in certain contexts (the five rules
beginning with "strict-" reflect the corresponding original rules from
[RFC 2822]).
UTF8-xtra-2-head= %xC2-DF
UTF8-xtra-3-head= %xE0 %xA0-BF / %xE1-EC %x80-BF /
                  %xED %x80-9F / %xEE-EF %x80-BF
UTF8-xtra-4-head= %xF0 %x90-BF / %xF1-F7 %x80-BF
UTF8-xtra-5-head= %xF8 %x88-BF / %xF9-FB %x80-BF
UTF8-xtra-6-head= %xFC %x84-BF / %xFD %x80-BF
UTF8-xtra-tail  = %x80-BF
UTF8-xtra-char  = UTF8-xtra-2-head 1( UTF8-xtra-tail ) /
                  UTF8-xtra-3-head 1( UTF8-xtra-tail ) /
                  UTF8-xtra-4-head 2( UTF8-xtra-tail ) /
                  UTF8-xtra-5-head 3( UTF8-xtra-tail ) /
                  UTF8-xtra-6-head 4( UTF8-xtra-tail )
text            = %d1-9 /         ; all UTF-8 characters except
                  %d11-12 /       ; US-ASCII NUL, CR and LF
                  %d14-127 /
                  UTF8-xtra-char
ctext           = NO-WS-CTL /     ; all of <text> except
                  %d33-39 /       ; SP, HTAB, "(", ")"
                  %d42-91 /       ; "\" and DEL
                  %d93-126 /
                  UTF8-xtra-char
qtext           = NO-WS-CTL /     ; all of <text> except
                  %d33 /          ; SP, HTAB, "\" DQUOTE
                  %d35-91 /       ; and DEL
                  %d93-126 /
                  UTF8-xtra-char
utext           = NO-WS-CTL /     ; Non white space controls
                  %d33-126 /      ; The rest of UTF-8
                  UTF8-xtra-char
strict-text     = %d1-9 /         ; text restricted to
                  %d11-12 /       ; US-ASCII
                  %d14-127
strict-qtext    = NO-WS-CTL /     ; qtext restricted to
                  %d33 /          ; US-ASCII
                  %d35-91 /
                  %d93-126
strict-quoted-pair
                = "\" strict-text
strict-qcontent = strict-qtext / strict-quoted-pair
strict-quoted-string
                = [CFWS] DQUOTE
                  *( [FWS] strict-qcontent ) [FWS]
                  DQUOTE [CFWS]
unstructured    = 1*( [FWS] utext ) [FWS]
The syntax for UTF8-xtra-char excludes those redundant sequences of
octets which cannot occur in UTF-8, as defined by [RFC 2279], either
because they would not be the shortest possible encodings of some UCS
character [ISO/IEC 10646], or they would represent one of the
characters D800 through DFFF, disallowed in UCS because of their
surrogate use in the UTF-16 encoding. These sequences MUST NOT be
generated by posting agents. Where they occur inadvertently, they
MAY be passed on untouched by other agents, but they MUST NOT ever be
interpreted as valid characters.
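
The exclusions described above can be checked mechanically. The
following is a minimal Python sketch, not part of this standard, that
transcribes the UTF8-xtra-char rules into a table of permitted first
and second octets. The names _XTRA_RULES and xtra_char_len are
hypothetical and chosen purely for illustration.

```python
# Each row: (lead low, lead high, second low, second high, total octets).
# The rows mirror the UTF8-xtra-N-head rules; the restricted second-octet
# ranges are what exclude overlong encodings and the surrogates D800-DFFF.
_XTRA_RULES = [
    (0xC2, 0xDF, 0x80, 0xBF, 2),
    (0xE0, 0xE0, 0xA0, 0xBF, 3),
    (0xE1, 0xEC, 0x80, 0xBF, 3),
    (0xED, 0xED, 0x80, 0x9F, 3),
    (0xEE, 0xEF, 0x80, 0xBF, 3),
    (0xF0, 0xF0, 0x90, 0xBF, 4),
    (0xF1, 0xF7, 0x80, 0xBF, 4),
    (0xF8, 0xF8, 0x88, 0xBF, 5),
    (0xF9, 0xFB, 0x80, 0xBF, 5),
    (0xFC, 0xFC, 0x84, 0xBF, 6),
    (0xFD, 0xFD, 0x80, 0xBF, 6),
]


def xtra_char_len(data: bytes, i: int = 0) -> int:
    """Return the octet length of the UTF8-xtra-char at data[i], or 0."""
    if i + 1 >= len(data):
        return 0  # every UTF8-xtra-char has at least two octets
    lead, second = data[i], data[i + 1]
    for lo, hi, s_lo, s_hi, total in _XTRA_RULES:
        if lo <= lead <= hi and s_lo <= second <= s_hi:
            # Any octets after the second must be plain UTF8-xtra-tail.
            tails = data[i + 2 : i + total]
            if len(tails) == total - 2 and all(0x80 <= t <= 0xBF for t in tails):
                return total
            return 0
    return 0
```

Under this sketch, an overlong sequence such as C0 80 or a surrogate
encoding such as ED A0 80 yields 0, so a caller can pass the octets on
untouched without ever treating them as a valid character.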
Note that, unlike in [RFC 2822], an <unstructured> MUST contain at
least one non-whitespace character (see also the remarks about empty
headers in 4.2.6).
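
The non-whitespace requirement can be illustrated with a small Python
sketch. It is only a partial check under stated assumptions: folding is
undone by simply deleting CRLF pairs, and the remaining octets are not
validated against utext. The function name unstructured_has_content is
hypothetical.

```python
WSP = b" \t"  # SP and HTAB


def unstructured_has_content(body: bytes) -> bool:
    """True if a header body has at least one non-whitespace octet."""
    # Assumption: removing CRLF is an adequate stand-in for unfolding.
    unfolded = body.replace(b"\r\n", b"")
    return any(octet not in WSP for octet in unfolded)
```

A body consisting solely of spaces, tabs and folding returns False here
and so would not satisfy the rule for unstructured given above.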
Wherever in this standard the syntax is stated to be taken from [RFC
2822], it is to be understood as the syntax defined by [RFC 2822]
after making the above changes, but NOT including any syntax defined
in section 4 ("Obsolete syntax") of [RFC 2822]. Software compliant
with this standard MUST NOT generate any of the syntactic forms
defined in that Obsolete Syntax, although it MAY accept such
syntactic forms. Certain syntax from the MIME specifications [RFC
2045] et seq. is also considered part of this standard (see 6.21).