usefor-article-06 November 2001
2.4. Syntax Notation
This standard uses the Augmented Backus Naur Form described in [RFC
2234]. A discussion of this is outside the bounds of this standard,
but it is expected that implementors will be able quickly to
understand it with reference to that defining document.
Much of the syntax of News Articles is based on the corresponding
syntax defined in [RFC 2822] or in the MIME specifications [RFC 2045]
et seq, which is deemed to have been incorporated into this standard
as required. However, there are some important differences arising
from the fact that [RFC 2822] does not recognise anything other than
US-ASCII characters, that it does not recognise the MIME headers [RFC
2045], and that it includes much syntax described as "obsolete".
NOTE: News parsers historically have been much less permissive
than Mail parsers, and this is reflected in the modifications
referred to, and in some further specific rules.
The following syntactic forms therefore supersede the corresponding
rules given in [RFC 2822] and [RFC 2045], thus allowing UTF-8
characters [RFC 2279] to appear in certain contexts (the five rules
beginning with "strict-" reflect the corresponding original rules from
[RFC 2822]).
UTF8-xtra-2-head= %xC2-DF
UTF8-xtra-3-head= %xE0 %xA0-BF / %xE1-EC %x80-BF /
%xED %x80-9F / %xEE-EF %x80-BF
UTF8-xtra-4-head= %xF0 %x90-BF / %xF1-F7 %x80-BF
UTF8-xtra-5-head= %xF8 %x88-BF / %xF9-FB %x80-BF
UTF8-xtra-6-head= %xFC %x84-BF / %xFD %x80-BF
UTF8-xtra-tail = %x80-BF
UTF8-xtra-char = UTF8-xtra-2-head 1( UTF8-xtra-tail ) /
UTF8-xtra-3-head 1( UTF8-xtra-tail ) /
UTF8-xtra-4-head 2( UTF8-xtra-tail ) /
UTF8-xtra-5-head 3( UTF8-xtra-tail ) /
UTF8-xtra-6-head 4( UTF8-xtra-tail )
text = %d1-9 / ; all UTF-8 characters except
%d11-12 / ; US-ASCII NUL, CR and LF
%d14-127 /
UTF8-xtra-char
ctext = NO-WS-CTL / ; all of <text> except
%d33-39 / ; SP, HTAB, "(", ")"
%d42-91 / ; and "\"
%d93-126 /
UTF8-xtra-char
qtext = NO-WS-CTL / ; all of <text> except
%d33 / ; SP, HTAB, "\" and DQUOTE
%d35-91 /
%d93-126 /
UTF8-xtra-char
utext = NO-WS-CTL / ; Non white space controls
%d33-126 / ; The rest of US-ASCII
UTF8-xtra-char
strict-text = %d1-9 / ; text restricted to
%d11-12 / ; US-ASCII
%d14-127
strict-qtext = NO-WS-CTL / ; qtext restricted to
%d33 / ; US-ASCII
%d35-91 /
%d93-127
strict-quoted-pair
= "\" strict-text
strict-qcontent = strict-qtext / strict-quoted-pair
strict-quoted-string
= [CFWS]
DQUOTE *([FWS] strict-qcontent) [FWS] DQUOTE
[CFWS]
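As an informal cross-check (not part of this standard), the US-ASCII portions of the rules above can be written out as sets of octet values, here in Python. The set and function names are illustrative only, and NO-WS-CTL is taken from the rule of that name repeated later in this section. Subtracting each rule from <text> should leave exactly the characters named in that rule's comment.

```python
# US-ASCII portions of the rules, as sets of octet values.
# Names are illustrative; UTF8-xtra-char is deliberately left out.
NO_WS_CTL = set(range(1, 9)) | {11, 12} | set(range(14, 32)) | {127}

TEXT = set(range(1, 10)) | {11, 12} | set(range(14, 128))
CTEXT = NO_WS_CTL | set(range(33, 40)) | set(range(42, 92)) | set(range(93, 127))
QTEXT = NO_WS_CTL | {33} | set(range(35, 92)) | set(range(93, 127))
UTEXT = NO_WS_CTL | set(range(33, 127))


def excluded_from_text(rule):
    """Return the US-ASCII characters that <text> allows but `rule` does not."""
    return sorted(chr(octet) for octet in TEXT - rule)


if __name__ == "__main__":
    for name, rule in (("ctext", CTEXT), ("qtext", QTEXT), ("utext", UTEXT)):
        print(name, "excludes", [repr(c) for c in excluded_from_text(rule)])
```

Under these definitions, ctext omits HTAB, SP, "(", ")" and "\"; qtext omits HTAB, SP, DQUOTE and "\"; and utext omits only HTAB and SP.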
The syntax for UTF8-xtra-char excludes those redundant sequences of
octets which cannot occur in UTF-8, as defined by [RFC 2279], either
because they would not be the shortest possible encodings of some UCS
character, or they would represent one of the characters D800 through
DFFF, disallowed in UCS because of their surrogate use in the UTF-16
encoding. These sequences MUST NOT be generated by posting agents.
Where they occur inadvertently, they MAY be passed on untouched by
other agents, but they MUST NOT ever be interpreted as valid
characters.
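The following Python sketch shows one way an agent might test whether a UTF8-xtra-char begins at a given offset in a sequence of octets. It is a direct transcription of the rules above into a table, not a reference implementation. The names _HEADS and match_utf8_xtra_char are hypothetical. Python's built-in UTF-8 decoder is not used because it follows a narrower definition of UTF-8 than [RFC 2279] and would reject the 5 and 6 octet forms that this grammar admits.

```python
# Each entry: (range of first octet, range of second octet or None,
#              number of UTF8-xtra-tail octets that follow the head).
# The 2-octet head is a single octet; all other heads are two octets.
_HEADS = [
    ((0xC2, 0xDF), None, 1),
    ((0xE0, 0xE0), (0xA0, 0xBF), 1),
    ((0xE1, 0xEC), (0x80, 0xBF), 1),
    ((0xED, 0xED), (0x80, 0x9F), 1),   # excludes D800 through DFFF
    ((0xEE, 0xEF), (0x80, 0xBF), 1),
    ((0xF0, 0xF0), (0x90, 0xBF), 2),
    ((0xF1, 0xF7), (0x80, 0xBF), 2),
    ((0xF8, 0xF8), (0x88, 0xBF), 3),
    ((0xF9, 0xFB), (0x80, 0xBF), 3),
    ((0xFC, 0xFC), (0x84, 0xBF), 4),
    ((0xFD, 0xFD), (0x80, 0xBF), 4),
]


def match_utf8_xtra_char(data: bytes, pos: int = 0) -> int:
    """Return the length of the UTF8-xtra-char at data[pos:], or 0 if none."""
    if pos >= len(data):
        return 0
    lead = data[pos]
    for (lo, hi), second, tails in _HEADS:
        if not lo <= lead <= hi:
            continue
        head_len = 1
        if second is not None:
            if pos + 1 >= len(data):
                return 0
            if not second[0] <= data[pos + 1] <= second[1]:
                return 0
            head_len = 2
        end = pos + head_len + tails
        if end > len(data):
            return 0  # truncated sequence
        if all(0x80 <= octet <= 0xBF for octet in data[pos + head_len:end]):
            return end - pos
        return 0
    return 0  # first octet is US-ASCII or can never start a sequence
```

A return value of 0 for an octet of %x80 or above indicates one of the sequences that MUST NOT be interpreted as a valid character, for example the non-shortest form %xC0 %x80 or the surrogate encoding %xED %xA0 %x80.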
Wherever in this standard the syntax is stated to be taken from [RFC
2822], it is to be understood as the syntax defined by [RFC 2822]
after making the above changes, but NOT including any syntax defined
in section 4 ("Obsolete syntax") of [RFC 2822]. Software compliant
with this standard MUST NOT generate any of the syntactic forms
defined in that Obsolete Syntax, although it MAY accept such
syntactic forms. Certain syntax from the MIME specifications [RFC
2045] et seq is also considered a part of this standard (see 6.21).
The following syntactic forms, taken from [RFC 2234] or from [RFC
2822], are repeated here for convenience only:
ALPHA = %x41-5A / ; A-Z
%x61-7A ; a-z
CR = %x0D ; carriage return
CRLF = CR LF
DIGIT = %x30-39 ; 0-9
HTAB = %x09 ; horizontal tab
LF = %x0A ; line feed
SP = %x20 ; space
NO-WS-CTL = %d1-8 / ; US-ASCII control characters
%d11 / ; which do not include the
%d12 / ; carriage return, line feed,
%d14-31 / ; and whitespace characters
%d127
specials = "(" / ")" / ; Special characters used in
"<" / ">" / ; other parts of the syntax
"[" / "]" /
":" / ";" /
"@" / "\" /
"," / "." /
DQUOTE
WSP = SP / HTAB ; Whitespace characters
FWS = ([*WSP CRLF] 1*WSP); Folding whitespace
ccontent = ctext / quoted-pair / comment
comment = "(" *([FWS] ccontent) [FWS] ")"
CFWS = *([FWS] comment) (([FWS] comment) / FWS )
DQUOTE = %d34 ; quote mark
quoted-pair = "\" text
atext = ALPHA / DIGIT /
"!" / "#" / ; Any character except
"$" / "%" / ; controls, SP, and specials.
"&" / "'" / ; Used for atoms
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"
atom = [CFWS] 1*atext [CFWS]
dot-atom = [CFWS] dot-atom-text [CFWS]
dot-atom-text = 1*atext *( "." 1*atext )
qcontent = qtext / quoted-pair
quoted-string = [CFWS]
DQUOTE *([FWS] qcontent) [FWS] DQUOTE
[CFWS]
word = atom / quoted-string
phrase = 1*word
unstructured = *( [FWS] utext ) [FWS]
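For illustration only, the <atext> and <dot-atom-text> rules can be expressed as a regular expression. The Python sketch below assumes the <atext> character set as defined in [RFC 2822]; the names ATEXT, DOT_ATOM_TEXT and is_dot_atom_text are hypothetical and do not appear in this standard.

```python
import re

# One <atext> character: ALPHA, DIGIT, or one of the listed punctuation marks.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# dot-atom-text = 1*atext *( "." 1*atext )
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_text(value: str) -> bool:
    """True if the whole of `value` matches <dot-atom-text>."""
    return DOT_ATOM_TEXT.fullmatch(value) is not None
```

The sketch does not cover the optional [CFWS] that surrounds <dot-atom-text> in the <dot-atom> rule; a caller would need to remove comments and folding whitespace first.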
NOTE: CFWS occurs at many places in the syntax in order to allow
comments and extra whitespace to be inserted almost anywhere.
The syntax is in fact ambiguous insofar as it may be impossible
to tell in which of several possible ways a given comment or WS
was produced. However, this does not lead to semantic ambiguity
because, unless specifically stated otherwise, the presence or
absence of a comment or additional WS has no semantic meaning
and, in particular, it is a matter of indifference whether it
forms a part of the syntactic construct preceding it or the one
following it.
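Because a comment carries no semantic meaning, a parser may simply discard comments before interpreting a structured header. The Python sketch below is one possible approach under simplifying assumptions: the input is an already unfolded header content, and any remaining whitespace is left for the caller to deal with. The function name strip_comments is hypothetical. The sketch tracks nesting depth, since a <comment> may contain further comments, and it treats a <quoted-pair> and the content of a <quoted-string> as opaque.

```python
def strip_comments(value: str) -> str:
    """Remove (possibly nested) comments from an unfolded header content."""
    out = []
    depth = 0          # current comment nesting depth
    in_quotes = False  # inside a quoted-string (only possible at depth 0)
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # quoted-pair: keep it outside comments, drop it inside them
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if in_quotes:
            if ch == '"':
                in_quotes = False
            out.append(ch)
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            if ch == '"':
                in_quotes = True
            out.append(ch)
        i += 1
    return "".join(out)
```

Note that a DQUOTE inside a comment is ordinary <ctext>, which is why the sketch only enters the quoted state when no comment is open.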
NOTE: Following [RFC 2234], literal text included in the syntax
is to be regarded as case-insensitive. However, in
contradistinction to [RFC 2822], the Netnews protocols are
sensitive to case in some instances (as in newsgroup names, some
header parameters, etc.). Care has been taken to indicate this
explicitly where required.
The complete syntax defined in this standard is repeated, for
convenience, in Appendix B.