usefor-article-03 February 2000
2.4. Syntax Notation
This standard uses the Augmented Backus Naur Form described in [RFC
2234]. A discussion of this is outside the bounds of this standard,
but it is expected that implementors will be able to quickly
understand it with reference to the defining document.
Much of the syntax of News Articles is based on the corresponding
syntax defined in [MESSFOR] or in the MIME specifications [RFC 2045]
et seq, which is deemed to have been incorporated into this standard
as required. However, there are some important differences arising
from the fact that [MESSFOR] does not recognise anything other than
US-ASCII characters, that it does not recognise the MIME headers [RFC
2045], and that it includes much syntax described as "obsolete".
NOTE: News parsers historically have been much less permissive
than Mail parsers, and this is reflected in the modifications
referred to, and in some further specific rules.
The following syntactic forms therefore supersede the corresponding
rules given in [MESSFOR] and [RFC 2045], thus allowing UTF-8
characters [RFC 2044] to appear in certain contexts (the four rules
beginning with "strict-" reflect the corresponding original rules from
[MESSFOR]).

   UTF8-xtra-head  = %d192-253
   UTF8-xtra-tail  = %d128-191
   UTF8-xtra-char  = UTF8-xtra-head 1*UTF8-xtra-tail
   text            = %d1-9 /          ; all UTF-8 characters except
                     %d11-12 /        ; US-ASCII NUL, CR and LF
                     %d14-127 /
                     UTF8-xtra-char
   ctext           = NO-WS-CTL /      ; all of <text> except
                     %d33-39 /        ; SP, HTAB, "(", ")"
                     %d42-91 /        ; and "\"
                     %d93-126 /
                     UTF8-xtra-char
   qtext           = NO-WS-CTL /      ; all of <text> except
                     %d33 /           ; SP, HTAB, "\" and DQUOTE
                     %d35-91 /
                     %d93-126 /
                     UTF8-xtra-char
   utext           = NO-WS-CTL /      ; Non white space controls
                     %d33-126 /       ; The rest of US-ASCII
                     UTF8-xtra-char
   strict-text     = %d1-9 /          ; text restricted to
                     %d11-12 /        ; US-ASCII
                     %d14-127
   strict-qtext    = NO-WS-CTL /      ; qtext restricted to
                     %d33 /           ; US-ASCII
                     %d35-91 /
                     %d93-127
   strict-quoted-pair
                   = "\" strict-text
   strict-quoted-string
                   = [CFWS] DQUOTE
                     *([FWS] (strict-qtext / strict-quoted-pair))
                     [FWS] DQUOTE [CFWS]

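
As an informal illustration, and not as part of the syntax itself, the
octet ranges above can be checked mechanically. The Python sketch below
assumes the input is a raw octet string. All function names in it
(is_no_ws_ctl, xtra_char_len, matches, ascii_text and so on) are
invented for this example and do not appear in [MESSFOR] or [RFC 2234].

```python
# Sketch only: helper names are illustrative, not taken from the draft.

def is_no_ws_ctl(o: int) -> bool:
    """NO-WS-CTL = %d1-8 / %d11 / %d12 / %d14-31 / %d127."""
    return 1 <= o <= 8 or o in (11, 12) or 14 <= o <= 31 or o == 127


def xtra_char_len(data: bytes, i: int) -> int:
    """Length of a UTF8-xtra-char starting at data[i], or 0 if none.

    UTF8-xtra-char = UTF8-xtra-head 1*UTF8-xtra-tail: one octet in
    192-253 followed by one or more octets in 128-191 (matched greedily).
    """
    if i >= len(data) or not 192 <= data[i] <= 253:
        return 0
    j = i + 1
    while j < len(data) and 128 <= data[j] <= 191:
        j += 1
    return j - i if j > i + 1 else 0


def matches(data: bytes, ascii_ok) -> bool:
    """True if data consists only of octets accepted by ascii_ok
    and of UTF8-xtra-char sequences."""
    i = 0
    while i < len(data):
        if data[i] < 128 and ascii_ok(data[i]):
            i += 1
            continue
        n = xtra_char_len(data, i)
        if n == 0:
            return False
        i += n
    return True


# US-ASCII part of each rule; UTF8-xtra-char is handled by matches().
def ascii_text(o: int) -> bool:
    return 1 <= o <= 9 or o in (11, 12) or 14 <= o <= 127


def ascii_ctext(o: int) -> bool:
    return is_no_ws_ctl(o) or 33 <= o <= 39 or 42 <= o <= 91 or 93 <= o <= 126


def ascii_qtext(o: int) -> bool:
    return is_no_ws_ctl(o) or o == 33 or 35 <= o <= 91 or 93 <= o <= 126


def ascii_utext(o: int) -> bool:
    return is_no_ws_ctl(o) or 33 <= o <= 126
```

For instance, matches("naïve".encode("utf-8"), ascii_text) holds,
because the two octets encoding the "ï" form a UTF8-xtra-char, whereas
matches(b"a(b", ascii_ctext) does not, "(" being excluded from ctext.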
NOTE: There are sequences of octets which cannot legitimately
occur in UTF-8, even a few permitted by the above syntax. These
SHOULD NOT be generated by posting agents but, where they occur
inadvertently, they SHOULD be passed on untouched by other
agents.
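
The gap between the syntax and legitimate UTF-8 can be seen in a small
Python sketch. It uses Python's built-in "utf-8" codec as the judge of
legitimacy, which is an assumption of this example: that codec follows
the current definition of UTF-8, which is stricter than [RFC 2044]. The
names XTRA_CHAR, fits_syntax and is_valid_utf8 are hypothetical.

```python
import re

# UTF8-xtra-char = UTF8-xtra-head 1*UTF8-xtra-tail, as a bytes regex
# (0xC0-0xFD is %d192-253, 0x80-0xBF is %d128-191).
XTRA_CHAR = re.compile(rb"[\xC0-\xFD][\x80-\xBF]+")


def fits_syntax(octets: bytes) -> bool:
    """True if the octets form exactly one UTF8-xtra-char."""
    return XTRA_CHAR.fullmatch(octets) is not None


def is_valid_utf8(octets: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the octets."""
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


overlong = b"\xc0\x80"    # over-long two-octet form of NUL
truncated = b"\xe2\x82"   # first two octets of a three-octet character
for sample in (overlong, truncated):
    print(sample, fits_syntax(sample), is_valid_utf8(sample))
```

Both samples satisfy the syntax yet are rejected by the decoder. An
agent that merely relays articles need not decode such octets at all,
which is consistent with passing them on untouched.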
Wherever in this standard the syntax is stated to be taken from
[MESSFOR], it is to be understood as the syntax defined by [MESSFOR]
after making the above changes, but NOT including any syntax defined
in section 4 ("Obsolete syntax") of [MESSFOR]. Software compliant
with this standard MUST NOT generate any of the syntactic forms
defined in that Obsolete Syntax, although it MAY accept such
syntactic forms. Certain syntax from the MIME specifications [RFC
2045] et seq is also considered a part of this standard (see 6.17).
The following syntactic forms, taken from [RFC 2234] or from
[MESSFOR], are repeated here for convenience only:

   ALPHA           = %x41-5A /        ; A-Z
                     %x61-7A          ; a-z
   CR              = %x0D             ; carriage return
   CRLF            = CR LF
   DIGIT           = %x30-39          ; 0-9
   HTAB            = %x09             ; horizontal tab
   LF              = %x0A             ; line feed
   SP              = %x20             ; space
   NO-WS-CTL       = %d1-8 /          ; US-ASCII control characters
                     %d11 /           ; which do not include the
                     %d12 /           ; carriage return, line feed,
                     %d14-31 /        ; and whitespace characters
                     %d127
   WSP             = SP / HTAB        ; Whitespace characters
   FWS             = ([*WSP CRLF] 1*WSP) ; Folding whitespace
   atext           = ALPHA / DIGIT /
                     "!" / "#" /      ; Any character except
                     "$" / "%" /      ; controls, SP, and specials.
                     "&" / "'" /      ; Used for atoms
                     "*" / "+" /
                     "-" / "/" /
                     "=" / "?" /
                     "^" / "_" /
                     "`" / "{" /
                     "|" / "}" /
                     "~"
   atom            = [CFWS] 1*atext [CFWS]
   dot-atom        = [CFWS] dot-atom-text [CFWS]
   dot-atom-text   = 1*atext *( "." 1*atext )
   comment         = "(" *([FWS]
                     (ctext / quoted-pair / comment)) [FWS] ")"
   CFWS            = *([FWS] comment) (([FWS] comment) / FWS )
   DQUOTE          = %d34             ; quote mark
   quoted-pair     = "\" text
   quoted-string   = [CFWS] DQUOTE
                     *([FWS] (qtext / quoted-pair))
                     [FWS] DQUOTE [CFWS]
   unstructured    = *( [FWS] utext ) [FWS]

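
By way of example only, the atext and dot-atom-text rules translate
directly into a regular expression. The Python sketch below follows the
atext character set as given in [MESSFOR]. The names ATEXT,
DOT_ATOM_TEXT and is_dot_atom_text are assumptions of this example, and
the surrounding [CFWS] of dot-atom is deliberately not handled.

```python
import re

# atext: ALPHA / DIGIT / the printable characters listed in the rule.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# dot-atom-text = 1*atext *( "." 1*atext )
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_text(s: str) -> bool:
    """True if s matches dot-atom-text in its entirety."""
    return DOT_ATOM_TEXT.fullmatch(s) is not None


for candidate in ("news.example.com", "a..b", ".a", "a b", "a@b"):
    print(repr(candidate), is_dot_atom_text(candidate))
```

Only the first candidate is accepted: empty components are ruled out by
1*atext, and SP and "@" are not atext.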
NOTE: CFWS occurs at many places in the syntax in order to allow
comments and extra whitespace to be inserted almost anywhere.
The syntax is in fact ambiguous insofar as it may be impossible
to tell in which of several possible ways a given comment or WS
was produced. However, this does not lead to semantic ambiguity
because, unless specifically stated otherwise, the presence or
absence of a comment or additional WS has no semantic meaning
and, in particular, it is a matter of indifference whether it
forms a part of the syntactic construct preceding it or the one
following it.
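
The practical consequence is that a parser can discard CFWS wherever it
meets it. The following Python sketch, offered under stated assumptions
rather than as a reference implementation, skips folding whitespace and
nested comments and returns only the atoms. It is more lenient than the
grammar (it does not validate ctext and accepts consecutive FWS), and
the names skip_fws, skip_comment, skip_cfws and atoms are hypothetical.

```python
import string

# Octets permitted by atext, following the rule in [MESSFOR].
ATEXT = frozenset(
    (string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~").encode("ascii")
)


def skip_fws(s: bytes, i: int) -> int:
    """Advance past an optional FWS = ([*WSP CRLF] 1*WSP) at index i."""
    j = i
    while j < len(s) and s[j] in b" \t":
        j += 1
    # A CRLF counts as folding only if whitespace follows it.
    if s[j:j + 2] == b"\r\n" and j + 2 < len(s) and s[j + 2] in b" \t":
        j += 2
        while j < len(s) and s[j] in b" \t":
            j += 1
    return j


def skip_comment(s: bytes, i: int) -> int:
    """Return the index just past the comment that opens at s[i] == "("."""
    depth = 0
    j = i
    while j < len(s):
        c = s[j]
        if c == 0x5C:            # "\" starts a quoted-pair: skip two octets
            j += 2
            continue
        if c == 0x28:            # "(" opens a (possibly nested) comment
            depth += 1
        elif c == 0x29:          # ")" closes the innermost comment
            depth -= 1
            if depth == 0:
                return j + 1
        j += 1
    raise ValueError("unterminated comment")


def skip_cfws(s: bytes, i: int) -> int:
    """Advance past any run of folding whitespace and comments."""
    while True:
        j = skip_fws(s, i)
        if j < len(s) and s[j] == 0x28:
            j = skip_comment(s, j)
        if j == i:
            return i
        i = j


def atoms(s: bytes) -> list:
    """Return the atext runs in s, ignoring all CFWS between them."""
    out, i = [], 0
    while True:
        i = skip_cfws(s, i)
        if i >= len(s):
            return out
        j = i
        while j < len(s) and s[j] in ATEXT:
            j += 1
        if j == i:
            raise ValueError(f"unexpected octet at index {i}")
        out.append(s[i:j])
        i = j
```

With these definitions, atoms(b"alpha (a (nested) comment)\r\n beta")
and atoms(b"alpha beta") yield the same two atoms, whichever construct
the comment and whitespace are taken to belong to.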
NOTE: Following [RFC 2234], literal text included in the syntax
is to be regarded as case-insensitive. However, in
contradistinction to [MESSFOR], the Netnews protocols are
sensitive to case in some instances (as in newsgroup names, some
header parameters, etc.). Care has been taken to indicate this
explicitly where required.
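
The distinction can be made concrete with a minimal Python sketch. It
assumes, for illustration only, that a header name such as "Newsgroups"
is matched as an ABNF literal while a newsgroup name is compared octet
for octet. The function names abnf_literal_eq and same_newsgroup are
invented for this example.

```python
def abnf_literal_eq(literal: bytes, candidate: bytes) -> bool:
    """Compare as an [RFC 2234] quoted literal: US-ASCII letters match
    regardless of case (bytes.lower() only alters ASCII letters)."""
    return literal.lower() == candidate.lower()


def same_newsgroup(a: bytes, b: bytes) -> bool:
    """Compare newsgroup names exactly, since case is significant."""
    return a == b


print(abnf_literal_eq(b"Newsgroups", b"NEWSGROUPS"))
print(same_newsgroup(b"comp.lang.c", b"Comp.Lang.C"))
```

The first comparison succeeds and the second fails, which is the
behaviour the NOTE describes.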
#Diff to first older
--- ../s-o-1036/Syntax_Notation.out June 1994
+++ ../usefor-article-03/Syntax_Notation.out February 2000
@@ -1,47 +1,134 @@
-2.2. Syntax Notation
+2.4. Syntax Notation
-Although the mechanisms specified in this Draft are all
-described in prose, most are also described formally in the
-modified BNF notation of RFC 822. Implementors will need to
-be familiar with this notation to fully understand this
-specification, and are referred to RFC 822 for a complete
-explanation of the modified BNF notation. Here is a brief
-illustrative example:
-
- sentence = clause *( punct clause ) "."
- punct = ":" / ";"
- clause = 1*word [ "(" clause ")" / "," 1*word ]
- word = <any English word>
-
-This defines a sentence as some clauses separated by puncts
-and ended by a period, a punct as a colon or semicolon, a
-clause as at least one <word> optionally followed by either
-a parenthesized clause or a comma and at least one more
-<word>, and a <word> as (informally) any English word. <>
-are used to enclose names when (and only when) distinguish-
-ing them from surrounding text is useful. The full form of
-the repetition notation is <m>"*"<n><thing>, denoting <m>
-through <n> repetitions of <thing>; <m> defaults to zero,
-<n> to infinity, and the "*" and <n> can be omitted if <m>
-and <n> are equal, so 1*word is one or more words, 1*5word
-is one through five words, and 2word is exactly two words.
-
-The character "\" is not special in any way in this nota-
-tion.
-
-This Draft is intended to be self-contained; all syntax
-rules used in it are defined within it, and a rule with the
-same name as one found in MAIL does not necessarily have the
-same definition. The lexical layer of MAIL is NOT, repeat
-NOT, used in this Draft, and its presence must not be
-assumed; notably, this Draft spells out all places where
-
-INTERNET DRAFT to be NEWS sec. 2.2
-
-
-white space is permitted/required and all places where con-
-structs resembling MAIL comments can occur.
-
- NOTE: News parsers historically have been much
- less permissive than MAIL parsers.
+ This standard uses the Augmented Backus Naur Form described in [RFC
+ 2234]. A discussion of this is outside the bounds of this standard,
+ but it is expected that implementors will be able to quickly
+ understand it with reference to the defining document.
+
+ Much of the syntax of News Articles is based on the corresponding
+ syntax defined in [MESSFOR] or in the MIME specifications [RFC 2045]
+ et seq, which is deemed to have been incorporated into this standard
+ as required. However, there are some important differences arising
+ from the fact that [MESSFOR] does not recognise anything other than
+ US-ASCII characters, that it does not recognise the MIME headers [RFC
+ 2045], and that it includes much syntax described as "obsolete".
+
+ NOTE: News parsers historically have been much less permissive
+ than Mail parsers, and this is reflected in the modifications
+ referred to, and in some further specific rules.
+
+ The following syntactic forms therefore supersede the corresponding
+ rules given in [MESSFOR] and [RFC 2045], thus allowing UTF-8
+ characters [RFC 2044] to appear in certain contexts (the four rules
+ beginning with "strict-" reflect the corresponding original rules from
+ [MESSFOR]).
+
+ UTF8-xtra-head = %d192-253
+ UTF8-xtra-tail = %d128-191
+ UTF8-xtra-char = UTF8-xtra-head 1*UTF8-xtra-tail
+ text = %d1-9 / ; all UTF-8 characters except
+ %d11-12 / ; US-ASCII NUL, CR and LF
+ %d14-127 /
+ UTF8-xtra-char
+ ctext = NO-WS-CTL / ; all of <text> except
+ %d33-39 / ; SP, HTAB, "(", ")"
+ %d42-91 / ; and "\"
+ %d93-126 /
+ UTF8-xtra-char
+ qtext = NO-WS-CTL / ; all of <text> except
+ %d33 / ; SP, HTAB, "\" and DQUOTE
+ %d35-91 /
+ %d93-126 /
+ UTF8-xtra-char
+ utext = NO-WS-CTL / ; Non white space controls
+ %d33-126 / ; The rest of US-ASCII
+ UTF8-xtra-char
+ strict-text = %d1-9 / ; text restricted to
+ %d11-12 / ; US-ASCII
+ %d14-127
+ strict-qtext = NO-WS-CTL / ; qtext restricted to
+ %d33 / ; US-ASCII
+ %d35-91 /
+ %d93-127
+ strict-quoted-pair
+ = "\" strict-text
+ strict-quoted-string
+ = [CFWS] DQUOTE
+ *([FWS] (strict-qtext / strict-quoted-pair))
+ [FWS] DQUOTE [CFWS]
+
+ NOTE: There are sequences of octets which cannot legitimately
+ occur in UTF-8, even a few permitted by the above syntax. These
+ SHOULD NOT be generated by posting agents but, where they occur
+ inadvertently, they SHOULD be passed on untouched by other
+ agents.
+
+ Wherever in this standard the syntax is stated to be taken from
+ [MESSFOR], it is to be understood as the syntax defined by [MESSFOR]
+ after making the above changes, but NOT including any syntax defined
+ in section 4 ("Obsolete syntax") of [MESSFOR]. Software compliant
+ with this standard MUST NOT generate any of the syntactic forms
+ defined in that Obsolete Syntax, although it MAY accept such
+ syntactic forms. Certain syntax from the MIME specifications [RFC
+ 2045] et seq is also considered a part of this standard (see 6.17).
+
+ The following syntactic forms, taken from [RFC 2234] or from
+ [MESSFOR], are repeated here for convenience only:
+
+ ALPHA = %x41-5A / ; A-Z
+ %x61-7A ; a-z
+ CR = %x0D ; carriage return
+ CRLF = CR LF
+ DIGIT = %x30-39 ; 0-9
+ HTAB = %x09 ; horizontal tab
+ LF = %x0A ; line feed
+ SP = %x20 ; space
+ NO-WS-CTL = %d1-8 / ; US-ASCII control characters
+ %d11 / ; which do not include the
+ %d12 / ; carriage return, line feed,
+ %d14-31 / ; and whitespace characters
+ %d127
+ WSP = SP / HTAB ; Whitespace characters
+ FWS = ([*WSP CRLF] 1*WSP); Folding whitespace
+ atext = ALPHA / DIGIT /
+ "!" / "#" / ; Any character except
+ "$" / "%" / ; controls, SP, and specials.
+ "&" / "'" / ; Used for atoms
+ "*" / "+" /
+ "-" / "/" /
+ "=" / "?" /
+ "^" / "_" /
+ "`" / "{" /
+ "|" / "}" /
+ "~"
+ atom = [CFWS] 1*atext [CFWS]
+ dot-atom = [CFWS] dot-atom-text [CFWS]
+ dot-atom-text = 1*atext *( "." 1*atext )
+ comment = "(" *([FWS]
+ (ctext / quoted-pair / comment)) [FWS] ")"
+ CFWS = *([FWS] comment) (([FWS] comment) / FWS )
+ DQUOTE = %d34 ; quote mark
+ quoted-pair = "\" text
+ quoted-string = [CFWS] DQUOTE
+ *([FWS] (qtext / quoted-pair))
+ [FWS] DQUOTE [CFWS]
+ unstructured = *( [FWS] utext ) [FWS]
+
+ NOTE: CFWS occurs at many places in the syntax in order to allow
+ comments and extra whitespace to be inserted almost anywhere.
+ The syntax is in fact ambiguous insofar as it may be impossible
+ to tell in which of several possible ways a given comment or WS
+ was produced. However, this does not lead to semantic ambiguity
+ because, unless specifically stated otherwise, the presence or
+ absence of a comment or additional WS has no semantic meaning
+ and, in particular, it is a matter of indifference whether it
+ forms a part of the syntactic construct preceding it or the one
+ following it.
+
+ NOTE: Following [RFC 2234], literal text included in the syntax
+ is to be regarded as case-insensitive. However, in
+ contradistinction to [MESSFOR], the Netnews protocols are
+ sensitive to case in some instances (as in newsgroup names, some
+ header parameters, etc.). Care has been taken to indicate this
+ explicitly where required.