usefor-article-07 May 2002
2.4.1. Syntax Notation
This standard uses the Augmented Backus Naur Form described in [RFC
2234].
In particular, it makes significant use of the "incremental
alternative" feature of that notation (the "=/" operator), which adds
further alternatives to a rule that has already been defined. For
example, the two rules
      header = other-header
      header =/ Date-header
are equivalent to the single rule
      header = other-header / Date-header
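The equivalence above can be illustrated with a short Python sketch.
It is not part of this standard or of [RFC 2234]; the names RULE_RE,
merge_rules and render are hypothetical, and the sketch assumes each
rule fits on one line and carries no ";" comment.

```python
import re

# Hypothetical helper for illustration only: matches a one-line ABNF rule
# of the form "name = elements" or "name =/ elements".
RULE_RE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9-]*)\s*(=/|=)\s*(.+?)\s*$")


def merge_rules(lines):
    """Fold "=/" rules into the rule they extend.

    Returns a dict mapping each rule name (as first written) to the list
    of alternatives collected for it, in order of appearance.
    """
    rules = {}  # lower-cased name -> (name as first written, alternatives)
    for line in lines:
        match = RULE_RE.match(line)
        if match is None:
            continue  # blank line, or not a single-line rule
        name, operator, body = match.groups()
        key = name.lower()  # ABNF rule names are case-insensitive
        if operator == "=":
            if key in rules:
                raise ValueError(f"rule {name!r} defined twice")
            rules[key] = (name, [body])
        else:
            if key not in rules:
                raise ValueError(f"'=/' used before {name!r} was defined")
            rules[key][1].append(body)
    return {name: alternatives for name, alternatives in rules.values()}


def render(merged):
    """Write each merged rule back out as a single "name = a / b" line."""
    return [f"{name} = {' / '.join(alts)}" for name, alts in merged.items()]


if __name__ == "__main__":
    draft_rules = [
        "header = other-header",
        "header =/ Date-header",
    ]
    for rule in render(merge_rules(draft_rules)):
        print(rule)
```

Joining the collected alternatives with " / " is safe here because
alternation has the lowest precedence in the notation, so no extra
grouping is needed when a later rule is appended.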
#Diff to first older
--- ../usefor-article-06/Syntax_Notation.out November 2001
+++ ../usefor-article-07/Syntax_Notation.out May 2002
@@ -1,160 +1,12 @@
-2.4. Syntax Notation
+2.4.1. Syntax Notation
This standard uses the Augmented Backus Naur Form described in [RFC
- 2234]. A discussion of this is outside the bounds of this standard,
- but it is expected that implementors will be able quickly to
- understand it with reference to that defining document.
+ 2234].
- Much of the syntax of News Articles is based on the corresponding
- syntax defined in [RFC 2822] or in the MIME specifications [RFC 2045]
- et seq, which is deemed to have been incorporated into this standard
- as required. However, there are some important differences arising
- from the fact that [RFC 2822] does not recognise anything other than
- US-ASCII characters, that it does not recognise the MIME headers [RFC
- 2045], and that it includes much syntax described as "obsolete".
-
- NOTE: News parsers historically have been much less permissive
- than Mail parsers, and this is reflected in the modifications
- referred to, and in some further specific rules.
-
- The following syntactic forms therefore supersede the corresponding
- rules given in [RFC 2822] and [RFC 2045], thus allowing UTF-8
- characters [RFC 2279] to appear in certain contexts (the five rules
- begining with "strict-" reflect the corresponding original rules from
- [RFC 2822]).
-
- UTF8-xtra-2-head= %xC2-DF
- UTF8-xtra-3-head= %xE0 %xA0-BF / %xE1-EC %x80-BF /
- %xED %x80-9F / %xEE-EF %x80-BF
- UTF8-xtra-4-head= %xF0 %x90-BF / %xF1-F7 %x80-BF
- UTF8-xtra-5-head= %xF8 %x88-BF / %xF9-FB %x80-BF
- UTF8-xtra-6-head= %xFC %x84-BF / %xFD %x80-BF
- UTF8-xtra-tail = %x80-BF
- UTF8-xtra-char = UTF8-xtra-2-head 1( UTF8-xtra-tail ) /
- UTF8-xtra-3-head 1( UTF8-xtra-tail ) /
- UTF8-xtra-4-head 2( UTF8-xtra-tail ) /
- UTF8-xtra-5-head 3( UTF8-xtra-tail ) /
- UTF8-xtra-6-head 4( UTF8-xtra-tail )
- text = %d1-9 / ; all UTF-8 characters except
- %d11-12 / ; US-ASCII NUL, CR and LF
- %d14-127 /
- UTF8-xtra-char
- ctext = NO-WS-CTL / ; all of <text> except
- %d33-39 / ; SP, HTAB, "(", ")"
- %d42-91 / ; and "\"
- %d93-126 /
- UTF8-xtra-char
- qtext = NO-WS-CTL / ; all of <text> except
- %d33 / ; SP, HTAB, "\" and DQUOTE
- %d35-91 /
- %d93-126 /
- UTF8-xtra-char
- utext = NO-WS-CTL / ; Non white space controls
- %d33-126 / ; The rest of US-ASCII
- UTF8-xtra-char
- strict-text = %d1-9 / ; text restricted to
- %d11-12 / ; US-ASCII
- %d14-127
- strict-qtext = NO-WS-CTL / ; qtext restricted to
- %d33 / ; US-ASCII
- %d35-91 /
- %d93-127
- strict-quoted-pair
- = "\" strict-text
- strict-qcontent = strict-qtext / strict-quoted-pair
- strict-quoted-string
- = [CFWS]
- DQUOTE *([FWS] strict-qcontent) [FWS] DQUOTE
- [CFWS]
-
- The syntax for UTF8-xtra-char excludes those redundant sequences of
- octets which cannot occur in UTF-8, as defined by [RFC 2279], either
- because they would not be the shortest possible encodings of some UCS
- character, or they would represent one of the characters D800 through
- DFFF, disallowed in UCS because of their surrogate use in the UTF-16
- encoding. These sequences MUST NOT be generated by posting agents.
- Where they occur inadavertently, they MAY be passed on untouched by
- other agents, but they MUST NOT ever be interpreted as valid
- characters.
-
- Wherever in this standard the syntax is stated to be taken from [RFC
- 2822], it is to be understood as the syntax defined by [RFC 2822]
- after making the above changes, but NOT including any syntax defined
- in section 4 ("Obsolete syntax") of [RFC 2822]. Software compliant
- with this standard MUST NOT generate any of the syntactic forms
- defined in that Obsolete Syntax, although it MAY accept such
- syntactic forms. Certain syntax from the MIME specifications [RFC
- 2045] et seq is also considered a part of this standard (see 6.21).
-
- The following syntactic forms, taken from [RFC 2234] or from [RFC
- 2822], are repeated here for convenience only:
- ALPHA = %x41-5A / ; A-Z
- %x61-7A ; a-z
- CR = %x0D ; carriage return
- CRLF = CR LF
- DIGIT = %x30-39 ; 0-9
- HTAB = %x09 ; horizontal tab
- LF = %x0A ; line feed
- SP = %x20 ; space
- NO-WS-CTL = %d1-8 / ; US-ASCII control characters
- %d11 / ; which do not include the
- %d12 / ; carriage return, line feed,
- %d14-31 / ; and whitespace characters
- %d127
- specials = "(" / ")" / ; Special characters used in
- "<" / ">" / ; other parts of the syntax
- "[" / "]" /
- ":" / ";" /
- "@" / "
- "," / "." /
- DQUOTE
- WSP = SP / HTAB ; Whitespace characters
- FWS = ([*WSP CRLF] 1*WSP); Folding whitespace
- ccontent = ctext / quoted-pair / comment
- comment = "(" *([FWS] ccontent) [FWS] ")"
- CFWS = *([FWS] comment) (([FWS] comment) / FWS )
- DQUOTE = %d34 ; quote mark
- quoted-pair = "\" text
- atext = ALPHA / DIGIT /
- "!" / "#" / ; Any character except
- "$" / "%" / ; controls, SP, and specials.
- "&" / "'" / ; Used for atoms
- "*" / "+" /
- "-" / "/" /
- "=" / "?" /
- "^" / "_" /
- "`" / "}" /
- "|" / "}" /
- "~"
- atom = [CFWS] 1*atext [CFWS]
- dot-atom = [CFWS] dot-atom-text [CFWS]
- dot-atom-text = 1*atext *( "." 1*atext )
- qcontent = qtext / quoted-pair
- quoted-string = [CFWS]
- DQUOTE *([FWS] qcontent) [FWS] DQUOTE
- [CFWS]
- word = atom / quoted-string
- phrase = 1*word
- unstructured = *( [FWS] utext ) [FWS]
-
- NOTE: CFWS occurs at many places in the syntax in order to allow
- comments and extra whitespace to be inserted almost anywhere.
- The syntax is in fact ambiguous insofar as it may be impossible
- to tell in which of several possible ways a given comment or WS
- was produced. However, this does not lead to semantic ambiguity
- because, unless specifically stated otherwise, the presence of
- absence of a comment or additional WS has no semantic meaning
- and, in particular, it is a matter of indifference whether it
- forms a part of the syntactic construct preceding it or the one
- following it.
-
- NOTE: Following [RFC 2234], literal text included in the syntax
- is to be regarded as case-insensitive. However, in
- contradistinction to [RFC 2822], the Netnews protocols are
- sensitive to case in some instances (as in newsgroup names, some
- header parameters, etc.). Care has been taken to indicate this
- explicitly where required.
-
- The complete syntax defined in this standard is repeated, for
- convenience, in Appendix B.
+ In particular, it makes significant use of the "incremental
+ alternative" feature of that notation. For example, the two rules
+ header = other-header
+ header =/ Date-header
+ are equivalent to the single rule
+ header = other-header / Date-header