usefor-article-03 February 2000
5.5. Newsgroups
The Newsgroups header's content specifies which newsgroup(s) the
article is posted to. It is an inheritable header (4.2.2.2) which
SHOULD then become the default Newsgroups header of any followup,
unless a Followup-To header is present to prescribe otherwise.
   Newsgroups-content  = newsgroup-name
                            *( *FWS ng-delim *FWS newsgroup-name )
                            *FWS
   newsgroup-name      = component *( "." component )
   component           = component-start
                            *( component-start / component-other )
   component-start     = Un-lowercase / Un-digit
   Un-lowercase        = <Unicode Letter, Lowercase> /
                            <Unicode Letter, Other>
   Un-digit            = <Unicode Number, Decimal Digit> /
                            <Unicode Number, Other>
   component-other     = "+" / "-" / "_"
   ng-delim            = ","
where the <Unicode ...> items are as described in [UNICODE].
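As an illustration only, the grammar above might be checked along the following lines. This is a minimal sketch, not part of the draft: it assumes that the <Unicode ...> items correspond to the general categories Ll, Lo, Nd and No as reported by Python's unicodedata module, and it approximates FWS by stripping spaces, tabs, CR and LF around each name (which is slightly more lenient than the syntax, since it also tolerates leading white space). The function names are hypothetical.

```python
import unicodedata

# Assumed mapping of the grammar's <Unicode ...> items to general categories:
# Ll = Letter, Lowercase; Lo = Letter, Other;
# Nd = Number, Decimal Digit; No = Number, Other.
START_CATEGORIES = {"Ll", "Lo", "Nd", "No"}
OTHER_CHARS = {"+", "-", "_"}  # component-other


def is_component(text):
    """Return True if text matches the 'component' production."""
    if not text or unicodedata.category(text[0]) not in START_CATEGORIES:
        return False
    return all(
        ch in OTHER_CHARS or unicodedata.category(ch) in START_CATEGORIES
        for ch in text[1:]
    )


def is_newsgroup_name(name):
    """Return True if name is one or more components joined by '.'."""
    return all(is_component(part) for part in name.split("."))


def parse_newsgroups_content(content):
    """Split on ng-delim, drop surrounding white space, validate each name."""
    names = [part.strip(" \t\r\n") for part in content.split(",")]
    bad = [name for name in names if not is_newsgroup_name(name)]
    if bad:
        raise ValueError("invalid newsgroup-name(s): %r" % bad)
    return names
```

For example, parse_newsgroups_content("comp.lang.c,\r\n news.misc") yields the two names with the folding removed, whereas a name containing an uppercase letter or an empty component is rejected.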
The inclusion of folding white space within a Newsgroups-content is a
newly introduced feature in this standard. It MUST be accepted by all
conforming implementations (relaying agents, serving agents and
reading agents). Posting agents should be aware that such postings
may be rejected by overly-critical old-style relaying agents. When a
sufficient number of relaying agents are in conformance, posting
agents SHOULD generate such whitespace in the form of <CRLF WS> so as
to keep the length of lines in the relevant headers (notably
Newsgroups and Followup-To) to no more than 79 characters (or
other agreed policy limit - see 4.5). Before such critical mass
occurs, injecting agents MAY reformat such headers by removing
whitespace inserted by the posting agent, but relaying agents MUST
NOT do so.
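The folding described above could be generated by a posting agent roughly as follows. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, the limit is counted in characters of the header line including the header name, and white space is inserted only where a fold is actually needed (as <CRLF WS> after a comma), so that short headers remain acceptable to old-style relaying agents.

```python
def fold_newsgroups(names, limit=79, header="Newsgroups"):
    """Build a header line, folding after commas to respect the limit."""
    # Every name except the last carries its trailing ng-delim.
    pieces = [name + "," for name in names[:-1]] + names[-1:]
    first = header + ": "
    lines = [first]
    for piece in pieces:
        # The first name always stays on the header line itself.
        if lines[-1] == first or len(lines[-1]) + len(piece) <= limit:
            lines[-1] += piece
        else:
            # Continuation line: CRLF is added by the join, then one space.
            lines.append(" " + piece)
    return "\r\n".join(lines)
```

A single newsgroup-name longer than the limit cannot be folded by this method, since the syntax permits white space only around the commas.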
A newsgroup-name consists of one or more components. Components MAY
contain non-ASCII letters, but these MUST be encoded in UTF-8 and not
according to [RFC 2047]. A component MUST contain at least one
letter (and MUST, according to the syntax, begin with a letter or
digit). Components SHOULD begin with a letter. Composite characters
(made by overlaying one character with another) and format
characters, as allowed in certain parts of Unicode and needed by
certain languages, must use whatever canonical conventions apply to
those parts of Unicode (such conventions are not defined in this
Standard). The use of "_" in a component is deprecated. Serving
agents MAY refuse to accept newsgroups using such a component.
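The requirements of the preceding paragraph that go beyond the bare syntax might be checked as sketched below. This is illustrative only: the function name is hypothetical, "letter" is assumed to mean any character whose Unicode general category begins with "L", and the split into errors (MUST) and warnings (SHOULD, deprecated) is one possible reading.

```python
import unicodedata


def is_letter(ch):
    """Assumption: a letter is any character in a Unicode 'L*' category."""
    return unicodedata.category(ch).startswith("L")


def check_component(component):
    """Return (errors, warnings) for one syntactically valid component."""
    errors, warnings = [], []
    if not any(is_letter(ch) for ch in component):
        # MUST contain at least one letter.
        errors.append("no letter")
    elif not is_letter(component[0]):
        # SHOULD begin with a letter.
        warnings.append("does not begin with a letter")
    if "_" in component:
        # Deprecated; serving agents MAY refuse such newsgroups.
        warnings.append("uses deprecated '_'")
    return errors, warnings
```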
NOTE: Components composed entirely of digits would cause
problems for the commonly used implementation technique of using
the component as the name of a directory, whilst also using
sequential numbers to distinguish the articles within a group.
Components containing other non-permitted characters could cause
problems when newsgroup-names appear in URLs [RFC 1738] (for
example an '@' character would prevent distinguishing between
newsgroup-names and message identifiers).
NOTE: According to the syntax, uppercase letters cannot occur in
newsgroup-names, but this standard imposes no requirement on
software to check this condition, since it would be unreasonable
to expect it to do so in parts of Unicode for which it was not
configured (in general, a table lookup is required). Rather, it
is the responsibility of those creating new newsgroups (7.1) not
to violate it. It is, moreover, to be expected that a newsgroup
created in violation of this condition will not be propagated
particularly well.
Whilst there is no longer any technical reason to limit the length of
a component (formerly, it was limited to 14 characters) nor to limit
the total length of a newsgroup-name, it should be noted that these
names are also used in the newsgroups line (7.1.2) where an overall
policy limit applies, and moreover excessively long names can be
exceedingly inconvenient in practical use. Agencies responsible for
individual hierarchies SHOULD therefore, as a matter of policy, set
reasonable limits for the length of a component and of a newsgroup-
name. In the absence of such explicit policies, the default figures
are 30 characters and 71 characters respectively.
[If the checkpolicies proposal is included in the Standard, there should
be a reference to it here.]
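The default figures given above (30 characters for a component, 71 for a newsgroup-name) could be applied as in the following sketch. The names are hypothetical, and it is an assumption here that "characters" means Unicode characters rather than UTF-8 octets; a hierarchy with its own policy would pass its own limits.

```python
DEFAULT_MAX_COMPONENT = 30  # default component limit from the text above
DEFAULT_MAX_NAME = 71       # default newsgroup-name limit from the text above


def length_problems(name,
                    max_component=DEFAULT_MAX_COMPONENT,
                    max_name=DEFAULT_MAX_NAME):
    """Return a list of policy-limit violations for a newsgroup-name."""
    problems = []
    if len(name) > max_name:
        problems.append("name exceeds %d characters" % max_name)
    for component in name.split("."):
        if len(component) > max_component:
            problems.append(
                "component %r exceeds %d characters" % (component, max_component)
            )
    return problems
```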
NOTE: The newsgroup-name as encoded in UTF-8 should be regarded
as the canonical form. Reading agents may convert it to whatever
character set they are able to display (see 4.4.1) and serving
agents may possibly need to convert it to some form more
suitable as a filename. Simple algorithms for both kinds of
conversion are readily available. Observe that the syntax does
not allow comments within the Newsgroups header; this is to
simplify processing by relaying and serving agents which have a
requirement to process this header extremely rapidly.
Posters SHOULD use only the names of existing newsgroups in the
Newsgroups header. However, it is legitimate to cross-post to
newsgroup(s) which do not exist on the posting agent's host, provided
that at least one of the newsgroups DOES exist there, and followup
agents SHOULD accept this (posting agents MAY accept it, but SHOULD
at least alert the poster to the situation and request confirmation).
Relaying agents MUST NOT rewrite Newsgroups headers in any way, even
if some or all of the newsgroups do not exist on the relaying agent's
host. Serving agents MUST NOT create new newsgroups simply because an
unrecognised newsgroup-name occurs in a Newsgroups header (see 7.1
for the correct method of newsgroup creation).
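One way a posting agent might apply the cross-posting rule above is sketched here. The function, its return values and the set of locally known groups are all hypothetical. Returning "reject" when none of the newsgroups exists locally is an inference from the "at least one" proviso rather than an explicit requirement, and "confirm" reflects the SHOULD for posting agents to alert the poster.

```python
def check_posting(newsgroups, local_groups):
    """Classify a proposed Newsgroups list against locally known groups.

    Returns (action, unknown), where action is "accept", "confirm" or
    "reject" and unknown lists the names not known on this host.
    """
    unknown = [group for group in newsgroups if group not in local_groups]
    if len(unknown) == len(newsgroups):
        # No listed newsgroup exists here (inferred, see lead-in).
        return "reject", unknown
    if unknown:
        # Legitimate, but the poster should be alerted and asked to confirm.
        return "confirm", unknown
    return "accept", []
```

Note that the Newsgroups header itself is never altered here: the unknown names are only reported, in keeping with the rule that such headers are not rewritten in transit.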
The Newsgroups header is intended for use in Netnews articles rather
than in mail messages. It MAY be used in a mail message to indicate
that it is a copy also posted to the listed newsgroups, but it SHOULD
NOT be used in a mail-only reply to a Netnews article (thus the
"inheritable" property of this header applies only to followups to a
newsgroup, and not to followups to the poster). Moreover, if a
newsgroup-name contains any non-ASCII character, it MAY be encoded
using the mechanism defined in [RFC 2047] when sent by mail but, if
it is subsequently returned to the Netnews environment, it MUST then
be re-encoded into UTF-8.
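The re-encoding step on return from mail might look like the sketch below, which uses the decode_header and make_header functions of Python's standard email.header module. The function name is hypothetical, and only the simple case is covered: a header value that is either plain ASCII or a single [RFC 2047] encoded-word. How encoded-words are mixed with commas and other names in a mail header is not addressed here.

```python
from email.header import decode_header, make_header


def newsgroups_from_mail(value):
    """Decode any RFC 2047 encoded-word and return the UTF-8 octets."""
    # decode_header splits the value into (data, charset) chunks;
    # make_header reassembles them into a Unicode string.
    text = str(make_header(decode_header(value)))
    return text.encode("utf-8")
```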
#Diff to first older
--- ../s-o-1036/Newsgroups.out June 1994
+++ ../usefor-article-03/Newsgroups.out February 2000
@@ -1,272 +1,112 @@
5.5. Newsgroups
-The Newsgroups header's content specifies which newsgroup(s)
-the article is posted to:
-
- Newsgroups-content = newsgroup-name *( ng-delim newsgroup-name )
- newsgroup-name = plain-component *( "." component )
- component = plain-component / encoded-word
- plain-component = component-start *13component-rest
- component-start = lowercase / digit
- lowercase = <letter a-z>
- component-rest = component-start / "+" / "-" / "_"
+ The Newsgroups header's content specifies which newsgroup(s) the
+ article is posted to. It is an inheritable header (4.2.2.2) which
+ SHOULD then become the default Newsgroups header of any followup,
+ unless a Followup-To header is present to prescribe otherwise.
+
+ Newsgroups-content = newsgroup-name
+ *( *FWS ng-delim *FWS newsgroup-name )
+ *FWS
+ newsgroup-name = component *( "." component )
+ component = component-start
+ *( component-start / component-other )
+ component-start = Un-lowercase / Un-digit
+ Un-lowercase = <Unicode Letter, Lowercase> /
+ <Unicode Letter, Other>
+ Un-digit = <Unicode Number, Decimal Digit> /
+ <Unicode Number, Other>
+ component-other = "+" / "-" / "_"
ng-delim = ","
+ where the <Unicode ...> items are as described in [UNICODE].
-Encoded words used in newsgroup names MUST not contain char-
-acters other than letters, digits, "+", "-", "/", "_", "=",
-and "?" (although they may encode them).
-
-A newsgroup name consists of one or more components, which
-may be plain components or (except for the first) encoded
-words. A plain component MUST contain at least one letter,
-MUST begin with a letter or digit, and MUST not be longer
-than 14 characters. The first component MUST begin with a
-letter; subsequent components SHOULD begin with a letter.
-Newsgroup names MUST not contain uppercase letters, except
-where required by encodings in encoded words. The sequences
-"all" and "ctl" MUST not be used as components.
-
- NOTE: The alphabet and syntax specified encom-
- passes all existing names of widespread news-
- groups, while avoiding various forms that are
- known to cause problems. Important existing soft-
- ware uses various non-alphanumeric characters as
- punctuation adjacent to newsgroup names. (It
- would, in fact, be preferable to ban "+" from
- newsgroup names, were it not that several
- widespread newsgroups related to the C++ program-
- ming language already use it.)
-
- NOTE: Much existing software converts the news-
- group name into a directory path and stores the
- articles themselves using numeric filenames, so
-
-INTERNET DRAFT to be NEWS sec. 5.5
-
-
- all-digit name components can be troublesome; the
- "Great Renaming" early in the history of Usenet
- included revisions of several newsgroup names to
- eliminate such components.
-
- NOTE: The same storage technique is the reason for
- the 14-character limit. The limit is now largely
- historical, since most modern systems have much
- larger limits on the length of a directory entry's
- name, but many old systems are still in use. Sys-
- tems with shorter limits also exist, but news
- software on such systems has had to deal with the
- problem already, since there are several
- widespread newsgroups with 14-character components
- in their names. Implementors are warned that it
- is intended that the successor to this Draft will
- increase the 14-character limit, and are urged to
- fix their software to handle longer names grace-
- fully (if such fixes are necessary, given the
- intended domain of application of the particular
- software).
-
- NOTE: The requirement that the first character of
- a name be a letter accommodates existing software
- which assumes it can tell the difference between a
- newsgroup name and other possible syntactic enti-
- ties by inspecting the first character. Similar
- considerations motivate excluding "+", "-", and
- "_" from coming first in a component, and the
- preference for components that do not begin with
- digits. The "all" sequence is used as a wildcard
- symbol in much existing software, and the "ctl"
- sequence was involved in an obsolete historical
- mechanism for marking control messages, so they
- are best avoided.
-
- NOTE: Possibly newsgroup names should have been
- case-insensitive, but all existing software treats
- them as case-sensitive. (RFC 977 [rrr] claims
- that they are case-insensitive in NNTP, but exist-
- ing implementations are believed to ignore this.)
- The simplest solution is just to ban use of upper-
- case letters, since no widespread newsgroup name
- uses them anyway; this avoids any possibility of
- confusion.
-
- NOTE: The syntax has the disadvantage of contain-
- ing no white space, making it impossible to con-
- tinue a Newsgroups header across several lines.
- Implementors of relayers and reading agents are
- warned that it is intended that the successor to
- this Draft will change the definition of ng-delim
- to:
-
-INTERNET DRAFT to be NEWS sec. 5.5
-
-
- ng-delim = "," [ space ]
-
- and are urged to fix their software to handle
- (i.e., ignore) white space following the commas.
- Meanwhile, posters must avoid inserting such space
- (despite the natural-language convention which
- permits it) and posting agents should strip it
- out.
-
- NOTE: Encoded words as components are somewhat
- problematic, but are clearly desirable for use in
- non-English-speaking nations. They are not sub-
- ject to the 14-character limit, and this (plus the
- possibility of "/" within them) may require spe-
- cial handling in news software.
-
-Encoded words are allowed in newsgroup names ONLY where non-
-ASCII characters are necessary to the name, and must use the
-"b" encoding [rrr] and the first suitable character set in
-the MIME order of preferred character sets [rrr].
-
- NOTE: Since the newsgroup name is the encoded
- form, NOT the underlying non-ASCII form, there is
- room for terrible confusion here if the choice of
- encoding for a particular name is not fully stan-
- dardized.
-
-Posters SHOULD use only the names of existing newsgroups in
-the Newsgroups header, because newsgroups are NOT created
-simply by being posted to. However, it is legitimate to
-cross-post to newsgroup(s) which do not exist on the posting
-agent's host, provided that at least one of the newsgroups
-DOES exist there, and followup agents MUST accept this
-(posting agents MAY accept it, but SHOULD at least alert the
-poster to the situation and request confirmation). Relayers
-MUST not rewrite Newsgroups headers in any way, even if some
-or all of the newsgroups do not exist on the relayer's host.
-
- NOTE: Early experience with news software that
- created newsgroups when they were mentioned in a
- Newsgroups header was thoroughly negative: posters
- frequently mistype newsgroup names.
-
- NOTE: While it is legitimate for some of an arti-
- cle's newsgroups not to exist on the host where it
- is posted, this IS a rather unusual situation
- except in followups (which should go to all news-
- groups the precursor was posted to, even if not
- all of them reach the site where the followup is
- being posted).
-
- NOTE: Rewriting Newsgroups headers to strip
- locally-unknown newsgroups is superficially
- attractive. However, early experience with
-
-INTERNET DRAFT to be NEWS sec. 5.5
-
-
- exactly that policy was thoroughly negative: news
- propagation is more redundant and much less
- orderly than many people imagine, and in particu-
- lar it is not unheard-of for the (sometimes)
- fastest path between two (say) U of Toronto sites
- to pass outside U of Toronto... in which case
- newsgroup stripping can cause incomplete propaga-
- tion. Having an article's set of newsgroups
- change as it propagates can also result in fol-
- lowups not achieving the same propagation as the
- original. It's been tried; it's more trouble than
- it's worth; don't do it.
-
- NOTE: In particular, newsgroup stripping superfi-
- cially looks like a solution to the problem of
- duplicate regional newsgroup names. For example,
- both University of Toronto and University of Texas
- have "ut.general" newsgroups, and material cross-
- posted to that name and a global newsgroup appears
- in both universities' local newsgroups. However,
- the side effects of stripping are sufficiently
- unacceptable to disqualify it for this purpose.
- Don't do it.
-
-Cross-posting an article to several relevant newsgroups is
-far superior to posting separate articles with duplicated
-content to each newsgroup, because reading agents can detect
-the situation and show the article to a reader only once.
-Posters SHOULD cross-post rather than duplicate-post.
-
- NOTE: On the other hand, cross-posting to a large
- number of newsgroups usually indicates that the
- poster has not thought about his audience; arti-
- cles are rarely pertinent to more than (say) half
- a dozen newsgroups. Posting agents might wish to
- request confirmation when the number of newsgroups
- exceeds (say) five in the presence of a Followup-
- To header, or (say) two in the absence of such a
- header.
-
- NOTE: One problem with cross-postings is what to
- do with an article cross-posted to a set of news-
- groups including both moderated and unmoderated
- ones. Posters tend to expect such an article to
- show up immediately in the unmoderated newsgroups,
- especially if they do not realize that one or more
- of the newsgroups is moderated. However, since it
- is not possible for a moderator to retroactively
- add an already-posted article to a moderated news-
- group, the only correct action is to mail such an
- article to one (and only one) of the moderators
- for action. It is probably best for the posting
- agent to detect this situation and ask the poster
- what action is preferred. The acceptable choices
-
-INTERNET DRAFT to be NEWS sec. 5.5
-
-
- are to alter the newsgroup list or to mail to a
- moderator of the poster's choice; the posting
- agent should NOT offer duplicate-posting as an
- easy-to-request option (if only because many mod-
- erators will reject a submission that has already
- been posted to unmoderated newsgroups).
-
- NOTE: An article cross-posted to multiple moder-
- ated newsgroups really should have approval from
- all the moderators involved. In practice, the
- only straightforward way to do this is to send the
- article to one of them and have him consult the
- others.
-
-A newsgroup SHOULD not appear more than once in the News-
-groups header.
-
-Newsgroup names having only one component are reserved for
-newsgroups whose propagation is restricted to a single host
-(or the administrative equivalent). It is inadvisable to
-name a newsgroup "poster" because that word has special
-meaning in the Followup-To header (see section 6.1). The
-names "control" and "junk" are frequently used for pseudo-
-newsgroups internal to relayer implementations, and hence
-are also best avoided.
-
- NOTE: Beware of the duplicate-regional-newsgroup-
- names problem mentioned above. In particular,
- there are many, many hosts with a newsgroup named
- "general", and some surprising things show up in
- such newsgroups when people cross-post. It is
- probably better to use multi-component names,
- which are less likely to be duplicated. Fred's
- Widget House should use "fwh.general" rather than
- just "general" as its in-house general-topics
- newsgroup.
-
-It is conventional to reserve newsgroup names beginning with
-"to." for test messages sent on an essentially point-to-
-point basis (see also the ihave/sendme protocol described in
-section 7.2); newsgroup names beginning with "to." SHOULD
-not be used for any other purpose. The second (and possibly
-later) components of such a name should, together, comprise
-the relayer name (see section 5.6) of a relayer. The news-
-group exists only at the named relayer and its neighbors.
-The neighbors all pass that newsgroup to the named relayer,
-while the named relayer does not pass it to anyone.
-
-The order of newsgroup names in the Newsgroups header is not
-significant.
-
-INTERNET DRAFT to be NEWS sec. 5.6
+ The inclusion of folding white space within a Newsgroups-content is a
+ newly introduced feature in this standard. It MUST be accepted by all
+ conforming implementations (relaying agents, serving agents and
+ reading agents). Posting agents should be aware that such postings
+ may be rejected by overly-critical old-style relaying agents. When a
+ sufficient number of relaying agents are in conformance, posting
+ agents SHOULD generate such whitespace in the form of <CRLF WS> so as
+ to keep the length of lines in the relevant headers (notably
+ Newsgroups and Followup-To) to no more than 79 characters (or
+ other agreed policy limit - see 4.5). Before such critical mass
+ occurs, injecting agents MAY reformat such headers by removing
+ whitespace inserted by the posting agent, but relaying agents MUST
+ NOT do so.
+
+ A newsgroup-name consists of one or more components. Components MAY
+ contain non-ASCII letters, but these MUST be encoded in UTF-8 and not
+ according to [RFC 2047]. A component MUST contain at least one
+ letter (and MUST, according to the syntax, begin with a letter or
+ digit). Components SHOULD begin with a letter. Composite characters
+ (made by overlaying one character with another) and format
+ characters, as allowed in certain parts of Unicode and needed by
+ certain languages, must use whatever canonical conventions apply to
+ those parts of Unicode (such conventions are not defined in this
+ Standard). The use of "_" in a component is deprecated. Serving
+ agents MAY refuse to accept newsgroups using such a component.
+
+ NOTE: Components composed entirely of digits would cause
+ problems for the commonly used implementation technique of using
+ the component as the name of a directory, whilst also using
+ sequential numbers to distinguish the articles within a group.
+ Components containing other non-permitted characters could cause
+ problems when newsgroup-names appear in URLs [RFC 1738] (for
+ example an '@' character would prevent distinguishing between
+ newsgroup-names and message identifiers).
+
+ NOTE: According to the syntax, uppercase letters cannot occur in
+ newsgroup-names, but this standard imposes no requirement on
+ software to check this condition, since it would be unreasonable
+ to expect it to do so in parts of Unicode for which it was not
+ configured (in general, a table lookup is required). Rather, it
+ is the responsibility of those creating new newsgroups (7.1) not
+ to violate it. It is, moreover, to be expected that a newsgroup
+ created in violation of this condition will not be propagated
+ particularly well.
+
+ Whilst there is no longer any technical reason to limit the length of
+ a component (formerly, it was limited to 14 characters) nor to limit
+ the total length of a newsgroup-name, it should be noted that these
+ names are also used in the newsgroups line (7.1.2) where an overall
+ policy limit applies, and moreover excessively long names can be
+ exceedingly inconvenient in practical use. Agencies responsible for
+ individual hierarchies SHOULD therefore, as a matter of policy, set
+ reasonable limits for the length of a component and of a newsgroup-
+ name. In the absence of such explicit policies, the default figures
+ are 30 characters and 71 characters respectively.
+[If the checkpolicies proposal is included in the Standard, there should
+be a reference to it here.]
+ NOTE: The newsgroup-name as encoded in UTF-8 should be regarded
+ as the canonical form. Reading agents may convert it to whatever
+ character set they are able to display (see 4.4.1) and serving
+ agents may possibly need to convert it to some form more
+ suitable as a filename. Simple algorithms for both kinds of
+ conversion are readily available. Observe that the syntax does
+ not allow comments within the Newsgroups header; this is to
+ simplify processing by relaying and serving agents which have a
+ requirement to process this header extremely rapidly.
+
+ Posters SHOULD use only the names of existing newsgroups in the
+ Newsgroups header. However, it is legitimate to cross-post to
+ newsgroup(s) which do not exist on the posting agent's host, provided
+ that at least one of the newsgroups DOES exist there, and followup
+ agents SHOULD accept this (posting agents MAY accept it, but SHOULD
+ at least alert the poster to the situation and request confirmation).
+ Relaying agents MUST NOT rewrite Newsgroups headers in any way, even
+ if some or all of the newsgroups do not exist on the relaying agent's
+ host. Serving agents MUST NOT create new newsgroups simply because an
+ unrecognised newsgroup-name occurs in a Newsgroups header (see 7.1
+ for the correct method of newsgroup creation).
+
+ The Newsgroups header is intended for use in Netnews articles rather
+ than in mail messages. It MAY be used in a mail message to indicate
+ that it is a copy also posted to the listed newsgroups, but it SHOULD
+ NOT be used in a mail-only reply to a Netnews article (thus the
+ "inheritable" property of this header applies only to followups to a
+ newsgroup, and not to followups to the poster). Moreover, if a
+ newsgroup-name contains any non-ASCII character, it MAY be encoded
+ using the mechanism defined in [RFC 2047] when sent by mail but, if
+ it is subsequently returned to the Netnews environment, it MUST then
+ be re-encoded into UTF-8.