usefor-article-08 August 2002

4.4.2.  Character Sets within Article Bodies

   Within article bodies, characters are represented as octets according
   to the encoding scheme implied by any Content-Transfer-Encoding- and
   Content-Type-headers [RFC 2045].  In the absence of such headers,
   reading agents cannot be relied upon to display correctly more than
   the US-ASCII characters, though they MUST display at least those.
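
   As an illustration only, the rule above might be sketched as follows
   for a reading agent. The function name decode_body and its parameters
   are hypothetical and do not come from this draft; only the three
   commonest transfer encodings are distinguished, and an unrecognized
   charset name would raise LookupError in this sketch.

```python
import base64
import quopri
from typing import Optional


def decode_body(raw: bytes,
                cte: Optional[str] = None,
                charset: Optional[str] = None) -> str:
    """Turn body octets into text using the declared headers, if any.

    cte     -- value of the Content-Transfer-Encoding-header, or None
    charset -- charset parameter of the Content-Type-header, or None
    """
    cte = (cte or "7bit").strip().lower()
    if cte == "base64":
        raw = base64.b64decode(raw)
    elif cte == "quoted-printable":
        raw = quopri.decodestring(raw)
    # "7bit", "8bit" and "binary" carry the octets unchanged.

    # With no declared charset only US-ASCII can be relied upon, so any
    # other octet is shown as U+FFFD rather than being guessed at here.
    return raw.decode(charset or "us-ascii", errors="replace")
```

   With both headers present, decode_body(b"caf=E9", "quoted-printable",
   "iso-8859-1") yields the four characters of "café"; with neither
   header, the same word sent as raw Latin 1 octets keeps its three
   US-ASCII characters and shows a replacement character for the fourth.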

        NOTE: Observe that reading agents are not forbidden to "guess"
        when confronted with unannounced non-ASCII characters, and in
        particular it would be reasonable at least to test whether they
        were in the form of valid UTF-8 (see also the suggestion for
        such a test in 4.4.1).
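
        A minimal sketch of such a guess is given below. It is not
        necessarily the test suggested in 4.4.1; it simply relies on a
        strict UTF-8 decoder rejecting ill-formed sequences. The name
        guess_decode and the ISO-8859-1 fallback are assumptions made
        for the example.

```python
from typing import Tuple


def guess_decode(raw: bytes, fallback: str = "iso-8859-1") -> Tuple[str, str]:
    """Guess the charset of unannounced body octets.

    Returns (text, charset_used). The order is: pure US-ASCII, then
    well-formed UTF-8, then the caller's fallback charset.
    """
    try:
        return raw.decode("ascii"), "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Python's decoder is strict: overlong forms, stray continuation
        # octets and truncated sequences all raise UnicodeDecodeError.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace"), fallback
```

        Text in an 8-bit single-octet charset rarely happens to form
        valid UTF-8 sequences, which is why the test is a reasonable
        first guess; it remains a guess and not a substitute for
        properly declared headers.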

        NOTE: It is not expected that reading agents will necessarily be
        able to present characters in all possible character sets. For
        example, a reading agent might be able to present only the ISO-
        8859-1 (Latin 1) characters [ISO 8859], in which case it Ought
        to present undisplayable characters using some distinctive
        glyph, or by exhibiting a suitable warning.
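
        The Latin 1 example might look like the sketch below. The
        function name present_latin1 and the choice of the inverted
        question mark as the distinctive glyph are assumptions for the
        example; the returned count lets the agent exhibit a warning as
        well or instead.

```python
from typing import Tuple


def present_latin1(text: str, glyph: str = "\u00bf") -> Tuple[str, int]:
    """Prepare text for a display limited to ISO-8859-1.

    Every character outside ISO-8859-1 is replaced by `glyph`.
    Returns (displayable_text, number_of_replacements).
    """
    shown = []
    replaced = 0
    for ch in text:
        try:
            ch.encode("iso-8859-1")
            shown.append(ch)
        except UnicodeEncodeError:
            shown.append(glyph)
            replaced += 1
    return "".join(shown), replaced
```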

   Followup agents MUST be careful to apply appropriate encodings to the
   outbound followup. A followup to an article containing non-ASCII
   material is very likely to contain non-ASCII material itself.
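
   One way a followup agent might select an encoding is sketched below.
   The function name choose_body_encoding, the preference order, and the
   use of "8bit" for non-ASCII bodies are assumptions for the example,
   not requirements of this draft. The sketch picks the first charset
   able to represent the whole followup, quoted material included, and
   returns matching headers with the encoded octets.

```python
from typing import Dict, Sequence, Tuple


def choose_body_encoding(
    body: str,
    preferred: Sequence[str] = ("us-ascii", "iso-8859-1", "utf-8"),
) -> Tuple[Dict[str, str], bytes]:
    """Pick a charset for an outbound followup body.

    Returns (headers, octets), where headers announce exactly the
    charset that was used to produce octets.
    """
    for charset in preferred:
        try:
            octets = body.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the body
        # Assumption: non-ASCII octets are sent as "8bit"; line length
        # limits are not checked in this sketch.
        cte = "7bit" if charset == "us-ascii" else "8bit"
        headers = {
            "MIME-Version": "1.0",
            "Content-Type": "text/plain; charset=" + charset,
            "Content-Transfer-Encoding": cte,
        }
        return headers, octets
    raise ValueError("no preferred charset can represent the body")
```

   Because quoted text from the precursor is part of the body passed
   in, non-ASCII material carried over from the original article is
   taken into account automatically.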

Diff to first older version


--- ../usefor-article-07/Character_Sets_within_Article_Bodies.out          May 2002
+++ ../usefor-article-08/Character_Sets_within_Article_Bodies.out          August 2002
@@ -5,9 +5,12 @@
    Content-Type-headers [RFC 2045].  In the absence of such headers,
    reading agents cannot be relied upon to display correctly more than
    the US-ASCII characters, though they MUST display at least those.
-        NOTE: Observe that reading agents are not forbidden to "guess",
-        or to interpret as UTF-8 regardless, which would be the simplest
-        course for them to take.
+
+        NOTE: Observe that reading agents are not forbidden to "guess"
+        when confronted with unannounced non-ASCII characters, and in
+        particular it would be reasonable at least to test whether they
+        were in the form of valid UTF-8 (see also the suggestion for
+        such a test in 4.4.1).
 
         NOTE: It is not expected that reading agents will necessarily be
         able to present characters in all possible character sets. For

Documents were processed to this format by Forrest J. Cavalier III