usefor-article-08 August 2002
4.4.2. Character Sets within Article Bodies
Within article bodies, characters are represented as octets according
to the encoding scheme implied by any Content-Transfer-Encoding- and
Content-Type-headers [RFC 2045]. In the absence of such headers,
reading agents cannot be relied upon to display correctly more than
the US-ASCII characters, though they MUST display at least those.
NOTE: Observe that reading agents are not forbidden to "guess"
when confronted with unannounced non-ASCII characters, and in
particular it would be reasonable at least to test whether they
were in the form of valid UTF-8 (see also the suggestion for
such a test in 4.4.1).
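The "guess" described in the note above could be sketched as follows. This is only an illustration, not part of this draft: the function name decode_unannounced and the returned labels are hypothetical, and the fallback of marking undecodable octets with U+FFFD is one possible policy among several.

```python
def decode_unannounced(body: bytes) -> tuple[str, str]:
    """Guess how to decode a body that carries no charset declaration.

    Returns the decoded text and a label for the guess that was made.
    """
    try:
        # Pure US-ASCII is also valid UTF-8, so one strict decode covers both.
        text = body.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not valid UTF-8: keep the ASCII characters, which must always be
        # displayed, and mark every other octet with U+FFFD.
        return body.decode("ascii", errors="replace"), "unknown"
    return text, "us-ascii" if body.isascii() else "utf-8"
```

A strict decode is used because random 8-bit text in a legacy character set is unlikely to form valid UTF-8 sequences by accident, so a successful decode is reasonable evidence that the guess is right.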
NOTE: It is not expected that reading agents will necessarily be
able to present characters in all possible character sets. For
example, a reading agent might be able to present only the ISO-
8859-1 (Latin 1) characters [ISO 8859], in which case it Ought
to present undisplayable characters using some distinctive
glyph, or by exhibiting a suitable warning.
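As an illustration of the note above, a reading agent limited to ISO-8859-1 might substitute a marker for each character it cannot present and count the substitutions so that it can also exhibit a warning. The function name present_latin1 and the choice of the inverted question mark as the distinctive glyph are assumptions made for this sketch only.

```python
def present_latin1(text: str, marker: str = "\u00bf") -> tuple[str, int]:
    """Render text for a display limited to ISO-8859-1.

    Returns the displayable text and the number of characters replaced.
    """
    shown = []
    replaced = 0
    for ch in text:
        try:
            # Succeeds only for code points U+0000 through U+00FF.
            ch.encode("latin-1")
            shown.append(ch)
        except UnicodeEncodeError:
            shown.append(marker)
            replaced += 1
    return "".join(shown), replaced


display, replaced = present_latin1("na\u00efve \u20ac5")
if replaced:
    print(f"Warning: {replaced} character(s) could not be displayed.")
```

The marker is passed as a parameter because a glyph that also occurs in ordinary text is not truly distinctive; an agent with a richer display might prefer some other symbol.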
Followup agents MUST be careful to apply appropriate encodings to the
outbound followup. A followup to an article containing non-ASCII
material is very likely to contain non-ASCII material itself.
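One way a followup agent might select its headers is to inspect the complete outbound body, quoted material included, rather than only the newly typed text. The sketch below is an assumption, not a requirement of this draft: the name choose_body_headers is hypothetical, and the policy of using UTF-8 with 8bit for anything beyond US-ASCII is only one possible choice.

```python
def choose_body_headers(body: str) -> dict[str, str]:
    """Pick MIME headers that match the characters actually in the body."""
    if body.isascii():
        # Nothing beyond US-ASCII, so the minimal declaration suffices.
        return {
            "Content-Type": "text/plain; charset=us-ascii",
            "Content-Transfer-Encoding": "7bit",
        }
    # Assumed policy: declare UTF-8 whenever non-ASCII material is present.
    return {
        "Content-Type": "text/plain; charset=utf-8",
        "Content-Transfer-Encoding": "8bit",
    }


# The quoted line carries the non-ASCII character, not the new reply text.
followup = "> caf\u00e9 au lait\nAgreed."
headers = choose_body_headers(followup)
```

Checking the whole body matters because the non-ASCII material often arrives through the quoted precursor, as in the usage above, even when the reply itself is plain ASCII.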
#Diff to first older
--- ../usefor-article-07/Character_Sets_within_Article_Bodies.out May 2002
+++ ../usefor-article-08/Character_Sets_within_Article_Bodies.out August 2002
@@ -5,9 +5,12 @@
Content-Type-headers [RFC 2045]. In the absence of such headers,
reading agents cannot be relied upon to display correctly more than
the US-ASCII characters, though they MUST display at least those.
- NOTE: Observe that reading agents are not forbidden to "guess",
- or to interpret as UTF-8 regardless, which would be the simplest
- course for them to take.
+
+ NOTE: Observe that reading agents are not forbidden to "guess"
+ when confronted with unannounced non-ASCII characters, and in
+ particular it would be reasonable at least to test whether they
+ were in the form of valid UTF-8 (see also the suggestion for
+ such a test in 4.4.1).
NOTE: It is not expected that reading agents will necessarily be
able to present characters in all possible character sets. For