forked from mirrors/gecko-dev
Implements https://github.com/whatwg/html/issues/6962 . Improves performance when <meta charset> occurs in head but after the first kilobyte and aligns behavior better with WebKit and Blink. The main change is to avoid reloads when meta appears within head but after the first kilobyte. Prior to this change, Gecko reloaded in that case (in compliance with the spec!) even though WebKit and Blink did not. Differences from WebKit and Blink: * WebKit and Blink honor <meta charset> in <noscript>. This implementation does not. * WebKit and Blink look for meta as if the tree builder was unaware of foreign content. This implementation is foreign content-aware. This makes a difference for CDATA sections that contain a > before the meta as well as style and script elements within foreign content. This could happen if the CDATA section that has mysteriously been introduced around a what looks like a meta tag also contains another prior tag-looking run of text. * This implementation processes rel=preload and speculative loads that are seen before <meta charset> has been seen. WebKit and Blink instead first look for the meta and rewind before starting speculative parsing. * Unlike WebKit, if there is neither an honored meta nor syntax resembling an XML declaration, detection from content takes place (as in Blink). * Unlike Blink, if there is neither an honored meta nor syntax resembling an XML declaration, the detection from content is not dependent of network buffer boundaries. * Unlike Blink, detection from content can trigger a reload at the end of the stream if the guess made at that point differs from the first guess. (See below for the definition of the input to the first guess.) Differences from the old spec and Gecko previously: * Meta inside script and RCDATA elements is no longer honored. * Late meta is now ignored and no longer triggers a reload. * Later meta counts as early enough meta: In addition to the previous meta within the first 1024 bytes, now a meta that started within the first 1024 bytes counts as early enough. Additionally, if by then there hasn't been a template start tag and head hasn't ended, meta occurring before the earlier of the end of the head or a template start tag counts as early enough. * Meta now counts as not-late even if the encoding label has numeric character reference escapes. * Syntax resembling an XML declaration longer than a kilobyte is honored if there is no honored meta. * If there is neither an honored meta nor syntax resembling an XML declaration, the initial chardetng scan is potentially longer than before: the first 1024 bytes, the token spanning the 1024-byte boundary if there is such a token, and, if by then head hasn't ended and there hasn't been a template start tag until the end of the template start tag or the end of the token that causes head to end, ever comes first. However, if the token implying the end of the head is a text token, bytes only to the end of the previous non-text token is considered. (This definition avoids depending on network buffer boundaries.) * XML View Source now uses the code for syntax resembling an XML declaration instead of expat for extracting the internal encoding label. Reftest are added as both WPT and Gecko reftests in order to test both http: and file: URL scenarios. The Gecko tests retain the WPT <link> tags in order to use the exact same bytes. An encoding declaration has been added to a number of old tests that didn't intend to test the new speculation behavior especially in the context of https://bugzilla.mozilla.org/show_bug.cgi?id=1727750 . Differential Revision: https://phabricator.services.mozilla.com/D125808
131 lines
7.4 KiB
HTML
131 lines
7.4 KiB
HTML
<!doctype html>
|
|
<html>
|
|
<!--
|
|
https://bugzilla.mozilla.org/show_bug.cgi?id=672453
|
|
-->
|
|
<head>
|
|
<meta charset="utf-8">
|
|
<title>Test for Bug 672453</title>
|
|
<script src="/tests/SimpleTest/SimpleTest.js"></script>
|
|
<link rel="stylesheet" href="/tests/SimpleTest/test.css">
|
|
</head>
|
|
<body>
|
|
<a target="_blank"
|
|
href="https://bugzilla.mozilla.org/show_bug.cgi?id=672453"
|
|
>Mozilla Bug 672453</a>
|
|
<iframe></iframe>
|
|
<script>
|
|
/** Test for Bug 672453 **/
|
|
|
|
var tests = [
|
|
"file_bug672453_not_declared.html",
|
|
"file_bug672453_xml_decl.html",
|
|
"file_bug672453_late_meta.html",
|
|
"file_bug672453_meta_restart.html",
|
|
"file_bug672453_meta_after_head.html",
|
|
"file_bug672453_meta_unsupported.html",
|
|
"file_bug672453_http_unsupported.html",
|
|
"file_bug672453_meta_utf16.html",
|
|
"file_bug672453_meta_non_superset.html",
|
|
"file_bug672453_meta_userdefined.html",
|
|
"file_bug672453_meta_replacement.html",
|
|
"file_bug672453_http_replacement.html",
|
|
"file_bug672453_enc_error.html",
|
|
"file_bug672453_enc_error_inherited.html",
|
|
"file_bug672453_meta_speculation_fail.html",
|
|
"file_bug672453_xml_speculation_fail.html",
|
|
];
|
|
|
|
// The general idea here is that encoding substitutions or failures to declare the encoding
|
|
// (except when inherited or, not tested here, all-ASCII) are errors and ineffeciencies
|
|
// or risks about things failing under further editing (i.e. meta after head but within
|
|
// the extended scan zone) are warnings.
|
|
|
|
var expectedErrors = [
|
|
{ errorMessage: "The character encoding of a framed document was not declared. The document may appear different if viewed without the document framing it.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_not_declared.html",
|
|
lineNumber: 0,
|
|
isWarning: true },
|
|
{ errorMessage: "The character encoding of an HTML document was declared using the XML declaration syntax. This is non-conforming, and declaring the encoding using a meta tag at the start of the head part is more efficient.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_xml_decl.html",
|
|
lineNumber: 1,
|
|
isWarning: true },
|
|
{ errorMessage: "A meta tag attempting to declare the character encoding declaration was found too late, and the encoding of the parent document was used instead. The meta tag needs to be moved to the start of the head part of the document.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_late_meta.html",
|
|
lineNumber: 1028,
|
|
isWarning: false },
|
|
{ errorMessage: "A meta tag attempting to declare the character encoding declaration was found too late, and the encoding of the parent document was used instead. The meta tag needs to be moved to the start of the head part of the document.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_meta_restart.html",
|
|
lineNumber: 1028,
|
|
isWarning: false },
|
|
{ errorMessage: "The meta tag declaring the character encoding of the document should be moved to start of the head part of the document.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_meta_after_head.html",
|
|
lineNumber: 7,
|
|
isWarning: true },
|
|
{ errorMessage: "An unsupported character encoding was declared for the HTML document using a meta tag. The declaration was ignored.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_meta_unsupported.html",
|
|
lineNumber: 1,
|
|
isWarning: false },
|
|
{ errorMessage: "An unsupported character encoding was declared on the transfer protocol level. The declaration was ignored.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_http_unsupported.html",
|
|
lineNumber: 0,
|
|
isWarning: false },
|
|
{ errorMessage: "A meta tag was used to declare the character encoding as UTF-16. This was interpreted as an UTF-8 declaration instead.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_meta_utf16.html",
|
|
lineNumber: 1,
|
|
isWarning: false },
|
|
{ errorMessage: "An unsupported character encoding was declared for the HTML document using a meta tag. The declaration was ignored.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_meta_non_superset.html",
|
|
lineNumber: 1,
|
|
isWarning: false },
|
|
{ errorMessage: "A meta tag was used to declare the character encoding as x-user-defined. This was interpreted as a windows-1252 declaration instead for compatibility with intentionally mis-encoded legacy fonts. This site should migrate to Unicode.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_meta_userdefined.html",
|
|
lineNumber: 1,
|
|
isWarning: false },
|
|
{ errorMessage: "A meta tag was used to declare an encoding that is a cross-site scripting hazard. The replacement encoding was used instead.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_meta_replacement.html",
|
|
lineNumber: 0,
|
|
isWarning: false },
|
|
{ errorMessage: "An encoding that is a cross-site scripting hazard was declared on the transfer protocol level. The replacement encoding was used instead.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_http_replacement.html",
|
|
lineNumber: 0,
|
|
isWarning: false },
|
|
{ errorMessage: "The byte stream was erroneous according to the character encoding that was declared. The character encoding declaration may be incorrect.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_enc_error.html",
|
|
lineNumber: 0,
|
|
isWarning: false },
|
|
{ errorMessage: "The byte stream was erroneous according to the character encoding that was inherited from the parent document. The character encoding needs to be declared in the Content-Type HTTP header, using a meta tag, or using a byte order mark.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_enc_error_inherited.html",
|
|
lineNumber: 0,
|
|
isWarning: false },
|
|
{ errorMessage: "The start of the document was reparsed, because there were non-ASCII characters before the meta tag that declared the encoding. The meta should be the first child of head without non-ASCII comments before.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_meta_speculation_fail.html",
|
|
lineNumber: 5,
|
|
isWarning: true },
|
|
{ errorMessage: "The start of the document was reparsed, because there were non-ASCII characters in the part of the document that was unsuccessfully searched for a meta tag before falling back to the XML declaration syntax. A meta tag at the start of the head part should be used instead of the XML declaration syntax.",
|
|
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_xml_speculation_fail.html",
|
|
lineNumber: 11,
|
|
isWarning: true },
|
|
];
|
|
|
|
SimpleTest.waitForExplicitFinish();
|
|
|
|
window.onload = function() {
|
|
var iframe = document.getElementsByTagName("iframe")[0];
|
|
|
|
function runNextTest() {
|
|
var url = tests.shift();
|
|
if (!url) {
|
|
SimpleTest.endMonitorConsole();
|
|
return;
|
|
}
|
|
iframe.src = url;
|
|
}
|
|
iframe.onload = runNextTest;
|
|
|
|
SimpleTest.monitorConsole(SimpleTest.finish, expectedErrors);
|
|
runNextTest();
|
|
};
|
|
</script>
|
|
</body>
|
|
</html>
|