Given an input document like this:

    <div> <p> <pre>hi</pre> </p> </div>

it validates just fine in `--raw-xml` mode. However, in normal "html/xhtml" mode, the "pre" opening tag automatically closes the currently open "p" tag, leading to this:

    <div> <p> </p><pre>hi</pre> </p> </div>

Without further intervention, the closing "p" tag that was already there (just before the closing "div" tag) no longer has a matching open "p" tag to close; the corresponding open tag is now the "div". The document therefore fails to validate at this point.

The naive fix makes the closing tag that matches the opening tag responsible for the auto-close (here "</pre>") automatically reopen a "p" at that point, producing this:

    <div> <p> </p><pre>hi</pre><p> </p> </div>

While that solution does work, it frequently introduces extra, unwanted "p" sections.

Instead of reopening the "p" immediately upon seeing that closing tag, simply set a "reopen p" flag. When the flag is set and suitable conditions are met, "reopen" a new "p" tag. The exact conditions are a bit of a heuristic at the moment. When the next start tag is seen, the "reopen p" flag is cleared, and a new "p" is inserted at that point only if the start tag belongs to a text-level element. Alternatively, if the flag is set and some non-whitespace text shows up before another start tag, a new "p" is reopened at that point and the flag is cleared. Finally, if the flag is set and a closing "p" tag appears, that tag is simply discarded and the flag is cleared; in effect, this case just moves the closing "p" tag.

With these changes, the troublesome document now produces this:

    <div> <p> </p><pre>hi</pre> </div>

That is an improvement on what came before. Some might argue that the empty "p" section ought to be omitted entirely. Perhaps.
But there was an explicit open "p" tag in the text. Auto-closing it is one thing; removing an explicit open tag entirely is something else. Additionally, because the validator works in a "streamy" way, omitting it would be much harder: when the initial opening "p" is seen, there is not yet any information that it is about to be auto-closed without containing any text, so it has already been emitted to the output.

Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
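The "reopen p" heuristic described above can be sketched as a small streaming pass over a token list. This is a minimal Python illustration, not the validator's actual code. The token format, the `rebalance` and `tokenize` helpers, the per-element `closed_p` marker (used here to recognize "the closing tag that matches the opening tag that auto-closed the p"), and the `TEXT_LEVEL` / `CLOSES_P` element sets are all hypothetical, and the sets are deliberately incomplete.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical token stream: ("start", name), ("end", name) or ("text", data).
Token = Tuple[str, str]

# Illustrative subsets only (assumption); a real validator needs the full lists.
TEXT_LEVEL = {"a", "abbr", "b", "br", "code", "em", "i", "img", "span", "strong"}
CLOSES_P = {"blockquote", "div", "h1", "h2", "h3", "ol", "pre", "table", "ul"}


@dataclass
class Open:
    name: str
    closed_p: bool = False  # True if this element's start tag auto-closed a <p>


def rebalance(tokens: List[Token]) -> Tuple[str, List[str]]:
    """Emit tokens in order, applying the deferred "reopen p" flag."""
    out: List[str] = []
    errors: List[str] = []
    stack: List[Open] = []
    reopen_p = False

    def open_p() -> None:
        out.append("<p>")
        stack.append(Open("p"))

    for kind, value in tokens:
        if kind == "text":
            # Non-whitespace text while the flag is set: reopen a <p> here.
            if reopen_p and value.strip():
                open_p()
                reopen_p = False
            out.append(value)
        elif kind == "start":
            if reopen_p:
                reopen_p = False  # any start tag clears the flag...
                if value in TEXT_LEVEL:
                    open_p()  # ...but reopens a <p> only for text-level elements
            closed_p = False
            if value in CLOSES_P and stack and stack[-1].name == "p":
                out.append("</p>")  # auto-close the open <p>
                stack.pop()
                closed_p = True
            out.append(f"<{value}>")
            stack.append(Open(value, closed_p))
        elif kind == "end":
            if value == "p" and reopen_p:
                reopen_p = False  # discard: the auto-close already emitted this </p>
                continue
            if not stack or stack[-1].name != value:
                errors.append(f"unmatched </{value}>")
                continue
            out.append(f"</{value}>")
            if stack.pop().closed_p:
                reopen_p = True  # defer reopening instead of doing it immediately
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
    return "".join(out), errors


def tokenize(s: str) -> List[Token]:
    # Toy tokenizer (assumption): attribute-free tags and plain text only.
    toks: List[Token] = []
    for slash, name, text in re.findall(r"<(/?)(\w+)>|([^<]+)", s):
        if text:
            toks.append(("text", text))
        else:
            toks.append(("end" if slash else "start", name))
    return toks


if __name__ == "__main__":
    html, errs = rebalance(tokenize("<div> <p> <pre>hi</pre> </p> </div>"))
    print(" ".join(html.split()), errs)  # <div> <p> </p><pre>hi</pre> </div> []
```

Because output is appended token by token, the explicit `<p>` is already in `out` before the `<pre>` that auto-closes it arrives. That is why this sketch, like the behavior described in the commit message, keeps the empty `<p> </p>` rather than dropping it.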
Kyle J. McKay
4 years ago
1 changed file with 27 additions and 3 deletions