thread. been stuck on a programming problem for the last 2 days. recently i experimented with using google code prettify instead of span tags to syntax color code in html. see this video
but now i've realized a big, subtle problem. not sure i'm going to use it anymore.

it's logically impossible to determine if a text is ampersand encoded or decoded.
you might say, if the text contains “&amp;”, then it is encoded.
e.g. if we have the code
str.replace("&", "&amp;")
that's actually not encoded.

Encoded version would be:
str.replace("&amp;", "&amp;amp;")

the fact that there is no simple logical way to determine
if text is ampersand encoded or decoded is a problem,
because if & gets encoded twice, the code is screwed. it won't run correctly anymore.
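
a minimal sketch of the double encoding problem, using python's standard html.escape and html.unescape. the variable names are just for illustration. it shows that encoding twice does not round-trip after one decode, and that the raw code is itself a valid encoded form of a different program, so looking at the text alone can't settle which state it's in.

```python
import html

# raw code, as typed in an editor (not encoded)
source = 'str.replace("&", "&amp;")'

# correct: encode once before pasting into the html page
once = html.escape(source, quote=False)
# the mistake: a second encode pass during a later edit
twice = html.escape(once, quote=False)

print(once)
print(twice)

# one decode gives the source back only if it was encoded exactly once
round_trip_ok = html.unescape(once) == source
round_trip_broken = html.unescape(twice) != source

# the ambiguity: the raw source also decodes cleanly, into different code
misread = html.unescape(source)
print(misread)
```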

why would it happen twice?
because when you put lots of source code in html on the web, and you constantly edit
it, i.e. a decode, edit, encode cycle,
it's very easy to make the mistake of encoding twice or decoding twice over time.

note: in html4, all & ≺ ≻ must be encoded.
in html5, the rule is relaxed. if such a char is surrounded by spaces, then it's ok, no need to encode it as an entity. (the actual rule is more complex than this)

but why not just leave ≺ ≻ & as is without encoding at all?
That's a problem when your code processes html tags, because the browser would take the tags in your snippet as real markup.
Code that processes html is actually quite common.
so if you are writing a programming tutorial or blog, you may have lots of code snippets containing lots of html tags in strings or regex.

so, at this point, the advantage of using JavaScript to syntax color the code in your blog is half gone. you still need to ampersand encode your code. and if it's web dev code (lots of ≺ ≻ &), your code is unreadable without decoding. Same as using span tags to syntax color.

using span tags to syntax color code on the web now has an advantage, i.e. it's easy to determine the encoded/decoded state (by simply checking for the existence of ≺span≻). So, it's not prone to the error of encoding/decoding twice.
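
a rough sketch of that span check. the helper names (looks_span_colored, colorize, publish) and the toy coloring rule are made up for illustration, not any real tool. it assumes every colored snippet contains at least one span tag, and that the raw code never contains a literal span tag itself.

```python
import html

def looks_span_colored(snippet: str) -> bool:
    # hypothetical check: a span tag means the snippet was already
    # colored, and therefore already ampersand encoded
    return "<span" in snippet

def colorize(raw: str) -> str:
    # toy colorizer: encode first, then wrap the first "if" in a span
    encoded = html.escape(raw, quote=False)
    return encoded.replace("if", '<span class="kwd">if</span>', 1)

def publish(snippet: str) -> str:
    # encode and color only when the snippet is still raw,
    # so running it a second time changes nothing
    if looks_span_colored(snippet):
        return snippet
    return colorize(snippet)

raw = "if (a < b && c) { run(); }"
once = publish(raw)
again = publish(once)
print(once)
print(again == once)
```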

this is a general problem of nesting. It also happens with string escape sequences in programming langs. e.g. have you ever tried to grep for a string that comes from a code snippet's regex pattern? basically, it is impossible to figure out the backslash escape sequence.
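
the same nesting effect with backslashes, as a small python sketch (the sample text is arbitrary). each layer doubles the backslashes, and nothing in the resulting text says which layer you are looking at.

```python
import re

# level 0: the actual text, containing one backslash
text = r"a\b"

# level 1: a regex that matches that text literally
pattern = re.escape(text)

# level 2: that regex written as a python string literal
literal = repr(pattern)

for level, s in enumerate([text, pattern, literal]):
    print(level, s, "backslashes:", s.count("\\"))

# the level 1 pattern does match the level 0 text
matched = re.search(pattern, text) is not None
print(matched)
```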

here's an example of how unreadable code is after ampersand encoding, even if you are not using span tags for syntax coloring. Note, if you do not ampersand encode it, your html page is totally screwed after the code snippet.

Note, the problem can be solved trivially in XML by using <![CDATA[like this]]>. unfortunately, it does not work in html4 or 5. it's sad to be reminded of the entire coup of xml by wtfg that was Apple and Google for $
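
the CDATA point, sketched with python's standard xml.etree.ElementTree. the pre element and the sample code inside it are just an illustration. inside CDATA the parser hands the text back untouched, while the same kind of characters outside CDATA make the document fail to parse.

```python
import xml.etree.ElementTree as ET

# raw code inside CDATA: no ampersand encoding needed
doc = '<pre><![CDATA[str.replace("&", "&amp;") if a < b]]></pre>'
elem = ET.fromstring(doc)
print(elem.text)

# without CDATA, bare < and & are not well-formed xml
try:
    ET.fromstring("<pre>a < b && c</pre>")
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False
print(parsed_without_cdata)
```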

just took a 1 hour walk to think about a final decision on this problem that's been bugging me for a week. I decided not to use google code prettify. It is possibly a worse solution than using span tags. Now i need to revert 1 day's work. It'll take half a day.

@xahlee I think I was replacing something like


It just looked silly in vim. It wasn't exactly that, but it was something small that blew up into escape sequence hilariousness.
