thread. been stuck on a programming problem for the last 2 days. recently i tried experimenting with using google code prettify instead of span tags to color code in html. see this video
youtube.com/watch?v=ml7V_OnBAO
but now i've realized a big, subtle problem. not sure i'm going to use it anymore.

it's logically impossible to determine if a text is ampersand encoded or decoded.
you might say, if the text contains “&amp;”, then it is encoded.
wrong!
e.g. if we have the code
str.replace("&", "&amp;")
that's actually not encoded.

Encoded version would be:
str.replace("&amp;", "&amp;amp;")

the fact that there is no simple logical way to determine
if text is ampersand encoded or decoded is a problem,
because if & gets encoded twice, the code is screwed. it won't run correctly anymore.
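
to make the ambiguity concrete, here's a minimal sketch using python's standard html module (the choice of python and the variable names are mine, not part of any tool mentioned in this thread). it encodes the same line once and twice, and also decodes text that was never encoded:

```python
import html

# source code exactly as the programmer wrote it (not encoded)
raw = 'str.replace("&", "&amp;")'

# encoded once: the correct form to paste into an html page
once = html.escape(raw, quote=False)
# encoded a second time by mistake
twice = html.escape(once, quote=False)

print(once)   # str.replace("&amp;", "&amp;amp;")
print(twice)  # str.replace("&amp;amp;", "&amp;amp;amp;")

# decoding text that was never encoded silently changes the code
print(html.unescape(raw))  # str.replace("&", "&")
```

note that every one of these strings is itself a valid line of code, so nothing in the text says which state it is in.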

why would it happen twice?
because when you put lots of source code in html on the web, and you constantly edit
it, i.e. a decode, edit, encode cycle,
it's very easy to make the mistake of encoding twice or decoding twice over time.

note: in html4, all & ≺ ≻ must be encoded.
in html5, the rule is relaxed. if such chars are surrounded by spaces, then it's ok, no need to encode them as entities. (the rule is more complex than this)

but why not just leave ≺ ≻ & as is, without encoding at all?
That's a problem, because suppose your code processes html tags.
Code that processes html is actually quite common.
so if you are writing a programming tutorial or blog, you may have lots of code snippets containing lots of html tags in strings or regexes.

so, at this point, the advantage of using JavaScript to syntax color the code in your blog is half gone. you still need to ampersand encode your code. and if it's web dev code (lots of ≺ ≻ &), your code is unreadable without decoding. Same as using span tags to syntax color.

using span tags to syntax color code on the web now has an advantage. i.e. it's easy to determine the encoded/decoded state (by simply checking for the existence of ≺span≻). So, it's not prone to the error of encoding/decoding twice.
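
a rough sketch of that check, in python. the function name is_span_colored and the sample strings are made up for illustration. it assumes the code being shown never contains a literal span tag of its own (if it did, the check would be fooled):

```python
def is_span_colored(text: str) -> bool:
    """Hypothetical check: in encoded text every "<" from the code itself
    is written as "&lt;", so a literal "<span" can only be highlighting markup.
    Assumption: the raw, unencoded code never contains "<span" itself."""
    return "<span" in text

# highlighted, therefore already encoded
colored = '<span class="kwd">if</span> (a &lt; b) { s = "&lt;b&gt;"; }'
# no span tags: could be encoded text, or raw code that talks about entities
plain = 'if (a &lt; b) { s = "&lt;b&gt;"; }'

print(is_span_colored(colored))  # True
print(is_span_colored(plain))    # False
```

with span tags present, the state is known. without them, the second string is back to being ambiguous.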

this problem is a general problem of nesting. It happens with string escape sequences in programming langs. e.g. have you ever tried to grep for a string that comes from a code snippet's regex pattern? basically, it is impossible to figure out the backslash escape sequence.
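
the backslash version of the same nesting problem can be sketched like this (python, standard re module; the regex and the variable names are made-up examples). each level of nesting multiplies the backslashes, and a pattern copied from one level does not work at the next:

```python
import re

# a line of source code as it sits in a file: a regex inside a string literal
source_line = r'pat = re.compile("\\d+\\.\\d+")'

# level 0: the regex the program actually runs
regex = r'\d+\.\d+'
# level 1: the same regex as typed inside a normal string literal
in_source = r'\\d+\\.\\d+'
# level 2: a regex that finds that literal text in the source file
search = re.escape(in_source)

# copying the text from the source and using it as a pattern finds nothing
print(re.search(in_source, source_line))            # None
# only the re-escaped version matches the source text
print(re.search(search, source_line) is not None)   # True
print(regex.count('\\'), in_source.count('\\'))     # 3 6
```

same as with ampersand encoding: looking at a string full of backslashes, you can't tell which level it belongs to.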

here's an example of how unreadable code is after ampersand encoding, even if you are not using span tags for syntax coloring. Note, if you do not ampersand encode it, your html page is totally screwed after the code snippet.
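
as a stand-in illustration (the snippet below is made up for this purpose, it is not the original example), this python sketch encodes one line of typical web dev code and prints the result:

```python
import html

# a made-up line of code that processes html tags
snippet = 'if (s.match(/<a href="[^"]+">/) && n > 0) { out += "<br>"; }'

# encode only & < > (quote=False leaves the double quotes alone)
encoded = html.escape(snippet, quote=False)

print(encoded)
# if (s.match(/&lt;a href="[^"]+"&gt;/) &amp;&amp; n &gt; 0) { out += "&lt;br&gt;"; }
```

the encoded line is what you have to read and edit in the html source.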

Note, the problem can be solved trivially in XML by using <![CDATA[like this]]>. unfortunately, it does not work in html4 or 5. it's sad to be reminded of the entire coup of xml by wtfg that was Apple and Google for $
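
the CDATA point can be checked with a small sketch using python's standard xml.etree.ElementTree parser (the pre element and the code inside it are made up for illustration):

```python
import xml.etree.ElementTree as ET

# code with raw < > & placed inside a CDATA section, no entity encoding needed
doc = '<pre><![CDATA[if (a < b && s == "<br>") { n++; }]]></pre>'

elem = ET.fromstring(doc)

# the parser hands the text back exactly as written
print(elem.text)  # if (a < b && s == "<br>") { n++; }
```

the only thing the code can't contain is the CDATA end marker ]]> itself, so the encoded/decoded question mostly goes away.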


just took a 1 hour walk to think about a final decision on this problem that's been bugging me for a week. I decided not to use google code prettify. It is possibly a worse solution than using span tags. Now i need to revert 1 day's work. It'll take half a day.
