Mangler's Tag Parser

30 Nov 2012 8:00 Tags: Blog, PHP, Programming, Web Design

WordPress might well be an unauthenticated remote shell with a blog added [Citation Needed], but it does have a number of exciting features. These include flexible support for ‘Short Codes’ [2], essentially extra custom tags which are processed server side. These formed the basis of my footnotes plug-in, Note-n-Cite.

However, the underlying code for these short codes is actually somewhat poor: for example, nesting is supported only as long as the inner tag is not of the same type as the outer tag [3]. This may have been a design decision to improve speed, but it was certainly an annoying one.

Now, there is a rather good thread on Stack Overflow on why trying to use regex for HTML parsing is a terrible idea, and it's correct. However, what we need to do here is not fully fledged parsing. Instead, we just need to find each opening tag and its partner closing tag. If, when finding the closing tag, we ignore the rest of the document's structure, we can greatly simplify the whole process.
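To make the idea concrete, here is a minimal sketch of that matching step. It is not the actual TagParser code: the function name `findTagSpan`, its return shape, and the WordPress-style `[tag]` bracket syntax are all assumptions made for illustration. The regex only locates tags of one name; the pairing itself is done with a depth counter, which is what lets a tag nest inside another tag of the same type.

```php
<?php
// Hypothetical helper, NOT Mangler's actual TagParser: find the first [tag]
// at or after $offset and its partner [/tag]. Nested tags of the same name
// are counted so the correct closer is chosen. All other markup is ignored.
function findTagSpan(string $text, string $tag, int $offset = 0): ?array
{
    $name = preg_quote($tag, '/');
    // Matches [tag], [tag attr="x"], [tag /] and [/tag] (assumed syntax).
    $pattern = '/\[(\/?)' . $name . '(?:\s[^\]]*?)?(\/?)\]/';

    $depth = 0;
    $start = null;      // byte offset of the outer opening tag
    $innerStart = null; // byte offset just past the outer opening tag

    while (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE, $offset)) {
        [$match, $pos] = $m[0];
        $offset = $pos + strlen($match); // resume scanning after this tag
        $isClose = $m[1][0] === '/';
        $isSelfClosing = $m[2][0] === '/';

        if ($isClose) {
            if ($depth === 0) {
                continue; // stray closer before any opener: skip it
            }
            if (--$depth === 0) {
                return [
                    'start' => $start,
                    'end'   => $offset,
                    'inner' => substr($text, $innerStart, $pos - $innerStart),
                ];
            }
        } elseif ($isSelfClosing) {
            if ($depth === 0) {
                return ['start' => $pos, 'end' => $offset, 'inner' => null];
            }
            // A self-closing tag inside an open tag leaves the depth alone.
        } else {
            if ($depth === 0) {
                $start = $pos;
                $innerStart = $offset;
            }
            $depth++;
        }
    }

    return null; // no tag found, or the opener was never closed
}

$span = findTagSpan('a [note]outer [note]inner[/note] tail[/note] b', 'note');
echo $span['inner'], "\n"; // outer [note]inner[/note] tail
```

The inner `[note]` is left untouched inside the returned content, so a caller could run the same function over `inner` again to process nested tags from the outside in.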

You can see the current state of the TagParser in the blog's repository. You can also try out a very similar implementation on a static test page.

It is aware of self-closing tags, handles nesting of all the supported types of tags (including a tag inside another of the same type), and renders fast enough that the time is best measured in microseconds.

The code is dual-licensed, as part of Mangler, under the "New BSD" License and the GPL Version 2.

  1. Citation Needed
  2. http://codex.wordpress.org/Shortcode_API
  3. http://codex.wordpress.org/Shortcode_API#Limitations