The next beta and XML

Here's the next one: LnBlog 0.8.2 beta 2. I think I'm done adding things for this release, so I'm going to let this beta cook for a few days or so and then release it. I would have done the release today, but I added some fairly major things in the last couple of days and wanted to test them out a little more.

If you've been keeping score so far, this release contains changes to the theme stylesheets, mass comment/TrackBack/Pingback deletion, reply ungrouping, and a new AJAX calendar. The last big change, and the reason I want to let it cook, is the entry file format.

I finally got with the program and decided to start storing entry, article, and comment data in XML format. The ad hoc format I was using before worked well enough, but it was, well, very hacky. The only reason I was using it was that I didn't want to depend on any optional XML libraries, and I couldn't find a good one that I could just drop into the package. However, I finally broke down and wrote my own.

All I really wanted was a little library that did two things:
1) Read an XML file and put the contents in a DOM tree type data structure.
2) Take such a structure and serialize it to an XML file, automatically escaping angle brackets and ampersands.
Unfortunately, PHP 4 doesn't come with anything like that. At least, not as a standard module. All it has is a thin wrapper around expat, an event-based XML parser. Expat is OK, but it doesn't output a DOM tree structure and it doesn't do anything related to serializing such a structure. So I basically just wrote a couple of classes: SimpleXMLReader and SimpleXMLWriter. They're found in the new lib/xml.php file.
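
To make the two requirements above concrete, here is a rough sketch of the idea in Python rather than PHP. It is not the code in lib/xml.php. Python's xml.parsers.expat module happens to be a similar thin, event-based wrapper around expat, so the shape of the problem is the same. The Node class and the parse_tree and serialize functions are names made up for this illustration.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape, quoteattr


class Node:
    """A minimal tree node: tag name, attributes, text, and child nodes."""

    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.text = ""
        self.children = []


def parse_tree(xml_text):
    """Build a Node tree from expat's start/end/character-data events."""
    root = None
    stack = []  # the chain of elements that are currently open

    def start(name, attrs):
        nonlocal root
        node = Node(name, attrs)
        if stack:
            stack[-1].children.append(node)
        else:
            root = node
        stack.append(node)

    def end(name):
        stack.pop()

    def chars(data):
        # expat may deliver one text run in several pieces, so accumulate.
        if stack:
            stack[-1].text += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return root


def serialize(node):
    """Write a Node tree back out, escaping &, < and > along the way."""
    attrs = "".join(" %s=%s" % (k, quoteattr(v)) for k, v in node.attrs.items())
    # Simplification: all of a node's text is written before its children,
    # so the ordering of mixed content is not preserved.
    inner = escape(node.text) + "".join(serialize(c) for c in node.children)
    return "<%s%s>%s</%s>" % (node.name, attrs, inner, node.name)
```

Keeping a stack of open elements is what turns the flat event stream into a tree: each start event attaches a new node to whatever is on top of the stack, and each end event pops it.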

The SimpleXMLReader is a wrapper around expat that builds a sloppy DOM tree. It also has the ability to convert that tree into an object. The SimpleXMLWriter takes an object and serializes it as XML. The way it works is that each element under the base node is treated as a field of the same name in a PHP class. This allows me to add fields to classes without having to account for them in the serialization routine. Anything that's new will automatically be written to the file and old files will simply not set the new fields.
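
The element-per-field mapping can be sketched roughly as follows. This is again Python for illustration, using the standard xml.etree.ElementTree module instead of a hand-built reader and writer. The Entry class, its fields, and the to_xml and from_xml helpers are hypothetical, and the sketch assumes every field is stored as a plain string.

```python
import xml.etree.ElementTree as ET


class Entry:
    """Hypothetical stand-in for a blog entry class."""

    def __init__(self):
        self.subject = ""
        self.body = ""


def to_xml(obj, root_name):
    """Write every field of obj as a child element of the same name."""
    root = ET.Element(root_name)
    for field, value in vars(obj).items():
        # ElementTree escapes <, > and & in the text when serializing.
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text, obj):
    """Set one field on obj for each child element of the root node."""
    root = ET.fromstring(xml_text)
    for child in root:
        # Fields with no matching element keep whatever default obj has.
        setattr(obj, child.tag, child.text or "")
    return obj
```

Because to_xml loops over whatever fields the object has, a field added to the class later is written out with no change to the serialization code. Because from_xml only touches fields that appear in the file, loading an older file leaves a newer field at its default value.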

Of course, the new classes are still pretty simplistic, but they're working pretty well so far. However, I strongly suspect they'll choke on text that's not encoded in UTF-8 or something compatible. Thus I added an option to turn the XML format off. Just set the following in your userconfig.cfg to revert to writing old-style files:
USE_OLD_ENTRY_FORMAT = true
ENTRY_DEFAULT_FILE = current.htm
ENTRY_PATH_SUFFIX = .htm
COMMENT_PATH_SUFFIX = .txt

The last three lines just change the file name and suffix settings back to their old values.

Note that support for the old file formats has been maintained, so there's no need to convert old data. New and edited posts will be written in XML format, but old entries and comments will be read normally.
