February 24, 2006 03:23 | WebTech

Adding missing features

I spent the end of this past summer building the USC Interactive Media Department's website/weblogs/aggregator system. As I said back then, it was a heck of a fun project and I realize now that it gave me many of the skills and certainly a lot of the technical confidence I needed to then tackle GlobalVoices Online as well as everything that has come up since then. Nothing beats learning by doing and trial by fire.

I briefly explained the slug system I built for the IMD site, and I won't get into the details here now, but in a nutshell... All IMD users have their own blog, which gives them a "del.icio.us-like" tag selection UI (I call these tags slugs, because they are a controlled, system-defined and relied-upon vocabulary). Selecting a slug "marks" an entry for "cross-posting"; more accurately, it adds a keyword which the aggregators built into the site then look for.
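
Roughly, the slug-selection side boils down to something like this. This is only a sketch; the function and field names are placeholders, not the actual IMD code:

<?php
// Sketch of the slug side of cross-posting: picking a slug in the entry UI
// just appends that controlled keyword to the entry's keywords field, which
// the aggregator pages then look for. All names here are placeholders.
function add_slug_to_entry($entry, $slug) {
    $keywords = array_filter(preg_split('/\s+/', trim($entry['keywords'])));
    if (!in_array($slug, $keywords)) {
        $keywords[] = $slug;                      // mark it for cross-posting
    }
    $entry['keywords'] = implode(' ', $keywords);
    return $entry;
}

$entry = add_slug_to_entry(array('keywords' => 'games research'), 'main');
echo $entry['keywords'];                          // "games research main"
?>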

The homepage is an example of this. It displays entries "tagged" (slugged) as "main" in the center column.

Now, that was 6 months ago. From day one, Scott had complained that entries in the aggregators would roll off the page after the prescribed 20 "most recent entries", with no way to "find them again". In other words, the date context was lost. I needed to figure out a way to get back to them: an archive system.

Six months went by, and I learned all about multi-axis contexts. Heh. I also kept putting it off.

Until tonight.

You'll notice on the main page, as well as on the Research area "weblogs" (Games, Immersive & Mobile), that a small pulldown menu now appears under the green title. The menus let you select any month the system knows to contain entries, and show a count of those entries next to each one. (The display of this is kinda iffy... not sure I will leave it like that.)

I'll get back to the pulldown in a moment, for it is the "really cool" part of this.

The first thing I needed to do was to modify my slug.aggregator function so as to accept a date context. In English, that means if I ask for "the main page archive for August, 2005", that's what I want to see. (Notice the "2005-08" in the URL there? That is my date context.) This was mostly quite easy to do, especially once Francis taught me how to request database entries filtered by year and month:
MONTH(entry_created_on) = '08' AND YEAR(entry_created_on) = '2005'
(Notice the brilliance of that SQL WHERE: first it looks for entries created in August, then from those keeps only the ones created in 2005. So efficient! Thanks again, Francis! ;)
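
In practice, the date context part is just a matter of parsing the "YYYY-MM" bit out of the URL and turning it into that WHERE clause. Something along these lines (a sketch; the parameter name and how the context actually reaches slug.aggregator are glossed over here):

<?php
// Sketch: turn a "YYYY-MM" date context (e.g. "2005-08") into the extra
// WHERE clause Francis showed me. The $_GET parameter name is a placeholder.
$context = isset($_GET['context']) ? $_GET['context'] : null;   // e.g. "2005-08"

$date_clause = '';
if ($context && preg_match('/^(\d{4})-(\d{2})$/', $context, $m)) {
    $date_clause = sprintf(
        " AND MONTH(entry_created_on) = '%s' AND YEAR(entry_created_on) = '%s'",
        $m[2], $m[1]
    );
}
// $date_clause gets appended to whatever WHERE the aggregator already builds.
?>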

Ok so that went well. Now, the pulldown...

The information I needed in order to generate this pulldown seemed pretty straightforward--"give me a list of months where entries with the slug "X" appear, and give me a count of how many entries there are while you're at it"--but the reality of the situation was not so rosy.

You see, the dirty little secret, the Achilles heel of what I built for the IMD site is the fact that the slugs system relies on the keywords field associated with each blog entry. If I want to get all the entries that have been slugged as "main", I have to run through the entire table, looking into each keywords field and character matching in search of the string "main". Some of you are already cringing... yes... I do a "keywords LIKE %slug%" request. This is a dirty, nasty, mean thing to do to a database. But let's not dwell on that.
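
For the record, the month-list-and-counts request looks more or less like this. Again a sketch: the table and keywords column names are guesses (only entry_created_on appears above), and it assumes an open mysql_connect() link:

<?php
// Sketch of the "which months have slugged entries, and how many" query.
// It drags the same LIKE scan along with it, which is exactly the problem.
$slug = 'main';
$sql  = "SELECT YEAR(entry_created_on)  AS y,
                MONTH(entry_created_on) AS m,
                COUNT(*)                AS n
           FROM entries
          WHERE entry_keywords LIKE '%" . mysql_real_escape_string($slug) . "%'
          GROUP BY YEAR(entry_created_on), MONTH(entry_created_on)
          ORDER BY y DESC, m DESC";

$months = array();
$result = mysql_query($sql);
while ($row = mysql_fetch_assoc($result)) {
    $months[sprintf('%04d-%02d', $row['y'], $row['m'])] = (int) $row['n'];
}
// e.g. array('2005-08' => 14, '2005-07' => 9, ...)
?>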

The fact of the matter is that executing such a request on every page load is not a good idea... especially since I am already doing exactly that just to render the content. Heh. Also, realistically, the data that generates that pulldown only needs to be requested from the database every so often.

Enter CRON and JSON. (I just lost 95% of those left reading... )

So the solution goes like this: every 24 hours the server runs a script which goes and gets the date-context arrays I need to generate the pulldown, converts them from volatile PHP memory to a string of JSON and writes that to a cache file. Coincidentally, the PHP library that handles this JSON conversion was written by my pal Michal Migurski, who just keeps doing cooler and cooler stuff. Getting this JSON stuff to work was a snap. It was SO easy, I literally got up and danced a jig. Thank you Mike!
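
Here is roughly what the nightly job boils down to. A sketch under assumptions: the slug list, the cache path and the get_month_counts() helper (the query shown earlier) are placeholders; Services_JSON is the class from Mike's JSON-PHP library:

<?php
// Sketch of the nightly cache writer, meant to be run from cron.
// Placeholders: the slug list, the cache path, and get_month_counts(),
// which would run the month/count query from earlier.
require_once 'JSON.php';                 // Michal Migurski's JSON-PHP library

$slugs = array('main', 'games', 'immersive', 'mobile');
$cache = array();
foreach ($slugs as $slug) {
    $cache[$slug] = get_month_counts($slug);   // array('2005-08' => 14, ...)
}

$json = new Services_JSON();
file_put_contents('/path/to/cache/archive-months.json', $json->encode($cache));
?>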

With that figured out, writing a small function to read the cached data, convert it back to a PHP array and then output the html that renders the pulldown was child's play. Copy and paste the function call across aggregator screens, done.
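
The reading side is just as small. A sketch (the markup, cache path and archive URL scheme are all placeholders, not the actual IMD output):

<?php
// Sketch of the render-time side: read the cached JSON back into a PHP array
// and spit out the pulldown. SERVICES_JSON_LOOSE_TYPE makes decode() return
// arrays instead of objects.
require_once 'JSON.php';

function archive_pulldown($slug) {
    $json   = new Services_JSON(SERVICES_JSON_LOOSE_TYPE);
    $cache  = $json->decode(file_get_contents('/path/to/cache/archive-months.json'));
    $months = isset($cache[$slug]) ? $cache[$slug] : array();

    echo '<select onchange="window.location=this.value">';
    foreach ($months as $month => $count) {
        printf('<option value="/archive/%s/%s">%s (%d)</option>',
               urlencode($slug), $month, $month, $count);
    }
    echo '</select>';
}

archive_pulldown('main');
?>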

I feel SO much better now that I've got this done. Not only have I finally delivered the goods, but it was a quick and easy lesson in a few things that will serve me VERY well as I move back to extending GVO's interface (as well as a few others...).

Comments

Francis just informed me that I could bypass the whole JSON encoding/decoding by simply using the PHP serialize and unserialize functions.
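
Which, for illustration, boils down to something like this (the path and sample data are placeholders):

<?php
// The serialize()/unserialize() route Francis pointed out: both functions
// are built into PHP, so no JSON library is needed at all.
$cache_file = '/path/to/cache/archive-months.ser';
$months     = array('2005-08' => 14, '2005-07' => 9);      // example data

// cron side: dump the array straight to disk
file_put_contents($cache_file, serialize($months));

// page side: read it straight back
$months = unserialize(file_get_contents($cache_file));
?>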

Both of these solutions would have saved me a whole night of headbanging over at GVO. It'll take me 5 minutes to convert what took 6 hours of pain to create into serialized or JSON'ed format. Sigh.

(yes, I used a WP template function to WRITE OUT a PHP array to a text file. It was hell.)