BlockSpy – My First Magento Extension

(Note: There’s a bug in this that prevents the full ancestry from showing – only “root” and the current block are listed. I probably won’t get round to fixing this for a long time, since I’ve abandoned the Magento dev work I was planning when I wrote it, and since nobody seems to be interested in the extension.)

BlockSpy is a Magento extension that provides a bookmarklet, along the lines of CSS XRAY, for viewing the block-by-block structure of Magento pages. If all you’re after is the add-in itself to download, it’s here. Just unzip it and follow the instructions in INSTALL.txt. The article below explains how it works. Note that I’ve only tested this on Magento 1.7.0.2 with Chrome 21 and Firefox 15 (on Linux), and don’t intend to put much effort into further testing and maintaining it beyond my own needs. The extension is provided as-is, though I hope it proves useful. It’s released under the AFL, since that’s what Magento itself uses.

While learning my way around Magento, I became a bit irritated by the rigmarole involved in finding the details of the block that contains a given element on the page. For HTML, I’m used to having the major browsers’ developer tools, together with external tools such as CSS XRAY, and I wanted something similar for viewing the block structure of Magento pages.

Magento itself provides some built-in functionality for viewing this information (System -> Configuration -> Developer -> Debug -> Template Path Hints), but it is ugly, alters the page layout and only works for template blocks (which admittedly do account for the vast majority of blocks).

In order to get a JavaScript-based tool like CSS XRAY to work, the obvious prerequisite is that the page must contain enough information about the blocks displayed to populate the UI. Normally, Magento’s generated (X)HTML pages contain no trace of the blocks used to generate them, so the extension needs to add this information on the server side. It needs to do this in such a way that JavaScript can access the data but the visual layout of the page is unaffected.

The major challenge comes from the fact that the block abstraction in Magento doesn’t work in terms of HTML as such, but rather in terms of strings. Each block produces a chunk of text, which becomes part of the generated document, but the processing at this stage is not aware of the node-by-node structure of the HTML.

The natural place to add block information to generated HTML without editing core files is in a handler for the core_block_abstract_to_html_after event, which is fired after each block gets asked for its generated HTML. Via a “transport” object, the handler for this event is able to alter the generated HTML content.

I tried using a handler for this event to add comments containing “blockStart” and “blockEnd” markers to the generated HTML, which actually worked pretty well. However, I wasn’t convinced by this approach alone. Although comment nodes in the DOM are supposed to be accessible from JavaScript, from Googling around it seemed a vaguely obscure thing to do, and I’m not sure how good the browser support for it is. Also, since Magento pages are served up with content type “text/html”, it seems the comment nodes get messed with sometimes – e.g. comment nodes outside the root element (which are legal) can get moved into the body element. It may well be that this approach would work fine and would result in a simpler extension than what I ended up with, but I felt uneasy enough about it to take a different tack.
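
To make the marker idea concrete, here is a minimal sketch of what such a handler does, written in Python rather than the extension’s PHP. The Transport class, the handler signature and the exact marker format are assumptions made for illustration; only the “blockStart” and “blockEnd” names come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Transport:
    """Hypothetical stand-in for the mutable object the event hands to its handler."""
    html: str


def on_block_to_html_after(block_name: str, template: str, transport: Transport) -> None:
    """Wrap a block's generated HTML in start/end comment markers.

    The marker layout (name, then template path, separated by '|') is an
    assumption for this sketch, not the extension's actual format.
    """
    transport.html = (
        f"<!--blockStart {block_name}|{template}-->"
        f"{transport.html}"
        f"<!--blockEnd {block_name}-->"
    )


# Inner blocks are rendered first, so their markers end up nested inside the
# markers of the block that contains them.
inner = Transport(html="<h1>Shop</h1>")
on_block_to_html_after("header", "page/html/header.phtml", inner)
outer = Transport(html=f"<div>{inner.html}</div>")
on_block_to_html_after("root", "page/2columns-left.phtml", outer)
```

Because the handler only ever sees a string, it cannot tell which elements that string contains. That is why markers are about the most it can add at this stage.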

A much nicer way of adding this information to the document is via “data-*” attributes. All web browsers ignore all attributes with names starting with “data-” when rendering, but these attributes still appear in the DOM, which is perfect for getting the block info through to JavaScript. The difficulty, though, is adding the attributes to the document on the server side. Doing it as a pure text-processing job would be a real nightmare – really you need to properly parse the HTML to add the attributes. However, there you come up against the problem that HTML itself is not very nice to parse and process, in comparison to XML. What’s needed (as I’ll explain later) is a SAX parser, but I only know of one SAX-like parser for HTML (TagSoup) and that’s in Java.

Theoretically, at least, Magento produces XHTML output, which in turn is valid XML and therefore can be tackled with a SAX parser. However, like the vast majority of web sites/apps out there that do so, it still serves it up with a Content-Type of “text/html”, which means that on the browser side it gets processed as HTML “tag soup”. In practice there are a few minor problems with the output of Magento and more with the output of various extensions that may be present, so that you can’t really rely on the pages being truly valid XML/XHTML. (I’m not blaming Magento or the extension authors for this, BTW – it’s the common state of XHTML on the web resulting from the unfortunate difficulties you encounter when serving it up properly as “application/xhtml+xml”.)

And that’s a shame, because PHP does have a SAX (or near enough) parser together with an XMLWriter class (http://php.net/manual/en/book.xmlwriter.php), and between them these offer quite a nice way of processing the XML. The SAX parser is fed a stream of XML/XHTML (which it doesn’t mind arriving in chunks), and generates events as it sees interesting stuff happening, e.g. calling the registered start-element and end-element handler functions as opening and closing tags are encountered. Meanwhile you can call corresponding functions on an XMLWriter, e.g. startElement() and endElement(), as the events are received. This effectively allows you to stream the document through your own handlers, which are aware of the structure of the document and are free to perform simple XML-aware transformations on it, such as adding attributes. By textually locating the markers that were added in the core_block_abstract_to_html_after handler and feeding the chunks of XML data in between the markers into the SAX parser, it’s easy to keep track of which block we are in and add the necessary attributes as we go.
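
The parser-plus-writer pattern is not specific to PHP, so here is a rough sketch of it using Python’s standard library instead: xml.sax plays the part of the PHP parser and XMLGenerator the part of XMLWriter. The AttributeAdder class, the add_attribute function and the “data-block” attribute name are made up for this example and are not taken from the extension.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class AttributeAdder(XMLGenerator):
    """Re-emit every SAX event unchanged, but add one attribute to each element."""

    def __init__(self, out, extra_name, extra_value):
        super().__init__(out, encoding="utf-8", short_empty_elements=True)
        self.extra_name = extra_name
        self.extra_value = extra_value

    def startElement(self, name, attrs):
        # Copy the parsed attributes, add ours, then let the writer emit the tag.
        merged = dict(attrs.items())
        merged[self.extra_name] = self.extra_value
        super().startElement(name, AttributesImpl(merged))


def add_attribute(chunks, extra_name, extra_value):
    """Feed XML to the parser chunk by chunk and return the rewritten document."""
    out = io.StringIO()
    parser = xml.sax.make_parser()
    parser.setContentHandler(AttributeAdder(out, extra_name, extra_value))
    for chunk in chunks:
        parser.feed(chunk)  # a chunk may end in the middle of a tag
    parser.close()
    return out.getvalue()


# The first chunk deliberately stops part-way through the <p> start tag.
result = add_attribute(
    ['<div><p cla', 'ss="x">a &amp; b</p><br/></div>'],
    "data-block",
    "root",
)
```

The output is the same document with the extra attribute on every element. Note that XMLGenerator also writes an XML declaration at the top, which a real page filter would probably want to suppress.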

Fortunately, there is a get-out clause for the almost-but-not-quite-XHTML we have to deal with – PHP comes with HTML Tidy built in! This library takes potentially messy input and generates guaranteed clean, groovy HTML. If configured appropriately, it can produce XHTML, too. Although the transformations it has to perform can theoretically affect the visual layout, this should only happen if the input is invalid, and Magento produces output that is very close to valid. In practice, I haven’t seen any rendering changes resulting from this transformation.

So the approach is:

  1. Hook the core_block_abstract_to_html_after event, using it to add XHTML comments marking the start and end of each block (and adding information like template locations).
  2. Hook the controller_front_send_response_before event and pass the entire generated document into HTMLTidy.
  3. Look through the resulting, valid XHTML document, textually locating the block information and passing everything else into a SAX parser. Keep track of the current block as we go.
  4. In the registered SAX parser handler functions, add “data-*” attributes containing block information.
  5. On the client side, provide a JavaScript bookmarklet based on jQuery and jQuery UI to display details about blocks on the page.
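
Steps 3 and 4 can be sketched roughly as follows, again in Python rather than the extension’s PHP. This is an illustration under stated assumptions, not the extension’s code: the marker format, the BlockTagger class, the annotate function and both “data-*” attribute names are hypothetical, and the input is assumed to be well-formed XHTML already (i.e. after step 2).

```python
import io
import re
import xml.sax
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Assumed marker format: <!--blockStart NAME--> ... <!--blockEnd NAME-->
MARKER = re.compile(r"<!--(blockStart|blockEnd) (.*?)-->")


class BlockTagger(XMLGenerator):
    """Writer that stamps each element with the block(s) it was opened in."""

    def __init__(self, out):
        super().__init__(out, encoding="utf-8")
        self.stack = []  # names of the blocks enclosing the current position

    def startElement(self, name, attrs):
        merged = dict(attrs.items())
        if self.stack:
            merged["data-block"] = self.stack[-1]
            merged["data-block-ancestry"] = "/".join(self.stack)
        super().startElement(name, AttributesImpl(merged))


def annotate(xhtml):
    """Split the document on markers, parse the rest, and tag the elements."""
    out = io.StringIO()
    tagger = BlockTagger(out)
    parser = xml.sax.make_parser()
    parser.setContentHandler(tagger)

    # With two groups, re.split yields [text, kind, name, text, kind, name, ..., text].
    parts = MARKER.split(xhtml)
    parser.feed(parts[0])
    for i in range(1, len(parts), 3):
        kind, name, chunk = parts[i:i + 3]
        if kind == "blockStart":
            tagger.stack.append(name)
        elif tagger.stack:
            tagger.stack.pop()
        # The markers themselves never reach the parser; only the text between them does.
        parser.feed(chunk)
    parser.close()
    return out.getvalue()


doc = (
    "<html><body>"
    "<!--blockStart root--><div>"
    "<!--blockStart header--><h1>Shop</h1><!--blockEnd header-->"
    "<p>Welcome</p></div><!--blockEnd root-->"
    "</body></html>"
)
annotated = annotate(doc)
```

Each element is attributed to the innermost block that was open when its start tag was parsed. This relies on the markers sitting between tags, which is where a comment has to be in well-formed XML, so the parser has already reported every complete tag in a chunk by the time the block stack changes.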

There were lots of little quirks, trips and traps to find workarounds for, but I don’t have time to go into all the details. You’ll find the forensic evidence of them in the code if you’re interested, but the above gives the outline of the mechanism. The extension seems to work pretty well: it doesn’t conflict with any of the extensions I have installed, doesn’t produce any layout changes and has proven very handy already in learning my way around Magento. I hope it’s useful for somebody else.

One last thing, though. Once the extension is installed, there’s an admin page that gives you a bookmarklet to copy onto your toolbar to activate BlockSpy. On that page there is a health warning, which bears repeating here:

When BlockSpy is active, all pages are filtered through HTMLTidy on the server side and then parsed and processed to add metadata for block browsing. This may interfere with the display of some pages, is likely to obscure any errors you may commit in your markup (since HTMLTidy will always produce valid XHTML of some sort, whatever you do), and will be a major drag on site performance. This add-in should only be used for dev purposes, to explore the block-by-block structure of a site, and must be disabled for BOTH testing and production. The add-in must be disabled fully, by editing Omnicognate_BlockSpy.xml in app/etc/modules/ and switching the ‘active’ attribute to false. Simply disabling module output in System -> Configuration -> Advanced is not sufficient.
