I am now blogging at blog.alexmaccaw.com
I had to do some HTML parsing recently to convert some markdown into the format required for Nettuts+ tutorials. It required moving various elements around, adding classes and appending some new elements.
Now, normally I'd go with Ruby's de facto solution to XML parsing, Nokogiri. However, I quickly ran into issues which, combined with the library's class-based excuse for documentation, made me decide to take a different approach: using jQuery from Node.js.
First, install the necessary npm dependencies (in the app's directory):
```shell
npm install -g coffee-script
npm install jquery node-markdown
```
Then create a CoffeeScript Cakefile (the build script run by the `cake` command) in the same directory:
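```coffeescript
fs = require('fs')
# npm package names are lowercase: require('jQuery') only works on
# case-insensitive file systems. This relies on the 2011-era `jquery`
# package, which bundled jsdom; current releases need a window, e.g.:
#   {JSDOM} = require('jsdom')
#   $ = require('jquery')(new JSDOM('').window)
$ = require('jquery')
md = require('node-markdown').Markdown

task 'build', 'Build index.html', ->
  # Read in the Markdown source
  html = fs.readFileSync('./index.md', 'utf8')

  # Convert the Markdown to HTML
  html = md(html)

  # Wrap the HTML in a jQuery object so it can be traversed
  doc = $('<body />').append(html)

  # Insert <hr /> before all <h2 /> elements, then remove the first <hr />
  doc.find('h2').before('<hr />')
  doc.find('hr:first').remove()

  # Correct pre syntax: unwrap <pre><code>...</code></pre> into a bare <pre>
  doc.find('pre code').each ->
    $(@).parent().html $(@).html()
  doc.find('pre').attr('name', 'code').addClass('cs')

  # Remove images from p tags, and wrap them correctly
  doc.find('p img').each ->
    parent = $(@).parent()
    parent.after $(@)
    parent.remove()
  doc.find('img').wrap('<div class="tutorial_image" />')

  # Add required class to blockquotes
  doc.find('blockquote').addClass('pullquote pqRight')

  # Write out file
  fs.writeFileSync('./index.html', doc.html())
```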
Now tell me that syntax isn't concise and beautiful: a vast improvement over XML parsing with other libraries.
Our build task can be invoked by running `cake build`, which generates the resultant index.html.
Of course, this approach won't be suitable for every use case. For example, I have no idea how well the script performs. However, for my needs, where it only has to run once, it's ideal. If need be, we could even pipe the resulting HTML back to Ruby via STDOUT.
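Here is a minimal sketch of the Ruby side of that pipe. It assumes the Cakefile's last line is changed from `fs.writeFileSync('./index.html', doc.html())` to `process.stdout.write doc.html()`, so that `cake build` prints the HTML instead of saving it, and that nothing else writes to STDOUT. The `converted_html` helper is hypothetical; it only uses Ruby's standard `Open3.capture2`.

```ruby
require 'open3'

# Hypothetical helper: runs a command and returns whatever it wrote to STDOUT.
# Assumes the Cakefile task ends with
#   process.stdout.write doc.html()
# rather than writing index.html, so `cake build` prints the converted HTML.
def converted_html(cmd = ['cake', 'build'])
  out, status = Open3.capture2(*cmd)
  # Fail loudly instead of silently returning partial output
  raise "#{cmd.join(' ')} exited with status #{status.exitstatus}" unless status.success?
  out
end
```

Calling `html = converted_html` from the app's directory leaves the converted HTML in a Ruby string for any further processing. Passing the command as an array runs it without a shell, so no quoting is needed.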