ParseHTML
What
ParseHTML is an HTML parser for Ruby 1.8 and above. It also tries to handle invalid HTML to some degree.
Installing
sudo gem install parsehtml
Demonstration of usage
require 'parsehtml'

html = %Q(
  <h1>This is my HTML code</h1>
  <p>Pass this <b>directly</b> into the parser</p>
)

parser = ParseHTML.new(html)  # create a new parser object
parser.next_node              # traverse through the HTML nodes
parser.node                   # gives the current node (<h1>)
parser.node_type              # gives the node type (tag)
parser.open_tags              # lists any open tags ([])
parser.tag_name               # gives the DOM tag name (h1)
parser.is_block_element       # is this a block element? (true)
parser.is_empty_tag           # is this an empty tag? (false)
parser.is_start_tag           # is this a start tag? (true)
parser.tag_attributes         # lists the current tag's attributes ({})
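The accessors shown above can be combined into a loop that walks a whole document. The following is a minimal sketch rather than part of the library: collect_start_tags is a hypothetical helper name, and it assumes that next_node returns a falsy value once the input is exhausted and that node_type reads as "tag" for tag nodes. Neither assumption is confirmed by this README, so check the RDoc before relying on them.

```ruby
# Hypothetical helper (not part of ParseHTML): collect the names of all
# start tags in document order.
#
# Assumptions, not confirmed by the README:
#   * parser.next_node returns a falsy value when no nodes remain
#   * parser.node_type converts to the string "tag" for tag nodes
def collect_start_tags(parser)
  names = []
  while parser.next_node
    # Skip text, comments and any other non-tag nodes.
    next unless parser.node_type.to_s == 'tag'
    # Record opening tags only, so closing tags are not counted twice.
    names << parser.tag_name if parser.is_start_tag
  end
  names
end
```

With the gem installed, this could be called as collect_start_tags(ParseHTML.new(html)). The helper only uses the methods listed in the demonstration, so any object that responds to next_node, node_type, tag_name and is_start_tag will work with it.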
Forum
http://groups.google.com/group/parsehtml
How to submit patches
First, read the 8 steps for fixing other people’s code.
Then fork your own copy of the repository on GitHub, make your patch, and submit a pull request.
git clone git://github.com/cpjolicoeur/parsehtml.git
Build and test instructions
cd parsehtml
rake test
rake install_gem
Documentation
http://parsehtml.rubyforge.org/rdoc
License
This code is free to use under the terms of the MIT license.
Contact
Comments are welcome. Send an email via the forum.
Craig P Jolicoeur, 22nd December 2008