ParseHTML is an HTML parser that works with Ruby 1.8 and above. It also tries to handle invalid HTML, to some degree.

Installation

sudo gem install parsehtml

Demonstration of usage

require 'parsehtml'

html = %Q(
  <h1>This is my HTML code</h1>
  <p>Pass this <b>directly</b> into the parser</p>
)

parser = ParseHTML.new(html)  # Create a new parser object
parser.next_node              # move to the next HTML node
parser.node                   # gives the current node (<h1>)
parser.node_type              # gives the node type (tag)
parser.open_tags              # lists any open tags ([])
parser.tag_name               # gives the DOM tag name (h1)
parser.is_block_element       # is this a block element? (true)
parser.is_empty_tag           # is this an empty tag? (false)
parser.is_start_tag           # is this a start tag? (true)
parser.tag_attributes         # lists the current tag's attributes ({})
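The demonstration reads one node per next_node call, so a real program usually calls next_node in a loop and inspects each node as it goes. Trying that requires the gem, so what follows is a self-contained stand-in instead. ToyPullParser is a hypothetical class, not part of ParseHTML; it only borrows the method names shown in the demonstration. Its internals are assumptions made for illustration: the regular expressions, the block and empty tag lists, next_node returning false at the end of the input, and open_tags holding the tags that enclose the current node (which is why it is still [] while sitting on the first <h1>). It handles only simple markup and is written for a current Ruby rather than 1.8.

```ruby
require 'strscan'

# ToyPullParser is a HYPOTHETICAL stand-in, not ParseHTML's implementation.
# It borrows the method names from the README demonstration so the
# next_node loop can be run without the gem. Simple markup only.
class ToyPullParser
  BLOCK_ELEMENTS = %w[address blockquote div h1 h2 h3 h4 h5 h6 hr li ol p pre table ul].freeze
  EMPTY_TAGS     = %w[br hr img input link meta].freeze

  attr_reader :node, :node_type, :tag_name, :tag_attributes, :open_tags

  def initialize(html)
    @scanner   = StringScanner.new(html)
    @open_tags = []   # tags enclosing the current node (assumed semantics)
    @node_type = nil
  end

  # Moves to the next tag or non-blank text run.
  # Returns true when a node was read, false at the end of the input.
  def next_node
    update_open_tags if @node_type == :tag  # apply the previous tag to the stack
    until @scanner.eos?
      if (tag = @scanner.scan(/<[^>]*>/))
        read_tag(tag)
        return true
      end
      text = @scanner.scan(/[^<]+/) || @scanner.getch  # a stray "<" is kept as text
      next if text.strip.empty?                         # skip whitespace-only runs
      @node, @node_type, @tag_name, @tag_attributes = text, :text, nil, {}
      return true
    end
    @node = @node_type = nil
    false
  end

  def is_start_tag
    @node_type == :tag && @start
  end

  def is_empty_tag
    @node_type == :tag && @empty
  end

  def is_block_element
    BLOCK_ELEMENTS.include?(@tag_name.to_s)
  end

  private

  # Splits a raw tag such as <a href="x"> into name, start/end flag and attributes.
  def read_tag(tag)
    @node, @node_type = tag, :tag
    @start = !tag.start_with?('</')
    inner  = tag[(@start ? 1 : 2)...-1]
    @tag_name = inner[/\A[A-Za-z][\w-]*/].to_s.downcase
    @empty = EMPTY_TAGS.include?(@tag_name) || inner.end_with?('/')
    @tag_attributes = inner.scan(/([\w-]+)="([^"]*)"/).to_h  # double-quoted values only
  end

  # Pushes start tags; an end tag pops its match and anything still open inside it.
  def update_open_tags
    return if @empty || @tag_name.empty?
    if @start
      @open_tags.push(@tag_name)
    elsif (i = @open_tags.rindex(@tag_name))
      @open_tags.slice!(i..-1)
    end
  end
end

html = %Q(
  <h1>This is my HTML code</h1>
  <p>Pass this <b>directly</b> into the parser</p>
)

parser = ToyPullParser.new(html)
while parser.next_node
  next unless parser.node_type == :tag && parser.is_start_tag
  puts "#{parser.tag_name} (block: #{parser.is_block_element}) inside #{parser.open_tags.inspect}"
end
# h1 (block: true) inside []
# p (block: true) inside []
# b (block: false) inside ["p"]
```

In this toy, a closing </p> also drops an unclosed <b> from open_tags, which is one simple way to tolerate sloppy markup; ParseHTML's own recovery rules may differ.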


How to submit patches

First read the 8 steps for fixing other people’s code

Then fork your own copy of the repository on GitHub, make your patch, and submit a pull request via GitHub.

git clone git://

Build and test instructions

cd parsehtml
rake test
rake install_gem

License

This code is free to use under the terms of the MIT license.


Craig P Jolicoeur, 22nd December 2008
