ParseHTML

What

ParseHTML is an HTML parser for Ruby 1.8 and above. It also tries to handle invalid HTML to some degree.

Installing

sudo gem install parsehtml

Demonstration of usage

require 'parsehtml'

html = %Q(
  <h1>This is my HTML code</h1>
  <p>Pass this <b>directly</b> into the parser</p>
)

parser = ParseHTML.new(html)  # create a new parser object
parser.next_node              # advance to the next HTML node
parser.node                   # gives the current node (<h1>)
parser.node_type              # gives the node type (tag)
parser.open_tags              # lists any open tags ([])
parser.tag_name               # gives the DOM tag name (h1)
parser.is_block_element       # is this a block element? (true)
parser.is_empty_tag           # is this an empty tag? (false)
parser.is_start_tag           # is this a start tag? (true)
parser.tag_attributes         # lists the current tag's attributes ({})
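
The calls above inspect a single node. To walk a whole document, they can be wrapped in a loop. The following is a minimal sketch, not taken from the library's documentation: it assumes that next_node returns a truthy value while nodes remain and a falsy value at the end of the input, and that node_type compares equal to 'tag' for element nodes. The helper name start_tag_names is hypothetical.

```ruby
# Collect the names of all start tags seen while walking the document.
# Assumption: parser.next_node is truthy while nodes remain, falsy at the end.
# Assumption: parser.node_type.to_s == 'tag' for element nodes.
def start_tag_names(parser)
  names = []
  while parser.next_node
    # Skip anything that is not a tag (text, comments, ...)
    next unless parser.node_type.to_s == 'tag'
    # Record only opening tags, not their closing counterparts
    names << parser.tag_name if parser.is_start_tag
  end
  names
end

# Intended use with the gem (left commented out so the sketch has no dependencies):
#   require 'parsehtml'
#   start_tag_names(ParseHTML.new(html))
```

Because the helper only relies on next_node, node_type, tag_name and is_start_tag, it works with any object that responds to those four methods.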

Forum

http://groups.google.com/group/parsehtml

How to submit patches

First, read the 8 steps for fixing other people’s code.

Then fork your own copy of the repository on GitHub, make your patch, and submit a pull request.

git clone git://github.com/cpjolicoeur/parsehtml.git

Build and test instructions

cd parsehtml
rake test
rake install_gem

Documentation

http://parsehtml.rubyforge.org/rdoc

License

This code is free to use under the terms of the MIT license.

Contact

Comments are welcome. Send an email via the forum.

Craig P Jolicoeur, 22nd December 2008
Theme extended from Paul Battley