Rescue terrible HTML with TagSoup XHTML

Posted by solrac on May 23, 2006 7:54 AM EDT
developerWorks; By IBM

Much of the Web is still populated by the scary legacy of poorly structured HTML, a good deal of it not even compliant with the more lenient SGML-based HTML standards. XHTML is a friendly enough format for parsing and screen-scraping, but that is little help while so many pages remain messy HTML. In this tip, Uche Ogbuji demonstrates how to use TagSoup to turn just about any HTML into neat XHTML.
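
As a rough illustration of what "turning messy HTML into neat XHTML" involves, here is a minimal Python sketch that uses only the standard library. It is not TagSoup and does not use TagSoup's API; the names SoupToXhtml and tidy are hypothetical, chosen for this example. It assumes a simple fragment with one top-level element and handles only a few repairs: lowercasing names, quoting attribute values, self-closing empty elements, and closing tags that were left open. Comments and doctypes are dropped, and a real tool covers far more cases.

```python
import xml.etree.ElementTree as ET
from html import escape
from html.parser import HTMLParser

# Elements that never have content; written as self-closing tags.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class SoupToXhtml(HTMLParser):
    """Toy cleaner (hypothetical name): an illustration, not TagSoup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments, joined at the end
        self.stack = []  # elements that are currently open

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased tag and attribute names.
        seen = {}
        for name, value in attrs:
            # Keep the first of any duplicate attribute; give valueless
            # attributes (e.g. "checked") their own name as the value.
            seen.setdefault(name, name if value is None else value)
        rendered = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in seen.items()
        )
        if tag in VOID:
            self.out.append(f"<{tag}{rendered} />")
        else:
            self.out.append(f"<{tag}{rendered}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self.stack:
            return  # stray closing tag: drop it
        # Close everything left open inside this element, innermost first.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Re-escape text so that &, < and > are safe in XML.
        self.out.append(escape(data, quote=False))


def tidy(html_text):
    """Return a well-formed rendering of a messy HTML fragment."""
    parser = SoupToXhtml()
    parser.feed(html_text)
    parser.close()
    # Close whatever is still open at the end of the input.
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)


if __name__ == "__main__":
    messy = '<P CLASS=intro>Hello <B>world<BR><IMG SRC=a.png ALT="A & B"></P>'
    neat = tidy(messy)
    print(neat)
    ET.fromstring(neat)  # raises ParseError if the result is not well-formed
```

The final ET.fromstring call is the point of the exercise: once the markup is well-formed, any ordinary XML parser can read it, which is what makes the cleaned-up output convenient for screen-scraping.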
