Python modules you should know: Scrapy

Posted by topdog on Apr 24, 2012 6:42 AM EDT
topdog.za.net; By Andrew Colin Kissa

Next in our series of Python modules you should know is Scrapy. Do you want to be the next Google? Well, read on.

Scrapy is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

You can use Scrapy to extract many kinds of data from web sources in HTML, XML, CSV and other formats. I recently used it to automate the extraction of domains and email addresses from the ISPA Spam Hall of Shame list, for use in a DNSBL.
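
To give a feel for the kind of extraction described above, here is a minimal sketch that pulls domains and email addresses out of a listing page. It deliberately uses only the Python standard library rather than Scrapy itself, so it is not the author's spider. The sample HTML, the `LinkCollector` class and the `extract_domains_and_emails` function are hypothetical names made up for illustration, and the real ISPA page layout is not known here.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


# Deliberately simple pattern; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_domains_and_emails(html):
    """Return (sorted domains, sorted emails) found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)

    # Host names come from link targets; mailto: links have no host.
    domains = set()
    for href in collector.hrefs:
        host = urlparse(href).hostname
        if host:
            domains.add(host)

    # Email addresses are matched anywhere in the raw markup.
    emails = {match.lower() for match in EMAIL_RE.findall(html)}
    return sorted(domains), sorted(emails)


# Made-up sample markup standing in for a real listing page.
SAMPLE = """
<ul>
  <li><a href="http://spammer.example/offers">Spammer One</a>
      contact: bulk@spammer.example</li>
  <li><a href="http://Junk.example.net/">Junk Mailer</a>
      <a href="mailto:sales@junk.example.net">mail</a></li>
</ul>
"""

domains, emails = extract_domains_and_emails(SAMPLE)
print(domains)
print(emails)
```

In an actual Scrapy spider, the `parse` callback receives a response object, and its `response.text` could be handed to a function like this one, although Scrapy's own selectors would usually be the more natural choice.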
