Tonight at the Silicon Valley Perl Mongers group (sv.pm.org), Lambert Lum gave a talk on “Building a Web Crawler with Perl.”
He talked mainly about the following Perl modules:
Joe McMahon, a CPAN maintainer of 30 modules including WWW::Mechanize::Plugin, was available to answer detailed questions.
Afterwards, several members shared interesting comments from their own search projects, including using HTML Tidy to fix broken pages before parsing them and paying attention to site maps.
There was a lot of interest in what to do with JavaScript in crawled web pages.
Thanks to Plug and Play Tech Center for hosting the meeting again.