[Discuss] What's the best site-crawler utility?
Daniel Barrett
dbarrett at blazemonger.com
Wed Jan 8 09:43:08 EST 2014
>Daniel Barrett wrote:
>> For instance, you can write a simple script to hit Special:AllPages
>> (which links to every article on the wiki), and dump each page to HTML
>> with curl or wget.
On January 7, 2014, Richard Pieri wrote:
>Yes, but that's not humanly-readable. It's a dynamically generated
>jambalaya of HTML, JavaScript, PHP, CSS, and Ghu only knows what else.
Well, a script doesn't need human-readability. :-) Trust me, this is
not hard. I did it a few years ago with minimal difficulty (using a
couple of Emacs macros, if memory serves).
The HTML source of Special:AllPages is just a bunch of <a> tags (with
some window dressing around them) that all match a simple pattern.
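
For concreteness, here is a rough sketch of the general approach in
Python (standard library only). It is not the script I actually used,
and the wiki base URL and href pattern below are placeholders; adjust
the regex to match whatever your wiki emits. Note also that
Special:AllPages is paginated on large wikis, so a real script would
follow its "next page" links as well.

  #!/usr/bin/env python3
  # Sketch: dump every article listed on Special:AllPages to a local
  # HTML file. Base URL and link pattern are placeholders; adjust for
  # your wiki's URL scheme.
  import re
  import urllib.parse
  import urllib.request

  WIKI = "http://wiki.example.com"              # hypothetical base URL
  ALLPAGES = WIKI + "/index.php/Special:AllPages"

  def fetch(url):
      # Return the decoded HTML body of a URL.
      with urllib.request.urlopen(url) as resp:
          return resp.read().decode("utf-8", errors="replace")

  # Special:AllPages is mostly <a href="/index.php/Title"> tags that
  # all match one simple pattern; excluding ":" skips Special:, Talk:,
  # and other namespaced pages.
  index_html = fetch(ALLPAGES)
  paths = re.findall(r'<a href="(/index\.php/[^":]+)"', index_html)

  for path in sorted(set(paths)):
      title = urllib.parse.unquote(path.rsplit("/", 1)[-1])
      html = fetch(WIKI + path)
      with open(title.replace("/", "_") + ".html", "w",
                encoding="utf-8") as f:
          f.write(html)
      print("saved", title)

Run that in an empty directory and you end up with one .html file per
article, which is plenty for a grep-able offline archive.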
--
Dan Barrett
dbarrett at blazemonger.com