[Discuss] What's the best site-crawler utility?
Eric Chadbourne
eric.chadbourne at gmail.com
Wed Jan 8 00:37:54 EST 2014
Plus one for HTTrack. I used it a couple of months ago to convert a
terrible, hacked Joomla site to static HTML. It was a pain to use at
first (quirks like having to use Firefox), but it worked as advertised.
Hope that helps.
On Tue, Jan 7, 2014 at 10:34 PM, Greg Rundlett (freephile)
<greg at freephile.com> wrote:
> Hi Bill,
>
> The GPL-licensed HTTrack Website Copier works well (http://www.httrack.com/).
> I have not tried it on a MediaWiki site, but it's pretty adept at copying
> websites, including dynamically generated ones.
>
> They say: "It allows you to download a World Wide Web site from the
> Internet to a local directory, building recursively all directories,
> getting HTML, images, and other files from the server to your computer.
> HTTrack arranges the original site's relative link-structure. Simply open a
> page of the "mirrored" website in your browser, and you can browse the site
> from link to link, as if you were viewing it online. HTTrack can also
> update an existing mirrored site, and resume interrupted downloads. HTTrack
> is fully configurable, and has an integrated help system.
>
> WinHTTrack is the Windows 2000/XP/Vista/Seven release of HTTrack, and
> WebHTTrack the Linux/Unix/BSD release which works in your browser. There is
> also a command-line version 'httrack'."
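For the command-line version, a minimal mirror run might look like the
sketch below. The URL and output directory are placeholders, not taken
from the thread; see `httrack --help` for the full option list.

```shell
# Mirror a site into the ./mirror directory with the httrack CLI.
# http://example.org/ is a placeholder URL; -O sets the output path.
httrack "http://example.org/" -O ./mirror
# Re-running the same command later updates the existing mirror
# rather than starting over.
```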
>
> HTTrack is similar in its result to the wget -k -m -np http://mysite
> command that Matt mentions, but it may be easier to use in general and
> offers a GUI to drive the options you want.
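For reference, the wget invocation Matt suggested, with the flags
spelled out. `http://mysite/` is a placeholder hostname, and `-p` is an
extra flag not in the original command, added here so pages render
properly offline.

```shell
# -m  : mirror mode (recursive download with timestamping)
# -k  : convert links in the saved pages so they work locally
# -np : no-parent; never ascend above the starting directory
# -p  : also fetch page requisites (CSS, images, scripts)
wget -m -k -np -p http://mysite/
```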
>
> Using the MediaWiki API to export pages is another option if you have
> specific needs that cannot be addressed by a "mirror" operation (e.g.,
> your wiki has namespaced content that you want to treat differently). If
> you end up exporting via "Special:Export" or the API, you will then need
> to convert the resulting XML to HTML. I have some notes about wiki format
> conversions at https://freephile.org/wiki/index.php/Format_conversion
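As a sketch of the Special:Export route: the export page takes a
`pages` parameter listing the titles to dump as XML. The wiki URL and
page titles below are placeholders, not anything from the thread.

```shell
# Build the Special:Export URL for a hypothetical wiki and page list.
WIKI="https://example.org/w"
PAGES="Main_Page|Help:Contents"
echo "${WIKI}/index.php?title=Special:Export&pages=${PAGES}"

# To actually fetch the XML (requires network access), something like:
# curl -d "pages=Main_Page" -d "curonly=1" \
#      "${WIKI}/index.php?title=Special:Export" -o export.xml
```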
>
> There's pandoc. "If you need to convert files from one markup format into
> another, pandoc is your swiss-army knife."
> http://johnmacfarlane.net/pandoc/
>
> ~ Greg
>
> Greg Rundlett
>
>
> On Tue, Jan 7, 2014 at 6:49 PM, Bill Horne <bill at horne.net> wrote:
>
>> I need to copy the contents of a wiki into static pages, so please
>> recommend a good web-crawler that can download an existing site into static
>> content pages. It needs to run on Debian 6.0.
>>
>> Bill
>>
>> --
>> Bill Horne
>> 339-364-8487
>>
>> _______________________________________________
>> Discuss mailing list
>> Discuss at blu.org
>> http://lists.blu.org/mailman/listinfo/discuss
>>
--
Eric Chadbourne
617.249.3377
http://theMnemeProject.org/
http://WebnerSolutions.com/