BLU archives
Tom Metro
blu at vl.com
Mon Jan 30 12:42:21 EST 2006
Bill Horne wrote:
> Tom Metro wrote:
>> I just tried looking something up in the BLU (discuss list) archives:
>> http://olduvai.blu.org/pipermail/discuss/
>> and noticed we don't have a search engine.
>>
>> Are there steps we can take to get Google to spider the site?
>
> Please don't. The discuss archive has so many of my "private" email
> addresses in it that I'd have to go into the Internet witness protection
> program if Google gets to it. Let's leave the archive un-indexed.
I agree that divulging email addresses is bad. (I have a "Google Alert"
that helps me know when archives are "leaking" my addresses.)
But the software BLU is using does obfuscate email addresses in the
headers and the body of the messages. Unfortunately it does it in a
fairly predictable way (replacing "@" with "at" in the displayed text
and hyperlinking to the mailing list addresses). A determined spammer
could easily get around this, and it's probably the same scheme used by
every archive generated by pipermail (the software BLU uses[1]).
1. http://www.amk.ca/python/unmaintained/pipermail.html
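To illustrate how little protection this kind of substitution offers, here
is a minimal sketch in Python. It is not pipermail's actual code; the
function names, the regular expressions, and the sample address are all
made up for illustration.

```python
import re

# Hypothetical patterns for illustration only; not taken from pipermail.
EMAIL_RE = re.compile(r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
OBFUSCATED_RE = re.compile(r"([A-Za-z0-9._%+-]+) at ([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def obfuscate(text: str) -> str:
    """Replace '@' with ' at ' in anything that looks like an address."""
    return EMAIL_RE.sub(r"\1 at \2", text)


def deobfuscate(text: str) -> str:
    """Undo the substitution, as a harvester's script might."""
    return OBFUSCATED_RE.sub(r"\1@\2", text)


sample = "Contact user@example.com for details."
hidden = obfuscate(sample)        # what the archive page would display
recovered = deobfuscate(hidden)   # what a harvester gets back
```

A harvester would pick up a few false positives (a phrase like "look at
example.com" also matches), but that costs a spammer nothing.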
It could be argued that we'd be better off using one of the public
archive sites that use more sophisticated obfuscation (such as
converting the addresses to images).
Some public archive services include openSubscriber.com[1] (also
provides RSS feeds), The Mail Archive[2], and Gmane[3] (also provides
NNTP and RSS access). The mailing list[4] for Boston Perl Mongers is
archived by all of these services, and you can compare for yourself the
presentation and obfuscation used by each.
1. http://www.opensubscriber.com/
2. http://www.mail-archive.com/
3. http://gmane.org/
4. http://boston.pm.org/kwiki/index.cgi?MongerLists
Some (such as Gmane) even support an X-No-Archive header so users can
control whether their messages get archived.
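As a rough sketch of what setting that header looks like, here is a message
built with Python's standard email package. The addresses are placeholders,
and the value "yes" is the conventional one as far as I know; whether and
how a given archive honors it is up to that service.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "poster@example.com"   # placeholder sender
msg["To"] = "discuss@example.org"    # placeholder list address
msg["Subject"] = "Please keep this out of the archives"
# Assumed conventional opt-out value; each archive decides whether to honor it.
msg["X-No-Archive"] = "yes"
msg.set_content("This message asks archive services not to store it.")

raw = msg.as_string()  # the text that would be handed to the mail server
```

Most mail clients also let you add a custom header like this per message or
per account.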
Adding a list to these services is as simple as adding an address to the
BLU discuss subscription list:
http://www.opensubscriber.com/faq.html
http://www.mail-archive.com/addlist.html
http://gmane.org/add.php
(I didn't run across any options for importing past archives.)
As for Google spidering, the cat's already out of the bag. Google
already has some of the archives, and the archives are publicly
accessible, so either Google will eventually get the rest of it, or a
spammer will spider the site themselves.
Matt Galster wrote:
> I agree with Bill.
For the same or different reasons?
> You can download a copy of the archive and search it if you want.
I already have a full archive locally.
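For anyone who does want to search a downloaded copy, something along these
lines would work. It is only a sketch: it assumes the local copy is a
standard mbox file, and the function name and parameters are my own.

```python
import mailbox


def search_mbox(path: str, term: str) -> list[str]:
    """Return subjects of messages whose subject or body mentions term."""
    needle = term.lower()
    hits = []
    for msg in mailbox.mbox(path):
        subject = str(msg.get("Subject", ""))
        body = ""
        # Collect the plain-text parts; attachments and HTML are skipped.
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True) or b""
                body += payload.decode("utf-8", errors="replace")
        if needle in subject.lower() or needle in body.lower():
            hits.append(subject)
    return hits
```

Calling search_mbox("discuss.mbox", "spider") with a hypothetical file name
would list the subjects of matching posts. It works, but every user has to
download the archive and set it up themselves.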
Having a public, searchable archive has several benefits:
- Users who aren't list subscribers can discover BLU when one of our
  postings turns up as an answer to their query (particularly true if we
  get indexed by Google).
- New BLU members can find answers to common questions by searching the
  archives.
- Existing BLU subscribers can point new BLU members to past postings that
  answer their questions.
- Existing BLU subscribers who don't bother to archive the postings can
  more conveniently find things they saw on the list in the past.
The archives embody the collective knowledge of the group, and as a group
that follows the traditions of open source, I'd think we'd want to make
that information public.
-Tom
--
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/