Amid some recent handwringing[1] about the state of digital archiving in general and Usenet archiving in particular, I decided to investigate the state of the current Usenet archives we have available to us. What are they? Where are they? What format are they in? How can we access them? What do they have? What do they omit?[2]
My initial searches turned up the following archives:
- Google Groups - based on the Deja News archive.
- The Internet Archive’s “Usenet Archive of UTZOO Tapes” - 2GB of compressed text total (18GB uncompressed), from between February of 1981 and June of 1991.
- The Internet Archive’s “Usenet Historical Collection” - “This historical collection of Usenet spans more than 30 years and was given to us by a generous donor.” The archive of alt.* groups is 219GB of compressed text.
- “The Usenet Archive” - a search site that claims text archives “back to 1980”.
- A-News Archive - “Early Usenet News Articles: 1981 to 1982”. This page was originally linked to from Bruce Jones at UCSD’s “Archive for the History of Usenet Mailing List” page, but this archive seems to not have survived the migration off UCSD’s servers. Therefore, the only live archive seems to be what’s in the Internet Archive Wayback Machine.
According to Katharine Mieszkowski’s 2002 article, The geeks who saved Usenet, the oldest post in the Deja News/Google Groups archive was made on May 11, 1981 by Mark Horton, starting us in medias res of a thread with the subject “newsgroup fa, net, etc.” on net.general. This gives us a good test message to search for in our archives.
- Google Groups - you can find the original thread here, with follow-ups after decades. That thread also had a post pointing to this post on the “usenet.hist” mailing list which contains some archived messages that predate Deja News.
- Internet Archive UTZOO Tapes - extracting news001f1.tgz should give us the oldest messages in the archive if the tapes were made in strictly chronological order. Grep can then find us the message we’re looking for, in news001f1/a2/ucbarpa.111:[3]
- Internet Archive Usenet Historical Collection - due to the organization of these files, we can just download net.general.mbox.zip (2.7MB) from the usenet-net item. Unfortunately the mbox format of these files seems a bit too idiosyncratic for the mailreaders I tried them with. Fortunately, they’re still just plaintext. Here’s our entry (the headers here seem to indicate this is some sort of dump or scrape of Google Groups data):
- The Usenet Archive - no luck. I also get zero results searching in net.general from 1980 to 1981.
- A-News Archive: the Wayback Machine has this snapshot of posts to net.general which includes our subject (and two which predate it!), but no copies of the messages themselves appear to be in the Wayback Machine.
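The grep step over the extracted UTZOO files can also be sketched in Python. This is a minimal sketch, not the exact command used for this post: the function name `find_files_containing` is hypothetical, and it assumes news001f1.tgz has already been extracted into a local news001f1 directory.

```python
import os


def find_files_containing(root, needle):
    """Return the sorted paths of files under `root` whose raw bytes contain `needle`.

    Files are read as bytes so that odd encodings in old articles cannot
    raise decoding errors. `needle` is assumed to be plain ASCII.
    """
    needle_bytes = needle.encode("ascii")
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if needle_bytes in handle.read():
                        matches.append(path)
            except OSError:
                # Skip unreadable files rather than aborting the whole search.
                continue
    return sorted(matches)
```

Calling `find_files_containing("news001f1", "newsgroup fa, net, etc.")` should, if the extracted tree matches the layout described above, include news001f1/a2/ucbarpa.111 among its results.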
In fact, looking through our net.general.mbox file from the Internet Archive Usenet Historical Collection for the net.general messages that predate our test message (“DEC on Usenet” and “New Disk Drive”) reveals that we can recover one of them:
Note here the difference between the Date header and the X-Google-ArrivalTime header, which is probably why this wasn’t counted as the “oldest” message in the archive.
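Since the mbox files are still plaintext, the two headers can be pulled out for every message with a few lines of Python instead of a mail reader. The following is a rough sketch under stated assumptions: that messages are separated by lines beginning with "From " (the usual mbox convention, which these idiosyncratic files may not follow exactly), and that the hypothetical helpers `split_mbox_text` and `header_summary` are good enough for a first look. The header values are returned as raw strings, because the timestamp formats in this collection are not something I want to guess at.

```python
import email


def split_mbox_text(text):
    """Split mbox-style text into raw messages on lines that start with 'From '.

    The separator line itself is dropped. Body lines that happen to start
    with 'From ' would be split incorrectly; that is a known limitation.
    """
    messages = []
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            if current is not None:
                messages.append("".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages


def header_summary(raw_message, names=("Subject", "Date", "X-Google-ArrivalTime")):
    """Return the requested headers as raw strings (None when a header is absent)."""
    msg = email.message_from_string(raw_message)
    return {name: msg.get(name) for name in names}
```

Reading the file with something like `open("net.general.mbox", encoding="latin-1").read()` (the encoding is an assumption) and mapping `header_summary` over `split_mbox_text(...)` gives a list in which messages whose Date and X-Google-ArrivalTime disagree can be spotted by eye.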
We can find the same message in the UTZOO archive in news001f1/a2/decvax.116. Interestingly, we can also find the “New Disk Drive” message in the UTZOO archive in news001f1/a2/duke.757 (a message which I cannot find in net.general.mbox):
This is just an initial investigation with one test, and by no means comprehensive. A good next step for someone interested in early Usenet posts would probably be to check coverage between the UTZOO collection and the Usenet Historical Collection to see if there are any gaps which can be filled in by merging them together. Another question to try to answer would be how comprehensive the Usenet Historical Collection is for the period from 1991 onward, which the UTZOO collection does not cover.
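The coverage check suggested above comes down to set arithmetic once each collection has been reduced to a list of comparable article identifiers. Here is a minimal sketch; `coverage_report` is a hypothetical name, and the hard part (deriving identifiers that actually match across the two collections) is assumed to have been done already.

```python
def coverage_report(ids_a, ids_b):
    """Compare two collections of article identifiers.

    Returns a dict with the identifiers found in both collections,
    only in the first, and only in the second, each as a sorted list.
    """
    set_a, set_b = set(ids_a), set(ids_b)
    return {
        "both": sorted(set_a & set_b),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
    }


# Illustration using the three articles discussed in this post, and assuming
# (hypothetically) that both collections can be keyed by the same article IDs.
utzoo_ids = ["ucbarpa.111", "decvax.116", "duke.757"]
historical_ids = ["ucbarpa.111", "decvax.116"]
report = coverage_report(utzoo_ids, historical_ids)
```

In this toy example `report["only_a"]` contains just duke.757, the “New Disk Drive” article found only in the UTZOO tapes. Run over full collections, the same report would list candidate gaps to fill by merging.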
Footnotes
1. Matthew Braga. Google, a Search Company, Has Made Its Internet Archive Impossible to Search. Vice Motherboard. Published 2015-02-13. Accessed 2015-02-23.
   Andy Baio. Never trust a corporation to do a library’s job. Medium. Published 2015-01-28. Accessed 2015-02-23.
   Ian Sample. Google boss warns of ‘forgotten century’ with email and photos at risk. The Guardian. Published 2015-02-13. Accessed 2015-02-23.
   Gareth Millward. I tried to use the Internet to do historical research. It was nearly impossible. The Washington Post. Published 2015-02-17. Accessed 2015-02-23.
2. Preserving all of Usenet, including all binary postings, would be a pretty daunting task. I’m not really aware of anyone who’s actually trying to do that, though even archiving just metadata about binary postings might provide an interesting historical record.
3. See this blog post on bang path addressing for a note on the email addresses in these early Usenet archives.