Project Gutenberg in 2009, the PGERT?

Update: As was pointed out to me, this post came across as more acrimonious than I intended. More on that over at DP’s forum, and I’ve appended a footnote to the main text here.

The statistics usually found in the weekly Project Gutenberg email do get a bit tedious for me,[1] so I was surprised to see that the latest gweekly volume (mirrored at the DP-News blog) was a nicely written piece: a call to arms by Michael Hart for an error correction team to be formed, and for everybody to pass the message along to get the word out.

This is what irritates me: they openly call for help, but give absolutely no explanation of the specifics. Who is coordinating the effort? How will errors be reported, catalogued or fixed? Nothing else is explained either. And with two far simpler solutions available, it risks duplicating effort on a project that is already time-poor.

Distributed Proofreaders already runs Smooth Reading, where people can pick up a text, read it like a book[2] and pay critical attention to sniff out errors that have slipped through the net so far. Like every other round, Smooth Reading already has a shortage of help, so why not add a sprinkling of books from PG and actively promote it on the PG site to draw the ever-elusive new user into helping out by reading?

The other option would be to run a simple bug-reporting system on PG,[3] and there are plenty of open source options on offer. Let people report errors anonymously (with minimal fuss, just a text box and a captcha) so that they can be catalogued, verified and acted upon by a team of volunteers. Important to both methods is a list of known false positives; otherwise, as time goes on, more and more time will be wasted on duplicates.

The key to any solution to the error reporting problem is to get new people helping out; there’s no point redirecting people who already spend their free time on DP and PG.

The email gives an example of an ebook with a glut of unreported errors, although the title isn’t named and there’s no indication that it came through DP. I seriously doubt we’d let 23 errors in a 300-page book get through, even though that’s less than a 1% error rate. Anybody who proofs at DP would agree. One interesting side note is that the errors were found by somebody preparing a human-read audiobook, who would no doubt be looking at the text the same way a DP Smooth Reader would.

A more interesting point is that PG has received more than 10,000 emails with suggested corrections. In 37 years. Why on earth is PG still using email as the only way to report these? Plenty of casual readers and visitors no doubt notice errors but don’t bother to write an email;[4] a simple text box (with captcha) and a mention/link in the introductory text of each book would increase the number of reports dramatically.

To be fair, the blog is a step in the right direction towards building a better community around the project, although it’s hardly referenced on PG at all. They really should have user forums,[5] and should open the mailing list archives to members of the public who don’t want to subscribe to the list itself. As it stands, the only community communication effectively goes on behind closed doors.

A one-sided rant? Undoubtedly. :) I just hate to see time diverted away from DP, which deserves so much more help than it receives.

  1. And I really wish somebody would make that damned domino stack PPS for Michael. :)
  2. Instead of a page-by-page typical DP text.
  3. garweyne from DP noted in the aforementioned forum thread that a student of his developed software specifically for this purpose, but that it was rejected sight unseen. It’ll be interesting to see if they’re more interested now.
  4. Which is something you find out the hard way, as I’ve found with OpenDisc.
  5. It might be a coup for the blog to start their own.