I joined the Trove project for our crowdsourcing assignment. Trove is a database run by the National Library of Australia that aims to connect archives, museums, universities and libraries across the country. On the one hand, Trove shares the metadata and digital reproductions of the institutions’ holdings; on the other – and this is where the crowd comes in – it lets the public take an active part in the archival work.
The main crowdsourcing task is correcting the electronically translated text of newspapers, that is, the text a computer has read off the scanned pages. Trove is asking the crowd to correct the texts of over 120 million newspaper articles. Many texts have not been converted accurately, because of the poor quality of the original newspapers and/or the limits of the software used (Trove states in the FAQ section: “Computers are not as good at reading as humans, and often make mistakes”). With the corrections made by humans, Trove aims to improve both full-text search and the overall quality of the digital text.
To participate in the project, I created a Trove account and went through some newspaper articles that the database served me at random. Correcting is pretty straightforward. The page is divided into two panels: the left shows the electronically translated text, the right the scan of the digitized newspaper article.
The title of my example article already shows the (current) limits of computer-generated text: “BRISBANE CBRL FBGHTS FOR BABY”. The user can correct the text in the left panel: “BRISBANE GIRL FIGHTS FOR BABY”. There were over twenty mistakes in this short article. Some were minor typos, but many were serious enough to derail a search, e.g. the mother’s name was rendered as “Sounder” instead of “Saunders”. A person searching for her would therefore not find the article. It’s only one wrong letter and one missing letter, but imagine if all articles mentioning “Bill Clinton” rendered the name as “Bill Clinso”, “Ill Olinton” or whatever.
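To make the search problem concrete, here is a small Python sketch. It is only an illustration under my own assumptions, not how Trove's search actually works: the sample sentence is made up (only the names “Sounder” and “Saunders” come from the article), and the helper functions `exact_match` and `fuzzy_match` are hypothetical. It shows that a plain text search misses the misread name, although the two spellings are only two single-character edits apart.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def exact_match(query: str, text: str) -> bool:
    """Case-insensitive substring search (hypothetical stand-in for a simple search)."""
    return query.lower() in text.lower()


def fuzzy_match(query: str, text: str, max_distance: int = 2) -> bool:
    """True if any word in the text is within max_distance edits of the query."""
    q = query.lower()
    return any(levenshtein(q, word) <= max_distance for word in text.lower().split())


# Made-up sample sentence containing the misread name from the article.
ocr_text = "Mrs Sounder fights for baby"

print(exact_match("Saunders", ocr_text))   # exact search misses the article
print(fuzzy_match("Saunders", ocr_text))   # a tolerant search would still find it
print(levenshtein("saunders", "sounder"))  # number of edits between the spellings
```

A tolerant search like this can paper over small errors, but it also returns more false hits, which is one reason why human correction of the text itself remains valuable.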
According to Trove, users made over 52,000 newspaper corrections today. The most productive user has apparently made 3 million corrections already. You need plenty of time and patience for that.
I feel it is a good example of how the crowd can help archives improve their services. It is a kind of outsourcing of work that they will never have the staff to complete.