
Darwin Correspondence Project



‘Dot means new form – say in Birds’
DAR 205.5: 184r
Reproduced with the permission of the Syndics of Cambridge University Library

As the final year of the Darwin Correspondence Project loomed, we wanted to make sure we celebrated the creation of a data set almost fifty years in the making as well as the scholarly achievement of the print volumes. Thus was born Hack Darwin! It was intended to expand awareness of our data beyond its usual audience, and provide an opportunity for innovation and creativity. We hoped to provide a stimulating and inventive experience for the participants and to enhance the legacy of the Darwin Correspondence Project by investigating and showcasing future possibilities for the Project’s data.

To this end we sent out a general invitation for applications from interested participants of any discipline. We received a wonderful variety of responses from archivists, historians, mathematicians and programmers, just to name a few. We invited 16 to join us, of whom 13 were able to come on the day. Each participant received a complete set of our files in advance: all the letters, bibliography entries, and biographies. When they arrived in Cambridge we further briefed them on what to look for in the data and demonstrated some website features showcasing what we had already accomplished. We put on a display of treasures of the Darwin archive to inspire them with the physical items our data is based on. The participants divided themselves into three groups and were tasked with coming up with a project they could complete in a day and a half.

After an intensive full day of work, and a half day mainly devoted to honing their presentations, the hackathon participants presented three entirely different strategies for finding new ways into the Darwin correspondence.

Group 1 proposed using the existing data to identify the sex and occupation of an unknown recipient of a Darwin letter from the language of the letter itself. They proposed using Natural Language Processing to analyse the letters sent to known correspondents, together with the sex and occupation information coded into our name register entries. They found that, using 60% of the data for training, they were able to achieve a high degree of accuracy on the remaining 40% of material.
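For readers curious about what a 60/40 train-and-test evaluation looks like in practice, here is a minimal sketch. It is not Group 1's code: the model they used is not described here, so the sketch assumes a simple bag-of-words naive Bayes classifier, and the example letters, labels, and function names are all invented for illustration.

```python
import math
import random
from collections import Counter, defaultdict

# Invented toy data: (letter text, recipient occupation). Not real Darwin letters.
TOY_LETTERS = [
    ("thank you for the pigeon specimens", "naturalist"),
    ("the barnacle specimens arrived safely", "naturalist"),
    ("I observed the orchid specimens", "naturalist"),
    ("the beetle specimens are rare", "naturalist"),
    ("pigeon and orchid observations enclosed", "naturalist"),
    ("the proofs of the new edition", "publisher"),
    ("the edition will be printed soon", "publisher"),
    ("please return the corrected proofs", "publisher"),
    ("copies of the edition were sold", "publisher"),
    ("the printed proofs are enclosed", "publisher"),
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def split_60_40(records, seed=0):
    """Shuffle, then keep 60% for training and hold out 40% for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 6 // 10
    return shuffled[:cut], shuffled[cut:]

def train(records):
    """Collect per-label word counts for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in records:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out a label.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, records):
    """Fraction of held-out records whose label is predicted correctly."""
    correct = sum(predict(model, text) == label for text, label in records)
    return correct / len(records)

if __name__ == "__main__":
    train_set, test_set = split_60_40(TOY_LETTERS)
    model = train(train_set)
    print(f"held-out accuracy: {accuracy(model, test_set):.2f}")
```

The point of holding back 40% of the letters is that accuracy is measured only on material the model has never seen, which gives a fairer idea of how it might perform on a letter whose recipient is genuinely unknown.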

Group 2 focussed mainly on enclosures: items that were enclosed with letters. These enclosures can be anything sent to Darwin in the post: newspaper clippings, butterfly wings, photographs, or even beard hair! The group created a ‘typology of enclosures’, using standard archival terminology to describe the variety of items classed as enclosures in the Darwin data. In addition they created a search engine specifically to look for enclosures, so each item could be seen in the context of its letter.

Group 3 proposed quite a different method for allowing a new entry point to the data for the casual reader. They ran the correspondence through a sentiment analysis engine and created an ‘ask Darwin’ search interface. Searching for a term would result in a coloured graph showing how positively or negatively Darwin or his correspondents felt about that topic. It could be overlaid with key timeline events in Darwin’s life for context.

All three of these proposals have the potential to be valuable strategies for presenting the Darwin correspondence data – further, they all have the potential to be applied to other historical data sets, whether the wider data set of our nineteenth-century science letters platform, Epsilon, or other sets of correspondence similarly marked up in TEI XML. Moreover, it was a wonderfully intense creative exercise, bringing together very different people from across academia and industry to celebrate the hard-won achievements of the Darwin Correspondence Project.

What's next?

All three final presentations are available on our Vimeo channel. Any code or other files developed for the hackathon will be publicly available on a Git repository.