
Darwin Correspondence Project


Darwin Correspondence Project letter transcriptions have now been converted into TEI P5 XML, widely recognised as the industry standard for the markup of historical texts. The metadata for the correspondence exchange is encoded within the recently established ‘correspDesc’ element (‘correspondence description’). Using this format optimises the encoded texts for long-term compatibility with other corpora. The migration of our data into TEI P5 is the first phase in our long-term sustainability plan to enhance the quality of some of our oldest data, which are close to 30 years old. The second phase of this plan, scheduled for the first quarter of 2017, will see the public release of our Tagging and Editorial Guidelines.
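As a rough illustration of what metadata held in a ‘correspDesc’ element makes possible, the sketch below reads sender, recipient, place and date from a small TEI P5 fragment using Python's standard library. The sample letter, its names and its date are invented for the example, and the function name read_correspondence is hypothetical. The Project's own files may use additional elements and attributes not shown here.

```python
import xml.etree.ElementTree as ET

# TEI P5 elements live in this namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample, for illustration only; not a Project transcription.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hypothetical letter</title></titleStmt>
      <publicationStmt><p>Illustration only</p></publicationStmt>
      <sourceDesc><p>Invented sample</p></sourceDesc>
    </fileDesc>
    <profileDesc>
      <correspDesc>
        <correspAction type="sent">
          <persName>Example Sender</persName>
          <placeName>Example Place</placeName>
          <date when="1860-01-01"/>
        </correspAction>
        <correspAction type="received">
          <persName>Example Recipient</persName>
        </correspAction>
      </correspDesc>
    </profileDesc>
  </teiHeader>
  <text><body><p>Letter text would go here.</p></body></text>
</TEI>"""


def read_correspondence(xml_text):
    """Return a dict keyed by correspAction type ('sent', 'received')."""
    root = ET.fromstring(xml_text)
    result = {}
    for action in root.iterfind(".//tei:correspDesc/tei:correspAction", TEI_NS):
        entry = {}
        person = action.find("tei:persName", TEI_NS)
        place = action.find("tei:placeName", TEI_NS)
        date = action.find("tei:date", TEI_NS)
        if person is not None:
            entry["person"] = (person.text or "").strip()
        if place is not None:
            entry["place"] = (place.text or "").strip()
        if date is not None and date.get("when"):
            # The 'when' attribute carries a normalised ISO-style date.
            entry["date"] = date.get("when")
        result[action.get("type")] = entry
    return result


if __name__ == "__main__":
    for role, details in read_correspondence(SAMPLE).items():
        print(role, details)
```

Because sender, recipient, place and date sit in predictable positions within ‘correspDesc’, the same few lines could in principle be run over letters from any corpus that follows the same TEI conventions, which is the kind of compatibility the format is intended to support.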

The TEI P5 XML files are the primary versions of our transcriptions and are the source for both the print edition of The Correspondence of Charles Darwin and online publication on this website. The files are ingested into an XTF (eXtensible Text Framework) instance maintained by Cambridge University Library. XTF is open source. The XTF instance provides two distinct output versions of the letters:

  • the XML version, which is typeset in-house using Adobe InDesign
  • a web service used for all queries and display of the letter metadata and transcriptions

Our editorial data is held in a password-protected Git repository, which provides file management and version control. The public sites are hosted on instances at Amazon Web Services, managed by Cambridge University Library’s ‘Digital Initiatives & Strategy’ team. The Project's website was relaunched in February 2016, and development has been undertaken in collaboration with Surface Impression.

Images of original letters from the Cambridge University Library collections are courtesy of Cambridge University Digital Library.