Organising Data: What Open Data Can Do

2 August 2021

James Benstead, Edinburgh Napier University

In this article I’m going to write about organising the data from the project on the Scottish War Books Boom, and how working with Open Data principles can identify new connections among the material and open up new avenues for scholarship.  The overall findings to date are discussed in a parallel article in this month’s TMR.  

I came to the project at the point when data had been collected by the previous research assistant, Louise Bell. She trawled the Bookman, a literary magazine founded by the Scottish journalist W.R. Nicoll, which ran from 1891 to 1934 and published lists of publications, advertisements, reviews, and illustrations, as well as giving a general account of the literary sphere of the moment.[1] We took a maximalist view of what might be considered as Scottish in this context, bearing in mind the difficulties of establishing the national identities of authors, and the myriad potential connections by ancestry, military service, domicile, and other means. This initial trawl through the periodical established only a small corpus based on (apparent) author identity, and so we expanded the dataset to include reviews of war books in Scottish periodicals. This also enables us to see how works that explicitly address Scotland, or are by Scottish authors, sit among the more general Scottish response to the First World War in the newspapers and periodicals of the era, and to gauge the influence of the review response in shaping the boom. The result is a picture of the reception of the literary response to the conflict not only in the metropolitan centres of Glasgow and Edinburgh, but throughout the country.

Once the data had been collected, I organised it on the principles suggested by Sir Tim Berners-Lee in the 5-star Open Data project, a series of steps that data providers can take in order to make their data more accessible and useful.  The initial steps, such as making data available on the internet and providing that data in a structured, non-proprietary format such as CSV (comma-separated values, a widely-used data exchange format), are relatively straightforward to implement. Later steps, such as presenting data in the Resource Description Framework (RDF) format or linking RDF data to other data sources (and allowing other data sources to link to your data) require more technical expertise. Our intention is that our dataset will eventually be available as Open Data – that is, freely available without copyright restrictions and in an easily accessible format – that can be linked to other datasets that have been shared on the internet. Far from being an abstract technical exercise, this approach to working with data encourages researchers to ask new questions that can have an increased scope and depth.
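
To make the gap between the earlier and later steps more concrete, here is a minimal sketch in Python of how a single CSV row might be restated as RDF triples in the N-Triples serialisation. It is an illustration only, not the project's actual pipeline: the https://example.org/warbooks/ namespace, the column names, and the property names are all assumptions made for the example, and a real dataset would normally reuse established vocabularies rather than invent its own.

```python
import csv
import io

# Hypothetical namespace, used only for this illustration.
BASE = "https://example.org/warbooks/"

# One sample row; the column names are assumptions, not the project's schema.
SAMPLE_CSV = """id,title,author,year
1,Field-Marshal Earl Haig,John Charteris,1929
"""


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def row_to_ntriples(row):
    """Turn one CSV row into a list of N-Triples statements (one per column)."""
    subject = f"<{BASE}book/{row['id']}>"
    statements = []
    for column in ("title", "author", "year"):
        predicate = f"<{BASE}property/{column}>"
        literal = escape_literal(row[column])
        statements.append(f'{subject} {predicate} "{literal}" .')
    return statements


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
triples = [statement for row in rows for statement in row_to_ntriples(row)]

for statement in triples:
    print(statement)
```

Each spreadsheet cell becomes a statement of the form subject, predicate, object. The later linking step would then consist of replacing a plain-text value such as the author's name with the URI of the corresponding entry in another dataset.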

The first step was to arrange the data in a spreadsheet that could be easily filtered, sorted, and queried in response to research questions. Many of the initial research questions of the project could be answered in this way, such as ‘in each year under consideration in our project, how many of our texts are included in our dataset because their author (or subject matter, publisher, translator, reviewer, etc.) is Scottish?’. However, using Open Data principles to link our data to the data that is available at wikidata.org gives us access to background information on every author in our dataset and allows us to make further connections that would otherwise have been laborious or difficult to establish.
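
The kind of question a filtered spreadsheet can answer is easy to sketch in code. The following Python example counts, for each year, how many texts are included because of a given type of Scottish connection. The rows are invented placeholders rather than project data, and the column name scottish_connection is an assumption about how such a spreadsheet might be laid out.

```python
import csv
import io
from collections import Counter

# Invented sample rows for illustration; not taken from the project's dataset.
SAMPLE_CSV = """title,year,scottish_connection
Book A,1928,author
Book B,1929,author
Book C,1929,publisher
Book D,1929,author
Book E,1930,reviewer
"""


def count_by_year(rows, connection):
    """Count texts per year whose reason for inclusion matches `connection`."""
    return Counter(
        int(row["year"])
        for row in rows
        if row["scottish_connection"] == connection
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
author_counts = count_by_year(rows, "author")

for year in sorted(author_counts):
    print(year, author_counts[year])
```

The same function answers the question for publishers, translators, or reviewers by changing the second argument, which is essentially what filtering a spreadsheet column does by hand.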

To work through one particular example, we already know that John Charteris is the author of the 1929 book Field-Marshal Earl Haig, published by Cassell & Co, because we collected this data ourselves. But once we have linked our dataset to wikidata.org, we will also know (amongst many, many other things) that Charteris was born in Glasgow, that he lived from 1877 to 1946, that he spoke German and French and studied mathematics and physics at the University of Göttingen between 1892 and 1893, and that he served as the Unionist Party MP for Dumfriesshire between 1924 and 1929. Moreover, connecting our dataset to the data that is available at wikidata.org automatically connects it to all of the datasets that wikidata.org is itself connected to (and to the datasets connected to those datasets, and so on). In fact, the information we then have access to regarding the languages spoken by Charteris is drawn from the Oxford Dictionary of National Biography, which provides an open dataset that wikidata.org links to.
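
As a rough illustration of what such a lookup involves, the Python sketch below builds a SPARQL query for the public Wikidata query service, asking for an author's place of birth (P19), dates of birth and death (P569, P570), languages spoken (P1412), and places of education (P69). It only constructs the query and the request URL and does not send anything over the network. The identifier Q42 in the usage lines is Wikidata's entry for Douglas Adams, used here purely as a well-known placeholder; the identifier for Charteris would need to be looked up, and the function names are my own.

```python
import re
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata query service.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_author_query(qid):
    """Build a SPARQL query for background facts about one Wikidata item."""
    # Accept only well-formed item identifiers such as "Q42".
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return f"""
SELECT ?birthPlaceLabel ?born ?died ?languageLabel ?schoolLabel WHERE {{
  OPTIONAL {{ wd:{qid} wdt:P19 ?birthPlace. }}
  OPTIONAL {{ wd:{qid} wdt:P569 ?born. }}
  OPTIONAL {{ wd:{qid} wdt:P570 ?died. }}
  OPTIONAL {{ wd:{qid} wdt:P1412 ?language. }}
  OPTIONAL {{ wd:{qid} wdt:P69 ?school. }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


def build_request_url(qid):
    """Return the GET URL that would ask the endpoint for JSON results."""
    params = {"query": build_author_query(qid), "format": "json"}
    return WIKIDATA_ENDPOINT + "?" + urlencode(params)


# Placeholder identifier (Douglas Adams), not the entry for Charteris.
print(build_author_query("Q42"))
print(build_request_url("Q42"))
```

Every clause is wrapped in OPTIONAL so that an author with, say, no recorded place of education still returns the facts that are available, which matters for a corpus in which many authors are likely to be thinly documented.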

All of this data is available to query as easily as the data in our own dataset, meaning that rather than being limited to simple questions such as ‘in what year during the (Scottish) War Books Boom were the most books published’, we can ask richer questions such as ‘how many of the authors in our dataset spoke a European language that wasn’t English’, or ‘how many of our authors were educated in mainland Europe’. Similarly, by linking our data to large bibliographic datasets such as WorldCat we can locate each book in our dataset within the context of its author’s overall output, facilitating questions such as ‘did the authors who contributed to the Scottish War Books Boom go on to write further books?’, or ‘had those authors had books published before they contributed to the Scottish War Books Boom, or was their contribution usually the first work they had had published?’. The way in which linked open data makes it possible to answer questions such as these pushes back the limitations within which researchers frame their research questions, and means that researchers can be more ambitious in the scope of their projects.
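
A richer question of this kind amounts to joining our own records to the linked background data. The Python sketch below shows the idea with invented placeholder records: a list of books keyed by a hypothetical author identifier, and a lookup of languages per author standing in for what a linked source might return. The identifiers, the data, and the short list of languages are all assumptions made for the example.

```python
# Invented records standing in for the project's own data.
books = [
    {"title": "Book A", "author_id": "a1"},
    {"title": "Book B", "author_id": "a2"},
    {"title": "Book C", "author_id": "a1"},
    {"title": "Book D", "author_id": "a3"},
    {"title": "Book E", "author_id": "a4"},
]

# Invented lookup standing in for linked data; author "a4" has no entry.
author_languages = {
    "a1": {"English", "German", "French"},
    "a2": {"English"},
    "a3": {"English", "Italian"},
}

# Deliberately short list for the example; a real list would be much longer.
OTHER_EUROPEAN_LANGUAGES = {"German", "French", "Italian"}


def authors_with_other_european_language(books, author_languages):
    """Return the authors in `books` known to speak a listed non-English language."""
    author_ids = {book["author_id"] for book in books}
    return {
        author_id
        for author_id in author_ids
        if author_languages.get(author_id, set()) & OTHER_EUROPEAN_LANGUAGES
    }


matching = authors_with_other_european_language(books, author_languages)
print(sorted(matching))
```

Authors with no linked entry are simply left out of the result, so any count produced this way is a lower bound that depends on how complete the linked source is.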


Source:

[1] H.C.G. Matthew, ‘Nicoll, Sir William Robertson (1851-1923)’, Oxford Dictionary of National Biography <https://www.oxforddnb.com/view/10.1093/ref:odnb/9780198614128.001.0001/odnb-9780198614128-e-35236> [accessed 5 July 2021]; see also Margaret D. Stetz, ‘Internationalizing Authorship: Beyond New Grub Street to the Bookman in 1891’, Victorian Periodicals Review, 48.1 (2015), 1-14.

Cover image: The LOD Cloud Diagram, Wikimedia Commons
