Text Mining: Opportunities and Tools

Nov 27, 2014 · London, United Kingdom
At British Library Labs, we want to bridge the gap between your research ideas and our data, and enable you to study the digital works we hold with novel and useful techniques. This one-day forum will focus on text and data mining. It will bring together experts developing tools and services with those who might want to use these techniques in their research. This event is an important opportunity to influence the approach that the British Library uses to support text and data mining in the coming years. By the end of the day, we hope to have a better understanding of: what everyone’s needs are, what the Library could be doing to support text and data mining, what accommodations tools need to make for non-experts, what basic skills and tools are necessary, and how to use these techniques in research. The first half of the day will consist of short overviews on a variety of text and data mining projects. This is a mixture of exciting projects from outside the British Library as well as work we have been directly involved in.
After lunch, the day will change format to encourage active discussion. The starting topics for discussion will be what needs to change to make these computational techniques an easy choice for any researcher, what minimal skills researchers might need to do interesting research with them, and what the British Library can do to support them when they work with data we hold. We want to learn how we should be providing access to data, what we will need to do to the data to make it easy to work with, who we should be engaging with and which initiatives we should be following.
Attendance: You should either be experienced in using text and data mining or be interested in using it in your research.
Time: 0930 - 1630
Location: The British Library, Centre for Conservation, Foyle Suite*, London, NW1 2DB
*The Foyle Suite is at the back of the British Library in London, in the Centre for Conservation. Come through the main entrance and follow the signs to the Centre for Conservation. If you plan to visit the exhibition at the Library, you will need to pick up a Visitors Pass at the front Information desk when you enter the Library building (say you are part of the 'Text Mining event' organised by Mahendra Mahey).

0930 - 0955 Coffee, registration and networking
Foyle Suite, Centre for Conservation, British Library
0955 - 1000 Introduction and welcome
Ben O’Steen, Technical Lead of British Library Labs
Ben will welcome all our delegates and give a brief overview of the purpose of the event.
Text Mining Projects and Tools (please note the morning session will be filmed)
1000 - 1010 Mining the History of Medicine and Mining Biodiversity
Sophia Ananiadou, Professor, University of Manchester and Director, National Centre for Text Mining
Sophia will briefly talk about NaCTeM's recent work on Mining the History of Medicine (http://www.nactem.ac.uk/MHM/), in which NaCTeM is mining the London-area Medical Officer of Health reports (digitised by the Wellcome Library). She will also talk about Mining Biodiversity (http://www.nactem.ac.uk/DID-MIBIO/), which is enriching the Biodiversity Heritage Library (BHL) with semantic metadata and developing crowdsourcing facilities using NaCTeM's Argo platform, alongside other features such as automatic correction of OCR errors in text.
1010 - 1020 Edinburgh Text Mining and Geoparsing in Digital Humanities
Claire Grover, Senior Research Fellow in Text Mining, University of Edinburgh
Claire will present some of the work her team has been doing on historical and literary text in two projects, Trading Consequences and Palimpsest. In both projects they use text mining to identify named entities and geoparsing to georeference place-name mentions.
1020 - 1030 Wrinkles in the data
David King, Researcher in Natural Language Processing, Open University
David will present a brief overview of his work with biodiversity legacy literature, focusing on the data cleaning issues that have to be addressed before a researcher can engage in a meaningful analysis of the literature.
1030 - 1040 ContentMine (see http://contentmine.org)
Peter Murray-Rust, Reader Emeritus in Molecular Informatics, Unilever Centre, Dept. of Chemistry
ContentMine is using machines to extract 100,000,000 facts from the scientific literature.
1040 - 1050 Cross Publisher Text and Data Mining with DOIs and CrossRef
Joe Wass, CrossRef
CrossRef's text and data mining API allows researchers to download articles for TDM without having to worry about where they are published. This talk will explain DOIs and give a live demo of how to use the API. For more information, see http://tdmsupport.crossref.org
1050 - 1100 Using GATE for semantic enrichment and enhanced search in digital collections
Mark Greenwood, Research Associate, University of Sheffield
This talk will present a brief overview of the open-source GATE infrastructure for text mining and semantic search. The focus will be on practical examples of semantic enrichment through thesauri, term and entity recognition, and semantic annotation based on Linked Open Data. Semantic search interfaces developed within projects with the British Library, the UK National Archives, and FERA will also be presented.
1100 - 1130 Coffee Break
British Library Text-Mineable Data and Projects
1130 - 1145 What text-mineable data does British Library Labs have?
Ben O'Steen, Technical Lead, British Library Labs
1145 - 1155 Digging into the Web Archive at the British Library
Andrew Jackson, Technical Lead, Web Archive, British Library
Andrew will talk briefly about some of the work he has been doing on text mining the British Library's UK Web Archive.
1155 - 1205 Text and data mining TV and radio at the Library
Luke McKernan, Curator, News and Moving Images
1205 - 1215 Tentative steps towards mining PhD theses
Sara Gould, Development Manager, British Library
Sara will talk about the Library’s recent participation in a national project to mine chemical compounds from the pages of PhD theses, describe some of the challenges in accessing theses for text and data mining, and invite participants to ‘have a go’ at mining theses for new research purposes.
1215 - 1225 Metadata collections at the British Library
Neil Wilson, Head of Collection Metadata, British Library
Neil will give a brief overview of some of the kinds of metadata the British Library has that could be used for text and data mining.
What researchers want...
1225 - 1230 Searching for Readers in the First World War
Francesca Benatti, Researcher in Digital Humanities, Open University
The increased availability of digitised textual and visual information on the First World War offers researchers who study the history of reading unprecedented opportunities. For the first time, we can attempt to study the reading experiences of WW1 participants by performing large-scale examinations of thousands of sources through 'distant reading' techniques. Francesca will talk about why she wants to discover what soldiers read, how text mining can help, and what problems can arise (especially the question of how representative digitised sources are).
Delegates will then break for lunch. The afternoon activities are designed for researchers to continue communicating what they would like from text mining and to hold discussions with the text mining experts attending the event.
1230 - 1300 Lunch
1300 - 1400 Breakout Session 1
Led by Ben O’Steen and Mahendra Mahey
Delegates will be asked to work together in groups to answer the questions below. A representative from each group will report back the group's responses.
What are the gaps?

Do you have the skills you need?
Do you have access to the tools you need?
Do you have access to data you can use?
Are you confident about publishing your findings? What about attribution or citation? Reproducibility?
Can you refine the data?
What is the gap between useful tools and interesting datasets?
What are the key barriers?

1400 - 1430 Results of Session 1
Led by Ben O’Steen and Mahendra Mahey
A representative from each group will report back their answers to the questions above.
1430 - 1450 Coffee Break
1450 - 1550 Breakout Session 2
Led by Ben O’Steen and Mahendra Mahey
Delegates will be asked to work together in groups to answer the questions below. A representative from each group will report back the group's responses.
How do we fill those gaps?

Who is doing a great job of this already? Who is exemplary at providing:


How do we overcome the key barriers?
What role should the British Library (and other content holders) play in text and data mining (TDM)?
What main recommendations would you make to the British Library to better support:

research using TDM?
research into TDM techniques?

1550 - 1610 Results of Session 2
Led by Ben O’Steen and Mahendra Mahey
A representative from each group will report back their answers to the questions above.
1610 - 1630 Form conclusions and recommendations to share.
1630 Finish* (please note that we will finish promptly at 1630, as there will be another event in the room straight afterwards)
*Optional events after the event
Some of you may be interested in visiting the Terror and Wonder - The Gothic Imagination exhibition. If you are, please let us know on the booking form and we can arrange for you to attend the exhibition as a thank-you for attending.
1630 - 1730 Terror and Wonder - The Gothic Imagination exhibition
Cost: Free to attendees of the text mining event
For more information, please visit the following page
1900 - 2030 Afrika Bambaataa: 40 Years of Hip Hop
Conference Centre, British Library
Price: £10/£8/£7
In this conversation with broadcast music journalist Jacqueline Springer, Afrika Bambaataa explores the past, present and future of a street culture that became a global phenomenon.
Book tickets via the British Library box office
