The crisis in reproducible research & tools for a DRY data analysis workflow

Aug 9, 2018 · Brisbane City, Australia

Title: The crisis in reproducible research and tools for a DRY (Don't Repeat Yourself) data analysis workflow

Presenter: Peter Baker, School of Public Health, University of Queensland, Herston, Australia. [masked]


Ioannidis et al. (2009) estimated that over fifty percent of published papers in some fields of research are not reproducible. Many researchers in diverse fields believe there is a crisis in reproducible research. This is unlikely to be restricted to scientific research, and numerous examples of irreproducibility can be found in the social sciences and economics.

The data analysis cycle starts a lot earlier than many researchers appreciate, whether they collect primary data or use secondary or tertiary data. Planning, study design and workflow organisation before the first data point is obtained are crucial. Statistical considerations are paramount, as are computational tools for managing the workflow of data analysis projects.

As a statistical consultant and collaborator, I have been involved in the design and analysis of hundreds of studies. Since the early 90s, I have also used version control systems and Make to manage data analysis projects using GENSTAT, BUGS, SAS, R and other statistical packages. As an early adopter of Sweave and R Markdown for reporting, I have found these approaches invaluable because, unlike the usual cut-and-paste approach, they produce reproducible reports. I will briefly describe and illustrate the statistical issues I have encountered and my overall computational strategy. For GNU Make pattern rules, preliminary R packages and examples see


Peter has worked as a statistical consultant and researcher in areas such as agricultural research, Bayesian methods for genetics, health, medical and epidemiological studies for thirty years, including twenty years at CSIRO. He is a Senior Lecturer in Biostatistics at the School of Public Health, UQ, where he also acts as a senior statistical collaborator and adviser to several research projects in the Faculty of Medicine.

Our Sponsor this month:

AARNet Pty Ltd (APL) is the not-for-profit company that operates Australia's Academic and Research Network (AARNet). The shareholders are 38 Australian universities and the CSIRO.

AARNet provides high capacity Internet and other communications services for the nation's research and education community, including universities, health and other research organisations, schools, vocational training providers and cultural institutions. AARNet serves over one million end users who access the network for teaching, learning and research.

For further information, please visit:

Location and Time:

Our event space is sponsored by Thoughtworks:
We're a software consultancy and community of passionate, purpose-led individuals. We think disruptively to deliver technology to address our clients' toughest challenges, all while seeking to revolutionise the IT industry and create positive social change.

The event is at the Thoughtworks offices, level 19, 127 Creek Street, with official kick-off at 6:00 pm.
