Data analysis in the era of reproducible and open science

Daniel Witte

Luke Johnston

1 / 15
  • Openness
  • Transparency
  • Quality: reproducibility
  • Collaboration / Teamwork
  • Communication
2 / 15

We are in the middle of an exponential growth curve:

  • Data production
  • Data storage and transfer
  • Computing power
  • Published research
  • Complexity of methods
3 / 15
  • Industrialisation of the research workflow
  • Specialisation in research tasks

4 / 15

5 / 15
  • Changes in the way we work:

    • Remote work
    • Online communities
    • Ad hoc teams
  • Research on research:

    • Meta-analysis (your output is somebody else's input)
    • Metaresearch: evidence-based development of research methods
6 / 15

There are still very strong barriers

  • Tools needed
  • Tradition, culture and common practices need to change
  • Researchers need to see the value in adopting an open, reproducible workflow
  • Training and reward systems need to be adapted:
    • Publication
    • Academic recognition / careers
    • Research funding mechanisms
  • Law: privacy concerns about sharing data, IP protection, patents, etc.
7 / 15

Current scientific culture is not prepared for the analytic and computational era

8 / 15

Open science debates and initiatives don't recognize the role of software

E.g. EU H2020 Open Science Mandate only mentions data and publications.

9 / 15

Little to no training in software or programming

Source: xkcd.

10 / 15

But, data analysis in science is evolving quickly

11 / 15

What does it mean for you?

  • Find and collaborate with those familiar with these concepts (online and/or in real life)

  • Cite research that is, or tries to be, more reproducible

  • Keep the principles of reproducibility in mind, then find the tools

  • Practice reproducible and open science

    • More on this later in the session
12 / 15

Recognize importance of code and data: Cite them!

# Example:
citation("dplyr")
##
## To cite package 'dplyr' in publications use:
##
## Hadley Wickham, Romain François, Lionel Henry and Kirill Müller
## (2019). dplyr: A Grammar of Data Manipulation. R package version
## 0.8.0.1. https://CRAN.R-project.org/package=dplyr
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {dplyr: A Grammar of Data Manipulation},
## author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller},
## year = {2019},
## note = {R package version 0.8.0.1},
## url = {https://CRAN.R-project.org/package=dplyr},
## }
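
The same mechanism can be used to collect software citations into a reference file. The following is a minimal sketch, not a prescribed workflow: it cites R itself (so it runs on any installation), and the file name `software.bib` is a hypothetical choice. Replace `citation()` with `citation("dplyr")` or any other package you actually used.

```r
# Get the citation for R itself; citation("pkg") works the same way
# for any installed package.
r_cite <- citation()

# Convert the citation to BibTeX entries (a character vector of lines).
bib <- as.character(toBibtex(r_cite))

# Hypothetical output file; a temporary directory is used for illustration.
bib_file <- file.path(tempdir(), "software.bib")
writeLines(bib, bib_file)
```

The resulting file can be added to a reference manager or a LaTeX bibliography alongside the cited papers.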
13 / 15

Comment: True reproducibility is very difficult

  • Requires self-contained virtual environment
  • With exact package versions and operating system used
  • Tools to do this include:
    • Docker virtual containers [1]
    • Continuous integration with Travis [2]

...but this is not the goal, nor should it be

  • Tools to simplify this are being developed
    • Keep an eye out for them
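
Short of a full container, the exact package versions and operating system mentioned above can at least be recorded next to the analysis. This is a small sketch under stated assumptions: the file name `environment.txt` is hypothetical, and `stats` stands in for whichever packages the analysis actually loads. It records the environment but does not recreate it, so it is not a substitute for Docker.

```r
# Record the R version, platform, and one package version as plain text.
# "stats" ships with R; list the packages your analysis really uses.
env_record <- c(
  paste("R version:", R.version.string),
  paste("Platform:", R.version$platform),
  paste("stats package version:", as.character(packageVersion("stats")))
)

# Hypothetical output file; a temporary directory is used for illustration.
env_file <- file.path(tempdir(), "environment.txt")

# Append the full sessionInfo() report, which also lists the operating system
# and all attached packages with their versions.
writeLines(c(env_record, "", capture.output(sessionInfo())), env_file)
```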

[1] For more information, see this tutorial.
[2] For easier integration and use of Travis in R, see the travis package.

14 / 15

This is or will be the future. Be prepared.

(...I hope...)

15 / 15