Introduction to the course

Reproducibility and open science

Luke Johnston

Outline

  • What is reproducibility
  • What is open science
    • Open access
    • Open data
    • Open source (code)
  • What is R and why learn it
  • Course introduction
    • What it is (and isn't)
    • Layout and website
    • Expected learning outcome
    • Code of Conduct

Question:

How many of you know or have heard about reproducibility?

Question:

How many of you know or have heard about open science?

Question:

...or even open access, open data, or open source?

How many of you have read a method in a paper and wondered how the authors actually did it?

Because you are trying to do the same or similar?

Often, (way) more is done than shown in "Methods"

Source: Kim, Poline, and Dumas [1]

Has anyone ever received confusing code? Or maybe written your own confusing code?

Source: PhD Comics

These issues can be fixed by creating and nurturing a culture of openness

Code sharing: From scientific principle of "reproducibility"

... often confused with "replicability" [2]¹

Replicability

  • Repeating a study by independently performing another identical study
  • Difficult, usually needs funding
  • Linked to the "irreproducibility crisis"² (covered later)

Reproducibility

  • Generating the exact same results when using the same data and code
  • Should be easy, right? Wrong: it is often just as hard
  • Question: If we can't even reproduce a study's results, how can we expect to replicate it?

  1. Also from an American Statistical Association statement.
  2. Or rather "irreplicability crisis".

Biomedical studies almost never publish code alongside the paper

  • Very few papers provide code [3; 4]
    • Exception: in bioinformatics, about 60% of studies do
  • Example: In epidemiology, of 90 articles reviewed [5]:
    • 43 (48%) did not report how data was processed
    • 21 (24%) did not report how analysis was conducted
    • 0 made code available in any way
  • Why? Likely due to:
    • Lack of awareness and training
    • Difficulty of adoption
    • No incentive or reward
    • Little to no culture to do it

Open science: Terms and meanings

| Term | Meaning |
|------|---------|
| Open science | Freely available, openly licensed* material for all things related to scientific activity |
| Open access | Free, unrestricted, publicly available published articles |
| Open data | Freely available, re-usable, openly licensed data |
| Open source/code | Freely available, re-usable, openly licensed scientific code used in generating results |

* We'll cover licenses later.

Goal of this course? Start changing the culture by providing the training.

Course website and layout:

What we aim to teach (and you learn)

  • Recognize the importance of reproducibility and open science
  • Know and be aware of (some of) the modern tools to use
  • Know a general workflow for doing analyses
  • Know in general how to navigate and code in R
  • Know in general how to think about data

What you won't learn

  • No (or minimal) statistics or modelling
  • No writing skills
  • No study or experimental design
  • No spreadsheets or large databases (e.g. SQL)

Other considerations

  • We have a Code of Conduct
  • Asking for help is as easy as putting a sticky note on your laptop
    • We have lots of helpers!
  • We're all learning here; this is a supportive and safe environment

Resources and further reading

  • Best practices: see [6; 7]
  • Case studies and lessons for doing reproducibility
  • Case study in bioinformatics: see [1]

References

[1] Y. Kim, J. Poline, et al. "Experimenting with Reproducibility: A Case Study of Robustness in Bioinformatics". In: GigaScience 7.7 (Jun. 2018). DOI: 10.1093/gigascience/giy077.

[2] H. E. Plesser. "Reproducibility Vs. Replicability: A Brief History of a Confused Terminology". In: Frontiers in Neuroinformatics 11 (Jan. 2018). DOI: 10.3389/fninf.2017.00076.

[3] J. T. Leek and L. R. Jager. "Is Most Published Research Really False?" In: Annual Review of Statistics and Its Application 4.1 (Mar. 2017), pp. 109-122. DOI: 10.1146/annurev-statistics-060116-054104.

[4] E. C. Considine, G. Thomas, et al. "Critical Review of Reporting of the Data Analysis Step in Metabolomics". In: Metabolomics 14.1 (Dec. 2017). DOI: 10.1007/s11306-017-1299-3.

[5] R. D. Peng, F. Dominici, et al. "Reproducible Epidemiologic Research". In: American Journal of Epidemiology 163.9 (Mar. 2006), pp. 783-789. DOI: 10.1093/aje/kwj093.

[6] G. Wilson, D. A. Aruliah, et al. "Best Practices for Scientific Computing". In: PLoS Biology 12.1 (Jan. 2014). Ed. by J. A. Eisen, p. e1001745. DOI: 10.1371/journal.pbio.1001745.

[7] S. Eglen, B. Marwick, et al. "Towards Standard Practices for Sharing Computer Code and Programs in Neuroscience". In: bioRxiv (Mar. 2016). DOI: 10.1101/045104.
