Modern Statistical Thinking for Biologists

Modern Statistical Thinking for Biologists (online 19 Sep - 19 Dec 2024)

This course will build up your statistical thinking skills, in a way that does not require lots of maths. We will primarily work with concepts from Bayesian statistics, which beginners often find more intuitive than classical approaches.

A course in statistics is one of the most ubiquitous elements of training for researchers in biology and biomedicine. Despite this, many scientists struggle enormously when they need to analyse their own data. At its worst, data analysis becomes nothing more than a dull exercise in pressing buttons in a statistics package, with constant nagging doubt as to what the buttons really do and whether they are the right buttons to press.

These difficulties are partly linked to the way many introductory statistics courses are taught, with a focus on memorising lists of tests rather than on conceptual understanding, and with few opportunities to practise on real-life data sets. The problem is compounded by biomedicine’s unhealthy fixation on P-values, a concept so unintuitive that it is frequently misunderstood and misapplied.

In this course, we will focus on obtaining a good conceptual understanding of common types of analyses, and will apply them to real data using R. The maths will be kept to an absolute minimum. As a result, the course has no pre-requisites in terms of maths or statistics skills.

In addition, most introductory courses are based on a framework referred to as frequentist statistics, which is built around tools such as P-values, confidence intervals and t-tests. Although we will discuss these classical methods as well, we will put more emphasis on an alternative methodology called Bayesian statistics. Bayesian statistics is a powerful approach to data analysis that is becoming increasingly common in biology and biomedicine. It may sound like a very advanced and complicated topic, but in reality students often find it easier to learn than more traditional approaches, because it more closely follows the way scientists intuitively think about their research questions. Getting a good grasp of the basics of statistical thinking using Bayesian tools should also make it much easier to learn frequentist concepts afterwards. At the end of the course, we will see how to transition from Bayesian to frequentist methods, should you wish to do so.

We will learn using a combination of short lectures, group discussions and hands-on activities on real data. There will also be weekly individual assignments. The assignments are a crucial component of the course because you will receive individual written feedback each time, so that you can keep track of your progress. In addition, we will dedicate one session to a journal club where we will practise reading real scientific papers that use Bayesian methods. Finally, two units have been set aside for projects. In the first project, students will work in groups to apply the methods they have learned to a new data set supplied by the instructor. In the second project, students will work individually to analyse their own data, and will receive feedback from the instructor.

Although this course is open to students and researchers from all the natural and social sciences, we will focus on data sets and types of analyses that are particularly relevant to biology and biomedicine.

After completing this course, you will be able to…

…use data visualisation and summarisation.

…recognise different types of data (e.g. counts or percentages) and navigate the difficulties inherent in the analysis of each type.

…draw inferences based on your sample data. For instance, if in a sample of 50 individuals, 10 were infected with COVID-19 at some point during the last year, then what can you conclude about the prevalence of COVID-19 in the whole population? And how much confidence can you have in this conclusion?

…use regression modelling to study relationships between variables.

…understand what is meant by Bayesian statistics, and how this differs from classical statistics.

…interpret P-values appropriately, and avoid the common pitfalls associated with their use.

…grasp the concept of a statistical test.
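
To give a flavour of the prevalence question in the list above (10 infected individuals in a sample of 50), here is a minimal sketch of one possible Bayesian answer. It is not taken from the course materials: the course itself works in R, whereas this illustration uses plain Python, and the flat prior, the grid approximation and the function names `grid_posterior` and `summarise` are all assumptions made for the example.

```python
from math import comb


def grid_posterior(successes, trials, grid_size=1001):
    """Approximate the posterior of a proportion on an evenly spaced grid."""
    # Candidate prevalence values from 0 to 1.
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Binomial likelihood of the observed data at each candidate value.
    likelihood = [
        comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
        for p in grid
    ]
    # Flat prior: every prevalence is equally plausible beforehand (an assumption).
    prior = [1.0] * grid_size
    unnormalised = [lik * pri for lik, pri in zip(likelihood, prior)]
    total = sum(unnormalised)
    # Normalise so the posterior weights sum to 1.
    posterior = [u / total for u in unnormalised]
    return grid, posterior


def summarise(grid, posterior, mass=0.95):
    """Return the posterior mean and an equal-tailed credible interval."""
    mean = sum(p * w for p, w in zip(grid, posterior))
    tail = (1 - mass) / 2
    cumulative = 0.0
    lower = upper = None
    for p, w in zip(grid, posterior):
        cumulative += w
        if lower is None and cumulative >= tail:
            lower = p
        if upper is None and cumulative >= 1 - tail:
            upper = p
            break
    return mean, lower, upper


grid, posterior = grid_posterior(successes=10, trials=50)
mean, lower, upper = summarise(grid, posterior)
print(f"Posterior mean prevalence: {mean:.3f}")
print(f"95% credible interval: {lower:.3f} to {upper:.3f}")
```

With a flat prior, the exact posterior is a Beta(11, 41) distribution, whose mean is 11/52, or about 0.21. The credible interval is one way of expressing how much confidence the sample of 50 supports; a different prior would shift both numbers.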

Format: weekly 2.5-hour Zoom sessions on Thursdays, from 4pm to 6.30pm Paris time.

Pre-requisites: The course requires very basic skills in R. If you have no previous R experience, you should complete this free online R course, or a similar course, prior to the first session. The free course shouldn’t take you more than a day or so to do, and comes with a discussion forum, where you can always ask for help.

15 spots are available on a first-come first-served basis. The course content is under constant development, and so the final syllabus may differ slightly from that shown here. If you have any questions, don’t hesitate to drop me a line on [email protected].

This is the pre-course seminar that took place on 30 June 2023 prior to the first edition of this course. Although some aspects of the curriculum have changed in this edition to make the course a bit more compact (you can see the up-to-date syllabus below), it may still make for useful viewing.

Course curriculum

1. Welcome!
2. Try out the discussion forum
3. Practical details
4. Installing R
5. Installing RStudio
6. Just to get to know you a bit better...
7. Entry quiz explanation
8. Entry quiz (to be completed by Sep 12)
9. Textbook
10. TO DO FOR THE FIRST SESSION: Download data files
1. Data detectives: why are these graphs misleading?
2. Why is data visualisation important?
3. Case study: Describing the properties of human genes.
4. Mean and median: what is a typical value for this variable?
5. A few common types of graphs.
6. Standard deviation, variance, interquartile range: how much variability is there around the typical value?
7. EXERCISE: First group activity
8. EXERCISE: Practice summary statistics
9. Unit 1 code
10. Unit 1 slides
11. Recording 19 Sep 2024
12. Extra materials 1
13. TO DO for next week
14. Quiz 1
15. Assignment 1
1. Biological data comes in many flavours.
2. Representing counts through discrete probability distributions.
3. Representing other kinds of variables through continuous probability distributions.
4. EXERCISE: match each variable with its description!
5. Unit 2 slides
6. Unit 2 code
7. TO DO FOR NEXT WEEK: possums data file
8. Quiz 2
9. Assignment 2
10. Extra materials 2
1. What do we mean when we talk about "estimating a parameter"?
2. Case study: What is the sex ratio in possums?
3. EXERCISE: Formulating different priors
4. A bit of history: what is "Bayesian" statistics and why is it usually not taught in introductory courses?
5. TO DO FOR NEXT WEEK: download female heights
6. TO DO FOR NEXT WEEK: install packages
7. Unit 3 slides
8. Recording 3 Oct 2024
9. Unit 3 code
10. Extra materials 3
11. Quiz 3
12. Assignment 3
1. Case study: How tall is the typical American woman? And how much variability do we expect around that typical value?
2. Thinking about our problem as a model.
3. Markov Chain Monte Carlo: a clever tool for estimating parameters.
4. EXERCISE: model another variable
5. Quantifying our uncertainty about the likely parameter values.
6. EXERCISE: posterior predictive distribution (PPD)
7. Unit 4 code
8. Unit 4 slides
9. Recording 10 Oct 2024
10. Extra materials 4
11. Quiz 4
12. Assignment 4
13. Recording 17 Oct 2024
14. Quiz 4 (II)
15. Assignment 4 (II)
1. Case study: can we predict a person's weight from their height?
2. Predicting new data from our model
3. EXERCISE: changing the model
4. Recording 24 October 2024
5. Unit 5 slides
6. Unit 5 code
7. TO DO FOR UNIT 6: download data
8. Quiz 5
9. Data for men and women
10. Assignment 5
1. Case study: What determines the price of LEGOs?
2. Data detectives: what's wrong with this graph?
3. Dealing with numerical and categorical predictor variables.
4. Interactions between predictors
5. EXERCISE: model LEGO prices
6. EXERCISE: adding the number of pieces
7. Recording 31 October 2024
8. Unit 6 slides
9. Extra materials 6
10. TO DO FOR UNIT 7: download data files
11. Unit 6 code
12. Quiz 6
13. Assignment 6
1. Case study: Gene expression in patients with schizophrenia
2. Case study: Predicting COVID-19 survival
3. EXERCISE: predict the probability of passing away from COVID-19
4. Recording 7 November 2024
5. Unit 7 slides
6. Unit 7 code
7. Quiz 7
8. Assignment 7
9. Recording 14 Nov 2024
10. Extra materials 7
11. Assignment 7 (II)
12. Quiz 7 (II)
1. Why build several models for the same problem?
2. Comparing between models.
3. EXERCISE: Comparing between models
4. Recording 21 Nov 2024 Part I
5. Recording 21 Nov 2024 Part II
6. Unit 8 slides
7. Quiz 8
8. Assignment 8
9. Extra materials 8
1. Methods for complex multilevel data sets
1. Reading and discussing real research papers that use Bayesian methods.
2. Recording 28 Nov 2024
3. Assignment 9
1. The students apply the concepts and methods learned to a new data set.
2. Group 1 project
3. Group 2 project
4. Group 3 project
5. Recording 5 Dec 2024
6. Assignment 10
1. What is frequentist statistics?
2. EXERCISE: obtain the sampling distribution through simulation
3. Standard errors and confidence intervals.
4. EXERCISE: Central Limit Theorem
5. P-values, and why they are useful. P-values, and why they are dangerous.
6. What is a statistical test? Types of statistical tests.
7. EXERCISE: more practice
8. EXERCISE: Practice different strategies for comparing between group means
9. Recording 12 December 2024
10. Unit 11 code
11. Unit 11 slides
12. Quiz 11
13. Assignment 11
14. Recording 19 December 2024
15. Extra materials 11
1. Application of concepts and tools learned to your own data
2. Feedback
3. Your thoughts on the course
4. Assignment 12 (final project)

About this course

€390,00
135 lessons
38 hours of video content

For payment via invoice, please get in touch on [email protected]