Master Statistics for Data Analysis
10 weeks · 5 milestones
Apply descriptive statistics, hypothesis testing, regression, and A/B test design to real datasets — every milestone requires working with real data and documenting the decisions, not just the calculations.
Milestone map
Current milestone
Describe a real dataset with statistics
1 week. The computation takes a day. The interpretation takes the rest of the week — most people underestimate how hard it is to say what numbers mean rather than just what they are.
Take any real dataset with at least 5,000 rows and at least 4 numeric columns. For each numeric column, compute the mean, median, and standard deviation, and identify outliers using the IQR method. Then write a one-paragraph interpretation of each column: not the numbers themselves, but what the numbers MEAN about the thing being measured. Use Python with NumPy and pandas, or R. No spreadsheets.
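As a starting point, the per-column summary could be sketched as below. This is a minimal illustration rather than a required structure: the helper name `summarize_numeric` and the tiny `demo` frame are assumptions made up for the example, and a real submission would load a public CSV with `pd.read_csv` instead.

```python
import pandas as pd


def summarize_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return mean, median, and sample standard deviation for each numeric column."""
    # Keep only numeric columns; text and other non-numeric columns are skipped
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame(
        {
            "mean": numeric.mean(),
            "median": numeric.median(),
            # pandas uses ddof=1 (sample std) by default; NumPy's np.std uses ddof=0
            "std": numeric.std(),
        }
    )


# Hypothetical stand-in data so the sketch runs on its own
demo = pd.DataFrame(
    {
        "price": [10.0, 12.0, 11.0, 13.0, 54.0],
        "quantity": [1, 2, 2, 3, 2],
        "label": ["a", "b", "c", "d", "e"],  # non-numeric, excluded from the summary
    }
)

summary = summarize_numeric(demo)
print(summary)
```

Whichever tool is used, it is worth stating in the notebook whether the standard deviation is the sample (ddof=1) or population (ddof=0) version, since pandas and NumPy default to different ones.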
Proof required
Share a Jupyter notebook or Python script (GitHub link) that loads the dataset, computes the descriptive statistics, and prints the results. Share a screenshot of the output. Write 200 words on one column where the mean and median differ substantially: what does that gap tell you about the distribution, and what would be misleading about reporting only the mean?
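To make the mean/median gap concrete, here is a small sketch using values invented purely for illustration (they are not from any real dataset and would not satisfy the milestone). One large value pulls the mean well above the median, so most observations end up below the "average".

```python
import numpy as np

# Hypothetical right-skewed values, made up for this illustration only
values = np.array([30, 32, 35, 38, 40, 42, 45, 500], dtype=float)

mean = values.mean()
median = np.median(values)

# Fraction of observations that fall below the mean
share_below_mean = np.mean(values < mean)

print(f"mean={mean:.2f}, median={median:.2f}")
print(f"share of values below the mean: {share_below_mean:.1%}")
```

A mean that sits above most of the data is the usual signature of a long right tail, and that is the kind of observation the 200-word write-up is asking for.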
What gets checked
- Outlier detection uses the IQR method (values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR are flagged). The code must show the explicit calculation, not a z-score cutoff or an eyeballed filter
- The mean/median interpretation names the specific column, states both values, and explains what the gap reveals about the shape of the distribution — not just 'they are different'
- The dataset is real and has at least 5,000 rows: a CSV import from a public source, not generated or invented data, with the data provenance noted in the notebook
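The explicit IQR calculation named in the checklist might look like the following sketch. The function name `iqr_outliers` and the sample values are assumptions for illustration, and it uses NumPy's default linear interpolation for quartiles; other quantile definitions can shift the fences slightly.

```python
import numpy as np


def iqr_outliers(values):
    """Return (lower fence, upper fence, outlying values) using the 1.5×IQR rule."""
    values = np.asarray(values, dtype=float)

    # Quartiles via NumPy's default (linear interpolation) method
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1

    # Fences: anything strictly outside [lower, upper] is flagged
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr

    mask = (values < lower) | (values > upper)
    return lower, upper, values[mask]


# Hypothetical values, invented so the sketch runs on its own
sample = [10, 12, 11, 13, 54, 12, 11]
lower, upper, outliers = iqr_outliers(sample)
print(f"fences: [{lower}, {upper}]")
print(f"outliers: {outliers}")
```

Note that `np.percentile` returns NaN when the input contains missing values, so a real column may need `dropna()` (or `np.nanpercentile`) first, with that choice documented in the notebook.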
Run and interpret a hypothesis test
Build and evaluate a regression model
Design and analyse an A/B test
Communicate findings to a non-technical audience