The Law of Large Numbers

What is the law of large numbers?
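
One way to see the law of large numbers at work is to simulate it. The sketch below is only an illustration: the fair six-sided die, the function name `mean_error`, and the fixed seed are assumptions made for this example, not part of the lesson's scripts. It reports how far the average of many rolls lands from the expected value of 3.5.

```python
import random
from statistics import fmean

EXPECTED_VALUE = 3.5  # mean of a fair six-sided die: (1 + 2 + ... + 6) / 6


def mean_error(num_rolls, seed=12345):
    """Roll a simulated fair die num_rolls times and return |sample mean - 3.5|."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rolls = [rng.randint(1, 6) for _ in range(num_rolls)]
    return abs(fmean(rolls) - EXPECTED_VALUE)


# Larger samples tend to land closer to the expected value,
# although any single run can buck the trend.
for n in (10, 1_000, 100_000):
    print(n, mean_error(n))
```

The error does not have to shrink at every step; the law only says that large deviations become increasingly unlikely as the sample grows.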

What is the normal distribution and why do we care?

\( \begin{align} f(x) &= \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{(x - \mu)^2}{2 \sigma^2}} \end{align} \)
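
The density formula translates almost symbol for symbol into code. The following is a minimal sketch, with `normal_pdf` being a name chosen for this example, that evaluates the formula directly and compares it with `NormalDist.pdf` from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Compare the hand-written formula with the standard library's version
# for an arbitrary choice of mu = 1 and sigma = 2.
reference = NormalDist(mu=1.0, sigma=2.0)
for x in (-1.0, 0.0, 2.5):
    print(x, normal_pdf(x, mu=1.0, sigma=2.0), reference.pdf(x))
```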

How can we use this to quantify confidence?

{% include figure id="two-tailed-test" cap="Two-Tailed Significance Test" fixme=true alt="FIXME" title="Normal curve overlaid on grid. Symmetric segments in the low and high ends of the normal curve are highlighted to show regions more than a certain distance from the center." width="50%" credit="'Boundless Statistics', Lumen Learning, https://courses.lumenlearning.com/boundless-statistics/chapter/hypothesis-testing-one-sample/" %}
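
The highlighted tails in the figure can be turned into numbers. As a rough sketch, assuming the quantity being tested really is normally distributed with a known mean and standard deviation, the two hypothetical helpers below compute the total probability in both tails for an observed value, and the cutoff (in standard deviations) that leaves a chosen fraction `alpha` in the tails.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_tailed_p_value(observed, mu, sigma):
    """Probability of a value at least as far from mu as observed, counting both tails."""
    z = abs(observed - mu) / sigma
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(z))


def critical_z(alpha):
    """Distance from the center, in standard deviations, that leaves alpha split across both tails."""
    return STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)


print(critical_z(0.05))                     # roughly 1.96
print(two_tailed_p_value(1.96, 0.0, 1.0))   # roughly 0.05
```

An observation is usually called significant at level `alpha` when its two-tailed p-value falls below `alpha`, which is the same as lying further from the center than `critical_z(alpha)` standard deviations.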

Student's t-distribution

\( \begin{align} s^2 &= \frac{1}{n-1} \sum_{i=1}^{n}(X_i - \bar{X})^2 \\ &= \frac{\sum X_i^2 - n\bar{X}^2}{n - 1} \end{align} \)
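
Both forms of the sample variance can be checked against each other numerically. In the sketch below the function names and the eight data values are made up for illustration; `statistics.variance` from the standard library serves as a reference, since it also divides by n - 1.

```python
from statistics import fmean, variance


def sample_variance_definition(values):
    """First form: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = fmean(values)
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Second form: (sum of squares - n * mean squared), divided by n - 1."""
    n = len(values)
    mean = fmean(values)
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data only
print(sample_variance_definition(values))
print(sample_variance_shortcut(values))
print(variance(values))
```

The two forms are algebraically identical, but the second subtracts two potentially large numbers, so it can lose precision in floating point when the mean is large relative to the spread.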

How can we compare the means of two datasets?

from scipy.stats import ttest_ind

def main():
    # ...parse arguments...

    # ...read data and calculate actual means and difference...

    # test and report
    result = ttest_ind(data_left, data_right)
    print(result)

python bin/t-test.py --left ../hypothesis-testing/data/javascript-counts.csv --right ../hypothesis-testing/data/python-counts.csv --low 1 --high 200
Ttest_indResult(statistic=-269.67014904687954, pvalue=0.0)
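
To see what the reported `statistic` measures, it can help to compute it by hand. The sketch below assumes the equal-variance (pooled) form of the test, which is what `ttest_ind` uses when called with its default `equal_var=True`. The function name `pooled_t_statistic` and the two short lists are invented for this example and are not the lesson's data.

```python
import math
from statistics import fmean, variance


def pooled_t_statistic(left, right):
    """Two-sample t statistic under the assumption that both groups share one variance."""
    n_left, n_right = len(left), len(right)
    # Weighted average of the two sample variances (each already divides by n - 1).
    pooled_variance = (
        (n_left - 1) * variance(left) + (n_right - 1) * variance(right)
    ) / (n_left + n_right - 2)
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(pooled_variance * (1.0 / n_left + 1.0 / n_right))
    return (fmean(left) - fmean(right)) / standard_error


left = [1.0, 2.0, 3.0, 4.0, 5.0]    # illustrative data only
right = [2.0, 4.0, 6.0, 8.0, 10.0]  # illustrative data only
print(pooled_t_statistic(left, right))
```

The statistic is negative when the left sample's mean is the smaller one. Turning it into a p-value requires the cumulative distribution of Student's t with `len(left) + len(right) - 2` degrees of freedom, which is the part `ttest_ind` handles for us.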

{% include figure id="programmer-hours" cap="Programmer Hours (Weekday vs. Weekend)" alt="FIXME" title="A pair of vertical violin plots. The mean for weekday equals false is near 2.1 hours per day and the mean for weekday equals true is slightly above 7 hours per day. The profile for weekday equals false does not look normal, but the profile for weekday equals true looks more normal." fixme=true %}

python bin/weekends.py --data data/programmer-hours.csv
weekday mean 6.804375000071998
weekend mean 3.232482993312492
Ttest_indResult(statistic=12.815512046971827, pvalue=6.936182610195961e-31)

Higher standards