Topic 4: Statistics and Probability

Concepts

Essential understandings:
- Statistics is concerned with the collection, analysis and interpretation of quantitative data and uses the theory of probability to estimate parameters, discover empirical laws, test hypotheses and predict the occurrence of events. Statistical representations and measures allow us to represent data in many different forms to aid interpretation.
- Probability enables us to quantify the likelihood of events occurring and so evaluate risk. Both statistics and probability provide important representations which enable us to make predictions, valid comparisons and informed decisions. These fields have power and limitations and should be applied with care and critically questioned, in detail, to differentiate between the theoretical and the empirical/observed. Probability theory allows us to make informed choices, to evaluate risk and to make predictions about seemingly random events.
Suggested concepts embedded in this topic:
- Quantity, validity, approximation, modelling, relationships, patterns.
- AHL: Systems, representation.
Content-specific conceptual understandings:
- Organizing, representing, analysing and interpreting data, and utilizing different statistical tools facilitates prediction and drawing of conclusions.
- Different statistical techniques require justification and the identification of their limitations and validity.
- Approximation in data can approach the truth but may not always achieve it.
- Correlation and regression are powerful tools for identifying patterns and equivalence of systems.
- Modelling and finding structure in seemingly random events facilitates prediction.
- Different probability distributions provide a representation of the relationship between the theory and reality, allowing us to make predictions about what might happen.
- AHL: Statistical literacy involves identifying reliability and validity of samples and whole populations in a closed system.
- AHL: A systematic approach to hypothesis testing allows statistical inferences to be tested for validity.
- AHL: Representation of probabilities using transition matrices enables us to efficiently predict long-term behaviour and outcomes.

SL Content

SL 4.1
- Concepts of population, sample, random sample, discrete and continuous data.
- Reliability of data sources and bias in sampling.
- Interpretation of outliers.
- Sampling techniques and their effectiveness.
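Some of the sampling techniques listed above can be sketched in Python. This is a minimal illustration that assumes the population is stored as a list. The function names and the data are hypothetical. The stratified allocation is rounded, so the strata sizes may not add up exactly to the requested sample size.

```python
import random

def simple_random_sample(population, n):
    """Simple random sample: every subset of size n is equally likely."""
    return random.sample(population, n)

def systematic_sample(population, k):
    """Systematic sample: every k-th member, starting at a random position among the first k."""
    start = random.randrange(k)
    return population[start::k]

def stratified_allocation(strata_sizes, n):
    """Proportional stratified allocation: stratum h gets about n * N_h / N places (rounded)."""
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}

students = list(range(1, 301))                    # hypothetical population of 300 students
print(len(simple_random_sample(students, 30)))    # 30
print(len(systematic_sample(students, 10)))       # 30
print(stratified_allocation({"Year 11": 120, "Year 12": 180}, 50))  # {'Year 11': 20, 'Year 12': 30}
```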
SL 4.2
- Presentation of data (discrete and continuous): frequency distributions (tables).
- Histograms.
- Cumulative frequency; cumulative frequency graphs; use to find median, quartiles, percentiles, range and interquartile range (IQR).
- Production and understanding of box and whisker diagrams.
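A box and whisker diagram is drawn from a five-number summary, and the IQR comes from the same quartiles. The Python sketch below assumes the common "median of halves" convention: the lower and upper quartiles are the medians of the lower and upper halves of the sorted data, excluding the overall median when n is odd. Calculators and libraries may use other quartile conventions, so their results can differ slightly. It also assumes the widely used rule that a value is an outlier if it lies more than 1.5 × IQR beyond the nearest quartile. The function names and data are illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention (needs n >= 2)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the median
    upper = xs[half + 1:] if n % 2 else xs[half:]   # values above the median
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outlier_fences(data):
    """Values outside these fences lie more than 1.5 IQR beyond a quartile."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 4, 5, 6, 7, 8, 9, 21]   # hypothetical data
print(five_number_summary(data))      # (2, 4.0, 6, 8.5, 21)
print(outlier_fences(data))           # (-2.75, 15.25), so 21 is an outlier
```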
SL 4.3
- Measures of central tendency (mean, median and mode).
- Estimation of mean from grouped data.
- Modal class.
- Measures of dispersion (interquartile range, standard deviation and variance).
- Effect of constant changes on the original data.
- Quartiles of discrete data.
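The estimate of the mean from grouped data and the effect of constant changes can both be checked numerically. The Python sketch below assumes that every value in a class sits at the class mid-interval value. It uses the population standard deviation (`statistics.pstdev`, which divides by n) for descriptive statistics. The classes and data are hypothetical, and `grouped_mean_estimate` is an illustrative name.

```python
from statistics import mean, pstdev

def grouped_mean_estimate(classes):
    """Estimate the mean from (lower, upper, frequency) triples using mid-interval values."""
    total = sum(f for _, _, f in classes)
    return sum((lo + hi) / 2 * f for lo, hi, f in classes) / total

# Hypothetical classes: 0-10 (frequency 4), 10-20 (frequency 6), 20-30 (frequency 10)
classes = [(0, 10, 4), (10, 20, 6), (20, 30, 10)]
print(grouped_mean_estimate(classes))   # 18.0

# Constant change y = 2x + 10: the mean is transformed in the same way, while the
# standard deviation is multiplied by |2| and is unaffected by adding 10.
data = [3, 5, 7, 9]
changed = [2 * x + 10 for x in data]
print(mean(changed) == 2 * mean(data) + 10)               # True
print(abs(pstdev(changed) - 2 * pstdev(data)) < 1e-12)    # True
```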
SL 4.4
- Linear correlation of bivariate data.
- Pearson’s product-moment correlation coefficient, $r$.
- Scatter diagrams; lines of best fit, by eye, passing through the mean point.
- Equation of the regression line of $y$ on $x$.
- Use of the equation of the regression line for prediction purposes.
- Interpretation of the meaning of the parameters $a$ and $b$ in a linear regression $y=ax+b$.
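Pearson's $r$ and the regression line of $y$ on $x$ can be computed directly from their least-squares definitions. The Python sketch below uses hypothetical data and illustrative function names. In $y=ax+b$, $a$ is the gradient (the change in $y$ for each unit increase in $x$) and $b$ is the value of $y$ when $x=0$. The line always passes through the mean point $(\overline{x}, \overline{y})$. The prediction is made inside the range of the data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def regression_y_on_x(xs, ys):
    """Least-squares regression line of y on x, returned as (a, b) in y = ax + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx          # makes the line pass through the mean point (x-bar, y-bar)
    return a, b

xs = [1, 2, 3, 4, 5]         # hypothetical bivariate data
ys = [2, 4, 5, 4, 5]
a, b = regression_y_on_x(xs, ys)
print(round(pearson_r(xs, ys), 4))   # 0.7746
print(round(a, 4), round(b, 4))      # 0.6 2.2
print(round(a * 3.5 + b, 4))         # 4.3, the predicted y at x = 3.5
```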
SL 4.5
- Concepts of trial, outcome, equally likely outcomes, relative frequency, sample space ($U$) and event.
- The probability of an event $A$ is $P(A)=\frac{n(A)}{n(U)}$
- The complementary events $A$ and $A'$ (not $A$).
- Expected number of occurrences.
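For equally likely outcomes, $P(A)=\frac{n(A)}{n(U)}$, and the expected number of occurrences in $n$ trials is $nP(A)$. A small Python sketch with exact fractions follows. The die example is hypothetical.

```python
from fractions import Fraction

U = {1, 2, 3, 4, 5, 6}                 # sample space for one roll of a fair die
A = {x for x in U if x % 3 == 0}       # event A: the score is a multiple of 3

p_A = Fraction(len(A), len(U))         # P(A) = n(A) / n(U)
p_not_A = 1 - p_A                      # complementary event A'
expected = 120 * p_A                   # expected number of occurrences in 120 rolls: n * P(A)

print(p_A, p_not_A, expected)          # 1/3 2/3 40
```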
SL 4.6
- Use of Venn diagrams, tree diagrams, sample space diagrams and tables of outcomes to calculate probabilities.
- Combined events: $P(A \cup B)=P(A)+P(B)-P(A \cap B)$
- Mutually exclusive events: $P(A \cap B)=0$.
- Conditional probability: $P(A|B)=\frac{P(A \cap B)}{P(B)}$
- Independent events: $P(A \cap B)=P(A) P(B)$.
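The combined-event, conditional-probability and independence formulas can be checked on a sample space diagram for two fair dice. The Python sketch below uses hypothetical events. Note that `|` and `&` here are Python set union and intersection, not the conditional-probability bar.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely ordered pairs
U = set(product(range(1, 7), repeat=2))

def P(event):
    """P(E) = n(E) / n(U) for equally likely outcomes."""
    return Fraction(len(event), len(U))

A = {s for s in U if sum(s) == 7}     # the total is 7
B = {s for s in U if s[0] == 6}       # the first die shows 6
C = {s for s in U if s[0] == s[1]}    # a double

union, both = A | B, A & B            # set union A ∪ B and intersection A ∩ B

print(P(union) == P(A) + P(B) - P(both))   # True: combined events formula
print(P(both) / P(B))                      # 1/6: conditional probability P(A|B)
print(P(both) == P(A) * P(B))              # True: A and B are independent
print(P(A & C))                            # 0: A and C are mutually exclusive
```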
SL 4.7
- Concept of discrete random variables and their probability distributions.
- Expected value (mean), $E(X)$ for discrete data.
- Applications.
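The expected value of a discrete random variable is $E(X)=\sum x\,P(X=x)$. A typical application is deciding whether a game is fair, meaning the expected gain is 0. The Python sketch below uses a hypothetical game that costs 3 to play.

```python
from fractions import Fraction

# Hypothetical game: win 10 with probability 1/5, win 2 with probability 1/2, otherwise win 0
dist = {10: Fraction(1, 5), 2: Fraction(1, 2), 0: Fraction(3, 10)}
assert sum(dist.values()) == 1   # a valid probability distribution sums to 1

def expected_value(distribution):
    """E(X) = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in distribution.items())

ev = expected_value(dist)
print(ev)        # 3
print(ev - 3)    # 0: the expected gain after paying 3 to play, so the game is fair
```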
SL 4.8
- Binomial distribution.
- Mean and variance of the binomial distribution.
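For $X \sim B(n, p)$, $P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$, with $E(X)=np$ and $\mathrm{Var}(X)=np(1-p)$. The Python sketch below checks those two results numerically for hypothetical parameters.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3                                     # hypothetical parameters
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(probs))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))

print(round(sum(probs), 10))                       # 1.0
print(round(mean, 10), round(n * p, 10))           # 3.0 3.0, matching E(X) = np
print(round(var, 10), round(n * p * (1 - p), 10))  # 2.1 2.1, matching Var(X) = np(1 - p)
```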
SL 4.9
- The normal distribution and curve.
- Properties of the normal distribution.
- Diagrammatic representation.
- Normal probability calculations.
- Inverse normal calculations.
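Normal and inverse normal calculations are usually done with technology. In Python, the standard library's `statistics.NormalDist` provides `cdf` and `inv_cdf`. The height model below is hypothetical. Note that `NormalDist` takes the standard deviation, not the variance.

```python
from statistics import NormalDist

# Hypothetical model: heights X ~ N(170, 8^2) in cm
X = NormalDist(mu=170, sigma=8)

p_between = X.cdf(180) - X.cdf(160)   # P(160 < X < 180)
p_above = 1 - X.cdf(186)              # P(X > 186), two standard deviations above the mean
k = X.inv_cdf(0.9)                    # inverse normal: the k with P(X < k) = 0.9

print(round(p_between, 4))   # 0.7887
print(round(p_above, 4))     # 0.0228
print(round(k, 2))           # 180.25
```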
SL 4.10
- Spearman’s rank correlation coefficient, $r_s$.
- Awareness of the appropriateness and limitations of Pearson’s product moment correlation coefficient and Spearman’s rank correlation coefficient, and the effect of outliers on each.
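The difference between the two coefficients, and the effect of an extreme value, can be seen in a short Python sketch. It assumes $r_s$ is computed as Pearson's $r$ applied to the ranks, with tied values given the mean of the ranks they occupy. The data are hypothetical: they are perfectly monotonic but not linear, and include one extreme $y$-value.

```python
def ranks(values):
    """Rank from 1 (smallest) upwards; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            result[order[pos]] = shared
        i = j + 1
    return result

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def spearman_rs(xs, ys):
    """Spearman's rank correlation coefficient: Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 4, 9, 16, 25, 400]
print(round(spearman_rs(xs, ys), 4))   # 1.0, the relationship is perfectly monotonic
print(round(pearson_r(xs, ys), 2))     # 0.69, the extreme point distorts the linear measure
```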
SL 4.11
- Formulation of null and alternative hypotheses, $H_0$ and $H_1$.
- Significance levels. $p$-values.
- Expected and observed frequencies.
- The $\chi^2$ test for independence: contingency tables, degrees of freedom, critical value.
- The $\chi^2$ goodness of fit test.
- The $t$-test.
- Use of the $p$ -value to compare the means of two populations.
- Using one-tailed and two-tailed tests.
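In the $\chi^2$ test for independence, each expected frequency is (row total × column total) / grand total. A table with $r$ rows and $c$ columns has $(r-1)(c-1)$ degrees of freedom. The Python sketch below uses a hypothetical 2 × 2 table and applies no continuity correction. Instead of computing a $p$-value (normally done with technology), it compares the statistic with the tabulated 5% critical value for one degree of freedom, 3.841. Expected frequencies are usually required to exceed 5.

```python
def chi2_independence(observed):
    """Chi-squared statistic and degrees of freedom for a contingency table (a list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand   # expected frequency under H0
            stat += (obs - exp) ** 2 / exp
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof

observed = [[30, 20],      # hypothetical observed frequencies
            [20, 30]]
stat, dof = chi2_independence(observed)
critical_5pc = 3.841       # chi^2 critical value, 1 degree of freedom, 5% level
print(stat, dof)           # 4.0 1
print("reject H0" if stat > critical_5pc else "do not reject H0")   # reject H0
```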

AHL Content

AHL 4.12
- Design of valid data collection methods, such as surveys and questionnaires.
- Selecting relevant variables from many variables.
- Choosing relevant and appropriate data to analyse.
- Categorizing numerical data in a $\chi^2$ table and justifying the choice of categorization.
- Choosing the appropriate number of degrees of freedom for a $\chi^2$ goodness of fit test when parameters are estimated from the data.
- Definition of reliability and validity. Reliability tests.
- Validity tests.
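The degrees-of-freedom rule for a goodness of fit test can be illustrated in Python. Under the usual convention, assumed here, one degree of freedom is lost because the observed and expected totals must agree. One more is lost for each parameter estimated from the data. The sketch fits a Poisson model to hypothetical counts, estimating $\lambda$ from the sample mean. It groups the tail so that every expected frequency exceeds 5.

```python
from math import exp, factorial

# Hypothetical data: number of events per day over 50 days, as {value: frequency}
freq = {0: 10, 1: 20, 2: 12, 3: 6, 4: 2}
n = sum(freq.values())
lam = sum(x * f for x, f in freq.items()) / n     # Poisson mean estimated from the data

# Categories 0, 1, 2 and "3 or more", grouped so that every expected frequency exceeds 5
observed = [freq[0], freq[1], freq[2], freq[3] + freq[4]]
expected = [n * exp(-lam) * lam ** k / factorial(k) for k in range(3)]
expected.append(n - sum(expected))                # tail category uses P(X >= 3)

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
params_estimated = 1                              # lambda came from the data, not from H0
dof = len(observed) - 1 - params_estimated
print(round(lam, 2), dof)                         # 1.4 2
print(round(stat, 3))
```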
AHL 4.13
- Non-linear regression.
- Evaluation of least squares regression curves using technology.
- Sum of squared residuals ($SS_{res}$) as a measure of fit for a model.
- The coefficient of determination ($R^2$). Evaluation of $R^2$ using technology.
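$SS_{res}$ and $R^2$ can be compared across candidate models. The Python sketch below assumes the common definition $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{tot}$ is the sum of squared deviations of $y$ from $\overline{y}$. The data are hypothetical. The line $y = 3.87x - 0.94$ is the least-squares line for these data. The quadratic is a hypothetical candidate curve, not a fitted one.

```python
def ss_res(xs, ys, model):
    """Sum of squared residuals: the sum of (y - model(x))^2."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))

def r_squared(xs, ys, model):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(ys) / len(ys)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res(xs, ys, model) / ss_tot

def linear(x):
    return 3.87 * x - 0.94     # least-squares line for the data below

def quadratic(x):
    return x ** 2 + 1          # hypothetical candidate curve

xs = [0, 1, 2, 3, 4]
ys = [1.0, 2.1, 4.9, 9.2, 16.8]   # hypothetical data
for name, model in [("linear", linear), ("quadratic", quadratic)]:
    print(name, round(ss_res(xs, ys, model), 3), round(r_squared(xs, ys, model), 4))
# linear 15.331 0.9071
# quadratic 0.7 0.9958
```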
AHL 4.14
- Linear transformation of a single random variable.
- Expected value of linear combinations of $n$ random variables.
- Variance of linear combinations of $n$ independent random variables.
- $\overline{x}$ as an unbiased estimate of $\mu$.
- $s^2_{n-1}$ as an unbiased estimate of $\sigma^2$.
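In Python's standard library, `statistics.variance` divides by $n-1$ (giving $s^2_{n-1}$) and `statistics.pvariance` divides by $n$. The sketch below computes both for a hypothetical sample. It then checks $E(aX+b)=aE(X)+b$ and $\mathrm{Var}(aX+b)=a^2\mathrm{Var}(X)$ exactly for a hypothetical discrete random variable. The helper `expect` is an illustrative name.

```python
from fractions import Fraction
from statistics import mean, pvariance, variance

# Unbiased estimates from a hypothetical sample
sample = [12, 15, 11, 14, 13, 16, 10]
x_bar = mean(sample)           # unbiased estimate of mu
s2_n = pvariance(sample)       # divides by n: biased as an estimate of sigma^2
s2_n_1 = variance(sample)      # divides by n - 1: the unbiased estimate s^2_{n-1}
print(x_bar, s2_n, round(s2_n_1, 4))   # x-bar = 13, 28/7 = 4, 28/6 is about 4.6667

# Linear transformation Y = aX + b of a hypothetical discrete random variable X
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expect(g):
    """E(g(X)) = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in dist.items())

a, b = 3, 5
EX = expect(lambda x: x)
VarX = expect(lambda x: (x - EX) ** 2)
EY = expect(lambda x: a * x + b)
VarY = expect(lambda x: (a * x + b - EY) ** 2)
print(EY == a * EX + b, VarY == a ** 2 * VarX)   # True True
```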
AHL 4.15
- A linear combination of $n$ independent normal random variables is normally distributed. In particular, if $X \sim N(\mu, \sigma^2)$ and $\overline{X}$ is the mean of a random sample of size $n$ from $X$, then $\overline{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
- Central limit theorem.
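The distribution of the sample mean can be used directly for probability calculations. The central limit theorem says a similar result holds approximately for large samples from non-normal populations. The Python sketch below uses a hypothetical population $X \sim N(50, 10^2)$ with samples of size 25. It then runs a seeded simulation of means of 30 fair-die rolls. A fair die has $\mu = 3.5$ and $\sigma^2 = \frac{35}{12}$, so the means should cluster around 3.5 with a standard deviation near $\sqrt{35/12/30} \approx 0.312$.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, pstdev

# X ~ N(50, 10^2) (hypothetical); samples of size 25 give X-bar ~ N(50, 10^2 / 25)
mu, sigma, n = 50, 10, 25
X_bar = NormalDist(mu, sigma / sqrt(n))    # NormalDist takes the standard deviation
print(round(1 - X_bar.cdf(52), 4))         # 0.1587, i.e. P(Z > 1)

# Central limit theorem illustration: means of 30 rolls of a fair die
random.seed(1)
means = [sum(random.randint(1, 6) for _ in range(30)) / 30 for _ in range(10_000)]
print(round(mean(means), 2), round(pstdev(means), 3))   # close to 3.5 and 0.312
```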
AHL 4.16
- Confidence intervals for the mean of a normal population.
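A confidence interval for the mean of a normal population has the form $\overline{x} \pm z \frac{\sigma}{\sqrt{n}}$ when $\sigma$ is known. The Python sketch below assumes $\sigma$ is known and uses hypothetical sample values. When $\sigma$ is unknown and estimated by $s_{n-1}$, the $t$-distribution with $n-1$ degrees of freedom replaces $z$. This sketch does not cover that case. `z_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for mu when the population standard deviation sigma is known."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # about 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

lo, hi = z_interval(x_bar=72.4, sigma=6.0, n=36)     # hypothetical sample summary
print(round(lo, 2), round(hi, 2))                    # 70.44 74.36
```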
AHL 4.17
- Poisson distribution, its mean and variance.
- The sum of two independent Poisson random variables has a Poisson distribution.
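For $X \sim \mathrm{Po}(\lambda)$, $P(X=k)=\frac{e^{-\lambda}\lambda^k}{k!}$, and both the mean and the variance equal $\lambda$. If $X \sim \mathrm{Po}(\lambda_1)$ and $Y \sim \mathrm{Po}(\lambda_2)$ are independent, then $X+Y \sim \mathrm{Po}(\lambda_1+\lambda_2)$. The Python sketch below checks both facts numerically for hypothetical rates. It truncates the infinite sums at $k = 100$, which is negligible here.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

lam1, lam2 = 2.0, 3.5      # hypothetical rates

# Mean and variance of Po(lam1), with the sums truncated at k = 100
ks = range(101)
m = sum(k * poisson_pmf(k, lam1) for k in ks)
v = sum((k - m) ** 2 * poisson_pmf(k, lam1) for k in ks)

def sum_pmf(s):
    """P(X + Y = s) for independent X ~ Po(lam1), Y ~ Po(lam2), by convolution."""
    return sum(poisson_pmf(i, lam1) * poisson_pmf(s - i, lam2) for i in range(s + 1))

print(round(m, 6), round(v, 6))                                # 2.0 2.0
print(abs(sum_pmf(4) - poisson_pmf(4, lam1 + lam2)) < 1e-12)   # True: X + Y ~ Po(5.5)
```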
AHL 4.18
- Critical values and critical regions.
- Test for population mean for normal distribution.
- Test for proportion using binomial distribution.
- Test for population mean using Poisson distribution.
- Use of technology to test the hypothesis that the population product-moment correlation coefficient ($\rho$) is 0 for bivariate normal distributions.
- Type I and II errors including calculations of their probabilities.
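Critical regions and error probabilities for a test of proportion can be computed from the binomial distribution. The Python sketch below uses a hypothetical one-tailed test with $n = 20$: $H_0: p = 0.5$ against $H_1: p > 0.5$ at the 5% level. A hypothetical true value $p = 0.7$ is used for the type II error. For a discrete distribution, the actual probability of a type I error is the probability of the critical region under $H_0$. This can be smaller than the significance level.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p); returns 0 when k < 0."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

n = 20
# Critical region X >= c: the smallest c with P(X >= c | p = 0.5) <= 0.05
c = next(c for c in range(n + 1) if 1 - binom_cdf(c - 1, n, 0.5) <= 0.05)
type_I = 1 - binom_cdf(c - 1, n, 0.5)   # P(reject H0 | H0 true)
type_II = binom_cdf(c - 1, n, 0.7)      # P(do not reject H0 | p = 0.7)
print(c, round(type_I, 4))              # 15 0.0207
print(round(type_II, 4))
```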
AHL 4.19
- Transition matrices.
- Powers of transition matrices.
- Regular Markov chains.
- Initial state probability matrices.
- Calculation of steady state and long-term probabilities by repeated multiplication of the transition matrix or by solving a system of linear equations.
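Both methods for the steady state can be sketched for a small chain. The Python example below assumes the convention that each column of the transition matrix $T$ holds the probabilities of leaving one state, so the columns sum to 1 and $s_n = T^n s_0$. The two-state chain is hypothetical and regular, since all entries are positive. `mat_vec` is an illustrative helper.

```python
def mat_vec(T, s):
    """Multiply the transition matrix T by the column state vector s."""
    return [sum(T[i][j] * s[j] for j in range(len(s))) for i in range(len(T))]

# Hypothetical two-state chain; column j lists the probabilities of moving out of state j
T = [[0.8, 0.3],
     [0.2, 0.7]]
s = [1.0, 0.0]                 # initial state probability matrix: certainly in state 1

for _ in range(100):           # repeated multiplication approximates the long-term state
    s = mat_vec(T, s)
print([round(p, 4) for p in s])   # [0.6, 0.4]

# The same steady state from the linear equations T s = s and s_1 + s_2 = 1:
# 0.8 s_1 + 0.3 s_2 = s_1 gives 0.2 s_1 = 0.3 s_2, so s_1 = 0.6 and s_2 = 0.4.
steady = [T[0][1] / (T[0][1] + T[1][0]), T[1][0] / (T[0][1] + T[1][0])]
print(steady)                     # [0.6, 0.4]
```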
