# Topic 4: Statistics and Probability

## Concepts

Essential understandings:

Statistics is concerned with the collection, analysis and interpretation of data. The theory of probability can be used to estimate parameters, discover empirical laws, test hypotheses and predict the occurrence of events. Statistical representations and measures allow us to represent data in many different forms to aid interpretation.

Probability enables us to quantify the likelihood of events occurring and so evaluate risk. Both statistics and probability provide important representations which enable us to make predictions, valid comparisons and informed decisions. These fields have power and limitations and should be applied with care and critically questioned to differentiate between the theoretical and the empirical/observed. Probability theory allows us to make informed choices, to evaluate risk, and to make predictions about seemingly random events.

Suggested concepts embedded in this topic:

Quantity, validity, approximation, generalization.

AHL: Change, systems.

Content-specific conceptual understandings:

Organizing, representing, analysing and interpreting data and utilizing different statistical tools facilitates prediction and drawing of conclusions.

Different statistical techniques require justification and the identification of their limitations and validity.

Approximation in data can approach the truth but may not always achieve it.

Some techniques of statistical analysis, such as regression, standardization or formulae, can be applied in a practical context and then extended to general cases.

Modelling through statistics can be reliable, but may have limitations.

AHL: Properties of probability density functions can be used to identify measures of central tendency such as mean, mode and median.

AHL: Probability methods such as Bayes’ theorem can be applied to real-world systems, such as medical studies or economics, to inform decisions and to better understand outcomes.

## SL Content

SL 4.1

Concepts of population, sample, random sample, discrete and continuous data.

Reliability of data sources and bias in sampling.

Interpretation of outliers.

Sampling techniques and their effectiveness.

SL 4.2

Presentation of data (discrete and continuous): frequency distributions (tables).

Histograms.

Cumulative frequency; cumulative frequency graphs; use to find median, quartiles, percentiles, range and interquartile range (IQR).

Production and understanding of box and whisker diagrams.

SL 4.3

Measures of central tendency (mean, median and mode).

Estimation of mean from grouped data.

Modal class.

Measures of dispersion (interquartile range, standard deviation and variance).

Effect of constant changes on the original data.

Quartiles of discrete data.
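As an illustration of the SL 4.3 items, the following is a minimal sketch using Python's standard `statistics` module. The data values, class intervals and frequencies are hypothetical, chosen only to keep the arithmetic simple, and the population (divide by $n$) forms of variance and standard deviation are assumed.

```python
import statistics

data = [2, 4, 4, 5, 7, 8]  # hypothetical raw data

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
variance = statistics.pvariance(data)  # population variance (divide by n)
sd = statistics.pstdev(data)           # population standard deviation

# Effect of constant changes on the original data
shifted = [x + 3 for x in data]  # adding a constant shifts the mean only
scaled = [2 * x for x in data]   # multiplying scales the mean and the spread

# Estimated mean from grouped data: class midpoints weighted by frequency
classes = [(0, 10), (10, 20), (20, 30)]  # hypothetical class intervals
freqs = [2, 5, 3]                        # hypothetical frequencies
midpoints = [(lo + hi) / 2 for lo, hi in classes]
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)
modal_class = classes[freqs.index(max(freqs))]  # class with the highest frequency

print(mean, median, mode, variance, sd)
print(statistics.mean(shifted), statistics.pstdev(shifted))
print(statistics.mean(scaled), statistics.pstdev(scaled))
print(grouped_mean, modal_class)
```

For this data the mean is 5 and the standard deviation is 2. Adding 3 to every value moves the mean to 8 and leaves the standard deviation at 2, while doubling every value doubles both.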

SL 4.4

Linear correlation of bivariate data.

Pearson’s product-moment correlation coefficient, $r$.

Scatter diagrams; lines of best fit, by eye, passing through the mean point.

Equation of the regression line of $y$ on $x$.

Use of the equation of the regression line for prediction purposes.

Interpret the meaning of the parameters, $a$ and $b$, in a linear regression $y=ax+b$.
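The SL 4.4 calculations can be sketched by hand from the sums of squares, without a statistics package. The bivariate data below are hypothetical; `a` is the gradient and `b` the $y$-intercept, matching the form $y=ax+b$.

```python
import math

# Hypothetical bivariate data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the mean point
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)  # Pearson's product-moment correlation coefficient
a = sxy / sxx                   # gradient: change in y per unit increase in x
b = y_bar - a * x_bar           # intercept: predicted y when x = 0

def predict_y(x_value):
    """Predict y from x using the regression line of y on x."""
    return a * x_value + b

print(r, a, b, predict_y(x_bar))
```

Here `predict_y(x_bar)` returns `y_bar`, which reflects the fact that the regression line passes through the mean point. Predictions are generally only trustworthy for $x$-values inside the range of the data.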

SL 4.5

Concepts of trial, outcome, equally likely outcomes, relative frequency, sample space ($U$) and event.

The probability of an event $A$ is $P(A)=\frac{n(A)}{n(U)}$.

The complementary events $A$ and $A'$ (not $A$).

Expected number of occurrences.
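A minimal sketch of the SL 4.5 ideas, assuming a single roll of a fair die as the experiment (the event and the number of trials are hypothetical choices). Exact fractions are used so the results match hand calculation.

```python
from fractions import Fraction

U = set(range(1, 7))              # sample space for one roll of a fair die
A = {x for x in U if x % 3 == 0}  # event A: "a multiple of 3", i.e. {3, 6}

p_A = Fraction(len(A), len(U))    # P(A) = n(A) / n(U), valid for equally likely outcomes
p_not_A = 1 - p_A                 # complementary event: P(A') = 1 - P(A)

trials = 60                       # hypothetical number of rolls
expected_occurrences = trials * p_A  # expected number of occurrences = n * P(A)

print(p_A, p_not_A, expected_occurrences)
```

With these choices $P(A)=\frac{1}{3}$, $P(A')=\frac{2}{3}$, and $A$ is expected to occur 20 times in 60 rolls.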

SL 4.6

Use of Venn diagrams, tree diagrams, sample space diagrams and tables of outcomes to calculate probabilities.

Combined events: $P(A \cup B)=P(A)+P(B)-P(A \cap B)$.

Mutually exclusive events: $P(A \cap B)=0$.

Conditional probability: $P(A|B)=\frac{P(A \cap B)}{P(B)}$.

Independent events: $P(A \cap B)=P(A) P(B)$.
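The SL 4.6 formulae can be checked by enumerating a sample space. The sketch below assumes two fair dice and two hypothetical events; the helper `P` is an illustrative name, not part of any library.

```python
from fractions import Fraction
from itertools import product

U = set(product(range(1, 7), repeat=2))     # all 36 outcomes for two fair dice
A = {(i, j) for (i, j) in U if i % 2 == 0}  # event A: first die is even
B = {(i, j) for (i, j) in U if i + j == 7}  # event B: the total is 7

def P(event):
    """Probability of an event when all outcomes in U are equally likely."""
    return Fraction(len(event), len(U))

p_union = P(A | B)                          # counted directly from the sample space
p_union_formula = P(A) + P(B) - P(A & B)    # combined events formula
p_A_given_B = P(A & B) / P(B)               # conditional probability
independent = P(A & B) == P(A) * P(B)       # independence test
mutually_exclusive = P(A & B) == 0          # mutual exclusivity test

print(p_union, p_union_formula, p_A_given_B, independent, mutually_exclusive)
```

These two events turn out to be independent but not mutually exclusive, which is a useful reminder that the two properties are different.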

SL 4.7

Concept of discrete random variables and their probability distributions.

Expected value (mean), $E(X)$ for discrete data.

Applications.
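As a sketch of SL 4.7, the expected value $E(X)=\sum x\,P(X=x)$ can be computed directly from a probability distribution table. The distribution below is hypothetical and could represent the net gain from one play of a game.

```python
from fractions import Fraction

# Hypothetical distribution: net gain X from one play of a game
distribution = {
    -1: Fraction(1, 2),
    0: Fraction(1, 4),
    3: Fraction(1, 4),
}

# A valid probability distribution must sum to 1
total_probability = sum(distribution.values())

# E(X) = sum of x * P(X = x) over all values of X
expected_value = sum(x * p for x, p in distribution.items())

is_fair_game = expected_value == 0  # a game is fair when the expected gain is zero

print(total_probability, expected_value, is_fair_game)
```

Here $E(X)=\frac{1}{4}$, so on average the player gains a quarter of a unit per play and the game is not fair.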

SL 4.8

Binomial distribution.

Mean and variance of the binomial distribution.
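For SL 4.8, the sketch below builds the binomial probabilities from $P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$ and checks that the mean and variance agree with $np$ and $np(1-p)$. The values $n=10$ and $p=0.3$ are hypothetical.

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(3, 10)  # hypothetical parameters: X ~ B(10, 0.3)

def binomial_pmf(k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

pmf = {k: binomial_pmf(k) for k in range(n + 1)}

# Mean and variance computed from the distribution itself
mean = sum(k * pk for k, pk in pmf.items())
variance = sum(k**2 * pk for k, pk in pmf.items()) - mean**2

print(float(pmf[3]))           # P(X = 3)
print(mean, n * p)             # both equal np
print(variance, n * p * (1 - p))  # both equal np(1 - p)
```

Exact fractions are used here so the comparison with $np$ and $np(1-p)$ is not affected by rounding.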

SL 4.9

The normal distribution and curve.

Properties of the normal distribution.

Diagrammatic representation.

Normal probability calculations.

Inverse normal calculations.
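Normal and inverse normal calculations, normally done on a calculator, can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The mean of 100 and standard deviation of 15 are hypothetical.

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=15)  # hypothetical: X ~ N(100, 15^2)

# Normal probability calculation: P(85 < X < 115), within one standard deviation
p_within_one_sd = X.cdf(115) - X.cdf(85)

# Inverse normal calculation: find x such that P(X < x) = 0.975
x_upper = X.inv_cdf(0.975)

print(round(p_within_one_sd, 4), round(x_upper, 2))
```

The first result is approximately 0.68, which matches the familiar property that about 68% of a normal distribution lies within one standard deviation of the mean.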

SL 4.10

Equation of the regression line of $x$ on $y$.

Use of the equation for prediction purposes.
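The regression line of $x$ on $y$ minimizes horizontal rather than vertical distances, so it is generally not a rearrangement of the line of $y$ on $x$. A minimal sketch with hypothetical data follows; the names `c` and `d` for the coefficients of $x=cy+d$ are illustrative choices.

```python
# Hypothetical bivariate data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

# Regression line of x on y: x = c*y + d (used to predict x from a given y)
c = sxy / syy
d = x_bar - c * y_bar

# Gradient of the y-on-x line, for comparison
a = sxy / sxx

def predict_x(y_value):
    """Predict x from y using the regression line of x on y."""
    return c * y_value + d

print(c, d, predict_x(4.5))
print(1 / a)  # gradient obtained by rearranging y on x; differs from c unless r = ±1
```

For this data the line of $x$ on $y$ is $x=y-1$, whereas rearranging the line of $y$ on $x$ would give a gradient of about 1.67, so the choice of line matters when predicting $x$ from $y$.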

SL 4.11

Formal definition and use of the formulae: $P(A|B)=\frac{P(A \cap B)}{P(B)}$ for conditional probabilities, and $P(A|B) = P(A) = P(A|B')$ for independent events.
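The independence condition $P(A|B)=P(A)=P(A|B')$ can be tested on a two-way table of counts. The counts below are hypothetical, and the sketch treats each individual as equally likely to be selected.

```python
from fractions import Fraction

# Hypothetical two-way table of counts
counts = {
    ("A", "B"): 12,
    ("A", "B'"): 18,
    ("A'", "B"): 8,
    ("A'", "B'"): 12,
}
total = sum(counts.values())

n_A = counts[("A", "B")] + counts[("A", "B'")]
n_B = counts[("A", "B")] + counts[("A'", "B")]
n_not_B = total - n_B

p_A = Fraction(n_A, total)
p_A_given_B = Fraction(counts[("A", "B")], n_B)           # P(A|B) = P(A and B) / P(B)
p_A_given_not_B = Fraction(counts[("A", "B'")], n_not_B)  # P(A|B')

independent = p_A_given_B == p_A == p_A_given_not_B

print(p_A, p_A_given_B, p_A_given_not_B, independent)
```

All three probabilities equal $\frac{3}{5}$ here, so knowing whether $B$ occurred gives no information about $A$.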

SL 4.12

Standardization of normal variables ($z$-values).

Inverse normal calculations where mean and standard deviation are unknown.
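When the mean and standard deviation are unknown, two given probabilities can be standardized to give two simultaneous equations of the form $x=\mu+z\sigma$. The sketch below assumes the hypothetical information $P(X<12)=0.2$ and $P(X>20)=0.1$.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, Z ~ N(0, 1)

# Hypothetical information about X ~ N(mu, sigma^2)
x1, p1 = 12, 0.2      # P(X < 12) = 0.2
x2, p2 = 20, 1 - 0.1  # P(X > 20) = 0.1, so P(X < 20) = 0.9

# Standardize: z = (x - mu) / sigma, with z found by inverse normal on Z
z1 = Z.inv_cdf(p1)
z2 = Z.inv_cdf(p2)

# Solve x1 = mu + z1*sigma and x2 = mu + z2*sigma simultaneously
sigma = (x2 - x1) / (z2 - z1)
mu = x1 - z1 * sigma

print(round(mu, 2), round(sigma, 2))

# Check: the fitted distribution reproduces the given probabilities
X = NormalDist(mu, sigma)
print(round(X.cdf(x1), 4), round(X.cdf(x2), 4))
```

Subtracting the two equations eliminates $\mu$, which is why $\sigma$ is found first.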

## AHL Content

AHL 4.13

Use of Bayes’ theorem for a maximum of three events.
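Bayes' theorem for three events $B_1, B_2, B_3$ that partition the sample space can be written as $P(B_i|A)=\frac{P(B_i)P(A|B_i)}{P(B_1)P(A|B_1)+P(B_2)P(A|B_2)+P(B_3)P(A|B_3)}$. The sketch below applies it to a hypothetical factory with three machines; all figures are invented for illustration.

```python
from fractions import Fraction

# Hypothetical priors: share of output produced by each machine, P(B_i)
prior = {
    "M1": Fraction(50, 100),
    "M2": Fraction(30, 100),
    "M3": Fraction(20, 100),
}

# Hypothetical likelihoods: probability an item is defective given its machine, P(A|B_i)
p_defect_given = {
    "M1": Fraction(1, 100),
    "M2": Fraction(2, 100),
    "M3": Fraction(3, 100),
}

# Total probability of a defect: sum of P(B_i) * P(A|B_i)
p_defect = sum(prior[m] * p_defect_given[m] for m in prior)

# Bayes' theorem: P(B_i|A) = P(B_i) * P(A|B_i) / P(A)
posterior = {m: prior[m] * p_defect_given[m] / p_defect for m in prior}

print(p_defect)
for machine, prob in posterior.items():
    print(machine, prob)
```

Although machine M1 makes half of all items, it accounts for only $\frac{5}{17}$ of the defective ones, because its defect rate is the lowest.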

AHL 4.14

Variance of a discrete random variable.

Continuous random variables and their probability density functions.

Mode and median of continuous random variables.

Mean, variance and standard deviation of both discrete and continuous random variables.

The effect of linear transformations of $X$.
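For a discrete random variable, the variance can be computed as $\operatorname{Var}(X)=E(X^2)-[E(X)]^2$, and a linear transformation satisfies $E(aX+b)=aE(X)+b$ and $\operatorname{Var}(aX+b)=a^2\operatorname{Var}(X)$. The sketch below checks these results on a hypothetical distribution; the helper names are illustrative.

```python
from fractions import Fraction

# Hypothetical distribution of a discrete random variable X
dist_X = {1: Fraction(1, 5), 2: Fraction(1, 2), 3: Fraction(3, 10)}

def expectation(dist):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    e_x2 = sum(x**2 * p for x, p in dist.items())
    return e_x2 - expectation(dist) ** 2

a, b = 3, -2  # hypothetical linear transformation Y = 3X - 2

# Distribution of Y: each value is transformed, probabilities are unchanged
dist_Y = {a * x + b: p for x, p in dist_X.items()}

print(expectation(dist_X), variance(dist_X))
print(expectation(dist_Y), a * expectation(dist_X) + b)  # E(aX + b) = aE(X) + b
print(variance(dist_Y), a**2 * variance(dist_X))         # Var(aX + b) = a^2 Var(X)
```

The constant $b$ shifts the mean but has no effect on the variance, and the standard deviation is scaled by $|a|$.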
