Tiber Tutor

IB Maths AI Topic 4 Definitions

This page contains our IB Maths AI definitions for topic 4. By learning each one of these definitions, you will fully cover the content for IB Maths AI 'Stats & Probability'.

alternative hypothesis

The claim that is supported when the sample evidence against $H_0$ is strong enough, written as $H_1$.

bias

A tendency in the sampling or data collection process that favours certain outcomes or groups, so the results are not representative.

biased (HL)

Systematically favours certain outcomes, for example through question wording or poor sampling, so results are not fairly representative.

binomial test (HL)

A hypothesis test for a population proportion using $X \sim \text{Bin}(n, p)$ under $H_0$; it requires a fixed number of independent trials, two outcomes per trial and a constant probability of success.
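
A minimal Python sketch of a one-tailed binomial test, assuming $H_0: p = 0.5$ against $H_1: p > 0.5$ and a hypothetical result of 16 successes in 20 trials; the function names are illustrative only. It adds up $P(X = i)$ for $i \geq 16$ to get the p-value and compares it with a $5\%$ significance level.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_p_value(k: int, n: int, p0: float) -> float:
    """P(X >= k) assuming H0: p = p0 (one-tailed test with H1: p > p0)."""
    return sum(binomial_pmf(i, n, p0) for i in range(k, n + 1))


# Hypothetical data: 16 successes in 20 trials
p_value = upper_tail_p_value(16, 20, 0.5)
alpha = 0.05
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

For $H_1: p < p_0$ the same idea applies with the lower tail $P(X \leq k)$ instead.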

bivariate

Involving two numerical variables measured together as paired values, usually written as $(x, y)$.

central limit theorem (HL)

The result that, for large $n$, the sample mean $\bar{X}$ is approximately normally distributed with $\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right)$, even when the population distribution is not normal.

class

A range of values used to group continuous data, usually written as an interval such as $40 \leq x < 50$.

coefficient of determination (HL)

The proportion of the variation in $y$ explained by the chosen model, with values from $0$ to $1$; higher values usually indicate a better fit.
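
The definition above gives no formula, so this sketch assumes the common definition $R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$ with $SS_{\text{tot}} = \sum (y - \bar{y})^2$. The data and the model $y = 2x$ are hypothetical.

```python
def r_squared(y_actual, y_predicted):
    """Assumed definition: R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y_actual) / len(y_actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_actual, y_predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in y_actual)
    return 1 - ss_res / ss_tot


x = [1, 2, 3, 4]              # hypothetical data
y = [2.1, 3.9, 6.2, 7.8]
pred = [2 * xi for xi in x]   # predictions from the hypothetical model y = 2x
print(round(r_squared(y, pred), 4))
```

A value this close to $1$ means the model accounts for almost all of the variation in $y$ for these points.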

complement

The event that $A$ does not happen, written as $A'$, with probability $P(A') = 1 - P(A)$.

conditional probability

The probability that one event happens given that another event has already happened, written as $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided that $P(B) \neq 0$.
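
A short sketch of the formula using exact fractions, for a hypothetical survey of $50$ students in which $20$ study Biology (event $B$) and $12$ of those also study Art (event $A$).

```python
from fractions import Fraction


def conditional(p_a_and_b: Fraction, p_b: Fraction) -> Fraction:
    """P(A | B) = P(A ∩ B) / P(B), defined only when P(B) != 0."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_a_and_b / p_b


total = 50                         # hypothetical survey size
p_b = Fraction(20, total)          # P(B): studies Biology
p_a_and_b = Fraction(12, total)    # P(A ∩ B): studies Biology and Art
print(conditional(p_a_and_b, p_b))  # 3/5
```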

confidence (HL)

A statement about the long-run performance of an interval-building method; for example, with a $95\%$ method, about $95\%$ of intervals from repeated samples would contain the true population mean.

confidence interval (HL)

A range of plausible values for a population mean, built around the sample mean to show uncertainty due to sampling.

continuous

Able to take any value within an interval.

correlation

A description of the direction and strength of the relationship between two variables, as seen in a scatter diagram.

criterion-related (HL)

Assesses whether results agree closely with another accepted measure of the same thing.

critical region (HL)

The set of values of the test statistic for which $H_0$ is rejected; its probability under $H_0$ is chosen to be at most the significance level.

critical value (HL)

The boundary point(s) that separate the critical region from the non-critical region for a chosen significance level.

cumulative

A running total found by adding successive values as you move through a table or list.

degrees of freedom

The number of independent pieces of information used to determine the sampling distribution of a test statistic; for a $\chi^2$ test of independence it is $(\text{rows} - 1)(\text{columns} - 1)$, and for a goodness of fit test it is $n - 1$, where $n$ is the number of categories.
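
Both formulas are simple counts. This Python sketch assumes that no parameters are estimated from the data in the goodness of fit case; the function names and table sizes are illustrative.

```python
def df_independence(rows: int, columns: int) -> int:
    """Degrees of freedom for a chi-squared test of independence."""
    return (rows - 1) * (columns - 1)


def df_goodness_of_fit(categories: int) -> int:
    """Degrees of freedom for a goodness of fit test, assuming no estimated parameters."""
    return categories - 1


print(df_independence(3, 4))   # 6 for a 3 x 4 contingency table
print(df_goodness_of_fit(6))   # 5, e.g. testing whether a six-sided die is fair
```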

discrete

Taking separate, countable values, usually whole numbers.

dispersion

The extent to which data values vary about a central value, quantified using measures such as interquartile range, standard deviation, and variance.

eigenvector (HL)

A non-zero vector $\mathbf{v}$ that satisfies $A\mathbf{v} = \lambda\mathbf{v}$ for some scalar $\lambda$, meaning the matrix transformation only scales it by the factor $\lambda$ (reversing its sense if $\lambda < 0$) and keeps it on the same line through the origin.
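
In this topic eigenvectors usually appear with Markov chain transition matrices. The sketch below checks $A\mathbf{v} = \lambda\mathbf{v}$ with exact fractions for a hypothetical $2 \times 2$ matrix whose columns sum to $1$; the convention of multiplying a column state vector as $T\mathbf{s}$ is an assumption.

```python
from fractions import Fraction as F


def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def is_eigenvector(A, v, lam) -> bool:
    """True if v is non-zero and A v = lam v exactly."""
    if all(x == 0 for x in v):
        return False
    return mat_vec(A, v) == [lam * x for x in v]


# Hypothetical transition matrix; each column sums to 1
T = [[F(8, 10), F(3, 10)],
     [F(2, 10), F(7, 10)]]

print(is_eigenvector(T, [F(3), F(2)], 1))          # True  (lambda = 1)
print(is_eigenvector(T, [F(1), F(-1)], F(1, 2)))   # True  (lambda = 1/2)
print(is_eigenvector(T, [F(1), F(0)], 1))          # False
```

Scaling the eigenvector for $\lambda = 1$ so that its entries sum to $1$ gives $\left(\frac{3}{5}, \frac{2}{5}\right)$, the steady-state vector of this chain.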

estimate (HL)

A sample-based value used to approximate an unknown population parameter such as $\mu$ or $\sigma^2$.

event

A set of outcomes from the sample space.

exclusive

Describes events that cannot happen at the same time, so $P(A \cap B) = 0$ and $P(A \cup B) = P(A) + P(B)$.

extrapolation

Using a regression line to predict a value that lies outside the range of the observed data, which is generally less reliable.

frequency

A table that shows how often each value (discrete) or each class interval (continuous) occurs.

gradient

The coefficient $a$ in $y = ax + b$, giving the predicted change in $y$ when $x$ increases by $1$.

histogram

A diagram for grouped continuous data in which each class interval is shown by a touching bar, allowing the shape and spread of the distribution to be seen.

hypothesis

A statement about a population parameter or about a relationship in a population that can be tested using sample data.

independent probability

Describes events or random variables where knowing one outcome gives no information about the other; for Poisson variables this condition allows their totals to remain Poisson.

interpolation

Estimating a value at an unknown point using known values at nearby points; in nearest neighbour interpolation, the estimate is taken from the closest site.

interquartile

Referring to the middle $50\%$ of the data, between the lower quartile $Q_1$ and upper quartile $Q_3$.

interquartile range

Measures the spread of the middle $50\%$ of an ordered data set, calculated as the difference between the upper and lower quartiles: $IQR = Q_3 - Q_1$.

least squares regression curve (HL)

The regression curve (of a chosen model type) that minimises $SS_{\text{res}}$ for the given data set.

linear combinations (HL)

A weighted sum of random variables such as $a_1X_1 + a_2X_2 + \dots + a_nX_n$, whose expected value is the corresponding weighted sum of expected values, $a_1E(X_1) + a_2E(X_2) + \dots + a_nE(X_n)$.

linear transformation (HL)

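Changing a random variable $X$ to $aX + b$: the mean is scaled by $a$ and then shifted by $b$, giving $E(aX + b) = aE(X) + b$, while the variance is scaled by $a^2$ and unaffected by $b$, giving $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.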
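
A quick check of both rules with exact fractions, using a hypothetical discrete distribution for $X$ and the illustrative values $a = 3$, $b = 5$.

```python
from fractions import Fraction as F

# Hypothetical distribution of X: value -> probability
dist_x = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}


def expectation(dist):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


a, b = 3, 5
# Y = aX + b takes the value a*x + b with the same probability (a != 0 keeps values distinct)
dist_y = {a * x + b: p for x, p in dist_x.items()}

assert expectation(dist_y) == a * expectation(dist_x) + b   # 8 == 3 * 1 + 5
assert variance(dist_y) == a**2 * variance(dist_x)           # 9/2 == 9 * 1/2
print(expectation(dist_y), variance(dist_y))
```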

margin of error (HL)

The half-width of a confidence interval, equal to a critical value multiplied by a standard error, such as $z\frac{\sigma}{\sqrt{n}}$ or $t\frac{s}{\sqrt{n}}$.
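
A Python sketch of a $z$-interval, assuming $\sigma$ is known and using `statistics.NormalDist` from the standard library to find the critical value; the sample figures are hypothetical. When $\sigma$ is unknown, the $t$ version with $s$ would be used instead.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar: float, sigma: float, n: int, confidence: float = 0.95):
    """Interval x_bar +/- z * sigma / sqrt(n) for mu, assuming sigma is known."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


# Hypothetical sample: n = 25, x_bar = 50.0, known sigma = 10.0
low, high, margin = z_interval(50.0, 10.0, 25)
print(f"margin of error = {margin:.3f}, interval = ({low:.2f}, {high:.2f})")
```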

Markov (HL)

A stochastic process in which the next state depends only on the current state and not on the earlier history of the system.
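
A minimal sketch of a two-state Markov chain with a hypothetical transition matrix, assuming the convention $\mathbf{s}_{n+1} = T\mathbf{s}_n$ with the columns of $T$ summing to $1$. Each step uses only the current state vector, which is the property described in the definition.

```python
def step(T, s):
    """One step s_next = T s, using only the current state vector s."""
    return [sum(T[i][j] * s[j] for j in range(len(s))) for i in range(len(T))]


def state_after(T, s0, n):
    """State vector after n steps starting from s0."""
    s = s0
    for _ in range(n):
        s = step(T, s)
    return s


# Hypothetical two-state chain; each column of T sums to 1
T = [[0.8, 0.3],
     [0.2, 0.7]]
s0 = [1.0, 0.0]   # start in state 1 with certainty

print(state_after(T, s0, 1))    # [0.8, 0.2]
print(state_after(T, s0, 50))   # very close to [0.6, 0.4]
```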

mean

The expected value of a random variable; if $X \sim \mathrm{Poisson}(\lambda)$, then $E(X) = \lambda$.
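
A numerical check of $E(X) = \lambda$ for a hypothetical $\lambda = 3.5$: the sketch sums $k\,P(X = k)$, truncating the infinite series at $k = 100$, beyond which the remaining terms are far too small to matter.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)


lam = 3.5   # hypothetical rate
# E(X) = sum of k * P(X = k); terms beyond k = 100 are negligible for lam = 3.5
mean = sum(k * poisson_pmf(k, lam) for k in range(101))
print(round(mean, 10))   # 3.5
```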

median

For a continuous random variable with probability density function $f$, the value $m$ such that $\int_{-\infty}^{m} f(x)\,dx = \frac{1}{2}$, so half the total area under the density curve lies on each side of $m$.

modal

The class interval with the highest frequency in grouped data (used when class intervals are equal).

negative correlation

Describing a correlation where the points in a scatter diagram tend to fall from left to right, so $y$ tends to decrease as $x$ increases.

Non-linear regression (HL)

Using technology to fit a curved model (not of the form $y = ax + b$) to data by estimating constants so the curve matches the overall pattern of the points.

normal (HL)

A line through a point on a curve that is perpendicular to the tangent at that point, with gradient equal to the negative reciprocal of the tangent gradient.

null hypothesis

The default claim in a hypothesis test, usually stating no difference, no effect, no association, or that a parameter has a stated value, and written as $H_0$.

one-tailed

A test where $H_1$ is directional (for example, $\mu_1 > \mu_2$ or $\mu_1 < \mu_2$), so the rejection region is in one tail of the sampling distribution.

one-tailed test (HL)

A hypothesis test where the critical region lies entirely in one tail of the sampling distribution, matching a directional alternative such as $\mu < \mu_0$ or $p > p_0$.

outcome

One possible result of a trial.

outlier

A value unusually far from the rest of the data; in this course, one that is below $Q_1 - 1.5 \times IQR$ or above $Q_3 + 1.5 \times IQR$, where $IQR = Q_3 - Q_1$.
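
A sketch of the outlier rule in Python. The quartiles are passed in rather than computed, because calculators and software use slightly different quartile methods; for the hypothetical data below, $Q_1 = 12$ and $Q_3 = 20$ are the medians of the lower and upper halves.

```python
def outlier_fences(q1: float, q3: float) -> tuple[float, float]:
    """Lower and upper fences Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(data, q1, q3):
    """Values strictly below the lower fence or strictly above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]


data = [3, 11, 12, 14, 15, 17, 19, 20, 22, 35]   # hypothetical, already ordered
print(outlier_fences(12, 20))   # (0.0, 32.0)
print(outliers(data, 12, 20))   # [35]
```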

p-value

The probability of obtaining a result at least as extreme as the sample result, assuming $H_0$ is true; smaller values give stronger evidence against $H_0$.

parallel (HL)

Checks consistency by comparing results from two different versions designed to measure the same thing.

Pearson's

Referring to the product-moment correlation coefficient $r$, a statistic that measures the strength and direction of a linear relationship and lies between $-1$ and $1$.
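
A sketch of the usual formula $r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$, where $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$, $S_{xx} = \sum (x - \bar{x})^2$ and $S_{yy} = \sum (y - \bar{y})^2$. The data are hypothetical, and in practice $r$ is normally found with a GDC.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```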

percentiles

Values that split ordered data into $100$ equal parts; the $p$th percentile has about $p\%$ of the data below it and can be estimated from a cumulative frequency graph.

Poisson distribution (HL)

A discrete probability model for the number of times an event happens in a fixed interval of time, length, area, or volume, assuming events occur independently at a constant average rate.

Poisson test (HL)

A hypothesis test for a population mean rate using $X \sim \text{Po}(\lambda_0)$ under $H_0$, appropriate for counts of events occurring independently at a constant average rate (with $\lambda_0$ scaled if the observation period changes).
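
A minimal sketch of a one-tailed Poisson test with $H_1: \lambda > \lambda_0$, using hypothetical figures: under $H_0$ events occur at $2$ per hour, $11$ events are observed over $3$ hours, so $\lambda_0$ is scaled to $6$. The p-value is $P(X \geq 11) = 1 - P(X \leq 10)$.

```python
from math import exp, factorial


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Po(lam), computed as 1 - P(X <= k - 1)."""
    return 1 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))


rate_per_hour = 2   # hypothetical H0 rate: 2 events per hour
hours = 3           # observation period, so lambda_0 = 2 * 3 = 6
observed = 11       # hypothetical observed count
alpha = 0.05

p_value = poisson_upper_tail(observed, rate_per_hour * hours)
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```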

population

The full set of all possible values of interest, described by parameters such as the mean $\mu$ and variance $\sigma^2$.

positive correlation

Describing a correlation where the points in a scatter diagram tend to rise from left to right.

probability

Measures how likely an event is to happen, taking a value between $0$ and $1$, where $0$ means impossible and $1$ means certain.

quartiles

Values that split ordered data into four equal parts; on a cumulative frequency graph with $n$ values, the lower and upper quartiles are estimated at cumulative frequencies $\frac{n}{4}$ and $\frac{3n}{4}$.

random

Describing an outcome that cannot be predicted with certainty in advance, even though long-run patterns may be modelled using probabilities.

range

The set of possible outputs a function can produce.

regression

A method for modelling a linear relationship in bivariate data by fitting a line that is then used to make predictions.
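
A minimal least squares sketch for the line $y = ax + b$, assuming the standard formulas $a = \frac{S_{xy}}{S_{xx}}$ and $b = \bar{y} - a\bar{x}$; the data are hypothetical. Predicting at $x = 6$ with this line would be extrapolation, since it lies outside the observed $x$ values.

```python
def fit_line(x, y):
    """Least squares line y = a x + b, returned as (a, b)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(f"y = {a:.2f}x + {b:.2f}")   # y = 0.60x + 2.20
```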

reliability (HL)

Consistency; a method is reliable if it gives similar results when repeated in similar conditions.

sample

A set of $n$ observed values taken from a population and used to compute statistics such as $\bar{x}$ and $s_{n-1}^2$.

sample mean (HL)

The arithmetic average of the sample values, given by $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and used as an unbiased estimator of $\mu$.

sample variance (HL)

A statistic measuring spread based on squared deviations from $\bar{x}$, commonly computed as $s_{n-1}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$ to estimate $\sigma^2$.
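
A sketch computing $s_{n-1}^2$ directly from the formula for a hypothetical data set, then comparing it with Python's standard-library `statistics.variance` (which also divides by $n - 1$) and `statistics.pvariance` (which divides by $n$).

```python
import statistics


def sample_variance(xs):
    """s_{n-1}^2 = sum of (x - x_bar)^2 divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


data = [4, 7, 8, 10, 11]   # hypothetical sample
print(sample_variance(data))          # 7.5
print(statistics.variance(data))      # also divides by n - 1
print(statistics.pvariance(data))     # divides by n instead
```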

significance

The chosen cut-off for deciding whether evidence against $H_0$ is strong enough to reject it, usually denoted by $\alpha$.

significance level (HL)

The maximum allowed probability of rejecting $H_0$ when $H_0$ is true, so $P(\text{Type I error}) = \alpha$.

spread

How dispersed the data values are, described using measures of dispersion such as interquartile range, standard deviation, and variance.

SSres (HL)

The total squared error from a model, calculated as $SS_{\text{res}} = \sum\left(y_{\text{actual}} - y_{\text{predicted}}\right)^2$, used to compare how closely different models fit the same data.
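
A sketch comparing two hypothetical models on the same hypothetical data; the model with the smaller $SS_{\text{res}}$ fits these points more closely.

```python
def ss_res(x, y, model):
    """Sum of squared residuals for a model given as a function of x."""
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4]    # hypothetical data
y = [3, 5, 7, 10]


def model_a(t):
    return 2 * t + 1          # hypothetical model A


def model_b(t):
    return 2.2 * t + 0.5      # hypothetical model B


print(round(ss_res(x, y, model_a), 2))   # 1
print(round(ss_res(x, y, model_b), 2))   # 0.6
```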

standard deviation

The square root of the variance, $\sigma = \sqrt{\mathrm{Var}(X)}$, giving spread in the same units as the random variable.

Sum of squared residuals (HL)

A measure of fit found by adding the squared residuals for all data points, $SS_{\text{res}} = \sum\left(y_{\text{actual}} - y_{\text{predicted}}\right)^2$; smaller values indicate a closer fit to the data.

test-retest (HL)

Checks consistency by repeating the same test after a period of time and comparing the results.

two-tailed

A test where $H_1$ looks for a difference in either direction (for example, $\mu_1 \neq \mu_2$), so the rejection region is split between both tails.

two-tailed test (HL)

A hypothesis test where the critical region is split between both tails of the sampling distribution, matching a non-directional alternative such as $\mu \neq \mu_0$.

Type I error (HL)

Rejecting $H_0$ even though $H_0$ is true (a false positive), with probability $\alpha = P(\text{Reject } H_0 \mid H_0 \text{ true})$.

Type II error (HL)

Failing to reject $H_0$ even though $H_1$ is true (a false negative), with probability $\beta = P(\text{Fail to reject } H_0 \mid H_0 \text{ false})$ that depends on the true parameter value and the chosen critical region.
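
A sketch computing both error probabilities for a hypothetical test of $H_0: p = 0.5$ against $H_1: p < 0.5$ with $X \sim \text{Bin}(20, p)$ at the $5\%$ level, assuming the true value is $p = 0.3$ when finding $\beta$. Because $X$ is discrete, the actual Type I error probability comes out below $\alpha$, matching the idea that the critical region's probability under $H_0$ is at most the significance level.

```python
from math import comb


def binom_cdf(c: int, n: int, p: float) -> float:
    """P(X <= c) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


n, p0, alpha = 20, 0.5, 0.05
# Critical region X <= c: take the largest c with P(X <= c | H0) <= alpha
c = max(k for k in range(n + 1) if binom_cdf(k, n, p0) <= alpha)

type_1 = binom_cdf(c, n, p0)          # P(reject H0 | H0 true)
p_true = 0.3                          # hypothetical true value of p
type_2 = 1 - binom_cdf(c, n, p_true)  # P(fail to reject H0 | p = 0.3)

print(c, round(type_1, 4), round(type_2, 4))
```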

validity (HL)

Accuracy; a method is valid if it measures what it is intended to measure.

variables (HL)

Quantities that can change in a situation and are represented by symbols so relationships can be modelled mathematically.

variance

Measures spread using the mean of squared distances from the mean, so values far from the mean have greater influence; measured in squared units and equal to the square of the standard deviation.
