
Contents
Introduction to Statistical Methods
Rationale
Statistics is a mathematical science pertaining to the collection, analysis, interpretation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities; it is also used for making informed decisions in all areas of business and government.
Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, to draw inferences about the process or population being studied; this is called inferential statistics. Both descriptive and inferential statistics can be considered part of applied statistics. There is also a discipline of mathematical statistics, which is concerned with the theoretical basis of the subject.
The word statistics is also the plural of statistic (singular), which refers to the result of applying a statistical algorithm to a set of data, as in employment statistics, accident statistics, etc.
Learning Outcomes
After completing the programme students should be able to:
1. Demonstrate an overall understanding of the data collection process.
This includes sources of data, sampling methods, problems associated with surveys, questionnaire design, measurement scales (nominal, ordinal, interval and ratio scales) and sampling error.
2. Use a range of descriptive statistics to present data effectively.
This includes the presentation of data in tables and charts, frequency and cumulative frequency distributions and their graphical representations, measures of location, dispersion and skewness, index numbers and their applications.
3. Understand the basic concepts of probability and probability distributions.
This includes the basic rules of probability, expected values and the use of probability and decision trees, the binomial and Poisson distributions and their applications, and the characteristics and use of the normal distribution.
4. Apply the normal distribution and the t distribution in estimation and hypothesis testing.
This includes sampling theory and the Central Limit Theorem; the construction of confidence intervals for population means and proportions, using the standard normal distribution or the t distribution, as appropriate; and hypothesis tests of a single mean, a single proportion, the difference between two means and the difference between two proportions.
5. Use correlation and regression analysis to identify the strength and form of relationships between variables.
In correlation analysis, this includes the use of scatter diagrams to illustrate linear association between two variables, Pearson's coefficient of correlation and Spearman's rank correlation coefficient, and the distinction between correlation and causality. In regression analysis, students are expected to be able to estimate the least squares regression line for a two-variable model and interpret basic results from simple and multiple regression models.
6. Demonstrate how time-series analysis can be used in business forecasting.
This includes the use of the additive and multiplicative models to decompose time series data, the calculation of trends and cyclical and seasonal patterns, and simple forecasting.
7. Distinguish between parametric and non-parametric methods and use the chi-squared statistic in hypothesis testing.
This includes using the chi-squared statistic as a test of independence between two categorical variables and as a test of goodness-of-fit.
8. Show how mathematical relationships can be applied to economic and business problems.
This includes the algebraic and graphical representation of demand and supply functions and the determination of equilibrium price and quantity in a competitive market. It also includes the algebraic and graphical representation of cost, revenue and profit functions, with applications to pricing and output determination (including break-even analysis). Throughout, students will be expected to be able to define relevant terms and to interpret all results.
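As a worked illustration of outcome 8, the sketch below solves a linear demand and supply model for the equilibrium price and quantity, and computes a break-even output. The linear functional forms, the function names and all the numbers are hypothetical assumptions chosen for the example, not figures from the course.

```python
def equilibrium(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Equilibrium of hypothetical linear demand Qd = a - b*P and supply Qs = c + d*P."""
    # Setting Qd = Qs: a - b*P = c + d*P  =>  P = (a - c) / (b + d)
    price = (a - c) / (b + d)
    quantity = a - b * price
    return price, quantity


def break_even_quantity(fixed_cost: float, price: float, unit_variable_cost: float) -> float:
    """Output at which total revenue equals total cost (TR = TC)."""
    # p*Q = F + v*Q  =>  Q = F / (p - v); requires a positive contribution margin
    margin = price - unit_variable_cost
    if margin <= 0:
        raise ValueError("price must exceed unit variable cost")
    return fixed_cost / margin


p_star, q_star = equilibrium(a=100, b=2, c=10, d=4)
print(p_star, q_star)  # 15.0 70.0
print(break_even_quantity(fixed_cost=1000, price=15, unit_variable_cost=5))  # 100.0
```

Below the break-even output the firm makes a loss; above it, each extra unit adds the contribution margin p - v to profit.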
Teaching and Learning Resources

Why Study Statistics? Describing Data
Lectures and Tutorials
Readings
Statistical Advisor, How To Describe Data. Do you want to:
- Tabulate/plot categorical data (such as gender, occupation) and compute frequencies, percentages, etc., or
- Explore/summarize a time series?
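A minimal Python sketch of the first task, tabulating a categorical variable with counts, percentages and cumulative percentages. The helper name `frequency_table` and the occupation data are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (category, count, percent, cumulative percent)."""
    counts = Counter(values)
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    # most_common() lists categories from most to least frequent
    for category, count in counts.most_common():
        percent = 100 * count / total
        cumulative += percent
        rows.append((category, count, percent, cumulative))
    return rows


# Hypothetical occupations of eight survey respondents
occupations = ["clerk", "manager", "clerk", "engineer",
               "clerk", "manager", "engineer", "clerk"]
for category, count, pct, cum in frequency_table(occupations):
    print(f"{category:<10}{count:>4}{pct:>8.1f}%{cum:>8.1f}%")
```

The same rows can feed a bar chart (frequencies) or an ogive (cumulative percentages).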
Summarizing Descriptive Relationships. Probability
Lectures and Tutorials
Readings
Probability is a way of expressing knowledge or belief that an event will occur or has occurred. The concept has an exact mathematical meaning in probability theory, which is used extensively in such areas of study as mathematics, statistics, finance, gambling, science, artificial intelligence/machine learning and philosophy to draw conclusions about the likelihood of potential events and the underlying mechanics of complex systems.
- Virtual Laboratories in Probability and Statistics (Univ. of Ala.-Huntsville)
- Probability on In Our Time at the BBC
- Probability and Statistics EBook
- Edwin Thompson Jaynes. Probability Theory: The Logic of Science. Preprint, Washington University (1996).
- People from the History of Probability and Statistics (Univ. of Southampton)
- Probability and Statistics on the Earliest Uses Pages (Univ. of Southampton)
- Earliest Uses of Symbols in Probability and Statistics on Earliest Uses of Various Mathematical Symbols
- A tutorial on probability and Bayes’ theorem devised for first-year Oxford University students
- PDF file of An Anthology of Chance Operations (1963) at UbuWeb
- Probability Theory Guide for Non-Mathematicians
- Understanding Risk and Probability with BBC Raw
- Introduction to Probability (eBook), by Charles Grinstead and Laurie Snell (GNU Free Documentation License)
Discrete Random Variables and Probability Distributions. Continuous Random Variables and Probability Distributions
Lectures and Tutorials
Readings
Statistics Tutorial: Probability Distributions
To understand probability distributions, it is important to understand variables, random variables, and some notation.
- A variable is a symbol (A, B, x, y, etc.) that can take on any of a specified set of values.
- When the value of a variable is the outcome of a statistical experiment, that variable is a random variable.
Generally, statisticians use a capital letter to represent a random variable and a lower-case letter to represent one of its values. For example,
X represents the random variable X.
P(X) represents the probability of X.
P(X = x) refers to the probability that the random variable X is equal to a particular value, denoted by x. As an example, P(X = 1) refers to the probability that the random variable X is equal to 1.
Probability Distributions
An example will make clear the relationship between random variables and probability distributions. Suppose you flip a coin two times. This simple statistical experiment can have four possible outcomes: HH, HT, TH, and TT. Now, let the variable X represent the number of Heads that result from this experiment. The variable X can take on the values 0, 1, or 2. In this example, X is a random variable, because its value is determined by the outcome of a statistical experiment.
A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence. Consider the coin flip experiment described above. Assuming the coin is fair, each of the four outcomes has probability 0.25, so the table below, which associates each value of X with its probability, is an example of a probability distribution.
Number of heads | Probability
0 | 0.25
1 | 0.50
2 | 0.25
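The two-flip distribution can also be computed by enumerating the equally likely outcomes. This sketch assumes a fair coin; `heads_distribution` is a hypothetical helper name, and exact fractions are used so the probabilities are not rounded.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_flips: int) -> dict[int, Fraction]:
    """P(X = x) for X = number of heads in n_flips tosses of a fair coin."""
    outcomes = list(product("HT", repeat=n_flips))  # HH, HT, TH, TT when n_flips = 2
    counts = Counter(outcome.count("H") for outcome in outcomes)
    # With a fair coin every outcome is equally likely
    return {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}


for x, p in heads_distribution(2).items():
    print(f"P(X = {x}) = {p}")
# P(X = 0) = 1/4
# P(X = 1) = 1/2
# P(X = 2) = 1/4
```

Changing `n_flips` gives the binomial distribution for more tosses; the probabilities always sum to 1.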
Sampling and Sampling Distributions. Estimation
Lectures and Tutorials
Readings
Sampling is that part of statistical practice concerned with the selection of a subset of individuals from within a population to yield some knowledge about the whole population, especially for the purposes of making predictions based on statistical inference.
Researchers rarely survey the entire population, for two reasons (Adèr, Mellenbergh, & Hand, 2008): the cost is too high, and the population is dynamic in that the individuals making up the population may change over time. The three main advantages of sampling are that the cost is lower, data collection is faster, and, since the data set is smaller, it is possible to ensure homogeneity and to improve the accuracy and quality of the data.
Each observation measures one or more properties (such as weight, location, color) of observable bodies distinguished as independent objects or individuals. In survey sampling, survey weights can be applied to the data to adjust for the sample design. Results from probability theory and statistical theory are employed to guide practice. In business and medical research, sampling is widely used for gathering information about a population.[1]
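A minimal sketch of sampling followed by estimation: it draws a simple random sample (without replacement) from a simulated population and builds a large-sample 95% confidence interval for the mean. The population values, seed and sample size are hypothetical. The normal multiplier is only appropriate for large samples; for small samples the t distribution should be used instead (for example via `scipy.stats.t.ppf`, which is not part of the standard library).

```python
import random
from statistics import NormalDist, mean, stdev

# Hypothetical population of 10,000 order values (seeded so runs are repeatable)
rng = random.Random(42)
population = [rng.gauss(50, 12) for _ in range(10_000)]

# Simple random sample without replacement
sample = rng.sample(population, k=100)


def mean_confidence_interval(data, level=0.95):
    """Large-sample confidence interval for a mean, using the normal distribution."""
    n = len(data)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * stdev(data) / n ** 0.5    # z * s / sqrt(n)
    centre = mean(data)
    return centre - half_width, centre + half_width


low, high = mean_confidence_interval(sample)
print(f"sample mean {mean(sample):.2f}, 95% CI ({low:.2f}, {high:.2f})")
print(f"population mean {mean(population):.2f}")
```

Rerunning with different seeds shows sampling error: the interval moves from sample to sample, and about 95% of such intervals should contain the population mean.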
- Chapter on Sampling at the Research Methods Knowledge Base
- Survey Sampling Methods at the SatPac survey software site
- TRSL – Template Range Sampling Library
- Continuous Sampling vs. Costs - Electronics Industry Example
Hypothesis Testing. Regression Modeling and Analysis
Lectures and Tutorials
Readings
A statistical hypothesis is an assumption about a population parameter. This assumption may or may not be true.
The best way to determine whether a statistical hypothesis is true would be to examine the entire population. Since that is often impractical, researchers typically examine a random sample from the population. If sample data are not consistent with the statistical hypothesis, the hypothesis is rejected.
There are two types of statistical hypotheses.
Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis that sample observations result purely from chance.
Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample observations are influenced by some non-random cause.
For example, suppose we wanted to determine whether a coin was fair and balanced. A null hypothesis might be that half the flips would result in Heads and half, in Tails. The alternative hypothesis might be that the number of Heads and Tails would be very different. Symbolically, these hypotheses would be expressed as
H0: P = 0.5
Ha: P ≠ 0.5
Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given this result, we would be inclined to reject the null hypothesis. We would conclude, based on the evidence, that the coin was probably not fair and balanced.
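To put a number on "inclined to reject", one option (not necessarily the method the tutorial has in mind) is an exact binomial test of H0: P = 0.5. The sketch computes the two-sided p-value for 40 Heads in 50 flips using only the standard library; the function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: P = 0.5."""
    # Probability of a count at least as far from flips/2 in the upper tail
    extreme = max(heads, flips - heads)
    upper = sum(comb(flips, k) for k in range(extreme, flips + 1)) / 2 ** flips
    # Binomial(n, 0.5) is symmetric, so the lower tail has the same probability
    return min(1.0, 2 * upper)


p_value = binomial_two_sided_p(heads=40, flips=50)
print(f"p-value = {p_value:.2e}")
```

The p-value is far below conventional significance levels such as 0.05 or 0.01, which is consistent with rejecting the null hypothesis.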
Regression Analysis: Statistical Modeling of a Response Variable, by Rudolf J. Freund
Simple Regression. Multiple Regression
Lectures and Tutorials
Readings
In statistics, regression analysis includes any techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables — that is, the average value of the dependent variable when the independent variables are held fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.
Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.
A large body of techniques for carrying out regression analysis has been developed. Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.
The performance of regression analysis methods in practice depends on the form of the data generating process, and how it relates to the regression approach being used. Since the true form of the data-generating process is in general not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes (but not always) testable if a large amount of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. However, in many applications, especially with small effects or questions of causality based on observational data, regression methods give misleading results.[1][2]
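As a concrete instance of ordinary least squares for the two-variable model, the sketch below computes the slope as Sxy / Sxx and the intercept from the means. The function name and the advertising/sales data are hypothetical.

```python
def least_squares_line(x, y):
    """Fit y = b0 + b1*x by ordinary least squares; return (b0, b1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # the fitted line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical data: advertising spend (x) and sales (y)
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
b0, b1 = least_squares_line(spend, sales)
print(b0, b1)  # 1.0 2.0
```

Multiple regression generalizes this to several independent variables, usually solved with matrix methods rather than by hand.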
Additional Topics in Regression Analysis. Nonparametric Statistics
Lectures and Tutorials
Readings
Nonparametric Statistics: General Purpose
- Brief Overview of Nonparametric Procedures
- When to Use Which Method
- Nonparametric Correlations
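One common nonparametric correlation is Spearman's rank correlation, which is Pearson's r computed on ranks (tied values receive the average of their ranks). The sketch below is a hand-rolled version with hypothetical helper names; in Python 3.12 and later, `statistics.correlation(x, y, method="ranked")` should give the same value.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotone but non-linear relationship gives a rank correlation of 1
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```

Because only ranks are used, the result does not depend on the data being normally distributed or the relationship being linear.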
Goodness-of-fit Tests and Contingency Tables. Analysis of Variance
Lectures and Tutorials
Readings
An important technique for analyzing the effect of categorical factors on a response is to perform an Analysis of Variance. An ANOVA decomposes the variability in the response variable amongst the different factors. Depending upon the type of analysis, it may be important to determine: (a) which factors have a significant effect on the response, and/or (b) how much of the variability in the response variable is attributable to each factor.
STATGRAPHICS Centurion provides several procedures for performing an analysis of variance:
1. One-Way ANOVA - used when there is only a single categorical factor. This is equivalent to comparing multiple groups of data.
2. Multifactor ANOVA - used when there is more than one categorical factor, arranged in a crossed pattern. When factors are crossed, the levels of one factor appear at more than one level of the other factors.
3. Variance Components Analysis - used when there are multiple factors, arranged in a hierarchical manner. In such a design, each factor is nested in the factor above it.
4. General Linear Models - used whenever there are both crossed and nested factors, when some factors are fixed and some are random, or when both categorical and quantitative factors are present.
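To show what a one-way ANOVA decomposes, the sketch below splits total variability into between-group and within-group sums of squares and forms the F statistic. It is a hand-rolled illustration, not STATGRAPHICS output; the group data and function name are hypothetical. Turning F into a p-value needs the F distribution (for example `scipy.stats.f_oneway`, outside the standard library).

```python
def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    # Between groups: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of observations around their own group mean
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova([4, 5, 6], [6, 7, 8], [9, 10, 11])
print(f"F = {f_stat:.2f} with df ({df_b}, {df_w})")  # F = 19.00 with df (2, 6)
```

A large F means the group means differ by much more than the variation inside the groups would suggest.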
Analysis of Variance in Statistical Image Processing, by Ludwik Kurz
Introduction to Quality. Time Series Analysis and Forecasting
Lectures and Tutorials
Readings
Aside from achieving standards such as those set by the BSI (British Standards Institution), firms can measure quality aspects such as:
- Failure or reject rates
- Level of product returns
- Customer complaints
- Customer satisfaction – usually measured by a survey
- Customer loyalty – evident from repeat purchases, or renewal rates
A detailed analysis of areas such as these would be an important part of Quality Improvement – see the separate revision note for more details.
Time-Critical Decision Making for Business Administration
Because "time is money" in business, the dynamic decision technologies presented here have been applied successfully to a wide range of managerial decisions in which time and money are directly related. In making strategic decisions under uncertainty, we all make forecasts. We may not think that we are forecasting, but our choices will be directed by our anticipation of the results of our actions or inactions. Indecision and delays are the parents of failure. This site is intended to help managers and administrators do a better job of anticipating, and hence a better job of managing uncertainty, by using effective forecasting and other predictive techniques.
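Two of the simplest forecasting techniques, a moving-average forecast and simple exponential smoothing, are sketched below. They are standard textbook methods chosen for illustration and are not necessarily the tools of the site quoted above; the sales figures, window and smoothing constant alpha are hypothetical.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if not 0 < window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing: F(t+1) = alpha*y(t) + (1 - alpha)*F(t), with F(1) = y(1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    forecast = series[0]
    for y in series:
        forecast = alpha * y + (1 - alpha) * forecast
    return forecast  # forecast for the period after the last observation


sales = [120, 130, 125, 140, 150]  # hypothetical quarterly sales
print(f"{moving_average_forecast(sales, 3):.2f}")           # 138.33
print(f"{exponential_smoothing_forecast(sales, 0.5):.2f}")  # 141.25
```

A larger alpha makes the forecast react faster to recent values; neither method models trend or seasonality, which the additive and multiplicative decomposition models address.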
Quality Management Introduction to Quality, by Fred Tickle BA CEng MIMechE MIEE MIQA and Geoff Vorley MSc MIQA
Introduction to Time Series Analysis and Forecasting: with Application of SAS and SPSS, by Robert Yaffee et al.
Additional Topics in Sampling. Statistical Decision Theory
Lectures and Tutorials
Readings
Statistical decision theory
Several statistical tools and methods are available to organize evidence, evaluate risks, and aid in decision making. The risks of Type I and Type II errors can be quantified (estimated probability, cost, expected value, etc.), and as a result rational decision making is improved.
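As an illustration of quantifying these risks, the sketch computes the Type I error probability (rejecting a true H0) and the Type II error probability (failing to reject a false H0) for a coin-fairness test. The decision rule (reject H0: P = 0.5 if 50 flips give at most 15 or at least 35 Heads) and the alternative P = 0.7 are hypothetical choices for the example.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def error_rates(n, lower, upper, p_alt):
    """Type I and Type II error probabilities for the rule
    'reject H0: P = 0.5 if heads <= lower or heads >= upper'."""
    reject = [k for k in range(n + 1) if k <= lower or k >= upper]
    alpha = sum(binomial_pmf(k, n, 0.5) for k in reject)       # P(reject | H0 true)
    beta = 1 - sum(binomial_pmf(k, n, p_alt) for k in reject)  # P(accept | P = p_alt)
    return alpha, beta


alpha, beta = error_rates(n=50, lower=15, upper=35, p_alt=0.7)
print(f"Type I risk {alpha:.4f}, Type II risk {beta:.4f}")
```

This rule keeps the Type I risk well under 1% but leaves a much larger Type II risk against P = 0.7; widening the rejection region trades one risk for the other, and attaching costs to each error turns the choice into an expected-value decision.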
Introduction to Statistical Decision Theory
Recommended Texts
Statistics for Business and Economics and Student
A First Course in Business Statistics, Eighth Edition
Introducing ANOVA and ANCOVA: A GLM Approach
Discovering Statistics Using SPSS for Windows: Advanced Techniques for Beginners