Statistics: Power from Data!
Glossary
The definitions below are intended for readers who have questions about some terms used in statistics but who do not need highly technical definitions. In some cases, these definitions are oversimplifications of highly complex concepts. For more detailed explanations, consult the references provided on the Bibliography page.
Definitions of words that start with A

Administrative data
Data collected as a result of an organization’s day-to-day operations.

Aggregate data
Data set in which one record represents a summary of multiple observation units.
Definitions of words that start with B

Big data
Data sets that have such a large number of records and variables that they exceed the capacity of traditional software to process the information within a reasonable time.

Box and whisker plot
Type of graph used to visualize the five-number summary, i.e. the median, the lower and upper quartiles, the minimum and the maximum. Synonym: box plot.
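
As a rough illustration, the five-number summary behind a box plot can be computed with Python's standard statistics module. The data values are made up, and the "inclusive" quartile method chosen here is only one of several conventions, so other software may report slightly different quartiles.

```python
import statistics

# Hypothetical data set, used for illustration only.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(..., n=4) returns three cut points:
# the lower quartile, the median and the upper quartile.
lower_q, median, upper_q = statistics.quantiles(values, n=4, method="inclusive")

five_number_summary = {
    "minimum": min(values),
    "lower quartile": lower_q,
    "median": median,
    "upper quartile": upper_q,
    "maximum": max(values),
}
print(five_number_summary)
```

In a box plot, the box spans the lower and upper quartiles, a line inside the box marks the median, and the whiskers extend toward the minimum and the maximum.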
Definitions of words that start with C

Categorical variable
Characteristic that isn’t quantifiable. Synonym: qualitative variable.

Census
In general, survey that aims to collect information about every unit of a population. A census is also used to list and count all units of a population.

Central tendency
Measure of the location of the middle or the centre of a distribution.

Closed question
In a questionnaire, a closed question gives the respondent a list of predefined answers and the respondent is supposed to select one or more answers from the list.

Coefficient of variation
Ratio of the standard error of the estimate to the average value of the estimate across all possible samples.

Confidence interval
The range of values around the estimate that is likely to include the unknown true population value with a given probability.

Continuous variable
Numeric variable that assumes an infinite number of real values within a given interval.

Crowdsourcing
Collection of data from a large community of users. It relies on the principle that citizens are the experts of their local environment.
Definitions of words that start with D

Data
Facts, figures, observations, or recordings that can take the form of image, sound, text or physical measurements (distance, weight, wavelengths, etc.). Data can be gathered and processed in order to form conclusions.

Data capture
The process used to convert data into a machine-readable format.

Data coding
The process that assigns a value (code) to a response. The code can be a numeric value or a character string.

Data editing
Application of checks to detect missing, invalid or inconsistent values or to point to data records that are potentially in error.

Data imputation
The process used to assign replacement values for missing, invalid or inconsistent data that have failed edits.
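
The editing and imputation steps can be sketched together in a few lines of Python. Everything here is an assumption made for illustration: the ages, the plausible range of 0 to 120, and the choice of the mean of valid values as the replacement. Statistical agencies use a variety of imputation methods, and mean imputation is only the simplest of them.

```python
# Hypothetical reported ages; None marks a missing value.
reported_ages = [34, None, 51, 420, 27, -3, 45]


def passes_edit(age):
    """Edit check: the value must be present and within an assumed range of 0 to 120."""
    return age is not None and 0 <= age <= 120


# Keep only the values that pass the edit and use their mean as the replacement.
valid_ages = [age for age in reported_ages if passes_edit(age)]
replacement = sum(valid_ages) / len(valid_ages)

# Replace every value that failed the edit; leave the others unchanged.
imputed_ages = [age if passes_edit(age) else replacement for age in reported_ages]
print(imputed_ages)
```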

Data item
The smallest piece of information that can be gathered from a source of information.

Data processing
Transformation of raw data so they can be used to produce estimates or to carry out other data analyses.

Data provider
Individual or organization that collects and processes data because information is needed for different purposes, and makes these data accessible to data users.

Data set
Grouping of data that have common definitions of observation units and variables.

Database
Structured set of data items, generally presented as tables.

Delimited text file
A text file used to store data, in which each line represents a unit, and each line has fields separated by a delimiter. The most common delimiters are commas, tabs and colons.
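
A minimal sketch of reading comma-delimited text with Python's standard csv module follows. The file contents and field names are invented for the example; for a tab-delimited or colon-delimited file, the delimiter argument would change accordingly.

```python
import csv
import io

# Hypothetical file contents: a header line, then one line per unit.
text = "name,age,city\nAlex,34,Ottawa\nSam,27,Halifax\n"

# DictReader uses the first line as field names and splits each
# following line on the delimiter.
reader = csv.DictReader(io.StringIO(text), delimiter=",")
records = list(reader)

print(records[0]["city"])
```

Note that every field is read as text, so a value such as an age has to be converted to a number before it is used in calculations.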

Discrete variable
Numeric variable that assumes only a finite number of real values within a given interval. The possible values can be enumerated and counted.

Dispersion
Measure of the spread of a distribution around the central tendency.
Definitions of words that start with F

Frequency
The number of times a value occurs in a data set. It can also be a number of events or items. Synonym: count.

Frequency distribution
Chart or table showing how many times each value or range of values of a variable appears in a data set.
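
For a categorical variable, a frequency distribution can be built by counting how often each value occurs. The sketch below uses Python's collections.Counter on made-up survey responses.

```python
from collections import Counter

# Hypothetical responses to a question about favourite season.
responses = ["summer", "winter", "summer", "fall", "summer", "fall"]

# Counter maps each distinct value to its frequency (count).
frequency = Counter(responses)

# List the values from most to least frequent.
for value, count in frequency.most_common():
    print(f"{value}: {count}")
```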
Definitions of words that start with I

Interquartile range
Range of the 50% of data that is central to the distribution, i.e. the difference between the upper quartile and the lower quartile.
Definitions of words that start with L

Lower quartile
Value under which 25% of data points are found when they are arranged in increasing order. Synonym: first quartile.
Definitions of words that start with M

Margin of error
Half the width of the confidence interval associated with an estimate.
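
The link between the margin of error and the confidence interval can be shown with a small calculation. This is a sketch under stated assumptions: the measurements are invented, the sample is treated as a simple random sample, and a normal approximation at the 95% level is used. For a sample this small, other approaches (such as one based on the t distribution) would usually be preferred.

```python
import math
import statistics

# Hypothetical sample of measurements.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

# The estimate is the sample mean; its standard error is approximated
# by the sample standard deviation divided by the square root of n.
estimate = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Critical value of the standard normal distribution for a 95% level (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

margin_of_error = z * standard_error
confidence_interval = (estimate - margin_of_error, estimate + margin_of_error)
print(confidence_interval)
```

By construction, the interval is twice as wide as the margin of error, which matches the definition above.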

Mean
Measure of central tendency which is the sum of all values divided by the number of values.

Median
Value in the middle of a data set, meaning that 50% of data points have a value smaller than or equal to the median and 50% of data points have a value higher than or equal to the median. Synonym: second quartile.

Metadata
Data about data or data elements, including data descriptions, ownership, access paths, access rights, quality or other information that provides context to data.

Microdata
Data set in which one record represents one unit of observation.
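
The difference between microdata and aggregate data can be illustrated by summarizing one into the other. In this sketch the records, the field names and the choice of grouping by province are all hypothetical.

```python
from collections import defaultdict

# Hypothetical microdata: one record per person.
microdata = [
    {"province": "ON", "income": 52000},
    {"province": "ON", "income": 48000},
    {"province": "QC", "income": 45000},
]

# Accumulate a count and an income total for each province.
totals = defaultdict(lambda: {"count": 0, "total_income": 0})
for record in microdata:
    group = totals[record["province"]]
    group["count"] += 1
    group["total_income"] += record["income"]

# Aggregate data: one record summarizes all persons in a province.
aggregate = [
    {"province": p, "count": g["count"], "mean_income": g["total_income"] / g["count"]}
    for p, g in totals.items()
]
print(aggregate)
```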

Missing value
Blank or absent data point.

Mode
For categorical or discrete variables, it is the value(s) for which the highest frequency is observed. For continuous variables, the modal-class intervals are the peaks of the histogram. When the mode is unique, it can be used as a measure of central tendency.
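
The three measures of central tendency defined in this section (mean, median and mode) can be compared on a small, made-up data set using Python's standard statistics module.

```python
import statistics

# Hypothetical discrete data.
values = [2, 3, 3, 5, 7]

mean = statistics.mean(values)        # sum of all values divided by the number of values
median = statistics.median(values)    # middle value once the data are sorted
modes = statistics.multimode(values)  # list of the most frequent value(s)

print(mean, median, modes)
```

The multimode function returns a list because a data set can have more than one mode.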
Definitions of words that start with N

Nominal variable
Categorical variable that describes a name, label or category without natural order.

Non-sampling errors
All sources of error that are unrelated to sampling.

Numeric variable
A quantifiable characteristic whose values are numbers. Synonym: quantitative variable.
Definitions of words that start with O

Open data
Structured, machine-readable data that are freely shared and that can be used without restrictions.

Open question
In a questionnaire, an open question gives the respondent an opportunity to answer the question in their own words.

Ordinal variable
Categorical variable whose values are defined by an order relation between the different categories.
Definitions of words that start with P

Primary source of information
Data from a primary source were collected for the purpose of producing statistics and statistical information.
Definitions of words that start with Q

Questionnaire
Series of questions designed to elicit information on one or more topics from a respondent.
Definitions of words that start with R

Range
Difference between the largest value (maximum) and the smallest value (minimum).

Record linkage
The process by which records or units from different data sources are joined together into a single file using non-unique identifiers, such as names, date of birth, addresses and other characteristics. Synonyms: data matching, data linkage, entity resolution.
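
A very simple form of record linkage compares a key built from non-unique identifiers. The sketch below assumes two invented sources and an exact match on a normalized name plus date of birth. Real linkage projects often rely on more tolerant, probabilistic comparisons, which are not shown here.

```python
# Two hypothetical sources with no shared unique identifier.
source_a = [
    {"name": "Marie Tremblay", "birth_date": "1985-03-12", "city": "Laval"},
    {"name": "John Smith", "birth_date": "1990-07-01", "city": "Regina"},
]
source_b = [
    {"name": "  john SMITH", "birth_date": "1990-07-01", "income": 61000},
    {"name": "Ann Lee", "birth_date": "1979-11-23", "income": 58000},
]


def match_key(record):
    """Build a comparison key: name in lower case with extra spaces removed, plus date of birth."""
    return (" ".join(record["name"].lower().split()), record["birth_date"])


# Index the second source by key, then join the records whose keys agree.
lookup = {match_key(r): r for r in source_b}
linked = [
    {**lookup[match_key(a)], **a}  # fields from source_a take precedence
    for a in source_a
    if match_key(a) in lookup
]
print(linked)
```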

Remote sensing
Acquisition of information about an object or phenomenon from a distant point.
Definitions of words that start with S

Sample
A subset of the units of a population.

Sample survey
Survey for which the information is collected for some units of the target population only.

Sampling error
Difference between the estimate derived from a sample survey and the true value that would result if a census of the whole population were taken under the same conditions.

Sampling variance
Average of the squared differences between an estimate and the average of the estimates across all possible samples.
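
For a tiny population, the definition above can be traced by listing every possible sample. The sketch below assumes an invented population of four values, samples of two units drawn without replacement, and the sample mean as the estimate. It also derives the standard error and the coefficient of variation as defined in this glossary.

```python
import itertools
import math
import statistics

# Hypothetical population and sample size.
population = [2, 4, 6, 8]
sample_size = 2

# The estimate (sample mean) for every possible sample of two units.
estimates = [
    statistics.mean(sample)
    for sample in itertools.combinations(population, sample_size)
]

# Average of the estimates across all possible samples.
average_estimate = statistics.mean(estimates)

# Average of the squared differences between each estimate and that average.
sampling_variance = statistics.mean((e - average_estimate) ** 2 for e in estimates)

standard_error = math.sqrt(sampling_variance)
coefficient_of_variation = standard_error / average_estimate

print(sampling_variance, standard_error, coefficient_of_variation)
```

In practice, only one sample is drawn, so these quantities have to be estimated from that single sample rather than computed by enumeration.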

Secondary source of information
Data from a secondary source were collected for a purpose other than producing statistical information.

Semi-interquartile range
Half the value of the interquartile range.

Spreadsheet
A software application that displays a table of cells arranged in rows and columns, in which the change of the contents of one cell can cause recalculation of other cells based on user-defined formulas.

Standard deviation
Square root of the variance.

Standard error
Square root of the sampling variance.

Statistical information
Data that have been recorded, classified, organized, related, or interpreted within a framework so that meaning emerges.

Statistical register
Data sets created for statistical purposes that are continuously updated with information about all units of a population.

Statistics
Type of information obtained through mathematical operations on data.

Structured data
Data that are organized into predefined items that each relate to a specific concept or data item.

Survey
Any activity to collect information in an organized and methodical manner about the characteristics of the units of a population. The word survey is often used to refer to a sample survey, as opposed to a census.
Definitions of words that start with U

Unstructured data
Unstructured data are any data that are not arranged according to a predefined model.

Upper quartile
Value under which 75% of data points are found when they are arranged in increasing order. Synonym: third quartile.
Definitions of words that start with V

Variable
Characteristic that can be measured and that can assume different values.

Variance
Average of the squared differences between each data point and the centre of the distribution, measured using the mean.
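
The variance and the standard deviation can be computed directly from their definitions. The data below are made up. Because the definition speaks of an average, the sum of squared differences is divided by the number of data points; some software divides by one less than that number when the data are a sample.

```python
import statistics

# Hypothetical data set.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)

# Average of the squared differences between each data point and the mean.
variance = sum((x - mean) ** 2 for x in values) / len(values)

# The standard deviation is the square root of the variance.
standard_deviation = variance ** 0.5

print(variance, standard_deviation)
```

The same variance is returned by statistics.pvariance(values), which also divides by the number of data points.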
Definitions of words that start with W

Web scraping
The process through which information is gathered and copied from the web for further analysis.