Statistics: Power from Data!

The definitions below are intended for readers who have questions about some terms used in statistics but who do not need highly technical definitions. In some cases, these definitions are oversimplifications of highly complex concepts. For more detailed explanations, consult the references provided on the Bibliography page.


Definitions of words that start with A

  • Administrative data

    Data collected as a result of an organization’s day-to-day operations.

  • Aggregate data

    Data set in which one record represents a summary of multiple observation units.

Definitions of words that start with B

  • Big data

    Data sets that have such a large number of records and variables that they exceed the capacity of traditional software to process the information within a reasonable time.

  • Box and whisker plot

    Type of graph used to visualize the five-number summary, i.e. the median, the lower and upper quartiles, the minimum and the maximum. Synonym: box plot.
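
The five-number summary behind a box and whisker plot can be computed directly. The sketch below is illustrative only: the `ages` values are made up, and the "inclusive" quartile convention is an assumption, since several conventions exist and they can give slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    # Assumption: "inclusive" is one of several accepted quartile conventions.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

ages = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # hypothetical data
summary = five_number_summary(ages)
print(summary)
```

A box plot draws the box from the lower to the upper quartile, marks the median inside it, and extends the whiskers toward the minimum and maximum.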

Definitions of words that start with C

  • Categorical variable

    Characteristic that isn’t quantifiable. Synonym: qualitative variable.

  • Census

    In general, survey that aims to collect information about every unit of a population. A census is also used to list and count all units of a population.

  • Central tendency

    Measure of the location of the middle or the centre of a distribution.

  • Closed question

    In a questionnaire, a closed question gives the respondent a list of predefined answers and the respondent is supposed to select one or more answers from the list.

  • Coefficient of variation

    Ratio of the standard error of the estimate to the average value of the estimate across all possible samples.

  • Confidence interval

    The range of values around the estimate that is likely to include the true, unknown population value with a given probability.

  • Continuous variable

    Numeric variable that assumes an infinite number of real values within a given interval.

  • Crowdsourcing

    Collection of information from a large community of users. It relies on the principle that citizens are the experts of their local environment.
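
As a rough illustration of the coefficient of variation and the confidence interval, the sketch below uses a made-up estimate and standard error. It assumes the estimate is approximately normally distributed, which is a common simplification and not the only way to build an interval. It also uses the estimate itself in place of the average over all possible samples, which is unknown in practice.

```python
from statistics import NormalDist

def confidence_interval(estimate, standard_error, confidence=0.95):
    """Return (margin_of_error, lower, upper) under a normal approximation."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return margin, estimate - margin, estimate + margin

estimate = 50.0        # hypothetical survey estimate
standard_error = 2.0   # hypothetical standard error of that estimate

# Coefficient of variation: standard error relative to the estimate.
cv = standard_error / estimate
margin, lower, upper = confidence_interval(estimate, standard_error)

print(f"CV = {cv:.1%}")
print(f"95% CI = ({lower:.2f}, {upper:.2f}), half-width = {margin:.2f}")
```

The half-width returned by the function is the margin of error, so the interval is simply the estimate plus or minus that value.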

Definitions of words that start with D

  • Data

    Facts, figures, observations, or recordings that can take the form of image, sound, text or physical measurements (distance, weight, wave lengths, etc.). Data can be gathered and processed in order to form conclusions.

  • Data capture

    The process used to convert data into a machine-readable format.

  • Data coding

    The process that assigns a value (code) to a response. The code can be a numeric value or a character string.

  • Data editing

    Application of checks to detect missing, invalid or inconsistent values or to point to data records that are potentially in error.

  • Data imputation

    The process used to assign replacement values for missing, invalid or inconsistent data that have failed edits.

  • Data item

    The smallest piece of information that can be gathered from a source of information.

  • Data processing

    Transformation of raw data so they can be used to produce estimates or to carry out other data analyses.

  • Data provider

    Individual or organization that collects and processes data because information is needed for different purposes, and makes these data accessible to data users.

  • Data set

    Grouping of data that have common definitions of observation units and variables.

  • Database

    Structured set of data items, generally presented as tables.

  • Delimited text file

    A text file used to store data, in which each line represents a unit and the fields on each line are separated by a delimiter. The most common delimiters are the comma, the tab and the colon.

  • Discrete variable

    Numeric variable that assumes only a finite number of real values within a given interval. The possible values can be enumerated and counted.

  • Dispersion

    Measure of the spread of a distribution around the central tendency.
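
Several of the definitions above (delimited text file, data editing, data imputation) describe steps that often follow one another. The sketch below strings them together on a tiny made-up file. The edit rule (age must be a non-negative whole number) and the use of mean imputation are assumptions chosen for illustration; real surveys use a variety of edit rules and imputation methods.

```python
import csv
import io
import statistics

# Hypothetical comma-delimited file: one line per unit, fields separated by commas.
RAW_TEXT = "id,age\n1,34\n2,\n3,-5\n4,41\n"

def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of records (dictionaries)."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def fails_edit(value):
    """Data editing: flag an age that is missing, non-numeric or negative."""
    try:
        return int(value) < 0
    except (TypeError, ValueError):
        return True

def impute_mean(records, field):
    """Data imputation: replace values that failed the edit with the mean of valid values."""
    valid = [int(r[field]) for r in records if not fails_edit(r[field])]
    replacement = statistics.mean(valid)
    return [
        {**r, field: replacement if fails_edit(r[field]) else int(r[field])}
        for r in records
    ]

records = read_delimited(RAW_TEXT)
flagged_ids = [r["id"] for r in records if fails_edit(r["age"])]
clean = impute_mean(records, "age")

print(flagged_ids)
print([r["age"] for r in clean])
```

Passing `delimiter="\t"` or `delimiter=":"` to `read_delimited` would handle tab- or colon-delimited files in the same way.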

Definitions of words that start with F

  • Frequency

    The number of times a value occurs in a data set. It can also be a number of events or items. Synonym: count.

  • Frequency distribution

    Chart or table showing how many times each value or range of values of a variable appears in a data set.
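
A frequency distribution for a categorical variable can be tabulated by counting how often each value occurs. The responses below are made up for illustration.

```python
from collections import Counter

# Hypothetical answers to "How do you travel to school?"
responses = ["bus", "car", "bus", "bike", "car", "bus"]

# Frequency (count) of each value in the data set.
frequencies = Counter(responses)

# Print the frequency distribution as a small table, most frequent value first.
for value, count in frequencies.most_common():
    print(f"{value:<5} {count}")
```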

Definitions of words that start with I

  • Interquartile range

    Range of the 50% of data that is central to the distribution, i.e. the difference between the upper quartile and the lower quartile.

Definitions of words that start with L

  • Lower quartile

    Value under which 25% of data points are found when they are arranged in increasing order. Synonym: first quartile.

Definitions of words that start with M

  • Margin of error

    Half the width of the confidence interval associated with an estimate.

  • Mean

    Measure of central tendency which is the sum of all values divided by the number of values.

  • Median

    Value in the middle of a data set, meaning that 50% of data points have a value smaller than or equal to the median and 50% of data points have a value greater than or equal to the median. Synonym: second quartile.

  • Metadata

    Data about data or data elements, including data descriptions, ownership, access paths, access rights, quality or other information that provides context to data.

  • Microdata

    Data set in which one record represents one unit of observation.

  • Missing value

    Blank or absent data point.

  • Mode

    For categorical or discrete variables, it is the value(s) for which the highest frequency is observed. For continuous variables, the modal-class intervals are the peaks of the histogram. When the mode is unique, it can be used as a measure of central tendency.
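
The three measures of central tendency defined above (mean, median and mode) can be compared on the same small data set. The values are made up for illustration.

```python
import statistics

values = [3, 5, 5, 6, 8, 9]  # hypothetical discrete data

# Mean: sum of all values divided by the number of values.
mean = statistics.mean(values)

# Median: middle of the ordered data; with an even count, the two middle values are averaged.
median = statistics.median(values)

# Mode(s): value or values with the highest frequency.
modes = statistics.multimode(values)

print(mean, median, modes)
```

Here the mode is unique, so it can also serve as a measure of central tendency; `multimode` returns several values when there is a tie.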

Definitions of words that start with N

  • Nominal variable

    Categorical variable that describes a name, label or category without natural order.

  • Non-sampling errors

    All sources of error that are unrelated to sampling.

  • Numeric variable

    A quantifiable characteristic whose values are numbers. Synonym: quantitative variable.

Definitions of words that start with O

  • Open data

    Structured, machine-readable data that are freely shared and that can be used without restrictions.

  • Open question

    In a questionnaire, an open question gives the respondent an opportunity to answer the question in their own words.

  • Ordinal variable

    Categorical variable whose values are defined by an order relation between the different categories.

Definitions of words that start with P

  • Primary source of information

    Data from a primary source were collected for the purpose of producing statistics and statistical information.

Definitions of words that start with Q

  • Questionnaire

    Series of questions designed to elicit information on one or more topics from a respondent.

Definitions of words that start with R

  • Range

    Difference between the largest value (maximum) and the smallest value (minimum).

  • Record linkage

    The process by which records or units from different data sources are joined together into a single file using non-unique identifiers, such as names, date of birth, addresses and other characteristics. Synonyms: data matching, data linkage, entity resolution.

  • Remote sensing

    Acquisition of information about an object or phenomenon from a distant point.
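
Record linkage on non-unique identifiers can be sketched as follows. This is a deliberately simple exact-match approach on a normalized name and a date of birth; the two files, their field names and the matching rule are all hypothetical, and practical linkage often relies on more tolerant (for example probabilistic) matching.

```python
def match_key(record):
    """Build a comparison key from non-unique identifiers (name and date of birth)."""
    return (record["name"].strip().lower(), record["birth_date"])

def link_records(file_a, file_b):
    """Join records from two sources that share the same match key."""
    index = {}
    for rec in file_b:
        index.setdefault(match_key(rec), []).append(rec)

    linked = []
    for rec in file_a:
        for candidate in index.get(match_key(rec), []):
            # Fields from file_a take precedence when both files have the same field.
            linked.append({**candidate, **rec})
    return linked

# Hypothetical source files.
health_file = [
    {"name": "Ana Roy", "birth_date": "1990-04-12", "visits": 3},
    {"name": "Li Chen", "birth_date": "1985-11-02", "visits": 1},
]
tax_file = [
    {"name": " ana roy ", "birth_date": "1990-04-12", "income": 52000},
    {"name": "Li Chen", "birth_date": "1979-01-30", "income": 61000},
]

linked = link_records(health_file, tax_file)
print(linked)
```

The two "Li Chen" records are not linked because their dates of birth differ, which shows why a name alone is not a reliable identifier.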

Definitions of words that start with S

  • Sample

    A subset of the units of a population.

  • Sample survey

    Survey for which the information is collected for some units of the target population only.

  • Sampling error

    Difference between the estimate derived from a sample survey and the true value that would result if a census of the whole population were taken under the same conditions.

  • Sampling variance

    Average of the squared differences between an estimate and the average of the estimates across all possible samples.

  • Secondary source of information

    Data from a secondary source were collected for a purpose other than producing statistical information.

  • Semi-interquartile range

    Half the value of the interquartile range.

  • Spreadsheet

    A software application that displays a table of cells arranged in rows and columns, in which the change of the contents of one cell can cause recalculation of other cells based on user-defined formulas.

  • Standard deviation

    Square root of the variance.

  • Standard error

    Square root of the sampling variance.

  • Statistical information

    Data that have been recorded, classified, organized, related, or interpreted within a framework so that meaning emerges.

  • Statistical register

    Data set created for statistical purposes and continuously updated with information about all units of a population.

  • Statistics

    Type of information obtained through mathematical operations on data.

  • Structured data

    Data that are organized into pre-defined items, each of which relates to a specific concept or data item.

  • Survey

    Any activity to collect information in an organized and methodical manner about the characteristics of the units of a population. The word survey is often used to refer to a sample survey, as opposed to a census.
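
The phrase "across all possible samples" can be made concrete with a population small enough to list every sample. The sketch below assumes a made-up population of four values and simple random samples of two units drawn without replacement, each equally likely; the sample mean is used as the estimate.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, pvariance

population = [2, 4, 6, 8]  # hypothetical population values
sample_size = 2

# The estimate obtained from each possible sample (here, the sample mean).
estimates = [fmean(sample) for sample in combinations(population, sample_size)]

# Average of the estimates across all possible samples.
average_estimate = fmean(estimates)

# Average squared difference between each estimate and that average.
sampling_variance = pvariance(estimates, mu=average_estimate)

# Standard error: square root of that average squared difference.
standard_error = sqrt(sampling_variance)

# Coefficient of variation: standard error relative to the average estimate.
coefficient_of_variation = standard_error / average_estimate

print(estimates)
print(average_estimate, sampling_variance, standard_error, coefficient_of_variation)
```

In a real survey only one sample is drawn, so these quantities cannot be computed this way and must themselves be estimated from that single sample.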

Definitions of words that start with U

  • Unstructured data

    Unstructured data are any data that are not arranged according to a pre-defined model.

  • Upper quartile

    Value under which 75% of data points are found when they are arranged in increasing order. Synonym: third quartile.

Definitions of words that start with V

  • Variable

    Characteristic that can be measured and that can assume different values.

  • Variance

    Average of the squared differences between each data point and the centre of the distribution, measured using the mean.
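
The variance as defined above divides by the number of data points. The sketch below computes it by hand and with Python's `statistics` module on made-up values. Note that many tools default to a slightly different formula that divides by the number of values minus one, intended for samples; that variant is not shown here.

```python
from statistics import fmean, pstdev, pvariance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

mean = fmean(values)

# Average of the squared differences between each data point and the mean.
variance_by_hand = sum((x - mean) ** 2 for x in values) / len(values)

# The same quantities from the standard library ("p" stands for population).
variance = pvariance(values)
standard_deviation = pstdev(values)  # square root of the variance

print(mean, variance_by_hand, variance, standard_deviation)
```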

Definitions of words that start with W

  • Web scraping

    The process through which information is gathered and copied from the web for further analysis.
