Data Visualization: Best Practices
Introduction
When used properly, charts (including figures and diagrams) can simplify the presentation of information and the communication of clear and precise messages. However, with the wide range of options available, creating effective charts can be complex. This reference tool is intended to provide a basic guide to creating effective charts that take advantage of the options available.
Preparation
Before creating a chart, it is important to answer a few key questions.
- What data do I have access to? The purpose of this question is to clarify the nature and status of the data that you are aiming to present through a chart. The type of data, the number of variables available, and the presence of missing data will all have an impact on the type of chart to be used, as well as the way in which the information is presented.
- Who is my target audience? This question is designed to determine who the chart is primarily intended for. The target audience and their level of knowledge and expertise will have an impact on the way the information is presented, as well as the type of chart and the terminology used.
- What is the message I want to deliver? Charts can convey a message quickly and effectively. It is therefore important to establish the message that you want to highlight. This will have an impact on the type of chart chosen, its title and many other elements.
- Should I use a chart? Charts can be very useful in many circumstances. However, there are some situations where it may be better not to use charts:
- When the data are very scattered;
- When there is a lot of missing data;
- When the data are very homogeneous;
- When there is little data;
- When there is a lot of data.
Description for Figure 1
In this image of a three column bar chart, the first data point is very high, while the second and third are both very low. This wide range in data points makes it difficult to determine the values for the second and third columns.
Description for Figure 2
In this image of a line chart, there are missing data points for multiple variables, reducing the likelihood of identifying any trends or changes over time.
Description for Figure 3
In this image of a twenty column bar chart, all twenty bars are the same colour and height, making the visualization overly complex as all data points are identical.
Description for Figure 4
In this image of a pie chart, there are only two categories. One represents nearly the entire pie while the other represents very little, making the visualization overly complex and unnecessary.
Description for Figure 5
In this image of a 47 column bar chart, the data points are far too crowded and are represented using colours that are not distinguishable enough from one another to be easily interpreted.
Chart components
The different types of charts have common components, for which there are best practices.
Description for Figure 6
The image is a sample bar chart with labels and arrows pointing to different areas of the chart. The labels are pointing out the common components of charts, including Title, Axis, Tick marks, Grid lines, Legend, Notes, Sources, Colours.
1. Title
The title of a chart usually appears at the top of the chart and is part of it. The different axes of the chart can also have their own titles.
- The title of the chart can be descriptive (e.g., include variables and time periods covered) or informative (e.g., specify the intended message of the chart).
- Axis titles should be descriptive.
- The title should be clear and help the reader better understand the data presented.
- The chart, including its title, should be considered separate from the text. If necessary, the acronyms used should be redefined.
- If the data refer to a specific unit of measurement, this can be specified in the title of the chart or the title of the axis.
2. Axis
Some types of charts use axes to present the data.
- Axis titles are not always needed, but should be included when the data have a specific unit of measurement.
- With special exceptions, axes should start at zero and not be broken.
3. Tick marks
Tick marks provide visual cues for easy reading of the data.
- There should be a balance between including too many tick marks, cluttering the axis, and too few tick marks, making it difficult to read the data.
- The intervals should be regular, such as multiples of 10, 100, 1,000 or 1,000,000.
- Tick marks should always be on the inside of the plot area.
- Tick marks should not be used simultaneously with grid lines.
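The advice on regular intervals can be illustrated with a short sketch. This is only one possible approach, not a prescribed method: the function names, the `max_ticks` limit and the choice of 1, 2 and 5 as allowed multipliers are all assumptions made for the example, and the axis is assumed to start at zero.

```python
import math

def nice_tick_step(data_max, max_ticks=6):
    """Return a regular tick interval: 1, 2 or 5 times a power of 10.

    Assumes the axis starts at zero and data_max is positive.
    max_ticks is an illustrative cap on the number of intervals.
    """
    if data_max <= 0:
        raise ValueError("data_max must be positive")
    raw_step = data_max / max_ticks
    magnitude = 10 ** math.floor(math.log10(raw_step))
    # Pick the smallest "round" step that keeps the tick count within the cap.
    for multiplier in (1, 2, 5, 10):
        step = multiplier * magnitude
        if step >= raw_step:
            return step
    return 10 * magnitude  # fallback for floating-point edge cases

def tick_positions(data_max, max_ticks=6):
    """List tick positions from zero up to the first tick at or above data_max."""
    step = nice_tick_step(data_max, max_ticks)
    count = math.ceil(data_max / step)
    return [i * step for i in range(count + 1)]
```

For a largest value of 470, this sketch places ticks every 100 units, from 0 to 500.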
4. Grid lines
Grid lines appear on certain types of charts, in order to facilitate the reading and comparison of values.
- The lines should be thin, light and placed behind the elements presenting the data.
- The space between the lines should be sufficient to facilitate the reading of the data, without cluttering the chart.
- Grid lines should not be used simultaneously with tick marks.
5. Legend
A legend can be used to label the different variables or categories presented in a chart. It can be particularly useful when there are multiple items.
- The legend should appear close to the objects it labels, without interfering with the data.
- The legend should be less prominent than the objects it labels.
- A legend should always be used if there are multiple categories of data (in different colours).
6. Notes
Notes can be used to provide details about the chart, such as methodological considerations, data limitations, or a description of abbreviations used.
- Footnotes should be numbered; notes do not need to be.
- Notes should be brief and specific.
- Depending on the audience, notes may be used to clarify technical terms used.
7. Sources
A chart should always include the sources of the data used.
- Data sources should be specific enough to allow readers to find the data, if necessary.
- If the data source is frequently updated (e.g., exchange rates between two currencies), it is important to specify the date the data were extracted.
8. Colours
Colours can be used to make data easier to understand, to emphasise specific elements or to communicate certain messages quickly.
- To facilitate the visualisation of data, it is recommended to use colours that contrast with the background colour of the chart.
- Colours should be used only when necessary to achieve a specific communication objective.
- Different colours should be used only when they have different meanings.
- Do not use shades of similar colours without adequate space between them.
- Light, natural colours should be used to present most data, except for data to which attention is to be drawn, for which bright or dark colours are preferred.
- Where different values of a variable are to be presented, it is preferable to use a single colour, varying the intensity (lighter for low values and darker for high values).
- Patterns can be used instead of colours when required.
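The recommendation to use a single colour with varying intensity can be sketched as follows. This is a minimal illustration under stated assumptions: the base colour (a dark blue), the `min_weight` floor that keeps the lightest shade visible, and both function names are hypothetical choices, not part of this guide.

```python
def blend_with_white(base_rgb, weight):
    """Blend base_rgb toward white; weight=1 returns the full base colour."""
    return tuple(round(255 + (channel - 255) * weight) for channel in base_rgb)

def sequential_colours(values, base_rgb=(8, 48, 107), min_weight=0.2):
    """Map each value to a shade of one colour: lighter for low, darker for high."""
    low, high = min(values), max(values)
    span = high - low
    colours = []
    for value in values:
        # Position of the value within the data range, from 0 (lowest) to 1 (highest).
        position = 0.0 if span == 0 else (value - low) / span
        weight = min_weight + (1 - min_weight) * position
        r, g, b = blend_with_white(base_rgb, weight)
        colours.append(f"#{r:02x}{g:02x}{b:02x}")
    return colours
```

The highest value receives the base colour itself and lower values receive progressively lighter shades of it.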
Description for Figure 7
The image is of a three bar chart where the three bars are a very light blue colour and do not contrast with the background colour of the chart.
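One way to check contrast numerically, which this guide does not itself prescribe, is the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG). The sketch below assumes colours are given as 8-bit sRGB triples; the function names are illustrative.

```python
def relative_luminance(rgb):
    """Relative luminance of an 8-bit sRGB colour, following the WCAG definition."""
    def linearise(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearise(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(rgb_a, rgb_b):
    """Contrast ratio between two colours, from 1 (identical) to 21 (black on white)."""
    luminances = sorted((relative_luminance(rgb_a), relative_luminance(rgb_b)))
    darker, lighter = luminances
    return (lighter + 0.05) / (darker + 0.05)
```

A very light blue on a white background, as in Figure 7, yields a ratio close to 1, while WCAG suggests at least 3 for graphical elements.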
Description for Figure 8
This is an image of a four bar chart. The colour of each bar corresponds to the favourite colour it represents.
Description for Figure 9
The image is of a three bar chart where each of the three bars represents the same variable, but for three different years. Positive results are in green and negative ones are in red.
Description for Figure 10
The image is of a six bar chart where five bars are blue and the sixth is also blue but much darker. This is to draw attention to the fact that the value of the sixth year is over six times larger than any of the other years.
Description for Figure 11
The image is of a map of Canada, where each province is coloured in a shade of blue, depending on the proportion it represents. The lightest blue represents the lowest proportions and the darkest blue represents the highest.
Description for Figure 12
In this image of a line chart, there are five years being represented by five lines. Instead of using different colours to differentiate one year from another, the image has used shapes, a circle, triangle, square and letter X to represent data points over time.
9. Other considerations
- Keep it simple and avoid overloading the charts.
- With some exceptions, avoid 3D charts, which are more difficult to understand.
- For accessibility, notes should be used if there are visual elements on the chart that are not part of the data table.
Description for Figure 13
In this image of a 14 column bar chart, the data points are too crowded and are represented using colours that are not distinguishable enough from one another. Numerical values overlap one another and multiple years are represented. The chart is too crowded to be easily interpreted.
Description for Figure 14
This is an image of a 3D bar chart that is 4 columns wide and three columns deep. Many of the columns in the back two rows of data are shorter than the front row which makes the data difficult to interpret.
Types of charts
There are several types of charts, each with advantages and disadvantages, depending on the context and the nature of the data. The selection of the right type of chart will be influenced by different factors such as the type of data, the messages to be communicated and the target audience.
1. Pie charts
A pie chart shows the percentage distribution of a given variable. Each segment represents a category and its size is proportional to its weight in the total.
Description for Figure 15
This is an image of a 3 category pie chart where the three categories add up to 100%.
- Pie charts can only be used for data where the sum of the different categories adds up to 100%.
- Ideally, a pie chart has between 2 and 6 different categories.
- The categories should be presented in descending order, clockwise.
- It is not possible to present uncertainty on a pie chart.
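The first three rules above can be checked before drawing anything. The following is a minimal sketch: the function names and the `tolerance` parameter (which allows for rounding in published percentages) are assumptions made for the example.

```python
def check_pie_data(shares, tolerance=0.5):
    """Return a list of problems found in a dict of category -> percentage."""
    problems = []
    total = sum(shares.values())
    if abs(total - 100) > tolerance:
        problems.append(f"shares sum to {total:g}%, not 100%")
    if not 2 <= len(shares) <= 6:
        problems.append(f"{len(shares)} categories; 2 to 6 is ideal")
    return problems

def order_for_pie(shares):
    """Sort categories in descending order, ready to be drawn clockwise."""
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)
```

An empty list from `check_pie_data` means the data pass both checks; `order_for_pie` then gives the order in which to draw the segments.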
Description for Figure 16
This is an image of a 3 category pie chart where the three categories add up to more than 100%.
Description for Figure 17
This is an image of a 10 category pie chart, which makes the data difficult to interpret visually.
Description for Figure 18
This is an image of a 5 category pie chart where the categories are not presented in descending order.
Pie chart family
A. Bar of pie charts
Description for Figure 19
This is an image of a 4 category pie chart with the “Other” category being represented by a 2 category stacked bar chart to the right of the pie chart.
A bar of pie charts can be used when there are more than six categories, or when there are several small categories, which are difficult to illustrate clearly in a regular pie chart. In these cases, a new category called 'Other', the amount of which is equal to the sum of the smaller categories, is inserted into the main chart. Stacked bars show these categories next to the pie chart.
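The grouping step described above can be sketched as follows. This is an illustration only: the function name and the `max_slices` parameter are hypothetical, and the rule of keeping the largest categories in the pie is an assumption about how the smaller categories are chosen.

```python
def split_for_bar_of_pie(values, max_slices=6):
    """Split categories into pie slices and the breakdown shown in the bar.

    Returns (main, small): main holds the largest categories plus an
    'Other' slice equal to the sum of the categories listed in small.
    """
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) <= max_slices:
        return dict(ranked), {}
    # Reserve one slice for 'Other' so the pie stays within max_slices.
    main = dict(ranked[: max_slices - 1])
    small = dict(ranked[max_slices - 1:])
    main["Other"] = sum(small.values())
    return main, small
```

The `main` dictionary feeds the pie chart and `small` feeds the stacked bar drawn beside it.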
B. Donut charts
Description for Figure 20
This is an image of a 4 category donut chart where each category is represented not just with a different colour, but also by a different logo.
A donut chart is a pie chart with a hole in the centre. The hole makes it more difficult to estimate the relative size of categories, but can be used to present relevant information, such as a logo.
2. Bar charts
A bar chart uses bars to represent the different categories. It can be vertical or horizontal and has two axes. The names of the different categories are shown on one axis or with labels on the bars. The value of the data is shown on the other axis: this is called the scale.
Description for Figure 21
This is an image of a 5 category, horizontal bar chart, where the titles are on the y axis and percentages on the x axis.
- A horizontal bar chart provides more space for the names of the categories presented.
- If the aim is to emphasize the order of magnitude of different categories, present them in ascending or descending order. Do not change the order of categories that have a natural order, such as months or years.
- Ideally, a bar chart would have between 2 and 10 different categories.
- Where the data have a time element, this should be presented chronologically on the X-axis, from left to right.
- It is possible to present uncertainty on a bar chart.
Description for Figure 22
This is an image of a 5 category, vertical bar chart, where the titles are on the x axis and percentages on the y axis.
Description for Figure 23
This is an image of a 5 category bar chart where annual data are presented in chronological order, from left to right.
Description for Figure 24
This is an image of a 5 category bar chart that displays a range of uncertainty at the top of each bar.
Bar chart family
A. Grouped bar charts
It is possible to present two or more series of data in a grouped bar chart. However, the more series there are, the more difficult it is to focus on one at a time.
Description for Figure 25
This is an image of a grouped bar chart that displays data for 4 categories over 3 years. Each year is represented as a grouping of the same 4 categories.
B. Histograms
Histograms are used to illustrate the summary of a continuous variable measured on an interval scale. In a histogram, the bars are connected to each other, with no space between them.
Description for Figure 26
This is an image of a 7 category histogram.
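Before a histogram can be drawn, the continuous variable has to be grouped into adjacent intervals. The sketch below shows one simple way to do this with equal-width intervals; the function name and the choice of equal widths are assumptions, and other binning rules exist.

```python
def histogram_counts(values, bin_count):
    """Count values in equal-width, adjacent intervals.

    Returns (edges, counts). The last interval includes the maximum value.
    """
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical; no intervals to form")
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for value in values:
        # Clamp so that the maximum value falls in the last interval.
        index = min(int((value - low) / width), bin_count - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bin_count + 1)]
    return edges, counts
```

Each count becomes the height of one bar, and consecutive edges mark where adjacent bars touch.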
C. Box plots
Box plots are used to illustrate the distribution of different categories of a variable. Each bar starts at the minimum value and ends at the maximum value of the category. There is usually a thick line inside each bar that shows the center of the distribution, usually the median.
Description for Figure 27
This is an image of a 2 category box plot.
D. Box and whisker plots
This chart is one of the most effective charts for visualizing the frequency distribution of a continuous variable. It displays the minimum, first quartile, median, third quartile and maximum value of a category of a variable.
Description for Figure 28
This is an image of a 2 category box and whisker plot.
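The five values displayed by a box and whisker plot can be computed with Python's standard library. This is a sketch under one assumption worth noting: several conventions exist for calculating quartiles, and the "inclusive" method used here is only one of them.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    ordered = sorted(values)
    # method="inclusive" treats the data as the whole population;
    # other quartile conventions can give slightly different results.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]
```

The box spans the first to the third quartile, the line inside it marks the median, and the whiskers extend to the minimum and maximum.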
E. Stacked bar chart
Stacked bar charts are used to illustrate the total values of different categories. Additionally, each bar is broken down to show subcomponents in each category. Since the baseline of subcategories varies between bars, only the first subcomponent can be visualized efficiently.
Description for Figure 29
This is an image of a 4 category stacked bar chart that displays data for 2 years.
F. 100% stacked bar charts
100% stacked bar charts are used to illustrate the ratio of subcategories. They are similar to stacked bar charts, but show the relative value of each category rather than the absolute value.
Description for Figure 30
This is an image of a 4 category 100% stacked bar chart that displays data for 2 years.
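Converting absolute values to the relative values used in a 100% stacked bar chart is a simple calculation, sketched below. The function name and the nested dictionary layout (bar label, then subcategory) are assumptions made for the example.

```python
def to_percent_shares(bars):
    """Convert each bar's absolute values to percentages of that bar's total."""
    shares = {}
    for bar, parts in bars.items():
        total = sum(parts.values())
        shares[bar] = {name: 100 * value / total for name, value in parts.items()}
    return shares
```

Every bar then sums to 100, so the bars have equal length and only the proportions differ.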
G. Waterfall charts
Waterfall charts pull apart the pieces of a stacked bar chart and show each subcomponent separately. The first bar starts from its natural base value and the rest of the bars start at the value of the previous bar and can have a positive or a negative value.
Description for Figure 31
This is an image of a 4 category waterfall chart.
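The positioning rule for waterfall bars (each bar starts where the previous one ended) can be sketched as follows. The function name and the labels in the usage note are hypothetical.

```python
def waterfall_segments(base_value, changes):
    """Return (label, start, end) for each bar of a waterfall chart.

    changes is a sequence of (label, amount); amounts may be negative.
    """
    segments = []
    running = base_value
    for label, change in changes:
        start, end = running, running + change
        segments.append((label, start, end))
        running = end  # the next bar starts at this bar's end value
    return segments
```

For example, starting from a base value of 0 with changes of +100, -40 and -10, the three bars span 0 to 100, 100 to 60, and 60 to 50.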
3. Line charts
Unlike bar charts which emphasize individual values, line charts emphasize continuity and evolution from point to point. They are commonly used to show changes and trends over time.
Description for Figure 32
This is an image of a line chart that displays data over 4 years.
- Dots can be used on the line to emphasize values.
- Spaces or dotted lines can be used when data are missing.
- When multiple items are presented on the same chart, they should have the same units of measure, different colours should be used to distinguish them, and the lines should be visually distinct.
- A different line (e.g., dotted or a different colour) should be used to distinguish actual data from trends, projections, and targets.
- Shading can be used to show uncertainty.
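To leave a space where data are missing, a series can be split into separate segments that are each drawn as their own line. The sketch below assumes missing values are recorded as `None`; the function name is illustrative.

```python
def split_at_gaps(points):
    """Split (x, y) points into continuous segments, breaking where y is None."""
    segments, current = [], []
    for x, y in points:
        if y is None:
            # A missing value ends the current segment, leaving a visible gap.
            if current:
                segments.append(current)
                current = []
        else:
            current.append((x, y))
    if current:
        segments.append(current)
    return segments
```

Drawing each returned segment separately produces a gap like the one described for Figure 34, rather than a line that implies values where none were observed.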
Description for Figure 33
This is an image of a line chart that displays data over 4 years.
Description for Figure 34
This is an image of a line chart that displays data over 10 years with a space where data are missing for year 4.
Description for Figure 35
This is an image of a 3 category line chart that displays data over 10 years. Each line is a different colour in order to differentiate the measured items from one another.
Description for Figure 36
This is an image of a line chart that displays data over 10 years and the line is dotted from years 8 through 10 to show uncertainty.
Line chart family
A. Grouped line charts
It is possible to present two or more series of data in a grouped line chart. However, the more series there are, the more difficult it is to focus on one at a time.
Description for Figure 37
This is an image of a 3 category line chart that displays data over 10 years. Each line is a different colour in order to differentiate the measured items from one another.
B. Slope graphs
Slope graphs illustrate the relative increase or decrease in a set of variables between two data points. They provide a clear visual ordering among variables and can be used to visualize a ranking.
Description for Figure 38
This is an image of a slope graph that displays data for three categories over 2 years.
4. Point charts
Point charts are typically used to illustrate the trend or pattern of frequency distribution of variables. They usually have an additional element, i.e., a regression line which shows the estimated slope of a model.
- It is important that the different dots can be clearly distinguished from each other.
- Dots of different sizes can be used to represent different values. If there is overlap, the smaller dots should be placed in front of the others so that they are visible.
- The relative size of dots can be difficult for readers to estimate.
- Point charts draw the reader's attention to the dots (and their values). If your goal is to illustrate a trend, it is best to connect the dots and use a line chart instead.
- Shading around the points can be used to show uncertainty.
Description for Figure 39
This is an image of 7 dots in a point chart that displays data by age on the x axis and income on the y axis.
Description for Figure 40
This is an image of 7 bubbles displayed in a point chart and the bubbles are of varying sizes which represent the value of a third variable.
Point chart family
A. Lollipop graph
This chart is similar to bar charts. It uses a line for visualizing the values of each variable instead of a bar.
Description for Figure 41
This is an image of a lollipop graph displaying 12 lollipops total, one for each month of the year.
B. Strip plots
Strip plots display the value of each point in a data set. They are useful for visualizing the precise value of each element in a small data set.
Description for Figure 42
This is an image of a strip plot.
C. Bubble plots
With bubble plots, it is possible to illustrate a third element on the same chart, using bubbles that vary in size depending on the value.
Description for Figure 43
This is an image of 7 bubbles displayed in a bubble plot. The bubbles are of varying sizes which represent the value of a third variable.
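This guide does not say how bubble size should be calculated. A common convention, assumed here, is to make the area of each bubble proportional to the value rather than its radius, so that large values are not visually exaggerated. The function name and the `max_radius` parameter are hypothetical.

```python
import math

def bubble_radii(values, max_radius=20.0):
    """Return radii such that bubble area is proportional to each value.

    Assumes all values are positive; the largest value gets max_radius.
    """
    largest = max(values)
    # Area grows with the square of the radius, so take the square root.
    return [max_radius * math.sqrt(value / largest) for value in values]
```

With this scaling, a value four times larger produces a bubble with twice the radius and four times the area.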
5. Maps
A map is a geospatial design that displays information on geographical locations.
Description for Figure 44
This is an image of a map of Canada where different shades of colour are assigned to defined regions.
The "choropleth map" uses colours to provide information: different shades of colour are assigned to defined regions such as countries, provinces, and cities.
- With choropleth maps, it is preferable to use relative measurements to assign colours, rather than absolute measurements.
- On choropleth maps, it can be difficult to distinguish small areas, such as Prince Edward Island on a map of Canada.
- It is not possible to show uncertainty on the map.
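The preference for relative measurements can be illustrated with a short sketch that converts counts to rates and then assigns each region to a shade class. The function names, the rate per 1,000 inhabitants and the equal-interval classes are all assumptions made for the example.

```python
def rates_per_1000(counts, populations):
    """Convert absolute counts to a relative measure: count per 1,000 inhabitants."""
    return {region: 1000 * counts[region] / populations[region] for region in counts}

def shade_classes(rates, class_count=4):
    """Assign each region a class from 0 (lightest) to class_count - 1 (darkest)."""
    low, high = min(rates.values()), max(rates.values())
    span = high - low
    classes = {}
    for region, rate in rates.items():
        if span == 0:
            classes[region] = 0
        else:
            # Equal-interval classes; the maximum falls in the darkest class.
            classes[region] = min(int((rate - low) / span * class_count), class_count - 1)
    return classes
```

With made-up figures of 500 cases among 1,000,000 people in one region and 50 cases among 10,000 people in another, the second region has the higher rate and receives the darker shade, even though its absolute count is ten times smaller.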
Maps family
A. Tree maps
This type of map uses rectangles proportional to the relative size of each category to illustrate them.
Description for Figure 45
This is an image of a 5 category tree map displaying rectangles proportional to the relative size of each of the 5 categories.
B. Tile grid maps
In a traditional geographical map, the size of each area affects how we process the information. In tile grid maps, elements of the same size and shape are used, so the audience can see and process the information without element size influencing their judgement.
Description for Figure 46
This is an image of a tile grid map.
Glossary
Base value
The base value (also called "baseline") is the natural starting point of a variable. Usually the base value of variables is zero, but there are cases where the logical base value is different (e.g. the price index has a base value of 1).
Uncertainty
In statistics, uncertainty refers to the fact that estimates based on a sample or projections may not reflect the true value.
Continuous variable
A continuous variable can take all possible values in a predefined range, as opposed to discrete variables, which can only take certain values in a range, usually integers.