The following post has been co-authored by Mirko Canovi.
Alberto Cairo is a journalist and infographic expert, who teaches information graphics and visualization at the University of Miami’s School of Communication. In his book “The Functional Art: An introduction to information graphics and visualization” he says:
“The first and main goal of any graphic and visualization is to be a tool for your eyes and brain to perceive what lies beyond their natural reach.”
In my opinion Cairo gets to the point, and he summarizes well the central role of data presentation. No matter what kind of analysis you are doing, if the reader can’t understand the results, you might as well not perform the analysis at all.
In this article I’d like to share some considerations about data presentation. My purpose is not to give you a recipe that always works. What I want is to talk about some tricks for improving the outcome of a well-done job.
When we design the data presentation of an analysis, we must try to be as simple, clear, complete and effective as possible.
1. Be simple.
According to Cairo, we must put the reader at the center. The reader is the judge of our work, so we must present the data in the simplest possible way for him. In other words, interpreting the results must require minimum effort from the reader.
There are several ways to do this.
Putting too much information in a single graph can create confusion; in this case it’s better to use two illustrations instead of one. In addition, if you know that two metrics represent the same thing, use only one of them.
Other elements that can create confusion are outliers. Almost every data-set in the real world has outliers or less important samples. If you can’t filter these samples out of your presentation, try to aggregate them.
In the following example, we consider the average number of users per hour on a web server. The chart in figure 1 shows the statistic for every hour. If we look at the data-set, we find that there are very few users in the early morning and late at night, so we can show a single aggregate value for those hours, as I do in figure 2.
Figure 1: average number of users – 24 hours
Figure 2: average number of users – aggregated
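The aggregation behind figure 2 can be sketched in a few lines of Python. This is only a minimal illustration: the hourly values, the threshold and the `aggregate_quiet_hours` helper are all made up for the example, not taken from the real data-set.

```python
from statistics import mean

# Hypothetical average users per hour; index = hour of the day (0..23).
# These numbers are invented for illustration only.
hourly_users = [3, 2, 1, 1, 2, 4, 20, 55, 80, 95, 100, 98,
                90, 92, 97, 94, 85, 70, 50, 30, 15, 6, 4, 3]


def aggregate_quiet_hours(values, threshold):
    """Keep busy hours as they are and collapse all quiet hours into one average."""
    summary = {f"{hour:02d}:00": v for hour, v in enumerate(values) if v >= threshold}
    quiet = [v for v in values if v < threshold]
    if quiet:
        summary["quiet hours (avg)"] = mean(quiet)
    return summary


# The threshold is an assumption: what counts as "quiet" depends on the system.
summary = aggregate_quiet_hours(hourly_users, threshold=10)
for label, value in summary.items():
    print(f"{label:>18}: {value:.1f}")
```

The reader now sees one bar per busy hour plus a single bar for the quiet ones, instead of a row of near-empty bars.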
Generally speaking, the best suggestion is to use commonly accepted practices. That means both using the good old graphical representations (histograms, curves, scatter plots) and choosing the ones that the reader understands best. So, if you are working for a specific customer, look at his document templates (if he has any, of course!).
2. Be clear.
When you present the data, the context of the analysis must be clear. Always make sure that the audience can easily understand the data you show. Minimize or avoid ambiguity by giving the correct information (for example, using proper labels and axes in charts).
Always choose the right option between tables and charts. Charts are good for showing differences between two metrics, but if the absolute values are important a table works better. Also, when you put data in a table, remember to round values off correctly when necessary. Data are the results of calculations, and there are often round-off errors: for example, don’t use fractions of a millisecond to represent the round-trip delay of a network packet.
We also have to keep in mind that there are different types of charts, and we must choose the best chart for each kind of variable. Histograms are good for discrete quantitative data (like the number of orders processed by an Order Management System), while curves are good for continuous data (like the response time of a web server).
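As a small sketch of the rounding advice, here is one way to format values before they go into a table. The delay values and the `format_rtt` helper are hypothetical, and the right number of decimals depends on how precise your measurement really is.

```python
# Hypothetical round-trip delays in milliseconds, straight out of a calculation.
raw_rtt_ms = [12.3456789, 8.0000012, 101.4999999]


def format_rtt(values_ms, decimals=0):
    """Format each delay with only as many decimals as the reader needs."""
    return [f"{value:.{decimals}f} ms" for value in values_ms]


# Whole milliseconds are assumed to be precise enough for this table.
table_rows = format_rtt(raw_rtt_ms)
print(table_rows)
```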
But it’s not only a matter of variables; what we want to show to the audience is also important. Bubble charts are a good example. If you are using a bubble chart to show the difference between two values, keep in mind that a circle’s area is not proportional to its radius but to the square of its radius.
It’s a matter of perception. If there are two circles, and the radius of one is twice the radius of the other, people often assume that the area is also twice as large, but it is actually four times as large! So in this case it’s difficult for some people to “visualize” the correct information (see figure 3).
Figure 3: The radius of the blue circle is twice that of the purple one
Last but not least, pay attention when using symbols and colors: they must be absolutely unambiguous. Think about what happens if you show a server failure in green (I swear, I have run into this case, and it wasn’t a good experience).
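A common way around this problem is to make the area, not the radius, proportional to the value, which means scaling the radius with the square root. The snippet below is a minimal sketch of that idea; the function names and the sample values are assumptions made for the example.

```python
import math


def area(radius):
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2


def radius_for_value(value, reference_value, reference_radius):
    """Radius that makes the circle's area proportional to the value."""
    return reference_radius * math.sqrt(value / reference_value)


# Doubling the radius quadruples the area.
print(area(2.0) / area(1.0))

# Hypothetical metric that doubles from 10 to 20: to double the area,
# the radius has to grow by a factor of sqrt(2), not 2.
small = radius_for_value(10, reference_value=10, reference_radius=1.0)
large = radius_for_value(20, reference_value=10, reference_radius=1.0)
print(large / small)
print(area(large) / area(small))
```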
3. Be complete.
To be useful, the information we are showing must be complete.
When I was at university, one of the most frequent mistakes was forgetting the units of measurement. I agree with my professors: without units, a data-set doesn’t make any sense. It’s also good practice to always include the axes, the origin, the correct labels and a legend in our charts. This is a way to maximize the information in our chart.
In some analyses, it’s also important to show confidence intervals for random quantities, when the variance of the results is high and the average values are not enough to compare two metrics. Average values aren’t always the best statistics to explain data, but that is another story.
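To give an idea of what this looks like in practice, here is a minimal sketch that computes a confidence interval for the mean. It assumes a normal approximation (for small samples a Student’s t interval would be more appropriate), and the two sets of response times are invented for the example.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, level=0.95):
    """Normal-approximation confidence interval for the mean of the samples."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    center = mean(samples)
    half_width = z * stdev(samples) / math.sqrt(len(samples))
    return center - half_width, center + half_width


# Hypothetical response times (ms) of two configurations.
# The averages are almost the same, the spread is very different.
config_a = [100, 140, 90, 160, 110, 130, 95, 150]
config_b = [120, 125, 118, 122, 121, 119, 124, 123]

for name, samples in (("A", config_a), ("B", config_b)):
    low, high = confidence_interval(samples)
    print(f"config {name}: mean {mean(samples):.1f} ms, 95% CI [{low:.1f}, {high:.1f}] ms")
```

Two averages that look identical in a table can tell very different stories once the intervals are drawn next to them.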
4. Be effective.
To capture the audience’s interest, we need to design the data presentation with the goal clearly in mind.
We can’t simply put the data in a chart following the practices I presented above; we also have to help the reader browse the data to get the information he needs. Remember, “he” and not “we”, because we made the analysis for him.
To achieve this result, when you are designing a chart or a table, try to ask yourself questions like the following:
Is this chart/table truly necessary?
Did I put the information in the correct order, with the most important first?
Did I use the correct scale to represent the data and to highlight the behavior that I found in the data? (sometimes the wrong scale can hide important information)
Did I put a spotlight on the data correlations that I found?
Are the data correlations that I found correct/useful?
Data presentation is a very important and current topic, and everyone talks about “the art of Data Presentation” in almost every branch of science and media communication.
In fact, people often don’t know how much effort an analysis requires (and they don’t want to know); the only way to show them the value of your work is to design the presentation of the results well.
Wait! What about tools?
OK, I agree with you, we didn’t talk about the tools for building our presentations. But let me ask you a question: is it truly a matter of tools?
Of course, in real life we have to deal with different kinds of tools and products to manage an ever-growing amount of data. Tools are good, they are necessary, and sometimes they are our best friends, because they can speed up our work.
At Moviri we help our customers find, design and build software solutions to leverage the information “hidden” in their systems, increasing efficiency and performance in IT Service Management.
But a piece of software can’t be the “silver bullet” that works in every situation. The most powerful tool is always located between the monitor and the chair.
Still skeptical? Look at this guy: http://www.ted.com/talks/hans_rosling_on_global_population_growth.html.