TDWI St. Louis September Meeting Recap – Data Visualization with Dona Wong and Daniel Murray

This is a recap I wrote for the internal newsletter at my day job. Since TDWI is open to the public and attended by professionals from many different industries, I thought it would be smart to share it with anyone who is interested.

This past month, the TDWI (The Data Warehousing Institute) St. Louis chapter hosted a great meeting on data visualization. The two presenters were phenomenal, and their talks couldn’t have been more relevant to our own efforts.

The first presentation was from Dona Wong. A dedicated practitioner of data visualization with over 20 years of experience, Dona has worked to create meaningful stories through the visualization of data.

Currently, Dona works as the vice president and director of research at the Federal Reserve Bank of New York. Before joining the Federal Reserve, Dona was a graphics editor for the New York Times and, more recently, the graphics director at The Wall Street Journal. There she spearheaded a clean, consistent, and well-designed guide to data visualization used across the paper’s online and print editions.

She compiled her research and approach into a best-selling book titled The Wall Street Journal Guide to Information Graphics.

The focus of Dona’s presentation was a straightforward overview of how to approach data visualization with practical, real-world benefits. She told the story of how children learn first their ABCs, then words, then sentences, and finally the ability to editorialize and create new narratives. “We don’t do the same with information visualization. On the first day we jump into the deep end!”

Dona’s approach is not to jump into the deep end but to think about telling a good story, not just presenting the data, and to identify a narrative. For Wong, this is the focus of data visualization: creating stories that awaken our imagination and create positive, actionable experiences for our audience.

One example was a project she undertook looking at mortgage rates and foreclosures in the geographic area around New York City. There are many ways to visualize that data, and each approach could tell a different story. To tell a story about housing credit, the question she wanted to answer was “Who is borrowing?” By visualizing the information as a geographic heat map of the percentage of foreclosures in each zip code, she was able to show who was borrowing.

Dona then went on to share her Seven Deadly Sins of data visualization. These seven tropes are common mistakes she sees when individuals are tasked with visually representing data.

1. I have some numbers. Let’s make a chart

People will often pick whatever chart is handy or ‘looks cool’ to display their data. Colors should not be picked randomly either; colors have meaning! Dona recommends removing whatever is not germane to the story. Make sure your visualizations have a crisp meaning.

Another simple recommendation is to add headlines to charts. Don’t rely only on axis and legend labels to tell the story; declare it outright. “Rate of mortgage foreclosures for the month of December showing an increase in poor neighborhoods” tells a much better story than “Chart 1.”

2. We make our audiences work way too hard

Often in presentations and reports we put our data pages far away from our insights. Put them together so your audience can easily make the connections and understand the story. Don’t drop charts into random parts of your presentation or report to ‘break things up’. Put them where they make sense.

3. Choosing a chart format like a t-shirt

Dona regaled us with a story about the time she was on a plane and watched another passenger work on a chart. They went through all the preset chart types in Excel before picking the one that ‘looked the coolest’, without considering that the chart form should follow the data. Picking the right chart is not an act of whimsy but a deliberate decision.

Her approach is to look at the strengths of the various chart types and leverage them for your narrative.

  • Trending Data – Line charts show trends at a glance through the slope of a line, and they also show discrete quantities.
  • Ranking Data – Horizontal bars work well for ranking data. Make sure you include titles.
  • Share of a Whole – A pie chart is only useful for representing data that’s part of a bigger whole.

4. Pie charts

Pie charts suck. People can easily compare length and width, but not area. It’s hard for us to look at a pie chart and tell whether the slices are exactly equal or merely close. 25% looks a lot like 30%!

If you do use a pie chart (again, only to show the share of a whole), use no more than five slices! Dona joked, “If I told you that you could use maybe six, then you’d use seven, and then you’d say maybe eight!” Stick to five slices and group any remaining percentages together into an ‘Other’ slice.
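The “five slices plus Other” rule is easy to apply mechanically before you build the chart. Below is a minimal Python sketch, not anything Dona showed: the function name `group_pie_slices` is hypothetical, and reserving one of the five slots for ‘Other’ (so the chart never exceeds five slices total) is my own reading of the rule.

```python
def group_pie_slices(shares, max_slices=5, other_label="Other"):
    """Keep the largest slices and fold the rest into a single 'Other' slice.

    shares: dict mapping category label -> value (non-negative numbers).
    Returns a list of (label, value) pairs with at most max_slices entries.
    Assumption: 'Other' counts toward the max_slices limit.
    """
    # Rank categories from largest to smallest share.
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) <= max_slices:
        return ranked
    # Keep the top (max_slices - 1) categories, leaving one slot for "Other".
    kept = ranked[: max_slices - 1]
    other_total = sum(value for _, value in ranked[max_slices - 1 :])
    return kept + [(other_label, other_total)]


# Seven categories collapse to four named slices plus "Other" (8 + 7 + 5 = 20).
print(group_pie_slices({"A": 30, "B": 25, "C": 15, "D": 10, "E": 8, "F": 7, "G": 5}))
```

Putting ‘Other’ last, rather than sorting it in by size, keeps it from competing visually with the categories that actually carry the story.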

5. Start vertical bars at zero

Because vertical bars (and horizontal bars, for that matter) measure discrete quantities, they should always start at zero. Starting at any other number distorts the perceived differences between the values your bars represent.
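A little arithmetic shows how much a nonzero baseline can distort things. Here is a short Python sketch (the helper `apparent_ratio` is my own, hypothetical name) that compares how much longer one bar looks than another when both are drawn from a given baseline.

```python
def apparent_ratio(a, b, baseline=0.0):
    """How many times longer bar b looks than bar a when both start at `baseline`."""
    if a <= baseline or b <= baseline:
        raise ValueError("baseline must be below both values")
    # The drawn length of each bar is its value minus the baseline.
    return (b - baseline) / (a - baseline)


print(apparent_ratio(50, 55))               # bars from zero: true ratio
print(apparent_ratio(50, 55, baseline=45))  # truncated axis: exaggerated ratio
```

With values of 50 and 55, bars drawn from zero differ by 10% (a ratio of 1.1). Start the axis at 45 and the second bar looks twice as tall (10 units versus 5), even though the data hasn’t changed.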

6. 3D

Don’t ever use it.

Think about your audience and be sure to give them the clearest possible picture of the data. A 3D bar chart showing performance raises will skew the interpretation of the data. Does the front edge of the bar mark the increase, or does the top of the 3D effect? That’s a big difference when you’re talking about things that matter to your audience.

7. Color

Dona told us that colors should be treated like our in-laws: admit them into your charts cautiously, as you would in-laws visiting from out of town. If there is a pattern in your visualization, don’t break it up with colors just for the sake of it. Colors can confuse your audience or send the wrong message.

She also highlighted the importance of making sure the colors in your visualizations are accessible. As many as 8 percent of men and 0.5 percent of women have the common red-green form of color blindness. A quick check is to convert your chart to black and white. Can you still tell the colors apart?
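If you pick colors in code, you can automate a rough version of that black-and-white check. The Python sketch below is my own illustration, not something from the talk: it converts each color to a gray value using the ITU-R BT.601 luma weights and flags pairs whose gray values are closer than a threshold. The 40-point threshold and the function names are assumptions, and a grayscale test is only a proxy, not a true color-blindness simulation.

```python
from itertools import combinations


def to_gray(rgb):
    """Approximate grayscale value (0-255) using ITU-R BT.601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def indistinct_pairs(palette, min_gap=40):
    """Return pairs of palette colors whose gray values differ by less than min_gap.

    palette: dict mapping a color name -> (r, g, b) tuple with 0-255 channels.
    min_gap: assumed threshold; tune it for your own charts.
    """
    grays = {name: to_gray(rgb) for name, rgb in palette.items()}
    return [
        (a, b)
        for a, b in combinations(palette, 2)
        if abs(grays[a] - grays[b]) < min_gap
    ]


# Pure red and a medium green land at nearly the same gray (about 76 vs. 75).
print(indistinct_pairs({"red": (255, 0, 0), "green": (0, 128, 0), "blue": (0, 0, 255)}))
```

In this example, red and green would be flagged as indistinct once the chart is printed in black and white, which is exactly the failure Dona warned about.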

Dona ended with three takeaways: ask What, ask Why, and ask How.

What are you trying to do? It’s not just about the data. Identify a message and make sure it gets across to your audience.

Why are you providing this visualization? To provide insights. Make connections and motivate your audience to action.

How do you choose your design? Don’t accept default settings. Don’t trust them. Pick the right chart form.

Following Dona’s advice will help you engage, connect with, and influence your audience. Making a chart is not a form of self-expression; it’s all about communication.

You should really check out Dona’s book, The Wall Street Journal Guide to Information Graphics.

After Dona’s presentation, Daniel Murray got up to talk about “Big Data Visualization.” Daniel is the Director of BI Services at InterWorks, a global IT consulting firm based in Oklahoma.

Dan explained that two of the three Vs of Big Data (volume and velocity) have already been solved, but that the third, variety, will be central to the future of business intelligence.

The future lies in systems and tools that allow a variety of data to come together and make that data more accessible to non-technical information consumers. No organization has a single source of data.

According to Dan, the future data ecosystem will work roughly like this:

  1. Capture data
  2. Select fields of interest
  3. Tools create a self-generating schema with semi-automated ETL
  4. Consumers visualize the data however they see fit
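To make the “self-generating schema” idea in step 3 a bit more concrete, here is a toy Python sketch of my own, not a description of any tool Dan mentioned. It takes captured records (as dictionaries), looks only at the selected fields, and infers a type for each column; the function name `infer_schema` and the widening rules are hypothetical simplifications.

```python
def infer_schema(records, fields):
    """Infer a simple {field: type name} schema from captured records.

    records: list of dicts, e.g. rows pulled from different sources.
    fields: the fields of interest selected by the user.
    Rules (assumed for illustration): ignore missing/None values, widen
    mixed int/float columns to float, and fall back to str for other mixes.
    """
    schema = {}
    for field in fields:
        types = {
            type(r[field]).__name__ for r in records if r.get(field) is not None
        }
        if not types:
            schema[field] = "unknown"  # field never populated
        elif types == {"int", "float"}:
            schema[field] = "float"  # widen mixed numeric columns
        elif len(types) == 1:
            schema[field] = types.pop()
        else:
            schema[field] = "str"  # mixed types: treat as text
    return schema


rows = [
    {"zip": "10001", "rate": 3.5, "loans": 12},
    {"zip": "11201", "rate": 4, "loans": None, "note": "from a second source"},
]
print(infer_schema(rows, ["zip", "rate", "loans"]))
```

Real tools do far more (dates, nested data, joins across sources), but even this toy version shows why variety is the hard part: the schema has to tolerate fields that are missing or typed differently from one source to the next.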

Data visualization becomes important when making data accessible to end users, allowing for faster responses, less effort, and new insights. Dan claims that traditional tools are like a train on tracks to a specific destination, while emerging tools are all about new discoveries. Give people self-service tools and they come out of the woodwork, bringing unique views you can’t get with the traditional BI stack.

The role of IT will be to coordinate effective data governance and to ensure that new data consumers understand the responsibility that comes with their newfound access.

With good, clearly communicated governance, you can empower consumers to use their intimate operational knowledge and uncover greater insights than were possible before.

Dan ended his presentation with a few recommendations. First, he suggested that data teams build a proof of concept that takes four weeks or less to test these ideas. Take a few data sources, give non-developers access, and see what insights and visualizations emerge.

He also recommended three great books to read.

Conclusion

The presentations were top-notch, the location was perfect (and comfortable even with a packed house), and the snacks and refreshments were very well done. Best of all, the vendors were all smartly introduced and relevant to the discussions for the day. In all, a professional event. I hope other co-workers have an opportunity to attend one of the upcoming TDWI St. Louis events.
