VISUALIZATION BASICS

  1. The purpose of visualization is insight, not pictures, so turn data into something clear and meaningful in the minds of the audience.
  2. The human brain is built for fast visual processing.
  3. We are built to detect, match and make sense of patterns.
  4. A visual approach is much more effective than a tabular display of data.
  5. Assign visual attributes, such as color and shape, to different data.
  6. Good visualizations let users see what we want them to see before they know they have seen it.
  7. Overview first, zoom and filter, then details on demand.

WHO WHAT WHY HOW

  1. WHO: You have to start by knowing your audience: the stakeholders and the SMEs (subject matter experts).
  2. If you are new, ask the SMEs what they are looking for.
  3. WHAT: What data is available?
  4. WHY: What business goal are we going after? Why the heck are we doing this?
  5. HOW: The result, that is, how we deliver the answer.

VISUAL PERCEPTION

Automatic and Immediate – Speed always outweighs accuracy. The brain is wired to quickly make assumptions.

How do you feel about this visualization? How would you reduce clutter?

COGNITIVE LOAD

Cognitive load is the amount of mental effort that we use to get the information that we need. We must reduce clutter to reduce the cognitive load.

INTRINSIC

The amount of mental effort (working memory) we need to understand the material itself.

EXTRANEOUS

The amount of extra brain power we need to deal with poorly designed visualizations.

GERMANE

The effort the brain spends looking for patterns to build context and understanding.

CONTRAST

Use contrast to build very effective charts. For example, grey out all bars except the one for the item we are trying to focus on.
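A minimal sketch of the grey-out technique in Python, assuming a plotting library that takes one color per bar (matplotlib's `Axes.bar`, for example, accepts a sequence for its `color` parameter). Only the focus item gets the accent color; the category names and color strings are hypothetical.

```python
def highlight_colors(categories, focus, accent="tab:orange", muted="lightgrey"):
    """Return one color per bar: the accent for the focus item, grey for everything else."""
    return [accent if category == focus else muted for category in categories]


# Hypothetical categories; "Technology" is the bar we want the audience to notice.
categories = ["Furniture", "Office Supplies", "Technology"]
colors = highlight_colors(categories, focus="Technology")
print(colors)  # ['lightgrey', 'lightgrey', 'tab:orange']
```

Keeping the rest of the bars visible but muted preserves the comparison while the contrast tells the viewer where to look.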

PRE-ATTENTIVE ATTRIBUTES

Even if you are new to reading data in a chart, you already have the built-in capability to spot light and dark colors, large and small shapes, and groups and orientations of objects. These are called pre-attentive attributes. Visual analytics leverages the visual cues that humans process automatically in sensory memory; these attributes get the brain to focus immediately on certain aspects of a visualization. We have three types of memory (iconic, short-term and long-term), and for visualization we need to focus on iconic and short-term memory. Use color to draw attention.

Change one of these to focus the user’s attention:
Size
Color
Orientation
Shape
Line composition
Enclosure
Intensity
Position

ICONIC

This is how we get a sense of what’s happening in front of us in that split second before the brain kicks in.

SHORT TERM

This is where we deal with cognitive load.

LONG TERM

This is what we retain for a very long time.

VISUAL ENCODING

  1. Translate data into visual code. For example, if we display a long string of text, it is nearly impossible for the brain to figure out what we are trying to convey; but if we color code (encode) specific letters in a different color while fading out the rest, the message stands out clearly. If we assign a different color to every letter, the message becomes even harder to see.
  2. Pre-attentive attributes – Use color hue, orientation, texture, position
  3. Clarify data for users
  4. Help people clearly and accurately see what we are trying to convey

Consider how your designs can help people make clear, accurate interpretations and gain useful insights based on what they see. There are two fundamental systems that drive how we think and make judgements: automatic, immediate cognition and slower, more deliberate cognition.

  • Automatic and Immediate – Speed always outweighs accuracy. The brain is wired to quickly make assumptions.
  • Deliberate and Cognitive – Slower, more careful thinking.

In the image above, you quickly perceive that the middle bar has a color gradient. Your first instinct is to think it contains a gradient; only when you slow down and analyze it further do you see that it doesn't.

Make sure the way your visualization is perceived aligns with what you are trying to convey.

REDUCE CLUTTER

GESTALT PRINCIPLES

The Gestalt school of psychology emerged in the early 20th century to study how the brain perceives the world around us. Its principles tell us how to organize our visualizations so they convey information effectively.

PROXIMITY

When you place items close together, the mind perceives them as belonging together in a group.

SIMILARITY

Objects that look alike (same shape or same color) are perceived as belonging together. In a scatter plot, for example, we group circles of the same color.

ENCLOSURE

Objects inside a shared boundary are perceived as a group, for example when we put a dividing line on a chart or shade one side of it.

CLOSURE

Our mind fills in the gaps and doesn't need a border or extra text to reach a conclusion, so avoid unnecessary lines, text and explanations.

CONTINUITY

Our eyes follow the smoothest path, so bar graphs often don't need an axis line: the aligned bars already create a continuity of thought between them.

CONNECTION

Objects that are connected are perceived as related; line charts that connect the dots are a simple way to show that the points are related to each other.

TYPES OF VISUALIZATIONS

NUMBER

A simple number, like how many visitors per day, can be very powerful. Use it to show something at a very high level as part of an overall dashboard.

TABLE

A table is a very effective way to convey lots of information, but be very careful. It works for comparison if we keep it small, like a 2x2.

  1. Use the Superstore_Data_Sample data source
  2. Drag Category or Segment to Rows
  3. Drag Measure Names (or Order Date) to Columns; the column headers will show that there are no measure values to display yet
  4. Drag Measure Values (or Profit) into the empty table
  5. The measure values will now appear
  6. Drag Sub-Category into Rows
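Outside Tableau, the same kind of table can be sketched as a pivot that sums Profit for every (Category, year) cell. This is a plain-Python illustration, not what Tableau does internally; the rows are hypothetical stand-ins for Superstore_Data_Sample, and the field names `Category`, `Year` and `Profit` are assumptions.

```python
from collections import defaultdict

# Hypothetical rows standing in for Superstore_Data_Sample.
orders = [
    {"Category": "Furniture", "Year": 2021, "Profit": 120.0},
    {"Category": "Furniture", "Year": 2022, "Profit": -35.5},
    {"Category": "Technology", "Year": 2021, "Profit": 410.0},
    {"Category": "Technology", "Year": 2021, "Profit": 90.0},
    {"Category": "Office Supplies", "Year": 2022, "Profit": 64.25},
]


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) pair, like a crosstab."""
    table = defaultdict(lambda: defaultdict(float))
    for r in rows:
        table[r[row_key]][r[col_key]] += r[value_key]
    return table


table = pivot_sum(orders, "Category", "Year", "Profit")
years = sorted({r["Year"] for r in orders})

# Print the table: one row per category, one column per year (missing cells show 0.00).
print("Category".ljust(16) + "".join(str(y).rjust(10) for y in years))
for category in sorted(table):
    cells = "".join(f"{table[category].get(y, 0.0):10.2f}" for y in years)
    print(category.ljust(16) + cells)
```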

HEAT/HIGHLIGHT TABLE

Shows the relationship between two categories based on a third measure. It can identify the top and bottom tiers across many combinations of outcomes. A highlight table is a heat map that also displays the value of the third measure as a number in each intersecting cell.

  1. Let's change our table from above to a heat map (Tableau calls it a Highlight Table)
  2. You will see that the option is greyed out because we have too many measure values
  3. Remove all measure values except Profit
  4. Now the option becomes available
  5. Click on it and you will see the heat map/highlight table
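A highlight table colors each cell by its value. The sketch below buckets each cell into one of five shades using simple min-max scaling; the five-step palette and the cell values are hypothetical, and Tableau's own color ramp may be computed differently.

```python
def shade_level(value, lo, hi, steps=5):
    """Bucket value into 0..steps-1 between lo and hi (0 = lightest, steps-1 = darkest)."""
    if hi == lo:
        return 0
    fraction = (value - lo) / (hi - lo)
    return min(int(fraction * steps), steps - 1)


# Hypothetical Profit values for a Region x Category highlight table.
cells = {
    ("East", "Furniture"): 0.0,
    ("East", "Technology"): 50.0,
    ("West", "Furniture"): 100.0,
    ("West", "Technology"): 75.0,
}
lo, hi = min(cells.values()), max(cells.values())
shades = {cell: shade_level(value, lo, hi) for cell, value in cells.items()}
for (region, category), level in shades.items():
    print(f"{region:<5} {category:<11} profit={cells[(region, category)]:>6.1f} shade={level}")
```

The number stays printed in the cell, which is what distinguishes a highlight table from a plain heat map.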

TREE MAP

Shows hierarchical data as proportions of a whole. It is very similar to a heat map, except that each cell is sized according to the value it contains as well as colored by it. Tree maps also let us see the entire data set together in one view.

You can combine a bar chart with a tree map: the bar chart gives a view of the overall grouping, and the tree map shows the breakdown within each bar (grouping).
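The sizing rule behind a tree map can be sketched as "each cell's area is its share of the total". This hypothetical helper only computes those shares (the layout algorithm that packs the rectangles is out of scope here), and the sales figures are invented.

```python
def treemap_shares(values):
    """Return each item's share of the total; a tree map sizes cell areas by these shares."""
    if any(v < 0 for v in values.values()):
        # An area cannot be negative, so negative measures belong on color, not size.
        raise ValueError("tree map sizes must be non-negative")
    total = sum(values.values())
    return {name: v / total for name, v in values.items()}


# Hypothetical Sales by Category.
sales = {"Technology": 500.0, "Furniture": 300.0, "Office Supplies": 200.0}
for name, share in treemap_shares(sales).items():
    print(f"{name:<16} {share:.0%} of the area")
```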

SCATTER PLOTS

Use scatter plots to investigate relationships between quantitative values. When you have a large amount of continuous data (values that can take an unlimited number of outcomes) compared against another variable, scatter plots are powerful because they show the distribution of the outcomes. They are typically used in statistics and forecasting, especially when you are looking for a correlation between the two variables.

  1. For scatter plots you need two measures, so we can plot one on each axis
  2. Take Category off the sheet
  3. Add Discount
  4. So we are plotting Discount vs. Profit
  5. Now only one circle shows up
  6. Go to the Analysis menu and uncheck Aggregate Measures
  7. We don't want aggregation because we want to see every single Discount value as its own point
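Why only one circle appears: with aggregation on, all rows collapse into a single point (SUM of Discount, SUM of Profit); with aggregation off, every row keeps its own point. A plain-Python sketch with hypothetical (Discount, Profit) rows:

```python
# Hypothetical (Discount, Profit) rows.
rows = [(0.0, 41.9), (0.2, 5.5), (0.45, -383.0), (0.2, 2.5), (0.8, -123.9)]

# Aggregated: both measures are summed across all rows, leaving one point.
aggregated = [(sum(d for d, _ in rows), sum(p for _, p in rows))]

# Disaggregated: every row becomes its own point on the scatter plot.
disaggregated = list(rows)

print(len(aggregated), len(disaggregated))  # 1 5
```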

BUBBLE CHARTS

Used to accentuate the data of a scatter plot or map plot: circles of various sizes convey additional meaning about the data.

TREND LINE

  1. Let's add a trend line to the scatter plot above
  2. In the left panel, switch from the Data tab to the Analytics tab
  3. Drag Trend Line onto the view and drop it on Linear
  4. Now we see profit trending down as discount increases

LINE GRAPHS

  1. Very effective for plotting a variable across time.
  2. You can easily see if there is a trend and allows for potential forecasting.
  3. Continuous data is best in line charts.
  4. To show trends use line charts.

BAR GRAPHS

Bar graphs are the most used and easiest to read visualizations. Which version to use (waterfall graphs and many other variations) depends on how much information you are trying to display. They work well for displaying discrete data such as categories. Stacked columns can be used to display composition. If we want to rank categories, we can use a bar chart sorted by rank.

AREA MAP GRAPHS

Data with a geographic component should be plotted on a map graph. Use area maps for rates rather than totals, and use a sensible base geography.

SYMBOL MAPS

Use for totals rather than rates. Be careful, as small differences in symbol size will be very hard to see.

GANTT CHARTS

Show duration over time. They are a great way to display the timeline of a project (start, finish, milestones, accomplishments) and are used in project management and resource planning to display a project's schedule, deliverables, deadlines and available resources. If geography applies, a map chart and a Gantt chart make a good combination on the same dashboard.

PIE CHARTS

They can be useful if you keep them to three or fewer categories; otherwise, TRY TO AVOID THEM AT ALL COSTS.

HISTOGRAMS

Used to display how data is distributed across groups, bins or ranges, to better understand the distribution of your data. Experiment with different groupings (bin sizes) to see which one is most effective. Histograms help you understand how your data groups together and narrow down the focus of your research.
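To experiment with groupings, it helps to see how the same values fall into different bin widths. A minimal sketch, assuming equal-width bins that start at 0 and hypothetical order quantities:

```python
from collections import Counter


def histogram(values, bin_width, start=0.0):
    """Count values per bin [start + k*w, start + (k+1)*w), keeping empty bins as 0.

    Assumes every value is >= start.
    """
    counts = Counter(int((v - start) // bin_width) for v in values)
    return {start + k * bin_width: counts[k] for k in range(max(counts) + 1)}


# Hypothetical order quantities.
quantities = [1, 2, 2, 3, 5, 6, 7, 7, 8, 14]
for width in (5, 2):
    print(f"bin width {width}: {histogram(quantities, width)}")
```

Wide bins show the overall shape; narrow bins reveal gaps (here, empty bins at 10 and 12) and the isolated value at 14.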

BULLET CHARTS

Bullet charts are used to track progress toward a goal, that is, to evaluate the performance of a metric against a goal. A target such as projected sales or cost… is plotted as a vertical line, and the actual value is shown as a bar, so you can easily see whether the target has been met. You can also color the bar by the % of the goal reached, so a red or green bar tells you whether it met the goal without having to squint to see where the goal marker is on the bar.

In many cases you can use it instead of a bar chart.
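The coloring idea can be sketched as: express the actual value as a percentage of the target, then color the bar green when the goal is met and red otherwise. The 100% threshold and the two color names are assumptions for illustration.

```python
def bullet_status(actual, target):
    """Return (percent of target reached, bar color) for a bullet chart."""
    if target <= 0:
        raise ValueError("target must be positive")
    pct = actual * 100 / target
    # Assumed rule: green once the goal is reached, red otherwise.
    return pct, ("green" if actual >= target else "red")


print(bullet_status(120_000, 100_000))  # (120.0, 'green')
print(bullet_status(85_000, 100_000))   # (85.0, 'red')
```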

BOX – WHISKER PLOTS

Show the distribution of a set of data: how it is skewed towards one end, and where the outliers are. It looks like the very familiar (to me) candlestick chart, but with the following breakdown:

  1. The box, which marks the median along with the first and third quartiles (the 25th and 75th percentiles), so it contains the middle 50% of the data.
  2. The whiskers, which typically extend to the furthest data points within 1.5 times the interquartile range (IQR, the difference between the first and third quartiles) beyond the box.
  3. The whiskers can also be used to show the maximum and minimum points within the data.
  4. Use box and whisker plots to show the distribution of a set of data, for example to understand your data at a glance, see how it is skewed towards one end, or identify outliers.
  5. Consider hiding the points within the box. This helps a viewer focus on the outliers.
  6. Consider comparing box plots across categorical dimensions. Box plots let you compare distributions between data sets quickly.
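The box and whisker values can be computed with Python's `statistics.quantiles`. This sketch assumes the `"inclusive"` quartile method and the common 1.5 × IQR whisker rule; other tools (Tableau included) may use a different quartile method, so their numbers can differ slightly. The data is hypothetical.

```python
from statistics import quantiles


def box_whisker(data, k=1.5):
    """Quartiles, whisker ends (furthest points within k * IQR of the box) and outliers."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }


stats = box_whisker([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```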

ANALYSIS

EXPLORATORY

Before you start working on the data, you need to explore it. See what it contains and what each dimension means!!! This is very important. Many times we assume from the way a column is labeled that the data means what WE ASSUME. Ask to be sure.

You need to know where your data comes from. How is it labeled? Is it structured or unstructured data? Are there holes in your data? Are you seeing any trends with the data? Any relationships forming as you become familiar with it? Knowing these items will make it easier for you to analyze your data.
When exploring your data, although there is no fixed process to follow, you should always have a clear understanding of what questions you are trying to answer. Knowing which questions you are trying to answer will make it much easier for you to choose your variables for analysis, and also determine which analyses to use when analyzing the data.
For example, let's assume you have a data set of sales orders across the US over a certain period of time, say the past 20 years. In the exploratory phase, you can try different combinations of variables to find associations between states, sales personnel, sales orders, etc., over the 20-year period. Suppose that in the exploratory phase you find that certain states have higher than normal sales orders.
Now, in the explanatory phase, you could show which states have the highest sales orders and try to explain why. You could show that the states with the highest sales orders have the highest population, or that they have the most experienced sales personnel.
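One quick exploratory check for "holes in your data" is counting missing or empty values per column. A minimal sketch, assuming rows arrive as dictionaries and that both a missing key and an empty string/`None` count as a hole; the column names and rows are hypothetical.

```python
def missing_report(rows, columns):
    """Count missing or empty values per column (0 is a real value, not a hole)."""
    return {col: sum(1 for r in rows if r.get(col) in (None, "")) for col in columns}


# Hypothetical sales-order rows with a few holes.
rows = [
    {"State": "CA", "Sales Rep": "Lee", "Sales": 1200},
    {"State": "", "Sales Rep": "Diaz", "Sales": 950},
    {"State": "TX", "Sales": None},
    {"State": "NY", "Sales Rep": "", "Sales": 0},
]
print(missing_report(rows, ["State", "Sales Rep", "Sales"]))
```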

EXPLANATORY

Explanatory analysis is what happens when you have something specific you want to show an audience. It involves explaining the outcomes or relationships you found during the exploratory phase. In other words, it is what you do when you want or need to tell a story with the data.

ANSCOMBE’S QUARTET

Anscombe's quartet is a landmark example in data analysis. Presented by the statistician Francis Anscombe in 1973, it shows four data sets with nearly identical summary statistics that look completely different when plotted. His point: you can't just use summary statistics to understand the data; you have to visualize it.
In a world where data sets can reach trillions of records, Anscombe's argument is even more relevant today. This is not to say that summary statistics aren't important. They are absolutely essential, but you must also visualize the data.

ETHICS & GOOD PRACTICE

  1. Zero Baseline – Don't start the axis at an adjusted (non-zero) baseline to exaggerate numbers. When you do, the visual difference looks much larger than the actual difference, which misleads viewers and reduces the credibility of the data. You will see this often in the NEWS and on TV.
  2. In this graph, at first glance you see a sharp decrease after 2005. But if you look closer, it shows the exact opposite, because the ZERO of the axis is at the top, which reverses everything. This is not the way we plot graphs.
  3. PIE CHART – Use a bar chart instead. Here the same data is shown in two charts. In the pie chart, persons A, B, and C appear to have very similar values, and you still have to label each slice with a number. The simpler the better, so skip the pie and just use the bar chart.
  4. CONTEXT – Look at this chart. It is meant to display the amount of funds that charity events have raised, and its purpose is to show which type of activity raised more money. Do you think it is effective? Why is the Running (yellow) tube located in the middle, yet labeled as the first one down below? How do we know how many cycling events it took to raise 61 million dollars? Did it take 100 events, one event, how many, and how does that compare to walking? Do they want us to assume that cycling raises more money? If so, the chart is misleading because of the questions I raised.
  5. Do not use a 3D visualization for what is a two-dimensional measurement.
  6. Use a bar graph showing the significant amount raised.
  7. Find a way to show that cycling raised more than the other two combined.
  8. This chart is meant to show the difference in sales for the first month of legal weed sales in each state. The yellow coloring is misleading: why is OR almost all yellow? That could only happen if the y-axis were skewed to make 3.48 the maximum value. The chart says it is based on tax. What is the tax rate in each state, and how do we know one is not double the others? How do we know the population that has access to dispensaries in each state? How do we know the maximum number of ounces that can legally be sold, if applicable? We can display the tax rate and back-calculate the actual sales that yielded the tax income, so we can compare apples to apples, since the tax rate is 12.9% in CO, 37% in WA and 17% in OR.
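Two pieces of arithmetic behind items 1 and 8, sketched with hypothetical numbers: how much a non-zero baseline exaggerates a difference, and how to back-calculate sales from tax revenue (sales = tax revenue / tax rate). The tax rates are the ones quoted above (CO read as 12.9%), treated as a single flat rate per state, which is a simplification; the revenue figures are invented for illustration and are not real data.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b when the axis starts at baseline."""
    return (b - baseline) / (a - baseline)


print(drawn_ratio(100, 110))               # zero baseline: bars differ by 10%
print(drawn_ratio(100, 110, baseline=95))  # truncated axis: one bar looks 3x longer

# Tax rates quoted above; a single flat rate per state is a simplifying assumption.
TAX_RATES = {"CO": 0.129, "WA": 0.37, "OR": 0.17}
# Hypothetical: the same $2.0M of first-month tax revenue in each state (invented figures).
tax_revenue_millions = {"CO": 2.0, "WA": 2.0, "OR": 2.0}

implied_sales = {state: tax_revenue_millions[state] / rate for state, rate in TAX_RATES.items()}
for state, sales in implied_sales.items():
    print(f"{state}: ${sales:.2f}M in sales behind ${tax_revenue_millions[state]:.1f}M of tax")
```

Identical tax revenue implies very different sales (CO far above WA), which is why a chart built on tax alone cannot compare the states fairly.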
