VISUALIZATION BASICS
- The purpose of visualization is insight, not pictures, so turn data into something clear and meaningful in the minds of the audience.
- Much of the human brain is devoted to fast visual processing
- We are built to detect, match and make sense of patterns
- A visual approach is usually much more effective than a tabular display of data
- Assign visual attributes, such as color and shape, to different data
- A good viz allows users to see what we want them to see before they know they have seen it
- Overview first, zoom in, filter, details on demand
WHO WHAT WHY HOW
- WHO: start by knowing your audience: the stakeholders and the subject-matter experts (SMEs)
- If you are new, ask the SMEs what they are looking for
- WHAT data is available
- WHY: the business goal we are going after; why are we doing this at all
- HOW is the result
VISUAL PERCEPTION
Automatic and Immediate – Speed always outweighs accuracy. The brain is wired to quickly make assumptions.
How do you feel about this visualization? How would you reduce clutter?
COGNITIVE LOAD
Cognitive load is the amount of mental effort that we use to get the information that we need. We must reduce clutter to reduce the cognitive load.
INTRINSIC
The amount of memory that we need to understand something
EXTRANEOUS
The amount of extra brain power that we need to deal with poorly designed visualizations.
GERMANE
The effort the brain spends looking for patterns to develop context.
CONTRAST
Use contrast to make charts very effective. For example, grey out all bars except the specific bar for the item we are trying to focus on.
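The grey-out idea above can be sketched in a few lines. This is only an illustration: the two hex colors, the function name `contrast_colors`, and the category names are assumptions, not something prescribed by these notes.

```python
# Hypothetical palette: one accent color for the focus bar, grey for the rest.
FOCUS_COLOR = "#d62728"  # assumed accent (a red)
MUTED_COLOR = "#c7c7c7"  # assumed light grey


def contrast_colors(labels, focus):
    """Return one color per label: the accent for `focus`, grey otherwise."""
    return [FOCUS_COLOR if label == focus else MUTED_COLOR for label in labels]


# Hypothetical category names, used only for illustration.
categories = ["Furniture", "Office Supplies", "Technology"]
colors = contrast_colors(categories, focus="Technology")
print(colors)
```

A list like this can be handed to a plotting library as the per-bar color, so only the bar under discussion stands out.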
PRE-ATTENTIVE ATTRIBUTES
Even if you are new to reading data in a chart, you already have the built-in ability to spot light and dark colors, large and small shapes, and groups and orientations of objects. These are referred to as pre-attentive attributes: visual cues that humans process automatically with sensory memory, and that get the brain to focus immediately on certain aspects of a visualization. Visual analytics leverages them; for example, use color to draw attention.
We have three types of memory: iconic, short term, and long term. For visualization we need to focus on iconic and short-term memory.
Change one of these to focus the user’s attention:
- Size
- Color
- Orientation
- Shape
- Line composition
- Enclosure
- Intensity
- Position
ICONIC
This is how we get a sense of what’s happening in front of us in that split second before the brain kicks in.
SHORT TERM
This is where we deal with the cognitive load.
LONG TERM
This is what we retain for a very long time.
VISUAL ENCODING
- Translate data into visual code. For example, if we display a long string of text, it is nearly impossible for the brain to pick out what we are trying to convey. If we color code (encode) specific letters with a different color while fading out the rest, the message stands out clearly. If we assign many different colors to different letters, the message becomes even harder to see.
- Pre-attentive attributes – Use color hue, orientation, texture, position
- Clarify data for users
- Help people clearly and accurately see what we are trying to convey
Consider how your designs can help people make clear, accurate interpretations and gain useful insights based on what they see. Two fundamental systems drive how we think and make judgements:
- Automatic and immediate: speed always outweighs accuracy. The brain is wired to quickly make assumptions.
- Deliberate and cognitive: slower, more careful thinking.
In the image above, the middle bar appears at first glance to have a gradient. Only when you slow down and analyze it further do you see that it does not: your first instinct is to see a colored gradient that is not there.
Make sure the way your visualization is perceived matches what you are trying to convey.
REDUCE CLUTTER
GESTALT PRINCIPLES
The Gestalt school of thought emerged in the early part of the last century to study how the brain perceives the world around us. Its principles describe how we ought to organize our visualizations to convey information effectively.
PROXIMITY
SIMILARITY
ENCLOSURE
CLOSURE
CONTINUITY
CONNECTION
Line charts that connect dots are a simple way to show that they are related to each other
TYPES OF VISUALIZATIONS
NUMBER
A simple number can be very powerful – like how many visitors per day. Show something at a very high level as part of an overall dashboard.
TABLE
A table is a very effective way to convey lots of information, but be very careful. It can be used for comparison if we keep it small, like a 2x2.
- Using Superstore_Data_Sample
- Drag Category or Segment to Rows
- Drag Measure Names (or Order Date) to Columns: the column headers appear, but the table shows No Measure Values to display
- Drag Measure Values (or Profit) into the empty table
- The measure values now appear
- Drag Subcategory into Rows
HEAT/HIGHLIGHT TABLE
Shows the relationship between two categories based on a third value, a measure. It can identify the top and bottom tiers among numerous outcomes. A Highlight Table is a Heat Map that also prints the value of that measure in each intersecting cell.
- Let’s change our table from above to a heat map (Tableau calls it a Highlight Table)
- You will see it is greyed out because we have too many Measure Values
- Take out all measure values except profit
- Now you see it becomes available to use
- Click on it and you will see the Heat Map/Highlight Table
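What a highlight table computes can be sketched outside Tableau: sum one measure for every combination of two categories, then look for the top and bottom cells. The rows below are invented for illustration and are not real Superstore figures; `highlight_table` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical rows: (category, segment, profit). Not real Superstore figures.
rows = [
    ("Furniture", "Consumer", 120.0),
    ("Furniture", "Corporate", -40.0),
    ("Technology", "Consumer", 300.0),
    ("Technology", "Corporate", 210.0),
    ("Furniture", "Consumer", 30.0),
]


def highlight_table(records):
    """Sum the measure for every (row category, column category) cell."""
    cells = defaultdict(float)
    for row_key, col_key, value in records:
        cells[(row_key, col_key)] += value
    return dict(cells)


cells = highlight_table(rows)
top = max(cells, key=cells.get)      # cell that would get the strongest color
bottom = min(cells, key=cells.get)   # cell at the other end of the color scale
print(cells)
print("top cell:", top, "bottom cell:", bottom)
```

The color scale in the chart is just a visual encoding of these cell totals, which is why only one measure can drive it at a time.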
TREE MAP
Shows hierarchical data as a proportion of a whole. It is very similar to the heat map, except that each cell is sized according to the value it contains as well as varying in color. A tree map also lets us see the entire data set together in one view.
You can combine a bar chart with a tree map: the bar chart gives a view of the overall grouping, and the tree map shows the breakdown within each bar/grouping.
SCATTER PLOTS
Investigate relationships between quantitative values. Scatter plots are powerful when you have a large number of continuous values, which can take an unlimited number of possible values, and want to compare them against another variable. They show the distribution of the outcome, are typically used in statistics and forecasting, and help when you are looking for a correlation between the two variables.
- For scatter plots you need two measures so we can plot on two axes
- Take Category off the sheet
- Add Discount
- So we are plotting discount vs profit
- Now only one circle shows up
- Go to the Analysis menu and uncheck Aggregate Measures
- We want to see every single data point, which is why we don’t want the measures aggregated
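Why only one circle appears at first can be sketched as follows, assuming the aggregation in play is a sum of each measure. The order lines are invented, and both function names are hypothetical.

```python
# Hypothetical order lines: (discount, profit). Not real Superstore figures.
orders = [(0.0, 50.0), (0.2, 20.0), (0.2, 10.0), (0.5, -30.0)]


def aggregated_point(records):
    """One mark: each measure summed over all rows (the aggregated view)."""
    return (sum(d for d, _ in records), sum(p for _, p in records))


def disaggregated_points(records):
    """One mark per row (the view with aggregation turned off)."""
    return list(records)


print(aggregated_point(orders))           # a single (discount, profit) pair
print(len(disaggregated_points(orders)))  # one point per order line
```

With aggregation on, the whole data set collapses into a single pair of totals; with it off, every row becomes its own point and the relationship becomes visible.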
BUBBLE CHARTS
Are used to accentuate the effects or the data of a scatter plot or map plot. Circles of various sizes convey meaning about the data.
TREND LINE
- Let’s add a trend line to the scatter plot above
- In the left panel, next to the Data tab, open Analytics
- Drag Trend Line into the view and drop it on Linear
- Now we see profit trending down as the discount increases
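A linear trend line can be approximated by hand with an ordinary least-squares fit. This is a minimal sketch under the assumption that the trend is a simple straight-line fit; it is not a description of Tableau's internals, and the data points are invented.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (discount, profit) points, not real Superstore figures.
discounts = [0.0, 0.1, 0.2, 0.3, 0.4]
profits = [40.0, 30.0, 20.0, 10.0, 0.0]

slope, intercept = linear_trend(discounts, profits)
print(slope, intercept)
```

A negative slope is the numeric counterpart of the downward line in the chart: profit falls as the discount rises.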
LINE GRAPHS
- Very effective to plot a variable across time.
- You can easily see if there is a trend and allows for potential forecasting.
- Continuous data is best in line charts.
- To show trends use line charts.
BAR GRAPHS
How you use them depends on how much information you are trying to display. There are waterfall graphs and many other versions of bar graphs. Bar graphs are the most used and the easiest to read in visualization. They are also good for displaying discrete categories. A stacked column can be used to show composition. If we want to rank categories, we can use a bar chart sorted by rank.
AREA MAP GRAPHS
Any data with a geographic component should be plotted on a map graph. Use for rates rather than totals. Use a sensible base geography.
SYMBOL MAPS
Use for totals rather than rates. Be careful as small differences will be very hard to see.
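The difference between totals and rates can be made concrete with a small calculation. The regions, counts, and populations below are invented, and `rate_per_100k` is a hypothetical helper.

```python
# Hypothetical regions: name -> (total orders, population). Invented numbers.
regions = {
    "Region A": (500, 1_000_000),
    "Region B": (300, 200_000),
}


def rate_per_100k(total, population):
    """Normalize a total by population so regions of different size compare fairly."""
    return total / population * 100_000


for name, (total, population) in regions.items():
    print(name, "total:", total, "rate per 100k:", rate_per_100k(total, population))
```

Region A has the larger total, but Region B has the higher rate, so the two map types can tell opposite stories about the same data.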
GANTT CHARTS
Show duration over time, as used in project management (PM) and resource planning. They are a great way to display the time spans of a project: start, finish, milestones, accomplishments. They display a project’s schedule, deliverables, deadlines, and available resources. A good combination is map charts and Gantt charts on the same dashboard if geography is applicable.
PIE CHARTS
Can be useful if you keep them to three categories or fewer. Otherwise, avoid them at all costs.
HISTOGRAMS
Display distributions across categories, groups, bins, or ranges to better understand the distribution of your data. Experiment with different groupings to see which one is more effective. Histograms help you understand how your data is grouped and narrow down the focus of your research.
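Experimenting with groupings amounts to re-counting the same values with different bin widths. A minimal sketch, with invented quantities and a hypothetical `histogram` helper that labels each bin by its lower edge:

```python
from collections import Counter


def histogram(values, bin_width):
    """Count values per bin; each bin is labeled by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical order quantities.
quantities = [1, 2, 2, 3, 5, 7, 8, 8, 9, 12]

print(histogram(quantities, bin_width=5))  # coarse grouping
print(histogram(quantities, bin_width=2))  # finer grouping
```

The coarse bins show the overall shape, while the finer bins reveal gaps and clusters that the coarse view hides.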
BULLET CHARTS
Bullet charts are used to track progress toward a goal, that is, to evaluate the performance of a metric against a goal. If we have a projected value (sales, cost…), it is plotted as a vertical line and the actual value is shown as a bar, so you can easily see whether the target has been met. You can also color the bar by the % of the goal reached, so that a red or green sales bar tells you whether it met the goal without having to squint to see where the goal/bullet sits on the bar.
A bullet chart can sometimes be used instead of a bar chart.
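The red/green coloring described above is a simple comparison of the actual value against the goal. A sketch with invented sales figures; the function name and the rule that exactly meeting the goal counts as green are assumptions.

```python
def bullet_status(actual, target):
    """Return (fraction of target reached, color) for one bullet-chart bar."""
    if target <= 0:
        raise ValueError("target must be positive")
    fraction = actual / target
    # Assumption: meeting the goal exactly counts as green.
    color = "green" if fraction >= 1.0 else "red"
    return fraction, color


# Hypothetical actual vs projected sales.
print(bullet_status(actual=90_000, target=100_000))
print(bullet_status(actual=120_000, target=100_000))
```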
BOX – WHISKER PLOTS
Show the distribution of a set of data. Use them to understand how your data is skewed towards one end and to identify outliers. This resembles the very familiar (to me) candlestick chart, but with the following breakdown:
- The box, which spans the first to third quartiles and marks the median inside it; the quartiles sit 25% of the data below and above the median.
- The whiskers, which typically extend to the data within 1.5 times the interquartile range (IQR), the difference between the first and third quartiles.
- The whiskers can also be used to show the maximum and minimum points within the data.
Guidelines for box and whisker plots:
- Use them to understand your data at a glance, to see how data is skewed towards one end, or to identify outliers.
- Consider hiding the points within the box. This helps a viewer focus on the outliers.
- Consider comparing box plots across categorical dimensions. Box plots are great for comparing distributions between data sets quickly.
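The pieces of a box plot can be computed directly. This sketch uses Python's `statistics.quantiles` with the "inclusive" method; quartile conventions differ between tools, so the exact numbers another tool shows may differ slightly. The profit values are invented and `box_plot_stats` is a hypothetical helper.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, 1.5 x IQR fences, whisker ends, and outliers for one data set."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme values that still fall inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


# Hypothetical profits with one extreme value.
profits = [10, 12, 14, 15, 16, 18, 20, 22, 95]
print(box_plot_stats(profits))
```

Anything outside the fences is drawn as an individual point, which is why hiding the points inside the box makes those outliers easier to spot.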
ANALYSIS
EXPLORATORY
Before you start working on the data you need to explore it. See what it contains and what each dimension means. This is very important. Many times we assume from the way a column is labeled that the data means what we assume it means. Ask to be sure.
EXPLANATORY
ANSCOMBE’S QUARTET
ETHICS & GOOD PRACTICE
- Zero Baseline – Don’t start with an adjusted baseline to exaggerate numbers. When you do, the visual difference is much larger than the actual difference, which misleads viewers and reduces the credibility of the data. You will see this often in the news and on TV
- In this graph you see what looks like a sharp decrease after 2005. But if you look closer it is the exact opposite, because the zero axis is on top, which reverses everything. This is not the way we plot graphs.
- PIE CHART – Use a bar chart instead. Here the same data is shown in two charts. In the pie chart, persons A, B, and C appear to have very similar values, and you still have to label each slice with a number. The simpler the better, so skip it and just use the bar chart.
- CONTEXT – Look at this chart. It is meant to display the amount of funds charity events have raised, and its purpose is to show which type of activity raised more money. Do you think it is effective? Why is the Running/Yellow tube located in the middle, yet labeled as the first one down below? How do we know how many cycling events it took to raise 61 million dollars? Did it take 100 events, or one, and how does that compare to walking? Do they want us to assume that cycling raises more money? If so, the chart is misleading because it leaves these questions unanswered.
- Do not use 3D visualization. It is a two dimensional measurement.
- Use a bar graph showing the significant amount raised
- Use a way to show that cycling raised more than the two combined
- This is meant to show the difference in sales for the first month of legal weed in each of the states. The yellow coloring is misleading: why is OR almost all yellow? That could only happen if the y-axis has been skewed to make 3.48 the maximum value. It says it is based on tax. What is the tax rate per state, and how do we know one is not double the others? How do we know the population that has access to the dispensaries in each state? How do we know the maximum number of ounces that can legally be sold, if applicable? We can display the tax rate and back-calculate the actual sales that yielded the tax income, so we can compare apples to apples, since the tax rate is 12.9% in CO, 37% in WA, and 17% in OR
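The back-calculation mentioned in the last point can be sketched as sales = tax collected / tax rate. The tax rates are the ones quoted in these notes; the tax revenue figures are invented placeholders, and the sketch assumes the tax is a flat percentage of the sale price.

```python
# Tax rates as quoted in the notes; the revenue figures are hypothetical.
TAX_RATES = {"CO": 0.129, "WA": 0.37, "OR": 0.17}
hypothetical_tax_revenue = {"CO": 1.0, "WA": 1.0, "OR": 1.0}  # $ millions, invented


def implied_sales(tax_revenue, tax_rate):
    """Back-calculate sales from tax collected: sales = tax / rate."""
    return tax_revenue / tax_rate


for state, rate in TAX_RATES.items():
    sales = implied_sales(hypothetical_tax_revenue[state], rate)
    print(f"{state}: {sales:.2f} million in implied sales")
```

Even with identical tax revenue, the implied sales differ widely between states, which is why comparing raw tax income across different tax rates is misleading.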