Tuesday, December 31, 2019

"How to" Create a Normal Distribution Curve


How-To Create a Normal Distribution Chart

Occasionally there are Forum questions on creating normal distribution curves.  



 
The approach here relies heavily on the concept of data densification develop and refined by Bora Beran, Joe Mako, Jonathan Drummey, Ken Flerlage and others – Readers are encouraged to review their works on creating shapes and charts using the process.


This is a straightforward application of the normal distribution formula:
               
Y= (1/[σ*sqrt(2π])*e
Where  σ= Standard Deviation
                µ=Mean
 
You may find more elegant approaches – when you do please let me know so we all can benefit

The data used here is the “Superstore” data set that comes with Tableau desktop –  while the data set is detailed down to the transaction level the examples here go to the Year/Category level – the concept can be adapted to any level in the Superstore data set or your own

Overview –

Tableau does not draw curved lines – so we simulate the familiar smooth bell curve of the normal distribution with a series of straight lines  on an X-Y Cartesian coordinate grid.  The densification process is used to create intervals on the X-Axis (Sales in these examples).  As the number of intervals increases the plot becomes less angular. There is a parameter to adjust the number of intervals – somewhere between 50-100 intervals produces a reasonably smooth curve –play with it using your own data to see what works for you

– At this point if you have not familiar with Data Densification

STOP


and review one of the resources noted above. 

Step 1 Prepare the data for densification


Duplicate the data set by creating a union of the data with itself. Tableau will create a Dimension "Table Name" that can be used to distinguish the original data set from the duplicate - in the example the original data set is "Orders" and the duplicate in "Orders1"





 
Next create a calculation to define the ends point needed for the X-values


Path =   if [Table Name]="Orders1" then 1 else [max number of intervals] end

                Where [Max number of intervals] is the parameter to vary the x axis spacing

Use the Path formula to create bins –



Set the Bin Size to 1 - it will be the increment on the "X-axis" - (note I changed the bin name to Padded)



 

 
Step 2 – Create the X-values using the bins to pad the data NOTE I used a +- 4 sigma range

First, determine the µ and σ values – the average and the standard deviation values in this example on Sales – but we want to be able to vary the results by year and by category

For that use an LOD’s to create the needed combinations of year and category and the widow function forces the calculation to be executed in each bin across the viz


Mu=  window_min( avg({ FIXED year([Order Date]),[Category] :avg([Sales]) }) )

                Sigma= window_min(min({ FIXED year([Order Date]),[Category] :STDEV([Sales]) }) )

Where  (index()-1) is the bin number (interval) on the x axis


                X-value = (-4*[sigma])+([index-1 ]*(8*[sigma]))/window_max(max([Path])) 


Step 3 – Solve the Normal Distribution formula for Y 

Table calculations are used to determine then Y value at each bin (x-value)
 
For clarity there are 2 calculations 

The coefficient = (1/[σ*sqrt(2π])

                                 Coeff= 1/(sqrt(2*pi()*([sigma]^2)))

The Exponent =
                            Exponent = (-((([X value (Sales)])-[mu])^2)/((2*([sigma]^2)))  )
 
And then combined them in a single formula for Y (note Tableau does not have an "e" function but the EXP() will return the correct value)
                                                
Y= ([coeff]*EXP([exponent]) )
 

Step 4 – Create the viz

Place the X and Y values on columns and rows then add the Mu and Sigma values to the Detail tile




Then add Padded (the bins) to the viz and set the calculation across the bins (all places)





You can also make normalized distribution charts using the formula below

                x-normalized = ([mu]-[X value (Sales)])/[sigma]

and replace the x-value in any of the charts





 
Finally, I added a parameter and a filter to allow the user to change the number of standard deviations visible in the charts
               
 N-Std Dev Filter =  [x normalized (Sales)] >= -[enter number of standard deviations] and              [x normalized (Sales)]<=[enter number of standard deviations]
 

Place it on the filter shelf – it too is a table calculation set to use Padded – then set to True 

The workbook containing these examples and more can be found on my Tableau Public site at https://public.tableau.com/profile/jim.dehner#!/vizhome/Howtocreateanormaldistributionchart_v2018_3/summary

Jim








Monday, December 30, 2019

YoY - Three different approaches

My blog has moved to jimdehner.com 

Thank you for all the support over the past 2 years - I recently updated my blog to make it easier to read, improve the rendering of images and tie back to video presentations that are associated with the content - Check it out at  Jim's new blog  - 





Beyond getting started and want to up your viz game? -

Is this blog for you?

The post here assume you are familiar with writing calculations, creating text tables, and using data in bar and line charts.  You have used Tableau to replicate spreadsheet reports, done some mapping and have created time series dependent line and bar charts.  Most of all you want to take you data visualization game up a notch.
Like many of you I started using Tableau with a background in spreadsheet calculators but really didn’t understand data base driven systems like Tableau.  The transition was anything but smooth. Writing cell-based formulas and copying them down a column came easy, but the concept of dimensions and measures escaped me. Don’t even think abut LOD’s and table calculations.  It took some work, but I became comfortable with the different language and data structure – I call that the “Getting Started” phase.  If you are through Getting Started and ready to use Tableau to tell stories with your viz, use advanced calculations to get deeper into the data or just want to better understand why Tableau does that – then this blog is for you.
No claim is made that the solutions here are the most elegant or unique approach to the problem  – Many would be described as “brute force”, I come from an engineering and marketing background and it is reflected in the approach. 
Finally, I hope to help you get a better understanding of how Tableau works and why the data structure and order of your calculations are so important in getting the results you expect.
I also keep a number of solutions to FAQ’s from the Tableau Community Forum on my Tableau Public site at https://public.tableau.com/profile/jim.dehner#!/  Many include step-by-step how to instructions. I encourage you to check them out and download anything of interest.  That’s why they were created. 

Jim

Use LODs to create layer in your data set

My blog has moved to jimdehner.com 

Thank you for all the support over the past 2 years - I recently updated my blog to make it easier to read, improve the rendering of images and tie back to video presentations that are associated with the content - Check it out at  Jim's new blog  -   



Need help using LOD’s – you’re not alone – But how can they be used


Think of your data set as a pyramid.  The data you upload creates the base of the pyramid – the top is the sum of all the data.  LOD expressions give you a way to create layers in your data and use those layers to categorizes data into dynamic groups, sets or ways to workaround the limitations of table calculations - 


LOD Syntax

LODs use a key word – Fixed, Include, or Exclude - followed by a list of dimensions which set the level of the virtual layer you want to create in the data set -  For example

                                {Fixed  [Segment], [Category]    : …

In words it says “make all the combinations of Segment (3 in Superstore data)  and Category (3) ”  and create a layer that consists of 9 total values – then save them so they can be used later

The colon :   is the divider that ends the layer definition and starts the formula of the value to saved –

                                {Fixed  [Segment], [Category]    :   sum(Sales)} 

The result is a new layer in the data at the Segment/Category level




Creating multiple layers 

Adding dimensions before before the colon will define another layer in the data.  To create a new layer that includes Year level place the Year(order date) in the dimension list

{ FIXED [Segment],[Category],year([Order Date]): sum([Sales])}

Now in addition to layer at the Segment/Category level there is a layer at the Segment/Category/Year level





Filtering and the Order of Operation

LOD expressions can be filter but the results will be affected if dimensions placed in the Context Filter

First use a simple filter on a Dimension in the LOD expression – ( Segment ). The results are filtered as you would expect – the Consumer Segment is filtered out of the data – Three of the 9 values of the Segment/Category are filtered out of the result




Context filters affect

Add a filter for a Dimension not in the LOD expression Region in the example – filter the Central region and the 9 values from the LOD are NOT changed -






Change the Region filter to Context – the results do change





What’s going on?  The  position of the Context filter in Order of Operation causes the results to change.




Context filters are applied BEFORE the LOD calculation in the Order of Operation – if a dimension is placed in Context Tableau will apply the filter before the LOD is calculated –

In this case – the Central Region is filtered out before calculating the Segment/Category LOD

Dimension filters are applied AFTER the LOD is calculated – the results reflect the un-filtered Segment/Category LOD






You control when the filters are applied – there is no single correct way – it depends on the result you expect

Use in other calculations

LODs place a value that is aggregated in new virtual layer in the data set, but they are NOT aggregates in themselves.   That’s powerful! 




Replacing Table calculations

A FAQ on the Forum is how to fix a table calculation – but they are built on the table that underlays each worksheet and can’t be carried form one sheet to another.  

LOD’s can be used to "Fix" some table calculations - for example Percent of total 

The table calculation formula for percent of total used Total() and is then calculated down (in this example)

SUM([Sales]) / TOTAL(SUM([Sales]))

LOD’s can be used in the to replace the numerator and denominator

                sum({ FIXED [Segment],[Category]: sum([Sales])})/sum({ FIXED [Category]: sum([Sales])})


The LOD base Percent of Total and now Fixed at the Segment/Category level 

Nested LOD and Table Calculation

LODs can be also be nested directly in Table Calculations - This example dynamically ranks Sub-Category based on annual sales

The LOD below will total Sales at the Sub-Category/Year level

{ FIXED [Sub-Category],Year([Order Date]):sum([Sales])}

Nesting the LOD in a Ranking function (Table Calculation)  will sequence the categories descending each year

RANK_UNIQUE(sum([fixed annual category sales]),'desc')

And return this








LODs in conditional statements and grouping

Need to separate records into groups based on the value of a single dimension?

Use a conditional statement (  if…then  ) in the aggregation portion of the LOD  (after the :  )
This says “for each order and category combination – separate out the Furniture lines and group the remaining as “Other”

      { FIXED [Order ID],[Category] : min(if [Category]="Furniture" then [Category] else "Other" end  ) }

The LOD will categorize each Order/Category combination into one of 2 groups – you can use the LOD like any other dimension in you viz






Include and Exclude LODs





Where Fixed works above the dimension filters in the order of operations Include and Exclude work after the dimension have been applied












Using the same Segment/Category but adding the Include and Exclude version we can see the effect of the LOD

{ FIXED [Segment],[Category]: sum([Sales])}

{ EXCLUDE   [Segment],[Category]: sum([Sales])}

{ INCLUDE  [Segment],[Category]: sum([Sales])}

Include is applied after the dimension filter  Year(order date) is applied and sums at the  Segment/Category for each year

Exclude applies the totals after the Year(order date) is applied and sums across all the Segment/Categories at the year level





I hope this give you a better idea of how LOD expressions give you the flexibility to add different levels in your data set.  They are a powerful tool that can be used in conjunction with conditional statements, table calculations or nested in other LODs.  

Have fun - the best way to learn more is to use them with some simple example 

A Tableau workbook with copies of the examples presented here can be found on my tableau public site   https://public.tableau.com/profile/jim.dehner#!/vizhome/LayeryourdatawithLODs/FixedSegCatyrsumsales

Jim




Dynamically sort Top N in a Drill Down Hierarchy

My blog has moved to jimdehner.com 

Thank you for all the support over the past 2 years - I recently updated my blog to make it easier to read, improve the rendering of images and tie back to video presentations that are associated with the content - Check it out at  Jim's new blog  -   




Use a scaffold to solve the 2 date problem

My blog has moved to jimdehner.com 

Thank you for all the support over the past 2 years - I recently updated my blog to make it easier to read, improve the rendering of images and tie back to video presentations that are associated with the content - Check it out at  Jim's new blog  - 



The most frequently viewed workbook on my Tableau Public site is a solution to the classic 2 date problem.   –  The data have only a start and end date and you need to know how many programs are active on any date – Ultimately you want something like this


The business question can take many forms
  •        Head count given only a hire and termination date
  •        Account balance given deposit and withdrawal dates
  •        Count of active (or inactive) customers

The file may look like this



But dates cannot be aligned along a single axis




To count the “active programs” we need a data structure where the account start date and end date can be aligned along a common axis



One solution is to create a date file – or scaffold – and attach the detail account data to the scaffold

The scaffold is a simple file of consecutive dates from the earliest date in our data to a minimum of the maximum date (or extended to include future date)





Load the scaffold into Tableau and join it to the detail data set.  Every record in the detail data has to join with each date on the scaffold –








Here is how it is done:

On the data source tab add the Date Scaffold and drag it to the canvas

Join the date scaffold and the detail file






Every record in the detail file is now connected to each date




The start and end dates are aligned with each date on the data scaffold

Add an expression that will count the dates from the date ladder that are between the start and end date

{ FIXED [Account ID],[Scaffold Date]: if min([Scaffold Date])>=min([Start Date]) and min([Scaffold Date])<max([End date]) then 1 end   }

And create the viz





The actual workbook with How To Instructions can be downloaded from the Tableau Public site at https://public.tableau.com/profile/jim.dehner#!/vizhome/scaffold-2dateexample/Overview



Jim

More than dots on a map - The renewable energy story

My blog has moved to jimdehner.com 

Thank you for all the support over the past 2 years - I recently updated my blog to make it easier to read, improve the rendering of images and tie back to video presentations that are associated with the content - Check it out at  Jim's new blog  -   


Renewable Energy – How big is it-


A dashboard should tell a story beyond just presenting data.  It needs to be interesting and when possible encourage the viewer to interact and investigate to make the story their own. This is the end product


But that's several iterations later -

It  started like this - as a result of a trip from Nashville to Chicago. In northern Indiana there was a 10 mile stretch  where as far as I could see to the east or the west all I saw were wind turbines. Wow – didn’t expect to see that – thought that wind power was just a west coast thing.  

I found a data set that includes the location of every wind turbine in the US -  They were concentrated in the mid-west and south  Not what I expected at all - I had a lot to learn!










While interesting there wasn't much of have a story from that data  – just raised a lot of questions – How much energy do wind turbines produce? Is it a significant portion of overall energy production? What about other renewable sources? How does it vary by state?







What would you do - something like this might work but how would you make that



To start you need data. Lots of data! – Energy generated in Kwh , by generation source at the state level – you could use a search engine and hope for the best – there are also several portals that you could consider

The energy data used in here was found at www.eia.gov the US Energy Information Agency site but each energy source had to be extracted separately – (the a future post will go through the Tableau Prep flow used to combine and clean the separate energy source files). (Note: data is for electric energy generation in 2017) 


But what's the story? – Does energy generated from renewable source really represent 10% of the total?  That’s what I thought before looking at the data but that’s just not right – nationally it is closer to 15% and it some states renewable energy sources can account to 85% of the total. 


  • the portion of energy generated from each of the renewable source in each state
  • the state rank nationally 
  • and the trend since the start of the century

That would be a much more interesting story that just the location of  wind turbines that got my initial interest.   It would also be good if the dashboard was dynamic so individual users could look at their state or region.

The upper portion of the viz is where you can create the  most impact – Make the title a statement that takes the user away from any preconceived notions and use BAN blocks present both present the facts and set the color code for the rest of the viz




The block color – teal for Hydro, Gold for Solar, Red for Wind and Black to total all sources will be used throughout the remaining charts

Now build on the story including a Trend chart to show growth since the start of the century and  a state level ranking chart to see a comparison throughout the area.  They are relatively straight forward area and bar charts using the percent of total calculation




Finally, allow the user to select a state or area and change all the charts accordingly –
That’s the fun part and you may have noticed references to sets in the previous charts -




Sets and set actions driven off a state map make the dashboard dynamic. 
To create the State set, Open the STATE pill and Select CREATE then SET


When the box opens Select any STATE – we are going to change it with a set action, so the initial value is not important



Next create a simple Map at the State level
Drag State to the canvas to create the map and be sure to place the State Set on the marks card




Bring it all together on a dashboard using containers to fix the location of the individual sheets



Make your point in the title and always include attributions for data sources, functions or material that are not your own.  Also, if you want the user to do something you need to tell them what to do and why

One last thing – need to make the dashboard dynamic by adding a set action
And then set the action to go from the map on the selection form the viz






on the individual worksheets set action filter has been added



The filter is set as IN meaning that the States selected from the Map will be applied as the filter – you can select a single state, or several using a Cntl Select or use the lasso to select a area
  
Congratulations – you’re done!





Download the actual workbook, complete with  “How To” instructions, from my Tableau Public site at https://public.tableau.com/profile/jim.dehner#!/vizhome/RenewableEnergySources/Dashboard1  


Jim

My Blog Has Moved

My blog has moved to jimdehner.com  Thank you for all the support over the past 2 years - I recently updated my blog to make it easier to re...