Saturday, October 30, 2021

FAQ Series - Drilling down in Top N and Other

Frequently we see questions on how identify the Top N and group the "Other".

A common problem with many business applications - determining the Top N can be easily done in a number of ways.  (Presented on VizConnect - see the recording at Link to Recording)

1 The Top N filter

Just apply a Top N filter to the dimension

A more typical question adds some complexity where the user wants to see the Top N across multiple dimensions which can be solved using Index() or Rank() 

2 The Index() solution

Index is based on the position of the record in the data table so start by sorting the data:

Add the Index() (note can be hidden)

and finally add a filter set to N

It works well but there is a need to pre sort the data and when there are multiple dimensions you will need to determine which column to use in the sort: 

3 The Rank solution

An alternative is to use a ranking based on either a measure or an LOD. All forms of Rank are table calculations and look something like this:


and each segment can be treated independently without presorting the data:

4 Top N and Other

Users often want to see the Top N individually and group the remaining as "Other".  It requires a complex solution that combines both LOD's, Table Calculation, the use of Show/Hide filters and a hack 

Start with an LOD to find the Top N and "Other" based on subcategory and region (Note I am using Include to support drilling down into the Region Hierarchy) :

{ INCLUDE  [Region],[Category]:sum([Sales])}

and then Rank the LOD 

rank(sum([6 lod sales region subcat]),'desc')

to start building a data table for the viz like this 

for flexibility I added a parameter to set the N level for the Top N and created this T/F calculation

[6 rank for lod sub reg]<=[N to top N]

and used this to add a Label to the "Other" group

if [6 rank tf ] then attr([Sub-Category]) else "Other" end

 the data table now look like this:

use a table calculation to total the "Other" 

window_sum(if [6 subcat and other label]="Other" then sum([6 lod sales region subcat]) end)

and restart every Region places the Other Total on all subcategories not in the Top N - after sorting the table now looks like this

Now the hack, we only want to see each of the Top N subcategories and 1 of the subcategories labeled "Other"  in each region  - That is done with a show hide filter:

if [6 rank for lod sub reg]<=[N to top N]+1 then "show" else "hide" end

After applying the filter and hiding the real subcategory labels the data table is complete

 as a bar chart the results look like this

5 Drill down in Hierarchy

One advantage of using the Include LOD is that it supports drill into the hierarchy finding the Top N at subsequently lower levels

the LOD expression used here was 
{ INCLUDE  [Region],[Category]:sum([Sales])}

which will support drilling down from Region to the State level - at the Region level and filtered for Central the bar chart looks like this 

Drilling down in the Region hierarchy to the State level will return the Top N and Other unique to each state

Hope this helps the next time you need to find the Top N and group the Other

The examples used here,and a few others,  can be downloaded at workbook here



Monday, October 4, 2021

Sets and Filters the Condition and Top N Tabs - What do they do?

Using  Filters and Sets is nothing new  - but how often do you use the "Condition" or "Top N" tabs - and have you ever used them together.

Here we are going to look at how to use Conditions and Top N in combination with Sets and Filters - 

The Condition and the Top N tabs are on the setup for Sets and Filter - They look the same on each and provide the capability to further filter the dimension used in the set or filter.  The can interact with each other - in an   AND  fashion and can be used in combinations with filters on other dimensions or sets -   (Presented at VizConnect Link to Recording)

Example 1 - Simple Filter

Using Superstore data, I've created a parameter on Region and a filter on State with this formula  on the condition tab - 

Sum([Sales])>10000  and Max(Region) = [Region Parameter]

The Formula window works like and LOD - dimensions and measures must be aggregated and you can not use ATTR(). The condition will filter out all States with total sales less tha10000 and only states in the Region selected in the parameter will be included.

Now add in a Top N type filter ( I used the Bottom 4)

And the Bottom N is applied in addition ( AND filtering )  to the filters applied on the condition tab.

Before seeing more complex examples we need to see how the Order of Operation will affect the filtering. 

The Top N filter takes place in Step 4 in the Order of Operations along with the creation of Sets and Fixed LOD's expressions.   That is after Context Filters but before Dimension filters are applied.  It is also before any other type of calculations including Table Calculations are determined in steps 7 through 10 - 

Example 2 - Top N Filter with Context Filter

We will start with a Superstore example to find the Top 7 States based on sales 


With Region in Context the filter will return in 7 state with the greatest sales from the regions in the view

if Region is removed from Context the top 7 are determined before the Region filter is applied and 

Example 3 - Multi Dimension Top N  Sets 

Now let's look at combining additional dimensions in the Top N.  We still want the top 7 States but the users also wants the top 7 based on the total sales of selected subcategories - 

First we need an LOD to determine the sales at the combination of State and Subcategory

{ FIXED [Sub-Category],[State]:sum([Sales])}

Next create a set with this condition 

sum({ FIXED [State],[Sub-Category]: sum([Sales])} )= 

sum([3 LOD sales by subcat and state])  

and add in the Category and subcategory Filter to Context -  and the set will include the 7 states that have the max total sales based on the filter subcategories 


when the subcategories are changed the Top 7 adjust :

Example 4 - Fixed 7 States find the Top N subcategories

Users want to find the top categories within the 7 overall best States. 

Now the Order of Operation and the use of Table Calculations come into play.  First, we need to first determine the top 7 states and then within each state find use Rank to order the subcategories -

Ranking functions are table calculations that are determined in Step 10 of the OOO after the sets and lods are determined in Step 4.   Using the LOD we calculated in the previous example 

{ FIXED [Sub-Category],[State]:sum([Sales])}

add that to a Ranking function

RANK_UNIQUE(sum([3 LOD sales by subcat and state]),'desc')

and replace the set with the ranking function the number of categories to show

It is easier to see the result in a text table format 

The top 7 States overall are first determined then the top 3 subcategories are identified in each state

Example 5 - the Reset Button 

Ever have a dashboard where you've applied a several filters and you would like to "Reset them"  with a single click - It can be done with a "Reset Button" - No magic here just the creative use of an action filter 

Create a worksheet with a shape and a label - 

then on a dashboard add an Action Filter - from the Rest button to sheet where the filters are applied  - the select each filter you want to reset - you will get a Warning Message  but that is not a problem

Select the rest button will return all values to the filters

Example 6 Default Values for filters

Some users want set default values for filters and it can be done with some creativity using a parameter ( Credit for this goes to Joshua Milligan - Tableau Zen Master Hall of Fame -  Link to Joshua's work  )  It is done by first creating a parameter on Region and adding in 2 values - All and Default

Next create a case statement for Default   In this example there are 4 States we want for the when the user selects Default - they are connected with OR

CASE [Region Param with all and default]

When "Default" then

[State] ="New York" or

[State] = "Illinois" or

[State]= "Florida" or 

[State]= "Washington"


then we just need a TF filter 

[Region Param with all and default] = "All" or 

[Region]=[Region Param with all and default]

or CASE [Region Param with all and default]

when "Default" then [8 Default state from region]


When All is selected in the Region parameter the statement returns a True.  if the user selects and single region just that region will return a True. When Default is selected the Default case statement returns a True for the 4 states and returns this 

Example 7  Cascading Parameters

Frequently users want to have the drop down values in one parameters be dependent on the value of another parameter.   Currently that is not capable in "out of the box Tableau" and I have responded to users to use a parameter and a "Relevant Value Filter or Set" on the Condition Tab like this:

Which would create a set base on the parameter and only show the IN values in the set 

and from there you can use the set to drill down 

If you can use extensions there is another option

First you need to create 2 parameters - one to drive the extension - here on Category

Then a second parameter that will receive the values from the cascading filter and place both on the worksheet

Search Parameter and select Data Drive Parameter

the extension needs to be configured - on the first page Identify the parameter that will accept the values (the second above), the worksheet it is on and the dimension on that sheet that relates to the values ( here subcategory)

on the Option Tab select 'Update List when Parameter Changes' and the parameter that will drive the list (the 1st parameter - here Category Parameter)

Then create the viz and the first parameter will drive the drop down list in the second 

I hope this help explain the use of the Condition and the Top N tabs and has given you some ideas on how you can use them in your vizes 

The workbook used here can be found at   Link to Workbook



Sunday, August 22, 2021

5 Ways to do YoY and YTD -

YoY and YTD calculations are fundamental business metric - afterall who doesn't want to know how they compared to last year, last month or how they are tracking on a year to date basis.  It seems that users at all experience levels have questions on how to complete period over period and period to date type calculations or have different expectations for the outcome. (See the recording  VizConnect Tableau Community Data Doctor Session August 2021 - YouTube )

Here we will look at 5 different approaches starting with the most straightforward - table calculations to the advanced duplicating the dataset and joining it to itself.  

I will be using subsets of Superstore data in the calculations. (Note:there is a link to downloadable workbook)

1 Table Calculations

The easiest way to do a period over period calculation is with a standard table calculation  Percent Difference:

Easy enough, but most users want to eliminate the empty column for the initial year in the series.  That can not be done with a table calculation

2 LOD and a fixed starting point

The second method uses 2 LODs and requires a fixed starting point.  That starting point be from a parameter, the current date (Today())  or the latest date in the data set {max(date)}  The example uses Today() -

The current YTD value can be determined using the LOD:

{ FIXED [Category]: sum(  if DATETRUNC("year",[Order Date])= DATETRUNC("year",TODAY())and DATETRUNC("day",[Order Date])<= DATETRUNC("day",TODAY()) then [Sales] end )}

Where the year of the date dimension is checked against the year of Today() and all the days in the year prior to and including Today() and included in the total. 

Similarly the prior YTD value calculation is:

{ FIXED [Category]: sum(  if DATETRUNC("year",[Order Date])= DATETRUNC("year",dateadd("year",-1,TODAY())) and DATETRUNC("day",[Order Date])<= DATETRUNC("day",dateadd("year",-1,TODAY())) then [Sales] end )}

and compares the year of the date to 1 year prior to Today()  using a Dateadd function.

Then the YoY percent difference is simply calculated using:

(sum([Current YTD Sales (today)])-sum([Prior YTD Sales (today) ]))/

sum([Prior YTD Sales (today) ])

The LOD approach is the one I recommend most often - the LODs for the Current and Prior YTD are portable from sheet to sheet and can be used in any other calculation.  They are also easily converted to use with a parameter for the end date or the max date in the data set by substituting [Parameter Date] or {Max(date)}  for Today() in the forumlas.

3 Variable period  POP

Often users want to compare the YTD versus PYTD or MTD versus PMTD (or more)  based on a parameter selection by the end user.  First a Parameter is need to select the basis for comparison

then 2 Case statements for the Current or Prior period to date totals - the case statements use LODs similar to those used in the previous example. (Note:The formulas can be downloaded in the workbook).  

Selecting the Year returns the YTD (July 9, 2020) versus Prior YTD (July 9, 2019) 

And selecting Month will compare MTD (July 9, 2020) to Prior MTD (June 9, 2020)


More levels can be added to the parameter and the case statements to look at different data part levels.

4 Dynamic YoY comparison

Less frequently, a user will want to compare make a YoY comparison over a range of dates and that presents a problem with the first 3 approaches.  Table calculations do a comparison by position in a data table, not by the actual date and the LOD solutions depend on fixing an end date.

A dynamic or running YoY comparison can be done but it will take duplicating the data set and joining it to itself offsetting the date by a year.  But it can only be done at levels where the data set is solid - i.e. no voids in the data set

The first example uses a very simple data subset of the Superstore data that only includes the order date, the category, subcategory and the sales value.  Step 1 is to join a copy of the data with itself at the subcategory, month and year level but offsetting the year - (Note I have used the physical layer to explicitly join the files together )

When you open any worksheet there will be 2 data sets in the data frame - one is the original data set

and the other is the copy that was joined to create a Prior year data set

The comparison can be made directly by placing the current and prior year sales on in viz and the current years on columns 

So much of the basics, now let's look at expanding the model at creating a rolling YoY using the approach. To keep the visual simple I have expanded the date to quarterly level data and compressed the categories so we are looking at totals only
For the current and prior year rolling totals I used a table calculation

Current year:


and Prior year

window_sum(sum([Sales Prior Year]),-3,0)  

They result in this:

5 Using a scaffold to fill the nulls

But there is a problem when the data is sparse - where there are nulls or voids in the data. 

I expanded the dataset to the Month and day level and that introduced voids in the data

The prior year data will no longer total properly due to the nulls in the data 

To correct the problem requires first creating the missing cells. That is done by scaffolding the data - I used Prep to create a ladder at the date/category level and joining it back to the original data

resolving the nulls and the prior year now totals correctly

I hope this helps the next time you need to do a YoY analysis and helps select the best method to use in your situation

The workbook use here can be downloaded at :Link to workbook


FAQ Series - Drilling down in Top N and Other

Frequently we see questions on how identify the Top N and group the "Other". A common problem with many business applications - de...