Data Modelling and Prediction in Tableau

Introduction

Many effective business courses preach the benefits of being proactive and strategic. In today’s competitive environment, it’s not enough to react to every breakthrough and ad hoc setback. Instead, organizations need to be forward-thinking: anticipating outcomes, capitalizing on opportunities, and preventing losses. With growing volumes of data and easy-to-use software, predictive analytics is more accessible than ever, helping organizations become more proactive and increase their bottom line.

Every data source that you create in Tableau has a data model. You can think of a data model as a diagram that tells Tableau how it should query data in the connected database tables.

so lets learn more about it.

Data Modelling

There are multiple ways to model your data in Tableau environment. Your modeling method may change based on the amount of data you want to handle, the number of joins and types of joins you are using, the granularity of data sources, data refresh frequency and data source details.

1. Joins in Tableau: You can define join with tables coming from the same data source in the Data Source tab.

You will get options to define join type, columns on which you want to define joins and few data cleansing options like change the data type, rename the columns, split columns etc. You can have different joins with up to 32 tables in Tableau including self-join.

2. Cross-Database joins: You can define joins with different tables from different data sources. This feature is available from Tableau 10.0 version in Data Source Tab.

Just go to Data Source Tab, Click on Add and connect to the data source which you want to use.

Note: Cross-Database join is not supported by all the data sources. i.e Google Analytics, Microsoft Analysis Services, Microsoft PowerPivot, Odata, Oracle Essbase, SharePoint Lists, Teradata OLAP cannot be used in Cross-Database Joins (considering Tableau 2020.1)

In Cross Database joins, Blue color represents Primary Data Source and Orange Colour Represents Secondary Data Source.

Unlike Data Blending, you will have options to change the join connection of Cross-Database Joins.

3. Data Blending: Data blending is a method for combining data from multiple sources without defining any joins between them while connecting. Data blending is useful when you want to bring additional information from a secondary data source to displays it with the primary data source columns.

Unlike joins, data blending keeps the data sources separate and simply displays their information together. This is ideal when the data is at different levels of granularity.

The blending can be defined on common column names by default or can be modified using the Data→Edit Relationship option.

Data blending is performed on a sheet-by-sheet basis and is established when a field from a second data source is used in the view.

When you bring a field from one data source to the sheet — it becomes the primary data source. When you add another field from second data source, it becomes a secondary data source. An orange linking icon will appear in the data pane, indicating which field(s) are being used to blend the data sources.

The default join in Data Blending is left join & there is no way we can change this join condition.

4. Custom SQL: For most databases, you can connect to a specific query rather than the entire data set. You can use custom SQL to union your data across tables, perform complex joins, restructure or reduce the size of your data for analysis by applying where conditions.

5. Tableau Prep: Tableau 2020.3 has a data prep tool that can be used for defining joins, unions, and data cleansing processes. You can define your data model using Tableau Prep tool and the output file can be given as input to Tableau Desktop.

Tableau Prep enables analysts and business users to quickly and easily analyze their data.

Predictive Modelling

In Tableau ,we have predictive modelling functions can help you quickly generate predictions that can be manipulated, visualized, and exported like data using table calculations.

Predictive modeling functions available in Tableau:

FunctionSyntaxDescriptionMODEL_QUANTILEMODEL_QUANTILE(
model_specification (optional),
quantile, target_expression,
predictor_expression(s))
Returns a target numeric value within the probable range defined by the target expression and other predictors, at a specified quantile. This is the Posterior Predictive Quantile.

Example:

MODEL_QUANTILE(0.5, SUM([Sales]),COUNT([Orders]))

MODEL_PERCENTILEMODEL_PERCENTILE(
model_specification (optional),
target_expression,
predictor_expression(s))
Returns the probability (between 0 and 1) of the expected value being less than or equal to the observed mark, defined by the target expression and other predictors. This is the Posterior Predictive Distribution Function, also known as the Cumulative Distribution Function (CDF).

Example:

MODEL_PERCENTILE( SUM([Sales]),COUNT([Orders]))

Create a prediction calculation

Follow along with the steps below to learn how to create a simple prediction calculation using the MODEL_QUANTILE function.

Step 1: Create a Visualization

  1. In Tableau Desktop, connect to the Sample — Superstore saved data source, which comes with Tableau.
  2. Navigate to a worksheet.
  3. From the Data pane, drag the Order Date dimension to the Columns shelf.
  4. Open the measure’s context menu to change its list level to Month and Year:
  1. Drag Sales to the Rows shelf.

Step 2: Create the Calculated Field

  1. Click to open the Analysis menu at the top, and then select Create Calculated Field.
  2. In the Calculation Editor, do the following:
  • Name the calculation: Predict Median Sales.
  • Enter the following formula:
  • MODEL_QUANTILE(0.5, SUM([Sales]),ATTR(DATETRUNC('month', [Order Date])))
  • Remember: The MODEL_QUANTILE function takes a given quantile and predicts values based on the predictors you input.
  • Let’s break this down:
  • In this case, the quantile = 0.5, which predicts the median.
  • We want to predict sales, so the target expression is SUM([Sales]).
  • We want to base the prediction on past performance, so we include date as a predictor, which is the last argument in the calculation.
  1. When finished, click OK.
  2. The prediction calculation is now added as a calculated field in the Data pane.

Step 3: Add the Prediction Calculation to the View

  1. Drag the prediction calculation to the Rows shelf, to the right of SUM(Sales).
  2. Right-click (Control-click on Mac) the measure and select Dual Axis.
  3. To align the two axes in a dual axes chart to use the same scale, right-click (Control-click on Mac) the secondary axis, in this case Predict Median Sales, and select Synchronize Axis. This aligns the scale of the two axes.
  1. That’s all there is to it. To find out how you can extend a date axis and predict the future

Rules for Prediction Calculations

  • You can’t mix aggregate and non-aggregate arguments. If the target expression is an aggregate, so must the predictor.
  • The functions are best used to predict values for individual records, on vizzes where each mark represents a discrete entity, such as a person, a product, a sale, etc.
  • The functions are best used to predict values for aggregated target expressions using SUM and COUNT.
  • The functions aren’t recommended to predict values for aggregated target expressions using AVG, MEDIAN, MIN, or MAX.
  • The functions should use predictors that are at the same level of detail or higher than the viz.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store