Data Pipeline

Our Data Pipeline allows you to perform transformations on your query results in a series of steps. These steps include a variety of operations such as column sorting, pivoting data, and adding calculated columns. The flexibility of adding any steps in any order allows you to get your data exactly how you want it.

2014-11-05_13-35-35.png

Adding a Step

Click the button in the pipeline to add a new transformation step to your query results. You can apply as many steps as you'd like, in any order. You can also use each step multiple times. The available steps and examples are outlined below.

Data Pipeline steps are applied in order from first to last, so in some cases it is possible for a later step to override an earlier one.

Preview Data

To view your data at any point in the pipeline, click a Pipeline step to open it. Then, click Show Input/Output Data. It can be incredibly useful to compare how your data looks before and after a transformation step is applied.

Executed Query

View the executed query by clicking the Executed query link next to the Run Query button.

The Executed Query will show you the query exactly as it is sent to your database. This is especially helpful for charts using dashboard variables, as it includes the variable values that are inserted into the query. You can also see any timezone offset that has been applied.

Data Pipeline Steps

Add Column

Add a calculated column to your query results. Use one of our built-in functions, or choose Custom Formula in the Formula Type dropdown to create your own. See Formulas to view some examples of functions you can use.

Click to view example

Bucket Data

The Bucket Data pipeline step divides the values from one column into a series of ranges, and then counts how many values fall within each range. This data format can be used to create a histogram chart.

Histograms generally use COUNT as the Aggregation, but you may also choose MIN, MAX, SUM, or AVERAGE.

You have three options for how the bucket ranges are defined: Standard Deviation, Bucket Size (amount in each range), or Number of Buckets.

2014-12-01_14-20-52.png

Click to view example

Case Statement

Add a column to your query results (or edit an existing column) where values are set based on certain "if... then" conditions.

Dashboard variables can be used in any part of a Case Statement pipeline step.

If you find that you use the same CASE statement across multiple charts, you may want to consider putting the generated column data into a Data Store or a custom column in your database.

Combine Columns

Combine columns either by applying a mathematical operation or by concatenation. Check the Hide Combined Columns checkbox to hide your original columns.

Edit Column

Use this step to edit an existing column. All column types (text, date, number, etc.) allow entering a custom formula, and numerical columns have additional formula types including simple math operations and rounding.

Extrapolate Data

The Extrapolate Data step provides predictive data for a column in your query results. Choose whether to add data to the existing column or create a new one, the number of rows returned, and the type of extrapolation used (Cubic, Quadratic, or Linear).

Filter Rows

Filter rows on certain conditions. Choose Include or Exclude, then choose:

  • Matching all conditions
    Uses AND logic. Row will be included or excluded only if it matches every condition.
  • Matching any conditions
    Uses OR logic. Row will be included or excluded if it matches one or more conditions.

Regular Expression matching

Any text column supports filtering by Java-style regular expressions in the Filter Rows
Pipeline step. Matching is case-sensitive and unicode-aware by default. For case-insensitive matching, add (?i) to the beginning of your regular expression.

Click to view example

Group

Group works the same as GROUP BY in SQL. It allows you to group your query results based on a certain column. You'll need to choose which column you want your chart to be grouped by, and which aggregation you would like for the other columns. Aggregation types include SUM, AVG, CONCATENATE (comma-separated list), COUNT, COUNT DISTINCT, MAX, and MIN.

Hide Columns

Hide one or more columns in your query results. If you need to use hidden columns in a Formula, hide the columns as the last step.

Limit Rows

Limit the number of rows included in your query results. Can be useful when applied after a sort—for example, sort sales descending and limit 10 to include the top 10 sales. Offset allows you to select a specific range of rows. For example, to include only rows 10-30, set the limit to 20 and the offset to 10.

Click to view example

Pivot Data

If your chart has one measure grouped by two dimensions, such as count of activity grouped by month and by activity type, you'll want to pivot the data so you can use it in a chart.

Pivot tables require exactly 3 columns. Your data is pivoted on the second column-e.g., the second column becomes the column headers.

Select Ascending, Descending, or None in the Column sort direction dropdown to customize the column order in your pivot table. Aggregate Function allows you to choose the aggregation for duplicate group values. Available aggregations are: SUM, AVG, MIN, MAX, GROUP_CONCAT, Auto. Auto will choose either SUM or GROUP_CONCAT, depending on the data type.

Rename Columns

Rename one or more columns. Useful for display purposes, or shortening column names for ease of use in formulas.

Reorder Columns

Easily click and drag your column names to reorder how they will appear in your chart. Note: it's generally best to reorder columns in one of the last steps, as any editing that involves column changes will affect your reordering settings.

Sort Rows

Apply a sort to one or more columns.

Click to view example

Unpivot Data

Unpivot data is useful when you have multiple single value columns that you want to use in a visualization, such as a pie, line, or bar chart.

If each layer returns a single value, choose Cross Join as your merge type.

Click to view example

Zero Fill Data

Zero fill fills in missing values in your data with zeros. It's important when charting a time series with missing dates, or when performing calculations on columns that contain NULL values.

Zero-fill will fill in all date and numeric columns in your query results.

Formulas

The Add Column and Edit Column Pipeline steps allow you to use formulas to transform your data. You can either use one of our built-in formulas as-is, or write your own formula using the Custom formula option.

If you aren't seeing a formula type you expect to see for an existing column, check the column's data type. Some formulas are only available for numeric columns.

Tip: When dividing integers, be sure to multiply the top value by 1.0 to convert your results to a decimal. For example:

(1.0 * "Count of Clicks")/"Ad Cost"

Add

Add a numerical value to a column. Enter a column name to add a column value instead.

Click to view example

Aggregation

Creates a new column with an aggregation value of another column. Choose from SUM, MIN, MAX, or COUNT.

Useful for applying calculations to a column that involve an aggregated value. For example, use SUM aggregation, then Column Ratio in the Add Column pipeline step to determine a column value's percentage of the total column.

 

Click to view example

Column Ratio

Easily divide one column by another to get a ratio. Integers are auto-converted to decimals to ensure decimal accuracy.

 

Custom formula

Perform basic calculations on your column, or combine one or more of our formulas as needed. SQLite functions are also supported. See the SQLite documentation for the full list of core functions, aggregate functions, and date and time functions available.

Common functions

Name Formula
Moving average moving_avg("Column name", # trailing rows)
Running total running_total("Column name")
Percent change percent_change("Column name")
Median median("Column name")
Format format("Column name", format string)
Lag lag("Column name", offset)
Average avg("Column name")
Round round("Column name", # decimals)
Coalesce coalesce("Column name", replacement char)
Cast cast("Column name" as datatype)

Math functions

acos, asin, atan, atn2, atan2, acosh, asinh, atanh, difference, degrees, radians, cos, sin, tan, cot, cosh, sinh, tanh, coth, exp, log, log10, power, sign, sqrt, square, ceil, floor, pi

String functions

replicate, charindex, leftstr, rightstr, ltrim, rtrim, trim, replace, reverse, proper, padl, padr, padc, strfilter

Aggregate functions

stdev, variance, mode, median, lower_quartile, upper_quartile

Date functions

In the formulas below, units can be one of the following: day, month, year, hour, minute, second, quarter (datepart() only), dayofyear (datepart() only). For more details on these date functions, see our Knowledge Base article.

Name Formula
Date add dateadd( date_string, amount, 'unit' )
Date subtract datesub( date_string, amount, 'unit' )
Date difference datediff( date_string1, date_string2, 'unit' (optional) )
Date part datepart( date_string, 'unit' )

Divide

Divide a column by a value, or by another column.

Format

Used to specify decimal precision, useful for force-formatting currency. Format returns a string, not an integer. Therefore, it is best used in table charts or Single Value charts.

Lag

Access data from a previous row in the same column. Row offset defines how many rows back you want to use.

Moving average

Generates a series of averages of your data to create a smooth trend line. Trailing rows is the number of previous rows to include in each row's average.

Multiply

Multiply a column by a value, or by another column.

Percent change

Determines percent change between current and previous row.

Ratio of total

Displays each row's value as a ratio of the column sum.

Round

Round is used to round to a specific number of decimal places. Note: you cannot round a number to a higher decimal precision than it already has. For example, you cannot round the integer 10 to have two decimal places.

Running total

Sums up all previous results of a column.

Subtract

Subtract a value from a column. You may also subtract a column from another column.

Total column sum

Creates a new column where each value is a sum of its row.