Our Data Pipeline allows you to perform transformations on your query results in a series of steps. These steps include a variety of operations such as column sorting, pivoting data, and adding calculated columns. The flexibility of adding any steps in any order allows you to get your data exactly how you want it.

## Adding a Step

Click the
**Add Transformation**
button in the pipeline to add a new transformation step to your query results. You can apply as many steps as you'd like, in any order. You can also use each step multiple times. The available steps and examples are outlined below.

*Data Pipeline steps are applied in order from first to last, so in some cases it is possible for a later step to override an earlier one.*

## Preview Data

To view your data at any point in the pipeline, click a Pipeline step to open it. Then, click **Show Input/Output Data**. It can be incredibly useful to compare how your data looks before and after a transformation step is applied.

### Executed Query

View the executed query by clicking the **Executed query** link next to the **Run Query** button.

The Executed Query will show you the query exactly as it is sent to your database. This is especially helpful for charts using dashboard variables, as it includes the variable values that are inserted into the query. You can also see any timezone offset that has been applied.

## Data Pipeline Steps

### Add Column

Add a calculated column to your query results. Use one of our built-in functions, or choose **Custom Formula** in the Formula Type dropdown to create your own. See Formulas to view some examples of functions you can use.

### Bucket Data

The Bucket Data pipeline step divides the values from one column into a series of ranges, and then counts how many values fall within each range. This data format can be used to create a histogram chart.

Histograms generally use COUNT as the **Aggregation**, but you may also choose MIN, MAX, SUM, or AVERAGE.

Define your buckets using any of the following options: Standard Deviation, Bucket Size (amount in each range), Number of Buckets, or Custom to choose custom ranges.

### Case Statement

Add a column to your query results (or edit an existing column) where values are set based on certain "if... then" conditions.

Dashboard variables can be used in any part of a Case Statement pipeline step.

*If you find that you use the same CASE statement across multiple charts, you may want to consider putting the generated column data into a Data Store or a custom column in your database.*

To pass through column values in THEN or ELSE, select **Column** from the dropdown and choose your column.

### Combine Columns

Combine columns either by applying a mathematical operation or by concatenation. Check the **Hide Combined Columns** checkbox to hide your original columns.

### Edit Column

Use this step to edit an existing column. All column types (text, date, number, etc.) allow entering a custom formula, and numerical columns have additional formula types including simple math operations and rounding.

### Extrapolate Data

The Extrapolate Data step provides predictive data for a column in your query results. Choose whether to add data to the existing column or create a new one, the number of rows returned, and the type of extrapolation used (Cubic, Quadratic, or Linear).

### Filter Rows

Filter rows on certain conditions. Choose **Include** or **Exclude**, then choose:

- Matching
**all**conditions*Uses***AND**logic. Row will be included or excluded only if it matches every condition. - Matching
**any**conditions*Uses***OR**logic. Row will be included or excluded if it matches one or more conditions.

#### Regular Expression matching

Any text column supports filtering by Java-style regular expressions in the **Filter Rows**

Pipeline step. Matching is case-sensitive and unicode-aware by default. For case-insensitive matching, add (?i) to the beginning of your regular expression.

### Group

Group works the same as GROUP BY in SQL. It allows you to group your query results based on a certain column. You'll need to choose which column you want your chart to be grouped by, and which aggregation you would like for the other columns. Aggregation types include SUM, AVG, CONCATENATE (comma-separated list), COUNT, COUNT DISTINCT, MAX, and MIN.

### Hide Columns

Hide one or more columns in your query results. *If you need to use hidden columns in a Formula, hide the columns as the last step.*

### Limit Rows

Limit the number of rows included in your query results. Can be useful when applied after a sort—for example, sort sales descending and limit 10 to include the top 10 sales. Offset allows you to select a specific range of rows. For example, to include only rows 10-30, set the limit to 20 and the offset to 10.

### Pivot Data

If your chart has one measure grouped by two dimensions, such as count of activity grouped by month and by activity type, you'll want to pivot the data so you can use it in a chart.

Pivot tables require exactly 3 columns. Your data is pivoted on the second column-e.g., the second column becomes the column headers.

Select Ascending, Descending, or None in the Column sort direction dropdown to customize the column order in your pivot table. Aggregate Function allows you to choose the aggregation for duplicate group values. Available aggregations are: SUM, AVG, MIN, MAX, GROUP_CONCAT, Auto. Auto will choose either SUM or GROUP_CONCAT, depending on the data type.

To pivot on more than 3 columns, refer to Pivot on more than 3 columns

### Rename Columns

Rename one or more columns. Useful for display purposes, or shortening column names for ease of use in formulas.

### Reorder Columns

Easily click and drag your column names to reorder how they will appear in your chart. Note: it's generally best to reorder columns in one of the last steps, as any editing that involves column changes will affect your reordering settings.

### Sort Rows

Apply a sort to one or more columns.

### Transpose Data

The Transpose step simply switches rows and columns.

*There are no limits on the input table format, but the resulting table cannot contain more than 1000 columns.*

### Unpivot Data

Unpivot data is useful when you have multiple single value columns that you want to use in a visualization, such as a pie, line, or bar chart.

*If each layer returns a single value, choose Cross Join as your merge type.*

### Zero Fill Data

Zero fill fills in missing values in your data with zeros. It's important when charting a time series with missing dates, or when performing calculations on columns that contain NULL values.

*Zero-fill will fill in all date and numeric columns in your query results.*

## Formulas

The **Add Column** and **Edit Column** Pipeline steps allow you to use formulas to transform your data. You can either use one of our built-in formulas as-is, or write your own formula using the Custom formula option.

*If you aren't seeing a formula type you expect to see for an existing column, check the column's data type. Some formulas are only available for numeric columns.*

Tip: When dividing integers, be sure to multiply the top value by 1.0 to convert your results to a decimal. For example:

`(1.0 * "Count of Clicks")/"Ad Cost"`

### Add

Add a numerical value to a column. Enter a column name to add a column value instead.

### Aggregation

Creates a new column with an aggregation value of another column. Choose from AVG, COUNT, MAX, MEDIAN, MIN, or SUM.

Useful for applying calculations to a column that involve an aggregated value. For example, use SUM aggregation, then **Column Ratio** in the **Add Column** pipeline step to determine a column value's percentage of the total column.

### Column Ratio

Easily divide one column by another to get a ratio. Integers are auto-converted to decimals to ensure decimal accuracy.

### Create link with title

Takes a url column and a text column and generates a markdown link that's clickable from your table chart, using the text column as the link title.

Useful for creating short, easily parsable links.

### Custom formula

Perform basic calculations on your column, or combine one or more of our formulas as needed. SQLite functions are also supported. See the SQLite documentation for the full list of core functions, aggregate functions, and date and time functions available.

#### Common functions

Name | Formula |
---|---|

Moving average | moving_avg("Column name", # trailing rows) |

Running total | running_total("Column name") |

Percent change | percent_change("Column name") |

Median | median("Column name") |

Format | format("Column name", format string) |

Lag | lag("Column name", offset) |

Average | avg("Column name") |

Round | round("Column name", # decimals) |

Coalesce | coalesce("Column name", replacement char) |

Cast | cast("Column name" as datatype) |

#### Math functions

acos, asin, atan, atn2, atan2, acosh, asinh, atanh, difference, degrees, radians, cos, sin, tan, cot, cosh, sinh, tanh, coth, exp, log, log10, power, sign, sqrt, square, ceil, floor, pi

#### String functions

replicate, charindex, leftstr, rightstr, ltrim, rtrim, trim, replace, reverse, proper, padl, padr, padc, strfilter

#### Aggregate functions

stdev, variance, mode, median, lower_quartile, upper_quartile

#### Date functions

In the formulas below, units can be one of the following: day, month, year, hour, minute, second, quarter (datepart() only), dayofyear (datepart() only). For more details on these date functions, see our Knowledge Base article.

Name | Formula |
---|---|

Date add | dateadd( date_string, amount, 'unit' ) |

Date subtract | datesub( date_string, amount, 'unit' ) |

Date difference | datediff( date_string1, date_string2, 'unit' (optional) ) |

Date part | datepart( date_string, 'unit' ) |

### Divide

Divide a column by a value, or by another column.

### Format

Used to specify decimal precision, useful for force-formatting currency. *Format returns a string, not an integer. Therefore, it is best used in table charts or Single Value charts.*

### Lag

Access data from a previous row in the same column. Row offset defines how many rows back you want to use.

### Moving average

Generates a series of averages of your data to create a smooth trend line. Trailing rows is the number of previous rows to include in each row's average.

### Multiply

Multiply a column by a value, or by another column.

### Percent change

Determines percent change between current and previous row.

### Percentile

Determines the percentile of each value in a numeric column. For example, a value is at the 50th percentile if half of the values in the column are less than or equal to it.

### Ratio of total

Displays each row's value as a ratio of the column sum.

### Round

Round is used to round to a specific number of decimal places. *Note: you cannot round a number to a higher decimal precision than it already has. For example, you cannot round the integer 10 to have two decimal places.*

### Running total

Sums up all previous results of a column.

### Subtract

Subtract a value from a column. You may also subtract a column from another column.

### Total column sum

Creates a new column where each value is a sum of its row.