Skip to main content

How to Write Custom Formulas

Tom Williams avatar
Written by Tom Williams
Updated over a week ago

A custom formula is an expression that manipulates data fields from your connected applications (like GitHub, Jira, GitLab, etc.) to produce new values.

Think of it as a mini-program that runs on each record in your dataset, allowing you to calculate, transform, filter, and combine data in ways that perfectly fit your unique analytical needs.

Custom formulas are always written on a single line.

Where Can You Use Custom Formulas?

In Keypup, custom formulas can be applied in two primary contexts within your insights:

  1. Dimensions: When you want to categorize or group your data by a derived value.

    • Example: Grouping pull requests by their size category ("Small PR", "Large PR") instead of just raw lines of code.

    • Requirement: Dimension formulas can use any dimension operator. They do not require an aggregation function.

  2. Metrics: When you want to calculate aggregated values (like sums, averages, counts) based on a custom logic.

    • Example: Calculating the average time pull requests spend in a "Review" state, or counting issues with a specific label.

    • Requirement: Metric formulas must include at least one Metric Aggregator (e.g., SUM(), COUNT(), AVG()).

Basic Syntax Elements

Custom formulas support standard programming constructs for flexible data manipulation.

Data Types and Literals

You can use the following basic data types directly in your formulas:

  • Strings: Enclosed in double quotes (") or single quotes (').

    • Example: "My custom label", 'Feature'

  • Integers: Whole numbers.

    • Example: 10, 2023

  • Floats: Decimal numbers.

    • Example: 3.14, 0.5

  • Booleans: true or false.

    • Example: true, false

  • Null Value: NULL can be used to represent an absence of value.

    • Example: NULL

Arithmetic Operators

Perform mathematical calculations:

  • + (addition)

  • - (subtraction)

  • * (multiplication)

  • / (division)

  • ^ (power) - also available as POWER() function.

  • % (modulo) - also available as MOD() function.

Example:

  • lines_added + lines_deleted

  • (closed_at - created_at) / DAY() (calculating duration in days)

  • 2^3 (2 raised to the power of 3, which is 8)

  • 17 % 3 (17 modulo 3, which is 2)

Comparison Operators

Compare two values, returning true or false:

  • == (equal to)

  • != (not equal to)

  • > (greater than)

  • >= (greater than or equal to)

  • < (less than)

  • <= (less than or equal to)

Example:

  • lines_changed > 500

  • state == "OPEN"

Logical Operators

Combine or negate boolean expressions:

  • AND or && (logical AND)

  • OR or || (logical OR)

  • NOT() or ! (logical NOT)

Example:

  • state == "OPEN" AND type == "pull_request"

  • !(lines_changed < 10) (equivalent to lines_changed >= 10)

Ternary Operator

A shorthand for IF statements.

  • condition ? value_if_true : value_if_false

Example:

  • lines_changed > 500 ? "Big PR" : "Small PR"

Regular Expression Matching

Use regular expressions for advanced text pattern matching:

  • ~ (matches regex)

  • !~ (does not match regex)

Important Notes on Regex:

  • Matching is case-insensitive by default.

  • To make a regex case-sensitive, add (?c) at the beginning of your regex.

  • The regex must be a PostgreSQL-compatible POSIX regex.

Example:

  • message ~ "fix" (matches commit messages containing "fix", case-insensitive)

  • title !~ "(?c)^WIP" (does not match titles starting with "WIP", case-sensitive)

Regular Expression Cheat Sheet:

  • "^fix": string starts with "fix"

  • "fix$": string ends with "fix"

  • "(?c)fix": string contains "fix" (case-sensitive)

  • "fix|bug": string contains "fix" or "bug"

  • You can test your regular expressions using online tools like regex101.com.

Working with Arrays

Keypup supports array fields (e.g., label_names, assignee_usernames) often denoted by stringL for list of strings.

Array Literals

Define arrays directly in your formula:

  • ["foo", "bar", "baz"] (string array)

  • [1, 2, 3] (integer array)

Array Element Access

Access elements by their index using square brackets [] or the AT_INDEX() function.

  • Indices start at 0.

  • Negative indices access elements from the end (e.g., -1 for the last element).

Example:

  • label_names[0] (the first label)

  • assignee_usernames[-1] (the last assignee)

  • AT_INDEX(["apple", "banana", "cherry"], 1) returns "banana"

Array Concatenation and Subtraction

  • array1 + array2: Concatenates two arrays of the same type.

  • array1 - array2: Removes elements from array1 that are present in array2.

Example:

  • ["bug"] + ["feature"] returns ["bug", "feature"]

  • ["A", "B", "C"] - ["B"] returns ["A", "C"]

FLATTEN() Operator (Dimension-Only)

The FLATTEN() operator is unique and can only be used for dimensions. It transforms each value within an array field into a separate record, effectively "spreading" the array values across multiple rows. This is incredibly useful for analyzing data at the individual item level within an array.

  • Syntax: FLATTEN(array_field)

  • Usage: Only as a top-level operator for a dimension.

  • Result: For each original record, FLATTEN() will create a new "virtual" record for each element in the specified array. If the array has 3 elements, the original single record might become 3 records for that dimension.

Example Use Case: You want to count the number of pull requests per label, rather than per set of labels.

  • Dimension: FLATTEN(label_names)

  • Metric: COUNT()

  • Result: A chart showing "bug": 50 PRs, "enhancement": 30 PRs, etc., where a single PR with both "bug" and "enhancement" labels would be counted twice (once for "bug", once for "enhancement").

Working with Dates and Times

Keypup supports ISO8601 formatted dates and datetimes in custom formulas.

Date Arithmetic

When you subtract one date from another (date2 - date1), the result is always expressed in seconds. To convert this duration into other units, divide by the relevant time constant function.

Example:

  • (closed_at - created_at): duration in seconds.

  • (closed_at - created_at) / HOUR(): duration in hours.

  • (closed_at - created_at) / DAY(): duration in days.

  • (closed_at - created_at) / MONTH(): duration in months.

Time Constant Functions

These functions return the duration in seconds for their respective time unit:

  • HOUR(): 3600 seconds

  • DAY(): 86400 seconds

  • WEEK(): 604800 seconds

  • MONTH(): 2629746 seconds (based on 30 days)

  • QUARTER(): 7776000 seconds (based on 90 days)

  • YEAR(): 31556952 seconds (based on 365 days)

Current Time

  • NOW(): Returns the current datetime.

Example:

(NOW() - created_at) / DAY() (calculates how many days ago an item was created).

Date Construction

DATE(year, month, day): Builds a datetime object with the time set to midnight UTC.

Example:

DATE(2023, 1, 15) returns 2023-01-15T00:00:00Z

Date Formatting Functions (for Dimensions)

These functions are commonly used for dimensions to group data by specific date granularity.

  • YEAR(datetime): Returns the year (e.g., 2023).

  • YEAR_MONTH(datetime): Formats as YYYY-MM (e.g., "2023-07").

  • YEAR_MONTH_DAY(datetime): Formats as YYYY-MM-DD (e.g., "2023-11-07").

  • YEAR_QUARTER(datetime): Formats as YYYY-QX (e.g., "2023-Q2").

  • YEAR_WEEK(datetime): Formats as YYYY-WXX (e.g., "2023-W17").

  • MONTH(datetime): Returns the month number (1-12).

  • DAY_OF_WEEK(datetime): Returns the day of the week (1=Monday, 7=Sunday).

  • DAY_OF_YEAR(datetime): Returns the day of the year (1-366).

Example Dimension Formula: YEAR_MONTH(created_at)

Specialized Functions & Aggregators

Array Functions

  • ARRAY_FILTER(list, regex): Returns all values in list matching regex.

    • Example: ARRAY_FILTER(label_names, "^bug-")

  • ARRAY_FIND(list, value1, value2, ...): Returns the first value in list that matches any of the provided search terms (value1, value2, etc.), in order of precedence.

    • Example: ARRAY_FIND(label_names, "security", "bug", "enhancement")

  • AT_INDEX(list, index): (Same as list[index]) Returns the element at the specified index.

    • Example: AT_INDEX(label_names, 0)

Date & Time Functions

  • BEGINNING_OF_HOUR(datetime): Returns datetime at the start of the hour.

  • BEGINNING_OF_DAY(datetime): Returns datetime at the start of the day.

  • BEGINNING_OF_MONTH(datetime): Returns datetime at the start of the month.

  • BEGINNING_OF_QUARTER(datetime): Returns datetime at the start of the quarter.

  • BEGINNING_OF_WEEK(datetime): Returns datetime at the start of the week (Monday).

  • BEGINNING_OF_YEAR(datetime): Returns datetime at the start of the year.

  • END_OF_HOUR(datetime): Returns datetime at the end of the hour.

  • END_OF_DAY(datetime): Returns datetime at the end of the day.

  • END_OF_MONTH(datetime): Returns datetime at the end of the month.

  • END_OF_QUARTER(datetime): Returns datetime at the end of the quarter.

  • END_OF_WEEK(datetime): Returns datetime at the end of the week.

  • END_OF_YEAR(datetime): Returns datetime at the end of the year.

Conditional Logic & Utility Functions

  • BETWEEN(value, start, end): Returns true if value is between start and end (inclusive).

    • Example: BETWEEN(lines_changed, 100, 1000)

  • CONCAT(value1, ..., valueN): Concatenates values into a string. (Shorthand: +)

    • Example: CONCAT("PR: ", title)

  • CONTAINS(list, term|term_list): Returns true if list contains the term or any of term_list.

    • Example: CONTAINS(label_names, "bug")

  • CONTAINS_ALL(list, term|term_list): Returns true if list contains all of the provided terms.

    • Example: CONTAINS_ALL(label_names, ["bug", "critical"])

  • CONTAINS_EXACTLY(list, term|term_list): Returns true if list contains only the provided terms (order-insensitive).

    • Example: CONTAINS_EXACTLY(label_names, ["bug", "frontend"])

  • GREATEST(value1, ..., valueN): Returns the highest value from a list of provided values. (Not an aggregator)

    • Example: GREATEST(created_at, updated_at)

  • IF(condition1, value_if_condition1, [condition2, value_if_condition2], ..., [else_value]): Conditional logic. (Shorthand: ? :)

    • Example: IF(state == "OPEN", "Active", "Inactive")

    • Important: All return values must be of the same type.

  • IF_MATCH(target, regex1, result1, [regex2, result2], ..., [else_result]): Evaluates target against multiple regexes and returns the result from the first match.

    • Example: IF_MATCH(title, "^fix", "Fix", "^feat", "Feature", "Other")

  • IF_NULL(value, fallback_if_null): Returns value if not NULL, otherwise returns fallback_if_null.

    • Example: IF_NULL(assigned_at, created_at)

  • IF_ZERO(value, fallback_if_zero): Returns value if not 0, otherwise returns fallback_if_zero. Useful for avoiding division by zero.

    • Example: SUM(deletions) / SUM(IF_ZERO(additions, 1))

  • IN(value, list): Returns true if value is present in list.

    • Example: IN(author_username, ["john.doe", "jane.smith"])

  • IS_NULL(value): Returns true if value is NULL.

    • Example: IS_NULL(merged_at)

  • IS_NOT_NULL(value): Returns true if value is not NULL.

    • Example: IS_NOT_NULL(assignee_usernames)

  • LEAST(value1, ..., valueN): Returns the lowest value from a list of provided values. (Not an aggregator)

    • Example: LEAST(first_commit_at, created_at)

  • NOT_CONTAINS(list, term|term_list): Returns true if list does not contain the term or any of term_list.

    • Example: NOT_CONTAINS(label_names, "WIP")

Math Functions

  • EXP(exponent): Returns e raised to the power of exponent.

  • LN(number): Returns the natural logarithm of a number.

  • LOG(number, [base=10]): Returns the logarithm of a number with a specified base (default 10).

  • POWER(base, exponent): Raises base to the exponent power. (Shorthand: ^)

  • MOD(dividend, divisor): Returns the remainder of a division. (Shorthand: %)

  • ROUND(number, [precision=0]): Rounds a number to a specified decimal precision.

    • Example: ROUND(3.14159, 2) returns 3.14

  • ROUNDDOWN(number, [precision=0]): Rounds a number down.

  • ROUNDUP(number, [precision=0]): Rounds a number up.

Hashing Functions

  • SHA1(value): Computes a SHA1 hash. Useful for anonymization.

  • SHA256(value): Computes a SHA256 hash. Useful for anonymization.

String Manipulation Functions

  • LEFT(string, [num_chars=1]): Returns a substring from the beginning.

  • LENGTH(list|string): Returns the length of an array or string.

  • MATCH(string, regex): Returns true if string matches regex. (Shorthand: ~)

  • NOT_MATCH(string, regex): Returns true if string does not match regex. (Shorthand: !~)

  • RIGHT(string, [num_chars=1]): Returns a substring from the end.

  • SPLIT(string, delimiter): Splits a string into an array of strings.

    • Example: SPLIT("item1,item2", ",") returns ["item1", "item2"]

  • TO_STR(value): Casts a value to its string representation.

    • Example: TO_STR(123) + "_suffix"

Type Casting Functions

  • TO_FLOAT(value): Casts a value to a float. Returns 0 if not a number.

  • TO_INT(value): Casts a value to an integer (truncates decimals). Returns 0 if not a number.

Timeline Functions (for the workflow_timeline field)

These are powerful for analyzing time spent in different states of a process.

  • TIMELINE_DURATION(timeline_field, event|event_list, [closed_only: false|true], [null_if_none: false|true]): Calculates the total duration (in seconds) spent in a given state(s).

    • closed_only: If true, only counts durations for completed (closed) events.

    • null_if_none: If true, returns NULL instead of 0 if no matching events are found.

    • Example: TIMELINE_DURATION(workflow_timeline, "In Progress") / HOUR() (time in "In Progress" in hours)

  • TIMELINE_FIRST_START_AT(timeline_field, event|event_list): Returns the timestamp of the first time an item entered the specified state(s).

  • TIMELINE_FIRST_END_AT(timeline_field, event|event_list): Returns the timestamp of the first time an item exited the specified state(s).

  • TIMELINE_LAST_START_AT(timeline_field, event|event_list): Returns the timestamp of the last time an item entered the specified state(s).

  • TIMELINE_LAST_END_AT(timeline_field, event|event_list): Returns the timestamp of the last time an item exited the specified state(s).

Metric Aggregators

As noted earlier, any custom formula used for a Metric must include at least one of these aggregation functions. These functions operate on groups of records defined by your dimensions.

  • AVG(number): Calculates the average of a numeric field/expression.

    • Example: AVG(lines_changed)

  • COUNT() / COUNT(field): Counts the number of records (or records where field is not null).

    • Example: COUNT() (total records), COUNT(due_on) (records with a due date)

  • COUNT_CUMULATIVE() / COUNT_CUMULATIVE(field): Returns a cumulative count based on the sort order.

  • COUNT_DISTINCT(field|expression): Counts the number of unique values for a field/expression.

    • Example: COUNT_DISTINCT(author_username)

  • COUNT_IF(condition): Counts records that match a specific condition within each group. Useful for multiple metrics in one chart.

    • Example: COUNT_IF(state == "CLOSED")

  • COUNT_ROWS(): Counts the number of groups (rows) returned by the query. Useful for ratios.

    • Example: COUNT() / COUNT_ROWS()

  • COUNT_TOTAL() / COUNT_TOTAL(field): Counts the total number of records across all groups. Useful for overall ratios.

    • Example: COUNT() / COUNT_TOTAL()

  • MAX(number): Returns the maximum value.

    • Example: MAX(lines_changed)

  • MEDIAN(number): Returns the median value.

    • Example: MEDIAN(lines_changed)

  • MIN(number): Returns the minimum value.

    • Example: MIN(additions)

  • PERCENTILE(number_field, percentile): Returns the discrete percentile value (value from the dataset).

    • Example: PERCENTILE(lines_changed, 0.90)

  • PERCENTILE_CONT(number_field, percentile): Returns the continuous (interpolated) percentile value.

    • Example: PERCENTILE_CONT(lines_changed, 0.90)

  • STDDEV(number): Returns the standard deviation.

    • Example: STDDEV(lines_changed)

  • SUM(field or expression): Sums values for a field or expression.

    • Example: SUM(additions)

  • SUM_CUMULATIVE(number): Returns a cumulative sum based on the sort order.

    • Example: SUM_CUMULATIVE(lines_changed)

  • SUM_IF(condition, number): Sums values only for records that match a specific condition. Useful for multiple metrics in one chart.

    • Example: SUM_IF(state == "MERGED", lines_changed)

  • SUM_TOTAL(number): Sums values across all groups.

    • Example: SUM(lines_changed) / SUM_TOTAL(lines_changed)

  • VARIANCE(number): Returns the variance.

    • Example: VARIANCE(lines_changed)

Key Principles for Writing Formulas

  • Understand Your Data Fields: Always refer to the dataset available on Keypup section to know the exact names and types of fields you can use (e.g., created_at is a ts / timestamp, label_names is a stringL / list of strings).

  • Type Consistency: Ensure that operations and function arguments match the expected data types. For IF and IF_NULL, the return values must be of the same type.

  • Aggregators for Metrics: Remember that metrics must use an aggregator function.

  • Mind the FLATTEN(): It's powerful but only for dimensions and affects how records are processed.

  • Start Simple: Begin with basic formulas and gradually add complexity. Test small parts of your logic if possible.

  • Check Examples: Refer to existing insight examples in Keypup for inspiration and best practices on how to combine functions and fields.

By mastering custom formulas, you can unlock a new level of analytical power in Keypup, creating highly specific and insightful reports tailored to your team's unique workflows and performance indicators.

Did this answer your question?