SQL Querying

Powerful SQL querying against your structured log data

Timber offers full ANSI compliant SQL querying against your structured log data, providing you with powerful unrestricted access to your log data. Query your log data just as you would any SQL compliant database.

This feature is designed for complex long-term querying. Data is made available on 15 minute intervals. For real-time access please see the Live Tailing & Searching feature.

Getting Started

SQL Querying is limited to the Timber CLI since it is a low-level data feature. Integration into the Timber web app is planned.

  1. List your available sources:

    timber sources
  2. Execute the sql-queries command using your chosen source ID as the table name:

    timber sql-queries execute "
    SELECT dt, message
    FROM source_{source_id}
    LIMIT 10
    "
Timber SQL Querying Demo

Usage

Executing Queries

SQL Query Syntax

Timber supports the Presto SELECT syntax:

[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expr [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count | ALL ] ]

Please see the Presto SELECT docs for a comprehensive syntax overview.

Tables

Your table name is formatted as source_{id}. Where {id} is replaced by your actual source ID.

For example: SELECT * FROM source_1234 LIMIT 100 would select data from source ID 1234.

Your can obtain your source ID via the timber sources command.

Columns

Any column you send as part of your log data is automatically made available for querying. If you haven't already, please read out Dynamic Schema Maintenance document.

Nested columns are delimited by a .. For example context.user.id

When specifying columns with a . be sure to quote the name with ` characters!

For example (notice the column quoting):

SELECT `context.user.id`
FROM source_{id}
LIMIT 10

Special Columns

Name

Type

Description

application_id

int

The ID of the source, application is an unfortunate legacy term.

dt

float

Log date in fractional Unix timestamp format (float). The timestamp represents seconds and the fractions represent fractions of a second. Use this to efficiently narrow your queries to a specific date range.

normalized_message

string

Downcased and ANSI formatting stripped. Convenient for sub-string searching.

raw_message

string

The raw, unaltered message that you sent to Timber.

severity

int

Numerical representation of the level field. The value follows the Syslog 5424 severities.

Functions

Please see the Presto SELECT docs for a comprehensive overview of all functions available.

Example 1: Retrieve the last 50 logs logs:

SELECT dt, message
FROM source_{id}
ORDER BY `dt.desc`
LIMIT 50

Example 2: Count logs over the past 24 hours:

SELECT COUNT(*) AS count
FROM source_{id}
WHERE dt >= (now() - interval '24' hour)

Example 3: Count errors by user over the last week

This query assumes you have a user.id field.

SELECT
`context.user.id` AS user_id,
COUNT(*) AS count
FROM source_{id}
WHERE
level = 'error' AND
dt >= (now() - interval '1' week)
GROUP BY `context.user.id`

Example 4: Average HTTP server response times over the last 24 hours (1 minute intervals):

This query assumes you have a http_response_sent.duration_ms field.

SELECT
floor(dt - mod(dt, 60)) AS interval,
AVG(`http_response_sent.duration_ms`) AS response_avg
FROM source_{id}
WHERE dt >= (now() - interval '24' hours)
GROUP BY floor(dt - mod(dt, 60))

Example 5: Search logs by a phrase

SELECT dt, message
FROM source_{id}
WHERE normalized_message LIKE '%sent 500%'
ORDER BY `dt.desc`
LIMIT 50

normalized_message is a special field that Timber provides. It is a downcased and ANSI formatted stripped version of the message field, providing for case-insensitive searching.

Listing Queries

timber sql-queries

Getting A Query's Info

  1. Run the info sub-command to get a query's status:

    timber sql-queries info [sql_query_id]

Please see the Query Statuses section for status explanations.

Displaying Results

  1. Run the results sub-command to displaying the results, replacing [sql_query_id] with the ID of your query:

    timber sql-queries results [sql_query_id]

Downloading Results

  1. Run the download sub-command to download the result, replacing [sql_query_id] with the ID of your query:

    timber sql-queries download [sql_query_id] | open

    We pipe with the open command to

Cancelling Queries

You cannot cancel queries that are not RUNNING.

You can cancel running queries by issuing the cancel sub-command:

timber sql-queries cancel [sql_query_id]

Query Statuses

SQL queries can have any of the following statuses:

Status

Description

QUEUED

The query is queued for running. Typically queries run immediately or within minutes if they are queued. Queueing should not last longer than 5 minutes.

RUNNING

The query is currently running.

SUCCEEDED

The query successfully completed.

FAILED

The query failed due to an error. Get the status to view error details.

CANCELLED

The query was manually cancelled.

Usage Calculation

Each SQL query scans data in order to execute and return its result. The amount of data scanned is entirely dependent on the query and the data within your account. The data scanned will be displayed in your client after the query has finished executing.

It is very easy to write a query that will scan all of your data and exhaust your limit. Additionally, it is just as easy to write efficient queries that only scan the smallest amount of data necessary. Please see our querying best practices on how to do this.

Best Practices

  1. Supply a date range on the dt column to limit the amount of data scanned, otherwise all data within your account will be scanned.

  2. Supply a LIMIT to return early and reduce the amount of data returned.

  3. Specify individual columns within the SELECT clause to reduce the amount of data scanned and returned. Avoid *.

  4. Avoid JOINs if possible.

How It Works

If you haven't already, please read our Log Ingestion document for a deeper dive into our log ingestion pipeline.

Architecture

Timber persists your data in an efficient columnar format (ORC) on S3 and uses Athena (Presto) to query that data. Athena utilizes thousands of CPU cores to query your data, resulting in incredibly fast query speeds.

Performance

Timber's S3 / SQL querying pipeline is built to extract every possible performance benefit. This is the result of hard won experience building and maintaining big data S3 pipelines. With Timber you get all of this as part of your account:

  1. Timber dynamically maintains a schema for each of your sources.

  2. Because we maintains a consistent schema we're able to write your data in compressed ORC columnar format for efficient data retrieval.

  3. Hourly partitions are used for high-level granular access, helping to reduce the amount of data scanned for each query.

Asynchronous Processing

Because SQL queries can vary in complexity, they are executed asynchronously, and the status of the query is polled. This is due to the fact that SQL queries can sometimes take a while to complete. This is entirely dependent on the complexity of your query and the amount of data scanned. Performance is largely correlated with the amount of data scanned. You can reduce the execution time by following our best practices.

Data Availability

Timber's S3 / SQL Querying pipeline flushes data on 15 minute intervals, meaning data can be delayed up to 15 minutes before being made available for SQL querying. If you need real-time access please see the Live Tailing & Search feature.

Limitations

  1. SQL queries are read-only, only SELECT queries are allowed.

  2. Execution time cannot exceed 10 minutes. Beyond this the query will be canceled.

  3. Only 10 X your billing plan's volume can be scanned within a given billing period. For example, if you have a 10gb volume billing plan, you can only execute up to 100GB in cumulative data scanned within a single billing period.Note that because of the way Timber stores your log data, SQL queries scan significantly less data than the total amount of data contained in your log lines. See the Usage Calculation section for more information.

Please contact support if you want to inquire about a limit increase.