Trino

Supported authentication

  • Basic (username/password)

Prerequisites

The following is required before connecting to Trino in DataRobot:

  • Data stored in a Trino database

Batch predictions requirements

You can use Trino as both an intake source and an output destination for batch prediction jobs.

  • Use only lowercase column names in the dataset used to train a project. Trino normalizes column names automatically (unquoted identifiers are lowercased), so mixed-case or uppercase column names can cause column inconsistency errors when DataRobot reads from Trino for batch scoring. This applies even to tables created with quoted column names: Trino still stores them as lowercase.

  • When using Trino as a batch prediction output destination, set chunkSize to an explicit number of bytes. Named chunk strategies (auto, dynamic, fixed) are not supported. The value must not exceed 1,000,000 bytes (1 MB), which is the default Trino query.max-length limit. A job that exceeds this limit or uses a named strategy fails.
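
The two requirements above can be checked locally before a job is submitted. The following is a minimal sketch, not part of the DataRobot client: the helper names non_lowercase_columns and validate_trino_chunk_size and the constant TRINO_MAX_CHUNK_BYTES are hypothetical, and the 1,000,000-byte ceiling is simply the limit stated above.

```python
# Hypothetical pre-flight checks; these names are illustrative only.
TRINO_MAX_CHUNK_BYTES = 1_000_000  # limit stated above (default Trino query.max-length)


def non_lowercase_columns(columns):
    """Return the column names that are not entirely lowercase."""
    return [name for name in columns if name != name.lower()]


def validate_trino_chunk_size(chunk_size):
    """Return chunk_size if it is an explicit byte count within the limit, else raise."""
    # bool is a subclass of int, so reject it explicitly along with named strategies.
    if isinstance(chunk_size, bool) or not isinstance(chunk_size, int):
        raise TypeError("chunk size must be an integer number of bytes, not a named strategy")
    if not 0 < chunk_size <= TRINO_MAX_CHUNK_BYTES:
        raise ValueError(f"chunk size must be between 1 and {TRINO_MAX_CHUNK_BYTES} bytes")
    return chunk_size
```

For example, non_lowercase_columns(["id", "Amount"]) flags Amount, and validate_trino_chunk_size("auto") raises a TypeError. Assuming the Python client is used to submit the job, the validated value would typically be passed as the chunk_size argument of dr.BatchPredictionJob.score.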

Required parameters

The table below lists the minimum required fields to establish a connection with Trino:

Required field | Description | Documentation
Host | The hostname or IP address of your Trino coordinator. | Trino documentation

Troubleshooting

Problem | Solution | Instructions
When attempting to execute an operation in DataRobot, the firewall requests that you clear the IP address each time. | Add all allowed IPs for DataRobot. | See Allowed source IP addresses. If you've already added the allowed IPs, check the existing IPs for completeness.

Code examples

The Python example below shows how to connect to Trino and import data from it into DataRobot.

Initialize the DataRobot client and define database details for later use:

import datarobot as dr
from datarobot.enums import DataStoreTypes

api_token = '<token>'
endpoint = 'https://app.datarobot.com/api/v2'

# Configure the global client; subsequent dr.* calls use these settings.
dr.Client(token=api_token, endpoint=endpoint)

TRINO_HOST = "datarobot.trino.galaxy.starburst.io"
TRINO_PORT = 443
USE_SSL = "true"
CATALOG = "<catalog>"
SCHEMA = "<schema>"
TABLE = "<table>"
QUERY = None
TRINO_USERNAME = "<username>"
TRINO_PASSWORD = "<password>"
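
Hard-coded credentials are convenient for a walkthrough but easy to leak. As one possible alternative, the sketch below reads the same values from environment variables. The variable names (TRINO_HOST, TRINO_USERNAME, and so on) and the helper load_trino_settings are assumptions made for illustration; neither DataRobot nor Trino defines them.

```python
import os


def load_trino_settings(env=None):
    """Read Trino connection settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Hypothetical variable names; adjust to your own conventions.
    required = ("TRINO_HOST", "TRINO_USERNAME", "TRINO_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "host": env["TRINO_HOST"],
        "port": int(env.get("TRINO_PORT", "443")),  # 443 matches the example above
        "ssl": env.get("TRINO_USE_SSL", "true"),
        "username": env["TRINO_USERNAME"],
        "password": env["TRINO_PASSWORD"],
    }
```

The returned values can then stand in for TRINO_HOST, TRINO_PORT, USE_SSL, TRINO_USERNAME, and TRINO_PASSWORD in the steps that follow.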

Do one of the following to obtain the Trino driver:

  • Create a new Trino driver:

    trino_driver = dr.DataDriver.create(
        class_name=DataStoreTypes.DR_DATABASE_V1,
        canonical_name='Trino Driver',
        database_driver='trino-v1',
    )
    
  • Reference an existing Trino driver by its ID:

    trino_driver = dr.DataDriver.get('<trino_driver_id>')
    

Create (or reuse) Trino credentials and securely save them in DataRobot:

trino_credentials = dr.Credential.create_basic(
    name='Trino Credentials',
    user=TRINO_USERNAME,
    password=TRINO_PASSWORD,
)

Define a connection to the external data store:

datastore_fields = [
    {"id": "host", "name": "Host Name", "value": TRINO_HOST},
    {"id": "port", "name": "port", "value": str(TRINO_PORT)},
    {"id": "ssl", "name": "ssl", "value": USE_SSL},
]

trino_datastore = dr.DataStore.create(
    data_store_type=DataStoreTypes.DR_DATABASE_V1,
    canonical_name='Trino Datastore',
    driver_id=trino_driver.id,
    fields=datastore_fields,
)
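
Before defining a data source, it can help to confirm that the connection actually works. The helper below is a hedged sketch: connection_ok is a hypothetical name, and it assumes the data store object exposes a test method that accepts credential_id and returns a dictionary with a message entry, as dr.DataStore.test does in the DataRobot Python client. The exact wording of that message is also an assumption.

```python
def connection_ok(datastore, credential_id):
    """Return True if the data store reports a successful connection test."""
    # Assumes datastore.test(...) returns a dict such as {'message': '...'}.
    result = datastore.test(credential_id=credential_id)
    # Assumption: a successful test mentions "success" in its message.
    return "success" in str(result.get("message", "")).lower()
```

With the objects created above, the call would be connection_ok(trino_datastore, trino_credentials.id).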

Point to a specific data source (table or query):

data_source_params = dr.DataSourceParameters(
    data_store_id=trino_datastore.id,
    catalog=CATALOG,
    schema=SCHEMA,
    table=TABLE,
    query=QUERY,
)

trino_datasource = dr.DataSource.create(
    data_source_type=DataStoreTypes.DR_DATABASE_V1,
    canonical_name='Trino DataSource',
    params=data_source_params,
)

Pull the data from Trino and import a snapshotted version into DataRobot:

trino_dataset = trino_datasource.create_dataset(
    do_snapshot=True,
    credential_id=trino_credentials.id,
)