# Apache Spark API for Scoring Code

> Apache Spark API for Scoring Code - Learn how to use the Spark API for Scoring Code, a library that
> integrates Scoring Code JARs into Spark clusters.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.258930+00:00` (UTC).

## Primary page

- [Apache Spark API for Scoring Code](https://docs.datarobot.com/en/docs/api/code-first-tools/sc-apache-spark.html): Full documentation for this topic (HTML).

## Sections on this page

- [PySpark API](https://docs.datarobot.com/en/docs/api/code-first-tools/sc-apache-spark.html#pyspark-api): In-page section heading.
- [Spark Scala API](https://docs.datarobot.com/en/docs/api/code-first-tools/sc-apache-spark.html#spark-scala-api): In-page section heading.
- [Score a CSV file](https://docs.datarobot.com/en/docs/api/code-first-tools/sc-apache-spark.html#score-a-csv-file): In-page section heading.
- [Load models at runtime](https://docs.datarobot.com/en/docs/api/code-first-tools/sc-apache-spark.html#load-models-at-runtime): In-page section heading.
- [Time series scoring](https://docs.datarobot.com/en/docs/api/code-first-tools/sc-apache-spark.html#time-series-scoring): In-page section heading.
- [Deprecated Spark libraries](https://docs.datarobot.com/en/docs/api/code-first-tools/sc-apache-spark.html#deprecated-spark-libraries): In-page section heading.

## Related documentation

- [Developer documentation](https://docs.datarobot.com/en/docs/api/index.html): Linked from this page.
- [Code-first tools](https://docs.datarobot.com/en/docs/api/code-first-tools/index.html): Linked from this page.

## Documentation content

# Apache Spark API for Scoring Code

The Spark API for Scoring Code library integrates DataRobot Scoring Code JARs into Spark clusters. It is available as a [PySpark API](https://docs.datarobot.com/en/docs/api/code-first-tools/sc-apache-spark.html#pyspark-api) and a [Spark Scala API](https://docs.datarobot.com/en/docs/api/code-first-tools/sc-apache-spark.html#spark-scala-api).

In previous versions, the Spark API for Scoring Code consisted of multiple libraries, each supporting a specific Spark version. Now, a single library supports all of the following Spark versions:

- Spark 2.4.1 or greater
- Spark 3.x

> [!IMPORTANT]
> Spark must be compiled for Scala 2.12.
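To confirm which Scala version your Spark distribution was built against, you can check the version banner (both `spark-shell --version` and `spark-submit --version` print it to stderr):

```shell
# Print the Spark build info; the "Using Scala version" line must show 2.12.
spark-submit --version 2>&1 | grep "Using Scala version"
```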

For a list of the deprecated, Spark version-specific libraries, see the [Deprecated Spark libraries](https://docs.datarobot.com/en/docs/api/code-first-tools/sc-apache-spark.html#deprecated-spark-libraries) section.

## PySpark API

The PySpark API for Scoring Code is included in the [datarobot-predict](https://pypi.org/project/datarobot-predict/) Python package, published on PyPI. The PyPI project description contains documentation and usage examples.
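As a quick illustration, scoring a Spark DataFrame from PySpark might look like the sketch below. The module path and class name (`datarobot_predict.spark_scoring_code.SparkScoringCodeModel`) are assumptions based on the package's description; verify them against the current PyPI documentation before use.

```python
# Sketch only -- module and class names are assumptions; see the
# datarobot-predict PyPI project description for the exact API.
from pyspark.sql import SparkSession
from datarobot_predict.spark_scoring_code import SparkScoringCodeModel

spark = SparkSession.builder.getOrCreate()
input_df = spark.read.option("header", True).csv("input_data.csv")

# Load a Scoring Code JAR and score the DataFrame.
model = SparkScoringCodeModel("model.jar")
predictions = model.predict(input_df)
predictions.show()
```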

## Spark Scala API

The Spark Scala API for Scoring Code is published on Maven as [scoring-code-spark-api](https://central.sonatype.com/artifact/com.datarobot/scoring-code-spark-api). For more information, see the [API reference documentation](https://javadoc.io/doc/com.datarobot/scoring-code-spark-api_3.0.0/latest/com/datarobot/prediction/spark30/Predictors$.html).

Before using the Spark API, you must add it to the Spark classpath. For `spark-shell`, use the `--packages` parameter to load the dependencies directly from Maven:

```
spark-shell --conf "spark.driver.memory=2g" \
     --packages com.datarobot:scoring-code-spark-api:VERSION \
     --jars model.jar
```

### Score a CSV file

The following example illustrates how you can load a CSV file into a Spark DataFrame and score it:

```
import com.datarobot.prediction.sparkapi.Predictors

val inputDf = spark.read.option("header", true).csv("input_data.csv")

val model = Predictors.getPredictor()
val output = model.transform(inputDf)

output.show()
```

### Load models at runtime

The following examples illustrate how you can load a model's JAR file at runtime instead of passing it to `spark-shell` with the `--jars` parameter:

**From DataRobot:**
Define the `PROJECT_ID`, the `MODEL_ID`, and your `API_TOKEN`.

```
val model = Predictors.getPredictorFromServer(
     "https://app.datarobot.com/projects/PROJECT_ID/models/MODEL_ID/blueprint",
     "API_TOKEN")
```

**From HDFS filesystem:**
Define the path to the model JAR file and the `MODEL_ID`.

```
val model = Predictors.getPredictorFromHdfs("path/to/model.jar", spark, "MODEL_ID")
```


### Time series scoring

The following examples illustrate how you can perform time series scoring with the same `transform` method used for non-time series scoring. In addition, you can customize time series parameters with the `TimeSeriesOptions` builder.

If you don't provide additional arguments to a time series model through the `TimeSeriesOptions` builder, the `transform` method auto-detects a forecast point and returns predictions for it:

```
val model = Predictors.getPredictor()
val forecastPointPredictions = model.transform(timeSeriesDf)
```

To define a forecast point, you can use the `buildSingleForecastPointRequest()` builder method:

```
import com.datarobot.prediction.TimeSeriesOptions

val tsOptions = new TimeSeriesOptions.Builder().buildSingleForecastPointRequest("2010-12-05")
val model = Predictors.getPredictor(modelId, tsOptions) // modelId: ID of the model JAR on the classpath
val output = model.transform(inputDf)
```

To return historical predictions, you can define a start date and an end date through the `buildForecastDateRangeRequest()` builder method:

```
val tsOptions = new TimeSeriesOptions.Builder().buildForecastDateRangeRequest("2010-12-05", "2011-01-02")
```

For a complete reference, see [TimeSeriesOptions javadoc](https://javadoc.io/doc/com.datarobot/datarobot-prediction/latest/com/datarobot/prediction/TimeSeriesOptions.Builder.html).

## Deprecated Spark libraries

Support for Spark versions earlier than 2.4.1 or Spark compiled for Scala earlier than 2.12 is deprecated. If necessary, you can access deprecated libraries published on Maven Central; however, they will not receive any further updates.

The following libraries are deprecated:

| Name | Spark version | Scala version |
| --- | --- | --- |
| scoring-code-spark-api_1.6.0 | 1.6.0 | 2.10 |
| scoring-code-spark-api_2.4.3 | 2.4.3 | 2.11 |
| scoring-code-spark-api_3.0.0 | 3.0.0 | 2.12 |
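
If you are pinned to one of these legacy environments, you can still load the matching artifact from Maven Central by its full, version-suffixed name. For example (with `VERSION` as a placeholder for the library release you need):

```shell
# Load the deprecated Spark 2.4.3 / Scala 2.11 library instead of the unified artifact.
spark-shell --packages com.datarobot:scoring-code-spark-api_2.4.3:VERSION \
     --jars model.jar
```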
