The Spark API for Scoring Code library integrates DataRobot Scoring Code JARs into Spark clusters. It is available as a PySpark API and a Spark Scala API.
In previous versions, the Spark API for Scoring Code consisted of multiple libraries, each targeting a specific Spark version. Now, a single library supports all compatible Spark versions:

- Spark 2.4.1 or greater
- Spark 3.x
Important: Spark must be compiled for Scala 2.12.
For a list of the deprecated, Spark version-specific libraries, see the Deprecated Spark libraries section.
The PySpark API for Scoring Code is included in the datarobot-predict Python package, released on PyPI. The PyPI project description contains documentation and usage examples.
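To get started, install the package from PyPI (the package name is given above; the command assumes a standard pip setup):

```
pip install datarobot-predict
```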
Before using the Spark API, you must add it to the Spark classpath. For spark-shell, use the --packages parameter to load the dependencies directly from Maven:
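The following is an illustrative sketch of the spark-shell invocation; the Maven coordinates and version placeholder are assumptions, so check the library's listing on Maven Central for the exact group ID, artifact ID, and latest release:

```
# Illustrative coordinates; verify the artifact ID and version on Maven Central
spark-shell --packages com.datarobot:scoring-code-spark-api:<latest-version>
```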
The following examples illustrate how to perform time series scoring with the transform method, just as you would for non-time series scoring. In addition, you can customize the time series parameters with the TimeSeriesOptions builder.
If you don't provide additional arguments for a time series model through the TimeSeriesOptions builder, the transform method returns forecast point predictions for an auto-detected forecast point:
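Below is a minimal Scala sketch of both cases. It assumes the Spark API exposes a Predictors factory for loading a Scoring Code model and that TimeSeriesOptions mirrors the builder in the DataRobot Scoring Code Java API; the model ID, file path, and timestamp are placeholders, and exact class or method names may differ in your library version:

```scala
import com.datarobot.prediction.TimeSeriesOptions
import com.datarobot.prediction.spark.Predictors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("scoring-code-example").getOrCreate()

// Load the scoring data and the Scoring Code model (placeholder path and model ID).
val input = spark.read.option("header", "true").csv("/path/to/timeseries_data.csv")
val model = Predictors.getPredictor("5fa1b2c3d4e5f67890123456")

// Default behavior: with no TimeSeriesOptions, transform returns forecast point
// predictions for an auto-detected forecast point.
val defaultPredictions = model.transform(input)
defaultPredictions.show()

// Customized behavior: build TimeSeriesOptions to set an explicit forecast point.
val options = TimeSeriesOptions.newBuilder()
  .buildSingleForecastPointRequest("2023-06-01T00:00:00Z")
val customPredictions = model.transform(input, options)
customPredictions.show()
```

Other builder methods, such as a forecast date range request, can be substituted for the single forecast point shown here.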
Support for Spark versions earlier than 2.4.1, or for Spark compiled against Scala versions earlier than 2.12, is deprecated. If necessary, you can still use the deprecated libraries published on Maven Central; however, they will not receive further updates.