Transform your data with Spark SQL
Data Prep includes a Spark SQL tool that lets you transform your data with SQL statements. Spark SQL provides a library of built-in functions that you can use to prepare, clean, and transform your data.
Your Data Prep Administrator must enable this feature in your application.
The following sections describe how to use the Spark SQL tool in Data Prep. For details on which SQL statements are supported, see Data Prep Spark SQL Guidelines.
You can also select and transform your data using the AI Catalog. See Prepare data in AI Catalog with Spark SQL.
Work with Spark SQL
To access the Spark SQL tool, click spark sql in the project Tools bar.
Add a Spark SQL statement
1. From the Tools bar, click spark sql. The Spark SQL Statement pane appears.
2. Enter a Spark SQL statement. See Data Prep Spark SQL Guidelines for usage details.
3. Click Run Query at the lower right of the Spark SQL Statement pane to validate your query. If the query succeeds, the results display below it; review them to confirm that the query works as expected. If the query fails, an error message displays below the query.
4. Click Save to save the query. You can save queries that contain errors and return later to fix them.
If the SQL query contains an error, the Spark SQL step in the Steps tool displays an error icon. Click the icon to view the error message.
After you save the Spark SQL step, you might later need to change an earlier step or add a new step before the Spark SQL step. In that case, click the Spark SQL step to edit it, click Run Query to validate the query against the updated data, and save the query again.