Data ingest
DataRobot Location AI can ingest existing geospatial data through several pathways, including:
- Native geospatial files
- Spatially enabled database tables
- Auto-recognized spatial coordinates
- User transformations to the location variable type
Connecting directly to geospatial data saves the time and resources otherwise spent exporting data from native geospatial formats in a Geographic Information System (GIS) or a data preparation tool. Because DataRobot Location AI automatically recognizes geospatial data in non-native formats, users who are not traditional geospatial analysts can also work directly with spatial data.
Native geospatial data
DataRobot Location AI supports ingest of these native geospatial data formats:
- ESRI Shapefiles
- GeoJSON
- ESRI File Geodatabase
- Well-Known Text (WKT), embedded in a table column
- PostGIS databases
Native geospatial file formats are uploaded to DataRobot in the same way as non-geospatial files: by drag-and-drop, URL upload, or the AI Catalog.
ESRI Shapefiles
ESRI Shapefiles are a common native geospatial format, created in the late 1990s and still widely used today. A shapefile is a multifile format: a complete dataset requires, at a minimum, files with the .shp, .shx, and .dbf extensions. Because of this multifile structure, DataRobot Location AI accepts a ZIP archive containing these files along with the .prj file, which describes the Coordinate Reference System (CRS) of the data.
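As a minimal sketch, the Python snippet below packages a shapefile's component files into a single ZIP archive using only the standard library. The function name zip_shapefile, storing the files at the archive root, and treating .prj as optional (it is included when present) are assumptions for illustration, not documented DataRobot requirements.

```python
import zipfile
from pathlib import Path

# Required shapefile components, plus the .prj file that describes the CRS.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}
OPTIONAL_EXTENSIONS = {".prj"}


def zip_shapefile(shp_path: str, zip_path: str) -> list[str]:
    """Bundle a shapefile's sidecar files into one ZIP archive (hypothetical helper).

    Returns the sorted file names written to the archive.
    Raises FileNotFoundError if a required component is missing.
    """
    shp = Path(shp_path)
    # with_suffix swaps only the final extension, so "roads.v2.shp" -> "roads.v2.shx".
    missing = [ext for ext in sorted(REQUIRED_EXTENSIONS) if not shp.with_suffix(ext).is_file()]
    if missing:
        raise FileNotFoundError(f"missing shapefile components: {missing}")

    written = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for ext in sorted(REQUIRED_EXTENSIONS | OPTIONAL_EXTENSIONS):
            part = shp.with_suffix(ext)
            if part.is_file():
                # Store each file at the archive root under its own name.
                zf.write(part, arcname=part.name)
                written.append(part.name)
    return sorted(written)
```

Checking for the required components before opening the archive means a failed call leaves no partial ZIP behind.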
GeoJSON
GeoJSON is a more recent geospatial file format, often used in web mapping applications, and is standardized by the Internet Engineering Task Force (IETF) as RFC 7946. Unlike ESRI Shapefiles, GeoJSON is a single-file format that needs no sidecar file to define its Coordinate Reference System (CRS): RFC 7946 fixes the CRS to WGS 84 longitude/latitude, while files following the older 2008 GeoJSON specification may declare a CRS inside the file itself.
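For illustration, the hypothetical helpers below build a one-point GeoJSON FeatureCollection with Python's json module. Following RFC 7946, each position is written longitude first, then latitude. The names point_feature and feature_collection and the property values are made up for this sketch.

```python
import json


def point_feature(lon: float, lat: float, properties: dict) -> dict:
    """Build a GeoJSON Feature with a Point geometry (hypothetical helper).

    GeoJSON positions are ordered [longitude, latitude].
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def feature_collection(features: list) -> str:
    """Serialize a list of features as a GeoJSON FeatureCollection string."""
    return json.dumps({"type": "FeatureCollection", "features": features})


collection = feature_collection([point_feature(-46.63333, -23.55, {"name": "example"})])
print(collection)
```

The resulting string can be saved with a .geojson extension; a single file carries both geometry and attributes.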
ESRI File Geodatabase
ESRI File Geodatabase is a proprietary format that approximates a database through a nested folder structure. Location AI can read a File Geodatabase directory (with extension .gdb) packaged in a ZIP archive with extension .gdb.zip. Location AI reads the first layer in a File Geodatabase.
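A minimal sketch of producing a .gdb.zip archive from a File Geodatabase directory with Python's zipfile module. The hypothetical function zip_file_geodatabase keeps the .gdb folder as the top-level entry inside the archive; that layout is an assumption, not a documented DataRobot requirement.

```python
import os
import zipfile
from pathlib import Path


def zip_file_geodatabase(gdb_dir: str) -> str:
    """Zip a File Geodatabase directory, e.g. parcels.gdb -> parcels.gdb.zip (hypothetical helper).

    Returns the path of the archive that was written, next to the .gdb folder.
    """
    gdb = Path(gdb_dir)
    if gdb.suffix.lower() != ".gdb" or not gdb.is_dir():
        raise ValueError(f"expected a .gdb directory, got {gdb_dir!r}")

    zip_path = gdb.with_name(gdb.name + ".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(gdb):
            for name in files:
                full = Path(root) / name
                # Paths are relative to the parent folder, so entries start with "parcels.gdb/".
                zf.write(full, arcname=full.relative_to(gdb.parent).as_posix())
    return str(zip_path)
```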
Well-Known Text
Well-Known Text (WKT) is a text markup language described in the Open Geospatial Consortium’s (OGC) Simple Feature Access specification. WKT is a versatile representation of vector geometries and can be included in any file type DataRobot AutoML already supports, as a feature that describes the geometry associated with each row. See the “WKT” column in the figure below.
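The sketch below writes a small CSV in which a “WKT” column carries each row's geometry, using Python's csv module. The column names and values are hypothetical. Because WKT polygons contain commas, the writer quotes those fields so the file remains parseable.

```python
import csv
import io

# Hypothetical rows: ordinary attributes plus a "WKT" column holding each row's geometry.
rows = [
    {"store_id": 1, "sales": 1200.0, "WKT": "POINT (-46.63333 -23.55)"},
    {"store_id": 2, "sales": 850.5, "WKT": "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["store_id", "sales", "WKT"])
writer.writeheader()
# With the default QUOTE_MINIMAL setting, fields containing commas (like the polygon) are quoted.
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text, end="")
```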
PostGIS databases
Configuring PostGIS ingest follows the same workflow as for non-geospatial databases.
Auto-recognition of location data
In addition to ingesting native geospatial data, DataRobot Location AI can automatically detect location data within non-geospatial formats. It automatically recognizes location variables when the column names contain “latitude” and “longitude” and the values use one of these formats:
- Decimal degrees
- Degrees minutes seconds

Examples of recognized value pairs (longitude and latitude):
- -46° 37′ 59.988″ and -23° 33′
- 46.63333W and 23.55S
- 46*37′59.98"W and 23*33′S
- W 46D 37m 59.988s and S 23D 33m
DataRobot marks geometry features created from auto-recognized spatial coordinates with an icon on the Data page.
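DataRobot’s recognition rules are not described beyond the formats listed above, so the following is only an illustrative parser, written as a hypothetical function to_decimal_degrees. It converts the example strings to signed decimal degrees using decimal = degrees + minutes / 60 + seconds / 3600, where S and W (or a leading minus sign) give negative values.

```python
import re

# One or more digits, optionally followed by a decimal part.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")
# Uppercase hemisphere letters; lowercase "m"/"s" unit markers and "D" are not matched.
_HEMISPHERE = re.compile(r"[NSEW]")


def to_decimal_degrees(text: str) -> float:
    """Convert a coordinate string to signed decimal degrees (hypothetical helper).

    Accepts up to three numbers (degrees, minutes, seconds), an optional
    leading minus sign, and an optional N/S/E/W hemisphere letter.
    """
    s = text.strip()
    parts = [float(p) for p in _NUMBER.findall(s)]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"cannot parse coordinate: {text!r}")
    # Pad missing minutes and seconds with zero.
    degrees, minutes, seconds = (parts + [0.0, 0.0])[:3]
    value = degrees + minutes / 60 + seconds / 3600
    hemispheres = _HEMISPHERE.findall(s)
    # South and West are negative, as is an explicit leading minus sign.
    negative = s.startswith("-") or (bool(hemispheres) and hemispheres[-1] in "SW")
    return -value if negative else value


for raw in ["-46° 37′ 59.988″", "46.63333W", "23*33′S", "S 23D 33m"]:
    print(raw, "->", round(to_decimal_degrees(raw), 5))
```

All four longitude examples above resolve to roughly -46.63333 and all four latitude examples to -23.55, which is a quick way to see that the notations encode the same location.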
User transformation to location data
When spatial coordinates embedded in non-geospatial file formats are not recognized automatically, you can still create a location feature with DataRobot’s variable type transformation functionality. To transform data into a location feature:
- Navigate to one of the parent coordinate features, expand the feature listing, and select Var Type Transform from the feature menu.
- In the Numeric/Categorical Transformation dialog, select Location from the Transform Numeric/Categorical to dropdown.
- Two additional dropdown menus appear: Latitude and Longitude. Select from the existing features to specify the parent coordinates.
- Click Create feature.
The new feature appears after its parent feature as a new row in the Data table, marked with an icon indicating it is user-created.
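Outside DataRobot, a comparable preparation step is to derive one geometry column from two coordinate columns. The sketch below uses a hypothetical helper, latlon_to_wkt_point, to build WKT POINT strings that could then be ingested as a WKT column. It assumes the common convention that x is longitude and y is latitude, and it does not reproduce DataRobot’s internal representation of a location feature.

```python
def latlon_to_wkt_point(lat: float, lon: float) -> str:
    """Return a WKT POINT for a latitude/longitude pair (hypothetical helper).

    Assumes the common convention x = longitude, y = latitude,
    so longitude is written first.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return f"POINT ({lon} {lat})"


# Hypothetical table rows with separate latitude and longitude columns.
rows = [
    {"lat": -23.55, "lon": -46.63333},
    {"lat": 40.0, "lon": -105.25},
]
for row in rows:
    # Derive the new geometry column from its two parent coordinate columns.
    row["location"] = latlon_to_wkt_point(row["lat"], row["lon"])
    print(row["location"])
```

Validating the ranges up front catches the most common mistake in this step: swapping the latitude and longitude columns.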
Location variable type
In addition to the traditional variable types of numeric, categorical, and date, Location AI adds a location variable type to provide explicit treatment of spatial data in DataRobot models.
The location variable type supports the 2D geometric primitives specified in the OGC Simple Feature Access specification, as well as some multipart geometries. These include:
- Point/MultiPoint
- LineString/MultiLineString
- Polygon/MultiPolygon
Location variables improve DataRobot’s ability to handle location data throughout the AutoML workflow, including model blueprints, feature importance calculations, and visualizations.
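As a hypothetical pre-upload check, the sketch below reads the leading keyword of a WKT string and tests it against the six geometry types listed above. Treating Z and M (3D or measured) geometries as unsupported is an assumption drawn from the phrase “2D geometric primitives”, and the sketch only handles non-empty geometries.

```python
import re

# Geometry types listed above for the location variable type.
SUPPORTED_TYPES = {
    "POINT", "MULTIPOINT",
    "LINESTRING", "MULTILINESTRING",
    "POLYGON", "MULTIPOLYGON",
}

# Leading WKT keyword, an optional Z/M/ZM dimension tag, then the opening parenthesis.
_WKT_HEAD = re.compile(r"^\s*([A-Za-z]+)\s*(ZM|Z|M)?\s*\(", re.IGNORECASE)


def is_supported_location(wkt: str) -> bool:
    """Return True if a non-empty WKT geometry is 2D and of a listed type (hypothetical check)."""
    match = _WKT_HEAD.match(wkt)
    if match is None:
        return False
    geometry_type, dimension = match.group(1).upper(), match.group(2)
    # A Z or M tag marks a 3D or measured geometry, which this sketch rejects.
    return geometry_type in SUPPORTED_TYPES and dimension is None


for sample in ["POINT (1 2)", "GEOMETRYCOLLECTION (POINT (1 2))", "POINT Z (1 2 3)"]:
    print(sample, "->", is_supported_location(sample))
```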