DataRobot Location AI adds support for geospatial analysis across the entire AutoML workflow. These tools and techniques help users improve their modeling workflows by:
- Natively ingesting common geospatial formats
- Automatically recognizing geospatial coordinates in non-spatial formats
- Allowing Exploratory Spatial Data Analysis (ESDA)
- Enhancing model blueprints with spatially-explicit modeling tasks
- Visualizing geospatial data using interactive maps in pre- and post-modeling
- Gaining insights into geospatial patterns in your models
DataRobot’s Location AI enhances the standard AutoML workflow to capture a broad range of geospatial problems.
See the associated considerations for important additional information.
These sections describe:

| Topic | Describes |
|---|---|
| Data ingest | Work with sources of geospatial data. |
| ESDA | Conduct Exploratory Spatial Data Analysis (ESDA) within the DataRobot environment. |
| Modeling | Expand traditional automated feature engineering and improve model options. |
| Accuracy Over Space | Assess model fidelity through visualizations. |
Consider the following location, visualization, and modeling points when working with Location AI.
Many features of Location AI operate with a primary location feature. You can have multiple Location features in a dataset; the primary location feature is the one that is used as the basis for most visualization and modeling.
The primary location feature is automatically set to the first Location feature in a dataset. You can change this selection by using the dropdown in the Geospatial Modeling section of the EDA page. Once Autopilot is started, the primary location feature cannot be changed.
Location features are created automatically at project creation in the following cases:
- The geometry described in a native geospatial data source (e.g., a Shapefile or a PostGIS database) is recognized as a Location feature.
- Table columns that contain Well-Known Text (WKT) or hex-encoded Well-Known Binary (WKB) are recognized as Location features.
- If two features in the dataset have names containing “latitude” and “longitude” (English only, case-insensitive) and contain valid Location data, they are automatically transformed into a Location feature. Only the first such pair in the dataset is transformed.
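The name-matching rule for latitude and longitude columns can be illustrated with a minimal sketch. The helper `find_lat_lon_pair` is hypothetical and not part of any DataRobot API. It assumes the "first pair" means the first column whose name contains "latitude" together with the first column whose name contains "longitude", and it does not check that the values are valid coordinates.

```python
from typing import Optional, Sequence, Tuple


def find_lat_lon_pair(columns: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return the first (latitude, longitude) column pair, or None.

    Hypothetical helper: matches names that contain "latitude" or
    "longitude", ignoring case. Coordinate values are not validated here.
    """
    lat = next((c for c in columns if "latitude" in c.lower()), None)
    lon = next((c for c in columns if "longitude" in c.lower()), None)
    if lat is None or lon is None:
        return None
    return lat, lon
```

Under these assumptions, abbreviated names such as `lat` and `lon` are not matched, so such columns would need to be renamed or combined into a Location feature manually.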
All of the rows in a Location column must be of the same shape type, e.g., Point, Line, or Polygon.
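To check the single-shape-type requirement before upload, something like the following sketch could be used on a column of plain WKT strings. The function names are hypothetical. The check compares only the leading WKT keyword, so treating `POINT` and `MULTIPOINT` as different types is an assumption of this sketch, not documented product behavior.

```python
import re
from typing import Iterable, Set

# Leading keyword of a plain WKT string, e.g. "POINT" in "POINT (1 2)".
_WKT_TYPE = re.compile(r"^\s*([A-Za-z]+)")


def wkt_geometry_types(values: Iterable[str]) -> Set[str]:
    """Collect the distinct leading WKT type keywords, upper-cased."""
    types = set()
    for value in values:
        match = _WKT_TYPE.match(value)
        if match is None:
            raise ValueError(f"not a WKT string: {value!r}")
        types.add(match.group(1).upper())
    return types


def has_single_shape_type(values: Iterable[str]) -> bool:
    """True when every value shares exactly one WKT type keyword."""
    return len(wkt_geometry_types(values)) == 1
```

A column that mixes, for example, `POINT` and `POLYGON` rows would fail this check and would need to be split or filtered first.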
You can manually create a Location feature from a pair of features with latitude and longitude data.
- These features must contain valid Location data.
- You must add the newly created feature to the feature list used for modeling; otherwise, it is not used. For Autopilot, this list is usually Informative Features.
Location AI provides several intuitive tools for Exploratory Spatial Data Analysis (ESDA) and for exploring model performance insights.
The Unique Map visualization, which displays every Location on a map, is typically available. When a dataset is sufficiently large, the data is automatically aggregated to improve performance. Aggregation applies to:
- Any dataset with more than 20,000 unique geometries (points, lines, or polygons).
- Any dataset whose polygons or lines have more than 50,000 total vertices.
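The two aggregation thresholds can be expressed as a simple predicate. This is a sketch only: the function and constant names are hypothetical, and it assumes you have already counted unique geometries and total vertices yourself.

```python
# Thresholds taken from the text above; the names are illustrative.
UNIQUE_GEOMETRY_LIMIT = 20_000
TOTAL_VERTEX_LIMIT = 50_000


def will_aggregate(
    unique_geometries: int,
    total_vertices: int,
    has_lines_or_polygons: bool,
) -> bool:
    """Estimate whether the map data would be aggregated."""
    if unique_geometries > UNIQUE_GEOMETRY_LIMIT:
        return True
    # The vertex limit applies only to line and polygon datasets.
    return has_lines_or_polygons and total_vertices > TOTAL_VERTEX_LIMIT
```

For example, a polygon dataset with 1,000 shapes averaging 60 vertices each (60,000 vertices in total) would be aggregated under the second rule even though it is well under the geometry limit.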
Accuracy Over Space is available for Regression projects only. It is not available when using Over Time Validation (OTV).
When using Location AI, modeling blueprints are enhanced to take advantage of the important information often provided by location.
Location AI can be used for Exploratory Spatial Data Analysis (ESDA) in multiclass projects, but Location AI models are available only for regression, binary classification, anomaly detection, and clustering projects.
Time series is not supported.
The Spatial Neighborhood Featurizer will not run if there are more than 100,000 rows or more than 500 numeric columns in a dataset.
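A quick pre-check against these limits might look like the sketch below. The names are hypothetical, and the function only mirrors the row and numeric-column limits stated above; it does not call or inspect DataRobot.

```python
# Limits taken from the text above; the names are illustrative.
MAX_ROWS = 100_000
MAX_NUMERIC_COLUMNS = 500


def featurizer_can_run(n_rows: int, n_numeric_columns: int) -> bool:
    """True when the dataset is within both stated limits."""
    return n_rows <= MAX_ROWS and n_numeric_columns <= MAX_NUMERIC_COLUMNS
```

If the check fails, downsampling rows or trimming numeric columns from the feature list before modeling are two possible ways to get back within the limits.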
In cases of point Location features created by transforming a latitude and longitude feature, you may wish to exclude the original latitude and longitude feature from the feature list during modeling, as these carry the same information as the new Location feature.
Some modeling blueprints, such as Gaussian Process Regressors and Eureqa models, do not support Location AI. Some blueprints that run in Autopilot or are available in the Repository do not use the location information.
Scoring Code export is not supported.