{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Predict late shipments\n", "\n", "Delays may be unavoidable, but retailers and manufacturers can manage their negative impact on the supply chain by foreseeing and mitigating potential disruptions. The difficulty today is that retailers and manufacturers lack forward-looking information. With AI, supply chain managers can anticipate irregularities in the supply chain by predicting whether inbound and outbound shipments will arrive on time. Using historical shipment data and delivery-related features such as weather and port traffic, AI learns the patterns associated with on-time and late deliveries, classifies future shipments into either bucket, and reports the top statistical reasons for each prediction. Based on this information, supply chain managers can implement changes that prevent avoidable late deliveries and mitigate the risks that stem from unavoidable ones." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "auto" }, "outputs": [], "source": [ "import datarobot as dr\n", "import matplotlib.pyplot as plt\n", "import matplotlib.ticker as mtick\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Connect to DataRobot\n", "\n", "Read more about different options for [connecting to DataRobot from the client](https://docs.datarobot.com/en/docs/api/api-quickstart/api-qs.html)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If the config file is not in the default location described in the API Quickstart guide\n", "# ('~/.config/datarobot/drconfig.yaml'), connect explicitly by calling:\n", "# dr.Client(config_path='path-to-drconfig.yaml')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import data\n", "\n", "DataRobot hosts the dataset used in this notebook; access it via the URL assigned to `data_path` in the following cell. Read the data directly from the URL into a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) and display the first rows to verify that the data looks correct." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "autoscroll": "auto" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " ID Country Managed By Fulfill Via Vendor INCO Term \\\n", "0 1 Cªte d'Ivoire PMO - US Direct Drop EXW \n", "1 3 Vietnam PMO - US Direct Drop EXW \n", "2 4 Cªte d'Ivoire PMO - US Direct Drop FCA \n", "3 15 Vietnam PMO - US Direct Drop EXW \n", "4 16 Vietnam PMO - US Direct Drop EXW \n", ".. ... ... ... ... ... \n", "95 1048 Haiti PMO - US Direct Drop FCA \n", "96 1063 South Africa PMO - US Direct Drop DDP \n", "97 1065 South Africa PMO - US Direct Drop DDP \n", "98 1066 South Africa PMO - US Direct Drop DDP \n", "99 1067 South Africa PMO - US Direct Drop DDP \n", "\n", " Shipment Mode Late_delivery Product Group Sub Classification \\\n", "0 Air 0 HRDT HIV test \n", "1 Air 0 ARV Pediatric \n", "2 Air 0 HRDT HIV test \n", "3 Air 0 ARV Adult \n", "4 Air 0 ARV Adult \n", ".. ... ... ... ... \n", "95 Air 0 ARV Pediatric \n", "96 Air 0 ARV Adult \n", "97 Air 0 ARV Adult \n", "98 Air 0 ARV Pediatric \n", "99 Air 0 ARV Pediatric \n", "\n", " Vendor ... \\\n", "0 RANBAXY Fine Chemicals LTD. ... \n", "1 Aurobindo Pharma Limited ... \n", "2 Abbott GmbH & Co. KG ... \n", "3 SUN PHARMACEUTICAL INDUSTRIES LTD (RANBAXY LAB... ... \n", "4 Aurobindo Pharma Limited ... \n", ".. ... ... \n", "95 ABBVIE LOGISTICS (FORMERLY ABBOTT LOGISTICS BV) ... \n", "96 S. BUYS WHOLESALER ... \n", "97 S. BUYS WHOLESALER ... \n", "98 S. BUYS WHOLESALER ... \n", "99 S. BUYS WHOLESALER ... \n", "\n", " Unit of Measure (Per Pack) Line Item Quantity Line Item Value Pack Price \\\n", "0 30 19 551.00 29.00 \n", "1 240 1000 6200.00 6.20 \n", "2 100 500 40000.00 80.00 \n", "3 60 31920 127360.80 3.99 \n", "4 60 38000 121600.00 3.20 \n", ".. ... ... ... ... 
\n", "95 60 46 472.88 10.28 \n", "96 30 3025 47522.75 15.71 \n", "97 30 3500 49840.00 14.24 \n", "98 240 1008 3588.48 3.56 \n", "99 200 960 1152.00 1.20 \n", "\n", " Unit Price Manufacturing Site First Line Designation \\\n", "0 0.97 Ranbaxy Fine Chemicals LTD Yes \n", "1 0.03 Aurobindo Unit III, India Yes \n", "2 0.80 ABBVIE GmbH & Co.KG Wiesbaden Yes \n", "3 0.07 Ranbaxy, Paonta Shahib, India Yes \n", "4 0.05 Aurobindo Unit III, India Yes \n", ".. ... ... ... \n", "95 0.17 ABBVIE Ludwigshafen Germany Yes \n", "96 0.52 Aurobindo Unit III, India Yes \n", "97 0.47 Cipla, Goa, India Yes \n", "98 0.01 Aurobindo Unit III, India No \n", "99 0.01 BMS Meymac, France Yes \n", "\n", " Weight (Kilograms) Freight Cost (USD) Line Item Insurance (USD) \n", "0 13.0 780.34 NaN \n", "1 358.0 4521.50 NaN \n", "2 171.0 1653.78 NaN \n", "3 1855.0 16007.06 NaN \n", "4 7590.0 45450.08 NaN \n", ".. ... ... ... \n", "95 10.0 893.22 0.93 \n", "96 NaN NaN 93.14 \n", "97 NaN NaN 97.69 \n", "98 NaN NaN 7.03 \n", "99 NaN NaN 2.26 \n", "\n", "[100 rows x 25 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_path = \"https://pathfinder.datarobot.com/wp-content/uploads/2020/06/Pathfinder_Training_Predict-Parts-Shortage.csv\"\n", "\n", "pathfinder_df = pd.read_csv(data_path, encoding=\"ISO-8859-1\")\n", "\n", "pathfinder_df.head(100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualize data\n", "\n", "Below, view several examples of charts that visualize the dataset in different ways such as grouping by shipment method, average delivery time, and vendor." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "autoscroll": "auto" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Shipment Mode Late_delivery\n", "0 Air 0.096025\n", "1 Air Charter 0.115385\n", "2 Ocean 0.175202\n", "3 Truck 0.160777" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = (\n", " pathfinder_df.where(pathfinder_df[\"Late_delivery\"] == 1)\n", " .groupby(\"Shipment Mode\")\n", " .agg({\"Late_delivery\": \"count\"})\n", ")\n", "df2 = (\n", " pathfinder_df.where(pathfinder_df[\"Late_delivery\"] == 0)\n", " .groupby(\"Shipment Mode\")\n", " .agg({\"Late_delivery\": \"count\"})\n", ")\n", "df_perc = df1 / (df2 + df1)\n", "df_perc = df_perc.reset_index() # doing this prevents html getting displayed on the x-axis\n", "\n", "df_perc" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "autoscroll": "auto" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Late_delivery ID Unit of Measure (Per Pack) \\\n", "0 0 48029.306741 78.459400 \n", "1 1 74750.373524 74.381113 \n", "\n", " Line Item Quantity Line Item Value Pack Price Unit Price \\\n", "0 17182.398446 149764.739150 22.880866 0.655286 \n", "1 27194.209949 218410.009224 14.431686 0.275885 \n", "\n", " Weight (Kilograms) Freight Cost (USD) Line Item Insurance (USD) \\\n", "0 3258.892469 10823.199763 229.296219 \n", "1 4656.080795 13156.072446 321.186624 \n", "\n", " Late_delivery_str \n", "0 On time \n", "1 Late " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# numeric_only=True keeps the aggregation working on pandas versions that no longer drop text columns silently\n", "avg_value_df = pathfinder_df.groupby(\"Late_delivery\").mean(numeric_only=True)\n", "avg_value_df[\"Late_delivery_str\"] = [\"On time\", \"Late\"]\n", "avg_value_df = avg_value_df.reset_index()\n", "\n", "avg_value_df" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "autoscroll": "auto" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " ID Unit of Measure (Per Pack) Line Item Quantity \\\n", "Late_delivery \n", "0 48029.306741 78.459400 17182.398446 \n", "1 74750.373524 74.381113 27194.209949 \n", "\n", " Line Item Value Pack Price Unit Price Weight (Kilograms) \\\n", "Late_delivery \n", "0 149764.739150 22.880866 0.655286 3258.892469 \n", "1 218410.009224 14.431686 0.275885 4656.080795 \n", "\n", " Freight Cost (USD) Line Item Insurance (USD) Display \n", "Late_delivery \n", "0 10823.199763 229.296219 On time \n", "1 13156.072446 321.186624 Late " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Same aggregation as above, keeping Late_delivery as the index and adding a display label\n", "avg_value_df = pathfinder_df.groupby(\"Late_delivery\").mean(numeric_only=True)\n", "avg_value_df[\"Display\"] = [\"On time\", \"Late\"]\n", "\n", "avg_value_df" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "autoscroll": "auto" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Vendor Late_delivery\n", "0 ABBOTT LABORATORIES (PUERTO RICO) NaN\n", "1 ABBOTT LOGISTICS B.V. NaN\n", "2 ABBVIE LOGISTICS (FORMERLY ABBOTT LOGISTICS BV) 0.011527\n", "3 ABBVIE, SRL (FORMALLY ABBOTT LABORATORIES INTE... NaN\n", "4 ACCESS BIO, INC. NaN\n", ".. ... ...\n", "68 THE MEDICAL EXPORT GROUP BV NaN\n", "69 TURE PHARMACEUTICALS & MEDICAL SUPPLIES P.L.C. NaN\n", "70 Trinity Biotech, Plc 0.002809\n", "71 WAGENIA NaN\n", "72 ZEPHYR BIOMEDICALS NaN\n", "\n", "[73 rows x 2 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pandas import DataFrame\n", "\n", "df1 = (\n", " pathfinder_df.where(pathfinder_df[\"Late_delivery\"] == 1)\n", " .groupby(\"Vendor\")\n", " .agg({\"Late_delivery\": \"count\"})\n", ")\n", "df2 = (\n", " pathfinder_df.where(pathfinder_df[\"Late_delivery\"] == 0)\n", " .groupby(\"Vendor\")\n", " .agg({\"Late_delivery\": \"count\"})\n", ")\n", "\n", "df_perc = df1 / (df2 + df1)\n", "df_perc = df_perc.reset_index()\n", "\n", "df_perc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Modeling\n", " \n", "For this use case, create a DataRobot project and initiate modeling by running Autopilot in Quick mode." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "autoscroll": "auto" }, "outputs": [], "source": [ "EXISTING_PROJECT_ID = None # If you've already created a project, replace None with the ID here\n", "if EXISTING_PROJECT_ID is None:\n", " # Create project and pass in data\n", " project = dr.Project.create(sourcedata=pathfinder_df, project_name=\"Predicting Late Shipments\")\n", "\n", " # Set the project target to the appropriate feature. 
Use the LogLoss metric to measure performance\n", " project.set_target(target=\"Late_delivery\", mode=dr.AUTOPILOT_MODE.QUICK, worker_count=-1)\n", "else:\n", " # Fetch the existing project\n", " project = dr.Project.get(EXISTING_PROJECT_ID)\n", "\n", "project.wait_for_autopilot(check_interval=30)\n", "\n", "\n", "# Get the project metric (e.g., LogLoss, RMSE)\n", "metric = project.metric\n", "\n", "# Get the project ID\n", "# project_id = project.id\n", "# project_id" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View project in UI\n", "\n", "To view the project in the DataRobot UI, use the following snippet to retrieve the project's URL, then open that URL to navigate to the application." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "autoscroll": "auto" }, "outputs": [ { "data": { "text/plain": [ "'https://app.datarobot.com/projects/62e1533d3ee0c70bc69f0023/models'" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get project URL\n", "project_url = project.get_leaderboard_ui_permalink()\n", "project_url" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initiate Autopilot" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "autoscroll": "auto" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "In progress: 4, queued: 0 (waited: 0s)\n", "In progress: 4, queued: 0 (waited: 0s)\n", "In progress: 4, queued: 0 (waited: 1s)\n", "In progress: 4, queued: 0 (waited: 2s)\n", "In progress: 4, queued: 0 (waited: 3s)\n", "In progress: 4, queued: 0 (waited: 5s)\n", "In progress: 4, queued: 0 (waited: 8s)\n", "In progress: 4, queued: 0 (waited: 15s)\n", "In progress: 2, queued: 0 (waited: 28s)\n", "In progress: 0, queued: 0 (waited: 54s)\n", "In progress: 7, queued: 0 (waited: 84s)\n", "In progress: 1, queued: 0 (waited: 115s)\n", "In progress: 0, queued: 0 (waited: 145s)\n", "In progress: 5, queued: 0 (waited: 175s)\n", "In 
progress: 1, queued: 0 (waited: 206s)\n", "In progress: 1, queued: 0 (waited: 236s)\n", "In progress: 1, queued: 0 (waited: 267s)\n", "In progress: 1, queued: 0 (waited: 297s)\n", "In progress: 1, queued: 0 (waited: 327s)\n", "In progress: 3, queued: 0 (waited: 358s)\n", "In progress: 1, queued: 0 (waited: 388s)\n", "In progress: 0, queued: 0 (waited: 418s)\n", "In progress: 0, queued: 0 (waited: 449s)\n" ] } ], "source": [ "project.wait_for_autopilot(check_interval=30)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate model performance \n", "\n", "To measure model performance, first select the top-performing model based on a specific performance metric (e.g., `LogLoss`), then evaluate several types of charts, such as the Lift Chart, ROC Curve, and Feature Impact. The helper functions defined in the following cells simplify producing these model insights.\n", "\n", "For more information, see the [model evaluation tools documentation](https://docs.datarobot.com/en/docs/modeling/analyze-models/evaluate/index.html)." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "autoscroll": "auto" }, "outputs": [], "source": [ "def sorted_by_metric(models, test_set, metric):\n", "    # Keep only models that have a score for the requested partition, then sort ascending\n", "    # (lower is better for error metrics such as LogLoss)\n", "    models_with_score = [model for model in models if model.metrics[metric][test_set] is not None]\n", "\n", "    return sorted(models_with_score, key=lambda model: model.metrics[metric][test_set])" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "autoscroll": "auto" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The top performing model is Model('eXtreme Gradient Boosted Trees Classifier with Early Stopping') using metric, LogLoss\n" ] } ], "source": [ "models = project.get_models()\n", "\n", "# Re-fetch the project metric in case it was not set in the project-creation cell above\n", "metric = project.metric\n", "\n", "# Get top performing model\n", "model_top = sorted_by_metric(models, \"crossValidation\", metric)[0]\n", "\n", "print(\n", "    \"\"\"The top performing model is {model} using metric, {metric}\"\"\".format(\n", "        model=str(model_top), metric=metric\n", "    )\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Histogram" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "autoscroll": "auto" }, "outputs": [], "source": [ "# Set styling\n", "dr_dark_blue = \"#08233F\"\n", "dr_blue = \"#1F77B4\"\n", "dr_orange = \"#FF7F0E\"\n", "dr_red = \"#BE3C28\"\n", "dr_light_blue = \"#3CA3E8\"\n", "\n", "# Create function to re-bin the lift chart data (the code assumes 60 input bins)\n", "\n", "\n", "def rebin_df(raw_df, number_of_bins):\n", "    cols = [\"bin\", \"actual_mean\", \"predicted_mean\", \"bin_weight\"]\n", "    rows = []\n", "    current_prediction_total = 0\n", "    current_actual_total = 0\n", "    current_row_total = 0\n", "    bin_size = 60 / number_of_bins\n", "    for rowId, data in raw_df.iterrows():\n", "        current_prediction_total += data[\"predicted\"] * data[\"bin_weight\"]\n", "        current_actual_total += data[\"actual\"] * data[\"bin_weight\"]\n", "        current_row_total += data[\"bin_weight\"]\n", "\n", "        # Close the current bin once bin_size input rows have been accumulated\n", "        if (rowId + 1) % bin_size == 0:\n", "            rows.append(\n", "                {\n", "                    \"bin\": ((round(rowId + 1) / 60) * number_of_bins),\n", "                    \"actual_mean\": current_actual_total / current_row_total,\n", "                    \"predicted_mean\": current_prediction_total / current_row_total,\n", "                    \"bin_weight\": current_row_total,\n", "                }\n", "            )\n", "            current_prediction_total = 0\n", "            current_actual_total = 0\n", "            current_row_total = 0\n", "    # DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows\n", "    return pd.DataFrame(rows, columns=cols)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lift Chart\n", "\n", "A [lift chart](https://docs.datarobot.com/en/docs/modeling/analyze-models/evaluate/lift-chart.html#lift-chart) shows you how close model predictions are to the actual values of the target in the training data. The lift chart data includes the average predicted value and the average actual value for each bin." ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "# Create function to build lift charts\n", "\n", "\n", "def matplotlib_lift(bins_df, bin_count, ax):\n", "    grouped = rebin_df(bins_df, bin_count)\n", "    ax.plot(\n", "        range(1, len(grouped) + 1),\n", "        grouped[\"predicted_mean\"],\n", "        marker=\"+\",\n", "        lw=1,\n", "        color=dr_blue,\n", "        label=\"predicted\",\n", "    )\n", "    ax.plot(\n", "        range(1, len(grouped) + 1),\n", "        grouped[\"actual_mean\"],\n", "        marker=\"*\",\n", "        lw=1,\n", "        color=dr_orange,\n", "        label=\"actual\",\n", "    )\n", "    ax.set_xlim([0, len(grouped) + 1])\n", "    ax.set_facecolor(dr_dark_blue)\n", "    ax.legend(loc=\"best\")\n", "    ax.set_title(\"Lift chart {} bins\".format(bin_count))\n", "    ax.set_xlabel(\"Sorted Prediction\")\n", "    ax.set_ylabel(\"Value\")\n", "    return grouped" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "autoscroll": "auto" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "lift_chart = model_top.get_lift_chart(\"validation\")\n", "\n", "# Save the result into a pandas dataframe\n", "lift_df = pd.DataFrame(lift_chart.bins)\n", "\n", "bin_counts = [10, 15]\n", "f, axarr = plt.subplots(len(bin_counts))\n", "f.set_size_inches((8, 4 * len(bin_counts)))\n", "\n", "rebinned_dfs = []\n", "for i in range(len(bin_counts)):\n", " rebinned_dfs.append(matplotlib_lift(lift_df, bin_counts[i], axarr[i]))\n", "\n", "plt.tight_layout()\n", "# plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ROC Curve\n", "\n", "The receiver operating characteristic curve, or [ROC curve](https://docs.datarobot.com/en/docs/modeling/analyze-models/evaluate/roc-curve-tab/roc-curve.html), is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "autoscroll": "auto" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " accuracy f1_score false_negative_score true_negative_score \\\n", "0 0.884988 0.000000 190 1462 \n", "1 0.884383 0.000000 190 1461 \n", "2 0.883172 0.000000 190 1459 \n", "3 0.882567 0.000000 190 1458 \n", "4 0.882567 0.020202 188 1456 \n", ".. ... ... ... ... \n", "110 0.155569 0.214085 0 67 \n", "111 0.145278 0.212054 0 50 \n", "112 0.134988 0.210061 0 33 \n", "113 0.125303 0.208219 0 17 \n", "114 0.115012 0.206298 0 0 \n", "\n", " true_positive_score false_positive_score true_negative_rate \\\n", "0 0 0 1.000000 \n", "1 0 1 0.999316 \n", "2 0 3 0.997948 \n", "3 0 4 0.997264 \n", "4 2 6 0.995896 \n", ".. ... ... ... \n", "110 190 1395 0.045828 \n", "111 190 1412 0.034200 \n", "112 190 1429 0.022572 \n", "113 190 1445 0.011628 \n", "114 190 1462 0.000000 \n", "\n", " false_positive_rate true_positive_rate \\\n", "0 0.000000 0.000000 \n", "1 0.000684 0.000000 \n", "2 0.002052 0.000000 \n", "3 0.002736 0.000000 \n", "4 0.004104 0.010526 \n", ".. ... ... \n", "110 0.954172 1.000000 \n", "111 0.965800 1.000000 \n", "112 0.977428 1.000000 \n", "113 0.988372 1.000000 \n", "114 1.000000 1.000000 \n", "\n", " matthews_correlation_coefficient positive_predictive_value \\\n", "0 0.000000 0.000000 \n", "1 -0.008872 0.000000 \n", "2 -0.015376 0.000000 \n", "3 -0.017760 0.000000 \n", "4 0.029515 0.250000 \n", ".. ... ... \n", "110 0.074118 0.119874 \n", "111 0.063688 0.118602 \n", "112 0.051468 0.117356 \n", "113 0.036759 0.116208 \n", "114 0.000000 0.115012 \n", "\n", " negative_predictive_value threshold fraction_predicted_as_positive \\\n", "0 0.884988 1.000000 0.000000 \n", "1 0.884918 0.741994 0.000605 \n", "2 0.884779 0.601053 0.001816 \n", "3 0.884709 0.577989 0.002421 \n", "4 0.885645 0.546934 0.004843 \n", ".. ... ... ... 
\n", "110 1.000000 0.002429 0.959443 \n", "111 1.000000 0.002111 0.969734 \n", "112 1.000000 0.001795 0.980024 \n", "113 1.000000 0.001589 0.989709 \n", "114 0.000000 0.000914 1.000000 \n", "\n", " fraction_predicted_as_negative lift_positive lift_negative \n", "0 1.000000 0.000000 1.000000 \n", "1 0.999395 0.000000 0.999921 \n", "2 0.998184 0.000000 0.999764 \n", "3 0.997579 0.000000 0.999685 \n", "4 0.995157 2.173684 1.000742 \n", ".. ... ... ... \n", "110 0.040557 1.042271 1.129959 \n", "111 0.030266 1.031211 1.129959 \n", "112 0.019976 1.020383 1.129959 \n", "113 0.010291 1.010398 1.129959 \n", "114 0.000000 1.000000 0.000000 \n", "\n", "[115 rows x 17 columns]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "roc = model_top.get_roc_curve(\"validation\")\n", "\n", "# Save the result into a pandas dataframe\n", "roc_df = pd.DataFrame(roc.roc_points)\n", "\n", "roc_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dr_roc_green = \"#03c75f\"\n", "white = \"#ffffff\"\n", "dr_purple = \"#65147D\"\n", "dr_dense_green = \"#018f4f\"\n", "\n", "threshold = roc.get_best_f1_threshold()\n", "fig = plt.figure(figsize=(8, 8))\n", "axes = fig.add_subplot(1, 1, 1, facecolor=dr_dark_blue)\n", "\n", "plt.scatter(roc_df.false_positive_rate, roc_df.true_positive_rate, color=dr_roc_green)\n", "plt.plot(roc_df.false_positive_rate, roc_df.true_positive_rate, color=dr_roc_green)\n", "plt.plot([0, 1], [0, 1], color=white, alpha=0.25)\n", "plt.title(\"ROC curve\")\n", "plt.xlabel(\"False Positive Rate\")\n", "plt.xlim([0, 1])\n", "plt.ylabel(\"True Positive Rate\")\n", "plt.ylim([0, 1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Feature Impact\n", "\n", "[Feature Impact](https://docs.datarobot.com/en/docs/modeling/analyze-models/understand/feature-impact.html) measures how important a feature is in the context of a model. 
It measures how much the accuracy of a model would decrease if that feature were removed.\n", "\n", "Feature Impact is available for all model types and works by altering input data and observing the effect on a model’s score. It is an on-demand feature, meaning that you must initiate a calculation to see the results. Once DataRobot computes the feature impact for a model, that information is saved with the project." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "autoscroll": "auto" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "feature_impacts = model_top.get_or_request_feature_impact()\n", "\n", "# Limit size to make chart look good. Display top 25 values\n", "if len(feature_impacts) > 25:\n", " feature_impacts = feature_impacts[0:24]\n", "\n", "# Formats the ticks from a float into a percent\n", "percent_tick_fmt = mtick.PercentFormatter(xmax=1.0)\n", "\n", "impact_df = pd.DataFrame(feature_impacts)\n", "impact_df.sort_values(by=\"impactNormalized\", ascending=True, inplace=True)\n", "\n", "# Positive values are blue, negative are red\n", "bar_colors = impact_df.impactNormalized.apply(lambda x: dr_red if x < 0 else dr_blue)\n", "\n", "ax = impact_df.plot.barh(\n", " x=\"featureName\", y=\"impactNormalized\", legend=False, color=bar_colors, figsize=(10, 8)\n", ")\n", "ax.xaxis.set_major_formatter(percent_tick_fmt)\n", "ax.xaxis.set_tick_params(labeltop=True)\n", "ax.xaxis.grid(True, alpha=0.2)\n", "ax.set_facecolor(dr_dark_blue)\n", "\n", "plt.ylabel(\"\")\n", "plt.xlabel(\"Effect\")\n", "plt.xlim((None, 1)) # Allow for negative impact\n", "plt.title(\"Feature Impact\", y=1.04);" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Make predictions\n", "\n", "### Test predictions\n", "\n", "After determining the top-performing model from the Leaderboard, upload the prediction test dataset to verify that the model generates predictions successfully before deploying the model to a production environment. The predictions are returned as a Pandas dataframe. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_path_scoring = \"https://pathfinder.datarobot.com/wp-content/uploads/2020/06/Pathfinder_Scoring_Predict-Parts-Shortage.xlsx\"\n", "scoring_df = pd.read_excel(data_path_scoring, engine=\"openpyxl\")\n", "\n", "prediction_dataset = project.upload_dataset(scoring_df)\n", "predict_job = model_top.request_predictions(prediction_dataset.id)\n", "prediction_dataset.id\n", "\n", "predictions = predict_job.get_result_when_complete()\n", "pd.concat([scoring_df, predictions], axis=1)\n", "predictions.positive_probability.plot(kind=\"hist\", title=\"Predicted Probabilities\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deploy a model to production\n", "\n", "\n", "If you are happy with the model's performance, you can deploy it to a production environment with [MLOps](https://docs.datarobot.com/en/mlops/index.html). Deploying the model will free up workers, as data scored through the deployment doesn't use any modeling workers. Furthermore, you are no longer restricted on the amount of data to score; score over 100GB with the deployment. Deployments also offer many model management benefits: monitoring service, data drift, model comparison, retraining, and more." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "autoscroll": "auto" }, "outputs": [ { "data": { "text/plain": [ "Deployment(Late Shipment Predictions)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Creat a prediction server\n", "prediction_server = dr.PredictionServer.list()[0]\n", "\n", "# Get top performing model. 
Uncomment if this did not execute in the previous section\n", "# model_top = sorted_by_metric(models, 'crossValidation', metric)[0]\n", "\n", "deployment = dr.Deployment.create_from_learning_model(\n", " model_top.id,\n", " label=\"Late Shipment Predictions\",\n", " description=\"Predict Late Shipments\",\n", " default_prediction_server_id=prediction_server.id,\n", ")\n", "deployment.id" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure batch predictions\n", "\n", "After the model has been deployed, DataRobot creates an endpoint for real-time scoring. The deployment allows you to use DataRobot's batch prediction API to score large datasets with a deployed DataRobot model. \n", "\n", "The batch prediction API provides flexible intake and output options when scoring large datasets using prediction servers. The API is exposed through the DataRobot Public API and can be consumed using a REST-enabled client or Public API bindings for DataRobot's Python client." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set the deployment ID" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before proceeding, provide the deployed model's deployment ID (retrieved from the deployment's [Overview tab](https://docs.datarobot.com/en/docs/mlops/monitor/dep-overview.html))." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "deployment_id = \"YOUR_DEPLOYMENT_ID\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Determine input and output options\n", "\n", "DataRobot's batch prediction API allows you to score data from and to multiple sources. You can take advantage of the credentials and data sources you have already established through the UI for easy scoring. Credentials are usernames and passwords, while data sources are any databases with which you have previously established a connection (e.g., Snowflake). 
View the example code below outlining how to query credentials and data sources.\n", "\n", "You can reference the full list of DataRobot's supported [input](https://docs.datarobot.com/en/docs/predictions/batch/batch-prediction-api/intake-options.html) and [output options](https://docs.datarobot.com/en/docs/predictions/batch/batch-prediction-api/output-options.html).\n", "\n", "Reference the DataRobot documentation for more information about [data connections](https://docs.datarobot.com/en/docs/data/connect-data/data-conn.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The snippet below shows how you can query all credentials tied to a DataRobot account." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dr.Credential.list()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output above returns multiple sets of credentials. The alphanumeric string included in each item of the list is the credentials ID. You can use that ID to access credentials through the API." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The snippet below shows how you can query all data sources tied to a DataRobot account. The second line prints the ID (an alphanumeric string) of the first datastore in the list." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5e6696ff820e737a5bd78430\n" ] } ], "source": [ "dr.DataStore.list()\n", "print(dr.DataStore.list()[0].id)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scoring examples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The snippets below demonstrate how to score data with the Batch Prediction API. Edit the `intake_settings` and `output_settings` to suit your needs. You can mix and match until you get the outcome you prefer." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Score from CSV to CSV" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Scoring without Prediction Explanations\n", "dr.BatchPredictionJob.score(\n", " deployment_id,\n", " intake_settings={\n", " \"type\": \"localFile\",\n", " \"file\": \"inputfile.csv\", # Provide the filepath, Pandas dataframe, or file-like object here\n", " },\n", " output_settings={\"type\": \"localFile\", \"path\": \"outputfile.csv\"},\n", ")\n", "\n", "# Scoring with Prediction Explanations\n", "dr.BatchPredictionJob.score(\n", " deployment_id,\n", " intake_settings={\n", " \"type\": \"localFile\",\n", " \"file\": \"inputfile.csv\", # Provide the filepath, Pandas dataframe, or file-like object here\n", " },\n", " output_settings={\"type\": \"localFile\", \"path\": \"outputfile.csv\"},\n", " max_explanations=3, # Compute Prediction Explanations for the number of features indicated here\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Score from S3 to S3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dr.BatchPredictionJob.score(\n", " deployment_id,\n", " intake_settings={\n", " \"type\": \"s3\",\n", " \"url\": \"s3://theos-test-bucket/lending_club_scoring.csv\", # Provide the URL of your datastore here\n", " \"credential_id\": \"YOUR_CREDENTIAL_ID_FROM_ABOVE\", # Provide your credentials here\n", " },\n", " output_settings={\n", " \"type\": \"s3\",\n", " \"url\": \"s3://theos-test-bucket/lending_club_scored2.csv\",\n", " \"credential_id\": \"YOUR_CREDENTIAL_ID_FROM_ABOVE\",\n", " },\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Score from JDBC to JDBC" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "dr.BatchPredictionJob.score(\n", " deployment_id,\n", " intake_settings={\n", " \"type\": \"jdbc\",\n", " \"table\": 
\"table_name\",\n", " \"schema\": \"public\",\n", " \"dataStoreId\": data_store.id, # Provide the ID of your datastore here\n", " \"credentialId\": cred.credential_id, # Provide your credentials here\n", " },\n", " output_settings={\n", " \"type\": \"jdbc\",\n", " \"table\": \"table_name\",\n", " \"schema\": \"public\",\n", " \"statementType\": \"insert\",\n", " \"dataStoreId\": data_store.id,\n", " \"credentialId\": cred.credential_id,\n", " },\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }