{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Identify money laundering with anomaly detection\n", "\n", "This notebook demonstrates how DataRobot performs outlier detection in a use case that helps prevent money laundering, the process of concealing the origins of illicitly obtained money. The notebook uses a dataset of historical money transactions and trains anomaly detection models to detect outliers.\n", "\n", "In the sample dataset, fraudulent transactions are identified in the `SAR` column; however, this use case does not use that information to train the model. In most cases, money laundering goes undetected, so labels like these are rarely available in practice, and [anomaly detection in DataRobot](https://docs.datarobot.com/en/docs/modeling/special-workflows/unsupervised/anomaly-detection.html) can help identify it without them. Because this sample data is already labeled, the notebook uses a small subset of it to evaluate how well the unsupervised approach works by comparing the results against the labels in the `SAR` column.\n", "\n", "## Requirements\n", "\n", "- Python version 3.7.3\n", "- DataRobot API version 2.19.0\n", "\n", "Note that small adjustments may be required depending on the Python and DataRobot API versions you are using.\n", "\n", "Full documentation of the DataRobot Python package is available [here](https://datarobot-public-api-client.readthedocs-hosted.com)." 
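, "\n", "
The label-based evaluation described above can be sketched locally. This is a minimal illustration under stated assumptions, not DataRobot's method: scikit-learn's `IsolationForest` stands in for the DataRobot anomaly detection models, the data is synthetic (the 50 shifted rows play the role of `SAR` = 1), and `evaluate_anomaly_flags` is a hypothetical helper. The model is fit on features only; the labels are used afterward to compute the same `confusion_matrix` and `f1_score` metrics this notebook imports.

```python
# Minimal local sketch (not DataRobot's method): IsolationForest is a
# stand-in anomaly detector and the data below is synthetic.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix, f1_score


def evaluate_anomaly_flags(features, labels, contamination=0.05, seed=0):
    # Hypothetical helper: fit on features only; labels are used solely
    # to evaluate the flags after the fact.
    model = IsolationForest(contamination=contamination, random_state=seed)
    model.fit(features)
    # predict() returns -1 for outliers and 1 for inliers; map to 1/0.
    flags = (model.predict(features) == -1).astype(int)
    return flags, confusion_matrix(labels, flags), f1_score(labels, flags)


# Synthetic stand-in data: 950 ordinary rows plus 50 shifted rows labeled 1.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(950, 4))
unusual = rng.normal(6.0, 1.0, size=(50, 4))
X = np.vstack([normal, unusual])
y = np.concatenate([np.zeros(950, dtype=int), np.ones(50, dtype=int)])

flags, cm, f1 = evaluate_anomaly_flags(X, y)
print(cm)
print(round(f1, 3))
```

Here `contamination=0.05` is an assumed share of outliers used to set the flagging threshold; with real transactions the true proportion of laundering is unknown, so the threshold on the anomaly scores is a judgment call.
"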
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "\n", "import datarobot as dr\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "from sklearn.metrics import confusion_matrix, f1_score\n", "from sklearn.model_selection import train_test_split\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Upload data\n", "\n", "In the sample dataset, [available for download](aml.csv), `SAR` is the target feature. This notebook does not use the target to train the models; instead, it holds the target back to evaluate the accuracy of the anomaly detection models built in later steps. Using `SAR` as a training input would cause [target leakage](https://docs.datarobot.com/en/docs/glossary/index.html#target-leakage), because the models would learn from the very label they are meant to detect." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
| | ALERT | SAR | kycRiskScore | income | tenureMonths | creditScore | state | nbrPurchases90d | avgTxnSize90d | totalSpend90d | ... | indCustReqRefund90d | totalRefundsToCust90d | nbrPaymentsCashLike90d | maxRevolveLine | indOwnsHome | nbrInquiries1y | nbrCollections3y | nbrWebLogins90d | nbrPointRed90d | PEP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | 110300.0 | 5 | 757 | PA | 10 | 153.80 | 1538.00 | ... | 1 | 45.82 | 5 | 6000 | 0 | 3 | 0 | 6 | 1 | 0 |
| 1 | 1 | 0 | 2 | 107800.0 | 6 | 715 | NY | 22 | 1.59 | 34.98 | ... | 1 | 67.40 | 0 | 10000 | 1 | 3 | 0 | 87 | 0 | 0 |
| 2 | 1 | 0 | 1 | 74000.0 | 13 | 751 | MA | 7 | 57.64 | 403.48 | ... | 1 | 450.69 | 0 | 10000 | 0 | 3 | 0 | 6 | 0 | 0 |
| 3 | 1 | 0 | 0 | 57700.0 | 1 | 659 | NJ | 14 | 29.52 | 413.28 | ... | 1 | 71.43 | 0 | 8000 | 1 | 5 | 0 | 7 | 2 | 0 |
| 4 | 1 | 0 | 1 | 59800.0 | 3 | 709 | PA | 54 | 115.77 | 6251.58 | ... | 1 | 2731.39 | 3 | 7000 | 1 | 1 | 0 | 8 | 1 | 0 |

5 rows × 31 columns
\n", "