MongoDB Inventory

Rationale

The MongoDB inventory tool helps verify that no data is lost after a backup/restore process, an upgrade, or a migration. It is available as of the 9.0.1 and 8.0.16 releases.

Usage

The script, mongodb_data_consistency.py, is located at Datarobot-RELEASE-10.X.X/installer_tools/scripts/ on the k8s cluster.

Copy it to the mmapp-app pod using the kubectl command below, replacing the namespace and pod name accordingly:

kubectl cp Datarobot-RELEASE-10.X.X/installer_tools/scripts/mongodb_data_consistency.py <your-kubernetes-namespace>/<mmapp-app-pod-name>:/mnt/local_file_storage/mongodb_data_consistency.py

We strongly recommend using an mmapp-app pod that has datarobot-runtime available, or any other pod that has datarobot-runtime and access to the MongoDB secrets.

For the default mongo-uri in dr-secrets:

Run this command from the source before the backup/restore process or before the upgrade:

python3 /mnt/local_file_storage/mongodb_data_consistency.py pre-upgrade

Run this command from the target after the restore or upgrade:

python3 /mnt/local_file_storage/mongodb_data_consistency.py post-upgrade --file <file/path/inventory_pre_upgrade.txt>

For remote mongo-uri connections:

Run this command from the source before the backup/restore process or before the upgrade:

python3 /mnt/local_file_storage/mongodb_data_consistency.py pre-upgrade --mongo-uri 'mongodb://username:password@mongo_host:27017'

Run this command from the target after the restore or upgrade:

python3 /mnt/local_file_storage/mongodb_data_consistency.py post-upgrade --file <file/path/inventory_pre_upgrade.txt> --post-mongo-uri 'mongodb://username:password@mongo_host:27017'

Parameters:

  • pre-upgrade collects the MongoDB data inventory of the source before the backup/restore, upgrade, or migration process. By default, the inventory is stored as inventory_pre_upgrade.txt in the current directory.

  • post-upgrade collects the MongoDB data inventory of the target after the restore, upgrade, or migration completes. It requires the --file parameter.

  • --file is mandatory for post-upgrade. Pass the pre-upgrade inventory file, i.e. inventory_pre_upgrade.txt.

  • --mongo-uri can be passed if the database is remote and not available in dr-secrets.
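To make the idea of an "inventory" concrete, the following is a minimal sketch of how per-collection statistics could be gathered. It is not the code of mongodb_data_consistency.py. The function name collect_inventory and the dictionary layout are hypothetical, and the sketch assumes the client behaves like pymongo.MongoClient and that the collStats database command is available on the server. In a real deployment, views and system collections may need to be filtered out first.

```python
from typing import Any, Dict

# Hypothetical layout: {database: {collection: {"size", "count", "nindexes"}}}
Inventory = Dict[str, Dict[str, Dict[str, int]]]


def collect_inventory(client: Any) -> Inventory:
    """Gather size, document count, and index count for every collection.

    `client` is assumed to expose the pymongo.MongoClient interface:
    list_database_names(), client[db_name], db.list_collection_names(),
    and db.command("collStats", collection_name).
    """
    inventory: Inventory = {}
    for db_name in client.list_database_names():
        db = client[db_name]
        inventory[db_name] = {}
        for coll_name in db.list_collection_names():
            stats = db.command("collStats", coll_name)
            # Keep only the three figures that the example report compares.
            inventory[db_name][coll_name] = {
                "size": int(stats.get("size", 0)),
                "count": int(stats.get("count", 0)),
                "nindexes": int(stats.get("nindexes", 0)),
            }
    return inventory
```

With pymongo installed, this could be called as collect_inventory(pymongo.MongoClient(uri)), where uri is the same connection string that would be passed to --mongo-uri.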

An inventory should be produced before and after running the data migration. When post-upgrade is run, it compares the two inventories and outputs the diff between them. Example output:

['MMApp', 'admin', 'application_builder', 'audit', 'celery', 'common_infra', 'config', 'datasets', 'draudit', 'dss_profiling', 'identity', 'json_studio', 'loadtest', 'local', 'orm_next', 'prediction_optimization_ux', 'secure_configs', 'test', 'usersecrets', 'varietyResults']

Collection: reason_codes_job
Size Difference: 13230080 -> 13369344
Number of Documents Difference: 101872 -> 103024
Number of Indexes Difference: 5 -> 2

Collection: period_accuracy_period_metadata
Size Difference: 245760 -> 237568
Number of Documents Difference: 2886 -> 2921

Collection: project_clone
Size Difference: 3678208 -> 3764224
Number of Documents Difference: 101437 -> 119725

There are 3 collections that differ
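The comparison behind a report like the one above can be sketched as follows. This is an illustration under assumptions, not the script's actual implementation: the function diff_collections, the LABELS mapping, and the in-memory dictionary format are hypothetical. The size and document figures are taken from the project_clone entry in the example output, while the nindexes value is a made-up placeholder.

```python
from typing import Dict, List

Stats = Dict[str, int]

# Hypothetical mapping from stat keys to the labels seen in the example report.
LABELS = {
    "size": "Size Difference",
    "count": "Number of Documents Difference",
    "nindexes": "Number of Indexes Difference",
}


def diff_collections(pre: Dict[str, Stats], post: Dict[str, Stats]) -> Dict[str, List[str]]:
    """Return {collection: [report lines]} for collections whose stats changed."""
    report: Dict[str, List[str]] = {}
    for name in sorted(set(pre) | set(post)):
        before, after = pre.get(name, {}), post.get(name, {})
        lines = [
            f"{label}: {before.get(key)} -> {after.get(key)}"
            for key, label in LABELS.items()
            if before.get(key) != after.get(key)
        ]
        if lines:
            report[name] = lines
    return report


# nindexes=4 is a placeholder; the example output does not show this value.
pre = {"project_clone": {"size": 3678208, "count": 101437, "nindexes": 4}}
post = {"project_clone": {"size": 3764224, "count": 119725, "nindexes": 4}}

differences = diff_collections(pre, post)
for name, lines in differences.items():
    print(f"Collection: {name}")
    print("\n".join(lines))
print(f"There are {len(differences)} collections that differ")
```

Only the statistics that actually changed are reported, which is why the index line is absent for a collection whose index count is the same before and after.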

The script also recognizes certain collections that change constantly and can be safely ignored during restores, even if their document or index counts do not match. It prints the differences for these collections but does not count them toward an inconsistent state.

'job_process',
'qid_counter',
'compute_cluster_metrics',
'queue_monitor',
'queue',
'job_executions',
'execute_kubeworkers_health_checks'

At the end of the comparison, the script prints whether the database is in a consistent state, based on the differences.
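One way to picture the final verdict is the sketch below, which separates differing collections into those that matter and those on the ignore list. The collection names come from the list above. The function names classify_differences and is_consistent are hypothetical, and the actual script may decide differently.

```python
from typing import Iterable, List, Tuple

# Collections documented above as safe to ignore during restores.
IGNORED_COLLECTIONS = frozenset({
    "job_process",
    "qid_counter",
    "compute_cluster_metrics",
    "queue_monitor",
    "queue",
    "job_executions",
    "execute_kubeworkers_health_checks",
})


def classify_differences(differing: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split differing collection names into (significant, ignored)."""
    significant: List[str] = []
    ignored: List[str] = []
    for name in differing:
        if name in IGNORED_COLLECTIONS:
            ignored.append(name)
        else:
            significant.append(name)
    return significant, ignored


def is_consistent(differing: Iterable[str]) -> bool:
    """Treat the database as consistent when only ignored collections differ."""
    significant, _ = classify_differences(differing)
    return not significant
```

Under this sketch, a run where only queue and job_process differ would be reported as consistent, while any difference in a collection such as project_clone would mark the database as inconsistent.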