MongoDB Inventory¶
Rationale¶
The MongoDB inventory tool helps verify that no data is lost after a backup/restore process, an upgrade, or a migration. It is available as of the 9.0.1 and 8.0.16 releases.
Usage¶
The script is named mongodb_data_consistency.py and is located in Datarobot-RELEASE-10.X.X/installer_tools/scripts/ on the k8s cluster.
Copy it to the mmapp-app pod with the kubectl command below, replacing the namespace and pod name accordingly:
kubectl cp Datarobot-RELEASE-10.X.X/installer_tools/scripts/mongodb_data_consistency.py <your-kubernetes-namespace>/<mmapp-app-pod-name>:/mnt/local_file_storage/mongodb_data_consistency.py
We strongly recommend using the mmapp-app pod, which has datarobot-runtime available, or any other pod that has datarobot-runtime and access to the MongoDB secrets.
For default mongo-uri in dr-secrets:
Run this command from the source before the backup/restore process or before the upgrade:
python3 /mnt/local_file_storage/mongodb_data_consistency.py pre-upgrade
Run this command from target after restore or upgrade:
python3 /mnt/local_file_storage/mongodb_data_consistency.py post-upgrade --file <file/path/inventory_pre_upgrade.txt>
For remote mongo-uri connections:
Run this command from the source before the backup/restore process or before the upgrade:
python3 /mnt/local_file_storage/mongodb_data_consistency.py pre-upgrade --mongo-uri 'mongodb://username:password@mongo_host:27017'
Run this command from target after restore or upgrade:
python3 /mnt/local_file_storage/mongodb_data_consistency.py post-upgrade --file <file/path/inventory_pre_upgrade.txt> --post-mongo-uri 'mongodb://username:password@mongo_host:27017'
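MongoDB connection strings require reserved characters in the username and password (such as `@`, `:` and `/`) to be percent-encoded. The sketch below shows one way to build the URI value for the commands above; `build_mongo_uri` is a hypothetical helper, not part of mongodb_data_consistency.py, and it assumes the script passes the URI unchanged to a standard MongoDB driver.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host, port=27017):
    """Build a mongodb:// URI with percent-encoded credentials.

    Hypothetical helper for illustration only; host and port are
    placeholders matching the example commands.
    """
    user = quote_plus(username)
    secret = quote_plus(password)
    return f"mongodb://{user}:{secret}@{host}:{port}"


if __name__ == "__main__":
    # Prints a URI that can be pasted into the remote mongo-uri commands.
    print(build_mongo_uri("username", "p@ss:w/rd", "mongo_host"))
```

Wrap the resulting URI in single quotes on the command line, as in the commands above, so the shell does not interpret any of its characters.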
Parameters:
* pre-upgrade: collects the MongoDB data inventory of the source before a backup/restore, upgrade, or migration process. By default, the inventory is stored as inventory_pre_upgrade.txt in the current directory.
* post-upgrade: collects the MongoDB data inventory of the target after the restore, upgrade, or migration completes. Requires the --file parameter.
* --file: mandatory for the post-upgrade option. Pass the path to the pre-upgrade inventory, i.e. inventory_pre_upgrade.txt.
* --mongo-uri: pass this parameter when the database is remote and not available in dr-secrets.
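To make the idea of an inventory concrete, here is a minimal sketch of how per-collection statistics could be gathered. It is not the code of mongodb_data_consistency.py: the function names, the JSON file format, and the choice of the `size`, `count`, and `nindexes` fields from the `collStats` command are all assumptions made for illustration. The `client` argument is expected to behave like a `pymongo.MongoClient`.

```python
import json


def collect_inventory(client):
    """Return {"db.collection": {"size", "count", "nindexes"}}.

    Illustrative only. `client` must offer list_database_names() and
    item access to databases, as pymongo.MongoClient does.
    """
    inventory = {}
    for db_name in sorted(client.list_database_names()):
        db = client[db_name]
        # Note: views do not support collStats and would need to be skipped.
        for coll_name in sorted(db.list_collection_names()):
            stats = db.command("collstats", coll_name)
            inventory[f"{db_name}.{coll_name}"] = {
                "size": stats.get("size", 0),
                "count": stats.get("count", 0),
                "nindexes": stats.get("nindexes", 0),
            }
    return inventory


def save_inventory(inventory, path):
    """Write the inventory as JSON (an assumed format, not the script's)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(inventory, fh, indent=2, sort_keys=True)
```

With pymongo installed, a call such as `save_inventory(collect_inventory(pymongo.MongoClient(uri)), "inventory_sketch.json")` would produce a snapshot; the file name here is a placeholder chosen so it cannot be confused with the script's own inventory_pre_upgrade.txt.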
Produce an inventory both before and after running the data migration. When post-upgrade is run, it compares the two inventories and outputs the differences between them. Example output:
['MMApp', 'admin', 'application_builder', 'audit', 'celery', 'common_infra', 'config', 'datasets', 'draudit', 'dss_profiling', 'identity', 'json_studio', 'loadtest', 'local', 'orm_next', 'prediction_optimization_ux', 'secure_configs', 'test', 'usersecrets', 'varietyResults']
Collection: reason_codes_job
Size Difference: 13230080 -> 13369344
Number of Documents Difference: 101872 -> 103024
Number of Indexes Difference: 5 -> 2
Collection: period_accuracy_period_metadata
Size Difference: 245760 -> 237568
Number of Documents Difference: 2886 -> 2921
Collection: project_clone
Size Difference: 3678208 -> 3764224
Number of Documents Difference: 101437 -> 119725
There are 3 collections that differ
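The comparison behind a report like the one above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual implementation: it assumes each inventory is a dictionary mapping a collection name to its `size`, `count`, and `nindexes`, and the names `diff_inventories` and `format_report` are hypothetical.

```python
# Maps an assumed inventory field to the label used in the example output.
FIELD_LABELS = {
    "size": "Size Difference",
    "count": "Number of Documents Difference",
    "nindexes": "Number of Indexes Difference",
}


def diff_inventories(pre, post):
    """Return {collection: {field: (before, after)}} for fields that changed.

    A collection missing from one side is reported with None for its values.
    """
    diffs = {}
    for name in sorted(set(pre) | set(post)):
        before = pre.get(name, {})
        after = post.get(name, {})
        changed = {
            field: (before.get(field), after.get(field))
            for field in FIELD_LABELS
            if before.get(field) != after.get(field)
        }
        if changed:
            diffs[name] = changed
    return diffs


def format_report(diffs):
    """Render the differences in a layout similar to the example output."""
    lines = []
    for name, changed in diffs.items():
        lines.append(f"Collection: {name}")
        for field, (old, new) in changed.items():
            lines.append(f"{FIELD_LABELS[field]}: {old} -> {new}")
    lines.append(f"There are {len(diffs)} collections that differ")
    return "\n".join(lines)
```

Only fields whose values changed are listed for a collection, which matches the example where the index count appears for reason_codes_job but not for project_clone.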
The script also recognizes certain collections whose contents change constantly and can be safely ignored during restores, even if their document or index counts do not match. It prints the differences for these collections but does not count them toward an inconsistent state.
'job_process',
'qid_counter',
'compute_cluster_metrics',
'queue_monitor',
'queue',
'job_executions',
'execute_kubeworkers_health_checks'
At the end of the comparison, the script prints whether the database is in a consistent state, based on the differences found.
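The verdict logic described above can be sketched as follows. The collection names are taken from the list above; the function name `evaluate_consistency` and the shape of its return value are assumptions made for illustration and do not reflect the script's internals.

```python
# Collections listed above whose differences do not affect the verdict.
IGNORED_COLLECTIONS = frozenset({
    "job_process",
    "qid_counter",
    "compute_cluster_metrics",
    "queue_monitor",
    "queue",
    "job_executions",
    "execute_kubeworkers_health_checks",
})


def evaluate_consistency(differing_collections):
    """Split differing collections into ignored and blocking ones.

    The database is treated as consistent when every differing
    collection is on the ignore list (hypothetical helper).
    """
    blocking = sorted(
        name for name in differing_collections
        if name not in IGNORED_COLLECTIONS
    )
    ignored = sorted(
        name for name in differing_collections
        if name in IGNORED_COLLECTIONS
    )
    return {"consistent": not blocking, "blocking": blocking, "ignored": ignored}
```

For instance, a run where only queue and job_process differ would be judged consistent under this sketch, while a difference in project_clone would not.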