Manage the cluster
This section presents administrative tasks and commands to ensure your cluster is configured and operating as expected.
Useful commands to know
Helm commands
The following snippets cover common Helm management tasks for the cluster. For full documentation, see the Helm documentation.
List all releases in a given namespace:
helm list -n $NAMESPACE
Get the values supplied to a Helm chart when a release was installed or last upgraded:
helm get values $RELEASE_NAME -n $NAMESPACE
Get all information Helm stores for a release, including the values it computed:
helm get all $RELEASE_NAME -n $NAMESPACE
Get the content of a Helm chart’s default values.yaml file:
helm show values $CHART_NAME
Get the manifest of a deployed release:
helm get manifest $RELEASE_NAME -n $NAMESPACE
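Before an upgrade or a support request, it can help to keep a copy of both the supplied and the computed values of a release. The following is a minimal sketch, not an official DataRobot procedure: the helper name `save_release_values` and the output file names are hypothetical, and it relies only on `helm get values` with its `--all` and `-o yaml` flags.

```shell
# Hypothetical helper: save the user-supplied and computed values of a release.
# Usage: save_release_values RELEASE_NAME NAMESPACE OUTPUT_DIR
save_release_values() {
  local release=$1 namespace=$2 outdir=$3
  mkdir -p "$outdir" || return 1
  # Values passed with -f/--set for the current revision of the release
  helm get values "$release" -n "$namespace" -o yaml \
    > "$outdir/$release-supplied.yaml" || return 1
  # Supplied values merged with the chart defaults (--all)
  helm get values "$release" -n "$namespace" --all -o yaml \
    > "$outdir/$release-computed.yaml" || return 1
  echo "Saved values for $release to $outdir"
}
```

For example, `save_release_values $RELEASE_NAME $NAMESPACE ./values-backup` writes two YAML files that can be compared with `diff` after a later upgrade.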
Kubectl commands
For full documentation, see the Kubectl Reference Docs.
Get all resources in a specific namespace:
kubectl get all -n $NAMESPACE
Get a specific resource type across all namespaces:
kubectl get $RESOURCE_TYPE -A
Get all of a resource type in a specific namespace:
kubectl get $RESOURCE_TYPE -n $NAMESPACE
Get logs from a pod:
kubectl logs $POD_NAME -n $NAMESPACE
Follow logs from a pod in real time:
kubectl logs -f $POD_NAME -n $NAMESPACE
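When a pod has restarted, the logs of the crashed container instance are often more useful than those of the current one. The sketch below combines the standard `kubectl logs` flags `--since`, `--tail`, and `--previous`; the helper name `pod_recent_logs` and the default values (one hour, 200 lines) are illustrative assumptions.

```shell
# Hypothetical helper: show recent logs for a pod, followed by the logs of the
# previous container instance if one exists.
# Usage: pod_recent_logs POD_NAME NAMESPACE [SINCE]
pod_recent_logs() {
  local pod=$1 namespace=$2 since=${3:-1h}
  echo "== current instance, last $since =="
  kubectl logs "$pod" -n "$namespace" --since="$since" --tail=200
  echo "== previous instance =="
  # --previous returns an error when no earlier instance exists;
  # that case is reported instead of being treated as a failure
  kubectl logs "$pod" -n "$namespace" --previous --tail=200 2>/dev/null ||
    echo "(no previous logs available)"
}
```

For pods with several containers, add `-c $CONTAINER_NAME` to both `kubectl logs` calls to select the container of interest.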
Get the full specification and current status of a resource in YAML format:
# General command
kubectl get $RESOURCE_TYPE/$SPECIFIC_RESOURCE -n $NAMESPACE -o yaml
# Example
kubectl get pod/datarobot-nginx-ABESC -n datarobot-core -o yaml
Get recent events for a namespace (by default, Kubernetes retains events for one hour):
kubectl get events -n $NAMESPACE
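The raw event list can be long and is not ordered by time. As a sketch, the helper below narrows it to warnings and sorts them chronologically using the standard `--field-selector` and `--sort-by` options of `kubectl get`; the function name `namespace_warnings` is hypothetical.

```shell
# Hypothetical helper: list Warning events in a namespace, oldest first.
# Usage: namespace_warnings NAMESPACE
namespace_warnings() {
  local namespace=$1
  kubectl get events -n "$namespace" \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp
}
```

Because the sort is ascending, the most recent warnings appear at the bottom of the output.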
Set the default namespace for kubectl commands, so that you can run the commands above without the -n $NAMESPACE flag:
kubectl config set-context --current --namespace=$NAMESPACE
Restart all deployments in the DataRobot core namespace (replace DR_CORE_NAMESPACE with the name of that namespace):
for dep in $(kubectl get deployments.apps -n DR_CORE_NAMESPACE -o name); do kubectl rollout restart "$dep" -n DR_CORE_NAMESPACE; done
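A restart only triggers the rollout; it does not wait for the new pods to become ready. One way to confirm that every deployment has finished is `kubectl rollout status`, sketched below. The helper name `wait_for_rollouts` and the 10-minute default timeout are assumptions, not DataRobot recommendations.

```shell
# Hypothetical helper: wait until every deployment in a namespace has finished
# rolling out. Returns non-zero if any rollout fails or times out.
# Usage: wait_for_rollouts NAMESPACE [TIMEOUT]
wait_for_rollouts() {
  local namespace=$1 timeout=${2:-10m} dep failed=0
  for dep in $(kubectl get deployments.apps -n "$namespace" -o name); do
    # rollout status blocks until the rollout completes or the timeout expires
    if ! kubectl rollout status "$dep" -n "$namespace" --timeout="$timeout"; then
      echo "Rollout did not complete: $dep" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

The function keeps checking the remaining deployments after a failure, so a single run reports every deployment that did not complete.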
Scale cluster up and down
You may occasionally need to scale a cluster down temporarily, for example, to save resources over a weekend. Because Kubernetes applications have no simple start/stop commands, DataRobot suggests scaling the cluster down to zero replicas and restoring its original size when it is needed again.
Note
If some nodes of the pcs-rabbitmq stateful set fail to come up after scaling up, you must apply the RabbitMQ cluster recovery procedure.
To scale the cluster down, the following command annotates each deployment and stateful set with its current number of replicas and then scales it down to zero replicas:
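# Replace DR_CORE_NAMESPACE with the name of the DataRobot core namespace
for obj in $(kubectl -n DR_CORE_NAMESPACE get deployments,statefulsets -o name); do
  # Record the current replica count so that it can be restored later
  r=$(kubectl -n DR_CORE_NAMESPACE get "$obj" -o jsonpath='{.spec.replicas}')
  kubectl -n DR_CORE_NAMESPACE annotate --overwrite "$obj" replicas="$r"
  kubectl -n DR_CORE_NAMESPACE scale "$obj" --replicas=0
done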
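Because the scale-down loop overwrites the `replicas` annotation, running it a second time on an already scaled-down cluster would record zero as the original size. As a precaution, the read-only sketch below prints the current replica counts so they can be saved to a file first; the helper name `show_replicas` is hypothetical.

```shell
# Hypothetical helper: print the current replica count of every deployment and
# stateful set in a namespace, without changing anything.
# Usage: show_replicas NAMESPACE
show_replicas() {
  local namespace=$1 obj r
  for obj in $(kubectl -n "$namespace" get deployments,statefulsets -o name); do
    r=$(kubectl -n "$namespace" get "$obj" -o jsonpath='{.spec.replicas}')
    # One tab-separated line per object: <kind>/<name> <replicas>
    printf '%s\t%s\n' "$obj" "$r"
  done
}
```

Redirecting the output, for example with `show_replicas $NAMESPACE > replicas-before-scale-down.tsv`, gives an independent record of the original sizes in case the annotations are lost.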
To restore the cluster to its original size, the following command reads the replica count from the annotation, scales the resources up, and removes the annotation upon completion:
for obj in $(kubectl -n DR_CORE_NAMESPACE get statefulsets,deployments -o name); do
  r=$(kubectl -n DR_CORE_NAMESPACE get "$obj" -o jsonpath='{.metadata.annotations.replicas}')
  # Skip objects that were not annotated by the scale-down step
  [ -n "$r" ] || continue
  kubectl -n DR_CORE_NAMESPACE scale "$obj" --replicas="$r"
  kubectl -n DR_CORE_NAMESPACE annotate "$obj" replicas-
done
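After scaling up, it is worth checking that every workload actually reached its desired size. The sketch below compares desired and ready replica counts using a `kubectl get` JSONPath template and `awk`; the helper name `not_ready` and the output format are assumptions made for illustration.

```shell
# Hypothetical helper: list deployments and stateful sets whose ready replica
# count does not yet match the desired count. Prints nothing when all are ready.
# Usage: not_ready NAMESPACE
not_ready() {
  local namespace=$1
  kubectl -n "$namespace" get deployments,statefulsets \
    -o jsonpath='{range .items[*]}{.kind}/{.metadata.name} {.spec.replicas} {.status.readyReplicas}{"\n"}{end}' |
    # Field 3 is empty when no replica is ready; "+ 0" treats that as zero
    awk 'NF >= 2 && $2 != $3 + 0 { printf "%s: %d of %d ready\n", $1, $3, $2 }'
}
```

If the pcs-rabbitmq stateful set still appears in the output after the other workloads have settled, that points to the RabbitMQ cluster recovery procedure mentioned in the note above.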
Collecting a cluster profile
During troubleshooting, your DataRobot support representative may ask for a cluster information dump. The following command exports the cluster's currently running configuration, plus logs and events for the various services in use.
Replace /path/to/a/folder/on/disk/ with a valid local path.
kubectl cluster-info dump -n DR_CORE_NAMESPACE --output-directory=/path/to/a/folder/on/disk/cluster-state
This command creates a folder named cluster-state. You can then create a compressed tarball of that folder and its contents to provide to your support representative for detailed analysis.
tar -cvzf cluster-state-$(date +%F).tar.gz -C /path/to/a/folder/on/disk cluster-state
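The dump and archive steps can be combined into one helper. This is a sketch under stated assumptions: the function name `collect_cluster_profile` is hypothetical, and it writes the archive into the parent folder rather than the current directory.

```shell
# Hypothetical helper: dump cluster state for a namespace and archive it.
# Prints the path of the resulting archive.
# Usage: collect_cluster_profile NAMESPACE PARENT_DIR
collect_cluster_profile() {
  local namespace=$1 parent=$2 archive
  archive="cluster-state-$(date +%F).tar.gz"
  mkdir -p "$parent" || return 1
  kubectl cluster-info dump -n "$namespace" \
    --output-directory="$parent/cluster-state" > /dev/null || return 1
  # -C makes tar store the relative path cluster-state/ instead of the
  # absolute path of the parent folder
  tar -czf "$parent/$archive" -C "$parent" cluster-state || return 1
  echo "$parent/$archive"
}
```

Review the dump before sharing it, since logs and resource definitions may contain environment-specific details.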