Manage the cluster

This section presents administrative tasks and commands to ensure your cluster is configured and operating as expected.

Useful commands to know

Helm commands

The following snippets cover common Helm tasks for managing the cluster. For full documentation, see the Helm documentation.

List all releases in a given namespace:

helm list -n $NAMESPACE

Get the user-supplied values for a release:

helm get values $RELEASE_NAME -n $NAMESPACE

Get all information about a release, including the values Helm computed, hooks, manifest, and notes:

helm get all $RELEASE_NAME -n $NAMESPACE

Get the content of a Helm chart's default values.yaml file:

helm show values $CHART_NAME

Get the manifest of a deployed release:

helm get manifest $RELEASE_NAME -n $NAMESPACE
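
The Helm commands above can be combined to keep a copy of each release's user-supplied values, for example before an upgrade. The following is a minimal sketch, not an official procedure: the function name `backup_helm_values` and the output layout are assumptions for illustration. It relies only on `helm list -q` (print release names only) and `helm get values -o yaml`.

```shell
# Hypothetical helper: save the user-supplied values of every release in a namespace.
# Arguments: $1 = namespace, $2 = output directory (created if missing).
backup_helm_values() {
    ns=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    # `helm list -q` prints only the release names, one per line.
    releases=$(helm list -n "$ns" -q) || return 1
    for rel in $releases; do
        # Write one YAML file per release.
        helm get values "$rel" -n "$ns" -o yaml > "$out_dir/$rel.yaml" || return 1
        echo "saved $out_dir/$rel.yaml"
    done
}
```

For example, `backup_helm_values $NAMESPACE ./helm-values-backup` writes one file per release into the given directory.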

Kubectl commands

For full documentation, see the Kubectl Reference Docs.

Get all resources in a specific namespace:

kubectl get all -n $NAMESPACE

Get a specific resource type across all namespaces:

kubectl get $RESOURCE_TYPE -A

Get all of a resource type in a specific namespace:

kubectl get $RESOURCE_TYPE -n $NAMESPACE

Get logs from a pod:

kubectl logs $POD_NAME -n $NAMESPACE

Follow logs from a pod in real time:

kubectl logs -f $POD_NAME -n $NAMESPACE

Get the values used to build a resource and its status, displayed in YAML format:

# General command
kubectl get $RESOURCE_TYPE/$SPECIFIC_RESOURCE -n $NAMESPACE -o yaml

# Example
kubectl get pod/datarobot-nginx-ABESC -n datarobot-core -o yaml

Get the recent events for a namespace (by default, Kubernetes retains events for one hour):

kubectl get events -n $NAMESPACE
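
When troubleshooting, it is often more useful to see only warnings, ordered by time, because the plain listing is not guaranteed to be chronological. The sketch below wraps that in a small function; the name `warning_events` is an assumption for illustration, while `--field-selector` and `--sort-by` are standard `kubectl get` options.

```shell
# Hypothetical helper: list only Warning events in a namespace, oldest first.
# Argument: $1 = namespace.
warning_events() {
    kubectl get events -n "$1" \
        --field-selector type=Warning \
        --sort-by=.lastTimestamp
}
```

For example, `warning_events $NAMESPACE` prints the warnings with the most recent entries at the bottom.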

Set the default namespace for kubectl commands, which allows you to run the commands above without the -n $NAMESPACE flag:

kubectl config set-context --current --namespace=$NAMESPACE

Restart all deployments in the DataRobot core namespace:

for dep in $(kubectl get deployments.apps -n DR_CORE_NAMESPACE -o name); do kubectl rollout restart "$dep" -n DR_CORE_NAMESPACE; done
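
The restart command returns as soon as the restarts are triggered, not when the new pods are ready. If you need to block until every deployment has finished rolling out, a sketch along the following lines may help. The function name `restart_and_wait` and the default timeout of 300 seconds are assumptions for illustration; `kubectl rollout status` with `--timeout` is the standard way to wait for a rollout.

```shell
# Hypothetical helper: restart every deployment in a namespace, then wait for each rollout.
# Arguments: $1 = namespace, $2 = per-deployment timeout (optional, defaults to 300s).
restart_and_wait() {
    ns=$1
    timeout=${2:-300s}
    # `-o name` prints entries such as deployment.apps/<name>, one per line.
    deps=$(kubectl get deployments.apps -n "$ns" -o name) || return 1
    # Trigger all restarts first so the rollouts proceed in parallel.
    for dep in $deps; do
        kubectl rollout restart "$dep" -n "$ns" || return 1
    done
    # Then wait for each rollout to complete or time out.
    for dep in $deps; do
        kubectl rollout status "$dep" -n "$ns" --timeout="$timeout" || return 1
    done
}
```

For example, `restart_and_wait DR_CORE_NAMESPACE 600s` returns a non-zero status if any deployment fails to become ready within ten minutes.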

Scale cluster up and down

You may occasionally need to scale a cluster down temporarily, for example, to save resources over a weekend. Because Kubernetes applications have no simple start/stop commands, DataRobot suggests scaling the cluster down to zero replicas and then restoring its original size when needed.

Note

If not all nodes of the pcs-rabbitmq stateful set come up after scaling up, you must apply the RabbitMQ cluster recovery procedure.

To scale the cluster down, the following command annotates each deployment and stateful set with its current number of replicas and then scales it down to zero replicas. Run it only once per scale-down, because a second run overwrites the saved replica count with 0.

# Save each workload's current replica count in a "replicas" annotation, then scale it to zero.
for obj in $(kubectl -n DR_CORE_NAMESPACE get deployments,statefulsets -o name); do
    r=$(kubectl -n DR_CORE_NAMESPACE get "$obj" -o jsonpath='{.spec.replicas}')
    kubectl -n DR_CORE_NAMESPACE annotate --overwrite "$obj" replicas="$r"
    kubectl -n DR_CORE_NAMESPACE scale "$obj" --replicas=0
done

To restore the cluster to its original size, the following command reads the replica count from the annotation, scales the resources up, and removes the annotation upon completion:

for obj in $(kubectl -n DR_CORE_NAMESPACE get statefulsets,deployments -o name); do
    r=$(kubectl -n DR_CORE_NAMESPACE get "$obj" -o jsonpath='{.metadata.annotations.replicas}')
    # Skip objects that have no saved replica count.
    [ -n "$r" ] || continue
    kubectl -n DR_CORE_NAMESPACE scale "$obj" --replicas="$r"
    kubectl -n DR_CORE_NAMESPACE annotate "$obj" replicas-
done
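
Before restoring, it can help to confirm that each workload still carries its saved replica count. The following read-only sketch prints the current and saved values side by side. The function name `show_saved_replicas` is an assumption for illustration; it uses the same `replicas` annotation and jsonpath expressions as the commands above and changes nothing in the cluster.

```shell
# Hypothetical helper: print current and saved replica counts without modifying anything.
# Argument: $1 = namespace.
show_saved_replicas() {
    ns=$1
    objs=$(kubectl -n "$ns" get deployments,statefulsets -o name) || return 1
    for obj in $objs; do
        current=$(kubectl -n "$ns" get "$obj" -o jsonpath='{.spec.replicas}')
        saved=$(kubectl -n "$ns" get "$obj" -o jsonpath='{.metadata.annotations.replicas}')
        # An empty value means the field or annotation is not set.
        echo "$obj current=${current:-?} saved=${saved:-none}"
    done
}
```

A workload reported with `saved=none` was not annotated by the scale-down command, so its replica count must be set manually.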

Collecting a cluster profile

During troubleshooting, your DataRobot support representative may ask for a cluster information dump. The following command exports the cluster's currently running configuration, plus logs and events for the various services in use.

Replace /path/to/a/folder/on/disk/ with a valid local path.

kubectl cluster-info dump -n DR_CORE_NAMESPACE --output-directory=/path/to/a/folder/on/disk/cluster-state

This command creates a folder named cluster-state. You can then create a compressed tarball of that folder and its contents to provide to your support representative for detailed analysis.

tar -cvzf cluster-state-$(date +%F).tar.gz -C /path/to/a/folder/on/disk cluster-state
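
The two steps above can also be wrapped in a single function. This is a sketch under stated assumptions, not a supported tool: the function name `collect_cluster_profile` is hypothetical, and the archive is written to the current working directory. It uses `tar -C` so that the archive contains a relative `cluster-state/` folder rather than an absolute path.

```shell
# Hypothetical helper: dump the cluster state for a namespace and archive it.
# Arguments: $1 = namespace, $2 = existing local folder to hold the dump.
# Prints the name of the archive created in the current directory.
collect_cluster_profile() {
    ns=$1
    base=$2
    kubectl cluster-info dump -n "$ns" --output-directory="$base/cluster-state" || return 1
    archive="cluster-state-$(date +%F).tar.gz"
    # -C changes into $base first, so only the relative cluster-state/ path is stored.
    tar -czf "$archive" -C "$base" cluster-state || return 1
    echo "$archive"
}
```

To check the result before sending it, list the archive contents with `tar -tzf` followed by the archive name.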