Profile applicability: Level 2
Nodes in a degraded state are an unknown quantity and so may pose a security risk.
Kubernetes Engine's node auto-repair feature helps you keep the nodes in the cluster
in a healthy, running state. When enabled, Kubernetes Engine makes periodic checks
on the health state of each node in the cluster. If a node fails consecutive health
checks over an extended time period, Kubernetes Engine initiates a repair process
for that node.
Note: Node auto-repair is enabled by default.
Impact
If multiple nodes require repair, Kubernetes Engine might repair them in parallel.
Kubernetes Engine limits the number of repairs depending on the size of the cluster (bigger
clusters have a higher limit) and the number of broken nodes in the cluster (limit
decreases if many nodes are broken). Node auto-repair is not available on Alpha Clusters.
Audit
Using Google Cloud Console:
- Go to Kubernetes Engine by visiting: Google Cloud Console Kubernetes Engine page.
- From the list of clusters, select the desired cluster.
- For each Node pool, view the Node pool Details pane and ensure that under the 'Management' heading, 'Auto-repair' is set to 'Enabled'.
Using Command Line:
To check whether node auto-repair is enabled for an existing cluster's node pool, first
define four variables (POOL_NAME, CLUSTER_NAME, LOCATION, and PROJECT_ID), then run
the following command:
gcloud container node-pools describe $POOL_NAME --cluster $CLUSTER_NAME --location $LOCATION --project $PROJECT_ID --format json | jq '.management'
Ensure the output of the above command has the JSON key autoRepair set to true:
{ "autoRepair": true }
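The check above can be wrapped in a small helper so that it yields a pass/fail result per node pool. This is a minimal sketch, not part of the benchmark: the function name check_autorepair and its positional arguments are hypothetical, and it assumes that gcloud's value(management.autoRepair) output format prints the boolean as True or False (the comparison is case-insensitive to be safe).

```shell
# Hypothetical helper: reports PASS/FAIL for one node pool.
# Arguments: pool name, cluster name, location, project ID.
check_autorepair() {
  pool="$1"; cluster="$2"; location="$3"; project="$4"

  # Ask gcloud for just the autoRepair field instead of piping JSON to jq.
  value=$(gcloud container node-pools describe "$pool" \
    --cluster "$cluster" --location "$location" --project "$project" \
    --format 'value(management.autoRepair)')

  # Assumption: booleans are printed as True/False; normalise the case first.
  case "$(printf '%s' "$value" | tr '[:upper:]' '[:lower:]')" in
    true) echo "PASS: $pool"; return 0 ;;
    *)    echo "FAIL: $pool"; return 1 ;;
  esac
}
```

For example, check_autorepair "$POOL_NAME" "$CLUSTER_NAME" "$LOCATION" "$PROJECT_ID" prints a single PASS or FAIL line and sets the exit status accordingly, which makes it easy to call from a larger audit script.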
Remediation
Using Google Cloud Console:
- Go to Kubernetes Engine by visiting: Google Cloud Console Kubernetes Engine page.
- Select the Kubernetes cluster containing the node pool for which auto-repair is disabled.
- Select the Node pool by clicking on the name of the pool.
- Navigate to the Node pool details pane and click EDIT.
- Under the Management heading, check the Enable auto-repair box.
- Click SAVE.
- Repeat the steps above, from selecting the cluster through clicking SAVE, for every cluster and node pool with auto-repair disabled.
Using Command Line:
To enable node auto-repair for an existing cluster's Node pool:
gcloud container node-pools update <node_pool_name> --cluster <cluster_name> --location <location> --enable-autorepair
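When a cluster has several node pools, the update command can be applied to each of them in a loop. The following is a sketch under stated assumptions rather than a prescribed procedure: the function name enable_autorepair_all is hypothetical, and it assumes that gcloud container node-pools list accepts --location in the same way as the update command above.

```shell
# Hypothetical helper: enables auto-repair on every node pool of one cluster.
# Arguments: cluster name, location.
enable_autorepair_all() {
  cluster="$1"; location="$2"

  # List node pool names, one per line, then update each pool in turn.
  gcloud container node-pools list \
    --cluster "$cluster" --location "$location" \
    --format 'value(name)' |
  while read -r pool; do
    # Skip blank lines in the listing.
    [ -n "$pool" ] || continue
    # Redirect stdin so the update cannot consume the remaining pool names.
    gcloud container node-pools update "$pool" \
      --cluster "$cluster" --location "$location" \
      --enable-autorepair < /dev/null
  done
}
```

Calling enable_autorepair_all <cluster_name> <location> runs one update per pool; the audit command should be re-run afterwards to confirm the setting.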
