Troubleshooting: Instance manager pods are restarted every hour

February 25, 2022 · 2 min read

Phan Le

Applicable versions

v1.0.1 or newer

Background

Each Longhorn volume has one engine and one or more replicas (see the Longhorn architecture documentation for more detail). When a Longhorn volume is attached, Longhorn launches a process for each engine and replica object. Engine processes run inside engine instance manager pods (the instance-manager-e-xxxxxxxx pods in the longhorn-system namespace). Replica processes run inside replica instance manager pods (the instance-manager-r-xxxxxxxx pods in the longhorn-system namespace).

Symptoms

The instance manager pods are restarted every hour. As a consequence, Longhorn volumes and the workload pods that use them crash every hour.

Reason

One potential root cause is that the cluster has a default PriorityClass (i.e., a PriorityClass with the globalDefault field set to true) while the PriorityClass setting in Longhorn is empty. See the Kubernetes documentation on PriorityClass for more information.
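To check whether a cluster matches this condition, the following minimal Python sketch parses the JSON printed by `kubectl get priorityclass -o json` and looks for an entry with globalDefault set to true. The function names, the sample data, and the way the Longhorn setting value is passed in are illustrative assumptions, not part of Longhorn.

```python
import json


def find_default_priority_class(priorityclass_list_json):
    """Return the name of the PriorityClass with globalDefault=true, or None."""
    doc = json.loads(priorityclass_list_json)
    for item in doc.get("items", []):
        # globalDefault is a top-level field of a PriorityClass object.
        if item.get("globalDefault", False):
            return item["metadata"]["name"]
    return None


def is_affected(priorityclass_list_json, longhorn_priority_class):
    """True when a cluster default exists but the Longhorn setting is empty."""
    default_name = find_default_priority_class(priorityclass_list_json)
    return default_name is not None and longhorn_priority_class.strip() == ""


# Hypothetical sample shaped like `kubectl get priorityclass -o json` output.
sample = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "system-node-critical"}, "value": 2000001000},
        {"metadata": {"name": "cluster-default"}, "value": 1000,
         "globalDefault": True},
    ],
})

if __name__ == "__main__":
    print(find_default_priority_class(sample))
    print(is_affected(sample, ""))
```

If `is_affected` returns True for the real cluster data and the current Longhorn setting value, the condition described above applies.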

When Longhorn creates the instance manager pods, it does not set a PriorityClass for them because the PriorityClass setting in Longhorn is empty. Because the cluster has a default PriorityClass, Kubernetes automatically assigns it to any newly created pod that has no PriorityClassName. Later, Longhorn detects a difference between the actual PriorityClass of the instance manager pods and the PriorityClass in the Longhorn setting, so it deletes and recreates the instance manager pods. This happens every hour because Longhorn resyncs all settings every hour.
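The loop described above can be illustrated with a small, simplified model. This is not Longhorn's or Kubernetes' actual code. The functions `admit_pod` and `count_restarts` are hypothetical names that only mimic the defaulting and comparison steps.

```python
def admit_pod(requested_priority_class, cluster_default):
    """Model Kubernetes defaulting: a pod created without a
    priorityClassName receives the cluster's global default, if any."""
    return requested_priority_class or cluster_default or ""


def count_restarts(longhorn_setting, cluster_default, resync_cycles):
    """Count how often the instance manager pod is recreated.

    On every resync, the pod's actual PriorityClass is compared with the
    Longhorn setting; a mismatch triggers delete-and-recreate.
    """
    restarts = 0
    pod_priority_class = admit_pod(longhorn_setting, cluster_default)
    for _ in range(resync_cycles):
        if pod_priority_class != longhorn_setting:
            restarts += 1
            # The recreated pod is defaulted again, so the mismatch persists.
            pod_priority_class = admit_pod(longhorn_setting, cluster_default)
    return restarts


if __name__ == "__main__":
    # Empty Longhorn setting plus a cluster default: one restart per resync.
    print(count_restarts("", "cluster-default", 24))
    # Setting matches the cluster default: no restarts.
    print(count_restarts("cluster-default", "cluster-default", 24))
```

In this model the mismatch can never resolve itself while the setting is empty, because each recreated pod is defaulted again. That matches the hourly restart pattern in the symptoms.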

Solution

Set the PriorityClass setting in Longhorn to the name of the cluster's default PriorityClass.

Related information

Longhorn issue: https://github.com/longhorn/longhorn/issues/2820
