Troubleshooting: Pod with `volumeMode: Block` is stuck in terminating

· 2 min read
Phan Le

Applicable versions

All Longhorn versions.

Symptoms

A pod uses a PVC with `volumeMode: Block` provisioned by the Longhorn CSI driver. After the Longhorn volume crashes unexpectedly (due to network problems, CPU pressure, hardware failure, etc.), the user cannot delete the pod. The pod stays stuck in terminating indefinitely because Kubelet refuses to unmount the block volume. This prevents the user from cleaning up the pod and starting a replacement, which leads to prolonged service degradation. For example, if the pod is part of a StatefulSet, the replacement pod cannot come up while the old pod is stuck terminating.
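For reference, the kind of workload affected looks roughly like the sketch below: a PVC that requests a raw block device, and a pod that consumes it through `volumeDevices` rather than `volumeMounts`. All names (`block-pvc`, `block-pod`, the `busybox` image, the device path) are hypothetical, and the StorageClass is assumed to be called `longhorn`; adjust them to your cluster.

```yaml
# Illustrative only: names, image, size, and device path are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block            # request a raw block device instead of a filesystem
  storageClassName: longhorn   # assumed name of the Longhorn StorageClass
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: block-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeDevices:           # block volumes use volumeDevices, not volumeMounts
        - name: data
          devicePath: /dev/xvda
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: block-pvc
```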

Troubleshooting: Instance manager pods are restarted every hour

· 2 min read
Phan Le

Applicable versions

v1.0.1 or newer

Background

Each Longhorn volume has one engine and one or more replicas (see the Longhorn architecture documentation for more detail). When a Longhorn volume is attached, Longhorn launches a process for each engine/replica object. Engine processes are launched inside engine instance manager pods (the instance-manager-e-xxxxxxxx pods in the longhorn-system namespace). Replica processes are launched inside replica instance manager pods (the instance-manager-r-xxxxxxxx pods in the longhorn-system namespace).

Troubleshooting: Upgrading volume engine is stuck in deadlock

· 3 min read
Phan Le

Applicable versions

This happens when users upgrade Longhorn from version ≤ v1.1.1 to a newer version.

Symptoms

Upgrading a Longhorn system involves two steps: first, upgrade Longhorn manager to the latest version; then, upgrade the Longhorn engine to the latest version using the latest Longhorn manager. During the second step (upgrading the Longhorn engine), you may find that some volumes are stuck in engine upgrading. You may also see that volume attachment or detachment cannot finish (e.g., Longhorn volumes are stuck in the detaching or attaching state).

Tip: Set Longhorn To Only Use Storage On A Specific Set Of Nodes

· 2 min read
Phan Le

Applicable versions

All Longhorn versions.

Background

Let's say you have a cluster of 5 nodes (node-1, node-2, ..., node-5). You have some fast disks on node-1, node-2, and node-3, so you want Longhorn to use storage on those nodes only. There are a few ways to do this, as described below.
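As a rough illustration of one possible approach (not necessarily the one the post goes on to recommend), a StorageClass can restrict replica placement with Longhorn's `nodeSelector` parameter. This sketch assumes that node-1, node-2, and node-3 have already been given the Longhorn node tag `fast`; the tag and the StorageClass name `longhorn-fast` are hypothetical.

```yaml
# Illustrative only: the "fast" tag and the StorageClass name are assumptions.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"
  nodeSelector: "fast"   # schedule replicas only on Longhorn nodes tagged "fast"
```

Volumes created from other StorageClasses are not affected by this selector, so it only constrains workloads that explicitly use `longhorn-fast`.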