Troubleshooting: Failed RWX mount due to connection timeout
Applicable versions
Confirmed with:
- K3s v1.24.8+k3s1 and RKE2 v1.24.8+rke2r1
- Longhorn v1.3.3
- SUSE Linux Enterprise Server 15 SP5
All Longhorn versions, but some features are introduced in v1.4.0 or v1.5.0
Confirmed in:
Potentially mitigated in:
Complete fix planned in:
While the root cause is always the same, symptoms can vary depending on other factors (e.g. whether there are multiple
All Longhorn versions.
Kubernetes versions before v1.28. A backported PR to v1.27 is awaiting merging.
When a worker node fails while hosting active Pods, the Pods are gracefully evicted while the node is down and awaiting restoration. During this period, the kubelet, which manages the node, logs the following error message every two seconds:
orphaned pod <pod-uid> found, but error not a directory occurred when trying to remove the volumes dir
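To confirm that a node is affected, it can help to count how often each Pod UID appears in this message. The following is a minimal sketch, not an official Longhorn or Kubernetes tool: the function name `count_orphaned_pods` is hypothetical, and the pattern assumes the kubelet log line contains the message exactly as quoted above, with the UID optionally wrapped in plain or escaped quotes.

```python
import re
import sys
from collections import Counter

# Pattern built from the message quoted above. Assumption: the UID may be
# bare, wrapped in quotes, or wrapped in backslash-escaped quotes, depending
# on how the kubelet log line is formatted.
ORPHAN_RE = re.compile(
    r'orphaned pod \\?"?(?P<uid>[0-9a-fA-F-]+)\\?"? found, '
    r'but error not a directory occurred when trying to remove the volumes dir'
)


def count_orphaned_pods(lines):
    """Return a Counter mapping each orphaned Pod UID to its message count."""
    counts = Counter()
    for line in lines:
        match = ORPHAN_RE.search(line)
        if match:
            counts[match.group("uid")] += 1
    return counts


if __name__ == "__main__":
    # Read kubelet log lines from standard input and print "UID<TAB>count",
    # most frequent first.
    for uid, n in count_orphaned_pods(sys.stdin).most_common():
        print(f"{uid}\t{n}")
```

Pipe the kubelet log of the affected node into the script. Where that log lives depends on the distribution and how the kubelet is run. A UID whose count keeps growing between runs indicates that the kubelet is still retrying the cleanup for that Pod.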