Description
This would be a breaking change and would require a major version bump of the spec, but I am recording it here as a backlog item.
`NOT_FOUND` errors in `ControllerUnpublishVolume` are unrecoverable under the recovery behavior currently defined in the spec. That behavior asks the caller to "verify ... the volume is accessible", which can only be verified by calling the driver. Currently the only call that can do so is `ListVolumes`, which is an optional capability. Therefore, a MUST to verify that the volume exists after this error is logically impossible to satisfy.
The driver is the one that can and should decide to return an `OK` code when the node or volume is unavailable and it can interpret that as the volume having been controller-unpublished. There should be no situation in which the driver returns a `NOT_FOUND` error that the caller could recover from, unless the caller got the `volume_id` wrong, which I would argue should also be returned as `OK`, just as the specification states for `DeleteVolume`.
The solution to the above problems is to amend the wording:
If the volume corresponding to the volume_id or the node corresponding to node_id cannot be found by the Plugin and the volume can be safely regarded as ControllerUnpublished from the node, the plugin **MUST** return 0 OK.
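As a rough illustration of the proposed wording, here is a minimal driver-side sketch in Go. It is not taken from any real CSI driver: `Backend`, `fakeBackend`, `errNotFound`, and the local `Code` type (a stand-in for the gRPC status codes) are all hypothetical names, and the sketch assumes a missing volume or node always means the volume can safely be regarded as unpublished.

```go
package main

import "errors"

// Code is a local stand-in for the gRPC status codes used by CSI
// (hypothetical type, so the sketch needs no external packages).
type Code int

const (
	OK       Code = 0
	NotFound Code = 5
	Internal Code = 13
)

// errNotFound is what the hypothetical backend returns when it does not
// know the volume or the node.
var errNotFound = errors.New("volume or node not found")

// Backend is a hypothetical storage backend used only for illustration.
type Backend interface {
	Detach(volumeID, nodeID string) error
}

// ControllerUnpublishVolume sketches the proposed behavior: when the
// volume or node cannot be found, the volume is treated as already
// unpublished and OK is returned instead of NOT_FOUND.
// Assumption: "not found" always implies "safely unpublished" here.
func ControllerUnpublishVolume(b Backend, volumeID, nodeID string) Code {
	err := b.Detach(volumeID, nodeID)
	switch {
	case err == nil:
		return OK
	case errors.Is(err, errNotFound):
		// Nothing left to detach, so the call succeeds idempotently.
		return OK
	default:
		// Any other failure is a real error the caller should retry.
		return Internal
	}
}

// fakeBackend is an in-memory Backend mapping volume ID to node ID.
type fakeBackend struct {
	attached map[string]string
}

func (f *fakeBackend) Detach(volumeID, nodeID string) error {
	if _, ok := f.attached[volumeID]; !ok {
		return errNotFound
	}
	delete(f.attached, volumeID)
	return nil
}
```

With this shape, calling the function twice for the same volume, or once for a volume the backend has never seen, returns `OK` each time, so the caller never has to distinguish a recoverable `NOT_FOUND` from an unrecoverable one.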
And remove the MUST/SHOULD recovery behavior from the two `NOT_FOUND` errors in `ControllerUnpublishVolume`, covering them with a blanket retry instead.
The Kubernetes `external-attacher` implementation already does not perform the recovery behaviors, and is in the process of releasing a new major version that treats `NOT_FOUND` errors as real errors that must be retried.