Conversation

@WTFKr0 WTFKr0 commented Aug 22, 2025

This PR closes #1405

Context

Provisioning Machines with the GCP provider
With confidential compute enabled
Trying to set onHostMaintenance to Migrate

Current Issue

Today the webhook rejects this configuration, because GCP forbids it by default.

However, an exception exists for the n2d series.
See this doc for details: https://cloud.google.com/confidential-computing/confidential-vm/docs/troubleshoot-live-migration

Resolution

Add an exception to the webhook for n2d series VMs, so that onHostMaintenance: Migrate is accepted for them.
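
To illustrate the idea, here is a minimal, self-contained sketch of a series allowlist check. The names `seriesSupportingMigrate`, `machineSeries` and `migrateAllowed` are hypothetical and are not taken from the PR; only the `strings.Split(..., "-")[0]` series extraction mirrors the diff in the review thread.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// seriesSupportingMigrate is a hypothetical allowlist of machine series for
// which Migrate may be accepted together with confidential compute.
var seriesSupportingMigrate = []string{"n2d"}

// machineSeries returns the prefix before the first "-" of a GCP machine
// type, for example "n2d" for "n2d-standard-4".
func machineSeries(machineType string) string {
	return strings.Split(machineType, "-")[0]
}

// migrateAllowed reports whether the machine type belongs to an allowlisted series.
func migrateAllowed(machineType string) bool {
	return slices.Contains(seriesSupportingMigrate, machineSeries(machineType))
}

func main() {
	for _, mt := range []string{"n2d-standard-4", "c2d-standard-4"} {
		fmt.Printf("%s: migrate allowed = %t\n", mt, migrateAllowed(mt))
	}
}
```

Keeping the series in a slice makes it straightforward to extend the exception if GCP later documents live migration for other series.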

Environment

This issue is confirmed on OpenShift 4.16, 4.18 and 4.19.

Contributor

openshift-ci bot commented Aug 22, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 22, 2025
Contributor

openshift-ci bot commented Aug 22, 2025

Hi @WTFKr0. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sunzhaohua2
Contributor

@miyadav I checked with this PR and confidentialCompute: Enabled on n2d:

  • Without the onHostMaintenance field: works well, the instance runs successfully.
  • With onHostMaintenance: Migrate: works well, the instance runs successfully.

Without this PR, both cases are blocked by the webhooks.

machineType: n2d-standard-4 
onHostMaintenance: Migrate 
confidentialCompute: Enabled

Contributor

@bgartzi bgartzi left a comment

I'm also opening another discussion thread:

I can see in https://cloud.google.com/confidential-computing/confidential-vm/docs/troubleshoot-live-migration that live migration in AMD SEV machines is only supported with n2d machines with the AMD EPYC Milan CPU platform.

n2d machines run on EPYC Milan and EPYC Rome (deprecated according to GCP documentation). Do we need to check the CPU platform is properly set too?

We don't currently support CPU platform configuration in the GCP provider. IIUC, the closest we can get to this is by setting the Instance MinCpuPlatform field to EPYC Milan.

If we need to run this check too, we would need to add a new field in the openshift API to support it.
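
As a rough sketch of what such an additional check could look like: the `MinCPUPlatform` field below is hypothetical and does not exist in the provider spec today, and the platform string "AMD Milan" is an assumption about the value GCP expects for the minimum CPU platform setting.

```go
package main

import "fmt"

// requiredMinCPUPlatform is an assumed value for the GCP minimum CPU
// platform setting; verify it against the GCP documentation before use.
const requiredMinCPUPlatform = "AMD Milan"

// hypotheticalSpec stands in for the provider spec. MinCPUPlatform is a
// hypothetical field that would first have to be added to the OpenShift API.
type hypotheticalSpec struct {
	MachineType    string
	MinCPUPlatform string
}

// checkCPUPlatformForMigrate returns an error unless the spec pins the
// CPU platform that supports live migration.
func checkCPUPlatformForMigrate(s hypotheticalSpec) error {
	if s.MinCPUPlatform != requiredMinCPUPlatform {
		return fmt.Errorf("onHostMaintenance Migrate requires minCPUPlatform %q, got %q",
			requiredMinCPUPlatform, s.MinCPUPlatform)
	}
	return nil
}

func main() {
	spec := hypotheticalSpec{MachineType: "n2d-standard-4", MinCPUPlatform: "AMD Rome"}
	if err := checkCPUPlatformForMigrate(spec); err != nil {
		fmt.Println("rejected:", err)
	}
}
```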

machineSeries := strings.Split(providerSpec.MachineType, "-")[0]
// Check on host maintenance
- if providerSpec.OnHostMaintenance != machinev1beta1.TerminateHostMaintenanceType {
+ if providerSpec.OnHostMaintenance != machinev1beta1.TerminateHostMaintenanceType && !slices.Contains(gcpConfidentialTypeMachineSeriesSupportingOnHostMaintenanceMigrate, machineSeries) {
Contributor

This condition would allow SEV-SNP n2d machines to be configured with OnHostMaintenance: Migrate. However, GCP does not support that.

Could you rewrite the condition so this configuration is only accepted for AMD-SEV? (i.e. providerSpec.ConfidentialCompute == Enabled OR AMDEncryptedVirtualization)
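
One way the requested condition might look, as a self-contained sketch. The types and constants below are local stand-ins for the `machinev1beta1` ones (their string values are assumptions), `validateOnHostMaintenance` and `seriesSupportingMigrate` are hypothetical names, and the function assumes it is only called when confidential compute is not disabled.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// Local stand-ins for the machinev1beta1 types; values are assumptions.
type ConfidentialComputePolicy string
type HostMaintenanceType string

const (
	PolicyEnabled ConfidentialComputePolicy = "Enabled"
	PolicySEV     ConfidentialComputePolicy = "AMDEncryptedVirtualization"
	PolicySEVSNP  ConfidentialComputePolicy = "AMDEncryptedVirtualizationNestedPaging"

	Migrate   HostMaintenanceType = "Migrate"
	Terminate HostMaintenanceType = "Terminate"
)

// seriesSupportingMigrate is a hypothetical allowlist of machine series.
var seriesSupportingMigrate = []string{"n2d"}

// validateOnHostMaintenance accepts Terminate always, and any other value
// only for AMD SEV (Enabled or AMDEncryptedVirtualization) on an allowlisted series.
func validateOnHostMaintenance(machineType string, cc ConfidentialComputePolicy, ohm HostMaintenanceType) error {
	if ohm == Terminate {
		return nil
	}
	series := strings.Split(machineType, "-")[0]
	isSEV := cc == PolicyEnabled || cc == PolicySEV
	if isSEV && slices.Contains(seriesSupportingMigrate, series) {
		return nil
	}
	return fmt.Errorf("onHostMaintenance %q is not allowed with confidentialCompute %q on machine type %q",
		ohm, cc, machineType)
}

func main() {
	fmt.Println(validateOnHostMaintenance("n2d-standard-4", PolicyEnabled, Migrate))
	fmt.Println(validateOnHostMaintenance("n2d-standard-4", PolicySEVSNP, Migrate))
}
```

With this shape, the SEV-SNP plus Migrate combination on n2d is rejected, while Enabled or AMDEncryptedVirtualization plus Migrate on n2d is accepted.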

Author

OK, I see.
This test case needs to fail:

		{
			testCase: "with ConfidentialCompute AMDEncryptedVirtualizationNestedPaging and onHostMaintenance set to Migrate on n2d instances",
			modifySpec: func(p *machinev1beta1.GCPMachineProviderSpec) {
				p.ConfidentialCompute = machinev1beta1.ConfidentialComputePolicySEVSNP
				p.OnHostMaintenance = machinev1beta1.MigrateHostMaintenanceType
				p.MachineType = "n2d-standard-4"
				p.GPUs = []machinev1beta1.GCPGPUConfig{}
			},
			expectedOk: true,
		},

Author

This should be OK with the last commit, I think.

Labels
needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test.
Development

Successfully merging this pull request may close these issues.

GCP - ConfidentialCompute with onHostMaintenance Migrate
3 participants