Migrate Existing Huawei DCS Clusters to Pool-Managed Persistent Disks

Use this guide when you upgrade an existing Huawei DCS cluster from the older template-disk layout to the current pool-managed persistent-disk model.

In DCS provider v1.0.16 or later, this migration is YAML-driven because DCSIpHostnamePool.spec.pool[].persistentDisk is not exposed in the web UI.

INFO: Version

Use this procedure when the cluster runs ACP v4.2.1 or later and the target DCS provider version is v1.0.16 or later.

This procedure currently assumes all of the following:

  • The target environment uses the DCS controller implementation that supports pool-managed persistent disks.
  • The DCS VM templates are 4.2.1 or later.
  • Guest tools (vmtools) are working inside the guest OS so safe shutdown and disk detach can complete.

Overview

Older DCS clusters created reusable data disks through DCSMachineTemplate. That layout does not give the controller enough information to preserve disks safely during delete-recreate replacement.

The current model moves upgrade-preserved disks into DCSIpHostnamePool.spec.pool[].persistentDisk. Each disk is bound to an (ip, slot) identity. During rolling replacement, the controller:

  1. Claims the existing disk from the old VM.
  2. Safely stops the old VM.
  3. Detaches the disk.
  4. Converts stock volumes to independent shared volumes when needed.
  5. Deletes the old VM.
  6. Reattaches the disk to the replacement VM.
  7. Boots the replacement VM, which mounts the existing filesystem without reformatting it.

This is also the documented model for the platform-required /var/cpaas disk.

Before You Start

Verify all of the following before you begin:

  • The cluster is healthy and currently stable.
  • Because pool-managed persistent disks require one-by-one replacement, the relevant control plane and worker rollout strategies use maxSurge: 0.
  • You can identify the current disk sequenceNum values on the old VMs from the DCS UI or by querying VM details through the DCS API.
  • You know which disks must be preserved and which disks can still be recreated with the VM.
  • The target DCSIpHostnamePool already exists and maps each node to a fixed IP slot.
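
To confirm the last point, you can read the current IP-to-machine mapping straight from the pool spec. This is a minimal sketch; it assumes the placeholder pool name used later in this guide and reads only fields shown in the example spec:

kubectl get dcsiphostnamepool <iphostname-pool-name> -n cpaas-system \
  -o jsonpath='{range .spec.pool[*]}{.ip}{"  "}{.hostname}{"  "}{.machineName}{"\n"}{end}'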

Inspect the Current Disk Layout

First, identify the management-cluster objects and the DCS VM that backs each node:

kubectl get kubeadmcontrolplane -n cpaas-system
kubectl get machinedeployment -n cpaas-system
kubectl get machine -n cpaas-system
kubectl get dcsmachine -n cpaas-system
kubectl get dcsiphostnamepool -n cpaas-system

For any DCSMachine you plan to migrate, inspect the current VM details and record the disk sequenceNum, size, datastore, and PCI type for each disk you want to preserve.

You can gather that information from:

  • The DCS platform UI.
  • Your existing operational tooling that wraps QueryVmInfo.
  • Direct API inspection if your environment already exposes that workflow.

You need the following values for each preserved disk:

  • Old sequenceNum
  • quantityGB
  • datastoreName or datastoreClusterName
  • path
  • format
  • pciType
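
To keep these values straight across several nodes, a small scratch file per node works well, using the same field names you will later put into persistentDisk. A minimal sketch with illustrative values; oldSequenceNum is only a note for the slot calculation, not a spec field:

# preserved-disks-<hostname>.yaml (scratch notes only, never applied to the cluster)
- oldSequenceNum: 5               # from the DCS UI or QueryVmInfo output
  quantityGB: 40
  datastoreClusterName: <datastore-cluster-name>
  path: /var/cpaas
  format: xfs
  pciType: VIRTIO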

Determine Which Disks Are Claimable

Existing clusters can only claim disks that sit in the tail-contiguous region of the old VM disk layout.

Use the following formula:

slot = oldSequenceNum - systemDiskCount - newTemplateDataDiskCount - 1

Use these constants when you apply the formula:

  • systemDiskCount = 1
  • newTemplateDataDiskCount = the number of non-system disks that remain in the new DCSMachineTemplate

The computed slot must:

  • Be greater than or equal to 0
  • Be unique within the same IP entry

If a disk is not in the tail-contiguous region, you must either:

  • Move the disks between it and the old template tail into the pool-managed persistent-disk list as well, or
  • Accept that the non-claimable disk will still be lost with the old VM
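
Before editing any YAML, you can sanity-check the arithmetic in the shell. A minimal sketch using the numbers from the worked example that follows:

# slot = oldSequenceNum - systemDiskCount - newTemplateDataDiskCount - 1
OLD_SEQUENCE_NUM=5                # sequenceNum of the disk on the old VM
SYSTEM_DISK_COUNT=1
NEW_TEMPLATE_DATA_DISK_COUNT=2    # non-system disks kept in the new template
echo $(( OLD_SEQUENCE_NUM - SYSTEM_DISK_COUNT - NEW_TEMPLATE_DATA_DISK_COUNT - 1 ))
# prints 1; a negative result means the disk is not claimable as-is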

Worked Example

Assume the old template disk order is:

| Old Sequence | Old Disk |
| --- | --- |
| 1 | system disk |
| 2 | /var/lib/kubelet |
| 3 | /var/lib/etcd |
| 4 | /var/lib/containerd |
| 5 | /var/cpaas |

If the new template keeps only system + /var/lib/kubelet + /var/lib/containerd, then newTemplateDataDiskCount = 2.

| Disk You Want to Preserve | Old sequenceNum | New Template Data Disk Count | Computed slot | Claimable |
| --- | --- | --- | --- | --- |
| /var/cpaas | 5 | 2 | 5 - 1 - 2 - 1 = 1 | Yes |
| /var/lib/containerd and /var/cpaas | 4, 5 | 1 | 4 - 1 - 1 - 1 = 1, 5 - 1 - 1 - 1 = 2 | Yes |
| /var/lib/etcd only | 3 | 2 | 3 - 1 - 2 - 1 = -1 | No |

Update the DCSMachineTemplate

Edit the currently referenced DCSMachineTemplate in place so it no longer declares the disks you want to preserve.

  1. Export the current template:

    kubectl get dcsmachinetemplate <template-name> -n cpaas-system -o yaml > current-template.yaml
  2. Update the exported manifest:

    • Keep the system disk.
    • Keep only the template-local disks that should still be recreated with the VM.
    • Remove all disks you want to preserve through the IP pool.
    • If a target disk is only claimable when trailing disks are moved as well, remove those trailing disks from the template too.
    • Keep the original metadata.name, because this migration updates the currently referenced template in place.
    • Remove transient metadata fields such as resourceVersion, uid, creationTimestamp, and managedFields (see the optional sketch after these steps).
  3. Apply the updated template:

    kubectl apply -f current-template.yaml -n cpaas-system
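
If you prefer not to strip the transient metadata fields from step 2 by hand, a single yq command can do it. This is an optional sketch and assumes yq v4 is installed; editing the file manually works just as well:

yq -i 'del(.metadata.resourceVersion) | del(.metadata.uid) |
  del(.metadata.creationTimestamp) | del(.metadata.managedFields)' current-template.yaml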

Update the DCSIpHostnamePool

Add persistentDisk entries to the matching IP slot for every preserved disk.

The spec interacts with the live disk attributes in three ways:

Strict claim match. A mismatch on any of these fields fails the claim and sets phase=Error with lastError. The controller retries on a slow loop until the spec is corrected:

  • quantityGB — must match the live disk size exactly
  • datastoreName or datastoreClusterName — must point to the same storage target as the live disk
  • pciType — must match the live disk PCI type. If omitted, the provider uses the default VIRTIO; verify the live disk PCI type before omitting this field, because a non-VIRTIO live disk can fail the strict claim match

Filesystem (affects guest-side initialization, not the claim check):

  • format — used only when initializing a fresh disk. If the live disk already has a filesystem, the existing format is preserved and mkfs is skipped.

Guest-side (applied on replacement VMs only, not part of the claim check):

  • path — mount path inside the guest
  • mountOptions — mount options
  • options — mkfs options applied only on the first format

For the platform-required /var/cpaas disk, move it into the pool-managed layout as part of this migration.

Set slot to the value you calculated in the previous section. Do not reuse a fixed example value across different disk layouts.

Example:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DCSIpHostnamePool
metadata:
  name: <iphostname-pool-name>
  namespace: cpaas-system
spec:
  pool:
  - ip: "<node-ip>"
    mask: "<mask>"
    gateway: "<gateway>"
    dns: "<dns>"
    hostname: "<hostname>"
    machineName: "<machine-name>"
    persistentDisk:
    - slot: <calculated-slot>
      quantityGB: 40
      datastoreClusterName: <datastore-cluster-name>
      path: /var/cpaas
      format: xfs
      pciType: VIRTIO

Apply the pool update:

kubectl apply -f <updated-pool-file>.yaml -n cpaas-system
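
Before triggering any replacement, confirm the persistentDisk entries landed on the IP slots you expect. A minimal sketch, using only fields from the example above, that prints the list for each pool entry:

kubectl get dcsiphostnamepool <iphostname-pool-name> -n cpaas-system \
  -o jsonpath='{range .spec.pool[*]}{.ip}{": "}{.persistentDisk}{"\n"}{end}'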

Trigger the Rolling Upgrade

Before you trigger replacement:

  • Confirm KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge = 0
  • Confirm each MachineDeployment.spec.strategy.rollingUpdate.maxSurge = 0

These settings are prerequisites for the migration and for later upgrade-time reuse of pool-managed persistent disks.
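
You can verify both values against the field paths named above; a minimal sketch:

kubectl get kubeadmcontrolplane <kcp-name> -n cpaas-system \
  -o jsonpath='{.spec.rolloutStrategy.rollingUpdate.maxSurge}{"\n"}'
kubectl get machinedeployment <md-name> -n cpaas-system \
  -o jsonpath='{.spec.strategy.rollingUpdate.maxSurge}{"\n"}'
# both commands should print 0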

Then trigger the rollout:

kubectl patch kubeadmcontrolplane <kcp-name> -n cpaas-system \
  --type='merge' \
  -p='{"spec": {"rolloutAfter": "'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}}'
kubectl patch machinedeployment <md-name> -n cpaas-system \
  --type='merge' \
  -p='{"spec": {"rolloutAfter": "'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}}'

Verify Claim, Detach, Conversion, and Reattach

Watch the management-cluster resources during the rollout:

kubectl get kubeadmcontrolplane <kcp-name> -n cpaas-system -w
kubectl get machinedeployment <md-name> -n cpaas-system -w
kubectl get machine -n cpaas-system -w

Inspect the pool status to confirm the controller has claimed and tracked the disks:

kubectl get dcsiphostnamepool <iphostname-pool-name> -n cpaas-system -o yaml

During the transition, each preserved disk is tracked as a record under status.persistentDiskStatus. The stable phases to watch for are:

  • phase: Attached while the old VM still owns the disk
  • phase: Available after the disk is detached (and converted from a stock volume to an independent shared volume when needed)
  • phase: Attached again after the replacement VM reattaches the disk

Transient phases (Attaching, Detaching) may briefly appear during the corresponding operations; Deleting appears when a disk is being permanently removed, for example during pool or cluster cleanup. The full phase set is Creating, Available, Attaching, Attached, Detaching, Deleting, Error.

If a disk enters phase: Error, inspect lastError before retrying.
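
To follow the phase transitions without scrolling through the full YAML, you can narrow the output to the tracked disk records. This is a minimal sketch; the exact shape of the records under status.persistentDiskStatus can vary by provider version, so adjust the path if needed:

kubectl get dcsiphostnamepool <iphostname-pool-name> -n cpaas-system \
  -o jsonpath='{.status.persistentDiskStatus}{"\n"}'
# look for the phase field on each record and, on Error, the lastError message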

Limitations and Recovery Notes

  • Only tail-contiguous disks are claimable in the existing-cluster migration path.
  • The controller only protects disks that are declared in persistentDisk. Any undeclared disk still follows VM lifecycle and may be deleted with the old VM.
  • This migration changes the ownership model of preserved disks. Do not keep the same disk defined in both DCSMachineTemplate and DCSIpHostnamePool.
  • If you need to preserve /var/cpaas, move it into the IP pool as part of this migration instead of leaving it in the template.
  • This runbook applies to clusters on ACP v4.2.1 or later that are moving to DCS provider v1.0.16 or later.