DO380 - ch02s05

Backup and Restore with OADP

Objectives

Configure one-time and scheduled backups with OADP and restore from them.

OADP Custom Resources

Backup and restore are initiated by creating the corresponding Kubernetes resources in the openshift-adp namespace. Creating those resources requires administrative access to the openshift-adp namespace or the cluster-admin role.

OADP provides the following custom resources for backup and restore:

Backup

The backup resource initiates a single backup attempt. This resource defines the namespaces and resources to include in the backup, and can also include a list of commands to run before or after the backup.

The backup resource definition and the backup information, such as backup logs and the list of included backup resources, are stored in the object storage with the backup.

OADP synchronizes backup definitions between the object storage and the OpenShift cluster to enable restoring backups to a different cluster with the same backup storage location.

If a backup resource exists in the OpenShift cluster but is deleted from the object storage, then OADP deletes the Kubernetes resource. Conversely, if a backup exists in the object storage, but not in OpenShift, then OADP creates the matching backup resource in the cluster.

Note

Only backups with a Completed state are synchronized. The object storage synchronization does not automatically create or remove backup resources with a Failed or PartiallyFailed state.

Restore

The restore resource starts restoring an existing backup resource. The restore result, the list of restored resources, and the restore logs are stored in the object storage.

Schedule

The schedule resource starts a backup on a given schedule that is written in Cron format. A schedule resource is similar to a cron job resource. A schedule defines a backup template to create a backup resource at a recurring interval.

Backing up and Restoring an OpenShift Application

OADP backs up OpenShift applications by using the following process:

	The administrator creates a backup resource in the `openshift-adp` namespace, which triggers the backup process.
	Velero exports all Kubernetes resources from the application namespace to the backup storage location.
	The Velero CSI plug-in creates a CSI snapshot of the application volume.
	The Velero VSM plug-in clones the `volumeSnapshotContent` and `volumeSnapshot` resources to the `openshift-adp` namespace.
	The Velero VSM plug-in creates a PVC from the volume snapshot.
	The OADP Data Mover transfers the volume data to the backup storage location.
	After the backup completes, the OADP Data Mover deletes all volume snapshots and the PVCs that were created during the backup process.

For restoring backups, OADP uses the following process:

	The administrator creates a restore resource in the `openshift-adp` namespace, which triggers the restore process.
	Velero imports the Kubernetes resources from the backup storage location to the application namespace.
	The OADP Data Mover creates a temporary PVC, and transfers the exported data from the backup storage location to the new volume.
	The OADP Data Mover creates a volume snapshot of the PVC.
	The Velero VSM plug-in clones the `volumeSnapshotContent` and `volumeSnapshot` resources to the application namespace.
	The Velero CSI plug-in creates the application volume from the volume snapshot.
	After the restore completes, the OADP Data Mover deletes the volume snapshot and the PVC that were created in the `openshift-adp` namespace during the restore process.

Note

The volumeSnapshotContent and volumeSnapshot resources that OADP Data Mover creates in the application namespace are not automatically deleted, and must be manually deleted after the restore.

Backing up and Restoring a Stateless Application

You can start backing up a namespace by using the following backup definition:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: my-app-backup 
  namespace: openshift-adp
spec:
  includedNamespaces: 
  - my-app-project
  ttl: 720h0m0s  
  labelSelector:
    matchLabels: 
      app: my-app
  includedResources: 
  - deployments
  - configmaps
  - secrets
  - services
  - routes

	`metadata.name`: Name of the backup
	`includedNamespaces`: List of namespaces to back up
	`ttl`: Amount of time before the automatic deletion of the backup. If the `ttl` value is not specified, then the default is 30 days.
	`labelSelector.matchLabels`: Labels that are required for the resources to be included in the backup
	`includedResources`: List of resource types to back up

The following example shows a stateless application in the website namespace, followed by the definition that is needed to back it up. The application is a static website that is built with Hugo and that uses a Source-to-Image (S2I) container image. This application uses the following resources, which are created with the app=hugo label:

An image stream
A build configuration
A deployment
A service
A route

You can back up this application with the following backup definition:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: website
  namespace: openshift-adp
spec:
  includedNamespaces:
  - website
  labelSelector:
    matchLabels:
      app: hugo
  includedResources:
  - imagestreams
  - buildconfigs
  - deployments
  - services
  - routes

You can review the configuration and the backup status with the oc describe command:

[user@host ~]$ oc -n openshift-adp describe backup/website
Name:         website
Namespace:    openshift-adp
...output omitted...
Spec:
  Csi Snapshot Timeout:          10m0s
  Default Volumes To Fs Backup:  false
  Included Namespaces:
    website
  Included Resources:
    imagestreams
    buildconfigs
    deployments
    services
    routes
  Item Operation Timeout:  1h0m0s
  Label Selector:
    Match Labels:
      App:           hugo
  Storage Location:  oadp-config-1
  Ttl:               720h0m0s
Status:
  Completion Timestamp:  2023-12-11T14:01:42Z 
  Expiration:            2024-01-10T14:01:27Z 
  Format Version:        1.1.0
  Phase:                 Completed 
  Start Timestamp:       2023-12-11T14:01:27Z 
  Version:               1

	`Completion Timestamp`: Backup completion time.
	`Expiration`: Backup expiration date, which is set with the `ttl` setting. The backup is automatically removed from both the OpenShift cluster and the object storage after this date.
	`Phase`: Status of the backup.
	`Start Timestamp`: Backup start time.

The backup resource can have the following statuses during its lifetime:

New

Initial status when a backup is created. If the backup definition is incorrect, then the backup is aborted and its status changes to FailedValidation. You can review the validationErrors status field for more information about the error.

If the backup definition is valid, then the status changes to InProgress.

InProgress

Status when the backup is in progress. During this phase, OADP backs up the resources that are specified in the backup definition and runs the backup hooks.

When OADP uses additional plug-ins, such as OADP Data Mover, the backup enters the WaitingForPluginOperations status. After the plug-in processes are complete, the backup enters the Finalizing status where OADP saves all the remaining backup items, such as backup logs and metadata, to the object storage.

If backing up some resources fails, then the status changes in turn to WaitingForPluginOperationsPartiallyFailed, FinalizingPartiallyFailed, and then PartiallyFailed.

PartiallyFailed and Failed

The final status of a backup, with some missing resources because of a backup failure. A partially failed backup can still restore a project, but incompletely. A backup with the Failed status cannot be restored.

You can review the failureReason status field for more information about the error.

Completed

Final status of a successfully completed backup. All the data is in the object storage and the backup is ready to use in a restore.
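
The statuses above can drive a simple wait loop in a script. The following is a minimal sketch, not an OADP feature: the wait_for_backup function name and its polling interval argument are hypothetical, and the function assumes that the oc command is logged in with access to the openshift-adp namespace. It reads the status.phase field of the backup resource and stops at one of the final statuses that this section describes.

```shell
# Hypothetical helper: poll a backup until it reaches a final status.
# Usage: wait_for_backup <backup-name> [poll-interval-seconds]
wait_for_backup() {
  wfb_name=$1
  wfb_interval=${2:-10}
  while true; do
    # Read the current phase from the status of the backup resource.
    wfb_phase=$(oc -n openshift-adp get backup "$wfb_name" \
      -o jsonpath='{.status.phase}') || return 2
    case "$wfb_phase" in
      Completed)
        echo "backup $wfb_name: $wfb_phase"
        return 0 ;;
      PartiallyFailed|Failed|FailedValidation)
        echo "backup $wfb_name: $wfb_phase" >&2
        return 1 ;;
      *)
        # Any other phase (New, InProgress, and so on): keep waiting.
        sleep "$wfb_interval" ;;
    esac
  done
}
```

For example, `wait_for_backup website 5` checks the website backup every five seconds and returns a nonzero exit code if the backup ends in a failed state.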

You can initiate a restore by using the following restore definition:

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore-name> 
  namespace: openshift-adp
spec:
  backupName: <backup-name>

	`metadata.name`: Name of the restore resource
	`spec.backupName`: Name of the backup resource to restore

OADP restores only the resources that are not already in the destination namespace. The destination namespace is the same as in the backup, unless you use the namespaceMapping field to specify a different namespace.

For example, you can use the following restore definition to restore the website project from the previous example to a new website-stage project:

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: website-stage
  namespace: openshift-adp
spec:
  backupName: website
  namespaceMapping:
    website: website-stage

After creating the restore resource, you can review the configuration and monitor the progress of the restore with the oc describe command:

[user@host ~]$ oc -n openshift-adp describe restore/website-stage
Name:         website-stage
Namespace:    openshift-adp
...output omitted...
Spec:
  Backup Name:  website
  Excluded Resources: 
    nodes
    events
    events.events.k8s.io
    backups.velero.io
    restores.velero.io
    resticrepositories.velero.io
    csinodes.storage.k8s.io
    volumeattachments.storage.k8s.io
    backuprepositories.velero.io
  Item Operation Timeout:  1h0m0s
  Namespace Mapping:
    Website:  website-stage
Status:
  Completion Timestamp:  2023-12-12T09:36:23Z 
  Phase:                 Completed 
  Progress: 
    Items Restored:  7
    Total Items:     7
  Start Timestamp:   2023-12-12T09:36:14Z

	`Excluded Resources`: OADP automatically excludes some resources from the restore, such as Kubernetes events and OADP custom resources. You can modify this list with the `excludedResources` field in the restore definition.
	`Completion Timestamp`: Restore completion time.
	`Phase`: Status of the restore.
	`Progress`: The restore progress, with the number of restored items so far and the total items to restore.
	`Start Timestamp`: Restore start time.

The possible statuses of the restore are the same as for the backup resource. A successful restore goes through the following statuses during its lifetime:

New
InProgress
WaitingForPluginOperations
Completed

During the restore, OADP adds two labels to the imported resources: a velero.io/restore-name label with the restore name, and a velero.io/backup-name label with the name of the backup that the resource came from. You can use those labels to identify the resources that were restored by a specific restore or from a specific backup.

For example, you can list the resources with the velero.io/restore-name=website-stage label that are created by the restore from the previous example:

[user@host ~]$ oc -n website-stage get all \
  -l velero.io/restore-name=website-stage
NAME           TYPE        CLUSTER-IP      EXTERNAL-IP  PORT(S)             AGE
service/hugo   ClusterIP   172.30.61.176   <none>       8080/TCP,8443/TCP   65m

NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/hugo   1/1     1            1           65m

NAME                                  TYPE     FROM   LATEST
buildconfig.build.openshift.io/hugo   Source   Git    0

NAME                            HOST/PORT                                  ...
route.route.openshift.io/hugo   hugo-website-stage.apps.ocp4.example.com   ...

OADP stores detailed information about each backup and restore attempt in the object storage, such as logs, the list of resources that are backed up or restored, and a summary of the errors or warnings that occur during the backup and restore process. This information is not stored in the OADP Kubernetes custom resources, which hold only limited information such as the runtime status of the backup and restore.

The following excerpt shows the object storage layout after the backup and restore from the previous examples:

s3
├── docker 
│   └── registry
│       └── v2
│           ├── blobs
│           └── repositories
│               └── website
│                   └── hugo
└── oadp
    ├── backups 
    │   └── website
    │       ├── website-logs.gz
    │       ├── website-results.gz
    │       └── website.tar.gz
    └── restores 
        └── website-stage
            ├── restore-website-stage-logs.gz
            └── restore-website-stage-results.gz

	The `/docker/registry/v2` path contains a Docker registry with the container images that are included in all backups.
	The `/oadp/backups` path contains all the information about each backup, including the backed-up Kubernetes resources and the backup logs. Each subdirectory is unique to a single backup attempt and relates to the matching `backup` resource in the cluster.
	The `/oadp/restores` path contains all the information about each restore, including the restore logs. Each subdirectory is unique to a single restore attempt and relates to the matching `restore` resource in the cluster.

Introducing the Velero Tool

OADP provides the velero command-line tool that can retrieve backup and restore information from both the object storage and the OpenShift cluster.

The Velero CLI tool is available from the velero deployment in the openshift-adp namespace. You can define an alias to access the velero binary by using the following command:

[user@host ~]$ alias velero=\
  'oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'

Note

The remainder of this section uses the velero alias on the command line to refer to the Velero CLI tool.

The velero command can use the same syntax as the kubectl or oc commands, but is limited to OADP Kubernetes custom resources such as the backup, restore, and schedule resources.

The oc get command displays limited information about OADP resources. However, the velero get command provides the same runtime information from the Kubernetes resources as the oc describe command:

[user@host ~]$ oc -n openshift-adp get backup,restore
NAME                             AGE
backup.velero.io/website         24h

NAME                              AGE
restore.velero.io/website-stage   4h27m


[user@host ~]$ velero get backup
NAME     STATUS     ERRORS WARNINGS CREATED  EXPIRES STORAGE LOCATION  SELECTOR
website  Completed  0      0        ...      29d     oadp-config-1     app=hugo

[user@host ~]$ velero get restore
NAME           BACKUP  STATUS     STARTED  COMPLETED  ERRORS  WARNINGS  ...
website-stage  website Completed  ...      ...        0       0         ...

The Velero tool provides a describe command that is similar to the oc describe command, and adds a --details option that retrieves additional information about the resource from the object storage:

[user@host ~]$ velero describe backup website --details
Name:         website
Namespace:    openshift-adp
...output omitted...

Phase:  Completed 

Namespaces: 
  Included:  website
  Excluded:  <none>

Resources:
  Included:        imagestreams, buildconfigs, deployments, services, routes
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  app=hugo

...output omitted...

Started:    2023-12-11 14:01:27 +0000 UTC
Completed:  2023-12-11 14:01:42 +0000 UTC

Expiration:  2024-01-10 14:01:27 +0000 UTC

Resource List: 
  apps/v1/Deployment:
    - website/hugo
  build.openshift.io/v1/BuildConfig:
    - website/hugo
  image.openshift.io/v1/ImageStream:
    - website/hugo
    - website/nginx-122
  route.openshift.io/v1/Route:
    - website/hugo
  v1/Service:
    - website/hugo

Velero-Native Snapshots: <none included>

	`Phase`: Status of the backup.
	`Namespaces` and `Resources`: Definition of the namespaces and resources to back up.
	`Resource List`: Resources that are included in the backup. The `--details` option adds this information, which comes from the object storage.

If the backup or restore resource has errors or warnings, then the --details option adds them to the command output.

In the following example, the website project is backed up without filtering by resource type. OADP backs up all resources with the app=hugo label in the website namespace.

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: website-label
  namespace: openshift-adp
spec:
  includedNamespaces:
  - website
  labelSelector:
    matchLabels:
      app: hugo

You can use the velero create restore command to restore the previous backup in a new website-dev namespace, as follows:

[user@host ~]$ velero create restore website-dev \
  --from-backup=website-label --namespace-mappings=website:website-dev
Restore request "website-dev" submitted successfully.
Run `velero restore describe website-dev` or `velero restore logs website-dev` for more details.

The restore completes with one warning about an existing resource in the target namespace:

[user@host ~]$ velero describe restore website-dev --details
Name:         website-dev
Namespace:    openshift-adp
...output omitted...

Phase:                       Completed
Total items to be restored:  11
Items restored:              11

Started:    2023-12-12 10:21:21 +0000 UTC
Completed:  2023-12-12 10:21:27 +0000 UTC

Warnings:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    website-dev:  could not restore, Endpoints "hugo" already exists. Warning: the in-cluster version is different than the backed-up version.

Backup:  website-label

...output omitted...

Resource List:
  apps/v1/Deployment:
    - website-dev/hugo(created)
  build.openshift.io/v1/Build:
    - website-dev/hugo-1(skipped)
  build.openshift.io/v1/BuildConfig:
    - website-dev/hugo(created)
  discovery.k8s.io/v1/EndpointSlice:
    - website-dev/hugo-5hq2x(created)
  image.openshift.io/v1/ImageStream:
    - website-dev/hugo(skipped)
  image.openshift.io/v1/ImageStreamTag:
    - website-dev/hugo:latest(skipped)
  image.openshift.io/v1/ImageTag:
    - website-dev/hugo:latest(skipped)
  route.openshift.io/v1/Route:
    - website-dev/hugo(created)
  v1/Endpoints:
    - website-dev/hugo(failed)
  v1/Namespace:
    - website-dev(created)
  v1/Service:
    - website-dev/hugo(created)

In this example, OADP tries unsuccessfully to restore the hugo endpoint. The hugo service in the backup automatically creates this resource.

Avoid including resources that other resources manage, such as builds, endpoints, or replica sets. It is unnecessary and can cause issues during the restore. You must filter to include in the backup only those resources that your application requires for a successful deployment.
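
One possible way to apply this advice without listing every required resource type is to keep the label selector and exclude the managed resource types instead. The following sketch assumes the excludedResources field of the backup specification. The backup_without_managed function name, the website-filtered backup name, and the choice of excluded types are illustrative assumptions, so adjust the list to match your application.

```shell
# Hypothetical helper: print a Backup manifest that selects resources by
# label but leaves out resource types that other resources manage.
# Usage: backup_without_managed <backup-name> <namespace> <app-label>
backup_without_managed() {
  cat <<EOF
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: $1
  namespace: openshift-adp
spec:
  includedNamespaces:
  - $2
  labelSelector:
    matchLabels:
      app: $3
  excludedResources:
  - builds
  - endpoints
  - endpointslices
  - replicasets
EOF
}

# Print the manifest; pipe it to "oc apply -f -" to create the backup.
backup_without_managed website-filtered website hugo
```

Printing the manifest first lets you review it before you create the resource in the cluster.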

Backing up a Stateful Application with Backup Hooks

OADP creates crash-consistent backups of your application by using volume snapshots. To back up persistent volumes, you must include the persistentvolumeclaims and persistentvolumes resource types in the backup definition.

OpenShift assigns to each namespace a unique UID and GID that the application pod uses to write data to persistent volumes. If you restore an application to a new namespace, then OpenShift assigns a new set of UIDs and GIDs that prevent the application from accessing its data.

For more details about user and group ID assignments, refer to the DO180: Red Hat OpenShift Administration I: Operating a Production Cluster training course.

So that OADP can restore the UID and GID, include the namespace resource type in the backup definition. If you use a label selector in the backup definition, then you must add the corresponding label to the namespace.

Alternatively, you can use the kubernetes.io/metadata.name label that Kubernetes automatically sets on all namespaces.
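
To compare the ID ranges of the original and the restored namespace, you can read the annotations where OpenShift records them. The following is a minimal sketch: the show_ns_ids function name is hypothetical, and it assumes that the ranges are stored in the openshift.io/sa.scc.uid-range and openshift.io/sa.scc.supplemental-groups namespace annotations.

```shell
# Hypothetical helper: print the UID range and the supplemental group
# range that OpenShift assigned to a namespace, one per line.
# Usage: show_ns_ids <namespace>
show_ns_ids() {
  oc get namespace "$1" -o \
    jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}{"\n"}{.metadata.annotations.openshift\.io/sa\.scc\.supplemental-groups}{"\n"}'
}
```

If the values from `show_ns_ids mongodb` differ before the backup and after the restore, then the namespace resource was probably not restored from the backup.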

To improve the consistency of a backup, you can use backup hooks to specify a list of commands to execute in the application pod before and after the backup is created. You can then use those hooks to quiesce the application and perform an application-consistent backup.

Some applications require additional steps when using volume snapshots to create a usable backup. For more details about the backup procedure, refer to the application documentation.

To configure backup hooks, you must specify the target pod (by using labels), the container name, and the commands to run on that container.

You can configure the following hook types according to your needs:

Pre backup hooks

A pre backup hook is executed before any other backup action on the pod. If the command fails, then the backup stops immediately with the Failed status.

As an example, you can use this type of hook to quiesce and prepare the application for backup.

Post backup hooks

A post backup hook is executed after the backup of the pod and its attached volumes. If the command fails, then the backup stops immediately with the PartiallyFailed status.

As an example, you can use this type of hook to resume or unlock the application after the backup is complete.

Init restore hooks

An init restore hook is executed after the pod and its attached volumes are restored, but before any container on that pod starts. The init restore hook defines one or more init containers that follow the same specification as the init container in a pod definition.

OADP does not monitor the status of the init container. Therefore, if the command fails, then the restore continues without any error or warning, but the application pod is in error with the Init:Error status.

As an example, you can use this type of hook to restore a database that the application requires and that is external to the OpenShift cluster.

Post restore hooks

A post restore hook is executed when the application pod is restored and running. If the command fails, then the error is logged and the restore continues.

As an example, you can use this type of hook to run an integrity check on the restored database.

The following example is a backup definition for a MongoDB database. It uses backup hooks to flush all pending writes to the disk and to lock the database to prevent any writes during the backup.

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: mongodb
  namespace: openshift-adp
spec:
  includedNamespaces:
  - mongodb
  orLabelSelectors: 
  - matchLabels:
      app: mongodb
  - matchLabels:
      kubernetes.io/metadata.name: mongodb
  includedResources: 
  - deployments
  - services
  - secret
  - pvc
  - pv
  - pods
  - namespace
  hooks:
    resources:
    - name: mongodb-lock
      labelSelector: 
        matchLabels:
          app: mongodb
      pre: 
      - exec:
          container: mongodb
          command:
          - /usr/bin/mongosh
          - --eval
          - db.fsyncLock();
      post: 
      - exec:
          container: mongodb
          command:
          - /usr/bin/mongosh
          - --eval
          - db.fsyncUnlock();

	Resources with the `app: mongodb` or `kubernetes.io/metadata.name: mongodb` label are included in the backup.
	The `pvc` and `pv` resource types must be specified in the `includedResources` key to back up the application volume. The `pods` resource type must also be specified for the backup hooks to be executed. The `namespace` resource type must be specified to preserve the UID and GID that are used in the application volume.
	The hook runs on all pods that match the label selector.
	The `pre` backup hook executes the `db.fsyncLock()` MongoDB command in the `mongodb` container to lock the database before the volume snapshot.
	The `post` backup hook executes the `db.fsyncUnlock()` MongoDB command to unlock the database after the backup is completed.

Important

Because backup and restore hooks are executed only on pods that are included in the backup, you must include the pod resource type in the backup.

If the hooks are configured to run on a pod that is not included in the backup, then the hooks are ignored without any error or warning.
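
Because skipped hooks produce no error, it can help to confirm that the hook label selector matches at least one pod before you create the backup. The following sketch uses a hypothetical check_hook_targets function. It verifies only the label selector, so you must still include the pod resource type in the backup definition.

```shell
# Hypothetical helper: fail when no pod matches a hook label selector.
# Usage: check_hook_targets <namespace> <label-selector>
check_hook_targets() {
  # List the matching pods by name; stop if the oc command itself fails.
  cht_pods=$(oc -n "$1" get pods -l "$2" -o name) || return 2
  if [ -z "$cht_pods" ]; then
    echo "no pod in $1 matches $2: hooks would be skipped" >&2
    return 1
  fi
  echo "$cht_pods"
}
```

For the MongoDB example, `check_hook_targets mongodb app=mongodb` lists the pods that the mongodb-lock hook would run on.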

The following restore resource definition restores the MongoDB backup from the previous example:

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: mongodb
  namespace: openshift-adp
spec:
  backupName: mongodb
  hooks:
    resources:
    - name: mongodb-unlock
      labelSelector:
        matchLabels:
          app: mongodb
      postHooks:
      - init: 
          initContainers:
          - name: remove-lock
            image: mongodb/mongodb-community-server:latest
            volumeMounts:
            - name: mongodb-data
              mountPath: /data/db
            command:
            - /usr/bin/rm
            - /data/db/mongod.lock

The init restore hook removes the database lock from the backup before the database starts.

OADP Data Mover stores the volume snapshots in the object storage in the /openshift-adp/<backup_name> path.

The following excerpt shows the object storage layout after the backup and restore from the previous examples:

s3
├── oadp
│   ├── backups
│   │   └── mongodb
│   │       ├── mongodb-logs.gz
│   │       └── ...
│   └── restores
│       └── mongodb
│           ├── restore-mongodb-logs.gz
│           └── ...
└── openshift-adp 
    └── mongodb
        └── snapcontent-1234-6789-pvc
            └── ...

The /openshift-adp path contains the data from volume backups with OADP Data Mover. Each subdirectory is unique to a single backup attempt and relates to the matching backup resource in the cluster.

When OADP restores a volume, OADP Data Mover imports the volume snapshot to the application namespace and then creates the application PVC from it. After the restore is complete, you can remove this volume snapshot, which is no longer needed, to free up storage space on the storage back end.

Important

Too many volume snapshots can strain your storage capacity. It is important to purge unused snapshots routinely.

You can identify the volume snapshots and snapshot contents that belong to a restore with the velero.io/restore-name label. For example, to list the volume snapshots that are created with the mongodb restore from the previous example, use the following command:

[user@host ~]$ oc get VolumeSnapshotContent,VolumeSnapshot -A \
  -l velero.io/restore-name=mongodb
NAME
volumesnapshotcontent.snapshot.storage.k8s.io/velero-velero-mongodb-...

NAMESPACE   NAME
mongodb     volumesnapshot.snapshot.storage.k8s.io/velero-mongodb-...

You can then delete those snapshot resources with the oc delete command, as follows:

[user@host ~]$ oc delete -l velero.io/restore-name=mongodb \
  VolumeSnapshotContent,VolumeSnapshot -A
volumesnapshotcontent.snapshot.storage.k8s.io "velero-velero-mongodb-..." deleted
volumesnapshot.snapshot.storage.k8s.io "velero-mongodb-..." deleted

Scheduling a Recurring Backup

You can back up an application at a recurring interval by using the schedule resource. A schedule resource is similar to a cron job resource. A schedule resource requires a schedule in Cron format and a template of the backup resource to create at the specified time.

The following example is a schedule definition of a daily backup for the previous static website example:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: website-daily
  namespace: openshift-adp
  labels: 
    app: hugo
spec:
  schedule: 0 7 * * * 
  paused: false 
  template: 
    includedNamespaces:
    - website
    labelSelector:
      matchLabels:
        app: hugo
    includedResources:
    - imagestreams
    - buildconfigs
    - deployments
    - services
    - routes
    ttl: 720h0m0s

	`metadata.labels`: The labels that you set on the schedule resource are automatically copied to the backup resources that the schedule creates.
	`schedule`: Specifies the schedule for the job in Cron format.
	`paused`: You can disable the schedule by setting the `paused` parameter to `true`.
	`template`: Sets the backup definition template by using the same settings as in a backup resource.

See the references section for more information about the schedule API definition.

To get detailed information about a schedule, use the velero get schedule command:

[user@host ~]$ velero get schedule
NAME          STATUS  CREATED SCHEDULE  BACKUP TTL LAST BACKUP  SELECTOR PAUSED
website-daily Enabled ...     0 7 * * * 720h0m0s   ...          app=hugo false

The status of an activated schedule is Enabled. If the schedule is disabled with the paused parameter, then its status is New. The LAST BACKUP column shows the time of the latest backup that the schedule created.
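
You can change the paused parameter on an existing schedule without editing the full definition. The following is a minimal sketch that uses the oc patch command with a merge patch. The set_schedule_paused function name is hypothetical, and the function assumes that the schedule is in the openshift-adp namespace.

```shell
# Hypothetical helper: enable or disable an existing schedule.
# Usage: set_schedule_paused <schedule-name> <true|false>
set_schedule_paused() {
  # Accept only the two boolean literals, to keep the JSON patch valid.
  case "$2" in
    true|false) ;;
    *) echo "second argument must be true or false" >&2; return 2 ;;
  esac
  oc -n openshift-adp patch schedules.velero.io "$1" --type merge \
    -p "{\"spec\":{\"paused\":$2}}"
}
```

For example, `set_schedule_paused website-daily true` disables the daily backup, and the same command with `false` enables it again.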

With the Velero tool, you can create a one-time backup by using the same definition as in an existing schedule. To back up an application on demand, you can create a disabled schedule as a backup template. You can then start a new backup with this schedule when you need it.

In this example, you are creating a pre-upgrade-1.1 backup from the website-daily schedule:

[user@host ~]$ velero create backup pre-upgrade-1.1 \
  --from-schedule website-daily
Creating backup from schedule, all other filters are ignored.
Backup request "pre-upgrade-1.1" submitted successfully.
...output omitted...

Backups that are created from a schedule inherit the schedule's labels. In addition, the velero.io/schedule-name label is set on the backup resources with the schedule name. You can use those labels to identify the schedules and backups for your application in the openshift-adp namespace:

[user@host ~]$ velero get backup -l app=hugo
NAME                          STATUS      ERRORS WARNINGS ...
post-upgrade-1.1              Finalizing  0      0        ...
pre-upgrade-1.1               Completed   0      0        ...
website-daily-20231215100856  Completed   0      0        ...
website-daily-20231215091728  Completed   0      0        ...

Because the openshift-adp namespace contains the backup and restore resources for all applications that are running on the cluster, it is important to use labels to identify OADP resources that relate to your application.

Cleaning Backups

Because OADP synchronizes backup definitions with the object storage, OADP automatically re-creates any backup that you delete with the oc command. To permanently delete a backup, you must delete it from the object storage.

Use the velero command to delete backup and restore information from the object storage, and all the associated resources from the cluster:

[user@host ~]$ velero delete backup backup-name
Are you sure you want to continue (Y/N)? y
Request to delete backup "backup-name" submitted successfully.
The backup will be fully deleted after all associated data (disk snapshots, backup files, restores) are removed.

The backup status changes to Deleting, and OADP removes all restore resources that are attached to this backup from both the OpenShift cluster and the object storage. Then the backup itself is removed from the cluster and the object storage.

You can instruct OADP to delete a backup automatically after a specified elapsed time by using the TTL (Time To Live) setting on the backup and schedule definition. By default, OADP deletes backups after 30 days. The minimum lifetime of a backup is 1 hour.
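
The ttl value is a duration string in hours, minutes, and seconds, such as the 720h0m0s value from the earlier examples, which corresponds to 30 days. As a small illustration, the following hypothetical days_to_ttl shell function, which is not part of OADP or Velero, converts a number of days to that form.

```shell
# Hypothetical helper: convert a number of days into the hours-based
# duration string that the ttl field uses.
days_to_ttl() {
  # One day is 24 hours; minutes and seconds stay at zero.
  echo "$(( $1 * 24 ))h0m0s"
}

days_to_ttl 30   # prints 720h0m0s
days_to_ttl 7    # prints 168h0m0s
```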

OADP does not delete the volume snapshots that OADP Data Mover creates in the object storage. You must delete those snapshots manually.

You can use the s3cmd command to remove the snapshot from the object storage:

[user@host ~]$ s3cmd rm -r s3://backup-bucket/openshift-adp/backup-name
delete: 's3://backup-bucket/openshift-adp/backup-name/snapcontent-...'
...output omitted...

Backing up Volumes with File System Backup

With the File System Backup feature with Restic, you can back up volumes that are not compatible with snapshots.

OADP uses the following process to back up OpenShift applications with File System Backup:

	The administrator creates a backup resource in the `openshift-adp` namespace that triggers the backup process.
	Velero exports all Kubernetes resources from the application namespace to the backup storage location.
	The `node-agent` daemon set, which runs on the same node as the application pod, exports the volume data from the volume mount point on the cluster node to the backup storage location.

Important

OADP copies the data from the application file system while the application is still running. You must use backup hooks to ensure that the application data is not altered during the backup process.

The File System Backup restore process is as follows:

	The administrator creates a restore resource in the `openshift-adp` namespace, which triggers the restore process.
	Velero imports the Kubernetes resources from the backup storage location to the application namespace.
	The `node-agent` pod that runs on the same node as the application pod imports the application data from the backup storage location to the mount point for the application volume on the cluster node.
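The restore resource in the first step can be as small as the following sketch, which only names the backup to restore. The placeholders follow the same convention as the other definitions in this section:

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore-name>
  namespace: openshift-adp
spec:
  # Name of an existing backup resource in the openshift-adp namespace.
  backupName: <backup-name>
```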

You can instruct OADP to back up all volumes in a backup definition with the File System Backup feature by using the defaultVolumesToFsBackup option:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup-name>
  namespace: openshift-adp
spec:
  defaultVolumesToFsBackup: true

You can also annotate the application pod to specify which volumes to back up with the File System Backup feature by using the backup.velero.io/backup-volumes annotation. If your application uses multiple volumes from different storage classes, then the volumes in the annotation are backed up with Restic, and all other volumes are backed up with volume snapshots.

In the following example, the application to back up is an Nginx web server that serves a website from an Amazon Elastic File System (AWS EFS) volume. Because the AWS EFS CSI driver does not support volume snapshots, you must back up the volume with Restic.

To enable the File System Backup feature for the wwwdata volume, you must set the backup.velero.io/backup-volumes annotation in the deployment, as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: website-nginx
  namespace: org-website
spec:
  template:
    metadata:
      annotations:
        backup.velero.io/backup-volumes: wwwdata 
    spec:
      containers:
      - name: web
        image: registry.access.redhat.com/ubi9/nginx-120
        ...output omitted...
        volumeMounts:
        - mountPath: /opt/app-root/src
          name: wwwdata
      ...output omitted...
      volumes:
      - name: wwwdata
        persistentVolumeClaim:
          claimName: nginx-wwwdata
...output omitted...

The backup.velero.io/backup-volumes annotation lists the volumes to back up with Restic. You must use the same volume name that is defined in the pod definition.

Note

You can specify multiple volumes with the backup.velero.io/backup-volumes annotation, with a comma to separate each volume. For example, backup.velero.io/backup-volumes: volume1,volume2,volume3

Important

Only volumes that are compatible with volume snapshots are included in the backup by default. You must enable the File System Backup feature to include volumes in your backup that are not compatible with volume snapshots.
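For the Nginx example, a backup definition might look like the following sketch. It assumes that the pod template already carries the backup.velero.io/backup-volumes annotation, so the backup itself does not set the defaultVolumesToFsBackup option:

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: org-website-backup
  namespace: openshift-adp
spec:
  # defaultVolumesToFsBackup is left unset on purpose: only the volumes
  # that are named in the pod annotation use File System Backup.
  includedNamespaces:
  - org-website
```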

OADP stores the volume backup in the object storage inside a Restic repository. A Restic repository is a directory structure that Restic creates to store volume backups in an encrypted format. OADP uses a dedicated Restic backup repository for each namespace to store all volume backups for that namespace.

When the first backup of a namespace is created, OADP initializes a new Restic repository in the object storage, and creates a matching BackupRepository resource in the openshift-adp namespace.

A BackupRepository resource is a custom resource that OADP uses to store information about the Restic repository such as the repository encryption keys and object storage information.
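For orientation, a BackupRepository resource for the org-website namespace might resemble the following sketch. The field names follow the Velero BackupRepository API, but the generated name suffix, the storage location name, the S3 endpoint, the maintenance frequency, and the status are illustrative assumptions, not output captured from a cluster:

```yaml
apiVersion: velero.io/v1
kind: BackupRepository
metadata:
  name: org-website-oadp-backup-1-restic-x7k2p   # illustrative generated name
  namespace: openshift-adp
spec:
  backupStorageLocation: oadp-backup-1           # assumed storage location name
  repositoryType: restic
  volumeNamespace: org-website                   # namespace that the repository serves
  # Assumed S3 endpoint, bucket, and prefix.
  resticIdentifier: "s3:https://s3.example.com/backup-bucket/oadp/restic/org-website"
  maintenanceFrequency: 168h0m0s                 # illustrative value
status:
  phase: Ready
```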

The following excerpt shows the object storage layout that includes a backup of the Nginx website from the previous example:

s3
├── oadp
│   ├── backups
│   │   ├── mongodb
│   │   ├── org-website-backup 
│   │   └── ...
│   ├── restores
│   │   ├── mongodb
│   │   └── ...
│   └── restic 
│       ├── org-website
│       ├── other-app
│       └── ...
└── openshift-adp
    ├── mongodb
    └── ...

	The `org-website-backup` directory contains the backup of the Nginx application from the `org-website` namespace.
	The `/oadp/restic` path contains the Restic backup repositories, which hold the volume data from File System Backup. Each subdirectory is a unique Restic repository for a single namespace in the cluster. The `org-website` repository contains the volume backups for all backups of the `org-website` namespace, including the `org-website-backup` backup.

If you delete a backup, then OADP removes the volume backup from the Restic repository. OADP never removes the Restic repository, even if all the backups for a namespace are removed.

Important

If you delete a Restic backup repository in the /oadp/restic path, then you must also delete the associated BackupRepository resource on the OpenShift cluster. If the backup repositories are left in an inconsistent state, then backups fail.

Troubleshooting Backups and Restores

If a backup or restore fails, you can get more information about the failure with the velero command:

[user@host ~]$ velero get backup mybackup
NAME        STATUS            ERRORS   WARNINGS   ...
mybackup    PartiallyFailed   1        0          ...

Use the velero describe command for more details about the errors and warnings:

[user@host ~]$ velero describe backup mybackup
Name:         mybackup
Namespace:    openshift-adp
Labels:       velero.io/storage-location=oadp-backup-1
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.25.7+eab9cc9
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=25

Phase:  PartiallyFailed (run `velero backup logs mybackup` for more information)


Errors:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    database:   resource: /pods name: /mariadb-757c5bdc88-mrwhb error: /command terminated with exit code 1
...output omitted...

You can view the backup logs with the velero backup logs command:

[user@host ~]$ velero backup logs mybackup
...output omitted...

Because the logs are stored in the object storage, you can also download them with the s3cmd command, and view them with any text editor.

The logs are stored in the s3://<bucket-name>/oadp/backups/<backup-name>/<backup-name>-logs.gz path for backups and in the s3://<bucket-name>/oadp/restores/<restore-name>/<restore-name>-logs.gz path for restores.

You can use the following command to download the log file to the current directory with s3cmd:

[user@host ~]$ s3cmd get \
  s3://backup-bucket/oadp/backups/mybackup/mybackup-logs.gz
download: 's3://backup-bucket/oadp/backups/mybackup/mybackup-logs.gz' ...
 7454 of 7454   100% in    0s    96.14 KB/s  done

For a backup that uses hooks, you can search the log with the hookPhase keyword to review the status of the hooks. The standard and error outputs of the hook command are recorded in the stdout and stderr lines, respectively.

[user@host ~]$ velero backup logs mybackup | grep hookPhase
... msg="running exec hook" hookCommand="[/bin/bash -c mariadb -u root -e \"set global read_only=1;flush tables\"]" ... hookPhase=pre
... msg="stdout: " ... hookPhase=pre
... msg="stderr: ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)\n" ... hookPhase=pre
... msg="Error executing hook" error="command terminated with exit code 1" ... hookPhase=pre

In the previous example, the backup hook was misconfigured and is missing the password to connect to the database.

References

Backup API Definition

Restore API Definition

Schedule API Definition

For more information about backing up applications with OADP, refer to the OADP Backing Up section in the OADP Application Backup and Restore chapter in the Red Hat OpenShift Container Platform 4.14 Backup and Restore documentation at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html-single/backup_and_restore/index#oadp-backing-up

For more information about troubleshooting OADP, refer to the Troubleshooting section in the OADP Application Backup and Restore chapter in the Red Hat OpenShift Container Platform 4.14 Backup and Restore documentation at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html-single/backup_and_restore/index#troubleshooting

Revision: do380-4.14-397a507