Chapter 10. Troubleshooting and Disaster Recovery Planning for IdM

Abstract

Goal	Troubleshooting and preparing for disaster recovery with Identity Management.
Objectives	Recovering from a disaster affecting an Identity Management deployment. Monitor, analyze, and troubleshoot the individual components of IdM.
Sections	Recovering IdM with Backups and Replication (and Guided Exercise) Troubleshooting IdM (and Guided Exercise)
Lab	Troubleshooting and Disaster Recovery Planning for IdM

Recovering IdM with Backups and Replication

Objectives

Recovering from a disaster affecting an Identity Management deployment.

Disaster Recovery in IdM

Because Identity Management authenticates users and services across different environments, it becomes one of the crucial parts of a corporate infrastructure. As such, you must configure backup and restoration mechanisms to ensure its continuous availability.

You can use IdM native tools to back up and restore, or you can use external tools. Depending on the scenario, different disaster recovery solutions are available. At first, recovering from a disaster might seem simple: back up data, store it in a secure location, and then use the saved data to restore your environment when something goes wrong. However, IdM is usually a multi-instance deployment, and after you restore one server its version or configuration might differ from the rest of the topology. This is where backup and restoration become a challenge.

Two types of disaster scenarios are typically encountered:

Server Loss: the IdM topology loses one, several, or all servers, and you must recover them as quickly as possible. This might be caused by a hardware malfunction.
Data Loss: the IdM topology loses data and this change is subsequently propagated to all servers. This might be caused by a user accidentally deleting data.

Recovering from Server Loss Scenarios

Red Hat recommends that you implement a resilient configuration to mitigate this type of disaster. For example, install two or three IdM servers in each data center and configure replication agreements between them. In this type of configuration, when one server is lost, you can create an IdM replica from one of the active servers and restore the size of the topology.

This recovery process depends on the type of server that was lost. Commonly, the first IdM server installed in the topology acts as the CA renewal server and publishes certificate revocation lists (CRLs). This role might have been reassigned to another replica since installation; in this section, the server that currently holds the role is called the first IdM server.

For example, if you lose the first IdM server, then you must choose a replica with the CA role to be the new renewal server.

In this scenario, perform the following steps to recover from the server loss:

Remove all replication agreements with the lost server.
Choose an active replica with the CA role installed to be the new renewal server.
Install a new replica to replace the lost server.
Configure the appropriate replication agreements with the new replica.
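
The steps above can be sketched as a short shell script. This is a minimal sketch under stated assumptions, not a definitive procedure: the hostnames, the segment name, and the run and recover_first_server helpers are illustrative, and replica12.example.com is a hypothetical replacement host. Commands are only printed unless DRY_RUN=0 is set. The sketch also assumes a valid admin Kerberos ticket and that the replacement host is already prepared for replica installation.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

recover_first_server() {
    # $1 = lost server, $2 = replica chosen as new renewal server,
    # $3 = replacement replica, $4 = existing peer for the new agreement
    lost=$1 renewal=$2 new=$3 peer=$4

    # 1. On a surviving server: remove the lost server and its agreements.
    run ipa server-del "$lost"
    # 2. Promote an active replica that has the CA role.
    run ipa config-mod --ca-renewal-master-server="$renewal"
    # 3. On the replacement host: install a new replica with the CA role.
    run ipa-replica-install --setup-ca
    # 4. Add a replication agreement (topology segment) for the new replica.
    run ipa topologysegment-add domain "${new}-to-${peer}" \
        --leftnode="$new" --rightnode="$peer"
}

# Dry run with example hostnames: prints the four commands.
recover_first_server idmserver.example.com replica10.example.com \
    replica12.example.com replica11.example.com
```

Reviewing the printed commands before setting DRY_RUN=0 is deliberate: steps 1, 2, and 4 run on a surviving server, whereas step 3 runs on the replacement host.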

Important

If only one server in the topology provides the CA service and it is unavailable, then the entire environment is lost. Red Hat strongly recommends having three or more replicas with the CA services installed, and configuring CA replication agreements between them.

When you lose an IdM server that is not the first IdM server, you need only remove the replication agreements with the lost server and install a new replica.

In a scenario where several servers are lost at the same time, the question is whether you can rebuild the topology from what is left. If the first IdM server is still working, the procedure for recovering any other server can be followed to rebuild the environment. If the first IdM server is lost, and there is at least one replica with CA services available, follow the procedure to establish a new first IdM server.

If there are no other replicas with the CA service left in your environment, you lose the ability to rebuild the topology. This is called a total loss scenario.

Recovering from a Total Infrastructure Loss

In this scenario, all IdM servers in your environment are lost, or the remaining servers are insufficient to rebuild the topology. To be able to recover from this scenario, you must have previously installed at least one of your replicas (with the full CA role) as a virtual machine (VM) and taken a snapshot of it.

A backup copies data to a secondary location in case you need to restore it. By using VM snapshots, you keep a consistent state of the complete VM at a given point in time, data and software included. This feature makes the recovery procedures faster and less prone to failure.

You can schedule periodic downtime for the VM to create a full snapshot of it. This process can also help you analyze the behavior of the topology when you stop a replica.

Recovering IdM Clients

Every topology change might have an impact on IdM clients. If the clients use DNS discovery, they automatically adapt to the topology changes. However, if the clients are configured to explicitly connect to specific servers, their configuration must be changed to reflect the new IdM server hostnames.

If the clients are not updated to reflect the loss of a server or the hostname change, then they might not be able to authenticate. In this case, you should always verify and, as needed, update the /etc/sssd/sssd.conf and /etc/krb5.conf files.

For example, if a client has only one server to connect to and that server is lost, then you must update the /etc/sssd/sssd.conf file on the client. The ipa_server parameter contains a list of servers to contact when authenticating to IdM.
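
As a minimal sketch of that edit, the following function rewrites the ipa_server line of an SSSD configuration file and keeps a .bak copy of the original. The function name set_ipa_server and the server list are illustrative assumptions; _srv_ is the SSSD keyword that requests DNS service discovery. The demonstration runs against a temporary file rather than the real /etc/sssd/sssd.conf.

```shell
set_ipa_server() {
    # $1 = path to sssd.conf, $2 = comma-separated list of IdM servers
    conf=$1 servers=$2
    # Keep a backup, then rewrite only the ipa_server line.
    cp -p "$conf" "$conf.bak" &&
        sed "s|^\([[:space:]]*ipa_server[[:space:]]*=\).*|\1 $servers|" \
            "$conf.bak" > "$conf"
}

# Demonstration on a throwaway copy of a minimal configuration.
demo=$(mktemp)
printf '[domain/example.com]\nipa_server = idmserver.example.com\n' > "$demo"
set_ipa_server "$demo" "_srv_, replica10.example.com"
grep ipa_server "$demo"
rm -f "$demo" "$demo.bak"
```

On a real client you would pass /etc/sssd/sssd.conf as the first argument and restart the sssd service afterward so that the change takes effect.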

In some cases, even after the updates, you might still have issues connecting to IdM. It is possible that you have a stale data problem with clients. This happens when clients are working with outdated data or configuration stored in cache. In these scenarios, purge the SSSD cache on the client to remove any stale data. You can use the sss_cache -E command and then restart the sssd service to purge the cache.
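
The cache purge described above can be sketched as two commands. The run wrapper and the purge_sssd_cache function name are assumptions added for illustration; by default the commands are only printed, and they run for real when DRY_RUN=0 is set by the root user on the client.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

purge_sssd_cache() {
    run sss_cache -E              # invalidate all cached entries
    run systemctl restart sssd    # restart SSSD so it refetches data
}

purge_sssd_cache
```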

Recovering from a Data Loss Scenario

A data loss scenario might be caused by a hardware failure on a server, or by an administrative accident. As soon as you identify a data loss situation, act immediately to stop the loss from replicating through the infrastructure. Isolate the affected replica servers by stopping the IdM services, disconnecting the service interface, or powering down the machine.
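
The three isolation methods can be sketched as follows. This is an illustration only: the isolate_replica function, its mode names, and the eth0 interface name are assumptions, and using nmcli assumes that NetworkManager manages the interface. Commands are printed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

isolate_replica() {
    # $1 = services | network | poweroff; $2 = interface (network mode only)
    case $1 in
        services) run ipactl stop ;;                    # stop all IdM services
        network)  run nmcli device disconnect "$2" ;;   # drop the service interface
        poweroff) run systemctl poweroff ;;             # power down the machine
        *) echo "usage: isolate_replica services|network|poweroff [iface]" >&2
           return 2 ;;
    esac
}

isolate_replica services
```

Stopping the services is the least disruptive option, because the machine stays reachable for investigation.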

If the data loss is already replicated to all servers, then use one of the following methods to recover:

Use an existing backup to restore the lost data. The restored entries are automatically propagated to other servers using the standard replication process.
If data loss made the topology completely dysfunctional, start over from the most recent functional snapshot and make sure no replication agreements exist between the affected replicas and the restored server. You can then install new replicas to restore the designed topology.

Backing up and Restoring Identity Management

IdM provides a utility to manually back up and restore the server data. The utility creates a directory containing all the configuration information and the LDAP database. You should always rebuild any lost server by reinstalling it as a replica. Only when this is not possible should you consider restoring your IdM server from a backup.

IdM has two backup options:

Full backup: The script stops the IdM services to create a full server backup. It creates a backup copy of all IdM server files and the LDAP data. Because it is a raw file backup, it is performed offline.
Data backup: The script creates only a backup copy of the LDAP data and the change log. This backup can be performed online or offline.

The script stores the backups in the /var/lib/ipa/backup directory.

Creating a Backup

To create a full backup, use the ipa-backup command as the root user, with IdM services stopped. To create a data-only backup, use the same command with the --data option. With a data-only backup the IdM services can be stopped or running. However, you must add the --online option if you want to create a data-only backup with the IdM services running.
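
The three invocations described above can be summarized in a small wrapper. This is a sketch: the backup_idm function and its mode names (full, data, data-online) are hypothetical conveniences, whereas the ipa-backup options are the ones named in the text. Commands are printed unless DRY_RUN=0 is set by the root user.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

backup_idm() {
    # $1 = full | data | data-online
    case $1 in
        full)        run ipa-backup ;;                  # offline, all files and data
        data)        run ipa-backup --data ;;           # LDAP data only, offline
        data-online) run ipa-backup --data --online ;;  # LDAP data only, services running
        *) echo "usage: backup_idm full|data|data-online" >&2
           return 2 ;;
    esac
}

backup_idm data-online
```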

The backup utility fails when you try to back up a server that does not have the required roles to back up all the services in the topology. For example, in a topology that has an integrated CA service, you cannot back up a server that does not have the CA role. Instead, use a server that has the globally used roles.

By default, IdM creates the backup in the /var/lib/ipa/backup/ directory and adds the date information to the file name. This example demonstrates how to perform a data-only backup online:

[root@host ~]# ipa-backup --data --online
Directory Manager (existing master) password: password
Preparing backup on replica10.example.com
Local roles match globally used roles, proceeding.
Backing up ipaca in EXAMPLE-COM to LDIF
Waiting for LDIF to finish
Backing up userRoot in EXAMPLE-COM to LDIF
Waiting for LDIF to finish
Backing up EXAMPLE-COM
Waiting for BAK to finish
Backed up to /var/lib/ipa/backup/ipa-data-2023-06-30-20-25-03
The ipa-backup command was successful

After you create the backup, copy it to a secure location.
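
One way to prepare a backup for copying is to pack the newest backup directory into a single archive. The following is a minimal sketch, assuming the ipa-data-<timestamp> directory naming seen in the example output and an ipa-full- prefix for full backups; the function names and the destination directory are hypothetical.

```shell
latest_backup() {
    # $1 = backup root, $2 = backup kind ("full" or "data")
    # Prints the path of the newest matching backup directory.
    root=$1 kind=$2 latest=""
    for dir in "$root"/ipa-"$kind"-*/; do
        [ -d "$dir" ] || continue    # the glob matched nothing
        latest=${dir%/}              # globs expand in sorted order, so the
    done                             # last match carries the newest timestamp
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

archive_backup() {
    # $1 = backup directory, $2 = destination directory
    # Creates <destination>/<backup name>.tar.gz and prints its path.
    src=$1 dest=$2
    name=$(basename "$src")
    tar -C "$(dirname "$src")" -czf "$dest/$name.tar.gz" "$name" &&
        printf '%s\n' "$dest/$name.tar.gz"
}
```

For example, `archive_backup "$(latest_backup /var/lib/ipa/backup data)" /mnt/secure-backups` would archive the most recent data-only backup, where /mnt/secure-backups stands in for whatever secure location you use.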

Restoring from a Backup

Use the ipa-restore command to restore IdM from a backup. You can only restore a backup to the same host upon which the backup was originally created. Red Hat recommends that you uninstall a server before restoring a full backup on it. Uninstalling does not remove backups present on the IdM server.

You can restore all backup types by using the ipa-restore command. Provide the path to the backup file as an argument. The utility detects the backup type automatically and performs the appropriate type of restoration.

The following example shows the process for restoring from a backup:

[root@host ~]# ipa-restore /var/lib/ipa/backup/ipa-data-2023-06-30-20-25-03
Directory Manager (existing master) password: password

Preparing restore from /var/lib/ipa/backup/ipa-data-2023-06-30-20-25-03 on replica10.example.com
Performing DATA restore from DATA backup
Temporary setting umask to 022
Restoring data will overwrite existing live data. Continue to restore? [no]: yes
Each master will individually need to be re-initialized or
re-created from this one. The replication agreements on
masters running IPA 3.1 or earlier will need to be manually
re-enabled. See the man page for details.
Disabling all replication.
Disabling replication agreement on idmserver.example.com to replica10.example.com
Disabling replication agreement on idmserver.example.com to replica11.example.com
Disabling CA replication agreement on idmserver.example.com to replica10.example.com
Disabling replication agreement on replica11.example.com to idmserver.example.com
Stopping Directory Server
Restoring from userRoot in EXAMPLE-COM
Restoring from ipaca in EXAMPLE-COM
Starting Directory Server
Restoring umask to 18
The ipa-restore command was successful

Red Hat recommends rebooting the server after restoring from backup. A full restore is the default behavior; however, you can add the following options to the ipa-restore command:

Option	Description
`--data`	Restores only the data from a full server backup.
`--online`	Restores the data in online mode.
`--instance`	Specifies which 389 Directory Server instance to restore.
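
As an illustration of how these options combine, the following sketch wraps ipa-restore in a helper. The restore_idm function, its mode names, and the ipa-full- directory name are hypothetical; only the --data and --online options from the table are used. Commands are printed unless DRY_RUN=0 is set by the root user.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

restore_idm() {
    # $1 = backup directory; $2 = full | data | data-online
    backup=$1
    case $2 in
        full)        run ipa-restore "$backup" ;;                  # default behavior
        data)        run ipa-restore --data "$backup" ;;           # data only
        data-online) run ipa-restore --data --online "$backup" ;;  # data only, online
        *) echo "usage: restore_idm BACKUP_DIR full|data|data-online" >&2
           return 2 ;;
    esac
}

# Restore only the LDAP data from a (hypothetical) full backup.
restore_idm /var/lib/ipa/backup/ipa-full-2023-06-30-20-25-03 data
```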

After a replica is offline for a long period of time or the replica is restored from a backup, you might have to restart the replication process for that replica. You can start the replication process manually with the ipa-replica-manage re-initialize command. This command restarts the replication updates for an existing replication agreement. You can also force an update on a working replication agreement with the ipa-replica-manage force-sync command.

In the following example, the host machine has no replication agreement with the replica03 machine, so the first command fails. The host machine does have an agreement with the replica05 machine, so the second command succeeds:

[root@host ~]# ipa-replica-manage re-initialize --from replica03.example.com
'host.example.com' has no replication agreement for 'replica03.example.com'
[root@host ~]# ipa-replica-manage re-initialize --from=replica05.example.com
Update in progress, 3 seconds elapsed
Update succeeded

You can also force a CA data update to the local replica by using the ipa-csreplica-manage force-sync command. The replica that sends the CA update must have the CA services installed.
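
The two force-sync commands can be combined in one helper. This is a sketch under stated assumptions: the force_sync_from function and its optional ca argument are hypothetical, and the sending replica must have an existing agreement with the local host. Commands are printed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

force_sync_from() {
    # $1 = replica that sends the update; $2 = "ca" to also sync CA data
    run ipa-replica-manage force-sync --from "$1"
    if [ "${2:-}" = "ca" ]; then
        # The sending replica must have the CA services installed.
        run ipa-csreplica-manage force-sync --from "$1"
    fi
}

force_sync_from replica05.example.com ca
```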

References

For more information, refer to the Preparing for Disaster Recovery with Identity Management guide at https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html-single/preparing_for_disaster_recovery_with_identity_management/index

For more information, refer to the Performing Disaster Recovery with Identity Management guide at https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html-single/performing_disaster_recovery_with_identity_management/index

Revision: rh362-9.1-4c6fdb8