Recoverability Fields

The following fields can be recorded for each recoverability mechanism documented in ArchRepo.

Recoverability is a cross-cutting concern, so a single recoverability mechanism can apply to multiple solution building blocks.

What it’s for: Provides a short overview of the recoverability mechanism being described.
What to include:
- A concise explanation of what this mechanism does.
- Mention key goals or outcomes (e.g., data protection, system restoration).
Examples:
- "This recoverability mechanism ensures business continuity by maintaining frequent database backups and enabling fast disaster recovery."
- "The mechanism automates web server recovery with no data loss and minimal downtime."

What it’s for: Captures the method or strategy used for backups.
What to include:
- Fully describe how backups are created, stored, and retrieved.
- Specify whether the backup approach is Full, Incremental, Differential, or Snapshot-based.
- Mention where backups are stored (e.g., on cloud, off-site, local storage) and how they are protected.
Examples:
- "Daily incremental backups are written to AWS S3, and weekly full backups are stored off-site for added redundancy."
- "A snapshot-based backup system is used for virtual machines, which allows restoration to a previous state in seconds."
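The difference between the backup approaches listed above shows up most clearly at restore time. The following is a minimal sketch, not a description of any particular backup tool: the `Backup` type and the `restore_chain` function are hypothetical names, and the sketch assumes that an incremental backup captures changes since the most recent backup of any kind, while a differential backup captures all changes since the last full backup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int   # day number on which the backup was taken (illustrative unit)
    kind: str  # "full", "incremental", or "differential"


def restore_chain(backups, target_day):
    """Return the backups needed, in order, to restore the state as of target_day.

    Assumption: incrementals hold changes since the previous backup of any kind;
    differentials hold all changes since the last full backup.
    """
    candidates = sorted((b for b in backups if b.day <= target_day),
                        key=lambda b: b.day)
    # Every restore starts from the most recent full backup.
    last_full_idx = max(
        (i for i, b in enumerate(candidates) if b.kind == "full"),
        default=None,
    )
    if last_full_idx is None:
        raise ValueError("no full backup available at or before target day")

    chain = [candidates[last_full_idx]]
    later = candidates[last_full_idx + 1:]

    # Only the latest differential is needed; it supersedes earlier ones.
    base_day = chain[0].day
    differentials = [b for b in later if b.kind == "differential"]
    if differentials:
        chain.append(differentials[-1])
        base_day = differentials[-1].day

    # Every incremental taken after the base must be replayed in order.
    chain.extend(b for b in later
                 if b.kind == "incremental" and b.day > base_day)
    return chain
```

Under these assumptions, an incremental schedule needs the full backup plus every later incremental, whereas a differential schedule needs only the full backup plus the latest differential. That trade-off (smaller backups versus shorter restores) is worth stating when this field is filled in.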

What it’s for: Defines the Recovery Time Objective (RTO): the maximum downtime the system can tolerate after a failure before it must be restored.
What to include:
- State the target time to fully recover the system.
- Include specific time values (e.g., “15 minutes”, “2 hours”).
- Mention any priorities or service-level agreements (SLAs) attached to this time window.
Examples:
- "The target recovery time for the website is less than 30 minutes for critical failures."
- "In the event of a major incident, recovery will be completed within 1 hour to minimize downtime for end-user systems."
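A recovery time target is only useful if it can be compared with what recovery actually takes. As a rough illustration, assuming recovery durations are collected from drills or past incidents, a check might look like the sketch below. The function name `rto_compliance` and the sample durations are hypothetical.

```python
from datetime import timedelta


def rto_compliance(target, measured):
    """Compare measured recovery durations against a recovery time target.

    target:   the documented recovery time target (timedelta)
    measured: non-empty list of observed recovery durations (timedelta)
    """
    worst = max(measured)
    return {
        "worst": worst,                                  # slowest observed recovery
        "met": worst <= target,                          # True only if every recovery was in time
        "breaches": [m for m in measured if m > target], # recoveries that exceeded the target
    }


# Hypothetical drill results checked against a 30-minute target.
result = rto_compliance(
    timedelta(minutes=30),
    [timedelta(minutes=12), timedelta(minutes=25), timedelta(minutes=41)],
)
```

Here `result["met"]` is `False` because one drill took 41 minutes, which is the kind of evidence that should prompt either a change to the mechanism or a revised target.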

What it’s for: Describes the Recovery Point Objective (RPO): the maximum acceptable amount of data loss during a failover or recovery process.
What to include:
- Define the time period of acceptable data loss (e.g., “5 minutes of data,” “zero data loss”).
- Connect this to the backup strategy used (e.g., frequency of backups).
- Emphasize criticality for business operations or compliance requirements.
Examples:
- "Backups are taken every 5 minutes, so the system's RPO allows for a maximum of 5 minutes of data loss during recovery."
- "Zero data loss is critical for the financial application; continuous replication ensures this target is met."
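The link between backup frequency and acceptable data loss can be checked with simple arithmetic. The sketch below is illustrative only and rests on two assumptions: a backup captures the state at the moment it starts, and it becomes usable only once it has finished. The function names are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval, backup_duration=timedelta(0)):
    """Worst-case data loss window for periodic backups.

    If a failure occurs just before the newest backup finishes, the latest
    usable backup is the previous one, so the loss window is one full
    interval plus the time a backup takes to complete.
    """
    return backup_interval + backup_duration


def meets_rpo(backup_interval, rpo, backup_duration=timedelta(0)):
    """Return True if the backup schedule can satisfy the stated data loss target."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo
```

For instance, `meets_rpo(timedelta(minutes=5), timedelta(minutes=5))` holds only when backups complete instantly; with a one-minute backup duration the same schedule no longer satisfies a 5-minute target. A zero data loss target cannot be met by periodic backups at all under this model, which is why the second example above relies on continuous replication.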

What it’s for: Describes the overarching strategy for recovering from major disruptions or system failures.
What to include:
- Provide a reference or summary of the disaster recovery process.
- Mention major components of the plan, such as:
  - Steps for failover to standby systems.
  - Locations of recovery servers (e.g., DR site locations).
  - Contacts, procedures, or automation tools used.
- Indicate where the detailed plan is stored for team reference.
Examples:
- "The disaster recovery plan includes automated failover to a geographically redundant data center and a manual escalation protocol for critical incidents."
- "Refer to document DC-001 in the internal wiki for the full disaster recovery process, including data validation steps."

What it’s for: Specifies how often recovery processes are tested to ensure they work effectively.
What to include:
- Be explicit about the testing cadence (e.g., quarterly, semi-annually, annually).
- Specify the type of tests conducted (e.g., failover testing, data restoration tests).
- Mention any special cases where testing is triggered (e.g., before major deployments, after configuration changes).
Examples:
- "Full data recovery tests are conducted quarterly, while failover testing is done annually to verify RTO and RPO compliance."
- "Testing is done after each major release to ensure recovery steps are updated for new system configurations."
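A documented testing cadence can also be tracked mechanically. The following sketch assumes the cadence is recorded as one of a few fixed labels and approximates each label with a number of days; the `CADENCE_DAYS` mapping and both function names are hypothetical.

```python
from datetime import date, timedelta

# Approximate day counts per cadence label (an assumption for illustration).
CADENCE_DAYS = {
    "quarterly": 91,
    "semi-annually": 182,
    "annually": 365,
}


def next_test_due(last_test, cadence):
    """Return the date by which the next recovery test should take place."""
    return last_test + timedelta(days=CADENCE_DAYS[cadence])


def is_overdue(last_test, cadence, today):
    """Return True if the next recovery test date has already passed."""
    return today > next_test_due(last_test, cadence)
```

With a last test on `date(2024, 1, 1)` and a quarterly cadence, the next test is due on `date(2024, 4, 1)`. Event-triggered tests, such as those run after a major release, would need to be recorded separately because they do not follow a fixed interval.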

Be clear and specific: Use measurable and verifiable terms (time, frequency, strategy) when describing recoverability mechanisms.
Align with business needs: Ensure the information reflects requirements from SLAs, compliance, and business-critical priorities.
Provide traceability: Refer to recovery plans, policies, or systems that can be accessed by relevant teams as needed.
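To make these guidelines easier to apply, the information described on this page could be captured in a structured record. The sketch below is only one possible shape: the class name and all field names are hypothetical and are not taken from ArchRepo itself.

```python
from dataclasses import dataclass, fields
from datetime import timedelta


@dataclass
class RecoverabilityRecord:
    # All field names are illustrative assumptions, not ArchRepo identifiers.
    description: str
    backup_strategy: str
    recovery_time_objective: timedelta   # measurable target, e.g. 30 minutes
    recovery_point_objective: timedelta  # measurable target, e.g. 5 minutes
    disaster_recovery_plan: str          # summary or reference to the full plan
    testing_frequency: str               # e.g. "quarterly"

    def missing_fields(self):
        """Return the names of text fields that were left blank."""
        return [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), str)
            and not getattr(self, f.name).strip()
        ]


record = RecoverabilityRecord(
    description="Automates web server recovery with minimal downtime.",
    backup_strategy="Daily incremental backups, weekly full backups off-site.",
    recovery_time_objective=timedelta(minutes=30),
    recovery_point_objective=timedelta(minutes=5),
    disaster_recovery_plan="",
    testing_frequency="quarterly",
)
```

Storing the time-based targets as durations rather than free text keeps them measurable, and a check such as `record.missing_fields()` flags entries that lack a traceable reference to the recovery plan.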