
Recoverability Fields

These are the fields that can be recorded for each recoverability mechanism documented in ArchRepo.

Recoverability is a cross-cutting concern, so a single recoverability mechanism can be applied to multiple solution building blocks.

1. Description

  • What it’s for: Provides a short overview of the recoverability mechanism being described.

  • What to include:

    • A concise explanation of what this mechanism does.
    • Mention key goals or outcomes (e.g., data protection, system restoration).
  • Examples:

    • "This recoverability mechanism ensures business continuity by maintaining frequent database backups and enabling fast disaster recovery."
    • "The mechanism automates web server recovery with no data loss and minimal downtime."

2. Backup Strategy

  • What it’s for: Captures the method or strategy used for backups.

  • What to include:

    • Fully describe how backups are created, stored, and retrieved.
    • Specify whether the backup approach is Full, Incremental, Differential, or Snapshot-based.
    • Mention where backups are stored (e.g., on cloud, off-site, local storage) and how they are protected.
  • Examples:

    • "Daily incremental backups are created on AWS S3, and weekly full backups are stored off-site for added redundancy."
    • "A snapshot-based backup system is used for virtual machines, which allows quick restoration to previous states in seconds."
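The choice between Full, Incremental, and Differential backups determines which backups are needed at restore time. The following is a minimal sketch of that relationship, assuming an incremental backup captures changes since the previous backup of any kind and a differential backup captures changes since the last full backup. The function name `restore_chain` and the `(label, kind)` tuples are hypothetical and not part of ArchRepo; real backup tools may define these terms differently.

```python
def restore_chain(backups: list[tuple[str, str]]) -> list[str]:
    """Return the backup labels needed to restore to the most recent backup.

    `backups` is a chronological list of (label, kind) pairs, where kind is
    "full", "incremental", or "differential". Other kinds are ignored.
    """
    full_positions = [i for i, (_, kind) in enumerate(backups) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup to restore from")

    # Every restore starts from the most recent full backup.
    start = full_positions[-1]
    chain = [backups[start][0]]
    later = backups[start + 1:]

    # Assumption: the latest differential already contains every change
    # since the full backup, so earlier backups in between can be skipped.
    diff_positions = [i for i, (_, kind) in enumerate(later) if kind == "differential"]
    if diff_positions:
        last_diff = diff_positions[-1]
        chain.append(later[last_diff][0])
        later = later[last_diff + 1:]

    # Any remaining incrementals must be applied in order.
    chain.extend(label for label, kind in later if kind == "incremental")
    return chain


weekly = [("sun", "full"), ("mon", "incremental"), ("tue", "incremental")]
print(restore_chain(weekly))
```

Under these assumptions, an incremental schedule needs every backup since the last full one, while a differential schedule needs only the full backup plus the latest differential. That trade-off is worth stating in this field because it affects restore time.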

3. Recovery Time Objective

  • What it’s for: Defines the maximum amount of downtime the system can tolerate after an incident before service must be restored.

  • What to include:

    • State the target time to fully recover the system.
    • Include specific time values (e.g., “15 minutes”, “2 hours”).
    • Mention any priorities or service-level agreements (SLAs) attached to this time window.
  • Examples:

    • "The target recovery time for the website is less than 30 minutes for critical failures."
    • "In the event of a major incident, recovery will be completed within 1 hour to minimize downtime for end-user systems."

4. Recovery Point Objective

  • What it’s for: Describes the maximum acceptable amount of data loss during a failover or recovery process.

  • What to include:

    • Define the time period of acceptable data loss (e.g., “5 minutes of data,” “zero data loss”).
    • Connect this to the backup strategy used (e.g., frequency of backups).
    • Emphasize criticality for business operations or compliance requirements.
  • Examples:

    • "Backups are taken every 5 minutes to support an RPO of at most 5 minutes of data loss during recovery."
    • "Zero data loss is critical for the financial application; continuous replication ensures this target is met."
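The link between backup frequency and RPO can be checked mechanically. The following is a minimal sketch, assuming periodic backups where the worst-case data loss is one backup interval plus the time a backup takes to complete. The function names `worst_case_data_loss` and `meets_rpo` are hypothetical and not part of ArchRepo.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Upper bound on lost data when a failure hits just before a backup completes.

    Assumption: backups run on a fixed schedule, and a backup only protects
    data once it has finished.
    """
    return backup_interval + backup_duration


def meets_rpo(rpo: timedelta,
              backup_interval: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """Return True when the worst-case data loss fits within the stated RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


# Backups every 5 minutes against a 5-minute RPO, ignoring backup duration.
print(meets_rpo(timedelta(minutes=5), timedelta(minutes=5)))
# The same schedule, counting a 1-minute backup duration.
print(meets_rpo(timedelta(minutes=5), timedelta(minutes=5), timedelta(minutes=1)))
```

The second call shows why the field should state the backup frequency alongside the target: a schedule that looks sufficient on paper can miss the RPO once the time to complete a backup is included.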

5. Disaster Recovery Plan

  • What it’s for: Describes the overarching strategy for recovering from major disruptions or system failures.

  • What to include:

    • Provide a reference or summary of the disaster recovery process.

    • Mention major components of the plan, such as:

      • Steps for failover to standby systems.

      • Locations of recovery servers (e.g., DR site locations).

      • Contacts, procedures, or automation tools used.

    • Indicate where the detailed plan is stored for team reference.

  • Examples:

    • "The disaster recovery plan includes automated failover to a geographically redundant data center and a manual escalation protocol for critical incidents."
    • "Refer to document DC-001 in the internal wiki for the full disaster recovery process, including data validation steps."

6. Test Frequency

  • What it’s for: Specifies how often recovery processes are tested to ensure they work effectively.

  • What to include:

    • Be explicit about the testing cadence (e.g., quarterly, semi-annually, annually).
    • Specify the type of tests conducted (e.g., failover testing, data restoration tests).
    • Mention any special cases where testing is triggered (e.g., before major deployments, after configuration changes).
  • Examples:

    • "Full data recovery tests are conducted quarterly, while failover testing is done annually to verify RTO and RPO compliance."
    • "Testing is done after each major release to ensure recovery steps are updated for new system configurations."
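A stated cadence becomes verifiable once it can be turned into a due date. The following is a minimal sketch of that idea. The `CADENCE_DAYS` mapping, its day counts, and the function names are illustrative assumptions, not ArchRepo features.

```python
from datetime import date, timedelta

# Hypothetical mapping from a cadence label to the maximum gap between tests.
CADENCE_DAYS = {"quarterly": 91, "annually": 365}


def next_test_due(last_tested: date, cadence: str) -> date:
    """Return the latest date by which the next recovery test should run."""
    return last_tested + timedelta(days=CADENCE_DAYS[cadence])


def is_overdue(last_tested: date, cadence: str, today: date) -> bool:
    """Return True when the due date has already passed."""
    return today > next_test_due(last_tested, cadence)


last_failover_test = date(2024, 1, 1)
print(next_test_due(last_failover_test, "quarterly"))
print(is_overdue(last_failover_test, "quarterly", today=date(2024, 4, 2)))
```

Event-triggered tests (for example, after a major release) do not fit a fixed interval and would need to be tracked separately.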

General Guidance

  • Be clear and specific: Use measurable and verifiable terms (time, frequency, strategy) when describing recoverability mechanisms.
  • Align with business needs: Ensure the information reflects requirements from SLAs, compliance, and business-critical priorities.
  • Provide traceability: Refer to recovery plans, policies, or systems that can be accessed by relevant teams as needed.
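To illustrate how the six fields fit together, the following is a minimal sketch of a record with one attribute per field and a helper that lists the fields still left blank. The class `RecoverabilityRecord`, its attribute names, and `missing_fields` are hypothetical; they do not reflect ArchRepo's actual schema or API.

```python
from dataclasses import dataclass, fields


@dataclass
class RecoverabilityRecord:
    """Hypothetical container mirroring the six fields described above."""
    description: str = ""
    backup_strategy: str = ""
    recovery_time_objective: str = ""
    recovery_point_objective: str = ""
    disaster_recovery_plan: str = ""
    test_frequency: str = ""

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are empty or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


record = RecoverabilityRecord(
    description="Automates web server recovery with minimal downtime.",
    backup_strategy="Daily incremental backups, weekly full backups stored off-site.",
    recovery_time_objective="30 minutes",
)
print(record.missing_fields())
```

A completeness check like this only confirms that each field has content. Whether the content is measurable and aligned with SLAs still needs human review.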