
Availability Fields

In ArchRepo, you can record multiple availability mechanisms for your solutions.

You can assign one or more availability mechanisms to core solution building blocks, such as applications, data stores, services, and APIs.
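To make that one-to-many relationship concrete, the Python sketch below models a building block that carries a list of availability mechanisms, each holding the six fields described on this page. The class and attribute names (BuildingBlock, AvailabilityMechanism, add_mechanism) are hypothetical and chosen only for illustration; they are not ArchRepo's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical model: attribute names mirror the six Availability Fields on this
# page; they are illustrative and NOT ArchRepo's actual schema or API.
@dataclass
class AvailabilityMechanism:
    description: str
    sla_target_percent: Optional[float] = None   # e.g. 99.9
    redundancy_strategy: Optional[str] = None    # e.g. "Active-Active", "Active-Passive", "N+1"
    failover_mechanisms: Optional[str] = None
    geographical_distribution: List[str] = field(default_factory=list)  # regions or datacenters
    testing_strategy: Optional[str] = None


@dataclass
class BuildingBlock:
    name: str
    kind: str  # e.g. "application", "data store", "service", "API"
    availability: List[AvailabilityMechanism] = field(default_factory=list)

    def add_mechanism(self, mechanism: AvailabilityMechanism) -> None:
        """Attach one more availability mechanism to this building block."""
        self.availability.append(mechanism)


if __name__ == "__main__":
    portal = BuildingBlock(name="Customer Portal", kind="application")
    portal.add_mechanism(AvailabilityMechanism(
        description="Multi-region web tier",
        sla_target_percent=99.9,
        redundancy_strategy="Active-Active",
        geographical_distribution=["us-east-1", "eu-west-1", "ap-southeast-1"],
    ))
    print(f"{portal.name}: {len(portal.availability)} availability mechanism(s)")
```

Using `default_factory=list` gives every instance its own list, so mechanisms added to one building block never leak into another.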

1. Description

  • What it’s for: A high-level explanation of what this model item represents.

  • What to include:

    • Provide a simple overview of the system or service’s availability goals.
    • Mention any critical business requirements around uptime.
    • Explain why this system’s availability is important for its users or business goals.
  • Example: "This web application is designed to meet a 99.9% availability target, ensuring users can access services reliably across all regions with minimal downtime."

2. SLA Target

  • What it’s for: Defines the Service Level Agreement (SLA) for availability as a percentage.

  • What to include:

    • Specify the target uptime percentage for this system (e.g., 99.9%, 99.99%, or 99.999%, also called “five nines”).
    • State the period over which the SLA is measured (e.g., monthly or yearly).
    • Optionally include any contractual obligations or penalties for breaching the SLA.
  • Example: "The system will maintain 99.99% uptime on a monthly basis, allowing no more than about 4.3 minutes of downtime in a 30-day month."
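When choosing an SLA target, it helps to translate the percentage into a concrete downtime budget. The Python sketch below assumes the SLA is a plain uptime percentage over a fixed period (a 30-day month, a 365-day year); real contracts may define measurement windows, exclusions, and maintenance periods differently. The function name is illustrative.

```python
from datetime import timedelta


def allowed_downtime(sla_percent: float, period: timedelta) -> timedelta:
    """Maximum downtime an uptime SLA permits over the given measurement period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    unavailable_fraction = (100 - sla_percent) / 100
    return period * unavailable_fraction  # timedelta * float is supported


if __name__ == "__main__":
    month = timedelta(days=30)   # assumption: a 30-day month
    year = timedelta(days=365)   # assumption: a non-leap year
    for sla in (99.9, 99.99, 99.999):
        per_month = allowed_downtime(sla, month).total_seconds() / 60
        per_year = allowed_downtime(sla, year).total_seconds() / 3600
        print(f"{sla}%: {per_month:.2f} min per month, {per_year:.2f} h per year")
```

For a 99.99% target over a 30-day month this yields 259.2 seconds, roughly 4.3 minutes; each additional nine shrinks the budget by a factor of ten.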

3. Redundancy Strategy

  • What it’s for: Describes the redundancy mechanisms in place to ensure availability during failures.

  • What to include:

    • Explain how redundancy ensures the system has no single point of failure.
    • Specify the redundancy type:
      • Active-Active: multiple instances run and serve traffic at the same time.
      • Active-Passive: a standby instance takes over when the active instance fails.
      • N+1: one spare unit beyond the N required for normal load is ready to absorb a failure (e.g., an extra server in a cluster).
    • Mention hardware or software redundancy and any cloud provider redundancy features used (e.g., deployment across multiple availability zones).

  • Example: "The system employs an active-active redundancy strategy across three cloud regions, ensuring seamless failover in case of node or regional failures."
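To compare these redundancy types, a common first approximation treats each unit as independently available with probability a. Active-active (and an ideal active-passive pair) is then up while at least one unit is up, and an N+1 design while at least N of its N+1 units are up. The Python sketch below implements this standard k-out-of-n formula. The function names are illustrative, and the model assumes independent failures and flawless failover; correlated failures (a shared region, a common dependency, a bad deployment) are ignored, so treat the results as optimistic.

```python
from math import comb


def k_of_n_availability(a: float, k: int, n: int) -> float:
    """Probability that at least k of n independent units are up, each with availability a."""
    if not 0.0 <= a <= 1.0 or not 0 <= k <= n:
        raise ValueError("require 0 <= a <= 1 and 0 <= k <= n")
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def active_active_availability(a: float, replicas: int) -> float:
    """Up while at least one replica is up: equals 1 - (1 - a) ** replicas."""
    return k_of_n_availability(a, 1, replicas)


def n_plus_one_availability(a: float, n: int) -> float:
    """N+1: n units are needed for normal load, plus one spare."""
    return k_of_n_availability(a, n, n + 1)


if __name__ == "__main__":
    unit = 0.99  # assumed availability of a single unit
    print(f"single unit:        {unit:.4%}")
    print(f"2x active-active:   {active_active_availability(unit, 2):.4%}")
    print(f"3+1 (N+1, N=3):     {n_plus_one_availability(unit, 3):.4%}")
```

Under these assumptions, two active-active replicas that are each 99% available give about 99.99%, while an N+1 cluster needing 3 of 4 such units gives about 99.94%.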

4. Failover Mechanisms

  • What it’s for: Defines how the system handles failures and switches to backups or alternatives.

  • What to include:

    • Explain the system’s failover process:
      • What triggers the failover (e.g., heartbeat checks, monitoring alerts)?
      • What mechanism performs it (e.g., DNS failover, load balancers, automatic promotion of a replicated server)?
    • If applicable, include the failover time or Recovery Time Objective (RTO), the maximum acceptable time to restore service after a failure.

  • Example: "Failover is managed through DNS-based routing with an RTO of under 1 minute. The load balancer automatically routes traffic to healthy nodes when failures are detected."
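The trigger-and-switch logic described above can be sketched as a heartbeat monitor that sends traffic to the highest-priority node that has reported within a timeout. This is a minimal, hypothetical Python illustration (HeartbeatFailover and its methods are invented for this example), not how any particular DNS service or load balancer implements failover. The clock is injectable so the behavior can be exercised without waiting.

```python
import time
from typing import Callable, Dict, List, Optional


class HeartbeatFailover:
    """Hypothetical heartbeat-driven failover: route to the highest-priority healthy node."""

    def __init__(self, nodes: List[str], timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.nodes = nodes          # ordered by priority: primary first
        self.timeout_s = timeout_s  # silence longer than this marks a node as failed
        self.clock = clock
        now = clock()
        self.last_seen: Dict[str, float] = {node: now for node in nodes}

    def heartbeat(self, node: str) -> None:
        """Record a heartbeat (e.g. a passed health check) from a node."""
        self.last_seen[node] = self.clock()

    def is_healthy(self, node: str) -> bool:
        return self.clock() - self.last_seen[node] <= self.timeout_s

    def active_node(self) -> Optional[str]:
        """Return the node that should receive traffic, or None if every node has failed."""
        for node in self.nodes:
            if self.is_healthy(node):
                return node
        return None


if __name__ == "__main__":
    now = [0.0]  # fake clock for the demo
    failover = HeartbeatFailover(["primary", "standby"], timeout_s=10, clock=lambda: now[0])
    now[0] = 5.0
    failover.heartbeat("standby")  # only the standby keeps reporting
    now[0] = 12.0
    print(failover.active_node())  # primary silent for 12 s, so traffic moves to the standby
```

The timeout is a trade-off: a shorter window detects failures sooner, lowering the achievable RTO, but risks failing over on transient blips. With DNS-based failover, clients may also keep using cached records until the TTL expires, which adds to the effective recovery time.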

5. Geographical Distribution

  • What it’s for: Details how the system is distributed across geographical regions to enhance availability.

  • What to include:

    • List the locations (datacenters, cloud regions) where the system or its components are deployed.
    • Describe how regional distribution improves availability, for example by:
      • Lowering latency for users across the globe.
      • Reducing the impact of regional outages.
    • Mention any dependencies on Content Delivery Networks (CDNs), multi-region failover, or disaster recovery zones.

  • Example: "The application is deployed across three AWS regions (us-east-1, eu-west-1, ap-southeast-1) with failover between regions and CDN caching to serve requests from locations close to users."
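Distributing a system across regions only helps availability if traffic can be steered away from a failed region. The Python sketch below shows that routing decision, assuming per-region latency measurements and a set of regions currently passing health checks: it picks the lowest-latency healthy region. The function name and latency figures are hypothetical; in practice this decision is usually delegated to latency-based DNS routing or a global load balancer.

```python
from typing import Dict, Optional, Set


def pick_region(latency_ms: Dict[str, float], healthy: Set[str]) -> Optional[str]:
    """Return the lowest-latency region that is currently healthy, or None if none are."""
    candidates = [region for region in latency_ms if region in healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda region: latency_ms[region])


if __name__ == "__main__":
    # Hypothetical latencies measured from a user in Europe.
    latency = {"us-east-1": 85.0, "eu-west-1": 15.0, "ap-southeast-1": 170.0}
    all_regions = set(latency)
    print(pick_region(latency, all_regions))                  # normal operation: eu-west-1
    print(pick_region(latency, all_regions - {"eu-west-1"}))  # eu-west-1 outage: us-east-1
```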

6. Testing Strategy

  • What it’s for: Defines how availability is tested to ensure the system meets its availability targets.

  • What to include:

    • Specify the types of testing conducted:
      • Chaos engineering (deliberately injecting failures to test resilience).
      • Load testing (simulating heavy traffic).
      • Failover testing (verifying that backup and redundancy systems take over as intended).
    • State how often tests run and which tools are used (e.g., Gremlin for chaos experiments, JMeter for load testing).
    • Describe how tests simulate real-world failure scenarios to confirm the system remains available.

  • Example: "Availability is tested every two weeks with failover drills that confirm service continues during simulated outages. In addition, quarterly chaos engineering sessions evaluate system resilience under failure scenarios."
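Before running real failover drills or chaos experiments, a quick offline simulation can show whether a redundancy design is even capable of meeting its availability target. The Python sketch below is a Monte Carlo fault-injection simulation, not a chaos engineering tool: it randomly fails replicas with an assumed, independent per-replica failure probability and counts how often enough replicas survive. The function name, failure probability, and trial count are illustrative assumptions.

```python
import random


def simulate_availability(replicas: int, required: int, p_fail: float,
                          trials: int = 100_000, seed: int = 42) -> float:
    """Monte Carlo estimate of the fraction of trials in which at least `required`
    of `replicas` survive, when each replica fails independently with probability p_fail."""
    if not 0.0 <= p_fail <= 1.0 or not 0 <= required <= replicas or trials <= 0:
        raise ValueError("invalid simulation parameters")
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    up_trials = 0
    for _ in range(trials):
        # Inject faults: each replica independently fails with probability p_fail.
        survivors = sum(1 for _ in range(replicas) if rng.random() >= p_fail)
        if survivors >= required:
            up_trials += 1
    return up_trials / trials


if __name__ == "__main__":
    target = 0.999  # e.g. a 99.9% SLA target
    estimate = simulate_availability(replicas=3, required=1, p_fail=0.05)
    print(f"estimated availability: {estimate:.5f}, meets target: {estimate >= target}")
```

A simulated estimate is only as good as its failure model; it complements, and does not replace, exercising the real system with tools such as Gremlin.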

General Notes for Architects

  • Be specific: Provide as much detail as possible while remaining concise.
  • Use real-world examples: Reference specific configurations, tools, or practices already in place.
  • Tie information to business value: Always describe how meeting these availability goals impacts the user experience or business operations.