
Data Sets

What Is a Data Set in ArchRepo?

A Data Set is a named, scoped collection of data — tables, entities, documents, files, message payloads, or API contracts — grouped for a specific purpose in the solution. Data Sets sit in the Data concern and are the primary way to define, document, and visualise the data structures that the solution uses.

Examples of Data Sets:

  • “Customer Master — relational tables holding the canonical customer record”
  • “Invoice Import File — CSV file format for supplier invoice batch uploads”
  • “Order Event Payload — JSON message schema published to the order event stream”
  • “Product Catalogue — the relational star schema for the data warehouse product dimension”
  • “Field Engineer Report — XML document structure for field service completion reports”

Data Sets are referenced using the prefix DSet-, giving DSet-1, DSet-2, and so on.
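
References such as DSet-1 and DSet-2 are easy to validate and sort in scripts that work with an exported register. The following is a minimal Python sketch; the helper names are hypothetical, and it assumes the number is a positive integer without padding, which the documentation does not state explicitly.

```python
import re

# Assumed pattern: the literal prefix "DSet-" followed by a positive integer.
DSET_REF = re.compile(r"DSet-([1-9]\d*)")


def parse_dset_ref(ref: str) -> int:
    """Return the numeric part of a Data Set reference such as 'DSet-2'."""
    match = DSET_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a Data Set reference: {ref!r}")
    return int(match.group(1))


def format_dset_ref(number: int) -> str:
    """Build a reference string from a number (hypothetical helper)."""
    if number < 1:
        raise ValueError("Data Set numbers are assumed to start at 1")
    return f"DSet-{number}"


# Sort references numerically rather than alphabetically.
refs = ["DSet-10", "DSet-2", "DSet-1"]
sorted_refs = sorted(refs, key=parse_dset_ref)
```

Sorting on the parsed number keeps DSet-10 after DSet-2, which a plain string sort would not.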


The Data Modeller

Within each Data Set, you define the entities, attributes, and relationships that make up the data structure. The data modeller supports all structure types — not just relational tables, but also JSON documents, XML schemas, file formats, message payloads, and API contracts.

The data modeller produces two outputs:

  • Visual entity diagram — an interactive entity-relationship diagram showing all entities and their connections, rendered directly in the Data Set view
  • Entity report — a complete, structured report of every entity with its attributes (name, type, constraints) and all entity relationships; useful for design reviews, developer handoff, and data governance documentation

Populate the data model as early as possible — even a draft model surfaces missing attributes, naming inconsistencies, and relationship ambiguities before development begins.
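
To make the entity, attribute, and relationship structure concrete, here is a minimal Python sketch of a draft model and a plain-text entity report. The class names, field names, and report layout are assumptions for illustration only; they are not ArchRepo's internal representation or export format.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    data_type: str
    constraints: list[str] = field(default_factory=list)


@dataclass
class Entity:
    name: str
    attributes: list[Attribute] = field(default_factory=list)


@dataclass
class Relationship:
    source: str  # name of the entity the relationship starts from
    target: str  # name of the entity it points to
    label: str


def entity_report(entities: list[Entity], relationships: list[Relationship]) -> str:
    """List each entity with its attributes and outgoing relationships."""
    lines = []
    for entity in entities:
        lines.append(f"Entity: {entity.name}")
        for attr in entity.attributes:
            constraints = ", ".join(attr.constraints) or "none"
            lines.append(f"  {attr.name}: {attr.data_type} [{constraints}]")
        for rel in relationships:
            if rel.source == entity.name:
                lines.append(f"  {rel.label} -> {rel.target}")
    return "\n".join(lines)


def dangling_relationships(entities, relationships):
    """Return relationships that mention an entity missing from the draft."""
    known = {entity.name for entity in entities}
    return [r for r in relationships if r.source not in known or r.target not in known]


customer = Entity("Customer", [
    Attribute("customer_id", "integer", ["primary key"]),
    Attribute("email", "string", ["unique", "not null"]),
])
address = Entity("Address", [Attribute("postcode", "string")])
relationships = [
    Relationship("Customer", "Address", "has"),
    Relationship("Customer", "Order", "places"),  # "Order" is not modelled yet
]

report = entity_report([customer, address], relationships)
missing = dangling_relationships([customer, address], relationships)
```

Even in a draft this small, the dangling relationship to the unmodelled Order entity shows up immediately, which is the kind of gap an early model is meant to expose.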


Structure

The Structure field classifies the type of data this set contains. Selecting the correct structure type helps communicate the nature of the data and ensures the data model is interpreted correctly by the team.

Group                 Structure options
Relational            Relational Tables, Relational Tables (Geospatial), Relational Star Schema
Other database types  Structured Documents, Graph Databases, Key-Value Stores
File formats          CSV, JSON, XML, Text, Excel, Word, PDF, Parquet, Avro, Delta Lake, Delta Sharing, ORC, GeoDB, Binary, and others
Message payloads      JSON, XML, Text, Binary
API contracts         JSON, XML, CSV, Text, Binary
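
Several structure names (JSON, XML, Text, Binary) appear under more than one group, so the group is part of the classification. The Python sketch below encodes the options listed above as a lookup table. It is an illustration, not an ArchRepo API: the function name is hypothetical, and the file formats entry covers only the formats named here, since the full list includes others.

```python
# Structure options by group, copied from the table above.
# "File formats" is not exhaustive: the source list ends with "and others".
STRUCTURE_OPTIONS = {
    "Relational": {
        "Relational Tables",
        "Relational Tables (Geospatial)",
        "Relational Star Schema",
    },
    "Other database types": {
        "Structured Documents",
        "Graph Databases",
        "Key-Value Stores",
    },
    "File formats": {
        "CSV", "JSON", "XML", "Text", "Excel", "Word", "PDF", "Parquet",
        "Avro", "Delta Lake", "Delta Sharing", "ORC", "GeoDB", "Binary",
    },
    "Message payloads": {"JSON", "XML", "Text", "Binary"},
    "API contracts": {"JSON", "XML", "CSV", "Text", "Binary"},
}


def groups_for(structure: str) -> list[str]:
    """Return every group that offers the given structure option."""
    return [
        group
        for group, options in STRUCTURE_OPTIONS.items()
        if structure in options
    ]


json_groups = groups_for("JSON")
star_schema_groups = groups_for("Relational Star Schema")
```

A JSON file, a JSON message payload, and a JSON API contract are three different classifications, so recording only "JSON" would be ambiguous.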

Business Information Bridge

Data Sets implement Business Information — the higher-level information concept that describes what the business works with (documents, records, messages, verbal exchanges). The Contains Business Information relationship records which business information items a Data Set is the technical implementation of.

The Data Sets v Business Information collection view provides a completeness check: any Business Information item without a linked Data Set has no defined technical implementation yet. Any Data Set without a linked Business Information item may be an orphaned data structure without a clear business purpose.
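
The completeness check amounts to two set comparisons. The Python sketch below runs them over a hypothetical export of the Contains Business Information links; the item names and the dictionary layout are invented for illustration and do not reflect an actual ArchRepo export.

```python
# Hypothetical export: Data Set reference -> business information items
# that the Data Set implements (the "Contains Business Information" links).
contains_business_information = {
    "DSet-1": {"Customer Record"},
    "DSet-2": {"Supplier Invoice"},
    "DSet-3": set(),
}

# Hypothetical list of all Business Information items in the solution.
business_information = {"Customer Record", "Supplier Invoice", "Service Report"}

# Every item that at least one Data Set implements.
implemented = set().union(*contains_business_information.values())

# Business Information with no technical implementation yet.
unimplemented = business_information - implemented

# Data Sets with no linked Business Information (possible orphans).
orphaned = sorted(
    ref for ref, items in contains_business_information.items() if not items
)
```

In this example, Service Report has no implementing Data Set and DSet-3 has no business purpose recorded, so both would need attention in a review.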


Data Stores

Data Sets belong to Data Stores — the physical or logical stores (databases, file systems, object storage) that host the data. The Data Sets v Data Stores collection view maps which Data Sets live in which Data Stores, completing the data architecture picture from logical model to physical location.


Transition States

Data Sets support Transition States, which are useful for documenting how data structures change as the solution evolves:

  • A new Data Set may be introduced (new table schema, new message format)
  • An existing Data Set may be extended or restructured
  • A legacy Data Set may be retired once migration is complete

Use Transition States to make the evolution of the data model explicit.
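
One way to picture this is as a comparison between the Data Sets present in two states. The Python sketch below is a simplified illustration: the state names "Baseline" and "Target" and the dictionary layout are assumptions, not ArchRepo terminology or data.

```python
# Hypothetical register: which Data Sets exist in each Transition State.
data_sets_by_state = {
    "Baseline": {"DSet-1", "DSet-2"},
    "Target": {"DSet-2", "DSet-3"},
}


def diff_states(before: set[str], after: set[str]) -> dict[str, list[str]]:
    """Classify Data Sets by comparing membership in two states."""
    return {
        "introduced": sorted(after - before),
        "retired": sorted(before - after),
        "carried_over": sorted(before & after),
    }


changes = diff_states(
    data_sets_by_state["Baseline"], data_sets_by_state["Target"]
)
```

A membership comparison like this only finds Data Sets that appear or disappear. A carried-over Data Set may still have been extended or restructured, which has to be documented on the Data Set itself.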


Categories

Data Sets can be assigned to categories to group them by domain or theme. The Data Sets by Category view organises the register by category, making it easier to review all data structures within a particular domain (e.g. Finance, Customer, Product, Infrastructure).
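
The grouping behind a by-category view can be sketched in a few lines of Python. The assignments below are invented, and the sketch assumes a Data Set may belong to more than one category, which the documentation implies but does not state outright.

```python
from collections import defaultdict

# Hypothetical category assignments: Data Set reference -> categories.
data_set_categories = {
    "DSet-1": ["Customer"],
    "DSet-2": ["Finance"],
    "DSet-3": ["Customer", "Product"],
}


def by_category(assignments: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the assignments into category -> sorted Data Set references."""
    grouped = defaultdict(list)
    for ref, categories in assignments.items():
        for category in categories:
            grouped[category].append(ref)
    return {category: sorted(refs) for category, refs in sorted(grouped.items())}


register = by_category(data_set_categories)
```

Here DSet-3 appears under both Customer and Product, so a review of either domain would include it.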


Fields Reference

See Data Set Fields for a description of each field and guidance on what to record.
