Schema Design Principles

⚠️🚧 Work in Progress
This document is a work in progress. Content may change, and some sections may be incomplete.

Validation philosophy, refinement patterns, and extensibility choices

The schema layer is the foundation that enables both runtime validation and compile-time type safety in the ATT&CK Data Model. The design of these schemas reflects careful consideration of competing concerns: STIX compliance, ATT&CK domain rules, TypeScript integration, extensibility, and performance. This explanation explores the principles that guide schema design decisions.

Core Design Philosophy

Validation as Documentation

Principle: Schemas serve as executable documentation of ATT&CK requirements.

Rather than maintaining separate documentation about data format requirements, the schemas encode these requirements directly. This ensures that documentation stays synchronized with implementation and provides immediate feedback when requirements change.

// Schema encodes the requirement that techniques must have ATT&CK IDs
const techniqueSchema = z.object({
  external_references: z.array(externalReferenceSchema).min(1)
}).superRefine((data, ctx) => {
  const firstRef = data.external_references[0];
  if (firstRef.source_name !== 'mitre-attack' || !firstRef.external_id) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: 'First external reference must contain ATT&CK ID'
    });
  }
});

This approach means that anyone using the schemas automatically understands ATT&CK requirements through validation feedback, rather than needing to consult separate documentation.
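
For instance, a consumer who passes a non-conforming object receives the requirement back as a validation message. A minimal sketch using the schema above (the capec reference is illustrative):

// Hypothetical invalid input: the first reference is not the ATT&CK ID
const result = techniqueSchema.safeParse({
  external_references: [{ source_name: 'capec', external_id: 'CAPEC-17' }]
});
if (!result.success) {
  // Surfaces 'First external reference must contain ATT&CK ID'
  console.log(result.error.issues.map(issue => issue.message));
}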

Type Inference Over Type Declaration

Principle: TypeScript types are inferred from schemas, not declared separately.

This prevents the common problem of types becoming out of sync with runtime validation:

// ✅ Single source of truth - types inferred from schema
const techniqueSchema = z.object({
  name: z.string(),
  x_mitre_platforms: z.array(z.string()).optional()
});
type Technique = z.infer<typeof techniqueSchema>;

// ❌ Avoided - separate type declarations that can drift
interface Technique {
  name: string;
  x_mitre_platforms?: string[];
}
const techniqueSchema = z.object(/* ... */);

Benefits:

  • Impossible for types to become inconsistent with validation
  • Schema changes automatically propagate to TypeScript types
  • Refactoring tools work across both validation and type layers

Trade-offs:

  • More complex TypeScript types that reflect schema structure
  • Limited ability to create "prettier" type declarations
  • Schema design is constrained by TypeScript type inference capabilities

Schema Architecture Patterns

Hierarchical Schema Composition

ATT&CK schemas are built through composition rather than inheritance, reflecting the layered nature of STIX + ATT&CK requirements:

// Base STIX requirements
const stixDomainObjectSchema = z.object({
  type: z.string(),
  spec_version: z.literal('2.1'),
  id: stixIdentifierSchema,
  created: stixTimestampSchema,
  modified: stixTimestampSchema
});

// ATT&CK-specific additions
const attackBaseObjectSchema = z.object({
  name: z.string().min(1),
  description: z.string().min(1),
  x_mitre_attack_spec_version: z.string(),
  x_mitre_version: z.string(),
  x_mitre_domains: z.array(z.enum(['enterprise-attack', 'mobile-attack', 'ics-attack'])),
  external_references: z.array(externalReferenceSchema).min(1)
});

// Object-specific requirements
const techniqueSchema = stixDomainObjectSchema
  .merge(attackBaseObjectSchema)
  .extend({
    type: z.literal('attack-pattern'),
    kill_chain_phases: z.array(killChainPhaseSchema),
    x_mitre_platforms: z.array(z.string()).optional(),
    x_mitre_is_subtechnique: z.boolean().optional()
  });

Design Rationale:

  • Composition over inheritance enables flexible schema reuse
  • Layer separation makes it clear which requirements come from which standards
  • Merge vs extend choice depends on whether properties can conflict (see the sketch below)
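
A minimal sketch of the merge/extend distinction, assuming Zod 3 semantics: extend takes a raw shape while merge takes another object schema, and in both cases the incoming definition wins on conflicting keys:

import { z } from 'zod';

// .extend takes a raw shape; matching keys override the base
const base = z.object({ type: z.string(), name: z.string() });
const narrowed = base.extend({ type: z.literal('attack-pattern') });

// .merge takes another ZodObject; its properties win on conflict
const stixPart = z.object({ type: z.string() });
const attackPart = z.object({ type: z.literal('attack-pattern') });
const combined = stixPart.merge(attackPart); // 'type' is now the literal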

Refinement Pattern for Complex Validation

Simple property validation isn't sufficient for ATT&CK's business rules. The refinement pattern handles cross-property and contextual validation:

const techniqueWithRefinements = techniqueSchema
  .superRefine(createAttackIdRefinement())
  .superRefine(createTacticRefinement())
  .superRefine(createSubtechniqueRefinement());

// Refinements are factory functions for reusability
function createAttackIdRefinement() {
  return (data: any, ctx: z.RefinementCtx) => {
    const firstRef = data.external_references?.[0];
    if (!firstRef || firstRef.source_name !== 'mitre-attack') {
      ctx.addIssue({
        code: z.ZodIssueCode.custom,
        message: 'First external reference must be ATT&CK ID',
        path: ['external_references', 0]
      });
    }

    // Validate ATT&CK ID format (optional chaining guards the missing-reference case above)
    const attackId = firstRef?.external_id;
    if (attackId && !isValidAttackId(attackId, data.type)) {
      ctx.addIssue({
        code: z.ZodIssueCode.custom,
        message: `Invalid ATT&CK ID format: ${attackId}`,
        path: ['external_references', 0, 'external_id']
      });
    }
  };
}

Why Factory Functions:

  • Reusability across multiple schemas
  • Testability in isolation from schemas (see the test sketch after this list)
  • Composability - refinements can be mixed and matched
  • Extensibility - users can add their own refinements
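
As an illustration of testing in isolation, a refinement factory can be exercised against a stubbed context without building any schema. A sketch assuming a Jest-style runner and the createAttackIdRefinement factory above:

import { z } from 'zod';

it('flags a first reference that is not mitre-attack', () => {
  const refine = createAttackIdRefinement();
  const issues: unknown[] = [];
  // Stub just enough of RefinementCtx to capture reported issues
  const ctx = {
    addIssue: (issue: unknown) => { issues.push(issue); },
    path: []
  } as unknown as z.RefinementCtx;

  refine({ type: 'attack-pattern', external_references: [{ source_name: 'capec' }] }, ctx);

  expect(issues).toHaveLength(1);
});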

Extensibility Architecture

The Refinement Challenge

Zod's extension methods (extend, merge, pick, omit) discard refinements, creating a challenge for extensibility:

// ❌ This loses refinements
const extendedTechnique = techniqueSchema.extend({
  x_custom_field: z.string()
});

// ✅ This preserves refinements
const extendedTechnique = techniqueSchema
  .extend({ x_custom_field: z.string() })
  .superRefine(createAttackIdRefinement()) // Must reapply manually
  .superRefine(createTacticRefinement());

Design Response:

  1. Export refinement factories so users can reapply them
  2. Document which refinements each schema uses
  3. Provide extension helpers that automate refinement reapplication

// Helper function for common extension pattern
export function extendAttackSchema<T extends z.ZodRawShape, U extends z.ZodRawShape>(
  baseSchema: z.ZodObject<T>,
  extensions: U,
  refinements: Array<(data: any, ctx: z.RefinementCtx) => void>
) {
  // superRefine returns a ZodEffects, so widen the variable's type
  let extended: z.ZodTypeAny = baseSchema.extend(extensions);
  for (const refinement of refinements) {
    extended = extended.superRefine(refinement);
  }
  return extended;
}

// Usage
const customTechnique = extendAttackSchema(
  techniqueSchema,
  { x_org_threat_level: z.enum(['low', 'medium', 'high']) },
  [createAttackIdRefinement(), createTacticRefinement()]
);

Custom Property Conventions

STIX custom property naming conventions are enforced through schema design:

// Custom properties must use x_ prefix
const customPropertySchema = z.record(
  z.string().regex(/^x_[a-z0-9_]+$/), // Enforce naming convention
  z.unknown() // Allow any value type
);

const extensibleObjectSchema = baseSchema.and(customPropertySchema);
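
As a quick usage check (a sketch; the property names are illustrative), the record's key schema accepts conforming names and rejects the rest:

customPropertySchema.safeParse({ x_org_threat_level: 'high' }).success; // true
customPropertySchema.safeParse({ orgThreatLevel: 'high' }).success; // false - key lacks the x_ prefix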

Benefits:

  • Standards compliance is automatically enforced
  • Namespace conflicts are prevented
  • Tool interoperability is preserved

Limitations:

  • Naming restrictions may feel constraining
  • TypeScript integration is more complex with dynamic properties
  • Validation messages may be unclear for naming violations

Validation Philosophy

Strict vs Relaxed Modes

The schema design supports two validation philosophies through parsing modes:

Strict Mode Philosophy

"Data integrity is paramount - invalid data must be rejected completely"

// Strict mode - all objects must validate
const result = techniqueSchema.parse(data); // Throws on any error

Use cases:

  • Production applications requiring data quality guarantees
  • Integration with systems that can't handle partial data
  • Testing and validation scenarios

Relaxed Mode Philosophy

"Partial data is better than no data - log errors but continue processing"

// Relaxed mode - continue with valid objects
const result = techniqueSchema.safeParse(data);
if (!result.success) {
  console.warn('Validation warnings:', result.error.errors);
  return null; // Skip invalid objects but continue processing
}
return result.data;

Use cases:

  • Research and development environments
  • Integration with data sources that may have minor quality issues
  • Gradual data migration scenarios

Error Message Design

Schema error messages are designed for developer clarity rather than end-user consumption:

// Good error message design
ctx.addIssue({
  code: z.ZodIssueCode.custom,
  message: 'Technique must specify at least one platform in x_mitre_platforms',
  path: ['x_mitre_platforms']
});

// Poor error message design
ctx.addIssue({
  code: z.ZodIssueCode.custom,
  message: 'Invalid', // Too vague
  path: [] // No specific location
});

Design principles:

  • Specific location - use path arrays to pinpoint issues
  • Actionable guidance - explain what needs to be fixed
  • Context awareness - reference ATT&CK concepts, not just schema structure
  • Technical accuracy - precise terminology for developers

Performance Considerations

Schema Compilation Strategy

Schemas are designed to compile once and reuse many times:

// ✅ Compile schema once
const compiledSchema = techniqueSchema;

// ✅ Reuse for multiple validations
data.forEach(item => compiledSchema.parse(item));

// ❌ Don't recompile schemas
data.forEach(item => z.object({/* ... */}).parse(item));

Optimization strategies:

  • Schema caching to avoid recompilation (a cache sketch follows this list)
  • Partial compilation for frequently-used sub-schemas
  • Lazy loading for rarely-used validation rules
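
For example, a minimal caching sketch (keying by STIX type and the buildSchema parameter are assumptions for illustration):

import { z } from 'zod';

const schemaCache = new Map<string, z.ZodTypeAny>();

function getCachedSchema(stixType: string, buildSchema: () => z.ZodTypeAny): z.ZodTypeAny {
  let schema = schemaCache.get(stixType);
  if (!schema) {
    schema = buildSchema(); // construct only once per object type
    schemaCache.set(stixType, schema);
  }
  return schema;
}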

Memory vs Speed Tradeoffs

Schema design balances memory usage against validation speed:

// Memory-efficient but slower - shared schema instances
const baseSchema = z.object({/* large schema */});
const schema1 = baseSchema.extend({/* additions */});
const schema2 = baseSchema.extend({/* different additions */});

// Faster but memory-intensive - separate compiled schemas
const schema1 = z.object({/* complete schema */});
const schema2 = z.object({/* complete schema with duplicated parts */});

The library chooses memory efficiency over raw speed because:

  • ATT&CK datasets are large but validation happens infrequently
  • Schema compilation time is amortized across many validations
  • Memory usage affects application scalability more than validation speed

Testing Strategy for Schemas

Positive and Negative Test Cases

Each schema includes comprehensive test coverage:

describe('techniqueSchema', () => {
  it('accepts valid technique objects', () => {
    const validTechnique = {
      type: 'attack-pattern',
      id: 'attack-pattern--12345678-1234-1234-1234-123456789012',
      spec_version: '2.1',
      created: '2023-01-01T00:00:00.000Z',
      modified: '2023-01-01T00:00:00.000Z',
      name: 'Process Injection',
      description: 'Adversaries may inject code...',
      external_references: [{
        source_name: 'mitre-attack',
        external_id: 'T1055',
        url: 'https://attack.mitre.org/techniques/T1055'
      }],
      x_mitre_attack_spec_version: '3.3.0',
      x_mitre_version: '1.0'
    };

    expect(() => techniqueSchema.parse(validTechnique)).not.toThrow();
  });

  it('rejects techniques without ATT&CK IDs', () => {
    const invalidTechnique = {
      // Valid STIX structure but missing ATT&CK ID
      type: 'attack-pattern',
      id: 'attack-pattern--12345678-1234-1234-1234-123456789012',
      // ... other required fields
      external_references: [{
        source_name: 'other-source', // Not mitre-attack
        external_id: 'OTHER-001'
      }]
    };

    expect(() => techniqueSchema.parse(invalidTechnique)).toThrow();
  });
});

Real-World Data Testing

Schemas are tested against actual ATT&CK releases:

// Test against production ATT&CK data
describe('schema compatibility', () => {
  it('validates current ATT&CK enterprise data', async () => {
    const enterpriseData = await loadAttackData('enterprise-attack', '15.1');

    enterpriseData.objects.forEach(obj => {
      if (obj.type === 'attack-pattern') {
        expect(() => techniqueSchema.parse(obj)).not.toThrow();
      }
    });
  });
});

This ensures schemas remain compatible with real ATT&CK data evolution.

Future Evolution Patterns

Schema Versioning Strategy

Schemas evolve alongside ATT&CK specifications:

// Version-aware schema selection
// NOTE: string comparison is lexicographic; it is only safe while every
// version component stays single-digit ('3.10.0' sorts before '3.3.0')
const getSchemaForVersion = (attackVersion: string) => {
  if (attackVersion >= '3.3.0') {
    return techniqueSchemaV3_3;
  } else if (attackVersion >= '3.0.0') {
    return techniqueSchemaV3_0;
  } else {
    return legacyTechniqueSchema;
  }
};
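
Because the string comparison above is lexicographic, a numeric comparison is safer once any version component exceeds one digit. A minimal sketch (versionGte is a hypothetical helper, not part of the library):

// Returns true when version a >= version b, comparing components numerically
function versionGte(a: string, b: string): boolean {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const da = pa[i] ?? 0;
    const db = pb[i] ?? 0;
    if (da !== db) return da > db;
  }
  return true; // versions are equal
}

// e.g. versionGte('3.10.0', '3.3.0') === true, whereas '3.10.0' >= '3.3.0' is false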

Migration strategy:

  • Backward compatibility for older ATT&CK versions
  • Gradual deprecation of obsolete schema features
  • Clear migration paths between schema versions

Extension Point Evolution

Schema extension mechanisms evolve to support new use cases:

// Current extension pattern
const extended = baseSchema.extend(customFields).superRefine(customRefinement);

// Future extension pattern (conceptual)
const extended = baseSchema.withExtensions({
  fields: customFields,
  refinements: [customRefinement],
  metadata: { version: '2.0', namespace: 'org' }
});

Living with Schema Complexity

When Schemas Feel Too Complex

The ATT&CK Data Model schemas can feel overwhelming because they encode the full complexity of STIX + ATT&CK requirements. This complexity is intentional and necessary:

  • STIX compliance requires numerous fields and validation rules
  • ATT&CK domain rules add additional constraints beyond STIX
  • Type safety demands precise specification of all possibilities
  • Extensibility needs mechanisms that maintain standards compliance

Simplification Strategies

When working with the schemas feels too complex, consider:

  1. Use higher-level APIs - Implementation classes abstract schema complexity
  2. Focus on your use case - You don't need to understand all schema features
  3. Start simple - Begin with basic objects and add complexity gradually
  4. Leverage validation feedback - Let schema errors guide you to correct usage

The Schema Design Philosophy in Practice

The schema design reflects a philosophical commitment to:

  1. Correctness over convenience - Better to be precise than easy
  2. Standards compliance over optimization - STIX compatibility is non-negotiable
  3. Type safety over flexibility - Catch errors at compile time when possible
  4. Documentation through code - Schemas serve as executable specifications
  5. Future-proofing over current simplicity - Design for evolution and extension

Understanding these philosophical commitments helps you work effectively with the schema layer and contribute to its continued development.