AETHER
Adaptive Epistemic Trust for Human-AI Evaluation and Review
Process-JEPA: Extending LeCun's Joint Embedding Architecture to Business Event Prediction

Why JEPA? · The Problem · How It Works · Quick Start · Architecture


JEPA for Processes

If JEPA can learn to predict the physical world from images and video, can it predict how business processes unfold?

AETHER is a JEPA implementation for discrete business event sequences. It adapts ideas from Yann LeCun's Joint Embedding Predictive Architecture to enterprise workflow prediction: purchase-to-pay, order-to-cash, and procurement processes.

The key ideas from the JEPA ecosystem that AETHER adapts:

| JEPA Concept | Original Domain | AETHER Application |
|---|---|---|
| Joint Embedding | Images (I-JEPA), Video (V-JEPA) | Business event sequences |
| Latent-space prediction | Pixel masking, frame prediction | Event transition: f(z_t, action, variant) → z_{t+1} |
| Energy-based scoring | LeCun's EBM framework (2006) | Process conformance anomaly detection |
| SIGReg loss | LeJEPA (Balestriero & LeCun, 2025) | Latent collapse prevention via eigenvalue regularization |
| VICReg loss | VICReg (Bardes, Ponce & LeCun, 2022) | Variance-Invariance-Covariance as alternative regularizer |

The novel contribution: AETHER combines these JEPA components with epistemic uncertainty decomposition and adaptive governance. The model decomposes uncertainty into what's reducible (epistemic) vs. what's inherently random (aleatoric), and uses that decomposition to dynamically tighten or relax governance thresholds — no static thresholds, no manual tuning. The system earns trust through demonstrated calibration.

"A possible path towards building a world model is to learn hierarchical representations of the world that capture both short-term and long-term dependencies." — LeCun, A Path Towards Autonomous Machine Intelligence (2022)

AETHER explores the complementary question: can JEPA model enterprise workflows, where the "world" is a structured sequence of business events?


The Problem

Every AI governance system today uses static thresholds:

  • Flag if confidence < 0.90
  • Review if drift > 0.15
  • Block if uncertainty > 0.80

These break immediately. A well-calibrated model gets held back by thresholds tuned for a bad one. A degrading model sails through gates set during its best day.

Worse: not all uncertainty is equal. A model that's uncertain because it hasn't seen enough data (epistemic) should trigger more review — human judgment helps. A model that's uncertain because the process is inherently random (aleatoric) should not trigger more review — no amount of human oversight reduces coin-flip randomness.

No existing system makes this distinction.
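
To make the distinction concrete, here is a minimal sketch of the standard law-of-total-variance split over an ensemble of predictive distributions (e.g. MC-dropout samples). The function and field names are illustrative, not AETHER's actual API; the real decomposer lives in the Python critic/ module.

```typescript
// Minimal sketch: epistemic vs. aleatoric split via the law of total variance.
// `samples` holds class-probability vectors from N stochastic forward passes
// (MC dropout or an ensemble). Names are illustrative, not AETHER's API.
function decomposeUncertainty(samples: number[][]): {
  epistemic: number;
  aleatoric: number;
  total: number;
} {
  const n = samples.length;
  const k = samples[0].length;
  let epistemic = 0;
  let aleatoric = 0;
  for (let c = 0; c < k; c++) {
    const probs = samples.map((s) => s[c]);
    const mean = probs.reduce((a, b) => a + b, 0) / n;
    // Variance across samples of the predicted probability: reducible with more data.
    epistemic += probs.reduce((a, p) => a + (p - mean) ** 2, 0) / n;
    // Mean across samples of p(1-p): irreducible randomness in the process itself.
    aleatoric += probs.reduce((a, p) => a + p * (1 - p), 0) / n;
  }
  return { epistemic, aleatoric, total: epistemic + aleatoric };
}

// Only the epistemic share would feed governance tightening:
// const ratio = epistemic / (epistemic + aleatoric);
```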


How It Works

The Core Formula (v3)

effective_threshold = base × mode_factor × uncertainty_factor × calibration_factor
min_floor = 0.50 + 0.05 × log(vocab_size / 20) / log(4)   # v3: vocabulary-aware

Each factor is independently computed and composable:

| Factor | What it captures | Effect |
|---|---|---|
| Mode | Operational context (flexible → strict → forbidden) | Symbolic governance from PromptSpeak modes |
| Uncertainty | Epistemic ratio of total uncertainty | Only reducible uncertainty tightens governance |
| Calibration | Recent ECE/MCE/Brier score | Poorly calibrated models get tighter oversight |
| Vocabulary | Activity taxonomy complexity (v3) | High-vocab datasets get conservative floors |

The key insight: aleatoric uncertainty is ignored in governance tightening. This is the formal contribution. It means the system won't waste human attention on inherently random outcomes.

v3 addition: The vocabulary-aware minimum floor prevents regressions on high-activity datasets. At 80+ activities, the floor rises to match static thresholds, implementing a "do no harm" principle.
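
As a rough sketch of how the factors compose, the snippet below mirrors the v3 formula. The uncertainty-factor mapping and the final floor application are assumptions for illustration; the shipped factor definitions and clamps live in aether.config.ts.

```typescript
// Illustrative composition of the v3 formula. Real factor definitions and
// clamp bounds live in mcp-server/src/governance/aether.config.ts.
function effectiveThreshold(opts: {
  base: number;              // e.g. BASE_THRESHOLDS.reviewGateAutoPass
  modeFactor: number;        // from the governance mode (flexible → strict → forbidden)
  epistemicRatio: number;    // epistemic / total uncertainty, in [0, 1]
  calibrationFactor: number; // derived from recent ECE/MCE/Brier
  vocabSize: number;         // number of distinct activities in the dataset
}): number {
  const { base, modeFactor, epistemicRatio, calibrationFactor, vocabSize } = opts;

  // Only the reducible (epistemic) share tightens governance. The 0.5 strength
  // mirrors COEFFICIENTS.uncertaintyStrength above; the linear mapping is assumed.
  const uncertaintyFactor = 1 + 0.5 * epistemicRatio;

  const raw = base * modeFactor * uncertaintyFactor * calibrationFactor;

  // v3 vocabulary-aware floor: high-activity taxonomies get a conservative minimum.
  const minFloor = 0.5 + 0.05 * (Math.log(vocabSize / 20) / Math.log(4));

  return Math.max(raw, minFloor);
}
```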

Asymmetric Trust

Trust is earned slowly and lost quickly:

SUPERVISED ──[10 calibrated windows]──> GUIDED
GUIDED     ──[20 calibrated windows]──> COLLABORATIVE
COLLABORATIVE ──[50 calibrated windows]──> AUTONOMOUS

Any level ──[1 critical miss]──> immediate demotion
Any level ──[immutable violation]──> reset to SUPERVISED

This mirrors real-world trust: it takes months to build and seconds to destroy.
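
A minimal sketch of that asymmetric state machine follows, using the window counts from the ladder above. The class and method names are illustrative; the real controller lives in the governance/ module, and how far a critical miss demotes is an assumption here.

```typescript
// Illustrative autonomy ladder with asymmetric transitions: promotion requires
// sustained calibration, demotion is immediate. Not the shipped controller.
type AutonomyLevel = "SUPERVISED" | "GUIDED" | "COLLABORATIVE" | "AUTONOMOUS";

const LADDER: AutonomyLevel[] = ["SUPERVISED", "GUIDED", "COLLABORATIVE", "AUTONOMOUS"];

const PROMOTION_WINDOWS: Record<AutonomyLevel, number> = {
  SUPERVISED: 10,      // calibrated windows needed to reach GUIDED
  GUIDED: 20,          // ... to reach COLLABORATIVE
  COLLABORATIVE: 50,   // ... to reach AUTONOMOUS
  AUTONOMOUS: Infinity,
};

class AutonomyController {
  level: AutonomyLevel = "SUPERVISED";
  private calibratedWindows = 0;

  onCalibratedWindow(): void {
    this.calibratedWindows++;
    if (this.calibratedWindows >= PROMOTION_WINDOWS[this.level]) {
      const next = Math.min(LADDER.indexOf(this.level) + 1, LADDER.length - 1);
      this.level = LADDER[next];
      this.calibratedWindows = 0; // trust is re-earned at each level
    }
  }

  onCriticalMiss(): void {
    // One critical miss demotes immediately (one level down in this sketch).
    this.level = LADDER[Math.max(LADDER.indexOf(this.level) - 1, 0)];
    this.calibratedWindows = 0;
  }

  onImmutableViolation(): void {
    // Any immutable-constraint violation resets trust entirely.
    this.level = "SUPERVISED";
    this.calibratedWindows = 0;
  }
}
```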

Safety Floor

Some constraints never relax, regardless of trust level or calibration:

  • Forbidden mode → always block
  • Sensitive data patterns (SSN, API keys, private keys) → always hold
  • Dempster-Shafer conflict > 0.7 → always review
  • Circuit breaker floor → 3+ consecutive failures = block
  • Uncertainty ceiling > 0.95 → always hold
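
A minimal sketch of how such a safety floor can sit in front of the adaptive gate: the GateContext shape and field names are assumptions, and only the rules themselves come from the list above.

```typescript
// Illustrative safety floor: these checks run before any adaptive modulation
// and cannot be relaxed by trust level or calibration. Field names are assumed.
type GateDecision = "allow" | "hold" | "block" | "review";

interface GateContext {
  mode: "flexible" | "strict" | "forbidden";
  containsSensitivePattern: boolean; // SSN, API keys, private keys, ...
  dsConflict: number;                // Dempster-Shafer conflict mass
  consecutiveFailures: number;
  totalUncertainty: number;
}

function safetyFloor(ctx: GateContext): GateDecision | null {
  if (ctx.mode === "forbidden") return "block";
  if (ctx.containsSensitivePattern) return "hold";
  if (ctx.dsConflict > 0.7) return "review";
  if (ctx.consecutiveFailures >= 3) return "block";   // circuit breaker floor
  if (ctx.totalUncertainty > 0.95) return "hold";     // uncertainty ceiling
  return null; // no hard constraint hit; fall through to adaptive gating
}
```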

Quick Start

TypeScript (Governance + MCP Server)

git clone https://github.com/christopherbailey/aether.git
cd aether
npm install
npm run build
npm test          # 99 tests — governance, modulation, bridge, tools, vocab-aware

Python (ML Core)

pip install -e ".[dev]"              # Editable install with test dependencies
python -m pytest core/tests/ -v      # 303 tests — encoder, world model, critic, training, data

Run Both Together

# Terminal 1: Python inference server
python -m core.inference.server          # Starts on localhost:8712

# Terminal 2: MCP server (connects to Claude, Cursor, etc.)
npm start

That's it. AETHER exposes 6 MCP tools that any AI assistant can call to get uncertainty-aware predictions and governance decisions.


Architecture

                          MCP Tools (6)
                    predict_next_event
                    predict_outcome
                    get_calibration
                    get_autonomy_level
                    get_effective_thresholds
                    evaluate_gate
                              |
                    TypeScript MCP Server
                    ├── Governance Modulation    ← aether.config.ts
                    ├── Autonomy Controller      (asymmetric trust)
                    └── Immutable Constraints    (safety floor)
                              |
                        HTTP bridge (:8712)
                              |
                    Python FastAPI Server
                    ├── EventEncoder            (activity + time + context → 128D)
                    ├── TransitionModel          (JEPA predictor: z_t → z_{t+1})
                    ├── EnergyScorer            (energy-based anomaly scoring)
                    ├── HierarchicalPredictor    (activity / phase / outcome)
                    ├── LatentVariable          (Gumbel-Softmax path variants)
                    ├── UncertaintyDecomposer    (epistemic vs. aleatoric)
                    ├── CalibrationTracker       (ECE / MCE / Brier)
                    └── ConformalPredictor       (distribution-free prediction sets)

Python Core (core/)

| Module | Purpose |
|---|---|
| encoder/ | Event → 128D latent state via vocabularies + Time2Vec + causal transformer |
| world_model/ | JEPA-style transition model with energy scoring and hierarchical predictions |
| critic/ | Epistemic/aleatoric decomposition, calibration tracking, adaptive conformal inference |
| training/ | VICReg + SIGReg loss functions, multi-loss training loop, checkpoints |
| inference/ | FastAPI server with /predict, /calibration, /health endpoints |
| data/ | Unified pipeline for SAP, BPI 2019, OCEL 2.0, and CSV event logs |

TypeScript MCP Server (mcp-server/)

| Module | Purpose |
|---|---|
| governance/ | Compositional modulation, autonomy state machine, immutable safety |
| bridge/ | HTTP client to Python server with conservative fallbacks |
| tools/ | 6 MCP tools for predictions, calibration, and governance decisions |
| types/ | Full type system mirroring Python structures |
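
For orientation, here is a minimal sketch of what a bridge call might look like. The /predict path comes from the inference/ table above; the payload and fallback shape are assumptions rather than the actual wire format.

```typescript
// Illustrative bridge call to the Python inference server. Payload and
// response shapes here are assumptions, not AETHER's real wire format.
const PYTHON_URL = process.env.AETHER_PYTHON_URL ?? "http://localhost:8712";

async function predictNextEvent(events: { activity: string; timestamp: string }[]) {
  try {
    const res = await fetch(`${PYTHON_URL}/predict`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ events }),
    });
    if (!res.ok) throw new Error(`inference server returned ${res.status}`);
    return await res.json();
  } catch {
    // Conservative fallback: with no model answer, report maximal epistemic
    // uncertainty so downstream governance holds rather than auto-passes.
    return { predictions: [], epistemic: 1.0, aleatoric: 0.0 };
  }
}
```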

Configuration

All governance tuning lives in one file: mcp-server/src/governance/aether.config.ts

Base Thresholds

export const BASE_THRESHOLDS = {
  driftThreshold:       0.15,   // Concept drift detection
  reviewGateAutoPass:   0.92,   // Auto-pass confidence
  threatActivation:     0.60,   // Threat level activation
  conformanceDeviation: 0.05,   // Process conformance
  sayDoGap:             0.20,   // Say-Do consistency
  knowledgePromotion:   0.75,   // Knowledge promotion score
};

Modulation Coefficients

export const COEFFICIENTS = {
  modeStrength:          0.3,   // Governance mode sensitivity
  uncertaintyStrength:   0.5,   // Epistemic uncertainty sensitivity
  calibrationStrength:   0.4,   // Calibration quality sensitivity
};

Clamp Bounds

Every threshold is bounded to prevent pathological behavior. See aether.config.ts for the full configuration.
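
For illustration only, a clamp might be declared and applied like this; the bound values below are invented, and the real ones are defined in aether.config.ts.

```typescript
// Hypothetical clamp bounds with made-up values, shown only to illustrate
// the idea. The shipped bounds live in aether.config.ts.
export const CLAMP_BOUNDS = {
  reviewGateAutoPass: { min: 0.80, max: 0.99 },
  driftThreshold:     { min: 0.05, max: 0.30 },
};

// Applied after modulation so no factor combination can push a threshold
// outside its sane range.
const clamp = (x: number, { min, max }: { min: number; max: number }) =>
  Math.min(Math.max(x, min), max);
```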

Environment Variables

| Variable | Default | Description |
|---|---|---|
| AETHER_PYTHON_URL | http://localhost:8712 | Python inference server URL |
| AETHER_BPI2019_PATH | | Path to BPI 2019 dataset JSON file |

MCP Tools

AETHER exposes 6 tools via the Model Context Protocol:

| Tool | Description |
|---|---|
| predict_next_event | Next activity predictions with uncertainty decomposition and conformal sets |
| predict_outcome | Case outcome prediction (on-time, rework, remaining hours) |
| get_calibration | Current model calibration metrics (ECE, MCE, Brier) |
| get_autonomy_level | Trust state: SUPERVISED → GUIDED → COLLABORATIVE → AUTONOMOUS |
| get_effective_thresholds | All 6 adaptive thresholds with full modulation breakdown |
| evaluate_gate | Allow/hold/block decision with audit trail |
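
As a rough illustration of the request/response shape a client sees from evaluate_gate: the field names below are assumptions based on the tool descriptions above, not the exact schema.

```typescript
// Illustrative evaluate_gate exchange. Field names are assumptions based on
// the tool descriptions above, not AETHER's exact schema.
const request = {
  tool: "evaluate_gate",
  arguments: {
    caseId: "PO-4711",                      // hypothetical purchase-order case
    proposedAction: "auto_approve_invoice", // hypothetical action under review
  },
};

// A plausible response: a decision plus the audit trail the table mentions.
const response = {
  decision: "hold",                // one of allow | hold | block
  effectiveThreshold: 0.94,        // modulated reviewGateAutoPass
  epistemicRatio: 0.71,            // reducible share of the uncertainty
  auditTrail: [
    "base reviewGateAutoPass = 0.92",
    "uncertainty factor applied (epistemic-dominated case)",
    "calibration factor applied from recent ECE",
  ],
};
```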

Example: Claude Desktop Integration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "aether": {
      "command": "node",
      "args": ["/path/to/aether/mcp-server/dist/index.js"]
    }
  }
}

Key References

JEPA Ecosystem (LeCun et al.)

  • A Path Towards Autonomous Machine Intelligence — LeCun, 2022. The world-model position paper quoted above.
  • A Tutorial on Energy-Based Learning — LeCun et al., 2006. The EBM framework behind energy scoring.
  • VICReg — Bardes, Ponce & LeCun, ICLR 2022. Variance-Invariance-Covariance regularization.
  • LeJEPA — Balestriero & LeCun, 2025. SIGReg loss for latent collapse prevention.
  • I-JEPA / V-JEPA — joint-embedding predictive architectures for images and video.

Uncertainty & Calibration

  • Adaptive Conformal Inference — Gibbs & Candes, NeurIPS 2021. Distribution-free prediction sets.
  • Law of Total Variance — Classic. Epistemic/aleatoric uncertainty decomposition.

Temporal Encoding

  • Time2Vec — Kazemi et al., ICLR 2019. Continuous temporal encoding.

Benchmark Results

AETHER has been evaluated on 10 process mining datasets across 5 domains:

| Dataset | Domain | Cases | MCC Improvement | Notes |
|---|---|---|---|---|
| Road Traffic Fine | Government | 30,074 | +266% | Scale validation (150K total) |
| SAP Workflow | Enterprise | 2,896 | +31.3% | Best enterprise result |
| Wearable Tracker | Retail | 218 | +17.8% | O2C process |
| Sepsis | Healthcare | 210 | +2.3% | Clinical workflows |
| BPI 2019 | Finance | 500 | +0.6% | Procurement |
| BPIC 2012 | Finance | 500 | +0.4% | Loan applications |
| Judicial | Legal | 5 | 0.0% | Novel domain |
| BPI 2018 | Government | 2,000 | -2.2% | v3 floor applied |
| NetSuite 2025 | Finance | 274 | -3.3% | High class imbalance |
| SAP BSP669 | Enterprise | 767 | -24.0% | 77 activities (v3 candidate) |

Key findings:

  • AETHER improves MCC on 7/10 datasets
  • Largest improvement at scale: Road Traffic Fine (+266% on 150K cases)
  • v3 vocabulary-aware floor reduces regressions on high-activity datasets

See docs/BENCHMARK_COMPARISON.md for detailed analysis.


Testing

npm test                          # TypeScript: 99 tests
python -m pytest core/tests/ -v   # Python: 303 tests
npm run test:coverage             # TypeScript coverage report
npm run test:python:coverage      # Python coverage report
npm run test:all                  # Run everything

CI runs automatically on push to main and all PRs via GitHub Actions.


License

MIT — Christopher Bailey, 2026
