cluster_stack is a security engineering platform designed to model, observe, and validate identity, configuration, and workload security controls across a hybrid environment.
This repository documents a deliberately staged build: a stable on‑prem foundation first, followed by controlled introduction of Kubernetes, cloud identity, misconfiguration scenarios, adversary activity, and detection logic.
The emphasis is on engineering correctness, telemetry integrity, and reproducible security failure modes—not dashboards, vendor demos, or SOC simulations.
- Virtualization layer: Proxmox (bare metal)
- Telemetry spine: Elastic Stack with Fleet‑managed agents
- Baseline workloads: Hardened Ubuntu hosts
- Future workload domain: Isolated local Kubernetes
- Hybrid intent: AWS integration for identity and IaC scenarios (Phase 2+)
Elastic is treated as the authoritative telemetry backbone: every workload, whether a VM, a container, or a future cloud integration, must be observable through it.
docs/
├── 00_architecture/
│
├── phase-1/
│   ├── virtualization-foundation.md
│   ├── elastic-stack-foundation.md
│   └── proxmox-backup-nfs.md
│
├── phase-2/
│   ├── phase-2-build-plan.md
│   └── kubernetes-security-objectives.md
│
└── reference/
    ├── kibana-configuration.md
    ├── elastic-agent-enrollment.md
    ├── elastic-fleet-control-invariants.md
    ├── disk-watermarks.md
    └── operational-command-reference.md
Phase 1 establishes a stable, verifiable foundation:
- Proxmox installed on bare metal
- Ubuntu gold image created and sanitized
- Elastic Stack deployed with TLS and index lifecycle management (ILM)
- Fleet Server operational
- Elastic Agents enrolled and reporting
- Osquery telemetry validated end‑to‑end
- Snapshot and backup strategy enforced
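One way to exercise the "agents enrolled and reporting" and osquery checks is to ask Elasticsearch which hosts have written documents recently. The sketch below is a minimal illustration with several assumptions:

- API-key authentication.
- A CA file for the stack's TLS certificate.
- Osquery results land in `logs-osquery_manager.result-*` data streams.

The URL, host names and time window are hypothetical. It relies only on the standard `_search` API, a `range` filter on `@timestamp`, and a `terms` aggregation on `host.name`.

```python
import json
import ssl
import urllib.request


def build_ingest_query(window: str = "15m", max_hosts: int = 100) -> dict:
    """Search body: no hits, just one bucket per host seen in the last `window`."""
    return {
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": f"now-{window}"}}},
        "aggs": {"hosts": {"terms": {"field": "host.name", "size": max_hosts}}},
    }


def silent_hosts(response: dict, expected: set[str]) -> set[str]:
    """Return expected hosts that produced no documents in the queried window."""
    buckets = response.get("aggregations", {}).get("hosts", {}).get("buckets", [])
    reporting = {b["key"] for b in buckets if b.get("doc_count", 0) > 0}
    return expected - reporting


def search(es_url: str, index: str, body: dict, api_key: str, ca_file: str) -> dict:
    """POST `body` to `<es_url>/<index>/_search`, verifying TLS against `ca_file`."""
    req = urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"ApiKey {api_key}",  # assumed auth method
        },
        method="POST",
    )
    ctx = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)


# Hypothetical usage against a live cluster:
# resp = search("https://elastic.lab.example:9200", "logs-osquery_manager.result-*",
#               build_ingest_query("15m"), API_KEY, "/etc/ssl/lab-ca.crt")
# print(silent_hosts(resp, {"ubuntu-01", "ubuntu-02"}))
```

Keeping the query builder and the response check as pure functions means the "reporting" criterion can be tested without a cluster. A non-empty result from `silent_hosts` is the signal that enrollment or ingestion is broken for those hosts.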
Phase 2 introduces controlled complexity, not scale:
- Local Kubernetes as a workload domain
- Kubernetes RBAC and service account abuse scenarios
- Container hardening and runtime security
- Identity and IaC misconfiguration modeling
- Hybrid AWS integration for CIEM (cloud infrastructure entitlement management) and CSPM (cloud security posture management) scenarios
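For the RBAC abuse scenarios, a small static check over Role and ClusterRole manifests can show which rules grant commonly abused permissions. The sketch below is illustrative only and assumes manifests have already been parsed into dicts (for example with a YAML loader such as PyYAML's `yaml.safe_load`). For brevity it ignores `apiGroups` and `resourceNames`. The set of "risky" patterns is a hypothetical starting list: wildcards, reading secrets, `pods/exec`, `impersonate`, and `bind`/`escalate` on roles.

```python
RISKY_READ_VERBS = {"get", "list", "watch"}


def risky_findings(role: dict) -> list[str]:
    """Return human-readable findings for one Role/ClusterRole manifest dict."""
    name = role.get("metadata", {}).get("name", "<unnamed>")
    findings = []
    for i, rule in enumerate(role.get("rules", [])):
        verbs = set(rule.get("verbs", []))
        resources = set(rule.get("resources", []))
        wildcard_verbs = "*" in verbs
        where = f"{name} rule[{i}]"
        if wildcard_verbs:
            findings.append(f"{where}: wildcard verbs")
        if "*" in resources:
            findings.append(f"{where}: wildcard resources")
        # Reading secrets exposes service account tokens and credentials.
        if "secrets" in resources and (verbs & RISKY_READ_VERBS or wildcard_verbs):
            findings.append(f"{where}: can read secrets")
        # `kubectl exec` requires `create` on the pods/exec subresource.
        if "pods/exec" in resources and ("create" in verbs or wildcard_verbs):
            findings.append(f"{where}: can exec into pods")
        if "impersonate" in verbs:
            findings.append(f"{where}: can impersonate identities")
        # bind/escalate on roles allow granting permissions the subject lacks.
        if verbs & {"bind", "escalate"} and resources & {"roles", "clusterroles"}:
            findings.append(f"{where}: can bind or escalate roles")
    return findings


# Example (hypothetical role):
# risky_findings({"metadata": {"name": "ci-reader"},
#                 "rules": [{"resources": ["secrets"], "verbs": ["get", "list"]}]})
```

Each finding names the rule index, so a deliberately misconfigured role and its remediated version can be diffed as evidence.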
All Phase 2 work is gated on:
- Clean isolation from baseline hosts
- Confirmed telemetry ingestion
- Reproducible evidence for each scenario
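Part of the isolation gate can be checked mechanically, assuming Kubernetes workloads and baseline hosts are separated by IP address ranges. The sketch below reports every pair of baseline and workload CIDRs that overlap. The CIDR values are hypothetical placeholders. Address-range separation is only one facet of isolation; it does not prove firewalling or routing is correct.

```python
import ipaddress


def overlapping_segments(baseline: list[str], workload: list[str]) -> list[tuple[str, str]]:
    """Return every (baseline, workload) CIDR pair whose address ranges overlap.

    ip_network() is strict by default, so a CIDR with host bits set
    (e.g. "10.0.0.1/24") raises ValueError instead of being silently widened.
    """
    base_nets = [ipaddress.ip_network(c) for c in baseline]
    work_nets = [ipaddress.ip_network(c) for c in workload]
    return [
        (str(b), str(w))
        for b in base_nets
        for w in work_nets
        # Mixed IPv4/IPv6 pairs never overlap; the version check makes that explicit.
        if b.version == w.version and b.overlaps(w)
    ]


# Hypothetical lab values: baseline host subnet vs. pod and service CIDRs.
# overlapping_segments(["192.168.10.0/24"], ["10.42.0.0/16", "10.43.0.0/16"])
```

An empty list is a necessary, not sufficient, condition for the "clean isolation" gate.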
For every capability added, the platform produces:
- Build steps — commands and configuration
- Intentional misconfiguration — what is broken
- Evidence — logs, queries, artifacts
- Remediation — what fixes it and why
If a change cannot be observed, measured, or reversed, it does not belong in the platform.
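The four required artifacts and the admission rule can be expressed as a simple record plus a check. The schema below is a hypothetical sketch, not a format the repository defines. Field names mirror the list above. The observable, measurable and reversible properties are explicit flags the author asserts, recorded alongside the evidence that backs them.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """One platform capability and the artifacts it must produce (hypothetical schema)."""
    name: str
    build_steps: list[str] = field(default_factory=list)  # commands and configuration
    misconfiguration: str = ""                            # what is intentionally broken
    evidence: list[str] = field(default_factory=list)     # logs, queries, artifact paths
    remediation: str = ""                                 # what fixes it and why
    observable: bool = False
    measurable: bool = False
    reversible: bool = False


def admission_problems(s: Scenario) -> list[str]:
    """Return the reasons a scenario should not be admitted; empty means admissible."""
    problems = []
    if not s.build_steps:
        problems.append("no build steps")
    if not s.misconfiguration:
        problems.append("misconfiguration not described")
    if not s.evidence:
        problems.append("no evidence captured")
    if not s.remediation:
        problems.append("no remediation")
    # The platform rule: anything not observable, measurable, and reversible is rejected.
    for prop in ("observable", "measurable", "reversible"):
        if not getattr(s, prop):
            problems.append(f"not {prop}")
    return problems
```

Returning a list of reasons, rather than a single boolean, keeps the rejection explainable in the same way the scenarios themselves are.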