A config-driven HTTP proxy with enterprise-grade observability, security, and reliability—purpose-built for internal API gateways.
You need a proxy that:
- ✅ Routes requests intelligently (not just round-robin)
- ✅ Validates requests before they hit your backend
- ✅ Rate limits abusive clients
- ✅ Fails gracefully when upstreams are down
- ✅ Gives you deep observability (not just access logs)
- ✅ Can be configured by non-engineers
- Nginx/HAProxy: fast, but the config is cryptic and custom logic is hard
- Kong/Tyk: powerful, but heavyweight and complex to operate
- Rolling your own: easy to start, hard to make production-ready
A middle ground: a production-ready proxy in Node.js with:
- Config-driven routing (YAML, not code)
- Built-in security (SSRF protection, rate limiting, auth)
- Deep observability (structured logs, Prometheus metrics, correlation IDs)
- Reliability patterns (circuit breakers, retries, timeouts)
- Developer-friendly (JavaScript, not Lua or C++)
Use it for:
- Internal API gateway for microservices
- Development/staging proxy with observability
- Custom routing logic that's easier in JavaScript than Nginx config
- Request transformation (header manipulation, body validation)
- Team has Node.js expertise
Don't use it for:
- Public-facing edge proxy (use Nginx/Cloudflare)
- Ultra-high throughput (> 10K req/sec per instance)
- Ultra-low latency (P99 < 5ms required)
- Service mesh (use Istio/Linkerd)
Most "proxy tutorials" stop at forwarding requests. This goes further:
Security:
- SSRF Protection: Block access to cloud metadata and private IPs
- Authentication: API key validation (HMAC-SHA256)
- Rate Limiting: Token bucket with Redis backend
- Input Validation: Header sanitization, payload size limits
- Allow-list: Deny by default, explicit upstream allow-list
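For flavor, here is a minimal sketch of the deny-by-default upstream guard; `assertSafeUpstream` is a hypothetical helper (not the shipped implementation), and the lists mirror the YAML config shown later:

```js
const net = require('net');
const dns = require('dns').promises;

// Illustrative allow-list and block-list, mirroring the YAML config.
const ALLOWED_HOSTS = new Set(['api.example.com']);
const BLOCKED_IPS = new Set(['169.254.169.254', '169.254.170.2']);

// IPv4-only private/link-local check, for brevity (10/8, 127/8, 169.254/16,
// 172.16/12, 192.168/16).
function isPrivateIPv4(ip) {
  return /^(10\.|127\.|169\.254\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.)/.test(ip);
}

// Hypothetical guard: deny unless the host is allow-listed, then resolve it
// and refuse metadata endpoints and private ranges.
async function assertSafeUpstream(hostname) {
  if (!ALLOWED_HOSTS.has(hostname)) throw new Error(`host not allow-listed: ${hostname}`);
  const { address } = await dns.lookup(hostname);
  if (BLOCKED_IPS.has(address) || (net.isIPv4(address) && isPrivateIPv4(address))) {
    throw new Error(`blocked address: ${address}`);
  }
  return address; // connect to this pinned address to avoid DNS rebinding
}
```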
Reliability:
- Circuit Breakers: Stop hitting failing upstreams
- Retries: Exponential backoff with jitter
- Timeouts: Request, connection, DNS, header, idle
- Backpressure: Reject when overloaded (don't OOM crash)
- Connection Pooling: Reuse TCP connections
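The retry item above, sketched as a generic helper with full-jitter exponential backoff (`withRetries` and the `retryable` flag are hypothetical names, not the library's API):

```js
// Full-jitter exponential backoff: wait a random duration up to
// min(capMs, baseMs * 2^attempt), and only retry errors marked transient.
async function withRetries(attempt, { retries = 3, baseMs = 100, capMs = 2000 } = {}) {
  for (let tries = 0; ; tries++) {
    try {
      return await attempt();
    } catch (err) {
      if (tries >= retries || !err.retryable) throw err;
      const delayMs = Math.random() * Math.min(capMs, baseMs * 2 ** tries);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: withRetries(() => fetchUpstream(req), { retries: 3 })
```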
Observability:
- Structured Logs: JSON logs with correlation IDs
- Metrics: Prometheus-compatible (RPS, latency histograms, error rates)
- Health Checks: Liveness, readiness, deep health
- Tracing: Request flow across services (correlation IDs)
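A sketch of how correlation IDs and structured logs fit together, assuming Express-style middleware (field names are illustrative):

```js
const crypto = require('crypto');

// Reuse the caller's correlation ID if present, otherwise mint one, and
// emit one structured JSON log line per request.
function correlationMiddleware(req, res, next) {
  const id = req.headers['x-correlation-id'] || crypto.randomUUID();
  req.correlationId = id;
  res.setHeader('X-Correlation-Id', id);
  const start = process.hrtime.bigint();
  res.on('finish', () => {
    const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(JSON.stringify({
      level: 'info', correlationId: id, method: req.method,
      path: req.url, status: res.statusCode, durationMs,
    }));
  });
  next();
}
```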
Operations:
- Config Hot Reload: Update routes without restart
- Graceful Shutdown: Drain connections before exit
- Error Handling: Fail fast on bad config (don't serve traffic)
- Kubernetes-Ready: Health probes, resource limits, signals
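Graceful shutdown is small enough to show whole; a sketch assuming a plain Node `http` server:

```js
const http = require('http');

const server = http.createServer((req, res) => res.end('ok'));
server.listen(3000);

// On SIGTERM (what Kubernetes sends before killing the pod), stop accepting
// new connections, let in-flight requests finish, then exit. A hard deadline
// guards against connections that never drain.
process.on('SIGTERM', () => {
  server.close(() => process.exit(0));
  setTimeout(() => process.exit(1), 30_000).unref();
});
```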
Quick start:

```bash
git clone https://github.com/tapas100/flexgate-proxy.git
cd flexgate-proxy
npm install
```

Minimal config:

```yaml
# config/proxy.yml
upstreams:
  - name: "example-api"
    url: "https://api.example.com"
    timeout: 5000
    retries: 3

routes:
  - path: "/api/*"
    upstream: "example-api"
    auth: required
    rateLimit:
      max: 100
      windowMs: 60000

security:
  allowedHosts:
    - "api.example.com"
  blockedIPs:
    - "169.254.169.254"  # AWS metadata
```

Run it:

```bash
# Development
npm run dev

# Production
npm start
```

Send a request through the proxy:

```bash
curl http://localhost:3000/api/users
```

Request flow:

```text
┌─────────────┐
│   Client    │
└──────┬──────┘
       │ HTTP/HTTPS
       ▼
┌─────────────────────────────────────┐
│            Proxy Server             │
│   ┌────────────────────────────┐    │
│   │ 1. Authentication          │    │
│   │ 2. Rate Limiting           │    │
│   │ 3. Request Validation     │    │
│   │ 4. Circuit Breaker Check   │    │
│   │ 5. Route Resolution        │    │
│   └────────────────────────────┘    │
└─────────────┬───────────────────────┘
              │
              ▼
      ┌────────────────┐
      │ Redis (State)  │
      │ - Rate limits  │
      │ - CB state     │
      └────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│          Backend Services           │
│   ┌─────────┐       ┌─────────┐     │
│   │  API A  │       │  API B  │     │
│   └─────────┘       └─────────┘     │
└─────────────────────────────────────┘
```
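The numbered stages correspond to an ordered middleware chain. A sketch of that ordering, assuming Express and stubbed stage functions (names hypothetical):

```js
const express = require('express'); // assumes express is installed

// Stub stages so the sketch runs end-to-end; the real logic replaces these.
const stage = (name) => (req, res, next) => {
  // ... authenticate / rate limit / validate / check breaker here ...
  next();
};

const app = express();

// Order matters: cheap rejections (auth, rate limiting) run before anything
// expensive, and the circuit breaker is consulted before forwarding upstream.
app.use(stage('authentication'));      // 1. Authentication
app.use(stage('rateLimiting'));        // 2. Rate Limiting
app.use(stage('requestValidation'));   // 3. Request Validation
app.use(stage('circuitBreakerCheck')); // 4. Circuit Breaker Check
app.use((req, res) => res.json({ ok: true })); // 5. Route Resolution + forward (stubbed)

app.listen(3000);
```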
Benchmarks:

| Metric | Value | Comparison |
|---|---|---|
| Throughput | 4.7K req/sec | Nginx: 52K (11x faster) |
| P95 Latency | 35ms | Nginx: 8ms (4x faster) |
| P99 Latency | 52ms | Nginx: 12ms (4x faster) |
| Memory | 78 MB | Nginx: 12 MB (6x smaller) |
| Proxy Overhead | ~3ms | (14% of total latency) |
Why slower than Nginx?
- Node.js (interpreted) vs C (compiled)
- Single-threaded vs multi-threaded
- GC pauses
Why use it anyway?
- Custom logic in JavaScript (not Nginx config)
- Better observability
- Shared code with backend
- Faster development
Minimal config:

```yaml
# config/proxy.yml
upstreams:
  - name: "backend"
    url: "http://localhost:8080"

routes:
  - path: "/*"
    upstream: "backend"
```

Full config:

```yaml
# Global settings
proxy:
  port: 3000
  timeout: 30000
  maxBodySize: "10mb"

# Security
security:
  allowedHosts:
    - "api.example.com"
    - "*.internal.corp"
  blockedIPs:
    - "169.254.169.254"
    - "10.0.0.0/8"
  auth:
    type: "apiKey"
    header: "X-API-Key"

# Rate limiting
rateLimit:
  backend: "redis"
  redis:
    url: "redis://localhost:6379"
  global:
    max: 1000
    windowMs: 60000

# Upstreams
upstreams:
  - name: "primary-api"
    url: "https://api.primary.com"
    timeout: 5000
    retries: 3
    circuitBreaker:
      enabled: true
      failureThreshold: 50
      openDuration: 30000
  - name: "fallback-api"
    url: "https://api.fallback.com"
    timeout: 10000

# Routes
routes:
  - path: "/api/users/*"
    upstream: "primary-api"
    auth: required
    rateLimit:
      max: 100
      windowMs: 60000
  - path: "/api/batch/*"
    upstream: "primary-api"
    auth: required
    timeout: 120000
    rateLimit:
      max: 10
      windowMs: 60000

# Logging
logging:
  level: "info"
  format: "json"
  sampling:
    successRate: 0.1
    errorRate: 1.0
```
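The sampling block above keeps every error log but only about one in ten success logs; a sketch of that decision (illustrative, not the shipped code):

```js
// Sample success logs at 10%, keep all error logs. Decisions are per-request,
// so aggregate rates stay statistically meaningful.
function shouldLog(statusCode, { successRate = 0.1, errorRate = 1.0 } = {}) {
  const rate = statusCode >= 400 ? errorRate : successRate;
  return Math.random() < rate;
}
```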
Health endpoints:

`GET /health/live` (Kubernetes liveness probe):

```json
{
"status": "UP",
"timestamp": "2026-01-26T10:30:45.123Z"
}
```

`GET /health/ready` (Kubernetes readiness probe):

```json
{
"status": "UP",
"checks": {
"config": "UP",
"upstreams": "UP",
"redis": "UP"
}
}
```

`GET /metrics` (Prometheus metrics):

```text
http_requests_total{method="GET",route="/api/users",status="200"} 12543
http_request_duration_ms_bucket{route="/api/users",le="50"} 12000
```
Docker:

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --production
COPY . .
EXPOSE 3000
CMD ["node", "bin/www"]
```

```bash
docker build -t flexgate-proxy .
docker run -p 3000:3000 \
  -v $(pwd)/config:/app/config \
  -e NODE_ENV=production \
  flexgate-proxy
```

Kubernetes:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flexgate-proxy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: flexgate-proxy
  template:
    metadata:
      labels:
        app: flexgate-proxy
    spec:
      containers:
        - name: proxy
          image: flexgate-proxy:latest
          ports:
            - containerPort: 3000
          resources:
            limits:
              memory: "256Mi"
              cpu: "500m"
            requests:
              memory: "128Mi"
              cpu: "250m"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
          env:
            - name: NODE_ENV
              value: "production"
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: proxy-secrets
                  key: redis-url
```

Import `grafana/dashboard.json` for:
- Request rate (by route, status)
- Latency percentiles (P50, P95, P99)
- Error rate
- Circuit breaker state
- Rate limit hits
```yaml
# Prometheus alerts
groups:
  - name: proxy
    rules:
      - alert: HighErrorRate
        # Ratio of 5xx responses to all responses over the last 5 minutes
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Proxy error rate > 5%"
      - alert: HighLatency
        # P99 derived from the histogram buckets exposed at /metrics
        expr: histogram_quantile(0.99, sum(rate(http_request_duration_ms_bucket[5m])) by (le)) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency > 1s"
```

SSRF protection (deny by default):

```yaml
security:
  allowedHosts:
    - "api.example.com"
  blockedIPs:
    - "169.254.169.254"   # AWS metadata
    - "169.254.170.2"     # ECS metadata
    - "fd00:ec2::254"     # AWS IPv6 metadata
    - "10.0.0.0/8"        # Private network
    - "127.0.0.0/8"       # Localhost
```

API key authentication:

```yaml
security:
  auth:
    type: "apiKey"
    header: "X-API-Key"
    keys:
      - key: "client-a-key-sha256-hash"
        name: "Client A"
      - key: "client-b-key-sha256-hash"
        name: "Client B"
```
Per-route rate limits:

```yaml
rateLimit:
  perRoute:
    - path: "/api/expensive/*"
      max: 10
      windowMs: 60000
      message: "This endpoint is heavily rate limited"
```

What this proxy is not:

| ❌ Not This | ✅ Use Instead |
|---|---|
| CDN / Edge cache | Cloudflare, Fastly |
| Service mesh | Istio, Linkerd |
| Raw performance proxy | Nginx, HAProxy, Envoy |
| Public-facing API gateway | Kong, Tyk, AWS API Gateway |
| Load balancer | HAProxy, AWS ALB |
Consider switching to Nginx/Envoy when:
- Throughput > 10K req/sec per instance needed
- P99 latency < 10ms required
- No custom logic needed (pure reverse proxy)
- Team lacks Node.js expertise
Upstream failing: the circuit breaker opens → fast-fail with 503 → retry every 30s (half-open) → on success, the circuit closes.
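A minimal sketch of that state machine; it trips on consecutive failures for brevity (the config's `failureThreshold` is a percentage) and is not the shipped implementation:

```js
// Tiny circuit breaker: closed → open after `threshold` consecutive failures,
// then half-open after `openMs`, where a single probe decides the new state.
class CircuitBreaker {
  constructor({ threshold = 5, openMs = 30_000 } = {}) {
    this.threshold = threshold;
    this.openMs = openMs;
    this.failures = 0;
    this.openedAt = null;
  }

  async exec(fn) {
    if (this.openedAt !== null && Date.now() - this.openedAt < this.openMs) {
      const err = new Error('circuit open');
      err.statusCode = 503;
      throw err; // fast-fail without touching the upstream
    }
    // Either closed, or half-open (cool-down elapsed): let the call through.
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null; // probe succeeded → close the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.openedAt !== null || this.failures >= this.threshold) {
        this.openedAt = Date.now(); // reopen (probe failed) or trip for the first time
      }
      throw err;
    }
  }
}
```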
Overload: the queue fills → backpressure kicks in → low-priority routes are rejected → logging is sampled aggressively → if still overloaded, everything is rejected with 503.
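A sketch of the backpressure gate, assuming an in-flight counter with a hard cap (the cap value is illustrative):

```js
// Shed load instead of queueing without bound: track in-flight requests and
// reject with 503 + Retry-After once the cap is hit.
let inFlight = 0;
const MAX_IN_FLIGHT = 500; // tune against memory headroom and latency SLOs

function backpressure(req, res, next) {
  if (inFlight >= MAX_IN_FLIGHT) {
    res.statusCode = 503;
    res.setHeader('Retry-After', '1');
    return res.end('overloaded');
  }
  inFlight += 1;
  res.on('close', () => { inFlight -= 1; }); // fires on finish and on abort
  next();
}
```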
Redis down: the rate limiter falls back to local, per-instance state → limits are less accurate, but the service stays up.
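A sketch of that fallback path; `redisTokenBucket` is a hypothetical Redis-backed check, and the in-memory fixed window is per-instance by design:

```js
// When Redis is down, fall back to a per-instance fixed window. Limits become
// approximate (each replica enforces its own budget) but requests keep flowing.
const local = new Map(); // key → { count, windowStart }

function allowLocally(key, max, windowMs) {
  const now = Date.now();
  const entry = local.get(key);
  if (!entry || now - entry.windowStart >= windowMs) {
    local.set(key, { count: 1, windowStart: now });
    return true;
  }
  entry.count += 1;
  return entry.count <= max;
}

async function allow(key, max, windowMs) {
  try {
    return await redisTokenBucket(key, max, windowMs); // hypothetical Redis-backed check
  } catch {
    return allowLocally(key, max, windowMs); // degrade, don't take the service down
  }
}
```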
Bad config: validation fails → startup is blocked (and on hot reload, the old config keeps serving) → an alert fires → an engineer fixes the config.
Key principle: fail closed, degrade gracefully.
Design docs:
- Problem Statement - Scope, constraints, use cases
- Threat Model - Security analysis
- Observability - Logging, metrics, tracing
- Traffic Control - Rate limiting, circuit breakers, retries
- Trade-offs - Architectural decisions
- Benchmarks - Performance numbers
We welcome contributions! Please see CONTRIBUTING.md.
```bash
git clone https://github.com/tapas100/flexgate-proxy.git
cd flexgate-proxy
npm install
# Run tests
npm test
# Run in dev mode (with hot reload)
npm run dev
# Lint
npm run lint
# Benchmarks
npm run benchmark
```

Roadmap:
- mTLS Support: Mutual TLS to backends
- OpenTelemetry: Distributed tracing
- GraphQL Federation: GraphQL proxy support
- WebAssembly Plugins: Custom logic in Wasm
- gRPC Support: Proxy gRPC services
- Admin UI: Web UI for config management
MIT © Tapas M
Support:
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@example.com
Built with ❤️ for the backend engineering community