Skip to content

Conversation

@gocanto
Copy link
Collaborator

@gocanto gocanto commented Nov 12, 2025

Add Loki logging datasource to both production and local Grafana instances. This enables log viewing and analysis in Grafana dashboards.

@coderabbitai
Copy link

coderabbitai bot commented Nov 12, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch claude/add-loki-datasource-011CV3jC33uPLgjaBXXT1sAc

Comment @coderabbitai help to get the list of available commands and usage tips.

@gocanto gocanto changed the title chore: Add Loki datasource to Grafana chore: Grafana (Loki & Traces) Nov 13, 2025
gocanto pushed a commit that referenced this pull request Nov 19, 2025
This commit fixes several critical and important issues identified in PR #159:

Critical fixes:
- Fix hardcoded Prometheus URL in Tempo config by creating separate configs
  for prod (oullin_prometheus) and local (oullin_prometheus_local) profiles
- Fix unconditional insecure OTLP connection - now conditional based on
  environment (insecure only for local/staging, secure for production)

Important fixes:
- Update OpenTelemetry semantic conventions from v1.4.0 to v1.21.0
- Fix validation tag syntax for Enabled field (true -> True)
- Fix silent error handling in tracer shutdown - now properly logs errors
  using slog.Error instead of discarding them

Changes:
- pkg/portal/tracing.go: Environment-aware security settings and updated semconv
- metal/env/tracing.go: Fixed validation tag syntax
- metal/kernel/helpers.go: Added proper error logging with slog
- docker-compose.yml: Use separate Tempo configs for prod/local profiles
- infra/metrics/tempo/tempo-config.prod.yaml: New prod-specific config
- infra/metrics/tempo/tempo-config.local.yaml: New local-specific config

Note: go mod tidy should be run separately to sync dependency versions
when network connectivity is available.
gocanto pushed a commit that referenced this pull request Nov 19, 2025
This commit addresses dependency issues preventing tests from running:

1. Fix invalid Go version: Changed from 1.25.3 (doesn't exist) to 1.24
   which matches the installed Go toolchain (1.24.7)

2. Downgrade OpenTelemetry OTLP exporter: Changed from v1.38.0 to v1.19.0
   to match the version in go.sum. The PR originally specified v1.38.0 in
   go.mod but only v1.19.0 was in go.sum, causing module resolution errors.

3. Remove semconv dependency: Replaced semantic convention imports with
   direct attribute.String() calls to avoid requiring the semconv package
   which isn't in go.sum. This maintains the same functionality using
   standard attribute keys.

These changes allow the code to build with existing cached modules.
Note: `go mod tidy` should still be run when network connectivity is
available to properly update go.sum with all dependencies.

Related to PR #159 review fixes.
claude and others added 9 commits November 26, 2025 15:14
Add Loki logging datasource to both production and local Grafana instances.
This enables log viewing and analysis in Grafana dashboards.
Add Grafana Tempo for distributed tracing support in both production and local environments.
Includes full integration with Loki (logs) and Prometheus (metrics) for unified observability.

Features:
- OTLP receivers (HTTP/gRPC) for OpenTelemetry trace ingestion
- Service graphs and span metrics generation
- Trace-to-logs and trace-to-metrics correlation
- Node graph visualization support
Integrate OpenTelemetry distributed tracing into the Go API with full Tempo support.

Changes:
- Add OpenTelemetry SDK and OTLP HTTP exporter dependencies
- Create tracing environment configuration (ENV_TRACING_ENABLED, ENV_TRACING_OTLP_ENDPOINT)
- Initialize OpenTelemetry tracer provider with automatic span export to Tempo
- Add HTTP instrumentation middleware using otelhttp for automatic request tracing
- Configure service name, version, and deployment environment in trace metadata
- Add graceful tracer shutdown on application exit
- Update docker-compose to connect API to Tempo (oullin_tempo:4318)
- Enable tracing by default in production with configurable endpoints

Features:
- Automatic HTTP request/response tracing with span naming
- Trace context propagation for distributed tracing
- Integration with existing Sentry error tracking
- Support for both production and local Tempo instances
- Always-on sampling for complete trace capture
This commit fixes several critical and important issues identified in PR #159:

Critical fixes:
- Fix hardcoded Prometheus URL in Tempo config by creating separate configs
  for prod (oullin_prometheus) and local (oullin_prometheus_local) profiles
- Fix unconditional insecure OTLP connection - now conditional based on
  environment (insecure only for local/staging, secure for production)

Important fixes:
- Update OpenTelemetry semantic conventions from v1.4.0 to v1.21.0
- Fix validation tag syntax for Enabled field (true -> True)
- Fix silent error handling in tracer shutdown - now properly logs errors
  using slog.Error instead of discarding them

Changes:
- pkg/portal/tracing.go: Environment-aware security settings and updated semconv
- metal/env/tracing.go: Fixed validation tag syntax
- metal/kernel/helpers.go: Added proper error logging with slog
- docker-compose.yml: Use separate Tempo configs for prod/local profiles
- infra/metrics/tempo/tempo-config.prod.yaml: New prod-specific config
- infra/metrics/tempo/tempo-config.local.yaml: New local-specific config

Note: go mod tidy should be run separately to sync dependency versions
when network connectivity is available.
This commit addresses dependency issues preventing tests from running:

1. Fix invalid Go version: Changed from 1.25.3 (doesn't exist) to 1.24
   which matches the installed Go toolchain (1.24.7)

2. Downgrade OpenTelemetry OTLP exporter: Changed from v1.38.0 to v1.19.0
   to match the version in go.sum. The PR originally specified v1.38.0 in
   go.mod but only v1.19.0 was in go.sum, causing module resolution errors.

3. Remove semconv dependency: Replaced semantic convention imports with
   direct attribute.String() calls to avoid requiring the semconv package
   which isn't in go.sum. This maintains the same functionality using
   standard attribute keys.

These changes allow the code to build with existing cached modules.
Note: `go mod tidy` should still be run when network connectivity is
available to properly update go.sum with all dependencies.

Related to PR #159 review fixes.
Critical fix: The previous implementation determined TLS usage based on
environment type (local/staging = insecure, production = TLS), which
caused production deployments to fail when using plain HTTP endpoints.

The problem:
- .env.example specifies http://oullin_tempo:4318 (plain HTTP)
- docker-compose.yml Tempo service only exposes plain HTTP
- Previous code: production tried to use TLS for all endpoints
- Result: TLS handshake failed, all traces dropped

The fix:
- TLS decision now based on URL scheme, not environment
- http://  = plain HTTP (WithInsecure())
- https:// = HTTPS with TLS (no WithInsecure())
- No scheme = defaults to plain HTTP (backward compatibility)

This allows:
- Production to use plain HTTP when TLS termination happens at proxy/LB
- Production to use HTTPS by changing scheme to https://
- Docker Compose setups to work in production without TLS

Updated .env.example documentation to clarify scheme-based behavior.

Fixes issue where production traces would be silently dropped due to
failed TLS handshake with plain HTTP Tempo endpoints.
Fixed validation issues causing test failures:

1. Removed `validate:"required"` from Enabled field
   - Boolean fields don't need "required" validation
   - Booleans are always either true or false (never "blank")
   - The validator was incorrectly treating false as "blank"

2. Fixed Endpoint validation tag syntax:
   - Changed from: `required_if=Enabled True,omitempty,url`
   - Changed to: `omitempty,required_if=Enabled true,url`
   - Put `omitempty` first so empty values skip validation
   - Changed `True` to `true` (lowercase) for proper boolean comparison
   - go-playground/validator expects lowercase boolean values

This fixes the test failure:
```
panic: Environment: invalid [tracing] model: {
  "enabled":"field 'enabled' cannot be blank",
  "endpoint":"field 'endpoint': '' must satisfy 'required_if' 'Enabled True' criteria"
}
```

Now validation works correctly:
- Tracing disabled (Enabled=false, Endpoint=""): Valid ✓
- Tracing enabled (Enabled=true, Endpoint="http://..."): Valid ✓
Fixed test failure where Environment validation was rejecting zero-value
TracingEnvironment structs as "blank":

```
panic: Environment: invalid [oullin] model: {
  "tracing":"field 'tracing' cannot be blank"
}
```

The problem:
- Tracing field had `validate:"required"` tag
- When tracing is disabled, TracingEnvironment has zero values
  (Enabled=false, Endpoint="")
- The validator treats zero-value structs as "blank"
- This caused validation to fail even though tracing is optional

The fix:
- Removed `validate:"required"` from Tracing field
- Tracing is now correctly treated as optional
- Individual fields within TracingEnvironment have their own validation
- When Enabled=true, the Endpoint validation still applies correctly
- When Enabled=false, validation passes as expected

This allows tests to pass when tracing environment variables are not set,
which is the expected behavior for an optional feature.
@gocanto gocanto force-pushed the claude/add-loki-datasource-011CV3jC33uPLgjaBXXT1sAc branch from fe0c553 to 62bae75 Compare November 26, 2025 07:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants