-
-
Notifications
You must be signed in to change notification settings - Fork 0
chore: Grafana (Loki & Traces) #159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
gocanto
wants to merge
9
commits into
main
Choose a base branch
from
claude/add-loki-datasource-011CV3jC33uPLgjaBXXT1sAc
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
gocanto
pushed a commit
that referenced
this pull request
Nov 19, 2025
This commit fixes several critical and important issues identified in PR #159: Critical fixes: - Fix hardcoded Prometheus URL in Tempo config by creating separate configs for prod (oullin_prometheus) and local (oullin_prometheus_local) profiles - Fix unconditional insecure OTLP connection - now conditional based on environment (insecure only for local/staging, secure for production) Important fixes: - Update OpenTelemetry semantic conventions from v1.4.0 to v1.21.0 - Fix validation tag syntax for Enabled field (true -> True) - Fix silent error handling in tracer shutdown - now properly logs errors using slog.Error instead of discarding them Changes: - pkg/portal/tracing.go: Environment-aware security settings and updated semconv - metal/env/tracing.go: Fixed validation tag syntax - metal/kernel/helpers.go: Added proper error logging with slog - docker-compose.yml: Use separate Tempo configs for prod/local profiles - infra/metrics/tempo/tempo-config.prod.yaml: New prod-specific config - infra/metrics/tempo/tempo-config.local.yaml: New local-specific config Note: go mod tidy should be run separately to sync dependency versions when network connectivity is available.
gocanto
pushed a commit
that referenced
this pull request
Nov 19, 2025
This commit addresses dependency issues preventing tests from running: 1. Fix invalid Go version: Changed from 1.25.3 (doesn't exist) to 1.24 which matches the installed Go toolchain (1.24.7) 2. Downgrade OpenTelemetry OTLP exporter: Changed from v1.38.0 to v1.19.0 to match the version in go.sum. The PR originally specified v1.38.0 in go.mod but only v1.19.0 was in go.sum, causing module resolution errors. 3. Remove semconv dependency: Replaced semantic convention imports with direct attribute.String() calls to avoid requiring the semconv package which isn't in go.sum. This maintains the same functionality using standard attribute keys. These changes allow the code to build with existing cached modules. Note: `go mod tidy` should still be run when network connectivity is available to properly update go.sum with all dependencies. Related to PR #159 review fixes.
Add Loki logging datasource to both production and local Grafana instances. This enables log viewing and analysis in Grafana dashboards.
Add Grafana Tempo for distributed tracing support in both production and local environments. Includes full integration with Loki (logs) and Prometheus (metrics) for unified observability. Features: - OTLP receivers (HTTP/gRPC) for OpenTelemetry trace ingestion - Service graphs and span metrics generation - Trace-to-logs and trace-to-metrics correlation - Node graph visualization support
Integrate OpenTelemetry distributed tracing into the Go API with full Tempo support. Changes: - Add OpenTelemetry SDK and OTLP HTTP exporter dependencies - Create tracing environment configuration (ENV_TRACING_ENABLED, ENV_TRACING_OTLP_ENDPOINT) - Initialize OpenTelemetry tracer provider with automatic span export to Tempo - Add HTTP instrumentation middleware using otelhttp for automatic request tracing - Configure service name, version, and deployment environment in trace metadata - Add graceful tracer shutdown on application exit - Update docker-compose to connect API to Tempo (oullin_tempo:4318) - Enable tracing by default in production with configurable endpoints Features: - Automatic HTTP request/response tracing with span naming - Trace context propagation for distributed tracing - Integration with existing Sentry error tracking - Support for both production and local Tempo instances - Always-on sampling for complete trace capture
This commit fixes several critical and important issues identified in PR #159: Critical fixes: - Fix hardcoded Prometheus URL in Tempo config by creating separate configs for prod (oullin_prometheus) and local (oullin_prometheus_local) profiles - Fix unconditional insecure OTLP connection - now conditional based on environment (insecure only for local/staging, secure for production) Important fixes: - Update OpenTelemetry semantic conventions from v1.4.0 to v1.21.0 - Fix validation tag syntax for Enabled field (true -> True) - Fix silent error handling in tracer shutdown - now properly logs errors using slog.Error instead of discarding them Changes: - pkg/portal/tracing.go: Environment-aware security settings and updated semconv - metal/env/tracing.go: Fixed validation tag syntax - metal/kernel/helpers.go: Added proper error logging with slog - docker-compose.yml: Use separate Tempo configs for prod/local profiles - infra/metrics/tempo/tempo-config.prod.yaml: New prod-specific config - infra/metrics/tempo/tempo-config.local.yaml: New local-specific config Note: go mod tidy should be run separately to sync dependency versions when network connectivity is available.
This commit addresses dependency issues preventing tests from running: 1. Fix invalid Go version: Changed from 1.25.3 (doesn't exist) to 1.24 which matches the installed Go toolchain (1.24.7) 2. Downgrade OpenTelemetry OTLP exporter: Changed from v1.38.0 to v1.19.0 to match the version in go.sum. The PR originally specified v1.38.0 in go.mod but only v1.19.0 was in go.sum, causing module resolution errors. 3. Remove semconv dependency: Replaced semantic convention imports with direct attribute.String() calls to avoid requiring the semconv package which isn't in go.sum. This maintains the same functionality using standard attribute keys. These changes allow the code to build with existing cached modules. Note: `go mod tidy` should still be run when network connectivity is available to properly update go.sum with all dependencies. Related to PR #159 review fixes.
Critical fix: The previous implementation determined TLS usage based on environment type (local/staging = insecure, production = TLS), which caused production deployments to fail when using plain HTTP endpoints. The problem: - .env.example specifies http://oullin_tempo:4318 (plain HTTP) - docker-compose.yml Tempo service only exposes plain HTTP - Previous code: production tried to use TLS for all endpoints - Result: TLS handshake failed, all traces dropped The fix: - TLS decision now based on URL scheme, not environment - http:// = plain HTTP (WithInsecure()) - https:// = HTTPS with TLS (no WithInsecure()) - No scheme = defaults to plain HTTP (backward compatibility) This allows: - Production to use plain HTTP when TLS termination happens at proxy/LB - Production to use HTTPS by changing scheme to https:// - Docker Compose setups to work in production without TLS Updated .env.example documentation to clarify scheme-based behavior. Fixes issue where production traces would be silently dropped due to failed TLS handshake with plain HTTP Tempo endpoints.
Fixed validation issues causing test failures:
1. Removed `validate:"required"` from Enabled field
- Boolean fields don't need "required" validation
- Booleans are always either true or false (never "blank")
- The validator was incorrectly treating false as "blank"
2. Fixed Endpoint validation tag syntax:
- Changed from: `required_if=Enabled True,omitempty,url`
- Changed to: `omitempty,required_if=Enabled true,url`
- Put `omitempty` first so empty values skip validation
- Changed `True` to `true` (lowercase) for proper boolean comparison
- go-playground/validator expects lowercase boolean values
This fixes the test failure:
```
panic: Environment: invalid [tracing] model: {
"enabled":"field 'enabled' cannot be blank",
"endpoint":"field 'endpoint': '' must satisfy 'required_if' 'Enabled True' criteria"
}
```
Now validation works correctly:
- Tracing disabled (Enabled=false, Endpoint=""): Valid ✓
- Tracing enabled (Enabled=true, Endpoint="http://..."): Valid ✓
Fixed test failure where Environment validation was rejecting zero-value
TracingEnvironment structs as "blank":
```
panic: Environment: invalid [oullin] model: {
"tracing":"field 'tracing' cannot be blank"
}
```
The problem:
- Tracing field had `validate:"required"` tag
- When tracing is disabled, TracingEnvironment has zero values
(Enabled=false, Endpoint="")
- The validator treats zero-value structs as "blank"
- This caused validation to fail even though tracing is optional
The fix:
- Removed `validate:"required"` from Tracing field
- Tracing is now correctly treated as optional
- Individual fields within TracingEnvironment have their own validation
- When Enabled=true, the Endpoint validation still applies correctly
- When Enabled=false, validation passes as expected
This allows tests to pass when tracing environment variables are not set,
which is the expected behavior for an optional feature.
fe0c553 to
62bae75
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Add Loki logging datasource to both production and local Grafana instances. This enables log viewing and analysis in Grafana dashboards.