Spring Boot Actuator & OpenTelemetry — Production Monitoring
Set up Spring Boot Actuator for health checks and metrics, integrate OpenTelemetry for distributed tracing, and export to Prometheus and Grafana.
Your application works in dev. Now you need to know when it doesn’t work in production — health checks, metrics, and distributed tracing. Spring Boot 4 ships with Actuator for health/metrics and a first-class OpenTelemetry starter for tracing.
Dependencies
dependencies {
    implementation("org.springframework.boot:spring-boot-starter-web")
    implementation("org.springframework.boot:spring-boot-starter-actuator")
    implementation("org.springframework.boot:spring-boot-starter-opentelemetry")
    implementation("io.micrometer:micrometer-registry-prometheus")
    implementation("io.opentelemetry:opentelemetry-exporter-otlp")
}
Spring Boot 4 includes spring-boot-starter-opentelemetry — no more manual OpenTelemetry SDK configuration.
Actuator basics
Configuration
management:
  endpoints:
    web:
      exposure:
        include: health, info, metrics, prometheus
  endpoint:
    health:
      show-details: when-authorized
      show-components: when-authorized
  info:
    env:
      enabled: true
info:
  app:
    name: my-app
    version: 1.0.0
    description: Product catalog service
Available endpoints
| Endpoint | URL | Purpose |
|---|---|---|
| health | /actuator/health | Application health status |
| info | /actuator/info | Application metadata |
| metrics | /actuator/metrics | Micrometer metrics |
| prometheus | /actuator/prometheus | Prometheus-format metrics |
Don’t expose all endpoints in production. Only expose what your monitoring system needs.
Securing actuator endpoints
@Bean
fun securityFilterChain(http: HttpSecurity): SecurityFilterChain {
    return http
        .authorizeHttpRequests { auth ->
            auth
                // Rules are evaluated in order, so specific paths come before /actuator/**.
                // "/actuator/health/**" also covers the liveness and readiness probe paths.
                .requestMatchers("/actuator/health", "/actuator/health/**").permitAll()
                .requestMatchers("/actuator/prometheus").permitAll()
                .requestMatchers("/actuator/**").hasRole("ADMIN")
                .requestMatchers("/api/**").authenticated()
                .anyRequest().denyAll()
        }
        // ... rest of config
        .build()
}
Health and Prometheus endpoints are public (load balancers and Prometheus need to reach them). Everything else requires admin access.
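The order of the matchers matters because the first rule that matches a request decides the outcome. The following is a minimal, framework-free sketch of that first-match behaviour. `Rule`, `matches`, and `decide` are hypothetical names, and the prefix matching is a simplification, not Spring Security's actual pattern matching.

```kotlin
// Hypothetical model of first-match authorization; not Spring Security's API.
data class Rule(val pattern: String, val access: String)

// Simplified matching: a "/**" suffix means "this prefix and everything under it".
fun matches(pattern: String, path: String): Boolean =
    if (pattern.endsWith("/**")) {
        val prefix = pattern.removeSuffix("/**")
        path == prefix || path.startsWith("$prefix/")
    } else {
        pattern == path
    }

// Returns the access decision of the first matching rule, or "denyAll" if none match.
fun decide(rules: List<Rule>, path: String): String =
    rules.firstOrNull { matches(it.pattern, path) }?.access ?: "denyAll"

// Specific rule first: the health endpoint stays public.
val specificFirst = listOf(
    Rule("/actuator/health", "permitAll"),
    Rule("/actuator/**", "hasRole(ADMIN)"),
)

// Broad rule first: it shadows the health rule, which is never reached.
val broadFirst = specificFirst.reversed()
```

With `specificFirst`, `decide(specificFirst, "/actuator/health")` yields `permitAll`. With `broadFirst`, the same path falls under the admin rule, which is why the catch-all `/actuator/**` matcher belongs after the public ones.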
Health checks
Default health indicators
Spring Boot auto-configures health indicators for:
- Database connectivity (DataSource)
- Disk space
- Kafka broker connectivity
- Redis connectivity
- And many more
GET /actuator/health
{
  "status": "UP",
  "components": {
    "db": { "status": "UP" },
    "diskSpace": { "status": "UP" },
    "kafka": { "status": "UP" }
  }
}
Custom health indicator
package com.example.demo.health

// If these imports do not resolve on Spring Boot 4, look for the same types
// under org.springframework.boot.health.contributor.
import org.springframework.boot.actuate.health.Health
import org.springframework.boot.actuate.health.HealthIndicator
import org.springframework.stereotype.Component
import org.springframework.web.client.RestClient

@Component
class PaymentGatewayHealthIndicator(
    private val restClient: RestClient
) : HealthIndicator {

    override fun health(): Health {
        return try {
            // retrieve() throws on 4xx/5xx by default, so those responses land in the catch block
            val response = restClient.get()
                .uri("https://payments.example.com/health")
                .retrieve()
                .toBodilessEntity()
            if (response.statusCode.is2xxSuccessful) {
                Health.up()
                    .withDetail("gateway", "payments.example.com")
                    .build()
            } else {
                Health.down()
                    .withDetail("gateway", "payments.example.com")
                    .withDetail("status", response.statusCode.value())
                    .build()
            }
        } catch (e: Exception) {
            // Connection failures and error responses both report DOWN with the exception attached
            Health.down(e)
                .withDetail("gateway", "payments.example.com")
                .build()
        }
    }
}
This appears as paymentGateway in the health response (Spring strips the HealthIndicator suffix and camelCases the name).
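The naming rule can be sketched in plain Kotlin. This is an illustration of the rule as described here, not Spring's source code; `defaultBeanName` and `healthComponentName` are hypothetical helper names, and the sketch assumes the bean name is simply the class name with a lowercase first letter.

```kotlin
// Assumption: the default bean name is the simple class name with its first letter lowercased.
fun defaultBeanName(simpleClassName: String): String =
    simpleClassName.replaceFirstChar { it.lowercaseChar() }

// Drops a trailing "HealthIndicator" (case-insensitive) to get the key used in the health response.
fun healthComponentName(beanName: String): String {
    val suffix = "HealthIndicator"
    return if (beanName.endsWith(suffix, ignoreCase = true)) {
        beanName.dropLast(suffix.length)
    } else {
        beanName
    }
}
```

For the class above, `healthComponentName(defaultBeanName("PaymentGatewayHealthIndicator"))` gives `paymentGateway`. A bean whose name lacks the suffix keeps its name unchanged.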
Liveness and readiness probes
For Kubernetes:
management:
  endpoint:
    health:
      probes:
        enabled: true
      group:
        liveness:
          include: livenessState
        readiness:
          include: readinessState, db
GET /actuator/health/liveness → Is the app alive?
GET /actuator/health/readiness → Can it handle traffic?
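Liveness reports whether the application's internal state is still valid; when it fails, Kubernetes restarts the container. Readiness reports whether the app can accept traffic right now (database up, dependencies available); when it fails, Kubernetes stops routing requests to the pod but leaves it running.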
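A health group such as readiness reports one status derived from all of its members. The sketch below models that aggregation in plain Kotlin, assuming Spring Boot's default severity order (DOWN, OUT_OF_SERVICE, UP, UNKNOWN) and its default HTTP mapping (503 for DOWN and OUT_OF_SERVICE). The `Status` enum and both function names are illustrative, not Spring types.

```kotlin
// Statuses listed from most to least severe, following the assumed default order.
enum class Status { DOWN, OUT_OF_SERVICE, UP, UNKNOWN }

// A group reports the most severe status among its members (UNKNOWN if it has none).
fun aggregate(members: Map<String, Status>): Status =
    members.values.minByOrNull { it.ordinal } ?: Status.UNKNOWN

// Assumed default mapping: DOWN and OUT_OF_SERVICE become 503, everything else 200.
fun httpStatus(status: Status): Int =
    if (status == Status.DOWN || status == Status.OUT_OF_SERVICE) 503 else 200
```

With members `readinessState = UP` and `db = DOWN`, `aggregate` returns `DOWN` and `httpStatus` returns 503, so the readiness probe fails and the pod is taken out of rotation. That is the reason for keeping external dependencies out of the liveness group: a database outage should stop traffic, not trigger restarts.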
Metrics with Micrometer
Default metrics
Spring Boot auto-instruments:
- HTTP request durations and counts
- JVM memory usage
- JVM thread counts
- Database connection pool stats
- Kafka consumer/producer metrics
- Cache hit/miss rates
Browsing metrics
GET /actuator/metrics
→ Lists all available metric names
GET /actuator/metrics/http.server.requests
→ Shows request count, duration, and tags
GET /actuator/metrics/http.server.requests?tag=uri:/api/v1/products
→ Filtered by endpoint
Custom metrics
package com.example.demo.service

import com.example.demo.repository.ProductRepository
import io.micrometer.core.instrument.Counter
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Timer
import org.springframework.stereotype.Service

@Service
class ProductService(
    private val productRepository: ProductRepository,
    meterRegistry: MeterRegistry
) {
    // Counts every successfully created product
    private val createCounter: Counter = Counter.builder("products.created")
        .description("Number of products created")
        .register(meterRegistry)

    // Measures how long each search takes
    private val searchTimer: Timer = Timer.builder("products.search.duration")
        .description("Time to search products")
        .register(meterRegistry)

    fun create(request: CreateProductRequest): ProductResponse {
        val product = productRepository.save(request.toEntity())
        createCounter.increment()
        return product.toResponse()
    }

    fun search(query: String): List<ProductResponse> {
        return searchTimer.record<List<ProductResponse>> {
            productRepository.search("%$query%").map { it.toResponse() }
        }
    }
}
Prometheus export
The /actuator/prometheus endpoint exposes all metrics in Prometheus format:
# HELP products_created_total Number of products created
# TYPE products_created_total counter
products_created_total 42.0
# HELP products_search_duration_seconds Time to search products
# TYPE products_search_duration_seconds summary
products_search_duration_seconds_count 156.0
products_search_duration_seconds_sum 3.2
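The exported names differ from the names used in code: `products.created` becomes `products_created_total`, and `products.search.duration` becomes `products_search_duration_seconds`. A simplified sketch of that mapping follows. It covers only counters and timers, ignores base units on other meter types, and uses hypothetical names (`MeterType`, `prometheusName`); it is not Micrometer's naming-convention class.

```kotlin
// Only the two meter types used in this article.
enum class MeterType { COUNTER, TIMER }

// Dots become underscores; counters gain "_total", timers gain "_seconds".
fun prometheusName(micrometerName: String, type: MeterType): String {
    val base = micrometerName.replace('.', '_')
    return when (type) {
        MeterType.COUNTER -> if (base.endsWith("_total")) base else "${base}_total"
        MeterType.TIMER -> if (base.endsWith("_seconds")) base else "${base}_seconds"
    }
}
```

Knowing the mapping helps when writing PromQL: queries use the exported name, while `/actuator/metrics` uses the dotted one.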
Prometheus scrape config
# prometheus.yml
scrape_configs:
  - job_name: 'my-app'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['my-app:8080']
OpenTelemetry distributed tracing
Configuration
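management:
  tracing:
    sampling:
      probability: 1.0 # Sample 100% in dev, lower in production
  # Property names for the OTLP trace endpoint differ between Boot versions.
  # The key below is the Spring Boot 3.x name; on Spring Boot 4, check the reference docs
  # for management.opentelemetry.tracing.export.otlp.endpoint.
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces
spring:
  application:
    name: product-service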
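With the OpenTelemetry starter in place, traces can cover:
- HTTP requests (incoming and outgoing)
- RestClient/WebClient calls made through the auto-configured builders
- Kafka produce/consume, once observation is enabled on the template and listener (spring.kafka.template.observation-enabled and spring.kafka.listener.observation-enabled)
- Database queries, which typically need an additional DataSource instrumentation library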
How it works
Every incoming HTTP request gets a trace ID. That trace ID propagates to every downstream call — database queries, Kafka messages, HTTP calls to other services.
Request → product-service (trace-id: abc123)
→ PostgreSQL query (trace-id: abc123, span: db-query)
→ Kafka send (trace-id: abc123, span: kafka-produce)
→ inventory-service consumer (trace-id: abc123, span: kafka-consume)
→ PostgreSQL query (trace-id: abc123, span: db-query)
All operations with the same trace ID are part of one distributed transaction. You can view the entire flow in Jaeger, Zipkin, or Grafana Tempo.
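Between services, the trace ID usually travels in a header. The sketch below assumes the W3C Trace Context format, where a `traceparent` value has the shape `version-traceId-parentSpanId-flags`. `TraceContext`, `parseTraceparent`, `childOf`, and `toTraceparent` are hypothetical names for illustration, not part of Micrometer or OpenTelemetry.

```kotlin
import kotlin.random.Random

data class TraceContext(val traceId: String, val spanId: String, val sampled: Boolean)

private val HEX = Regex("[0-9a-f]+")

// Parses a W3C "traceparent" value; returns null when the value is malformed.
fun parseTraceparent(header: String): TraceContext? {
    val parts = header.trim().split("-")
    if (parts.size != 4 || parts[0] != "00") return null
    val (_, traceId, spanId, flags) = parts
    if (traceId.length != 32 || spanId.length != 16 || flags.length != 2) return null
    if (!HEX.matches(traceId) || !HEX.matches(spanId) || !HEX.matches(flags)) return null
    // The lowest bit of the flags byte marks the trace as sampled.
    return TraceContext(traceId, spanId, sampled = (flags.toInt(16) and 1) == 1)
}

// A child span keeps the trace ID and the sampling decision but gets a fresh span ID.
fun childOf(parent: TraceContext, random: Random = Random.Default): TraceContext {
    val spanId = (1..16).map { "0123456789abcdef"[random.nextInt(16)] }.joinToString("")
    return parent.copy(spanId = spanId)
}

// Serializes the context for the next outgoing call.
fun TraceContext.toTraceparent(): String =
    "00-$traceId-$spanId-${if (sampled) "01" else "00"}"
```

Every hop repeats the same steps: parse the incoming header, create a child span, and send the child's `traceparent` downstream. The trace ID never changes, which is what lets a tracing backend stitch the spans back together.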
Custom spans
For important business operations, create custom spans:
package com.example.demo.service

import io.micrometer.observation.Observation
import io.micrometer.observation.ObservationRegistry
import org.springframework.stereotype.Service
import java.util.function.Supplier

@Service
class PaymentService(
    private val observationRegistry: ObservationRegistry
) {
    fun processPayment(orderId: String, amount: Double): PaymentResult {
        return Observation.createNotStarted("payment.process", observationRegistry)
            .lowCardinalityKeyValue("payment.method", "credit-card")
            // An explicit Supplier selects the value-returning overload of observe()
            .observe(Supplier {
                // This block is wrapped in a span
                validatePayment(orderId, amount)
                chargeCard(orderId, amount)
                PaymentResult(success = true)
            })
    }
}
Propagating trace context to Kafka
Spring Kafka propagates trace context through Kafka headers once observation is enabled on the template and the listener container (spring.kafka.template.observation-enabled=true and spring.kafka.listener.observation-enabled=true). When a consumer picks up a message, the trace continues:
// Producer: with observation enabled, trace context is added to the Kafka headers
kafkaTemplate.send("order-events", event.orderId.toString(), event)

// Consumer: with observation enabled, trace context is extracted from the Kafka headers
@KafkaListener(topics = ["order-events"])
fun handleOrder(event: OrderPlacedEvent) {
    // This span is a child of the producer's span
    processOrder(event)
}
Configuring for RestClient
package com.example.demo.config

import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.web.client.RestClient

@Configuration
class RestClientConfig {

    @Bean
    fun restClient(builder: RestClient.Builder): RestClient {
        return builder
            .baseUrl("https://api.example.com")
            .build()
    }
}
Spring Boot auto-configures the RestClient.Builder with tracing instrumentation. Use the builder bean — don’t create RestClient manually.
Complete observability stack with Docker Compose
# docker-compose-monitoring.yml
services:
  prometheus:
    image: prom/prometheus:v2.51.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
  tempo:
    image: grafana/tempo:2.4.0
    ports:
      - "4318:4318" # OTLP HTTP
      - "3200:3200" # Tempo API
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
Grafana dashboard queries
Useful PromQL queries for your Grafana dashboard:
# Request rate per endpoint
sum by (uri) (rate(http_server_requests_seconds_count{uri!="/actuator/prometheus"}[5m]))
# P99 latency (needs histogram buckets: management.metrics.distribution.percentiles-histogram.http.server.requests=true)
histogram_quantile(0.99, sum by (le) (rate(http_server_requests_seconds_bucket[5m])))
# Error rate (share of all requests answered with a 5xx status)
sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
/ sum(rate(http_server_requests_seconds_count[5m]))
# JVM memory usage
jvm_memory_used_bytes{area="heap"}
# Active database connections
hikaricp_connections_active
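To make the rate-based queries less opaque, here is a rough sketch of the arithmetic behind `rate()` and an error ratio. It is a deliberate simplification: it ignores counter resets and the extrapolation Prometheus applies at window edges. `Sample`, `simpleRate`, and `errorRatio` are hypothetical names.

```kotlin
// One scraped sample of a monotonically increasing counter.
data class Sample(val timestampSeconds: Double, val value: Double)

// Per-second increase between the first and last sample in the window.
fun simpleRate(window: List<Sample>): Double {
    require(window.size >= 2) { "need at least two samples" }
    val first = window.first()
    val last = window.last()
    return (last.value - first.value) / (last.timestampSeconds - first.timestampSeconds)
}

// Share of requests that failed: error rate divided by total request rate.
fun errorRatio(errors: List<Sample>, all: List<Sample>): Double {
    val total = simpleRate(all)
    return if (total == 0.0) 0.0 else simpleRate(errors) / total
}
```

For example, if the 5xx counter grows from 10 to 16 over 300 seconds while the total counter grows from 1000 to 1600, the error rate is 0.02 requests per second, the total rate is 2 requests per second, and the error ratio is 0.01 (1%).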
Production configuration
# application-prod.yml
management:
  endpoints:
    web:
      exposure:
        include: health, prometheus
  endpoint:
    health:
      show-details: never
  tracing:
    sampling:
      probability: 0.1 # Sample 10% of traces
  otlp:
    tracing:
      endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}
In production:
- Expose only health and Prometheus endpoints
- Don’t show health details to unauthenticated users
- Sample traces at 10% (or less) to reduce overhead and storage
- Load OTLP endpoint from environment variables
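A sampling probability of 0.1 does not have to mean a coin flip per request. Ratio-based samplers commonly derive the decision from the trace ID, so every service that sees the same trace reaches the same verdict. The sketch below shows that idea under stated assumptions: `shouldSample` is a hypothetical function, and masking the sign bit is a simplification rather than the exact logic of any particular sampler.

```kotlin
// Decides from the trace ID alone, so the result is stable for a given trace and probability.
fun shouldSample(traceId: String, probability: Double): Boolean {
    require(traceId.length == 32) { "trace ID must be 32 hex characters" }
    require(probability in 0.0..1.0) { "probability must be between 0 and 1" }
    if (probability >= 1.0) return true
    if (probability <= 0.0) return false
    // Read the low 64 bits of the trace ID as a non-negative number (sign bit masked off).
    val low = traceId.substring(16).toULong(16).toLong() and Long.MAX_VALUE
    // Trace IDs below this bound are kept; the bound scales with the probability.
    val upperBound = (probability * Long.MAX_VALUE.toDouble()).toLong()
    return low < upperBound
}
```

Because trace IDs are effectively random, roughly 10% of them fall below the bound at probability 0.1. Raising the probability temporarily while debugging widens the bound without changing which traces were already being kept.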
Common mistakes
Exposing all actuator endpoints. Endpoints like env, configprops, and heapdump leak sensitive data. Only expose what monitoring needs.
Sampling at 100% in production. Tracing adds overhead. Sample at 1-10% and increase when debugging specific issues.
Not adding custom metrics. Default metrics tell you about the infrastructure. Custom metrics tell you about the business — products created, orders processed, payments failed.
Creating RestClient without the builder. If you call RestClient.create() directly, you lose auto-instrumentation. Always use the injected RestClient.Builder.
Actuator gives you health and metrics. OpenTelemetry gives you distributed tracing. Together, they tell you what’s happening in your application and why it’s slow.