Spring Boot Actuator & OpenTelemetry — Production Monitoring

Your application works in dev. Now you need to know when it doesn’t work in production — health checks, metrics, and distributed tracing. Spring Boot 4 ships with Actuator for health/metrics and a first-class OpenTelemetry starter for tracing.

Dependencies

dependencies {
  implementation("org.springframework.boot:spring-boot-starter-web")
  implementation("org.springframework.boot:spring-boot-starter-actuator")
  implementation("org.springframework.boot:spring-boot-starter-opentelemetry")
  implementation("io.micrometer:micrometer-registry-prometheus")
  implementation("io.opentelemetry:opentelemetry-exporter-otlp")
}

Spring Boot 4 includes spring-boot-starter-opentelemetry — no more manual OpenTelemetry SDK configuration.

Actuator basics

Configuration

management:
  endpoints:
    web:
      exposure:
        include: health, info, metrics, prometheus
  endpoint:
    health:
      show-details: when-authorized
      show-components: when-authorized
  info:
    env:
      enabled: true

info:
  app:
    name: my-app
    version: 1.0.0
    description: Product catalog service

Available endpoints

Endpoint	URL	Purpose
health	`/actuator/health`	Application health status
info	`/actuator/info`	Application metadata
metrics	`/actuator/metrics`	Micrometer metrics
prometheus	`/actuator/prometheus`	Prometheus-format metrics

Don’t expose all endpoints in production. Only expose what your monitoring system needs.

Securing actuator endpoints

@Bean
fun securityFilterChain(http: HttpSecurity): SecurityFilterChain {
  return http
    .authorizeHttpRequests { auth ->
      auth
        // Also covers /actuator/health/liveness and /actuator/health/readiness
        .requestMatchers("/actuator/health", "/actuator/health/**").permitAll()
        .requestMatchers("/actuator/prometheus").permitAll()
        .requestMatchers("/actuator/**").hasRole("ADMIN")
        .requestMatchers("/api/**").authenticated()
        .anyRequest().denyAll()
    }
    // ... rest of config
    .build()
}

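The health endpoint is public so load balancers and orchestrators can reach it, and the Prometheus endpoint is public so the scraper can reach it. Every other actuator endpoint requires the ADMIN role. Leaving /actuator/prometheus open is only reasonable when the app is reachable solely over a private network; otherwise protect it too, because metrics can reveal internal details.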

Health checks

Default health indicators

Spring Boot auto-configures health indicators for:

Database connectivity (DataSource)
Disk space
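Redis connectivity
Other backing services such as RabbitMQ, MongoDB, and mail servers, when the matching library is on the classpath

Kafka is not among them: Spring Boot ships no Kafka health indicator, so broker connectivity needs a custom indicator.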

GET /actuator/health

{
  "status": "UP",
  "components": {
    "db": { "status": "UP" },
    "diskSpace": { "status": "UP" },
    "redis": { "status": "UP" }
  }
}

Custom health indicator

package com.example.demo.health

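// Spring Boot 4 package; on Spring Boot 3.x these types live in org.springframework.boot.actuate.health
import org.springframework.boot.health.contributor.Health
import org.springframework.boot.health.contributor.HealthIndicator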
import org.springframework.stereotype.Component
import org.springframework.web.client.RestClient

@Component
class PaymentGatewayHealthIndicator(
  private val restClient: RestClient
) : HealthIndicator {

  override fun health(): Health {
    return try {
      val response = restClient.get()
        .uri("https://payments.example.com/health")
        .retrieve()
        .toBodilessEntity()

      if (response.statusCode.is2xxSuccessful) {
        Health.up()
          .withDetail("gateway", "payments.example.com")
          .build()
      } else {
        Health.down()
          .withDetail("gateway", "payments.example.com")
          .withDetail("status", response.statusCode.value())
          .build()
      }
    } catch (e: Exception) {
      // Connection failures land here, as do 4xx/5xx responses: retrieve() throws on those by default
      Health.down(e)
        .withDetail("gateway", "payments.example.com")
        .build()
    }
  }
}

This appears as paymentGateway in the health response (Spring strips the HealthIndicator suffix and camelCases the name).
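
The naming rule can be sketched in plain Kotlin. This is an illustration of the behavior described above, not Spring Boot's own code: defaultBeanName and healthComponentName are hypothetical helpers, and the bean-name step is simplified (it ignores the special case for class names that start with two capital letters).

```kotlin
import java.util.Locale

// Suffixes assumed to be stripped from the bean name, compared case-insensitively.
private val SUFFIXES = listOf("healthindicator", "healthcontributor")

/** Simplified default bean name for a @Component class: first letter lowercased. */
fun defaultBeanName(simpleClassName: String): String =
  simpleClassName.replaceFirstChar { it.lowercaseChar() }

/** Name under which the indicator shows up in the "components" map. */
fun healthComponentName(beanName: String): String {
  val lower = beanName.lowercase(Locale.ENGLISH)
  val suffix = SUFFIXES.firstOrNull { lower.endsWith(it) } ?: return beanName
  return beanName.dropLast(suffix.length)
}

fun main() {
  val beanName = defaultBeanName("PaymentGatewayHealthIndicator")
  println(healthComponentName(beanName)) // paymentGateway
}
```

A bean whose name does not end in one of the suffixes keeps its full name in the health response.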

Liveness and readiness probes

For Kubernetes:

management:
  endpoint:
    health:
      probes:
        enabled: true
      group:
        liveness:
          include: livenessState
        readiness:
          include: readinessState, db

GET /actuator/health/liveness   → Is the app alive?
GET /actuator/health/readiness  → Can it handle traffic?

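Liveness reports whether the application's internal state is still valid; when it fails, Kubernetes restarts the container. Readiness reports whether the app can serve requests right now; when it fails, Kubernetes stops routing traffic to the pod without restarting it. Add external dependencies such as db to readiness with care: if a shared database goes down, every replica is taken out of rotation at once.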
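
The readiness state can also be changed from application code, for example while a cache is being rebuilt. A minimal sketch, assuming Spring Boot's availability API (AvailabilityChangeEvent and ReadinessState in org.springframework.boot.availability); TrafficGate, pause, and resume are hypothetical names:

```kotlin
import org.springframework.boot.availability.AvailabilityChangeEvent
import org.springframework.boot.availability.ReadinessState
import org.springframework.context.ApplicationEventPublisher
import org.springframework.stereotype.Component

@Component
class TrafficGate(private val publisher: ApplicationEventPublisher) {

  // Readiness probe starts failing, so Kubernetes stops sending traffic to this pod
  fun pause() =
    AvailabilityChangeEvent.publish(publisher, this, ReadinessState.REFUSING_TRAFFIC)

  // Readiness probe passes again
  fun resume() =
    AvailabilityChangeEvent.publish(publisher, this, ReadinessState.ACCEPTING_TRAFFIC)
}
```

The liveness state is left alone here on purpose: a pod that is merely busy should be taken out of rotation, not restarted.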

Metrics with Micrometer

Default metrics

Spring Boot auto-instruments:

HTTP request durations and counts
JVM memory usage
JVM thread counts
Database connection pool stats
Kafka consumer/producer metrics
Cache hit/miss rates

Browsing metrics

GET /actuator/metrics
→ Lists all available metric names

GET /actuator/metrics/http.server.requests
→ Shows request count, duration, and tags

GET /actuator/metrics/http.server.requests?tag=uri:/api/v1/products
→ Filtered by endpoint

Custom metrics

package com.example.demo.service

import com.example.demo.repository.ProductRepository
import io.micrometer.core.instrument.Counter
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Timer
import org.springframework.stereotype.Service
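// CreateProductRequest, ProductResponse, toEntity() and toResponse() are assumed to be defined elsewhere in this package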

@Service
class ProductService(
  private val productRepository: ProductRepository,
  meterRegistry: MeterRegistry
) {

  private val createCounter: Counter = Counter.builder("products.created")
    .description("Number of products created")
    .register(meterRegistry)

  private val searchTimer: Timer = Timer.builder("products.search.duration")
    .description("Time to search products")
    .register(meterRegistry)

  fun create(request: CreateProductRequest): ProductResponse {
    val product = productRepository.save(request.toEntity())
    createCounter.increment()
    return product.toResponse()
  }

  fun search(query: String): List<ProductResponse> {
    return searchTimer.record<List<ProductResponse>> {
      productRepository.search("%$query%").map { it.toResponse() }
    }
  }
}
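
Counters can also carry tags, so one metric name covers several cases. A minimal sketch, assuming a hypothetical PaymentMetrics helper and a bounded FailureReason enum (neither belongs to the service above); SimpleMeterRegistry stands in for the registry Spring Boot injects:

```kotlin
import io.micrometer.core.instrument.Counter
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Hypothetical, fixed set of reasons: keeps the number of tag values small
enum class FailureReason { DECLINED, TIMEOUT, FRAUD_CHECK }

class PaymentMetrics(private val registry: MeterRegistry) {

  fun recordFailure(reason: FailureReason) {
    Counter.builder("payments.failed")
      .description("Number of failed payments")
      .tag("reason", reason.name.lowercase())
      .register(registry) // returns the existing counter if this name/tag pair is already registered
      .increment()
  }
}

fun main() {
  val registry = SimpleMeterRegistry()
  val metrics = PaymentMetrics(registry)
  metrics.recordFailure(FailureReason.TIMEOUT)
  metrics.recordFailure(FailureReason.TIMEOUT)
  metrics.recordFailure(FailureReason.DECLINED)

  val timeouts = registry.get("payments.failed").tag("reason", "timeout").counter().count()
  println(timeouts) // 2.0
}
```

Tag values should come from a small, fixed set. Using order IDs or user IDs as tags creates a new time series per value and quickly overwhelms Prometheus.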

Prometheus export

The /actuator/prometheus endpoint exposes all metrics in Prometheus format:

# HELP products_created_total Number of products created
# TYPE products_created_total counter
products_created_total 42.0

# HELP products_search_duration_seconds Time to search products
# TYPE products_search_duration_seconds summary
products_search_duration_seconds_count 156.0
products_search_duration_seconds_sum 3.2
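
The TYPE line above says summary: by default a Timer exports a count and a sum but no buckets, which is not enough for percentile queries in Prometheus. A sketch of the same search timer with histogram buckets turned on; the 5 ms and 5 s bounds are assumed values that cap the number of buckets, and SimpleMeterRegistry only stands in for the injected registry:

```kotlin
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Timer
import io.micrometer.core.instrument.simple.SimpleMeterRegistry
import java.time.Duration

/** Builds the search timer with percentile histogram buckets enabled. */
fun searchTimer(registry: MeterRegistry): Timer =
  Timer.builder("products.search.duration")
    .description("Time to search products")
    .publishPercentileHistogram()                // Prometheus registry then exports _bucket series
    .minimumExpectedValue(Duration.ofMillis(5))  // assumed lower bound
    .maximumExpectedValue(Duration.ofSeconds(5)) // assumed upper bound
    .register(registry)

fun main() {
  val registry = SimpleMeterRegistry()
  val timer = searchTimer(registry)
  timer.record(Duration.ofMillis(120))
  println(timer.count()) // 1
}
```

With a Prometheus registry, the metric type becomes histogram and the products_search_duration_seconds_bucket series can be fed to histogram_quantile().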

Prometheus scrape config

# prometheus.yml
scrape_configs:
  - job_name: 'my-app'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['my-app:8080']

OpenTelemetry distributed tracing

Configuration

management:
  tracing:
    sampling:
      probability: 1.0  # Sample 100% in dev, lower in production
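  opentelemetry:
    tracing:
      export:
        otlp:
          # Spring Boot 4 property; Spring Boot 3.x used management.otlp.tracing.endpoint
          endpoint: http://localhost:4318/v1/traces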

spring:
  application:
    name: product-service

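With the OpenTelemetry starter on the classpath, Spring Boot records spans for:

Incoming HTTP requests
Outgoing HTTP calls made through the auto-configured RestClient.Builder or WebClient.Builder
Kafka produce/consume, once observation is enabled with spring.kafka.template.observation-enabled=true and spring.kafka.listener.observation-enabled=true

Database queries are not traced out of the box. JDBC spans need an extra library such as datasource-micrometer.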

How it works

Every incoming HTTP request gets a trace ID. That trace ID propagates to every downstream call — database queries, Kafka messages, HTTP calls to other services.

Request → product-service (trace-id: abc123)
  → PostgreSQL query (trace-id: abc123, span: db-query)
  → Kafka send (trace-id: abc123, span: kafka-produce)
    → inventory-service consumer (trace-id: abc123, span: kafka-consume)
      → PostgreSQL query (trace-id: abc123, span: db-query)

All operations with the same trace ID are part of one distributed transaction. You can view the entire flow in Jaeger, Zipkin, or Grafana Tempo.
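
Between services, the trace ID travels in a request or message header. By default this is the W3C traceparent header, which has four dash-separated fields: version, trace ID, parent span ID, and flags. The sketch below parses and re-emits that header to show what a downstream service receives. TraceContext, parseTraceparent, and toTraceparent are illustrative names, not Micrometer or OpenTelemetry APIs, and only version 00 is handled.

```kotlin
data class TraceContext(val traceId: String, val spanId: String, val sampled: Boolean)

// version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags
private val TRACEPARENT = Regex("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")

fun parseTraceparent(header: String): TraceContext? {
  val match = TRACEPARENT.matchEntire(header.trim()) ?: return null
  val (traceId, spanId, flags) = match.destructured
  // All-zero IDs are invalid
  if (traceId.all { it == '0' } || spanId.all { it == '0' }) return null
  val sampled = (flags.toInt(16) and 0x01) != 0 // lowest flag bit marks a sampled trace
  return TraceContext(traceId, spanId, sampled)
}

fun TraceContext.toTraceparent(): String =
  "00-$traceId-$spanId-${if (sampled) "01" else "00"}"

fun main() {
  val incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
  val parent = parseTraceparent(incoming) ?: error("invalid traceparent")

  // A downstream call keeps the trace ID and swaps in its own span ID
  val child = parent.copy(spanId = "b7ad6b7169203331")
  println(child.toTraceparent())
  // 00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01
}
```

The trace ID stays constant across every hop while the span ID changes, which is what lets a tracing backend stitch the spans into one tree.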

Custom spans

For important business operations, create custom spans:

package com.example.demo.service

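import io.micrometer.observation.Observation
import io.micrometer.observation.ObservationRegistry
import org.springframework.stereotype.Service
import java.util.function.Supplier

@Service
class PaymentService(
  private val observationRegistry: ObservationRegistry
) {

  // PaymentResult, validatePayment() and chargeCard() are assumed to be defined elsewhere
  fun processPayment(orderId: String, amount: Double): PaymentResult {
    val result = Observation.createNotStarted("payment.process", observationRegistry)
      .lowCardinalityKeyValue("payment.method", "credit-card")
      // Explicit Supplier selects the value-returning observe() overload rather than observe(Runnable)
      .observe(Supplier {
        // This block is wrapped in a span; exceptions are recorded on it and rethrown
        validatePayment(orderId, amount)
        chargeCard(orderId, amount)
        PaymentResult(success = true)
      })
    // observe() is declared nullable on the Java side
    return checkNotNull(result) { "payment.process produced no result" }
  }
}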

Propagating trace context to Kafka

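Spring Kafka propagates trace context in Kafka headers once observation is enabled on both sides: spring.kafka.template.observation-enabled=true for producers and spring.kafka.listener.observation-enabled=true for listeners. Both are off by default. With them on, the trace continues when a consumer picks up a message: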

// Producer — trace context is automatically added to Kafka headers
kafkaTemplate.send("order-events", event.orderId.toString(), event)

// Consumer — trace context is automatically extracted from Kafka headers
@KafkaListener(topics = ["order-events"])
fun handleOrder(event: OrderPlacedEvent) {
  // This span is a child of the producer's span
  processOrder(event)
}

Configuring for RestClient

package com.example.demo.config

import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.web.client.RestClient

@Configuration
class RestClientConfig {

  @Bean
  fun restClient(builder: RestClient.Builder): RestClient {
    return builder
      .baseUrl("https://api.example.com")
      .build()
  }
}

Spring Boot auto-configures the RestClient.Builder with tracing instrumentation. Use the builder bean — don’t create RestClient manually.

Complete observability stack with Docker Compose

# docker-compose-monitoring.yml
services:
  prometheus:
    image: prom/prometheus:v2.51.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin

  tempo:
    image: grafana/tempo:2.4.0
    ports:
      - "4318:4318"  # OTLP HTTP
      - "3200:3200"  # Tempo API
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml

Grafana dashboard queries

Useful PromQL queries for your Grafana dashboard:

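# Request rate per endpoint
sum by (uri) (rate(http_server_requests_seconds_count{uri!="/actuator/prometheus"}[5m]))

# P99 latency (needs histogram buckets: set
# management.metrics.distribution.percentiles-histogram.http.server.requests=true)
histogram_quantile(0.99, sum by (le) (rate(http_server_requests_seconds_bucket[5m])))

# Error rate (share of requests answered with a 5xx status)
sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
/ sum(rate(http_server_requests_seconds_count[5m]))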

# JVM memory usage
jvm_memory_used_bytes{area="heap"}

# Active database connections
hikaricp_connections_active

Production configuration

# application-prod.yml
management:
  endpoints:
    web:
      exposure:
        include: health, prometheus
  endpoint:
    health:
      show-details: never
  tracing:
    sampling:
      probability: 0.1  # Sample 10% of traces
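  opentelemetry:
    tracing:
      export:
        otlp:
          # Spring Boot 4 property; Spring Boot 3.x used management.otlp.tracing.endpoint.
          # The value must be the full traces URL (ending in /v1/traces), not just the collector's base URL.
          endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}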

In production:

Expose only the health and Prometheus endpoints
Hide health details from every caller (show-details: never)
Sample traces at 10% (or less) to reduce overhead and storage
Load the OTLP endpoint from an environment variable
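
A sampling probability of 0.1 means that roughly one trace in ten is recorded. Ratio-based samplers typically derive the decision from the trace ID itself, so every service reaches the same verdict for a given trace. The sketch below illustrates that idea in plain Kotlin. shouldSample is a hypothetical function and its threshold arithmetic is a simplification, not the code Spring Boot or OpenTelemetry actually runs.

```kotlin
import kotlin.random.Random

/**
 * Decides whether to record a trace using only its ID, so the decision
 * is deterministic and identical in every service that sees the trace.
 */
fun shouldSample(traceId: String, probability: Double): Boolean {
  require(probability in 0.0..1.0) { "probability must be between 0 and 1" }
  require(traceId.length == 32) { "trace ID must be 32 hex characters" }
  if (probability == 1.0) return true

  // Low 64 bits of the trace ID with the sign bit cleared: a value in [0, Long.MAX_VALUE]
  val low = traceId.substring(16).toULong(16).toLong() and Long.MAX_VALUE
  val threshold = (probability * Long.MAX_VALUE).toLong()
  return low < threshold
}

fun main() {
  val random = Random(42)
  // Random 128-bit trace IDs rendered as 32 hex characters
  val ids = List(10_000) { "%016x%016x".format(random.nextLong(), random.nextLong()) }
  val kept = ids.count { shouldSample(it, 0.1) }
  println("kept $kept of ${ids.size} traces") // close to 10%
}
```

Because the verdict depends only on the trace ID, a sampled trace is complete across services instead of missing random spans.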

Common mistakes

Exposing all actuator endpoints. Endpoints like env, configprops, and heapdump leak sensitive data. Only expose what monitoring needs.

Sampling at 100% in production. Tracing adds overhead. Sample at 1-10% and increase when debugging specific issues.

Not adding custom metrics. Default metrics tell you about the infrastructure. Custom metrics tell you about the business — products created, orders processed, payments failed.

Creating RestClient without the builder. If you call RestClient.create() directly, the client is built without Spring Boot's observation support, so its calls produce no spans and do not propagate the trace. Always build from the injected RestClient.Builder.

Actuator gives you health and metrics. OpenTelemetry gives you distributed tracing. Together, they tell you what’s happening in your application and why it’s slow.