API Rate Limiting & Abuse Prevention
Implement API rate limiting — token bucket, sliding window, per-user and per-IP limits, Spring Boot implementation with Bucket4j, and abuse prevention strategies.
Without rate limiting, a single client can exhaust your server. A bot scrapes your entire catalog. A buggy mobile app retries 1000 times per second. A competitor hammers your API to degrade your service.
Rate limiting caps how many requests a client can make in a time window. It protects your infrastructure, ensures fair usage, and prevents abuse.
Rate limiting algorithms
Token bucket
Imagine a bucket that fills with tokens at a steady rate. Each request takes one token. When the bucket is empty, requests are rejected.
Bucket capacity: 100 tokens
Refill rate: 10 tokens/second
Request 1: 100 tokens → 99 tokens (allowed)
Request 2: 99 tokens → 98 tokens (allowed)
...
Request 100: 1 token → 0 tokens (allowed)
Request 101: 0 tokens (rejected — 429 Too Many Requests)
1 second later: 0 + 10 = 10 tokens available
Pros: Allows short bursts (up to bucket capacity), smooth continuous refill. Cons: a client can still fire a burst as large as the full bucket capacity at once.
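The refill-and-consume logic above fits in a few lines. A minimal, dependency-free sketch (the class and its injected clock parameter are my own illustration, not Bucket4j's API):

```kotlin
// Token-bucket sketch: tokens refill continuously at `refillPerSecond`,
// capped at `capacity`. Time is passed in explicitly so the behavior is
// deterministic and easy to test.
class TokenBucket(private val capacity: Long, private val refillPerSecond: Double) {
    private var tokens = capacity.toDouble() // start full
    private var lastRefill = 0.0             // seconds

    fun tryConsume(now: Double): Boolean {
        // Credit tokens for the elapsed time, never exceeding capacity
        tokens = minOf(capacity.toDouble(), tokens + (now - lastRefill) * refillPerSecond)
        lastRefill = now
        return if (tokens >= 1.0) { tokens -= 1.0; true } else false
    }
}

fun main() {
    val bucket = TokenBucket(capacity = 3, refillPerSecond = 1.0)
    repeat(3) { println(bucket.tryConsume(now = 0.0)) } // true, true, true — burst up to capacity
    println(bucket.tryConsume(now = 0.0))               // false — bucket empty
    println(bucket.tryConsume(now = 1.0))               // true — one token refilled after 1 s
}
```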
Fixed window
Count requests in fixed time windows (e.g., per minute):
Window: 12:00:00 - 12:00:59 → limit 100 requests
12:00:00 → count: 1 (allowed)
12:00:30 → count: 50 (allowed)
12:00:59 → count: 100 (allowed)
12:00:59 → count: 101 (rejected)
12:01:00 → count resets to 0
Pros: Simple to implement. Cons: Boundary problem — 100 requests at 12:00:59 + 100 at 12:01:00 = 200 requests in 2 seconds.
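The counter-and-reset behavior, including the boundary problem, can be sketched in a few lines (the class name and parameters are my own illustration):

```kotlin
// Fixed-window counter sketch: requests are counted per discrete window;
// the count resets whenever a new window starts.
class FixedWindowLimiter(private val limit: Int, private val windowSeconds: Long) {
    private var windowStart = -1L
    private var count = 0

    fun allow(nowSeconds: Long): Boolean {
        val window = nowSeconds / windowSeconds * windowSeconds // floor to window start
        if (window != windowStart) { windowStart = window; count = 0 } // new window → reset
        return if (count < limit) { count++; true } else false
    }
}

fun main() {
    val limiter = FixedWindowLimiter(limit = 2, windowSeconds = 60)
    println(limiter.allow(59)) // true
    println(limiter.allow(59)) // true
    println(limiter.allow(59)) // false — window full
    // Boundary problem: one second later a fresh window allows `limit` more requests
    println(limiter.allow(60)) // true
}
```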
Sliding window
Combines fixed window and sliding log to smooth the boundary:
Current time: 12:01:15 (25% into current window)
Previous window count: 80
Current window count: 20
Weighted count = 80 * 0.75 + 20 = 80
Limit: 100 → allowed
Pros: Smoother than fixed window, no boundary spikes. Cons: Slightly more complex to implement.
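The weighted-count rule above can be expressed as a small function (names and signature are my own illustration):

```kotlin
// Sliding-window-counter sketch: the effective count weights the previous
// window's total by the fraction of it still inside the sliding window.
fun slidingWindowAllowed(
    previousCount: Int,      // requests in the previous window
    currentCount: Int,       // requests so far in the current window
    elapsedFraction: Double, // how far we are into the current window (0.0–1.0)
    limit: Int
): Boolean {
    val weighted = previousCount * (1.0 - elapsedFraction) + currentCount
    return weighted < limit
}

fun main() {
    // The example above: 25% into the window, previous = 80, current = 20
    // weighted = 80 * 0.75 + 20 = 80 → under the limit of 100
    println(slidingWindowAllowed(80, 20, 0.25, 100)) // true
}
```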
Implementation with Spring Boot + Bucket4j
Bucket4j implements the token bucket algorithm and integrates with Spring Boot:
dependencies {
    implementation("com.bucket4j:bucket4j-core:8.14.0")
}
Per-IP rate limiter
import io.github.bucket4j.Bandwidth
import io.github.bucket4j.Bucket
import io.github.bucket4j.Refill
import jakarta.servlet.FilterChain
import jakarta.servlet.http.HttpServletRequest
import jakarta.servlet.http.HttpServletResponse
import org.springframework.stereotype.Component
import org.springframework.web.filter.OncePerRequestFilter
import java.time.Duration
import java.util.concurrent.ConcurrentHashMap

@Component
class RateLimitFilter : OncePerRequestFilter() {

    // One bucket per client IP. In production, use a cache with expiry
    // (e.g., Caffeine) so this map doesn't grow without bound.
    private val buckets = ConcurrentHashMap<String, Bucket>()

    override fun doFilterInternal(
        request: HttpServletRequest,
        response: HttpServletResponse,
        filterChain: FilterChain
    ) {
        val clientIp = getClientIp(request)
        val bucket = buckets.computeIfAbsent(clientIp) { createBucket() }
        val probe = bucket.tryConsumeAndReturnRemaining(1)

        if (probe.isConsumed) {
            response.setHeader("X-Rate-Limit-Remaining", probe.remainingTokens.toString())
            filterChain.doFilter(request, response)
        } else {
            val waitSeconds = Duration.ofNanos(probe.nanosToWaitForRefill).seconds
            response.setHeader("Retry-After", waitSeconds.toString())
            response.setHeader("X-Rate-Limit-Remaining", "0")
            response.status = 429
            response.contentType = "application/json"
            response.writer.write("""{"error": "Too many requests. Retry after $waitSeconds seconds."}""")
        }
    }

    private fun createBucket(): Bucket {
        val limit = Bandwidth.classic(
            100, // bucket capacity: 100 tokens
            Refill.intervally(100, Duration.ofMinutes(1)) // refill 100 every minute
        )
        return Bucket.builder().addLimit(limit).build()
    }

    private fun getClientIp(request: HttpServletRequest): String {
        // X-Forwarded-For carries the original client IP behind a proxy;
        // only trust it if your proxy overwrites client-supplied values.
        val forwarded = request.getHeader("X-Forwarded-For")
        return if (forwarded != null) forwarded.split(",")[0].trim()
        else request.remoteAddr
    }
}
Per-user rate limiter
For authenticated endpoints, rate limit by user ID instead of IP:
import org.springframework.security.core.context.SecurityContextHolder

@Component
class AuthenticatedRateLimitFilter(
    private val rateLimitService: RateLimitService
) : OncePerRequestFilter() {

    override fun doFilterInternal(
        request: HttpServletRequest,
        response: HttpServletResponse,
        filterChain: FilterChain
    ) {
        // No authenticated user → fall through (the per-IP filter still applies)
        val userId = SecurityContextHolder.getContext().authentication?.name
            ?: return filterChain.doFilter(request, response)

        val bucket = rateLimitService.getBucket(userId)
        val probe = bucket.tryConsumeAndReturnRemaining(1)

        if (probe.isConsumed) {
            response.setHeader("X-Rate-Limit-Remaining", probe.remainingTokens.toString())
            filterChain.doFilter(request, response)
        } else {
            response.status = 429
            response.contentType = "application/json"
            response.writer.write("""{"error": "Rate limit exceeded"}""")
        }
    }
}
Different limits per endpoint
@RestController
@RequestMapping("/api")
class ApiController(
    private val rateLimitService: RateLimitService,
    private val searchService: SearchService,
    private val orderService: OrderService,
    private val productService: ProductService
) {

    @GetMapping("/search")
    fun search(@RequestParam q: String): List<Result> {
        // Search is expensive: 30 requests/minute
        rateLimitService.checkLimit("search", 30, Duration.ofMinutes(1))
        return searchService.search(q)
    }

    @PostMapping("/orders")
    fun createOrder(@RequestBody request: CreateOrderRequest): Order {
        // Order creation: 5 requests/minute
        rateLimitService.checkLimit("orders", 5, Duration.ofMinutes(1))
        return orderService.create(request)
    }

    @GetMapping("/products")
    fun listProducts(): List<Product> {
        // Product listing is cheap: 100 requests/minute
        rateLimitService.checkLimit("products", 100, Duration.ofMinutes(1))
        return productService.getAll()
    }
}
Distributed rate limiting with Redis
For multiple server instances, use Redis as the shared counter:
dependencies {
    implementation("com.bucket4j:bucket4j-redis:8.14.0")
    implementation("io.lettuce:lettuce-core:6.4.1.RELEASE")
}
@Bean
fun proxyManager(redisClient: RedisClient): ProxyManager<String> {
    val connection = redisClient.connect()
    return LettuceBasedProxyManager.builderFor(connection)
        .build()
}

fun getBucket(key: String): Bucket {
    val config = BucketConfiguration.builder()
        .addLimit(Bandwidth.classic(100, Refill.intervally(100, Duration.ofMinutes(1))))
        .build()
    // The bucket state lives in Redis under `key`, so every instance sees the same counts
    return proxyManager.builder()
        .build(key, { config })
}
All server instances share the same rate limit counter through Redis.
Response headers
Always include rate limit information in response headers:
HTTP/1.1 200 OK
X-Rate-Limit-Limit: 100
X-Rate-Limit-Remaining: 73
X-Rate-Limit-Reset: 1719003600
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-Rate-Limit-Remaining: 0
| Header | Purpose |
|---|---|
| X-Rate-Limit-Limit | Total requests allowed |
| X-Rate-Limit-Remaining | Requests remaining in window |
| X-Rate-Limit-Reset | When the window resets (Unix timestamp) |
| Retry-After | Seconds to wait before retrying |
These headers help legitimate clients adjust their request rate.
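On the client side, these headers can drive backoff directly. A minimal sketch of that decision (header names as above; the 30-second fallback is an arbitrary choice of mine):

```kotlin
// Decide how long a client should back off, given the response status
// and rate-limit headers from the table above.
fun backoffSeconds(status: Int, headers: Map<String, String>): Long {
    if (status != 429) return 0 // not rate limited — no backoff needed
    // Prefer the server's explicit Retry-After hint; otherwise use a default
    return headers["Retry-After"]?.toLongOrNull() ?: 30L
}

fun main() {
    println(backoffSeconds(200, mapOf("X-Rate-Limit-Remaining" to "73"))) // 0
    println(backoffSeconds(429, mapOf("Retry-After" to "30")))            // 30
    println(backoffSeconds(429, emptyMap()))                              // 30 (fallback)
}
```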
Abuse prevention beyond rate limiting
1. IP-based blocking
Block IPs that consistently exceed rate limits:
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

val violationCounts = ConcurrentHashMap<String, AtomicInteger>()

fun recordViolation(ip: String) {
    val count = violationCounts.computeIfAbsent(ip) { AtomicInteger(0) }
    if (count.incrementAndGet() > 10) {
        blockIp(ip, Duration.ofHours(1)) // blockIp: your firewall/deny-list integration
    }
}
2. Request size limits
# application.yml
spring:
  servlet:
    multipart:
      max-file-size: 10MB
      max-request-size: 10MB
  codec:
    max-in-memory-size: 1MB
3. Slow down responses (tarpit)
Instead of rejecting immediately, introduce artificial delay for suspicious clients:
if (suspiciousClient(request)) {
    // Caution: Thread.sleep ties up a servlet thread for each delayed request;
    // on async/reactive stacks, use a scheduled non-blocking delay instead.
    Thread.sleep(2000) // 2-second delay
}
This wastes the attacker’s resources without revealing that they’ve been detected.
4. Require authentication for expensive operations
Don’t let anonymous users access expensive endpoints (search, export, bulk operations). Require authentication so you can rate limit per user and revoke access.
5. CAPTCHA for sensitive actions
Registration, password reset, and contact forms are targets for bots. Add CAPTCHA (reCAPTCHA, hCaptcha) for these endpoints.
Choosing rate limits
| Endpoint type | Suggested limit |
|---|---|
| Public API (read) | 60–100 requests/minute |
| Authenticated API | 100–1000 requests/minute |
| Search | 20–30 requests/minute |
| Write operations | 10–30 requests/minute |
| Authentication | 5–10 attempts/minute |
| Password reset | 3 requests/hour |
| File upload | 10 requests/hour |
Start generous and tighten based on actual usage data.
Summary
- Token bucket is the most common algorithm — allows bursts, smooth refill
- Per-IP for unauthenticated endpoints, per-user for authenticated
- Use Redis for distributed rate limiting across server instances
- Always include rate limit headers in responses
- Combine rate limiting with IP blocking, request size limits, and CAPTCHA
- Start with generous limits, tighten based on data
Rate limiting is your first line of defense against abuse. It won’t stop a determined attacker, but it stops bots, prevents accidental overload, and buys you time to respond to attacks.