When you expose a REST API, you may encounter situations where a specific client keeps sending a large number of requests, causing your server to slow down. For the basics of building a REST API, see this article as well.
In this article, we’ll implement HTTP-layer rate limiting from scratch by combining Bucket4j with Spring Boot’s OncePerRequestFilter. It supports both per-IP and per-API-key limiting, and covers returning HTTP 429 when the limit is exceeded.
Differences from Resilience4j @RateLimiter
Let’s clarify a common point of confusion first.
| Comparison | Resilience4j @RateLimiter | Bucket4j + Filter |
|---|---|---|
| Primary use | In-app throttling (protecting outbound API calls) | Per-HTTP-client limiting |
| Limiting unit | Method / entire service | IP, API key, etc. |
| Applied at | Application layer | Servlet Filter layer |
| On limit exceeded | Throws exception | HTTP 429 response |
Resilience4j is for “preventing your app from calling external services too frequently.” Bucket4j + Filter is for “blocking excessive inbound access from outside.” They complement rather than compete with each other, and are often used together.
For a detailed look at implementing Resilience4j Circuit Breaker, see this article.
Adding Dependencies
For Maven:
<dependency>
    <groupId>com.github.bucket4j</groupId>
    <artifactId>bucket4j-core</artifactId>
    <version>8.10.1</version>
</dependency>
For Gradle:
implementation 'com.github.bucket4j:bucket4j-core:8.10.1'
No additional Spring Boot dependencies are required — this is all you need to get started.
What Is the Token Bucket Algorithm?
Bucket4j uses the token bucket algorithm. “Tokens” are continuously replenished in a bucket at a fixed rate, and each incoming request consumes one token. When the bucket is empty, the request is rejected. You only need to configure two things: the “capacity (maximum token count)” and the “refill rate.”
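As a mental model, the algorithm fits in a few lines of plain Java. This is a simplified sketch for illustration only; Bucket4j's real implementation is lock-free and far more precise:

```java
// Simplified token bucket: tokens trickle in at a fixed rate, each request
// consumes one, and requests are rejected once the bucket is empty.
public class SimpleTokenBucket {
    private final long capacity;
    private final double refillPerNano; // tokens added per elapsed nanosecond
    private double tokens;
    private long lastRefill;

    public SimpleTokenBucket(long capacity, long refillTokens, long refillPeriodNanos) {
        this.capacity = capacity;
        this.refillPerNano = (double) refillTokens / refillPeriodNanos;
        this.tokens = capacity; // start full, allowing an initial burst
        this.lastRefill = System.nanoTime();
    }

    public synchronized boolean tryConsume() {
        long now = System.nanoTime();
        // Replenish proportionally to elapsed time, capped at capacity
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```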
Configuring Bucket Capacity and Refill Rate
Use Bandwidth to define the policy. In Bucket4j 8.x, the builder style is used.
Bandwidth limit = Bandwidth.builder()
        .capacity(60)
        .refillGreedy(60, Duration.ofMinutes(1))
        .build();
Bucket bucket = Bucket.builder().addLimit(limit).build();
This configures a limit of “up to 60 requests per minute, with 60 tokens refilled every minute.” refillGreedy replenishes tokens immediately, while refillIntervally refills them all at once per interval. Choose based on whether you want to allow short bursts.
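For comparison, here is what the interval-based variant of the same policy looks like (a configuration sketch assuming the same Bucket4j 8.x builder API as above):

```java
// refillIntervally: all 60 tokens reappear at once when the minute elapses,
// so clients see a hard "window reset" instead of a continuous trickle.
Bandwidth intervalLimit = Bandwidth.builder()
        .capacity(60)
        .refillIntervally(60, Duration.ofMinutes(1))
        .build();
```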
Implementing the Rate Limit Filter with OncePerRequestFilter
See this article for the differences between Filters and Interceptors.
Here is a Filter implementation using IP address as the identifying key:
import java.io.IOException;
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.ConsumptionProbe;

// jakarta.* imports assume Spring Boot 3.x; use javax.* on Boot 2.x
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

@Component
public class RateLimitFilter extends OncePerRequestFilter {

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    @Override
    protected void doFilterInternal(
            HttpServletRequest request,
            HttpServletResponse response,
            FilterChain filterChain) throws ServletException, IOException {
        String clientKey = getClientIp(request);
        Bucket bucket = buckets.computeIfAbsent(clientKey, k -> createBucket());
        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);

        if (probe.isConsumed()) {
            filterChain.doFilter(request, response);
        } else {
            // Round up so an empty bucket never produces "Retry-After: 0"
            long waitSeconds =
                    (probe.getNanosToWaitForRefill() + 999_999_999L) / 1_000_000_000L;
            response.setStatus(429);
            response.setHeader("Retry-After", String.valueOf(waitSeconds));
            response.setContentType("application/json");
            response.setCharacterEncoding("UTF-8");
            response.getWriter().write("{\"error\": \"Too Many Requests\"}");
        }
    }

    private String getClientIp(HttpServletRequest request) {
        // X-Forwarded-For can be spoofed by clients; only use it behind a trusted reverse proxy.
        // In directly exposed environments, using getRemoteAddr() only is safer.
        String forwarded = request.getHeader("X-Forwarded-For");
        if (forwarded != null && !forwarded.isBlank()) {
            return forwarded.split(",")[0].trim();
        }
        return request.getRemoteAddr();
    }

    private Bucket createBucket() {
        Bandwidth limit = Bandwidth.builder()
                .capacity(60)
                .refillGreedy(60, Duration.ofMinutes(1))
                .build();
        return Bucket.builder().addLimit(limit).build();
    }
}
computeIfAbsent() ensures a Bucket is created per IP only on first access. Using tryConsumeAndReturnRemaining() lets you retrieve both the consumption result and remaining token count in one call, so the Retry-After header value can be calculated accurately.
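One subtlety in that calculation: converting `getNanosToWaitForRefill()` to seconds with plain integer division truncates, which can tell a client to retry after 0 seconds while the bucket is still empty. A ceiling division avoids this (the helper class and method name here are hypothetical, for illustration):

```java
// Convert a nanoseconds-to-wait value into whole seconds, rounding up so an
// empty bucket never yields "Retry-After: 0".
public class RetryAfter {
    static long retryAfterSeconds(long nanosToWait) {
        if (nanosToWait <= 0) {
            return 0; // a token is already available
        }
        return (nanosToWait + 999_999_999L) / 1_000_000_000L; // ceiling division
    }
}
```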
Warning: The `X-Forwarded-For` header can be freely spoofed by clients. Simply sending `X-Forwarded-For: 127.0.0.1` can bypass rate limiting, so it should only be trusted when running behind a trusted reverse proxy such as Nginx or an ALB. For directly exposed environments, use `getRemoteAddr()` instead.
Watch Out for ConcurrentHashMap Memory Growth
Continuously adding per-IP Buckets to ConcurrentHashMap means memory keeps growing as the number of unique IPs increases. In a DDoS scenario with traffic from a large number of IPs, there is a risk of OutOfMemoryError, so consider replacing it with a Caffeine cache for production use.
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

Cache<String, Bucket> cache = Caffeine.newBuilder()
        .expireAfterWrite(1, TimeUnit.HOURS) // evict buckets for idle clients
        .maximumSize(10_000)                 // hard upper bound on tracked clients
        .build();
// Use cache.get(key, k -> createBucket()) instead of buckets.computeIfAbsent()
Extending to Per-API-Key Limiting
If you want to use an API key as the identifier instead of an IP, simply change what value you extract:
private String getClientKey(HttpServletRequest request) {
    String apiKey = request.getHeader("X-API-Key");
    if (apiKey != null && !apiKey.isBlank()) {
        return "apikey:" + apiKey;
    }
    return "ip:" + getClientIp(request);
}
Replace getClientIp(request) with getClientKey(request) inside doFilterInternal() to switch to API key-based limiting. By separating createBucket() per identifier type, you can implement fine-grained policies — for example, 300 requests per minute for API key holders and 30 requests per minute for unauthenticated IPs. See this article for implementing JWT authentication.
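The policy decision itself can live in a small helper that is easy to test in isolation. A sketch (the `RateLimitPolicy` class and `capacityFor` name are illustrative assumptions; the 300/30 limits follow the example above):

```java
// Decide a per-minute capacity from the identifier produced by getClientKey():
// API-key holders get 300 requests/min, unauthenticated IPs get 30.
public class RateLimitPolicy {
    static long capacityFor(String clientKey) {
        return clientKey.startsWith("apikey:") ? 300 : 30;
    }
}
```

Feed the result into `Bandwidth.builder().capacity(...)` inside `createBucket()` to build a differently sized bucket per identifier type.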
Restricting to Specific URL Patterns
Adding @Component applies the filter to all endpoints. To restrict it to specific paths, use FilterRegistrationBean:
@Bean
public FilterRegistrationBean<RateLimitFilter> rateLimitRegistration(RateLimitFilter filter) {
    FilterRegistrationBean<RateLimitFilter> bean = new FilterRegistrationBean<>(filter);
    bean.addUrlPatterns("/api/*");
    bean.setOrder(1);
    return bean;
}
The @Component annotation can stay: Spring Boot skips the automatic registration of Filter beans that are already wrapped in a FilterRegistrationBean, so the filter is registered exactly once, with the URL pattern and order from the registration. (Removing @Component would break the injection of the filter into the @Bean method shown above.) Setting setOrder(1) means the filter runs after Spring Security (whose filter chain is registered at order -100 by default), making it suitable when you want to apply rate limiting only to authenticated requests. If you want to block before authentication, set a negative order instead.
Choosing Between In-Memory and Redis
This implementation is entirely in-memory, which means Buckets are not shared across instances when running multiple app instances.
| | In-Memory (ConcurrentHashMap) | Redis (Bucket4j-Redis) |
|---|---|---|
| Best for | Single instance, PoC | Multi-instance, production |
| Setup | Simple | Requires Redis |
| Accuracy | Per-instance | Accurate across the entire cluster |
If you need to scale out, consider migrating to `bucket4j-redis-lettuce` (Lettuce) or `bucket4j-redis-jedis` (Jedis). The core Bucket API stays almost the same, so the migration cost is low.
Verifying with curl
# Loop and confirm that 429 is returned
for i in $(seq 1 70); do
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/api/hello
done
Starting around the 61st request, you should see 429 returned, and the Retry-After header included in the response.
Summary
By combining Bucket4j with OncePerRequestFilter, you can implement HTTP-layer rate limiting with just one added dependency.
- Bucket4j 8.x is configured using the `Bandwidth.builder()` style
- Only trust `X-Forwarded-For` behind a reverse proxy; use `getRemoteAddr()` in directly exposed environments
- For long-term operation, consider replacing `ConcurrentHashMap` with a Caffeine cache
- When multi-instance support is needed, migrate to `bucket4j-redis-lettuce` / `bucket4j-redis-jedis`
See this article for more on exception handling.