When you expose a REST API, you may encounter situations where a specific client keeps sending a large number of requests, causing your server to slow down. For the basics of building a REST API, see this article as well.

In this article, we’ll implement HTTP-layer rate limiting from scratch by combining Bucket4j with Spring Boot’s OncePerRequestFilter. It supports both per-IP and per-API-key limiting, and covers returning HTTP 429 when the limit is exceeded.

Differences from Resilience4j @RateLimiter

Let’s clarify a common point of confusion first.

Comparison        | Resilience4j @RateLimiter                        | Bucket4j + Filter
Primary use       | In-app throttling (protecting outbound API calls)| Per-HTTP-client limiting
Limiting unit     | Method / entire service                          | IP, API key, etc.
Applied at        | Application layer                                | Servlet Filter layer
On limit exceeded | Throws exception                                 | HTTP 429 response

Resilience4j is for “preventing your app from calling external services too frequently.” Bucket4j + Filter is for “blocking excessive inbound access from outside.” They complement rather than compete with each other, and are often used together.

For a detailed look at implementing Resilience4j Circuit Breaker, see this article.

Adding Dependencies

For Maven:

<dependency>
    <groupId>com.github.bucket4j</groupId>
    <artifactId>bucket4j-core</artifactId>
    <version>8.10.1</version>
</dependency>

For Gradle:

implementation 'com.github.bucket4j:bucket4j-core:8.10.1'

No additional Spring Boot dependencies are required — this is all you need to get started.

What Is the Token Bucket Algorithm?

Bucket4j uses the token bucket algorithm. “Tokens” are continuously replenished in a bucket at a fixed rate, and each incoming request consumes one token. When the bucket is empty, the request is rejected. You only need to configure two things: the “capacity (maximum token count)” and the “refill rate.”
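To make the mechanics concrete, here is a simplified token bucket in plain Java. This is an illustration of the algorithm only, not Bucket4j's actual implementation:

```java
// Simplified token bucket: starts full (allowing an initial burst) and
// refills continuously at a fixed rate, capped at the capacity.
class SimpleTokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefill;

    SimpleTokenBucket(long capacity, long refillTokens, java.time.Duration period) {
        this.capacity = capacity;
        this.refillPerNano = (double) refillTokens / period.toNanos();
        this.tokens = capacity;              // bucket starts full
        this.lastRefill = System.nanoTime();
    }

    synchronized boolean tryConsume() {
        long now = System.nanoTime();
        // Add tokens earned since the last call, never exceeding capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;   // request allowed
        }
        return false;      // bucket empty: request rejected
    }
}
```

With a capacity of 3 and a refill of 3 tokens per minute, the first three calls succeed and the fourth is rejected until tokens accumulate again.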

Configuring Bucket Capacity and Refill Rate

Use Bandwidth to define the policy. In Bucket4j 8.x, the builder style is used.

Bandwidth limit = Bandwidth.builder()
        .capacity(60)
        .refillGreedy(60, Duration.ofMinutes(1))
        .build();
Bucket bucket = Bucket.builder().addLimit(limit).build();

This configures a limit of “up to 60 requests per minute.” refillGreedy adds tokens as soon as they become available, spreading the refill evenly across the interval (here, roughly one token per second), while refillIntervally waits for the full interval to elapse and then adds all tokens at once. Choose greedy for smooth replenishment, or intervally when burst refills at interval boundaries are acceptable.
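For comparison, the interval-based variant looks like this (same 60-per-minute budget, assuming the same Bucket4j 8.x builder style as above):

```java
// All 60 tokens are added at once each time the one-minute interval elapses,
// rather than trickling in at roughly one per second.
Bandwidth intervalLimit = Bandwidth.builder()
        .capacity(60)
        .refillIntervally(60, Duration.ofMinutes(1))
        .build();
Bucket intervalBucket = Bucket.builder().addLimit(intervalLimit).build();
```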

Implementing the Rate Limit Filter with OncePerRequestFilter

See this article for the differences between Filters and Interceptors.

Here is a Filter implementation using IP address as the identifying key:

@Component
public class RateLimitFilter extends OncePerRequestFilter {

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    @Override
    protected void doFilterInternal(
            HttpServletRequest request,
            HttpServletResponse response,
            FilterChain filterChain) throws ServletException, IOException {

        String clientKey = getClientIp(request);
        Bucket bucket = buckets.computeIfAbsent(clientKey, k -> createBucket());

        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);
        if (probe.isConsumed()) {
            filterChain.doFilter(request, response);
        } else {
            // Round up so Retry-After is never 0 when a fractional second remains
            long waitSeconds = (probe.getNanosToWaitForRefill() + 999_999_999) / 1_000_000_000;
            response.setStatus(429);
            response.setHeader("Retry-After", String.valueOf(waitSeconds));
            response.setContentType("application/json");
            response.setCharacterEncoding("UTF-8");
            response.getWriter().write("{\"error\": \"Too Many Requests\"}");
        }
    }

    private String getClientIp(HttpServletRequest request) {
        // X-Forwarded-For can be spoofed by clients; only use it behind a trusted reverse proxy.
        // In directly exposed environments, using getRemoteAddr() only is safer.
        String forwarded = request.getHeader("X-Forwarded-For");
        if (forwarded != null && !forwarded.isBlank()) {
            return forwarded.split(",")[0].trim();
        }
        return request.getRemoteAddr();
    }

    private Bucket createBucket() {
        Bandwidth limit = Bandwidth.builder()
                .capacity(60)
                .refillGreedy(60, Duration.ofMinutes(1))
                .build();
        return Bucket.builder().addLimit(limit).build();
    }
}

computeIfAbsent() ensures a Bucket is created per IP only on first access. Using tryConsumeAndReturnRemaining() lets you retrieve the consumption result, the remaining token count, and the time until the next refill in one call, so the Retry-After header value can be calculated accurately.
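Since the probe also carries the remaining token count, you can optionally expose it to clients. X-Rate-Limit-Remaining is a common convention rather than a standard header; this is a sketch of the success branch only:

```java
// In the success branch of doFilterInternal(): tell the client how much quota is left.
response.setHeader("X-Rate-Limit-Remaining", String.valueOf(probe.getRemainingTokens()));
filterChain.doFilter(request, response);
```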

Warning The X-Forwarded-For header can be freely spoofed by clients. Simply sending X-Forwarded-For: 127.0.0.1 can bypass rate limiting, so it should only be trusted when running behind a trusted reverse proxy such as Nginx or ALB. For directly exposed environments, use getRemoteAddr() instead.

Watch Out for ConcurrentHashMap Memory Growth

Continuously adding per-IP Buckets to ConcurrentHashMap means memory keeps growing as the number of unique IPs increases. In a DDoS scenario with traffic from a large number of IPs, there is a risk of OutOfMemoryError, so consider replacing it with a Caffeine cache for production use.

Cache<String, Bucket> cache = Caffeine.newBuilder()
        .expireAfterWrite(1, TimeUnit.HOURS)
        .maximumSize(10_000)
        .build();
// Use cache.get(key, k -> createBucket()) instead of buckets.computeIfAbsent()

Extending to Per-API-Key Limiting

If you want to use an API key as the identifier instead of an IP, simply change what value you extract:

private String getClientKey(HttpServletRequest request) {
    String apiKey = request.getHeader("X-API-Key");
    if (apiKey != null && !apiKey.isBlank()) {
        return "apikey:" + apiKey;
    }
    return "ip:" + getClientIp(request);
}

Replace getClientIp(request) with getClientKey(request) inside doFilterInternal() to switch to API key-based limiting. By separating createBucket() per identifier type, you can implement fine-grained policies — for example, 300 requests per minute for API key holders and 30 requests per minute for unauthenticated IPs. See this article for implementing JWT authentication.
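One way to separate the policies is to branch on the key prefix when creating the Bucket. This is a sketch; the 300/30 figures are the example numbers from above, not recommendations:

```java
// Different Bandwidth per client type, keyed off the "apikey:"/"ip:" prefix
// produced by getClientKey().
private Bucket createBucket(String clientKey) {
    Bandwidth limit = clientKey.startsWith("apikey:")
            ? Bandwidth.builder().capacity(300).refillGreedy(300, Duration.ofMinutes(1)).build()
            : Bandwidth.builder().capacity(30).refillGreedy(30, Duration.ofMinutes(1)).build();
    return Bucket.builder().addLimit(limit).build();
}
```

In doFilterInternal(), pass the key through with buckets.computeIfAbsent(clientKey, this::createBucket) so each client type gets its own policy.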

Restricting to Specific URL Patterns

Adding @Component applies the filter to all endpoints. To restrict it to specific paths, use FilterRegistrationBean:

@Bean
public FilterRegistrationBean<RateLimitFilter> rateLimitRegistration() {
    FilterRegistrationBean<RateLimitFilter> bean =
            new FilterRegistrationBean<>(new RateLimitFilter());
    bean.addUrlPatterns("/api/*");
    bean.setOrder(1);
    return bean;
}

In this case, construct the filter directly (new RateLimitFilter()) instead of injecting it, and remove @Component from RateLimitFilter so Spring Boot does not also auto-register it for every URL. setOrder(1) places the filter after Spring Security's filter chain (registered at order -100 by default), making it suitable when you want to apply rate limiting only to authenticated requests. To block before authentication, use an order lower than -100.

Choosing Between In-Memory and Redis

This implementation is entirely in-memory, which means Buckets are not shared across instances when running multiple app instances.

Comparison | In-memory (ConcurrentHashMap) | Redis (bucket4j-redis)
Best for   | Single instance, PoC          | Multi-instance, production
Setup      | Simple                        | Requires Redis
Accuracy   | Per-instance                  | Accurate across the entire cluster

If you need to scale out, consider migrating to bucket4j-redis-lettuce for Lettuce or bucket4j-redis-jedis for Jedis. The API is nearly identical, so the migration cost is low.

Verifying with curl

# Loop and confirm that 429 is returned
for i in $(seq 1 70); do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/api/hello
done

Starting around the 61st request, you should see 429 returned, and the Retry-After header included in the response.

Summary

By combining Bucket4j with OncePerRequestFilter, you can implement HTTP-layer rate limiting with just one added dependency.

  • Bucket4j 8.x is configured using the Bandwidth.builder() style
  • Only trust X-Forwarded-For behind a reverse proxy; use getRemoteAddr() in directly exposed environments
  • For long-term operation, consider replacing ConcurrentHashMap with a Caffeine cache
  • When multi-instance support is needed, migrate to bucket4j-redis-lettuce / bucket4j-redis-jedis

See this article for more on exception handling.