When you expose a REST API, you may encounter situations where a specific client keeps sending a large number of requests, causing your server to slow down. For the basics of building a REST API, see this article as well.
In this article, we’ll implement HTTP-layer rate limiting from scratch by combining Bucket4j with Spring Boot’s OncePerRequestFilter. It supports both per-IP and per-API-key limiting, and covers returning HTTP 429 when the limit is exceeded.
Differences from Resilience4j @RateLimiter
Let’s clarify a common point of confusion first.
| Comparison | Resilience4j @RateLimiter | Bucket4j + Filter |
|---|---|---|
| Primary use | In-app throttling (protecting outbound API calls) | Per-HTTP-client limiting |
| Limiting unit | Method / entire service | IP, API key, etc. |
| Applied at | Application layer | Servlet Filter layer |
| On limit exceeded | Throws exception | HTTP 429 response |
Resilience4j is for “preventing your app from calling external services too frequently.” Bucket4j + Filter is for “blocking excessive inbound access from outside.” They complement rather than compete with each other, and are often used together.
For a detailed look at implementing Resilience4j Circuit Breaker, see this article.
Adding Dependencies
For Maven:
<dependency>
    <groupId>com.github.bucket4j</groupId>
    <artifactId>bucket4j-core</artifactId>
    <version>8.10.1</version>
</dependency>
For Gradle:
implementation 'com.github.bucket4j:bucket4j-core:8.10.1'
No additional Spring Boot dependencies are required — this is all you need to get started.
What Is the Token Bucket Algorithm?
Bucket4j uses the token bucket algorithm. “Tokens” are continuously replenished in a bucket at a fixed rate, and each incoming request consumes one token. When the bucket is empty, the request is rejected. You only need to configure two things: the “capacity (maximum token count)” and the “refill rate.”
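As a mental model, the algorithm fits in a few lines of plain Java. This is a simplified sketch for illustration only; Bucket4j's real implementation is lock-free and far more precise:

```java
// Simplified token bucket: tokens trickle in at a fixed rate, each request
// consumes one, and requests are rejected once the bucket is empty.
public class SimpleTokenBucket {
    private final long capacity;
    private final double refillPerNano; // tokens added per elapsed nanosecond
    private double tokens;
    private long lastRefill;

    public SimpleTokenBucket(long capacity, long refillTokens, long refillPeriodNanos) {
        this.capacity = capacity;
        this.refillPerNano = (double) refillTokens / refillPeriodNanos;
        this.tokens = capacity; // start full, allowing an initial burst
        this.lastRefill = System.nanoTime();
    }

    public synchronized boolean tryConsume() {
        long now = System.nanoTime();
        // Replenish proportionally to elapsed time, capped at capacity
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```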
Configuring Bucket Capacity and Refill Rate
Use Bandwidth to define the policy. In Bucket4j 8.x, the builder style is used.
Bandwidth limit = Bandwidth.builder()
        .capacity(60)
        .refillGreedy(60, Duration.ofMinutes(1))
        .build();
Bucket bucket = Bucket.builder().addLimit(limit).build();
This configures a limit of “up to 60 requests per minute, with 60 tokens refilled every minute.” refillGreedy replenishes tokens immediately, while refillIntervally refills them all at once per interval. Choose based on whether you want to allow short bursts.
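For comparison, here is what the interval-based variant of the same policy looks like (a configuration sketch assuming the same Bucket4j 8.x builder API as above):

```java
// refillIntervally: all 60 tokens reappear at once when the minute elapses,
// so clients see a hard "window reset" instead of a continuous trickle.
Bandwidth intervalLimit = Bandwidth.builder()
        .capacity(60)
        .refillIntervally(60, Duration.ofMinutes(1))
        .build();
```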
Implementing the Rate Limit Filter with OncePerRequestFilter
See this article for the differences between Filters and Interceptors.
Here is a Filter implementation using IP address as the identifying key:
import java.io.IOException;
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.ConsumptionProbe;

// jakarta.* imports assume Spring Boot 3.x; use javax.* on Boot 2.x
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

@Component
public class RateLimitFilter extends OncePerRequestFilter {

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    @Override
    protected void doFilterInternal(
            HttpServletRequest request,
            HttpServletResponse response,
            FilterChain filterChain) throws ServletException, IOException {
        String clientKey = getClientIp(request);
        Bucket bucket = buckets.computeIfAbsent(clientKey, k -> createBucket());
        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);

        if (probe.isConsumed()) {
            filterChain.doFilter(request, response);
        } else {
            // Round up so an empty bucket never produces "Retry-After: 0"
            long waitSeconds =
                    (probe.getNanosToWaitForRefill() + 999_999_999L) / 1_000_000_000L;
            response.setStatus(429);
            response.setHeader("Retry-After", String.valueOf(waitSeconds));
            response.setContentType("application/json");
            response.setCharacterEncoding("UTF-8");
            response.getWriter().write("{\"error\": \"Too Many Requests\"}");
        }
    }

    private String getClientIp(HttpServletRequest request) {
        // X-Forwarded-For can be spoofed by clients; only use it behind a trusted reverse proxy.
        // In directly exposed environments, using getRemoteAddr() only is safer.
        String forwarded = request.getHeader("X-Forwarded-For");
        if (forwarded != null && !forwarded.isBlank()) {
            return forwarded.split(",")[0].trim();
        }
        return request.getRemoteAddr();
    }

    private Bucket createBucket() {
        Bandwidth limit = Bandwidth.builder()
                .capacity(60)
                .refillGreedy(60, Duration.ofMinutes(1))
                .build();
        return Bucket.builder().addLimit(limit).build();
    }
}
computeIfAbsent() ensures a Bucket is created per IP only on first access. Using tryConsumeAndReturnRemaining() lets you retrieve both the consumption result and remaining token count in one call, so the Retry-After header value can be calculated accurately.
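One subtlety in that calculation: converting `getNanosToWaitForRefill()` to seconds with plain integer division truncates, which can tell a client to retry after 0 seconds while the bucket is still empty. A ceiling division avoids this (the helper class and method name here are hypothetical, for illustration):

```java
// Convert a nanoseconds-to-wait value into whole seconds, rounding up so an
// empty bucket never yields "Retry-After: 0".
public class RetryAfter {
    static long retryAfterSeconds(long nanosToWait) {
        if (nanosToWait <= 0) {
            return 0; // a token is already available
        }
        return (nanosToWait + 999_999_999L) / 1_000_000_000L; // ceiling division
    }
}
```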
Warning: The `X-Forwarded-For` header can be freely spoofed by clients. Simply sending `X-Forwarded-For: 127.0.0.1` can bypass rate limiting, so it should only be trusted when running behind a trusted reverse proxy such as Nginx or an ALB. For directly exposed environments, use `getRemoteAddr()` instead.
Watch Out for ConcurrentHashMap Memory Growth
Continuously adding per-IP Buckets to ConcurrentHashMap means memory keeps growing as the number of unique IPs increases. In a DDoS scenario with traffic from a large number of IPs, there is a risk of OutOfMemoryError, so consider replacing it with a Caffeine cache for production use.
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

Cache<String, Bucket> cache = Caffeine.newBuilder()
        .expireAfterWrite(1, TimeUnit.HOURS) // evict buckets for idle clients
        .maximumSize(10_000)                 // hard upper bound on tracked clients
        .build();
// Use cache.get(key, k -> createBucket()) instead of buckets.computeIfAbsent()
Extending to Per-API-Key Limiting
If you want to use an API key as the identifier instead of an IP, simply change what value you extract:
private String getClientKey(HttpServletRequest request) {
    String apiKey = request.getHeader("X-API-Key");
    if (apiKey != null && !apiKey.isBlank()) {
        return "apikey:" + apiKey;
    }
    return "ip:" + getClientIp(request);
}
Replace getClientIp(request) with getClientKey(request) inside doFilterInternal() to switch to API key-based limiting. By separating createBucket() per identifier type, you can implement fine-grained policies — for example, 300 requests per minute for API key holders and 30 requests per minute for unauthenticated IPs. See this article for implementing JWT authentication.
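The policy decision itself can live in a small helper that is easy to test in isolation. A sketch (the `RateLimitPolicy` class and `capacityFor` name are illustrative assumptions; the 300/30 limits follow the example above):

```java
// Decide a per-minute capacity from the identifier produced by getClientKey():
// API-key holders get 300 requests/min, unauthenticated IPs get 30.
public class RateLimitPolicy {
    static long capacityFor(String clientKey) {
        return clientKey.startsWith("apikey:") ? 300 : 30;
    }
}
```

Feed the result into `Bandwidth.builder().capacity(...)` inside `createBucket()` to build a differently sized bucket per identifier type.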
Restricting to Specific URL Patterns
Adding @Component applies the filter to all endpoints. To restrict it to specific paths, use FilterRegistrationBean:
@Bean
public FilterRegistrationBean<RateLimitFilter> rateLimitRegistration(RateLimitFilter filter) {
    FilterRegistrationBean<RateLimitFilter> bean = new FilterRegistrationBean<>(filter);
    bean.addUrlPatterns("/api/*");
    bean.setOrder(1);
    return bean;
}
The @Component annotation can stay: Spring Boot skips the automatic registration of Filter beans that are already wrapped in a FilterRegistrationBean, so the filter is registered exactly once, with the URL pattern and order from the registration. (Removing @Component would break the injection of the filter into the @Bean method shown above.) Setting setOrder(1) means the filter runs after Spring Security (whose filter chain is registered at order -100 by default), making it suitable when you want to apply rate limiting only to authenticated requests. If you want to block before authentication, set a negative order instead.
Choosing Between In-Memory and Redis
This implementation is entirely in-memory, which means Buckets are not shared across instances when running multiple app instances.
| | In-Memory (ConcurrentHashMap) | Redis (Bucket4j-Redis) |
|---|---|---|
| Best for | Single instance, PoC | Multi-instance, production |
| Setup | Simple | Requires Redis |
| Accuracy | Per-instance | Accurate across the entire cluster |
If you need to scale out, consider migrating to `bucket4j-redis-lettuce` (Lettuce) or `bucket4j-redis-jedis` (Jedis). The core Bucket API stays almost the same, so the migration cost is low.
Verifying with curl
# Loop and confirm that 429 is returned
for i in $(seq 1 70); do
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/api/hello
done
Starting around the 61st request, you should see 429 returned, and the Retry-After header included in the response.
Summary
By combining Bucket4j with OncePerRequestFilter, you can implement HTTP-layer rate limiting with just one added dependency.
- Bucket4j 8.x is configured using the `Bandwidth.builder()` style
- Only trust `X-Forwarded-For` behind a reverse proxy; use `getRemoteAddr()` in directly exposed environments
- For long-term operation, consider replacing `ConcurrentHashMap` with a Caffeine cache
- When multi-instance support is needed, migrate to `bucket4j-redis-lettuce` / `bucket4j-redis-jedis`
See this article for more on exception handling.