Rate Limiting and Spike Control

Introduction 

Rate limiting and spike control are essential techniques in managing API traffic to ensure stability and performance. Rate limiting restricts the number of requests a client can make to a service within a specified time frame, preventing abuse and overloading of resources. Spike control complements this by addressing sudden, high-volume bursts of traffic that could destabilize the system. Together, these strategies help maintain service reliability, enhance user experience, and protect backend systems from potential disruptions caused by excessive requests. 

Rate limiting 

  • Limits the number of requests (the quota) in a time window. 
  • You configure the number of requests and the time period. 
  • Requests are allowed to reach the backend only if the available quota is greater than zero. 
  • Using an identifier, multiple groups of requests can be created. Each group has a separate available quota for its window. 
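The bullets above can be sketched as a minimal fixed-window limiter with per-identifier groups. This is an illustrative model only; the class and method names are assumptions, not MuleSoft's implementation:

```python
# Minimal sketch of fixed-window rate limiting with per-identifier groups.
import time
from collections import defaultdict

class RateLimiter:
    def __init__(self, quota, window_seconds):
        self.quota = quota
        self.window = window_seconds
        # Each identifier (e.g. a client ID) gets its own window and counter.
        self.state = defaultdict(lambda: {"start": 0.0, "used": 0})

    def allow(self, identifier):
        now = time.monotonic()
        s = self.state[identifier]
        if now - s["start"] >= self.window:   # window expired: reset the quota
            s["start"], s["used"] = now, 0
        if s["used"] < self.quota:            # quota still available
            s["used"] += 1
            return True
        return False                          # quota exhausted: reject

limiter = RateLimiter(quota=2, window_seconds=60)
print(limiter.allow("client-a"))  # True
print(limiter.allow("client-a"))  # True
print(limiter.allow("client-a"))  # False (third call in the same window)
print(limiter.allow("client-b"))  # True  (separate group, separate quota)
```

Note how "client-b" succeeds even when "client-a" is exhausted: each identifier group tracks its own quota, matching the last bullet above.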

On Exchange, here is the s-order-api that I have already published. I have implemented this API and deployed the implementation application to CloudHub. 
 

API Manager
  • Added the order API in API Manager. Using autodiscovery, we paired it with the implementation application from Runtime Manager. The API status is Active. 
  • This API is not secured, and no limit is applied to it. 
  • Now add the Rate Limiting policy. 
Rate limiting policy
  • Here we can create groups of requests using an identifier, so that each group of requests has its own quota in its time window. For this example, configure the quota to apply to all requests. 

Expose Headers 

Rate limiting in MuleSoft restricts the number of requests a client can make within a specified timeframe. It helps prevent abuse and ensures fair usage of API resources. By exposing headers like X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset, clients can monitor their usage and adjust requests accordingly. 
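As a hedged illustration of how a client might use these headers, the sketch below decides how long to wait before the next call. The header names match those exposed by the policy, but the `next_call_delay` function and the plain dict standing in for a real HTTP response are assumptions of the sketch:

```python
# Hypothetical client-side helper: inspect rate-limit headers to decide
# whether to pause before the next call.
def next_call_delay(headers):
    remaining = int(headers.get("x-ratelimit-remaining", "1"))
    reset_ms = int(headers.get("x-ratelimit-reset", "0"))
    if remaining > 0:
        return 0.0                 # quota left: call immediately
    return reset_ms / 1000.0       # exhausted: wait until the window resets

headers = {"x-ratelimit-limit": "2",
           "x-ratelimit-remaining": "0",
           "x-ratelimit-reset": "45000"}
print(next_call_delay(headers))   # 45.0 (seconds until the quota resets)
```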

Distributed 

Distributed rate limiting manages request limits across multiple instances of an API or service. This approach ensures that the rate limits are consistently applied, regardless of which instance handles the request. It prevents any single instance from becoming overwhelmed and helps maintain performance and availability, while still providing clients with the necessary information through exposed headers for effective monitoring. 
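A common way to implement this consistency is a shared counter keyed by client, for example Redis `INCR` with an expiry. The sketch below models that pattern; to stay self-contained, an in-memory dict stands in for the shared store, and all names are illustrative rather than any vendor's implementation:

```python
# Sketch of the shared-counter pattern behind distributed rate limiting.
# In production the store would be external (e.g. Redis INCR + EXPIRE);
# here a plain dict plays that role so the sketch is runnable.
import time

shared_store = {}  # key -> (window_start, count), shared by all instances

def allow(client_id, quota=2, window=60.0):
    now = time.monotonic()
    start, count = shared_store.get(client_id, (now, 0))
    if now - start >= window:          # window expired, seen by every instance
        start, count = now, 0
    if count >= quota:
        return False                   # limit enforced cluster-wide
    shared_store[client_id] = (start, count + 1)
    return True

# Two "instances" calling allow() observe the same shared counter:
print(allow("client-a"), allow("client-a"), allow("client-a"))  # True True False
```

Because every instance reads and writes the same store, a client cannot exceed its quota by spreading requests across instances.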

After creating the policy successfully, test the endpoint using Postman. Once we get the result, check the headers: 

  • x-ratelimit-limit is 2 because we specified two requests per minute. 
  • x-ratelimit-remaining is 1: one request is left in the current window. 
  • x-ratelimit-reset shows how long until the window resets. 

Hitting the endpoint again within the same minute: 

  • x-ratelimit-limit is still 2, as specified. 
  • x-ratelimit-remaining is 0: the quota for this window is exhausted. 
  • x-ratelimit-reset shows how long until the window resets. 

 

  • Invoking it one more time returns an error: the quota has been exceeded, because we called the API a third time within the same minute. Once the window resets, we can get a successful result again. 
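A client can recover from this error by waiting for the reset interval before retrying. The sketch below assumes a hypothetical `call_api` callable returning `(status, headers, body)`; it is not a MuleSoft API, just an illustration of the retry pattern:

```python
# Hedged sketch: retry after a quota-exceeded error by waiting for the
# interval reported in x-ratelimit-reset.
import time

def call_with_retry(call_api, max_retries=1):
    for _ in range(max_retries + 1):
        status, headers, body = call_api()
        if status != 429:                        # 429 = quota exceeded
            return body
        wait = int(headers.get("x-ratelimit-reset", "1000")) / 1000.0
        time.sleep(wait)                         # wait for the window to reset
    raise RuntimeError("quota still exceeded after retries")

# Simulated API: the first call is rejected, the second succeeds.
responses = iter([(429, {"x-ratelimit-reset": "10"}, None),
                  (200, {}, "order data")])
print(call_with_retry(lambda: next(responses)))  # order data
```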

Spike control 

  • The Spike Control policy uses a sliding-window algorithm to smooth out spikes in traffic. 
  • If no quota is available, the Spike Control policy allows requests to be queued for processing later. Queuing requests regulates spikes in traffic. 
  • Queuing a request requires retaining a thread and an HTTP connection. 
  • By configuring the queuing limit, we can specify how many requests can be queued. 
  • Once the queue limit is reached, additional requests are rejected. 
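The queuing behaviour described above can be modelled roughly as follows. This is an illustrative model with made-up names, not the policy's actual sliding-window implementation:

```python
# Minimal model of spike-control queuing: when no quota is available a
# request is queued (up to queue_limit) and waits for quota to free up;
# beyond the queue limit, requests are rejected outright.
import collections

class SpikeControl:
    def __init__(self, quota=1, queue_limit=1):
        self.quota = quota
        self.queue_limit = queue_limit
        self.in_window = 0
        self.queue = collections.deque()

    def submit(self, request):
        if self.in_window < self.quota:
            self.in_window += 1
            return "processed"
        if len(self.queue) < self.queue_limit:
            self.queue.append(request)   # held: thread + connection retained
            return "queued"
        return "rejected"                # queue full

    def window_slides(self):
        # Quota frees up; one queued request (if any) is processed.
        self.in_window = 0
        if self.queue:
            self.queue.popleft()
            self.in_window = 1

sc = SpikeControl(quota=1, queue_limit=1)
print(sc.submit("r1"))  # processed
print(sc.submit("r2"))  # queued
print(sc.submit("r3"))  # rejected
```

Note the trade-off the third bullet points at: each queued request ties up a thread and a connection, which is why the queue must be bounded.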

Now, in Policies, I will create a Spike Control policy and click Next. 

  • Set delay attempts to 1. This means that if no quota is available, the policy will retry a queued request only one time. 
  • Set the queuing limit to 1, which means only one request can be queued when no quota is available. 
  • Expose the headers so we can see the remaining quota, how much has been used, and how much time remains until the window resets. 
  • Click the Apply button. 

In Postman, when I send a request, I receive a successful response. Sending the request again also yields a successful response. 

However, within a single sliding window of 5000 milliseconds, only one request can be fulfilled. If I send two requests during this window, the first is processed immediately, while the second is queued and waits for quota to become available. The queued request is retried after the configured delay of 3000 milliseconds; if quota has freed up by then, it is processed. If the request still cannot be fulfilled after that delay attempt, it is rejected and an error is returned. 

 

Conclusion: 

In conclusion, rate limiting and spike control are essential strategies for maintaining the stability and reliability of APIs and backend systems. By effectively managing the number of requests within a specified time frame, these mechanisms prevent system overload, ensure fair usage among clients, and safeguard against unexpected traffic spikes. Implementing these controls not only enhances the user experience by providing consistent performance but also protects critical resources from potential abuse or degradation. As businesses increasingly rely on digital services, adopting robust rate limiting and spike control measures will be crucial for sustainable growth and operational resilience. 
