ChatGPT API Rate Limit – How to Fix it

Mar 11, 2023

OpenAI recently launched the ChatGPT API, which lets developers build their own AI chatbots. The launch brings a range of new features and benefits, including access to the brand-new GPT-3.5 Turbo model (gpt-3.5-turbo).

Beyond these benefits, another notable aspect of the ChatGPT API is its rate limit. A rate limit is a restriction on how many requests a user can send to the server within a given period of time.

Although rate limits also apply to OpenAI's other models, many people are not aware of them. In this article, we look at how ChatGPT API rate limits work, what you can do when you hit one, and more.

ChatGPT API Rate Limit

What Is a Rate Limit?

A rate limit is a restriction an API places on a user or client, capping how many times they can access the server within a set period of time. The limit varies with the user's subscription plan.

What Are the ChatGPT API Rate Limits?

The ChatGPT API applies different rate limits depending on your subscription plan. Rate limits are measured in two ways: requests per minute (RPM) and tokens per minute (TPM).

At the time of writing, free trial users are limited to 20 requests and 150,000 tokens per minute. Paid users have a limit of 60 RPM and 250,000 TPM during their first 48 hours, after which it rises to 3,500 RPM and 350,000 TPM.

Note: whichever limit you reach first is the one that applies. For example, if you send 20 requests of 100 tokens each to the Codex endpoint, you have used up your request limit, even though those 20 requests consumed only 2,000 of your 40,000 tokens.
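The "whichever comes first" rule is simple arithmetic. Here is a small sketch, assuming every request uses the same number of tokens (real requests vary); `binding_limit` is a hypothetical helper name, not part of any OpenAI library.

```python
def binding_limit(rpm_limit: int, tpm_limit: int, tokens_per_request: int) -> tuple:
    """Return which cap is hit first and how many requests fit in one minute.

    Assumes every request consumes the same number of tokens (a simplification).
    """
    # How many requests the token budget alone would allow per minute.
    requests_allowed_by_tokens = tpm_limit // tokens_per_request
    if rpm_limit <= requests_allowed_by_tokens:
        return ("requests", rpm_limit)
    return ("tokens", requests_allowed_by_tokens)


# The Codex example from the note: 20 RPM, 40,000 TPM, 100 tokens per request.
print(binding_limit(20, 40_000, 100))    # ('requests', 20)
# Large prompts flip the result: 2,500-token requests exhaust 40,000 TPM after 16.
print(binding_limit(20, 40_000, 2_500))  # ('tokens', 16)
```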

Why Does the ChatGPT API Have Rate Limits?

Rate limits are common practice for APIs, and OpenAI applies them for the following reasons:

Protection against misuse or abuse of the API:

Someone could flood the API with a continuous stream of requests and overload the service. Setting a rate limit on each user helps prevent this kind of overload and service disruption.

Fair access for all users:

A single person or organization sending a very large number of requests can slow the API down for everyone else. By rate-limiting each person or organization, OpenAI gives every user a fair share of access without slowdowns or interruptions.

Managing the aggregate load on OpenAI's infrastructure:

A sudden spike in API requests can strain the servers and cause performance problems. Rate limits let OpenAI keep the experience smooth and consistent for all users.

How Do Rate Limits Work?

Rate limits are based on the number of requests and tokens you send per minute. For example, if your limit is 60 requests per minute and 150,000 davinci tokens per minute, you will be limited either by reaching the requests-per-minute cap or by running out of tokens, whichever happens first.
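To see how the two caps interact, here is a minimal sketch of a client-side tracker that keeps a sliding one-minute window of the requests you have sent. `MinuteWindowLimiter` is a hypothetical helper, not part of any OpenAI library: it only mirrors the limits locally, and the server remains the final authority. The explicit `now` arguments exist only to make the example deterministic.

```python
import time
from collections import deque
from typing import Optional


class MinuteWindowLimiter:
    """Track requests and tokens sent in the last minute (client side only)."""

    def __init__(self, rpm_limit: int, tpm_limit: int, window: float = 60.0):
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.window = window
        self.events = deque()  # (timestamp, tokens) for each request sent

    def _drop_old(self, now: float) -> None:
        # Forget requests that have left the sliding window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def can_send(self, tokens: int, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        self._drop_old(now)
        used_tokens = sum(t for _, t in self.events)
        # Blocked as soon as either cap would be exceeded, whichever comes first.
        return (len(self.events) < self.rpm_limit
                and used_tokens + tokens <= self.tpm_limit)

    def record(self, tokens: int, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.events.append((now, tokens))


# 60 RPM and 150,000 TPM, as in the example above.
limiter = MinuteWindowLimiter(rpm_limit=60, tpm_limit=150_000)
for i in range(60):
    limiter.record(100, now=i * 0.5)  # 60 small requests within 30 seconds
print(limiter.can_send(100, now=30.0))  # False: the request cap is reached first
print(limiter.can_send(100, now=60.0))  # True: the request sent at t=0 has expired
```

A deque keeps the oldest request at the front, so expired entries can be dropped cheaply before each check.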

For example, a limit of 60 requests per minute allows one request per second. If you send one request every 800 ms, then once you hit your rate limit, your program only needs to sleep for 200 ms before it can send another request. Sending the next request without that pause will lead to a failed request.
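Rather than waiting for a failed request, a client can space its calls evenly. The sketch below assumes a single-threaded script and ignores the token limit; `seconds_to_wait` and `paced_calls` are hypothetical helper names.

```python
import time


def seconds_to_wait(rpm_limit: int, elapsed_since_last: float) -> float:
    """Seconds to pause so requests stay evenly spaced under an RPM cap."""
    min_interval = 60.0 / rpm_limit  # 60 RPM -> one request per second
    return max(0.0, min_interval - elapsed_since_last)


def paced_calls(func, items, rpm_limit: int):
    """Call func(item) for each item, sleeping just long enough between calls."""
    results = []
    last = None
    for item in items:
        if last is not None:
            time.sleep(seconds_to_wait(rpm_limit, time.monotonic() - last))
        last = time.monotonic()
        results.append(func(item))
    return results


print(round(seconds_to_wait(60, 0.8), 3))  # 0.2, the 200 ms pause from the example
```

Spacing requests evenly trades a little latency for never bursting past the cap; it does not account for token usage, which would need a tracker like a per-minute token counter.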

What Happens if I Hit a Rate Limit Error?

If you hit a rate limit error (HTTP status 429), you have sent too many requests in a short period, and the API will refuse further requests until a certain amount of time has passed. You then need to wait before the API starts accepting your requests again.

Rate Limits Vs Max_tokens

Every OpenAI model has a maximum number of tokens it can handle in a single request.

Unlike rate limits, this per-request maximum is fixed by the model itself, so users cannot raise it, whatever their subscription plan.

For example, if you are using text-ada-001 (another OpenAI model), you are limited to 2,048 tokens per request.
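A minimal sketch of checking a request against that per-request maximum, assuming (as with OpenAI's completion models) that both the prompt tokens and the `max_tokens` you request for the completion count toward it. Token counts are passed in as plain integers here; in practice you would count them with a tokenizer. The `CONTEXT_LIMITS` table and both function names are hypothetical.

```python
# Per-request token limits. The text-ada-001 figure comes from this article;
# other models would need their values from OpenAI's model documentation.
CONTEXT_LIMITS = {"text-ada-001": 2048}


def fits_in_context(model: str, prompt_tokens: int, max_tokens: int) -> bool:
    """True if the prompt plus the requested completion fit in one request."""
    return prompt_tokens + max_tokens <= CONTEXT_LIMITS[model]


def largest_max_tokens(model: str, prompt_tokens: int) -> int:
    """Largest max_tokens value that still fits alongside the prompt."""
    return max(0, CONTEXT_LIMITS[model] - prompt_tokens)


print(fits_in_context("text-ada-001", 1500, 500))   # True  (2,000 <= 2,048)
print(fits_in_context("text-ada-001", 1500, 1000))  # False (2,500 > 2,048)
print(largest_max_tokens("text-ada-001", 1500))     # 548
```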

Is There a Rate Limit on the OpenAI Free Trial?

Yes. Free trial users are limited to 20 requests per minute and 150,000 tokens per minute.

Whichever limit you reach first applies: if you send 20 requests of 100 tokens each, your limit for that minute is used up, because you have made 20 requests, regardless of how few tokens they consumed.

What to Do in Case of Rate Limit?

If you keep hitting rate limits on your account, a good place to start is the OpenAI Cookbook.

It includes a Python notebook that explains in detail how to avoid rate limit errors. In the meantime, here are a few things you can try:

Stay alert

Be careful when offering programmatic access, bulk processing features, or automated social media posting. Consider enabling these only for trusted, authorized customers so they cannot be used for fraud or abuse.

Set a usage limit

Another useful way to avoid rate limit errors is to set usage limits, for yourself or for individual users of your application, over a specific period such as a day, a week, or a month. This prevents overuse, so you are less likely to run into rate limit errors.
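If you enforce usage limits inside your own application, a simple per-user daily token budget might look like the sketch below. `DailyTokenBudget` is a hypothetical, in-memory class; a real service would persist the counts and might also track weekly or monthly totals.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailyTokenBudget:
    """Hypothetical per-user daily token cap, enforced in your own application."""

    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self.used = defaultdict(int)  # (user_id, day) -> tokens used that day

    def try_spend(self, user_id: str, tokens: int, day: Optional[date] = None) -> bool:
        day = date.today() if day is None else day
        key = (user_id, day)
        if self.used[key] + tokens > self.daily_limit:
            return False  # refuse before the request ever reaches the API
        self.used[key] += tokens
        return True


budget = DailyTokenBudget(daily_limit=10_000)
today = date(2023, 3, 11)
print(budget.try_spend("alice", 6_000, today))             # True
print(budget.try_spend("alice", 6_000, today))             # False: would exceed 10,000
print(budget.try_spend("bob", 6_000, today))               # True: budgets are per user
print(budget.try_spend("alice", 6_000, date(2023, 3, 12)))  # True: a new day resets it
```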

Retrying with exponential backoff

One way to deal with rate limit errors is to retry ChatGPT API requests automatically with exponential backoff. Automatic retries let your program recover from rate limit errors without losing data or crashing. Exponential backoff means performing a short sleep when a rate limit error occurs and then retrying the unsuccessful request.

If the request still fails, the sleep duration is increased and the process is repeated. This continues until the request succeeds or you reach the maximum number of retries.

With exponential backoff, your first retries happen quickly, while later retries benefit from longer delays if the first few fail.

Keep in mind that unsuccessful requests still count toward your per-minute limit, so continuously resending a failing request will not help.
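The retry loop described above can be sketched in plain Python. This is a minimal illustration, not the Cookbook's code: `RateLimitError` is a stand-in for the exception your client library raises on an HTTP 429 response, and the defaults (a 1-second initial delay that doubles each time, up to five retries) are assumptions. Random jitter is added so that many clients do not retry in lockstep, and `sleep` is a parameter only so the demo can record the delays instead of waiting.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for the rate-limit exception your client library raises (HTTP 429)."""


def retry_with_exponential_backoff(func, max_retries=5, initial_delay=1.0,
                                   base=2.0, jitter=True, sleep=time.sleep):
    """Call func(); on RateLimitError, sleep and retry with a growing delay."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the final allowed retry
            # Jitter scales the wait by a random factor in [1, 2).
            wait = delay * (1 + random.random()) if jitter else delay
            sleep(wait)
            delay *= base  # base delay doubles: 1 s, 2 s, 4 s, ...


# Demo: a fake call that is rate-limited twice and then succeeds.
attempts = {"n": 0}


def flaky_call():
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise RateLimitError("429: rate limit reached")
    return "ok"


waits = []
print(retry_with_exponential_backoff(flaky_call, jitter=False, sleep=waits.append))  # ok
print(waits)  # [1.0, 2.0]
```

To use it with a real API call, wrap the call in a zero-argument function and pass it as `func`, catching your library's actual rate-limit exception instead of the stand-in class.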

Request Increase

If you keep hitting your rate limit and are wondering how to get it raised, here is what you need to know:

When Should I Consider Applying for a Rate Limit Increase?

The right time to apply for a rate limit increase is when you have gathered enough traffic data to make a strong case for it.

Requests for a higher rate limit without supporting data are less likely to be approved than those backed by data, because OpenAI raises rate limits to enable high-traffic applications.

If you are planning a product launch, gather the relevant data through a phased release over about ten days. Be patient, as a rate limit increase can sometimes take around 7 to 10 days.

Will My Rate Limit Increase Request Be Rejected?

Yes. A rate limit increase request can be rejected if it lacks data justifying the increase.

To improve your chances, present a strong, data-backed case for the increase.
