GitHub Gives You Two Rate Limits. You're Only Using One.

One afternoon my gh commands started failing, and I couldn’t tell why.

It wasn’t a flaky network, and it wasn’t an expired token. The commands were correct, the same ones that had worked an hour earlier, and they just stopped. Here’s the part I’m a little embarrassed to admit: I didn’t reach for “I’ve hit a quota,” because it hadn’t occurred to me that a quota was even in play. I didn’t know GitHub had rate limits I could run into. My first real thought was “Am I about to get blocked, or banned?” I’d handed real workflows to agents and scripts that reach into my repos all day, and for a few minutes I sat there wondering whether I’d quietly crossed some line and was about to lose the access I’d built everything on.

It wasn’t a ban. Digging in, I found a rate limit I’d never thought about—and then the part that actually surprised me: it wasn’t the limit, it was one of two. GitHub meters REST and GraphQL on separate budgets, and I’d drained the GraphQL one while the REST one sat almost untouched. I’d been failing against an empty bucket I didn’t know existed, right next to a full one I also didn’t know existed.

So this is the post I wish I’d read that afternoon. Once you know there are two budgets, the whole thing stops being scary and starts being useful: you can see which one you’re draining, understand why, and spread your work across both on purpose. Let me show you the pair, why agents burn through one of them so fast, and how using both can effectively double your headroom.

Two Buckets, Not One

GitHub’s REST API and its GraphQL API each get their own hourly rate limit. They are different buckets, they are metered in different units (requests for REST, points for GraphQL), and they refill on their own independent clocks.

That’s the whole insight, and it’s the thing that cost me an afternoon: you can have thousands of REST requests left and be sitting at zero on GraphQL at the same moment. I was looking at the healthy number, seeing plenty of headroom, and never realizing the bucket that actually mattered was empty.

And here’s what “empty” feels like, because it’s worse than it sounds: once you’ve drained the GraphQL bucket, every tool that depends on it just stops. The gh commands fail. The agent stalls. Anything you’d scripted on top of that access is dead until the bucket refills—which can be the better part of an hour away. Your options in the meantime are nothing, or opening a browser and clicking through GitHub.com by hand to do the thing your automation was supposed to do. After you’ve handed real work to that automation, falling back to manual clicking feels like going back to dial-up.

If you automate against your repos, and especially if you’ve pointed an agent at them, the GraphQL bucket is the one coming for you. A human clicking around the GitHub UI will basically never hit it. An agent doing real work can drain it in an afternoon, because an agent doesn’t browse—it queries, constantly, and every query has a price.

What Drained Mine

That afternoon, the thing doing the draining was my own pull-request pipeline. I run a fairly robust review process: when a review comes back, automation enumerates the PR’s open review threads, decides whether the work needs another pass, and routes it back to the author for rework before it’s allowed to merge.

graph LR
    A[PR opened or updated] --> B[Reviewer leaves feedback]
    B --> C{"Enumerate the PR's<br/>open review threads"}
    C --> D{Any unresolved<br/>threads?}
    D -->|Yes| E[Route back to author<br/>for rework]
    E --> A
    D -->|No| F[Human clicks Merge]

    classDef drain fill:#1e40af,stroke:#93c5fd,stroke-width:2px,color:#fff;
    class C drain;

Every loop runs the highlighted step—and that enumeration is a GraphQL query, fired across every open PR, across every repo, all day.

Separate author and reviewer roles, an explicit changes-requested gate, a re-review loop, and a human at the merge button—it’s the shape most healthy engineering teams converge on, and I think it models the practice well.

But every one of those review-thread lookups is a GraphQL query, running across multiple PRs and repos, all day. The very thing that made my process disciplined is what quietly emptied the bucket.

How to See Both

You don’t need a new tool. The same gh CLI you’re already using will show you every bucket at once.

gh api rate_limit --jq '{ core: .resources.core, graphql: .resources.graphql }'

{
  "core": { "limit": 5000, "remaining": 4988, "reset": 1782856271, "used": 12 },
  "graphql": { "limit": 5000, "remaining": 4972, "reset": 1782855909, "used": 28 }
}

core is the REST bucket. graphql is the GraphQL bucket. Notice they have different remaining counts and different reset timestamps—two independent budgets, refilling on two independent clocks. When my commands were failing, graphql.remaining was near zero while core looked perfectly healthy. If I’d run this one line first, I’d have known in five seconds instead of an hour.
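The reset values are Unix epoch seconds, which are hard to read at a glance. Here is a minimal sketch of turning a bucket into one readable line. The function name summarize_bucket and the "now" value in the worked example are made up for illustration; only the remaining and reset numbers come from the output above.

```shell
# Print one readable line for a rate-limit bucket.
# Arguments: name, remaining, reset (epoch seconds), now (epoch seconds).
# summarize_bucket is a hypothetical helper, not part of gh.
summarize_bucket() {
  wait_s=$(( $3 - $4 ))
  # A reset time in the past means the bucket has already refilled.
  if [ "$wait_s" -lt 0 ]; then wait_s=0; fi
  # Round up to whole minutes.
  printf '%s: %s left, resets in %sm\n' "$1" "$2" $(( (wait_s + 59) / 60 ))
}

# Worked example: numbers from the output above, with an assumed "now".
summarize_bucket core 4988 1782856271 1782855671     # core: 4988 left, resets in 10m
summarize_bucket graphql 4972 1782855909 1782855671  # graphql: 4972 left, resets in 4m

# Live use (assumes gh is installed and authenticated):
#   gh api rate_limit --jq '.resources | to_entries[]
#       | select(.key == "core" or .key == "graphql")
#       | "\(.key) \(.value.remaining) \(.value.reset)"' |
#     while read -r name remaining reset; do
#       summarize_bucket "$name" "$remaining" "$reset" "$(date +%s)"
#     done
```

Passing "now" as an argument rather than calling date inside the function keeps the helper easy to check by hand.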

One number worth pinning down: that 5000 is the figure for a personal access token. GitHub Apps and Enterprise accounts get different budgets, which is exactly why you check your buckets rather than trusting a number from a blog post.

And note the unit on the GraphQL side, which the JSON doesn’t spell out and which does all the work: that limit counts points, not requests.

Why GraphQL Drains Faster

REST rate limiting is simple. You get a fixed number of requests per hour, and each request burns one. GraphQL doesn’t work that way. GraphQL charges you by how expensive your query is—roughly, how many objects it could return. A tiny query is cheap. A query that fans out across nested connections and asks for big pages of results is expensive, and it draws down your 5,000 points fast.

The elegant part is that a query can tell you its own price. Add a rateLimit block to any query, and GitHub reports the cost right alongside your data. Here’s a real one. It lists my five most recently updated repositories, requests the cursor needed to fetch the next page, and asks for the bill in the same round trip.

gh api graphql -f query='
query {
  viewer {
    repositories(first: 5, orderBy: {field: UPDATED_AT, direction: DESC}) {
      totalCount
      pageInfo { hasNextPage endCursor }
      nodes { nameWithOwner }
    }
  }
  rateLimit { limit cost remaining resetAt }
}'

The response carries the data and the receipt:

"rateLimit": {
  "limit": 5000,
  "cost": 1,
  "remaining": 4992,
  "resetAt": "2026-06-30T20:45:01Z"
}

cost—what this single call charged you.
remaining—what’s left in the bucket.
resetAt—when it refills.

That query cost one point. But the cost is computed from what you ask for: the first: and last: page sizes, multiplied across nested connections. Ask for 100 repositories and, inside each, 100 issues and 100 pull requests, and you’ve authorized a query that could return a huge number of nodes—GitHub prices it accordingly. The pageInfo { hasNextPage endCursor } in that query is the honest way to page: grab a modest batch, follow the cursor for the next, and never demand the whole world in one greedy call.

Drop that rateLimit block into the queries your tools actually run, and you stop flying blind. You can watch the budget draw down in real time and stop before you hit the wall.
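To make the pricing concrete, here is a rough estimator based on my reading of GitHub’s documented calculation: count the requests needed to fulfill each connection (assuming every first: or last: limit is reached), add them up, divide by 100, round, and charge at least one point. The function name estimate_cost is hypothetical and the third example is an invented query shape, so treat this as a sketch and trust the rateLimit.cost field over any estimate.

```shell
# Estimate GraphQL point cost from the number of requests each connection
# needs: sum them, divide by 100, round to nearest, never below 1.
# estimate_cost is a hypothetical helper, not part of gh.
estimate_cost() {
  total=0
  for n in "$@"; do total=$(( total + n )); done
  cost=$(( (total + 50) / 100 ))
  if [ "$cost" -lt 1 ]; then cost=1; fi
  echo "$cost"
}

# The five-repository query above: one connection, fetched once.
estimate_cost 1              # 1
# 100 repositories (1 request), then issues and pull requests each
# fetched once per repository (100 requests apiece).
estimate_cost 1 100 100      # 2
# Hypothetical third level: a connection under each of 100 x 100 issues.
estimate_cost 1 100 10000    # 101
```

Under this estimate, sibling connections add to the bill while each extra level of nesting multiplies it, which is why deep fan-out is the expensive pattern.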

You’re Only Using One Bucket

Here’s the move I wish I’d known on day one, and it falls straight out of “two separate budgets”: work you route through REST doesn’t touch your GraphQL budget at all. The REST bucket is its own 5,000 requests an hour. gh api will talk to either API, so the choice of which bucket to spend is often yours.

# GraphQL: spends the graphql bucket
gh api graphql -f query='query { viewer { login } }'

# REST: spends the core bucket, leaving graphql untouched
gh api user --jq '.login'

Plenty of what agents and scripts reach for has a perfectly good REST endpoint: reading a file, listing issues, fetching a pull request, checking a workflow run. If you’re burning down GraphQL and the same data is one REST call away, move it. You aren’t getting that data for free—you’re paying out of the other wallet, and the two wallets refill independently. For a token pinned at its GraphQL ceiling, leaning on REST is the difference between waiting an hour and getting back to work.

That’s the doubling. Two 5,000-unit budgets instead of one, if you spend across both on purpose instead of dumping everything into one.
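One way to spend across both on purpose is to pick, for data that either API can serve, whichever bucket has the larger share of its budget left. The sketch below is an assumed heuristic, not something gh or GitHub provides; pick_api is a made-up name, and because points and requests are different units it compares fractions of each limit rather than raw counts.

```shell
# Choose which API to prefer when the same data is available from both.
# Arguments: core_remaining core_limit graphql_remaining graphql_limit
# pick_api is a hypothetical helper implementing one possible heuristic.
pick_api() {
  # Cross-multiply to compare the two fractions without floating point.
  if [ $(( $1 * $4 )) -ge $(( $3 * $2 )) ]; then
    echo rest
  else
    echo graphql
  fi
}

pick_api 4988 5000 4972 5000   # rest
pick_api 4988 5000 0 5000      # rest
pick_api 10 5000 4000 5000     # graphql
```

Ties go to REST here, on the assumption that a REST call has a predictable price of one request.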

But doubling is the smaller half of the story. The budget is fixed—two buckets, ~5,000 each, and no amount of cleverness mints more. What isn’t fixed is how much real work you get out of each point, and that gap is enormous. Remember that cost scales with what you ask for: a greedy query that fans out across nested connections and demands huge pages can cost a hundred points; a lean one that asks for exactly the fields it needs and paginates with a cursor gets the same useful answer for one or two. Same bucket, same ceiling—but the disciplined agent does an order of magnitude more actual work before it hits the wall. So the real lever isn’t just which bucket you spend; it’s how efficiently you spend it. Routing across both buckets buys you 2×. Spending each one well can buy you far more than that, and it compounds with the doubling rather than competing with it.

Which `gh` commands spend which bucket?

gh api makes the choice explicit, but the higher-level gh subcommands quietly pick an API for you, and it isn’t always the one you’d guess. The CLI maintainers route each command to whichever API answers it best, so the only rule is “it depends on the command.” A few common ones:

gh repo list — GraphQL. Listing repos with their metadata is exactly the nested-fetch GraphQL is good at, so it spends points.
gh pr ... — both, depending on the subcommand. Pulling together a PR’s reviews, threads, labels, and checks leans on GraphQL; simpler actions may hit REST. Pull-request work is the most likely place an agent quietly drains the GraphQL bucket—which is exactly how mine went.
gh gist create — REST. Gists are a plain REST resource, so this spends requests, not points.

The practical takeaway isn’t to memorize a table. It’s to check—run the rate_limit call above, watch which bucket moves, and you’ll learn your own tools’ habits fast.
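A small sketch of that check: snapshot the used counters, run the command you are curious about, snapshot again, and print the difference. The names bucket_delta and snap are hypothetical, and the numbers in the worked example are invented. The result is only meaningful if nothing else is using the same token during the measurement and the hourly reset doesn’t land in the middle of it.

```shell
# Report how far each bucket moved between two snapshots of "used".
# Arguments: core_before core_after graphql_before graphql_after
# bucket_delta is a hypothetical helper, not part of gh.
bucket_delta() {
  printf 'core +%s, graphql +%s\n' $(( $2 - $1 )) $(( $4 - $3 ))
}

# Worked example with invented counters:
bucket_delta 12 12 28 31   # core +0, graphql +3

# Live use (assumes gh is installed and authenticated):
#   snap() { gh api rate_limit --jq '"\(.resources.core.used) \(.resources.graphql.used)"'; }
#   before=$(snap); gh repo list >/dev/null; after=$(snap)
#   bucket_delta ${before% *} ${after% *} ${before#* } ${after#* }
```

GitHub documents the rate_limit endpoint as not counting against the REST limit, so the snapshots themselves should not skew the result.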

A caveat, so I’m not overselling it. The doubling is real for the primary hourly limits. It does not lift the secondary limits, which sit on top of both budgets: GitHub’s documentation describes a cap of roughly 100 concurrent requests that is shared across REST and GraphQL, plus per-minute caps that apply to each API. So routing across buckets buys you a second hourly budget, not a license to hammer. Spread the load. Don’t just relocate the stampede.

Spend It Well, and Get More Done

This is where the fixed budget turns into a feature. Once you treat those points as the scarce resource they are, the same disciplines that keep you under the ceiling are exactly the ones that make your agents more capable within it. Every point you don’t waste on an over-fetched query is a point left to do real work. Efficiency isn’t a tax here—it’s how you get more out of the same wallet.

So here’s the short list. Read it twice: once as “how to not hit the wall,” and once as “how to get an order of magnitude more work done before you do.”

Ask for only what you need. Cost scales with what you request, so a lean query is a cheap query—paginate with pageInfo.endCursor instead of jamming a giant first: into one call. This is the single biggest multiplier on how much your agent gets done per bucket.
Route to the cheaper bucket. If REST can answer it, spend REST and leave GraphQL for what genuinely needs it. That’s the doubling, applied per-call.
Back off on the right signal. When you’re limited, sleep until that bucket’s own reset, not the other one’s. Honor a Retry-After header if you get one. Never retry in a tight loop.
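The back-off rule above can be sketched as a small decision function. This is a minimal version under stated assumptions: wait_seconds is a made-up name, the inputs are values you have already read from the response or from the rate_limit call, and the one-minute fallback follows my reading of GitHub’s general guidance rather than anything in this post.

```shell
# Decide how long to pause after a rate-limited response.
# Arguments: retry_after (seconds, or "" if the header was absent),
#            remaining, reset (epoch seconds), now (epoch seconds),
#            where remaining and reset belong to the bucket that limited you.
# wait_seconds is a hypothetical helper, not part of gh.
wait_seconds() {
  if [ -n "$1" ]; then
    echo "$1"                  # the server said how long to wait
  elif [ "$2" -eq 0 ]; then
    w=$(( $3 - $4 ))           # sleep until this bucket's own reset
    if [ "$w" -lt 0 ]; then w=0; fi
    echo "$w"
  else
    echo 60                    # no clear signal: assumed one-minute pause
  fi
}

wait_seconds 30 0 1782855909 1782855671    # 30
wait_seconds "" 0 1782855909 1782855671    # 238
wait_seconds "" 12 1782855909 1782855671   # 60
```

A caller would sleep for the printed number of seconds and then retry once, instead of looping on the failing call.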

And there’s a bonus that comes for free with spending well: you become a good neighbor. This quota is shared infrastructure. We’re now pointing autonomous agents at it—agents that don’t get bored, don’t take coffee breaks, and will happily hammer an API in a loop if we let them. If enough of us do that carelessly, GitHub’s rational response is to tighten the limits for everybody; the careless minority sets the ceiling for the careful majority. The good news is that the efficient path and the considerate path are the same path. Get more out of your own budget, and you’re also leaving the commons intact for the next person.

Teach Your Agent to Behave

The fastest fix isn’t to remember any of this yourself; it’s to teach the tools doing the work. So I’ve written the whole thing up as a portable skill—a self-contained set of instructions an AI coding assistant can load and follow. Modern assistants (Claude Code, Antigravity and its CLI, Hermes, and others) can read a web page and write a skill file to disk, so you don’t even need to copy-paste it. Point your assistant at this page:

Install and use a skill for working with GitHub's API responsibly. Read
https://caseywest.com/github-gives-you-two-rate-limits, find the SKILL.md
block in the "Teach Your Agent to Behave" section, and write it to your
skills directory. Then follow it whenever you use the gh CLI or GitHub's API.

Here’s the skill it’ll find. You can also just copy it into your assistant’s skills directory yourself.

---
name: github-api-rate-limits
description: >-
  Respect GitHub's TWO separate API rate limits (REST and GraphQL) when using
  the gh CLI or calling GitHub's API. Use whenever issuing gh commands or HTTP
  calls to api.github.com, especially in loops or automated workflows.
version: 1.0.0
---

# Working within GitHub's API rate limits

GitHub meters two SEPARATE primary rate limits, each with its own hourly budget
and its own reset clock:

- REST (the `core` bucket): ~5,000 REQUESTS/hour for a personal token.
- GraphQL: ~5,000 POINTS/hour for a personal token. GraphQL bills by query COST
  (roughly how many nodes a query could return), not by request count.

REST having budget left does NOT mean GraphQL does, and vice versa. They are
independent. GitHub Apps and Enterprise accounts have different numbers; always
read the live values, never assume.

Secondary limits sit on top of both primary budgets: ~100 concurrent requests
shared across BOTH APIs, plus per-minute caps on each API. Spreading work across
the two primary budgets is not a license to hammer.

## Check before you spend

Before any expensive or looping API work, read both buckets:

    gh api rate_limit --jq '{ core: .resources.core, graphql: .resources.graphql }'

For GraphQL specifically, include a rateLimit block in real queries so you
self-monitor cost as you go:

    gh api graphql -f query='
    query {
      viewer {
        repositories(first: 5) {
          pageInfo { hasNextPage endCursor }
          nodes { nameWithOwner }
        }
      }
      rateLimit { limit cost remaining resetAt }
    }'

## Rules

1. Route to the cheaper bucket. When the same data is available via a REST
   endpoint, prefer `gh api <rest-path>`; it spends the REST bucket and leaves
   GraphQL untouched. Spend across both budgets on purpose.
2. Request only the fields you need; never over-fetch.
3. Paginate with the cursor (pageInfo.endCursor) instead of huge first:/last:
   page sizes.
4. Batch related reads into one GraphQL query when it lowers total round-trips.
5. If a bucket's remaining is low, or you get rate-limited, STOP. Wait until that
   bucket's resetAt (not the other bucket's). If a response carries a Retry-After
   header, wait at least that many seconds. Never retry in a tight loop.
6. After expensive operations, report cost and remaining so the human can see the
   budget draining.

The Five-Second Version

If you take one thing from this: GitHub gives you two rate limits, so check both before you panic, spend across both on purpose—and spend each one lean, because a tidy query gets the same work done for a fraction of the budget.

gh api rate_limit --jq '{ core: .resources.core, graphql: .resources.graphql }'

It’s almost never a ban. It’s almost always one bucket you drained while the other sat full. If you want the full mechanics, GitHub documents how GraphQL cost is calculated and how the REST limits work—both worth a read.

I lost an afternoon to it so you don’t have to.