Squeezing More Value From Low-Cost Coding Plans -- Models and Context

The budget-friendly LLM subscription landscape is changing fast. Chutes, NanoGPT, and Z.ai have all shaken up their pricing or limits in the last few weeks, and if the angry Reddit threads are any indication, a lot of people are unhappy about it.

I think these changes are actually a good thing. They’re pushing us toward better habits: matching the right model to the task, paying attention to context, and using subagents for noisy work. The providers counting every request the same, whether you used a tiny model or the biggest one, was never going to be sustainable. Now that they’re charging based on actual token costs, we all have an incentive to be more mindful.

I’ll walk through what changed, why it matters, and how I’m adjusting my workflow to squeeze more value out of these plans. I’ll also share which combinations of subscriptions I think make sense for different kinds of users.

Let’s look at what’s actually different.

The $3 Chutes plan no longer includes Kimi K2.5, GLM-5, MiniMax M2.5, and Qwen 3.5-397B. You have to bump up to the $10 Chutes plan to use these models. Chutes has also switched from a simple 300 requests per day limit to new monthly and 4-hour limits based on the API prices of the models you call. Your monthly limit is equal to five times the price of your plan, and your limit during your 4-hour rolling window appears to be 1/12 of that. On my $3 plan, that’s $15 monthly and $1.25 every four hours. I’m inferring this ratio from my plan since I can’t see the $10 and $20 tier limits directly, but they should have limits of $50 and $100 worth of tokens per month.

NOTE: I don’t know if this is official, but Chutes has GLM-5-Turbo in their list of models. It is roughly half the price of GLM-5, but it is running in FP4 instead of FP8, so it won’t be as smart. I manually added this model to my opencode.jsonc, and I am able to use it with my $3 Chutes plan.

Z.ai has also introduced weekly limits, and they now deduct three requests from your quota for a single GLM-5 request. NanoGPT has also changed their limit from 30,000 requests per month to 60 million tokens per week.

You can’t just throw every problem at the best model anymore

When Chutes was deducting the same single request from your quota for Kimi K2.5, GLM-5, or the tiny GPT-OSS-120B, it made absolutely no sense to use the smaller models. I know for certain that most of my tasks can be accomplished just fine with the smaller MiniMax M2.5 model, but the providers weren’t giving me much incentive to use it. They’re charging me the same to use GLM-5, so I may as well just use GLM-5 for everything.

This isn’t the case any longer. I can push nearly ten times as many Qwen3-235B tokens than GLM-5 tokens through Chutes now. I’m not currently on the $10 Chutes plan, so I can’t use GLM-5 at the moment, but you get the idea. That is a massive difference in quantity. If Qwen3-235B can do the job, I’m going to use it.

Here’s an example of how I configure models in my opencode.json to work around provider limitations:

  "provider": {
    "nano-gpt": {
      "models": {
        "stepfun-ai/step-3.5-flash": {
          "name": "Step 3.5 Flash",
          "limit": {
            "context": 256000,
            "output": 256000
          }
        },
        "stepfun-ai/step-3.5-flash:thinking": {
          "name": "Step 3.5 Flash Thinking",
          "limit": {
            "context": 256000,
            "output": 256000
          }
        }
      }
    },
    "chutes": {
      "models": {
        "Qwen/Qwen3-235B-A22B-Instruct-2507-TEE": {
          "name": "Qwen3-235B-A22B-Instruct-2507-TEE",
          "top_p": 0.95, "top_k": 20
        }
      }
    }
  }

Step 3.5 Flash isn’t on models.dev for NanoGPT, so I added it to my opencode.json manually. Qwen3-235B Instruct hosted by Chutes didn’t work correctly with OpenCode until I added top_p and top_k settings manually.

I am currently on the $3 Chutes plan, so I don’t have access to the best models today. Right now, I have OpenCode’s plan agent using GLM-5 on my Z.ai coding plan, the build agent using GLM-4.7 on either Chutes or Z.ai, and my explore agent is using GPT-OSS-120B on Chutes.

GLM-5 is smart, GLM-4.7 is cheaper and plenty capable, and GPT-OSS-120B is much faster and nearly free.

NOTE: I am in blogger research mode over here. I am subscribed to too many coding plans. I am flipping between providers way more than necessary, and I am barely touching my limits. I am doing a bad job settling myself into the bare minimum, but I am getting closer. I will probably be pared down to just two plans next month. I hope!

Managing context is so important now!

This part depends on your plan. Z.ai still seems to only be counting requests, so it doesn’t matter if those requests have 10,000, 100,000, or 250,000 tokens. They still count as one request.

Chutes, NanoGPT, OpenCode Black, and OpenCode Go definitely count tokens. OpenAI and Anthropic are less transparent about what consumes your limits.

You can see your currently used context at the top of the OpenCode window. It never looks all that big, but how you are billed might not be intuitive. Every time an API request is sent by OpenCode, the full context is sent up to the provider. I can’t speak for the rest, but Chutes and NanoGPT seem to count both cached and uncached requests the same against your limits.

It is easy to hit 1,000,000 tokens in a dozen prompts as your context is growing to 120,000.

How I am dealing with this new reality?

I have been using OpenCode to set up or update Podman containers. I wanted to try Qwen 3.5 4B with vision on my test machine with a $56 RX 580 GPU. I just popped into OpenCode, gave it the hostname, explained that it can access it via ssh, and had it update my llama.cpp build.

I was curious how much of my new quota this might eat up, so I used GLM-4.7 via Chutes. These are long jobs, because that old machine compiles slowly. It also generates a lot of compiler and Podman output. I watched lots of round trips go by, and OpenCode built up around 80k tokens of context during the process. At Chutes’ pricing for GLM-4.7, 80k tokens of context sent repeatedly added up to 90 cents of my $1.25 4-hour limit.

I was thinking about that overnight, and wound up adjusting my OpenCode Podman agent the next day. I explained that it should run all compile and Podman jobs via a subagent. There’s no need to pollute the primary agent’s context with 1,000 lines of useless compiler output.

My build didn’t go perfectly. My ssh connection timed out, so I had to ask it to try again. Even with an entire extra build, context was at 37k by the end, and the process only used up 18 cents out of my 4-hour window.

I think this is on the extreme end. You can’t cut down your token usage this dramatically with this one simple trick unless your agent is running jobs with lots of compiler output. The improvements will be much less drastic if you’re assigning common programming tasks to subagents, but there will be improvements.

Vibe Coding My Home Assistant Setup — I Can’t Believe How Well This Works!

What AREN’T we talking about here today?

I am not here to defend a giant corporation. I am also not here to throw them under a bus.

I am not going to try to figure out if it is legal to change the terms of an agreement in the middle of an active subscription. I do hope Chutes will be giving refunds to anyone who asks, but I have no way to know if they will be doing this.

For the purposes of this blog post, I am going to assume their intentions are either neutral or good. This would be a very long blog post if we explored whether or not Chutes is evil!

I think the changes at Chutes and NanoGPT are for the better

I also think that Chutes had to rip off the band-aid immediately. If they gave us warning that the change was coming in a month, then every angry customer would be trying to use every ounce of inference available to them until their month ran out. That would ruin performance for the rest of us.

Using requests as a quota was never going to be sustainable. I have lots of long but easy tasks that can be handled by GPT-OSS-120B. That is a cheap model for Chutes to host. They charge a nickel per million input tokens. I used to eat up the same single request whether I was using GPT-OSS-120B or GLM-5. Why use the cheap model when it costs the same to use one of the best models?

This update encourages all of us to choose an appropriate model for the task. When more of us use smaller models, there is more compute available for everyone to use.

Chutes makes more money. We all, hopefully, enjoy a faster experience. We just have to drop down to MiniMax M2.5, GLM-4.7, or another smaller model and stop using Kimi K2.5 and GLM-5 for every single task.

Z.ai is still PROBABLY a good deal, but I can’t use math to prove it

My Z.ai Coding Pro plan is paid up until March of 2027. I am grandfathered in, so my account does not have a weekly limit. On its own, this would make it challenging for me to figure out how much work you can get done for your dollar with a Z.ai plan.

I definitely think you should be looking at Z.ai’s coding plans. They cost more and offer fewer requests than they used to, but speed has been noticeably improving, and their competitors prices have also gone up. Just like with Chutes and NanoGPT, higher prices and tightened limits have probably driven away the heaviest users.

I only have one problem with Z.ai today. They don’t currently offer GLM-5 on their Coding Lite plan. They say it is coming by the end of March. I have no reason to believe that they are lying.

Until GLM-5 arrives on the lower tier plan, you’re better off using Chutes’s $10 plan or NanoGPT’s subscription at around the same price so that you have access to GLM-5, Kimi K2.5, and MiniMax M2.5.

What should an extremely casual OpenCode user choose?

If you find yourself occasionally bumping up against the limits of the free providers, I would say that you can’t go wrong trying out Chutes’s $3 plan. I use GLM-4.7 for all sorts of tasks. I bet you can get away with using GLM-4.7, too, and that extra 38 million tokens per month that you get for your $3 ought to be enough to cover the gaps in your free usage. You don’t have to stop using free tokens just because you are paying for some!

OpenRouter gives you 1,000 requests per day on their free models as long as you load your account up with $10 in credits. They don’t have as many high-end coding models available for free as they used to, but they still have StepFun-3.5-Flash. It is a fast and capable model that punches well above its price point.

I don’t want to get too specific about which models are available for free at each provider, because that changes so often. I do know that OpenCode Go/Zen, Nvidia NIM, and Kilo Gateway all offer a combination Kimi, MiniMax, and GLM models for free to different extents. You can also get $5 in free credits every month at Vercel, and you can use those credits on both open-weight models and proprietary models.

You can use these more advanced free models for your planning agent, then follow up with paid GLM-4.7 tokens for your build agent.

I could most definitely get by doing exactly this, but my budget isn’t that limited. I may not want to spend $20 a month on OpenAI Codex, but I can definitely spend more than $3 a month.

Someone with a big budget might use Claude Opus for planning and Claude Sonnet for building. Someone like me will use GLM-5 for planning and GLM-4.7 or MiniMax M2.5 for building. If you are really trying to save money, you might get away with GLM-4.7 for planning and Qwen3-235B for at least some of your building tasks. I can definitely get by using GLM-4.7 for both planning and building.

A quick note about Codex Plus

I signed up for a $20 Codex Plus subscription just so I could see what all the fuss is about. OpenAI is the frontier lab with the friendliest relationship with OpenCode, so they’re not going to ban me for using my subscription with OpenCode like Anthropic would.

Codex-5.3 is a faster and more capable model than GLM-5 or Kimi K2.5. Even so, I don’t have problems that GLM-5 can’t solve, so paying extra for Codex wasn’t really getting me all that much. I also got half way to my weekly limit on my first day, and that was during their promotional period where the limits were doubled!

The way you are meant to stretch those limits is to use Codex-5.1-Mini, but I did not have good luck with this model. For my uses, GLM-4.7 or GLM-4.6 goof up way less often. I just couldn’t make this work out well. OpenAI needs a model comparable to GLM-4.7 (not just GLM-4.7-Flash) for tasks where Codex-5.3 is overkill. If they had that, Codex Plus would be a much better deal.

What if I DO encounter a problem that GLM-5 can’t solve? Math says that I could get a pretty long session of Codex-5.3 via OpenRouter for around $2. I could have put the $21.28 that I spent on a month of Codex Plus into my OpenRouter account instead, and I would have eight or ten Codex 5.3 sessions that I could use throughout the year.

What should you do if you’re not a professional, but you are a more than just casual?

I like having at least two subscriptions. OpenCode makes it easy to manage multiple subscriptions, and you don’t even have to think about it unless you have WAY too many subscriptions with overlapping models. You can set your plan agent to GLM-5 on one provider, your build agent to MiniMax M2.5 on another provider, and never touch it again.

There is a particular combination that I am interested in. I would combine a $10 OpenCode Go subscription, an $8 NanoGPT subscription, and maybe throw in a $3 Chutes plan. That trio would cost 28 cents less than I paid for a month of OpenAI Codex Plus. Why these three plans in particular?

Model	Chutes $3	Chutes $10	NanoGPT $8	OpenCode Go $10
GLM-5.1		53m		~~48m~~
GLM-5		53m	240m	~~64m~~
Kimi K2.5		111m	240m	~~133m~~
GLM-4.7	38m	125m	240m
MiniMax M2.7			240m	~~153m~~
MiniMax M2.5		167m	240m	~~200m~~
GLM-5 Turbo (FP4)	30m	102m
Devstral-2-123b			240m
StepFun-3.5-Flash			240m
Qwen3-235B-A22B	136m	455m	240m
Qwen-3.5-397B			240m
GPT-OSS-120B-TEE	300m	1,000m

NOTE: *GLM-5 Turbo is quantized to FP4 at Chutes. It is available on their $3 plan, but I am not sure how much that impacts its capabilities over FP8. OpenCode Go is the only provider on the list that charges less for cached tokens, so you will get a lot more use, but the actual number is hard to predict. In practice, I am getting four to six times more tokens from OpenCode Go than I expected because most of my tokens are cached reads!

OpenCode Go now has better pricing than Chutes. Not only that, but I am enjoying my use of the open-source OpenCode program, so I am excited that I am getting a good deal while also supporting the development of a tool that I use every day. That would be a big win for me.

NanoGPT’s subscription offers four or five times as much GLM-5 usage or around twice as much Kimi K2.5 as the $10 plans from OpenCode or Chutes. That is an amazing value, and GLM-5 on NanoGPT has been nice and fast for me over the last week. NanoGPT also offers Devstral 2. I don’t use Devstral 2 often, but it is a unique model, and sometimes it is nice to use a model with a different perspective.

Adding Chutes’s $3 plan almost feels unnecessary. I do use GPT-OSS-120B for my explore agent, and I also use it for some blogging tasks. What I don’t use is anywhere near 300 million GPT-OSS-120B tokens in a month. I don’t think that I need to take the load off the other two subscriptions, but having an inexpensive backup couldn’t hurt.

If you are disappointed with slow speeds and you are regularly bumping into limits on the budget providers, you might want to check out Synthetic.new. I don’t feel like they are a budget-friendly provider anymore because their base plan is now $30 per month. They are not charging by the token, though, and you get 135 requests every five hours with no weekly or monthly limits. The plan costs a bit more than Codex Plus, and the open-weight models available aren’t as good as Codex-5.3, but I’d blow through my weekly $20 Codex Plus limit in a day, whereas I can keep coming back to Synthetic every five hours.

That recommendation is not what I am actually doing!

I feel that I have to mention two important things here. I ~~haven’t tried OpenCode Go~~ have now [tried OpenCode Go]]ocgo, and it is delightful. I am doing my best to not have every coding subscription active at the same time, because I am just not a heavy enough user to even fully utilize one of these plans. I will be subscribing to OpenCode Go when I cancel my Codex and NanoGPT subscriptions over the next two weeks.

I also have enough money in my Z.ai coding account to extend my Coding Pro plan out to 2028. This is because so many of you that are reading these blog posts have clicked my referral link. It is hard to justify paying for OpenCode Go, NanoGPT’s $8 plan, and Chutes’s $3 plan when my Z.ai plan gives me five times more requests every five hours than I can even use. I can’t even use that many requests in a day, and I rarely use that many requests in an entire week.

I am trying all the plans. I am enjoying learning which models work for me. I am figuring out which providers are doing a good job. I also hope that everything I am learning is helpful to you.

I will definitely be paying for a second provider. I don’t want to be stuck without access to Kimi and MiniMax, and I do use some of the lesser models for grammar checking these blog posts.

I would like my number two provider to be OpenCode Go, because I want to support the company. I am also tempted to keep the $3 Chutes plan, but that doesn’t get me Kimi and MiniMax. If I had to decide right now, though, I would be choosing NanoGPT’s $8 subscription as my second provider. My dollars go further with NanoGPT, and they give me access to more models than OpenCode Go.

OpenCode Go Coding Plan From A Light User’s Perspective

Squeezing value doesn’t mean you have to use up every token!

It is feast or famine in my world. I am either spending a few hours working on a project, or days are going by where I only barely touch OpenCode. I am not trying to burn through all 240 million NanoGPT tokens in a month. I just want to make sure I have tokens available when I need them.

If I set OpenCode to GLM-4.7 on my basic $3 Chutes plan, and I just let the context grow, then I will reach my 4-hour limit in 20 minutes. Managing my context and choosing appropriate models will let me stretch that to an hour or more.

I’m not worried that I might burn through my monthly limit before the month is over. I am more concerned that I will have to wait 4 or 5 hours before I can continue working. Putting in a little extra thought in ahead of time means I will be more likely to have enough quota to finish my task in one day instead of having to finish tomorrow.

I don’t need to pull every token that the limits allow, and I hope you don’t feel the need to do so either.

Conclusion

The budget LLM subscription landscape keeps shifting. Chutes has moved to limits based on model pricing, NanoGPT has moved to token-based quotas, and Z.ai has instituted weekly limits. I think this is the right direction. It pushes us toward better habits: matching the models to the tasks, watching our context, and using subagents for noisy jobs. Those better habits will help everyone.

There are still incredible values out there. NanoGPT’s $8 plan gives you a ridiculous amount of GLM-5 and Kimi K2.5. OpenCode Go is competitively priced and seemingly more reliable. Free tiers can fill in the gaps. Pick one or two providers, configure your agents appropriately, and get started building things!

I’m still figuring out the ideal combination for my own workflow, and I’d love to hear what you’ve settled on. Have the Chutes or NanoGPT changes pushed you toward a different provider? Are you sticking with the $3 Chutes plan and dropping down to smaller models, or did you bump up to the $10 tier? Are you going to migrate to OpenCode Go or NanoGPT? Come hang out with us in our Discord community and let’s compare notes. We’re a friendly bunch of homelabbers, tinkerers, and machine learning enthusiasts all trying to squeeze the most value out of these tools.