OpenCode with Local LLMs – Can a 16 GB GPU Compete With The Cloud?

There was a post on Hacker News yesterday about ByteShape’s success running Qwen 30B A3B on a Raspberry Pi with 16 gigabytes of RAM. I wondered if their quantization was really better. I had tried fitting a quant of Qwen Coder 30B A3B on my Radeon 9070 XT GPU shortly after I installed it, but I didn’t have much luck with OpenCode. The largest quant I could fit didn’t leave enough room for OpenCode’s context, and it wasn’t smart enough to correctly apply changes to my code most of the time.

AI Pat Talking To The Cloud and Local LLM

I am going to tell you the good news up front here. I was able to fit ByteShape’s Qwen 30B A3B Q3_K_S onto my GPU, and llama-bench gave me better than 200 tokens per second for prompt processing and 50 tokens per second generation speed with 48,000 tokens of context.

That is enough speed to be useful, especially since it is almost three times faster in the early parts of an OpenCode session when I am under 16,000 tokens of context. It isn’t a complete dope. It correctly analyzed some simple code. It was able to figure out how to make a simple change. It was even able to apply the change correctly.
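
If you want to try to reproduce this, the command below is roughly how I serve the model to OpenCode with llama.cpp’s OpenAI-compatible server. The GGUF filename is a hypothetical placeholder for whatever ByteShape quant you download, -ngl 99 offloads every layer to the GPU, and -c 48000 is the largest context that fits in my 16 GB of VRAM.

./build/bin/llama-server -m models/Qwen3-30B-A3B-Q3_K_S.gguf -ngl 99 -c 48000 --port 8080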

OpenCode with ByteShape’s quant isn’t even close to being in the same league as the models you get with a Claude Code or Google Antigravity subscription. When my new GPU arrived, I couldn’t find a model that fit on my GPU, could create usable code, and could consistently generate diffs that OpenCode was able to apply. Two months later, and all of this is at least possible!

Will I be canceling my $6 per month Z.ai coding subscription?

Definitely not. First of all, this isn’t even Qwen Coder 30B A3B. There is no ByteShape quant of the coding model. Even if there were, unquantized Qwen Coder 30B A3B is way behind the capabilities of Z.ai’s relatively massive GLM-4.7 at 358B A32B.

My local copy of Qwen 30B A3B does feel roughly as fast as my Z.ai subscription when the context is minuscule, but my Z.ai subscription doesn’t slow down when the context pushes past 80,000 tokens. My GPU doesn’t have enough VRAM to get there with a 30B model, and it would be glacially slow if it could.

Not only that, but my GPU cost me more than $600. Is it worthwhile tying up my VRAM, eating extra power, and heating up my home office when the price of that GPU could pay for 100 months of virtually unlimited tokens from Z.ai?

I am certain that someone reading this has a good reason to keep their data out of the cloud, but it is a no-brainer for me to continue to use Z.ai.

You don’t need a $650 16 GB Radeon 9070 XT to use this model

If it runs on my GPU, then it will run just as easily on a $380 Radeon RX 9060 XT. It will probably run at around half the speed, but it will definitely run, and half speed might still feel fast for some use cases.

This model will also run on inexpensive used enterprise GPUs like the 16 GB Radeon Instinct Mi50. These have nearly double the memory bandwidth of my 9070 XT, yet they sell on eBay for half the price of a 16 GB 9060 XT. The Mi50 has less compute horsepower, and it is harder to get a good ROCm and llama.cpp environment up and running for these older cards, but it can definitely be done. This is a cheap way to add an LLM to your environment if you have an empty PCIe slot somewhere in your existing homelab!

llama-bench of ByteShape Qwen 30B A3B on my 9070 XT

I would expect that you’d get good performance on a Mac Mini, but I can’t test that. You can buy an M4 Mac Mini with 24 GB of RAM for $999. That is enough RAM for your operating system, some extra programs, and it would easily hold ByteShape’s quant of Qwen 30B A3B with way more than the 48,000 tokens of context that I can fit in 16 GB of VRAM.

ByteShape has me more excited about the future!

I don’t even mean all that far in the future. When they came out, 7B models were nearly useless. A year later, and those tiny models were almost as good as the state-of-the-art models from the previous year.

You couldn’t fit a viable coding model on a 16 GB GPU six months ago. Now I can get OpenCode and my GPU to easily create and apply a simple code change. This is a fancy quant, but it isn’t a quant of what is currently the best coding model of its size. There’s room for improvement there.

Not only do smaller models keep getting better every six months or so, but the minimum parameter count for a useful model seems to keep dropping.

I wouldn’t be the least bit surprised if Qwen’s 30B model is as capable at the end of the year as their 80B is today. I also wouldn’t be surprised if someone comes up with a way to squeeze a little more juice out of a slightly tighter quant during the next 12 months.

I’ve tested the full FP16 versions of Qwen Coder 30B and 80B using OpenRouter, and the larger model is noticeably more capable even with my simple tasks. Once again, I wouldn’t be surprised if we’ll be able to cram a model that is nearly as capable as GLM-4.7 into 16 GB of VRAM by the time the calendar flips over to 2027.

The massive LLMs won’t be standing still

It won’t just be these small models that are improving. Claude, GPT, and GLM will also be making progress. They’ll be taking advantage of the same improvements that help us run a capable model in 16 GB of VRAM.

Just because you can run a capable coding model at home doesn’t mean that you should. The best coding model twelve months ago was Claude Sonnet 4. You’d be at a huge disadvantage if you were running that model today instead of GLM-4.7, GPT Codex, or Claude Opus. Just like you’ll be massively behind the curve if you’re running a 30B model in 2027 while trying to compete with the speed and capabilities of tomorrow’s cloud models.

Buying hardware today in the hope that tomorrow’s models will be better isn’t a great plan. There is no guarantee that Qwen will continue to target 30B models. I wouldn’t have been able to write this blog post if the current Qwen model was 32B or 34B, because it just wouldn’t have fit on my GPU.

This is exciting for more than just OpenCode!

I was delighted with some of my experiments with llama.cpp when my Radeon 9070 XT arrived. I tried a handful of models, and I learned that I could easily fit Gemma 3 4B along with its vision component and 4,000 tokens of context into significantly less than 8 gigabytes of VRAM.

Why is that cool? That means we ought to be able to fit a reasonably capable LLM with vision capabilities, a speech-to-text model, and a text-to-speech model on a single Radeon RX 580 GPU that you can find on eBay for around $75. That would be a fantastic, fast, and inexpensive core for a potential Home Assistant Voice setup.

The trouble is that Gemma 3 4B didn’t work well in my test when it needed to call tools, at least with OpenCode.

ByteShape’s Qwen 30B A3B can call tools. Home Assistant wouldn’t need 48,000 tokens of context, so that ought to free up plenty of room for speech-to-text and text-to-speech models.

I tried to test this model on my 32 gigabyte Ryzen 6800H gaming mini PC

I thought about leaving this section out, but including it might encourage me to take another stab at this sometime after publishing.

I thought my living-room gaming mini PC would be a good stand-in for a mid-range developer laptop. Thirty-two gigabytes of RAM is enough for 100,000 tokens of context with ByteShape’s Qwen quant, and there’d be plenty of room left over for an IDE, OpenCode, and a bunch of browser tabs.

My 6800H gaming mini PC

I copied my ROCm Distrobox container over to my mini PC, and I got ROCm and llama.cpp compiled and installed for what seems to be the correct GPU backend. I am able to run llama-bench with the CPU, but that is ridiculously slow. When I try to use the GPU it SEEMS to be running, because the GPU utilization sticks at 100%, but tons of time goes by without any benchmark results.

I found some 6800H benchmarks on Reddit while I was waiting, and they aren’t encouraging. They say 150 tokens per second prompt processing speed with the default of 4,000 tokens of context. That’s what my 9070 XT manages at 48,000 tokens of context with the ByteShape model. I’d expect to see something more like 20 tokens per second on the 6800H at 48,000 tokens of context.

I would consider my 9070 XT to be just barely on this side of usable. The 6800H wouldn’t be fun to use with OpenCode.

So where does that leave us?

So here’s where we stand at the start of 2026. If you have a reasonable 16 GB GPU sitting in your home office, you can actually run a competent coding assistant locally. This isn’t just in theory either. The speeds feel responsive enough to use. That’s real progress, and ByteShape’s aggressive quantization deserves credit for pushing the boundaries of what fits on consumer hardware.

At the same time, let’s not kid ourselves: my $600 GPU delivers an experience that’s still slower, much less capable, and significantly more expensive per token than what I get from a $6 monthly cloud subscription. The exciting part isn’t that local models have caught up, because they haven’t! It’s that the gap is narrowing at a pace that would have seemed unlikely a year ago.

Whether that matters for your use case depends entirely on whether you value data privacy, offline access, or just the sheer satisfaction of running this stuff on your own silicon. For me, it’s a “both/and” situation: I’ll keep paying for Z.ai because it’s objectively better, but I’ll also keep tinkering with local models because watching this space evolve is half the fun.

If you’re experimenting with local LLMs too, or you’re just curious about what’s possible, I’d love to hear about your setup. Come join our Discord community and share what hardware you’re using, what models you’re running, and what’s working (or not working) for you. The more we learn from each other, the faster we’ll all figure out the sweet spots in this rapidly evolving landscape.

Bazzite On My Workstation – Five Weeks Later

It has been five weeks since I wiped my NVMe drive and installed Bazzite on my desktop workstation. Why five weeks? I figured a blog post about this would be a good way to wrap up 2025! When I wrote about my initial experience with Bazzite, I was only a few days into the migration. I’ve been using the machine daily since then, and I’m far enough in to provide a meaningful retrospective.

My Desk Setup With a Gaming TV

If you’ve been following along, you know that I spent months thinking about this switch. I first considered running Bazzite back in July, tested it on my laptop, ran it on a mini PC in the living room, and finally made the leap to replace my long-running Ubuntu installation. I’m not going to retell the entire story here, but I want to acknowledge that this wasn’t a spur-of-the-moment decision.

I’ll revisit some of the positives I mentioned earlier, but I also want to highlight a couple of unexpected challenges I had to work around!

The short answer: I’m happy

If you want the TL;DR version: I don’t regret switching to Bazzite, and I have no plans to go back to Ubuntu. The switch has been mostly positive, with a few annoyances that I’ve either worked around or learned to live with.

The biggest win has been having current Mesa libraries and AMDGPU drivers available without any effort on my part, and knowing that I will be brought to the cutting edge of Mesa’s ray-tracing performance every six months or so. My Radeon 9070 XT works flawlessly, and I know I’ll be ready for whatever AMD releases next. The gaming experience has been smooth, and I’ve been able to focus on playing games rather than tinkering with drivers.

The immutable nature of Bazzite has only gotten in the way twice, but I have been able to work around it. Almost everything I need is either installed via Flatpak or running inside one of my Distrobox containers.

Gaming performance with the 9070 XT

Gaming has been fantastic under Bazzite, but that was to be expected. One thing I didn’t realize about Bazzite is that it lets you run the command ujust install _mesa-git. This installs a nightly build of Mesa in your home directory and sets up a mesa-git wrapper script for you.

Why is this exciting? Mesa has some fantastic ray-tracing performance improvements that haven’t made it into a release yet. Using mesa-git gave me a bump from 65 to 70 FPS in Control when ray tracing is set to high, and it gave me a massive boost from 35 to 70 FPS when using ray tracing in Spider-Man 2!

Most of my last couple of weeks have been spent playing Arc Raiders. I can enable FSR4 in Arc Raiders with PROTON_FSR4_UPGRADE=1, which improves visuals over FSR3.
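
If you haven’t set a Proton environment variable before, the usual place for it is the game’s launch options in Steam, using the standard %command% pattern:

PROTON_FSR4_UPGRADE=1 %command%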

I have not managed to puzzle out a correct incantation to use Proton-GE’s easy FSR4 upgrade alongside mesa-git. I was able to mod FSR4 into Control using OptiScaler, but I can’t do that with a multiplayer game with anticheat like Arc Raiders.

Switching to Podman was more effort than I expected!

I was all set to fire up my two or three docker-compose.yaml containers using Podman, but Bazzite doesn’t ship with podman-compose. I believe podman-compose is the only thing I have installed manually using rpm-ostree.

This worked great for my first container, but my most important container runs OpenVPN, and it needed privileges that I just couldn’t assign using podman-compose, so I wound up being lazy. I had OpenCode (my AI coding assistant) convert my docker-compose.yaml to a Podman command line for me, and I am running that using sudo.
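
This isn’t my exact container, and the image name and paths here are hypothetical, but the converted command ended up shaped roughly like this:

sudo podman run -d --name vpn-client \
    --cap-add=NET_ADMIN --device /dev/net/tun \
    -v /home/pat/vpn-config:/etc/openvpn:Z \
    docker.io/example/openvpn-client:latest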

The containers I run on my workstation should be running in my homelab. I should just make the slightest effort to move them, and I’ll probably do that early next year. I would have skipped installing podman-compose if I had known that 50% of my containers wouldn’t easily work using podman-compose. I could have just converted the other one to a command line as well!

I didn’t even consider my thermal label printer!

Bazzite ships with CUPS. My cheap thermal printer works with CUPS. It should have been easy to get it working, but it was anything but!

I can install the PPD file, but my thermal printer needs a filter binary, and the CUPS filter directory is read-only. There is no easy and clean way to make this work.

I hemmed and hawed for a day, then I decided the easiest solution was disabling CUPS on the host and setting up a Podman container specifically for the thermal printer. I would document exactly how I did that, but I’m not sure I did it in a way anyone should replicate.

Why did I opt to use a container? I figured this container could easily follow my thermal printer through Bazzite upgrades. I could also move the printer and container to one of the mini PCs in my homelab in the future.
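
I will at least sketch the shape of it. The image name is a hypothetical placeholder, and the USB device node will be whatever your printer shows up as:

sudo podman run -d --name thermal-cups \
    --device /dev/usb/lp0 \
    -p 631:631 \
    -v /home/pat/thermal-cups:/etc/cups:Z \
    docker.io/example/cups:latest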

Setting up lvmcache went smoothly!

Mostly. I set up my pair of bulk-storage volumes in stages, starting with the basic, uncached volumes and moving data into place. I added them to fstab with noauto set so they wouldn’t mount automatically. I didn’t want to troubleshoot during a reboot if I made a mistake, which turned out to be a good decision. I had accidentally put a twelve-crypt instead of a crypt-twelve in either the fstab or crypttab at one point. That definitely wouldn’t have booted!
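
The noauto entries looked roughly like this, with hypothetical volume names and a placeholder UUID:

# /etc/crypttab
crypt-bulk   UUID=replace-with-your-uuid   none   luks,noauto
# /etc/fstab
/dev/mapper/crypt-bulk   /mnt/bulk   ext4   defaults,noauto   0 2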

LUKS encryption was happy. The lvmcache volumes on both my NVMe and SATA SSD were happy. My data was happy. It was easy to flip the noauto to auto in my fstab, and everything has been chugging along ever since.

lvmcache-statistics

I was running with a 600 gigabyte root filesystem and a 300 gigabyte lvmcache on my previous install, but I flipped that around this time. I did this because it should be easy to move some of my ever-growing and poorly maintained directories, like ~/Downloads, to one of the big cached volumes.

Bazzite is fairly locked down. Flatpak programs get grumpy if I move something out of my home directory and connect it back up with a symlink. I assume that I will have to attack this problem with bind mounts, but I haven’t gotten to the point where I need to do that yet!
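
When I do get around to it, I expect the bind mount in fstab to look something like this, with hypothetical paths:

/var/mnt/bulk/Downloads   /home/pat/Downloads   none   bind   0 0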

My Home Assistant shenanigans were easier than I expected!

I thought it was going to be a pain to install hass-cli. It wouldn’t be a big deal if I had to run it in Distrobox, but I wanted to get and set Home Assistant variables from scripts that run directly on Bazzite. I was excited to see that homeassistant-cli was available in the Brew package manager’s preconfigured repositories!

I am having no trouble using hass-cli to fetch the state of my espresso machine and office lighting, and to put the correct color indications on the appropriate buttons of my little Mission Control macro pad at my desk. I have been using JC Pro Macro 2 mechanical keypads for years to control various aspects of my workflow, including integrating with Home Assistant to control studio lighting.
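
The hass-cli calls in those scripts are nothing fancy. The entity names here are hypothetical stand-ins for my espresso machine and office lights:

hass-cli state get switch.espresso_machine
hass-cli state get light.office_lights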

I am not certain how I installed hacompanion! I didn’t write it down!

There is a static hacompanion binary in my ~/.local/bin/ directory, and there is a static binary in their GitHub releases. I bet I downloaded that and dropped it in place!

Home Assistant knows when I am done using my computer, so the lights turn off automatically. I have keys configured on my macro pad to switch between just my monitor, just the TV as the display, or both at the same time. My scripts are able to reach out to Home Assistant to turn on the TV and select the correct HDMI input, and it is able to use kscreen-doctor to configure the appropriate outputs on the GPU.
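
The kscreen-doctor side of that looks something like the line below. The output names are whatever kscreen-doctor -o reports on your machine, and mine will be different:

kscreen-doctor output.HDMI-A-1.enable output.DP-1.disable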

Cheater automatic Bluetooth headphone switching with some vibe coding?!

I’ve been using the gaming version of the Bose QC30 headphones for the last five years. I use them wired, because there is less latency and I don’t have to remember to charge them. I had no way for the computer to detect whether or not I had the headphones on, so I had to switch audio outputs with a key bind.

My Bose headphones went on a plane trip with my wife, because that’s where the noise canceling shines, so I’ve been limping along using an older set of AKG NC70 headphones. My limping made me impulse-buy a set of Anker Q20i wireless headphones that were on sale for $35.

I have nothing but nice things to say about the Anker headphones. I haven’t been able to compare them to the older Bose headphones back-to-back, but Anker is obviously trying to imitate Bose here. They look similar. They have the same soft pleather earcups. They have similar features.

The budget noise canceling definitely gives older Bose headphones a run for their money. The Anker headphones are only 10% of the price of the current iteration of my Bose headphones, and they punch way above their price point. I’d buy these Anker headphones again in a heartbeat.

#!/bin/bash
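# Watch wpctl for my Bluetooth headset's sink. When it appears, remember the
# current default sink and switch to the headset. When it disappears, switch
# back to whatever was active before.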

if command -v wpctl &> /dev/null; then
    WPCTL="wpctl"
else
    WPCTL="distrobox-host-exec wpctl"
fi
HEADSET_NAME="soundcore Q20i"
SLEEP_TIME=2

prev_sink=""
headset_connected=false

echo "Adjusting default wpctl settings"

$WPCTL settings linking.pause-playback false

echo "Done adjusting wpctl settings"

cleanup() {
    echo "Reverting to default wpctl settings"
    $WPCTL settings linking.pause-playback true
    echo "Done reverting wpctl settings"
    exit
}

trap cleanup EXIT INT TERM

echo "Starting to watch for headset: $HEADSET_NAME"

while true; do
    status=$($WPCTL status)
    
    # Find the currently active sink (marked with *)
    current_sink=$(echo "$status" | awk '/Sinks:/,/Sources:/' | awk '/\*/ { print $3; exit }')
    
    # Check if headset is connected
    headset_sink=$(echo "$status" | awk '/Sinks:/,/Sources:/' | awk -v name="$HEADSET_NAME" '$0 ~ name { print $2; exit }')
    
    if [ -n "$headset_sink" ] && [ "$headset_connected" = false ]; then
        # Headset just connected
        prev_sink="$current_sink"
        echo "Headset connected. Previous sink: $prev_sink"
        $WPCTL set-default "$headset_sink"
        echo "Switched to headset sink: $headset_sink"
        headset_connected=true
    elif [ -z "$headset_sink" ] && [ "$headset_connected" = true ]; then
        # Headset just disconnected
        if [ -n "$prev_sink" ]; then
            echo "Headset disconnected. Switching back to sink: $prev_sink"
            $WPCTL set-default "$prev_sink"
        fi
        headset_connected=false
    fi
    
    sleep $SLEEP_TIME
done

What’s the trouble? Bazzite doesn’t automatically switch to the wireless headphones when they connect, and it doesn’t switch back to the speakers when they disconnect. I found two suggestions on the Internet, and I didn’t manage to get either to work correctly, so I asked OpenCode and Z.ai to write me a little daemon script to automatically swap outputs when the headphones are connected.

The ROCm Distrobox experiment

I mentioned in my first post that I had set up a ROCm-enabled Distrobox container and run some benchmarks with llama.cpp. I haven’t done a lot with it since then, but the container is there and it works.

I have played around with a few different models in llama.cpp, and the performance has been right where I expected. My 9070 XT with 16 GB of VRAM is plenty for the smaller models I’ve been testing, and the prompt processing and token generation speeds are snappy. I haven’t found a practical use for local LLMs in my day-to-day workflow yet, but it’s nice to know the capability is there when I need it. I did write about whether local LLMs make sense versus using cloud services and my experience deciding not to buy a Radeon Instinct Mi50 before I upgraded to my 9070 XT.

Setting up the ROCm container was straightforward thanks to the guides I found, and I haven’t had to touch it since. It’s one of those things that just works in the background until I need it. I have been able to grab new models as they are released to give them a try.

Gemma 3 4B with its vision model and a ton of context fits well and runs great in 8 gigabytes of VRAM. In fact, I am wondering if I could squeeze that model, a speech-to-text model, and a text-to-speech model into 8 gigabytes. There are a few older 8 gigabyte gaming GPUs available on eBay for under $100. That would be a neat way to run a voice assistant for the house, wouldn’t it?!
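
Serving Gemma 3 with vision in llama.cpp just means handing llama-server the model and its multimodal projector. The filenames here are hypothetical placeholders for whichever quant you grab:

./build/bin/llama-server -m models/gemma-3-4b-it-Q4_K_M.gguf \
    --mmproj models/gemma-3-4b-mmproj-f16.gguf -ngl 99 -c 4000 --port 8080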

Daily workflow and productivity

My daily workflow is largely unchanged from what it was on Ubuntu, but it’s happening in different places.

The browser is a Flatpak, Steam is native on the host, and Resolve is in its own Distrobox container.

My actual work of writing, coding, and general productivity all happens in a single Debian Distrobox container. Emacs is there, my zsh configuration is there, my dotfiles are there, and OpenCode runs in there. It feels like home.

The split between host and container has been cleaner than I expected. I worried that I would constantly be context-switching and thinking about whether I should be installing something in the container or on the host.

I’ve been getting real work done this whole time. My blog is getting written, code is being written, and I haven’t felt like the migration has slowed me down. That’s the real test – can I still get stuff done? The answer is yes. I wrote earlier about how Bazzite uses Distrobox to containerize things like DaVinci Resolve, and that approach is working well for me.

I have gotten into some trouble when asking OpenCode to write a helper script that is meant to run on Bazzite, because OpenCode is running in a Distrobox container. I have learned that I can explain to the LLM that it needs to run commands with distrobox-host-exec when they are not found, and it usually manages to work things out.
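
The wrapping itself is simple. From inside the container, anything that only exists up on the Bazzite host gets prefixed like this:

distrobox-host-exec flatpak list
distrobox-host-exec rpm-ostree status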

What I’ve learned

Five weeks isn’t enough time to declare victory or declare defeat, but it’s enough time to learn some things.

Immutable Linux isn’t as scary. The idea of not being able to install packages freely on the host system felt restrictive, but I’ve adapted. The Distrobox integration is good enough that I don’t feel limited.

The containerization has benefits beyond what I expected. I can update my Debian container without worrying about breaking Resolve or my ROCM container. I can blow away a container and recreate it in minutes if I need to. I can also duplicate Distrobox containers locally, or I can export them to use on my laptop.

I haven’t used Bazzite long enough for its strengths to really shine. I have gotten myself accidentally stuck on an aging Ubuntu release in the past. Sometimes the timing of a major upgrade is bad, so I hold off, but then don’t manage to get around to it. Sometimes a major Ubuntu upgrade will cause headaches, so I put it off.

Almost every single thing that I have customized won’t be touched by a Bazzite system upgrade. My Distrobox environments are independent. My Flatpak apps don’t care what operating system is running on the host. Upgrading to major Bazzite releases should feel quite seamless, and I am excited about that.

Conclusion

Five weeks isn’t forever, but it’s enough time to know whether a migration was a mistake. Switching to Bazzite wasn’t a mistake.

There have been annoyances, and there are still rough edges (like figuring out how to make my thermal printer work). My setup isn’t perfect. But overall, I’m happier than I was on Ubuntu. The computer works, the games play, the videos edit, and I’m getting my work done.

I still have things to configure. I need to move my workstation containers to the homelab, and I haven’t set up all my cron jobs and automation yet. I’ll probably discover missing software for months. That’s normal – every new OS install has a period of discovery.

But the foundation is solid. Bazzite + Distrobox + Flatpak is working for me, and I’m looking forward to years of stability with minimal maintenance.

If you’re on the fence about trying an immutable distro, I’d say give it a shot. Maybe start with a laptop or a secondary machine like I did. Set up a Distrobox container with your comfort distro and use it for a while. You might find that you don’t miss the old way of doing things as much as you thought you would.

Are you using an immutable Linux distribution like Bazzite? How has your experience been? Or are you on the fence about making the switch yourself? I’d love to hear about your setup, the challenges you’ve faced, and what you’ve discovered along the way. If you’re interested in chatting about immutable Linux, gaming on Linux, homelab setups, or machine learning with AMD GPUs, come join our Discord community! We’d love to hear your stories and help you on your own Linux journey.

Devstral with Vibe CLI vs. OpenCode: AI Coding Tools for Casual Programmers

I am not sure this is going to be as direct of a comparison as the title implies. I am not a scientist. I don’t plan to concoct an experiment to test both tools and models against the same task. There are already benchmarks out there, and I don’t think they matter all that much in real life. What do I want to know? Which tools and models FEEL better to use.

I got curious about this almost immediately after Devstral 2 was released. As I am writing this, Mistral is offering free tokens for what seems like nearly unlimited use of their new Vibe-CLI tool. You can also pay for Devstral 2 tokens on OpenRouter, and they are quite inexpensive. Inexpensive enough that I might have paid less by the token for Devstral 2 had I used it instead of my $3 Z.ai Coding Plan. Maybe.

AI Image of Pat with his Robots

Devstral 2 is a newer model than GLM-4.6, so that gives Mistral a potential edge over Z.ai. Devstral 2 is only a 123B model, while GLM-4.6 is a 355B model. Being three times as big is a huge advantage!

Either model comes in way behind Claude Opus in this race, but both models are much cheaper and at least somewhat faster than a Claude Code subscription.

NOTE: When I tried Devstral 2, GLM-4.6 was Z.ai’s latest model. They released GLM-4.7 while I was putting the finishing touches on this post.

Who is this blog post for?

It is for people like me. I don’t write code 40 hours per week. That isn’t my job. I have been firing up OpenCode to help me bang out a small coding task roughly once every two or three days. I might be firing it up more often than I need to because my Z.ai subscription is new, shiny, and fun.

I don’t write code often enough to justify paying $20 per month for a Claude Pro subscription, and I certainly don’t code enough to justify $100 per month for Claude Max!

Maybe you write code as occasionally as I do. Maybe you use an LLM to help you configure things like Proxmox, Jellyfin, and nginx in your homelab. Maybe you have a $100 Claude Max subscription at work, but you need something to fill in the gap for your occasional coding needs at home.

I definitely believe that a $6 Z.ai subscription was a no-brainer when I wrote that blog post two weeks ago. Maybe paying by the token for Devstral will wind up being nearly as good, a little faster, and manage to cost even less.

Go try this vibe coding stuff while it is free!

It looks like Devstral 2 is going to be free to use for the entire month of December. Google’s Gemini-CLI allows 60 requests per hour against their API for free. Qwen-Code can be used with Qwen’s API for free. There are a lot of free ways to use agentic coding interfaces, and not all of them are expiring at the end of the year.

I am sure there are other ways of testing out or even regularly using these sorts of coding tools completely for free. Don’t forget that free things are almost always free for a reason! Mistral is currently free to get you hooked. Other APIs are free when you agree to let them use your data for training.

You might also want to consider where your data is going. I previously talked about how my Z.ai subscription is served from China, and your ethics might not line up with that. This is also true of Alibaba’s Qwen service.

Maybe you would feel better paying a little more for Devstral 2 knowing that they are a French company and their servers are in Europe. Maybe you’d prefer to pay a massive company like Google that is based in the United States.

Thoughts on Vibe and Devstral 2 from a shadetree programmer

I have a lovely JC Pro Macro Pad on my desk. Nearly everything that I use this macro pad for needed to be redone when I migrated from Ubuntu to Bazzite on my workstation a few weeks ago. This is my Mission Control macro pad, and the keys usually light up in a way that indicates the state of the action. The headphone toggle turns red when the speakers are active, and my espresso machine button turns blue when Home Assistant thinks the espresso machine has cooled down.

I needed a new way to control the state of the LEDs. The Arduino gets grumpy if too many processes try to write to the serial port at the same time, so I had OpenCode with Z.ai write me a pair of scripts. One creates and watches a fifo for new commands that the Arduino already understands, and it ships those over the serial port as they come in.
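
A heavily stripped-down sketch of that watcher looks something like this. The fifo path and serial device are hypothetical, and the real script has more error handling than I am showing here:

#!/bin/bash
FIFO=/tmp/macropad.fifo
SERIAL=/dev/ttyACM0

[ -p "$FIFO" ] || mkfifo "$FIFO"

while true; do
    # read blocks until something writes a command into the fifo
    while read -r cmd; do
        echo "$cmd" > "$SERIAL"
    done < "$FIFO"
done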

The other script is called macroled, and it has the simple job of converting English color names to RGB values then writing the appropriate commands to the fifo.

pat@zaphod:~$ macroled 1 red
Set LED 100 to color red (150 0 0)
pat@zaphod:~$ macroled 2 orange
Set LED 102 to color orange (150 150 0)
pat@zaphod:~$ 

I installed Vibe-CLI today, and I asked it to create another script. This one watches my Radeon 9070 XT GPU’s wattage. If the wattage cap is set to 250 or below, it sets the macro pad key to blue. If the cap is over 250, the key turns red. Blue for cool and quiet, red when full power is available. It adds more green to the mix as actual power consumption rises.

I’m not entirely happy with how these colors wind up mixing, but this gives me a visual indicator of both my maximum available GPU performance and how hard I am hitting the GPU.
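
This isn’t the script Vibe actually wrote for me, but a trimmed-down sketch of the idea, without the green mixing, looks something like this. The hwmon path and LED number are hypothetical, and amdgpu reports these values in microwatts:

#!/bin/bash
HWMON=$(ls -d /sys/class/drm/card1/device/hwmon/hwmon* | head -n 1)
cap_w=$(( $(cat "$HWMON/power1_cap") / 1000000 ))

if [ "$cap_w" -le 250 ]; then
    macroled 3 blue     # capped: cool and quiet
else
    macroled 3 red      # uncapped: full power available
fi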

Devstral 2 did a fantastic job here. We had to go back and forth several times. I decided that I didn’t like the color getting diluted when the GPU was only using 20 or 30 watts, so I asked Vibe to only mix in the green when the GPU goes above 50 watts. I also went back and forth a couple of times swapping colors around and changing maximum brightness.

This was a small task, but small tasks are what I usually need to work on. Vibe and Devstral 2 did as good of a job here as I would expect to see from OpenCode and GLM-4.6.

Is vibe coding OK? How do you define it?

Early on, the term vibe coding seemed to be used in a derogatory manner towards non-programmers producing LLM-generated code that they didn’t understand. It doesn’t feel derogatory any longer, and it seems to encompass a wider variety of processes.

AI Image of Pat with a Cow talking to a Robot

I have seen quite a few attempts at definitions, but nobody seems to agree on the boundaries. I personally decided that I feel that I am vibe coding when I don’t touch the code in a text editor. I take a peek at most of the shell scripts that OpenCode spits out using cat or less, but I almost never open them in Emacs. I am being a little safer by making sure there are no sneaky rm -rf commands in there, but I’m not changing anything.

I think that counts as vibe coding.

OpenCode and Vibe write better shell scripts than I do!

Listen. I can write a good shell script. The fact is, though, that I usually don’t. I hack something together that works. I might sneak in some error checking around the areas that were causing me problems while attempting to get things to work correctly, but most of the short scripts that I write have nearly zero good error checking.

The vibe coded scripts are more likely to break things down into functions. They’re more likely to check for error codes. They’re more likely to stop and let you know why something didn’t work right when you run them. The vibe-coding machine does a MUCH better job of making the scripts output extra text so you can see what is going on as they run.

Is my script going into production on a server? I will put in the effort to do all these things and more. Is the script just setting the color of an LED on my macro pad? I will leave all of this out. The robots will beat me here every time.

Is the Z.ai Coding Lite Plan still a no-brainer?!

I almost had to guess at this, but Z.ai just added a usage view to their subscription dashboard. I have used 26.7 million tokens in the last 30 days.

Devstral 2 is currently free, and has done a good job for me here, but what about in January when it isn’t free? I see in my OpenRouter account that paid Devstral 2 would cost $0.05 per million input tokens and $0.22 per million output tokens. Assuming Devstral 2 would have matched GLM-4.6 on token count, which is a MASSIVE assumption, I would have paid $1.33 for my input tokens at OpenRouter. I think it is safe to round up to a million output tokens and say I would have paid $0.60 in that direction.

That adds up to a bit less than the $3 that I paid for my first month on my Z.ai Coding Lite plan, thanks to the half-price deal.

That isn’t QUITE a no-brainer anymore, right?! I’m happy with what I’ve paid for. GLM-4.6 is a more powerful model. I suspect there will be jobs that GLM-4.6 can easily handle where Devstral 2 might fail, and $3 per month isn’t a lot of money. Not only that, but so far my usage is trending upwards.

A part-timer could probably use free API services and tools for the foreseeable future

Everyone wants your money. They all want you to subscribe. They all want to get you hooked on their tool and model.

I suspect that one company or another will have a free coding plan for the next year or two. Alibaba wants you to use Qwen. Google wants you to use Gemini. Mistral wants you to use their new Vibe tool with Devstral 2. If you have a good experience while it is free, then you might become a paying customer when it isn’t. You might even use their models in the programs you’re writing.

I completely understand this. Even with my light use, I have gotten used to OpenCode and GLM-4.6. Devstral was easy to work with using Vibe, but everything felt a little weird. I don’t mind paying $36 or $72 for the year knowing that I will be able to use my preferred tools for the next 12 months.

That is currently MY preferred tool. You might like Vibe, Qwen-Code, or Claude Code more than OpenCode. You can probably slot Z.ai’s subscription into Vibe or Qwen-Code like you can with Claude Code or OpenCode, but maybe it isn’t just the tool you like. Maybe you feel more comfortable working with the Devstral or Qwen Coder models. That’s fine. You should try everything!

Maybe you shouldn’t limit yourself to just one model!

If you are an occasional user like myself, I think it is just fine to lock yourself in to a single model for 3, 6, or even 12 months. Especially if the price is right.

What if you actually are an extremely heavy user? Should you just spend $200 every month on the biggest Claude Max subscription? Maybe not!

I just learned about the oh-my-opencode plugin for OpenCode. It is extremely opinionated and absolutely bananas! It is preconfigured to call the best-suited model for each task. It uses GPT-5.2 for design and debugging, Gemini 3 Pro for frontend development, Claude Sonnet 4.5 for documentation and codebase exploration, and Grok Code for fast codebase exploration. That is at least three different APIs or subscriptions.

You might still be better off with separate lesser subscriptions even if you aren’t using oh-my-opencode. I keep reading that Claude is better at implementing straightforward solutions, while OpenAI’s Codex is better at debugging complicated problems. People also seem to feel that Z.ai’s GLM-4.6 is good enough for handling most of the grunt work.

Maybe upgrading from the $20 to the $100 Claude subscription isn’t the best move when you start reaching the 5-hour limit. It might be better to spend $20 on Claude and add a $20 Codex plan to the mix to attack those problems where Claude falls short. You can probably get more than double the work done with Codex when you run out of Claude tokens.

When that still isn’t enough, you can add a Z.ai subscription to the mix. A tool like OpenCode can connect to all three subscriptions, and switching between them is just a few keystrokes away.

If you are already subscribed to the $200 tiers of both Claude and Codex, and you are maxing them out, then none of this applies directly to you. You’re way beyond the audience of this blog post!

The important thing to remember is that you’re not locked in. If you pay for a plan that is undersized, you can always upgrade, and you are free to mix and match. I am excited that I landed on OpenCode, because I can plug it into all sorts of different backends, and I can configure different agents to use whatever API might be appropriate.

Conclusion

The landscape of AI coding tools is changing faster than ever. What was a clear “no-brainer” subscription a month ago now has serious competition from free tiers and pay-per-use models. Devstral 2 with Vibe-CLI has proven to be a capable setup, while OpenCode with GLM-4.7 remains my go-to tool. The key takeaway is that there’s no one-size-fits-all solution. What matters most is finding the combination that fits your workflow, budget, and privacy comfort level.

I’d love to hear about your experiences with these tools. What’s your current AI coding setup, and are you happy with it? Have you tried Vibe-CLI, OpenCode, or similar tools? Are you more concerned with cost, performance, privacy, or ease of use? Come join our Discord community where we discuss AI coding tools, homelab setups, and all things tech. It’s a great place to share your experiments and learn from others navigating the same decisions.

I Am Running Bazzite Linux On My Workstation

I’m probably stretching the word “workstation” a little further than I should. I’m talking about the machine I’m typing on right now. It’s my gaming PC, video editing machine, and the place where I sit when I work on blog posts. It feels like a reasonable word to use to convey the situation in which I’m running this immutable Linux gaming distribution.

Fake Pat with a 9070 XT

I created this image with state-of-the-art image-combining AI last year, but I used Flux Context to swap the Radeon 6700 XT in last year’s image for my new Radeon 9070 XT. This image made me giggle last year, so I knew I had to bring it up to date!

I first tried Bazzite on the Ryzen 6800H mini PC that we use for gaming on our living-room TV. I had a good experience, and that got me thinking that an immutable distro might be a good fit for me moving forward, so I installed Bazzite in desktop mode on my Ryzen 5700U laptop. Bazzite makes some difficult tasks easy, like getting OBS Studio with hardware encoding working, DaVinci Resolve Studio playing well with ROCm and OpenCL with a Radeon GPU, and keeping itself reasonably updated with cutting-edge gaming drivers and libraries. The productivity stuff that I use is simple to set up compared to those things that touch the hardware so deeply.

Things have been going well on my laptop, and I knew I would eventually move forward with loading Bazzite on my desktop PC, but I’ve been procrastinating. I decided to order a 16 GB Radeon 9070 XT yesterday, and my aging Ubuntu install just doesn’t have new enough Mesa libraries for an RDNA 4 GPU, so I had incentive to bite the bullet.

I’m honestly only around 24 hours into my fresh Bazzite installation as I’m writing this paragraph. My plan is for this blog post to be an actual log on the web of what I’m doing, how things are going, and the quirks I’m working around. I’ve been running Sawfish as my window manager for something like 20 years. I have code to arrange my windows into columns, and I sometimes tile terminal windows vertically in those columns. I rely on all sorts of weird muscle memory and custom scripts, and I’ve been building this memory for decades.

I’m popping back in from a few days in the future to write this paragraph. I’m realizing that this is almost turning into a list of all the little oddities that a long-time Linux user switching to an immutable distro might encounter along with my workarounds. I think this writeup is more valuable than I expected it would be, but the audience of people who will find value here might be extremely small!

The installation went smoothly

I store my Steam games on a volume backed by lvmcache that lives on my primary NVMe drive, and I store my recorded video footage on a volume with lvmcache on a separate and slower 1-terabyte SATA SSD. I had the NVMe split up with a little over 600 GB for boot, root, and home, and just shy of 300 GB for the lvmcache. My root volume was usually less than half full, so I decided to flip that around.

I haven’t set up the other volumes for Steam or video footage on the mechanical disk yet, and I haven’t configured the lvmcache. Everything I need to make it work is there. It just isn’t the priority yet. Configuration in /etc is not immutable, so I can add things to fstab and crypttab.
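
The lvmcache setup itself is only a couple of LVM commands once the physical volumes are in place. These aren’t my real volume names or sizes, but the recipe looks roughly like this:

lvcreate -L 800G -n steam vg_bulk /dev/sda1
lvcreate -L 550G -n steam_cache vg_bulk /dev/nvme0n1p4
lvconvert --type cache --cachevol steam_cache vg_bulk/steam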

NOTE: My lvmcache is set back up, my Steam games are stored on the slow hard drive behind the NVMe cache, and the relevant bits are in my fstab and crypttab, but they are both set to noauto. Today was just not the day that I wanted to potentially troubleshoot a volume failing to mount during boot!

Everything worked. I installed the game I’ve been playing most recently on Steam, and it was running as smoothly or more smoothly under Wayland than it was on X11. I set up a basic OBS profile, and I was able to record my 3440x1440 screen at full resolution using VAAPI hardware-encoded H.265 without a problem. Maybe I’ll be able to try hardware-encoded AV1 when the new GPU gets here tomorrow!

I couldn’t use Bazzite at my desk without Distrobox!

Bazzite is an immutable distro. Just about the only acceptable way to install software is via Flatpak, and almost everything available as a Flatpak package is a GUI application. There aren’t really any command line tools in the Flatpak world, and you don’t want to try to shoehorn dozens of packages into your base install. I wouldn’t want to work without things like Emacs and zsh, and I need Ruby and rbenv to publish my blog using my ancient Octopress setup.

I knew this migration was coming. I set up a Distrobox Debian installation on my desktop more than a month ago with the intention of configuring it to be the place where I live at the command line. I used it as an opportunity to upgrade from Emacs 26 to Emacs 30, and I got most of my important Emacs packages and configuration working in there. I even used Distrobox’s FAQ to learn how to export my image on my workstation and transfer a copy over to my laptop.

Almost everything that I need to get by every day is installed on that Debian image.

I had some concerns about this when I was setting up OpenCode in the Distrobox container last week. All my Distrobox images share my home directory with the host, and OpenCode spilled its installation all over my home directory. That includes the executable file. I expected this was going to make things a little ugly, because calling into the Distrobox container from the host might wind up getting circular.

It turns out it isn’t going to be a big deal, because my terminal is now set to open my Debian Distrobox session by default. I’ll never run OpenCode, or any of the handful of similarly installed programs, on the Bazzite host. There won’t be any development happening up there. Everything will be happening inside Distrobox.

I’m not at 100% of my usual operating capacity inside my Debian Distrobox, but I’m getting there. I’ve been using fasd for more than a decade, and it has been deprecated for a long time. I’ll have to look into replacing it with zoxide or fzf. Without fasd and autojump, my Go command no longer works, and I can tell you that I type g A LOT. Modernizing this is near the top of my list!

The majority of my productivity happens inside that Debian Distrobox, but I also have an Ubuntu 18.04 Distrobox just for Octopress and my blog. I’m not sure why I had to go back that far, because everything had been working on my much newer Ubuntu install last week, but going backward was easier than puzzling out the problem.

DaVinci Resolve Studio is working great

Another day has passed, and my Sapphire Pulse 9070 XT arrived this morning. I was holding off on running ujust install davinci-resolve because I wasn’t sure if it would install a different version of ROCm depending on the model of GPU I had installed.

I didn’t dig all that deep. I loaded a video. Playback worked. Simple edits worked.

I usually export a short video from an existing project when I upgrade Resolve to make sure the important bits are indeed working, but I don’t have any projects handy. This upgrade seemed like a good time to get rid of six years of podcasting projects that I haven’t touched in ages.

I will update this section if I run into any trouble exporting footage.

KZones is better AND worse than my custom window-management scripts

KZones is a lot like FancyZones on Windows. You define zones, you can drag windows into those zones, and KZones does the job of sizing the windows to precisely fit your grid. You can also configure keyboard shortcuts to move the current window into a particular zone.

I’m staring at the same sort of zones that I used to look at most of the time when using Sawfish. The screen is split into three even zones. Emacs is in the middle, a web browser is to my left, and a big OpenCode window is to my right. I’ve enjoyed my upgrade to a single ultrawide monitor because I get to have one wide window right in the middle.

Blog editing!

I have a fourth zone that occupies the same space as the center and right zones for those times when I need something bigger that isn’t quite full screen.

I set up another KZones layout where the center window is around 45% of the width. I haven’t used this much, but KZones has a shortcut key to cycle through layouts and another shortcut to snap all the windows into the new zones. I kind of wish that would happen automatically, but two keystrokes isn’t bad.

There’s one thing I miss about my old configuration. I had things set up so pressing a shortcut once would put the window in the expected zone, but pressing it again when the window is already in the zone would move it to a different zone. That meant I had one key that would put a window in the center zone AND the wider 2/3 zone. Not a huge problem, but my muscle memory keeps trying to do this.

Gaming is fantastic

I’ve only installed one game, and it was already running slightly better on the Radeon 6700 XT than it was on Ubuntu. The graphical settings didn’t change, but the game asked me whether it should run with Vulkan or DX11 when I fired it up, and I can no longer verify that I was definitely using Vulkan before!

I dropped in the Radeon 9070 XT this morning, and everything just worked. This is not a surprise, but I’m definitely happy that I received a functional piece of hardware, and that things are running smoothly.

I’m playing Ghost Recon: Breakpoint on the hardest difficulty with no AI teammates. I played the game on the same day both before and after installing Bazzite, and I really do feel like the game felt more responsive on the same hardware. I have no equipment here to accurately test latency, so I have no way to know whether or not it is just my imagination. I do wonder if I’m noticing the new Vulkan anti-lag feature. The release notes indicate that it is enabled by default, and I see it listed in vulkaninfo on my machine.

I am able to record 144 FPS at 3440x1440 gaming footage using gpu-screen-recorder without any trouble. I was warned by gpu-screen-recorder that AV1 and H.265 may be problematic. It was correct about AV1. My game had some stutters and the recording was mostly dropped frames, but the H.265 footage came out perfect and I couldn’t even tell that the replay buffer was active.

Thunderbird was being problematic

The problem is that I’m doing it wrong. I dropped my old .thunderbird directory into /home/pat/ and pointed the Flatpak installation of Thunderbird there. It would lose the location of my profile every time I rebooted, and sometimes it just didn’t want to open my profile unless I brought a fresh copy back over.

A Google search suggested that I’m supposed to drop my old .thunderbird into ~/.var/cache/org.mozilla.Thunderbird/cache/thunderbird. I almost did what they asked, but I didn’t like how deep that directory was, and the usual location is already part of my backup plan.

It seemed like it would be easier to just run apt install thunderbird in my Debian distrobox, so I did that! I ran distrobox-export -a /usr/share/applications/thunderbird.desktop in the Debian container to make the application available to KDE up on Bazzite, and I was good to go.

Was this the right way to fix this? Probably not, but I got to test exporting an application from a Distrobox container instead of just a binary. That was fun!

A quick test of llama.cpp in Distrobox

Another day has gone by, so I believe I’m on my third day with Bazzite. I already wrote the conclusion section, but I thought my quick test with llama.cpp was worth including. This also means I get to procrastinate a little longer on hyperlinks and image editing for this post!

It has been a long time since I messed around with llama.cpp or ROCm. I knew I wanted to set myself up for this stuff in a Distrobox machine, but I wasn’t sure where I should begin, and I had no idea which ROCm stuff would work with my new 9070 XT GPU. I found this how-to on setting up a ROCm Distrobox machine, and they also had a how-to on setting up llama.cpp for ROCm on the same site.

📦[pat@almalinux-rocm llama.cpp]$ ./build/bin/llama-bench -m models/Qwen_Qwen3-14B-Q6_K_L.gguf -d 7000 --cache-type-k q8_0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1201 (0x1201), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl | type_k |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | --------------: | -------------------: |
| qwen3 14B Q6_K                 |  11.63 GiB |    14.77 B | ROCm       |  99 |   q8_0 |   pp512 @ d7000 |        617.73 ± 5.10 |
| qwen3 14B Q6_K                 |  11.63 GiB |    14.77 B | ROCm       |  99 |   q8_0 |   tg128 @ d7000 |         29.31 ± 6.14 |

build: 134e6940c (7149)
📦[pat@almalinux-rocm llama.cpp]$

A friend in our Discord community recommended that I try Qwen 3 14B at Q6_K_L. He said it is a good model to fit into 16 GB of VRAM, though context would be a little tight. I managed to just barely squeeze 7,000 tokens of context at Q8 in my llama-bench run. I don’t know if there is anything I should do to optimize my settings, but I’m not unhappy with 600 tokens per second of prompt processing with my VRAM filled to the brim.

I did interact with the model, and it conversed with me just fine. When I gave it this entire blog post and asked for a summary, the model just kept repeating nonsense. I wound up swapping out Qwen3 14B Q6 for Qwen3 8B Q6. It was able to summarize this blog post just fine, and I was able to push the llama-bench up to 12,000 tokens of context.

I’m excited to have a working ROCm Distrobox image and a functional llama.cpp setup. That’s a good start, and it bodes well for future machine-learning shenanigans!

My zsh fix makes me feel dirty

I set Bazzite’s default terminal app to open new sessions inside my Debian Distrobox image. I have my shell inside that container set to /usr/bin/zsh. If I kill the container, my first session in the container fires up zsh just fine. Every subsequent connection winds up running bash.

# .bashrc
 
if [ -e /usr/bin/zsh ]; then
  exec /usr/bin/zsh
fi

I found a lot of suggestions on Google, but none of them worked. The only suggestion I didn’t try was deleting my Distrobox image, starting from scratch, and passing in a SHELL=/usr/bin/zsh during creation. I’m not doing that today.

I wound up with a massive kludge of a fix. Since Bazzite doesn’t ship with zsh, I added a check to my .bashrc that checks for the binary’s presence. If it’s there, it will switch to zsh. If not, it will just continue to use bash.

I guess the upside is that I’ll get a free upgrade if Bazzite decides to ship zsh.

Conclusion

This isn’t truly the conclusion. I still have plenty of rough edges to sand down, and I’ll absolutely be running into missing software or configuration for months. I’ve never set up a brand new machine that had everything ready to go in the first week. I always run into a rare task that I’m not quite prepared to solve at some point in the future! I should probably note here that I upgraded the same Ubuntu installation at my desk from 2009 until 2023. I wrote a five-week follow-up post covering how things have been going since this initial setup.

I’m in a good place. Games work. Emacs is upgraded, though in a similar state of configuration completeness as my Bazzite installation. If you’re reading this, then my blog published successfully. I can also play games and edit video. All my major tasks are covered, so I’m ready to move forward!

Are you using an immutable Linux distribution? How has your experience been? I’d love to hear about your setup, the challenges you’ve faced, and any workarounds you’ve discovered. If you’re interested in learning more about Bazzite, immutable Linux distributions, or just want to chat about gaming on Linux, feel free to join our Discord community! We’d love to have you as part of the conversation and help you on your own Linux journey.

Is The $6 Z.ai Coding Plan a No-Brainer?

I’m not going to make you wait until the end to learn the answer. I’m going to tell you what I think right in the first paragraph. I believe you should subscribe to the Z.ai Coding Lite plan even if you only write a minuscule amount of code every month. This is doubly true if you decide to pay for a quarter or a full year in advance at 50% off.

I’m only a week and seven million tokens deep into my 3-month subscription, but I’m that guy who only occasionally writes code. I avoided trying out Claude Code because I knew I would never get my money’s worth out of even a $20-per-month Claude Pro subscription. I also now know that I could have paid for a full year of Z.ai for less than the cost of two months of Claude Pro.

OpenCode with Z.ai

I saw a Hacker News comment suggesting that GLM-4.6 on Z.ai’s coding plan is about 80% as good as Claude Code. I don’t know how to quantify that, but OpenCode paired with GLM-4.6 has been doing a good job for me so far. Z.ai claims that you get triple the usage limits of a Claude Pro subscription, but what does that even mean in practice?

Let’s start with the concerns!

Z.ai is based in Beijing. Both ethics and laws are different in China than they are in the United States or Europe, especially when it comes to intellectual property.

I’m not making any judgments here. You can probably guess just how much concern I have based on the fact that I’m using the Z.ai Coding Plan while working on this blog post. I just think this is important to mention. Do you feel better or worse about sending all your context to OpenAI, Anthropic, or Z.ai?

Is Z.ai having performance problems with their subscription service?

This blog post has been up for a month, and I sure seem to be recommending a paid service here. A few people in our Discord community are using the service, so I asked how things are going for everyone there. I’ve also been keeping an eye out for people complaining in places like Reddit.

I had a weekend where I was getting connection errors every half dozen prompts or so, but the service was running at its usual speed. It was a pain having to hit the up arrow in OpenCode and send the same prompt again, but it didn’t really slow me down.

There are a few posts on Reddit complaining that the Z.ai GLM service has gotten so slow that it is unusable. The replies usually have a few people saying things are sometimes a little slower than usual.

Are these growing pains due to a large influx of new users snapping up the $2.40 per month rate? Will their capacity grow to meet their demand? Will things settle down on their own? Are things even all that bad? Will the increasing RAM and GPU prices make it hard for Z.ai to increase capacity? We don’t know, but these are the questions I’d ponder a bit before paying up front for a 12-month subscription.

I am only one anecdote, but everything is still completely usable for my limited needs. I am still happy that I am subscribed, and I would still risk $29 on a full year’s subscription to the Coding Lite plan to lock in at $2.40 per month for the first year.

Are the limits actually more than twice as generous as Claude Pro?

I assume that the statement is true. The base Claude Pro subscription limits you to 45 messages during each 5-hour window, while the Z.ai Coding Lite plan has a 120-message limit in the same window. That is nearly three times as many messages, but are these actually equivalent?

I haven’t managed to hit the limit on the Coding Lite plan. The fact that I haven’t hit the limit should be a good indicator of how light of a user I am!

I suspect that this is one of those situations where your mileage may vary. We know that Claude Opus is a more advanced model than GLM-4.6. Opus is more likely to get things right the first time, and Opus may need fewer iterations to reach the correct result than GLM-4.6.

I’d bet that they’re comparable most of the time, and you really do get nearly three times as much work out of Z.ai’s plan, but I would also assume there are times when you might eat through some extra prompts trying to zero in on the correct results. If you’re curious about how GLM-4.6 stacks up against other affordable options, I’ve since written a comparison of Devstral 2 with Vibe CLI vs. OpenCode with GLM-4.6 that looks at how these tools feel in practice for casual programmers.

I’m not sure that an accurate answer to this question matters, since Claude subscriptions cost three or six times as much.

What have I done with OpenCode and Z.ai?

My Li’l Magnum! gaming mouse project is written in OpenSCAD. I have a simple build script that should have been a Makefile, but instead it is a handful of for loops that run sequentially. This wasn’t a big deal early on, but now I am up to three variations of eight different mice. Running OpenSCAD 24 separate times is taking nearly four full minutes.

Instead of converting this to a Makefile, I decided to ask OpenCode to make my script parallel. OpenCode’s first idea was to build its own job manager in bash. I said, “No way! We should use xargs to handle the jobs!” GLM-4.6 agreed with me, and we were off to the races.
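If you are curious what the xargs pattern looks like, here is a minimal sketch of the idea rather than my actual build script. The render_one helper and the scad/ and stl/ layout are hypothetical; the interesting part is that -P runs several OpenSCAD processes at once while -n 1 hands each one a single file.

#!/usr/bin/env bash
# Minimal sketch: render every .scad file in parallel with xargs.
set -euo pipefail

render_one() {
  local scad="$1"
  local out="stl/$(basename "${scad%.scad}").stl"
  openscad -o "$out" "$scad"
}
export -f render_one

mkdir -p stl
# -P 8 runs up to eight renders at once; xargs manages the job queue.
printf '%s\0' scad/*.scad | xargs -0 -n 1 -P 8 bash -c 'render_one "$0"'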

OpenCode with Z.ai

I watched OpenCode set up the magic with xargs. I eventually asked it to combine its large number of functions into fewer functions by passing variables around. I had OpenCode add optional debugging statements so I could verify that the openscad commands looked like they should.

We ran into a bug at some point, and OpenCode started calling my build script to make sure the STL and 3MF files showed up where they belonged, but OpenCode didn’t know that my script only rebuilds files that have been modified since the last build. Once I told OpenCode that it needed to touch the *.scad files before testing, it was able to try and test lots of things on its own. This is probably a piece of information that belongs in this project’s agents.md file!

I had something I was happy with during my first session, but I wound up asking OpenCode for more changes the next day. We lost the xargs usage at some point, but I didn’t pay attention to when!

There is still a part that isn’t done in parallel, but it is kind of my own fault. I have one trio of similar mice that share a single OpenSCAD source file. I have some custom build code to set the correct variables to make that happen, and OpenCode left those separate just like I did.

I’m pleased with where things are. Building all the mice now takes less than 45 seconds.

You can wire Z.ai into almost anything that uses the OpenAI API, but the Z.ai coding plan is slow!

I almost immediately configured LobeChat and Emacs’s gptel package to connect to my Z.ai Coding Lite plan. I was just as immediately disappointed by how slow it is.
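Wiring the plan into these tools is mostly a matter of pointing them at an OpenAI-compatible endpoint with your API key. The sketch below is a generic chat-completions request; the base URL and model name are placeholders that you would fill in from Z.ai’s own documentation.

# Generic OpenAI-style request; ZAI_BASE_URL and the model name are placeholders.
curl -s "$ZAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $ZAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "glm-4.6", "messages": [{"role": "user", "content": "How are you doing?"}]}'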

Everything seems pretty zippy in OpenCode. Before subscribing, I was messing around with GLM-4.6 using the lightning-fast model hosted by Cerebras. I am sure Cerebras is faster while using OpenCode, but it isn’t obviously faster. OpenCode is sending up tens of thousands of tokens of context, and it is doing that over and over again between my interactions.

Things are different in Emacs and LobeChat. I wasn’t able to disable reasoning in LobeChat, so I wind up waiting 50 seconds for 1,000 tokens of reasoning even when I just ask it how it is doing. I assume the same reasoning is happening in Emacs when I highlight a paragraph and ask for it to be translated into Klingon.

I assume the Coding Plan is optimized for large contexts, so I wound up keeping Emacs and LobeChat pointed at my OpenRouter account. Each of these short interactive sessions only eats up the tiniest fraction of a penny, so I am not saving a measurable amount of money by using my free subscription tokens here.

OpenCode Stats

Six million input tokens would have cost at least $6 at OpenRouter, and I am only two weeks into my first month!

Tools like OpenCode, Claude Code, or Aider are where you have to make sure you’re using an unlimited subscription service. I can easily eat through two million tokens using OpenCode, and that could cost me anywhere from $1.50 to $10 on OpenRouter. It depends on which model I point it at!

I am using OpenCode with Z.ai Coding Lite right now!

I messed around with Aider a bit just before summer. It was neat, but I was hoping it could manage to help me with my blog posts. It seemed to have no idea what to do with English words.

How well OpenCode worked with my Markdown blog posts using Cerebras’s GLM-4.6 was probably the thing that pushed me over the edge and made me try a Z.ai subscription. I can ask OpenCode to check my grammar, and it will apply its fixes as I am working. I can ask it to add links to or from older blog posts, and it will do it in my usual style.

OpenCode with Z.ai

I can ask OpenCode if I am making sense, and I can ask it to write a conclusion section for me. I already do some of these things either from Emacs or via a chat interface, but I have always had to do them very manually, and I would have to paste in the LLM’s first pass at a conclusion.

I could never burn through $3 in OpenRouter tokens in a month using chat interfaces—I probably couldn’t do it in a year even if I tried! Even so, OpenCode is saving me time, and I will use it for writing blog posts several times each month. That is worth the price of my Z.ai Coding Lite subscription.

Do you need the Z.ai Coding Pro or Coding Max plan?

If you do, then you probably shouldn’t be reading this blog! I am such a light user, and I suspect my advice will apply much better to more casual users of LLM coding agents.

That said, the more expensive plans look like a great value if you are indeed running into limits all the time. The Coding Pro plan costs five times more, and you get five times the usage limit. You also get priority access that Z.ai says is 40% faster, and you get upgraded to image and video inputs. The Coding Max plan seems like an even better value, because it only costs twice as much again, but it has four times the usage.

Z.ai has built a pricing ladder that manages to include some actual value for your money. Even so, the best deal is to pay only for what you ACTUALLY NEED!

I would also expect that if you’re doing the sort of work that has you regularly hitting the limits of Z.ai’s Coding Lite plan, then you might also be doing the sort of work that would benefit from the better models available with a Claude Pro or Claude Max subscription. I have this expectation because I assume you are getting paid to produce code, and even a small productivity boost could easily be worth an extra $200 a month.

Conclusion

The Z.ai Coding Lite plan offers exceptional value for casual coders and writers like myself. At just $6 per month (or $3/month with the current promotional discount), you get access to an extremely capable AI coding assistant. While it may not match Claude’s raw power, it is more than useful enough to justify its price, even if you only use it a few times a month.

The integration with OpenCode, which is ridiculously easy to set up, creates a seamless workflow that is easily worth $6 per month, and the generous usage limits mean I am unlikely to worry about hitting caps. For light users, hobbyists, or anyone looking to dip their toes into AI-assisted coding without breaking the bank, Z.ai’s Coding Lite plan is genuinely a no-brainer. If you use my link, I believe you will get 10% off your first payment, and I will receive the same in future credits. Don’t feel obligated to use my link, but I think it is a good deal for both of us!

Want to join the conversation about AI coding tools, share your own experiences, or get help with your setup? Come hang out with us in our Discord community where we discuss all things AI, coding, and technology!

The Li’l Magnum! Ultralight Fingertip Gaming Mouse 2.0 Is Almost Here!

What does it take to upgrade a 3D-printed mouse mod from version 1.0 to 2.0? With software, you usually increment the major number when you’re making a change that makes the program incompatible with the old version in some major way.

Li'l Magnum! mice in different colors

I have been experimenting with some rainbow color-changing filaments. Getting a nice color change is a challenge when the shell only weighs three grams!

There are a lot of minor changes to the Li’l Magnum! in version 2.0, but I also made significant changes to the button paddles. The thinning of the paddles might not technically qualify as a compatibility-breaking change, but a few of the mice had to have their button offset lowered by one layer to regain solid pre-engagement.

What has changed since version 1.0?

Let’s start with a list of what’s new!

  • Much lower default click force
    • Configurable from 20 grams to 40 grams
  • Modeled-in supports for the grips
  • No slicer-generated supports required when using modeled-in supports
    • Better overhang angles on all grip arms
  • OpenSCAD-generated sub-parts
    • Exactly two layers of PETG support for multimaterial
      • Larger build plate contact surfaces on most built-in supports
    • Separate button parts to apply extra top layers

I believe we are just at a point where the Li’l Magnum! is a better mouse overall. Most of the models are slightly lighter. All the models feel a little more solid. While the button paddles have more flex, I expect they will be even more durable.

I love having configurable button pressure!

I took a few Li’l Magnum! mice with me to display at our booth at Texas Linux Fest last month. I wasn’t sure what to expect. This isn’t a gaming crowd, but I did expect to run into a lot of tech enthusiasts. More than a few people assumed that the Li’l Magnum! must have a motor so it can run around on the floor like a mouse.

I was extremely excited when I ran into one actual gamer who plays first-person shooters, and he immediately knew what the Li’l Magnum! was for. Not only does he play shooters, he has four or five times as many hours as I do in Team Fortress 2. I was so excited that I ended up sending him home with my VXE R1 Pro Li’l Magnum!.

His first piece of feedback was about how stiff I made the buttons, and he is right. I purposely configured it for a short press travel while ensuring I wouldn’t accidentally click when I didn’t intend to.

OpenSCAD view of the configurator for Li'l Magnum button force

I ended up thinning out the paddle between the plunger and the front of the mouse. I printed dozens of test mice. I worked hard to get that overhang in the flexible spot to print reasonably clean. I also set up the customizer so that you can choose your own click force separately for each mouse button. That means you can make it easier to shoot while also making it harder to accidentally set off your stickybomb trap with a stray right click.

Are the click-force settings really as precise as the customizer says? Definitely not. Reliably measuring 18 grams of force with the mouse on a scale is hard. Every spool of PLA varies slightly. If your printer prints the overhangs more poorly, your force will be even lighter. The actual click force will also be influenced by the stiffness of your mouse’s microswitches.

Think of the force measurement in the customizer as a guideline.
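If you would rather script your renders than click through the customizer, OpenSCAD’s -D flag can override the same parameters from the command line. The variable and file names below are made up for illustration; the real names live in the OpenSCAD source.

# Hypothetical parameter and file names; check the source for the real ones.
openscad -o lil_magnum_light_left_click.stl \
  -D 'left_click_force=22' \
  -D 'right_click_force=35' \
  lil_magnum.scad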

How much force does it take to hit the buttons?

It is challenging to accurately measure the click of a button with a scale, but I did my best. I think I have a good way of explaining the click feel by comparing things to my Logitech G305, because the click force of a normal mouse like the G305 gets lower when you click closer to the front of the mouse. You have more leverage out there!

The old version of the Li’l Magnum! was pretty stiff. It was like clicking the G305 just behind the mouse wheel. This is where someone with an extreme claw grip might be clicking their G305-sized mouse.

The default clicks for version 2.0 are quite light. Clicking my own Corsair Li’l Magnum! feels like clicking the G305 out near the front tip of the mouse. Adjusting the customizer upward by two or three notches would make my clicks feel similar to clicking the G305 near the center of the wheel.

Upgraded grips

I am extremely pleased with the modeled-in supports for the grips. The supports connect to the grip with tiny 0.4-mm diameter nubbins. The supports break off easily, and the nubbins can be knocked off with your thumbnail or a metal tool. Please don’t use anything sharp!

In order for this setup to work, I had to chamfer the bottom of the grips to bring things to a point for the nubbins to connect to. I had no idea how much softer and more pleasant that chamfer would make the grips feel. I don’t notice it on the finger side, but the thumb grip feels nicer.

OpenSCAD view of the Li'l Magnum V2

The new supports for the grips break off easily, and a quick scrape with a metal tool leaves the underside of the grip soft and smooth!

We can blame this on the Corsair Sabre V2 Pro and Dareu A950. I made sure to line up the arms on every other mouse with the bottom of their grips. That means that the bottoms of the grips were always printed as bridges. I had to put one of the Corsair’s arms a little higher, requiring me to print the grip on tree supports, which I didn’t like.

Now that the base of the grips is always supported, I don’t have that limitation. I moved almost every arm upwards by at least one millimeter. You can’t always feel the difference, but in theory this should make every pair of grips just a little more rigid.

No slicer supports needed!

If you can’t print your Li’l Magnum! with multimaterial supports, you will still need to enable tree supports in your slicer. If you are using multimaterial supports, there is nothing left on any of the Li’l Magnum! models that needs to be supported.

Dialing in the Li'l Magnum! button overhangs

The red mouse on the left has the original button angle, while the mouse on the right is slightly steeper. This drastically improves the quality of the unsupported overhang, and it helps achieve just the right feel for the clicks!

The connectors that join the paddles to the grips are entirely bridges and reasonable overhangs. The connector across the front is a bridge. Everything should print fine on a modern printer.

The Dareu A950 Wing and Corsair Sabre V2 Pro are now the ultimate Li’l Magnum! donor mice

I bought a Corsair Sabre V2 Pro the same day they showed up on Amazon for $99. It is a fine mouse even without modding. It looked like it had extremely light internals, and I was pleased to learn that this was indeed correct. I’ve been gaming with it ever since it arrived, and most of my Li’l Magnum! builds with the Corsair have weighed 15.2 to 15.4 grams. I even have one test print that came in at 14.92 grams!

We have confirmation from at least two people that the $52 Dareu A950 Wing fits perfectly in the Li’l Magnum! shell. The PCB is nearly identical to the Corsair, because Corsair seems to be putting their branding on Dareu’s existing mouse.

There are some differences. They use different software to configure the mice. The Dareu uses a 30,000-DPI PAW3950 sensor, while the Corsair uses a Corsair-branded 33,000-DPI sensor.

Li'l Magnum subobjects

Subobjects are labeled in your slicer, and the labels include basic print-setting reminders

The list price for the Dareu on Amazon is $20 lower than the Corsair at $80. The Dareu regularly goes on sale for around $60 and has gone on sale for as little as $52.

These prices make it hard to recommend any other mice for your Li’l Magnum! build. If you are really on a budget, the VXE R1 SE is still the lowest price. Unfortunately, they only sell the R1 SE with a massive 500-mAh battery, so your Li’l Magnum! build will come in at over 25 grams.

If you are in the United States, then you’re going to pay $36 for a 25-gram Li’l Magnum!. You could spend $20 to $30 more on the Dareu and get a better sensor and the absolute lightest possible Li’l Magnum! build. You can probably still get an R1 SE for under $20 outside of America, so the math might be different for everyone else.

The price gap between the cheapest donor mouse and the most impressive donor mouse has gotten so small that the mice in between the R1 SE and the Dareu A950 Wing are mostly pointless. If you already have a VXE Mad R or a VXE R1 Pro, then I think you should print a Li’l Magnum! shell. You already have a great donor mouse.

Now there are only two mice worth buying: the cheapest VXE R1 you can find or the Dareu A950.

You don’t need to shave off every possible gram

One of Optimum’s Zeromouse builds was down around 17 grams, but every iteration since then has gotten heavier. I think there is a reason for this.

I notice that my 25-gram Li’l Magnum! is heavier than the rest. I can swap out its battery to bring it down to 21 grams. I can assure you that it’s difficult to notice the difference between a 15-, 17- and 21-gram Li’l Magnum!.

You can probably pick up on it when you’re really paying attention. You’ll notice it when you lift the mouse to recenter your aim. You probably won’t notice a difference while aiming. I think it is more important for me to have a fingertip mouse than it is for me to have a 15-gram mouse.

Chasing numbers and specs can be fun. I don’t want to stop you from having fun finding lighter and lighter mice. It might even be an inexpensive hobby for you.

One of the reasons I designed the Li’l Magnum! is so that you don’t have to spend $180 to find out whether or not you like ultralight fingertip mice. You shouldn’t feel like you’re missing out if you can only afford the cheapest Li’l Magnum! donor mouse.

What makes the Li’l Magnum! special?

The Li’l Magnum! is an open-source project. You can download and modify the OpenSCAD source code. It will still be here even if I’m gone.

The Li’l Magnum! is parametric. All the surfaces that you touch while gaming are adjustable in the customizer on MakerWorld. Does your thumb sit farther back? You can move the grip. Do you need a stiffer right click? Do you want an angle on one of the grips? You can easily make it happen.

I am also aiming directly at consumer 3D printers and PLA plastic. There are other printing processes that are great for printing skeletal mouse mods, and there are other materials that could be a bit more suitable for the Li’l Magnum!.

I tried PETG early on. It is a much more appropriate material for the buttons to have flex, but that extra flex of PETG also means that the buttons want to pivot, and the side grips wind up being really soft. I would have to add material and weight to the mouse to switch to PETG, and fewer people are able to print PETG at home. I figured it was best to focus on the easier material to print.

The Li’l Magnum! supports eight different donor mice so far, and it is relatively easy to add support for new mice. The important pieces that come in contact with a new mouse are mostly parametric. Most of the work is figuring out where the screw holes and microswitches are located on the new mouse PCB.

The Li’l Magnum! isn’t just my project. It is our project!

I’d rather you print your own, but you can buy a shell from my Tindie store

I run all my Li’l Magnum! prints on my Bambu A1 Mini. I use the AMS Lite to print multimaterial supports, but you can print a perfectly good Li’l Magnum! without the AMS. You’ll just need to file the bottom of the plungers a bit. You can spend $250 on a printer, and you can print a Li’l Magnum! for you and all your friends. I can assure you that you’ll find other fun uses for your printer.

I charge about $20 for a Li’l Magnum! print in my Tindie store. Your friend with a 3D printer can print one for you for free. You can for sure find 3D-printing services that will print the STL for less.

Why should you pay a little extra for a Li’l Magnum! from my store? I think the biggest reason is that I have the print settings for a Li’l Magnum! optimized to give you the right balance between rigidity and weight. The default print settings will give you a shell that weighs around three grams more than my own settings. The settings aren’t a secret.

I also guarantee that my prints fit the mice they are supposed to fit. If you own a Dareu A950 Wing, and I send you a Dareu A950 Wing Li’l Magnum! shell, then you are going to be able to make it work. Sometimes the manufacturer changes the PCB. We have already seen this happen with the MCHOSE L7. I will either work with you to adjust the model, or I will give you a refund.

I am not here to make a living selling mice. I’ll be happy enough if the Tindie sales earn enough money to keep buying more donor mice to keep the project moving forward.

Wrapping up

That’s the Li’l Magnum! 2.0. We’ve tweaked the button feel, made the grips more pleasant, and optimized the print settings to make the whole process smoother from your slicer to your desk. This is less about a giant leap and more about numerous small refinements that add up to a much nicer experience.

But here’s the real secret: this project has never been just about me or my ideas. It’s been shaped by every piece of feedback. Sometimes feedback is about the feel of the mouse. Sometimes the feedback is about a slightly different mouse model fitting just fine. This thing is a collective effort, and that’s what makes it so special.

The best part of all this isn’t the grams we’ve shaved off; it’s the community that we are building up around a shared interest in tinkering and making gaming gear truly our own.

Let’s keep building together!

I genuinely believe the coolest ideas for the Li’l Magnum! are still out there, waiting to be discovered by someone in our community. Maybe that’s you!

I’d love to see you join our friendly Discord community. It’s the central hub where we all hang out, share prints, troubleshoot builds, and brainstorm what’s next.

Whether you’ve just printed your first shell, you’re an old hand at modding mice, or you’re just curious and have questions, you are welcome. Let’s see what we can build together.

What are your thoughts on the new version? What donor mouse are you planning to use? Do you have a donor mouse in mind that I haven’t thought of yet? Come tell us about it on Discord!

Contemplating Local LLMs vs. OpenRouter and Trying Out Z.ai With GLM-4.6 and OpenCode

My feelings about local large-language models (LLMs) waffle back and forth every few months. New smaller models come out that perform reasonably well, in both speed and output quality, on inexpensive hardware. Then new massive LLMs arrive two months later that blow everything out of the water, but you would need hundreds of thousands of dollars in equipment to run them.

Everything depends on your use case. The tiny Intel N100 mini PC could manage to run a 1B model to act as a simple voice assistant, but that isn’t going to be a useful coding model to put behind Claude Code, Aider, or OpenCode.

OpenCode for Blogging

Most of what I ask of an LLM is somewhere in the middle. The models that fit on my aging 12-gigabyte gaming GPU were already more than capable of helping me write blog posts two years ago, and even smaller models can do a more than acceptable job today. I don’t need to use DeepSeek’s 671-billion parameter model for blogging, because it is only marginally better than Qwen Next 30B A3B. If you are coding, this is a different story.

I believe I should tell you that I started writing this blog post specifically because I subscribed to Z.ai’s lite coding plan. Yes, that is my referral link. I believe that you get a discount when you use my link, and I receive some small percentage of your first payment in credits.

Z.ai is offering 50% off your first payment, so you can get half price on up to one full year of your subscription. It works out to $3 per month. I aimed for the middle and bought three months for $9. I will talk in more detail about this closer to the end of this blog post!

Why would you want to run an LLM locally?!

I would say that the most important reason is privacy. Your information might be valuable or confidential. You might not be legally allowed to send your customers’ data to a third party. If that is the case, then spending $250,000 on hardware to run a powerful LLM for your company might be a better value than paying OpenAI for a subscription for twenty employees.

Reliability might be another good reason. I could use a tiny model to interact with Home Assistant, and I don’t want to have trouble turning the heat on or the lights off when my terrible Internet connection decides to go down.

Price could be a good reason, especially if you’re a technical person. You can definitely fit a reasonable quantized version of Qwen 30B A3B on a used $300 32 GB Radeon Instinct Mi50 GPU, and it will run at a good pace. This doesn’t compete directly with Claude Code in quality or performance, but Qwen Coder 30B A3B can be used for the same purposes. Yes, it is like the difference between using a van instead of a Miata when moving to a new apartment, but it is also a $300 total investment vs. paying $17 per month. The local LLM in this case would start to be free before the end of the second year.

Local LLM performance AND available hardware are both bummers!

You certainly have to use a language model that is smart enough to handle the work you are doing. You just can’t get around that, but I believe the next most important factor is performance.

People are excited about the $2,000 Ryzen AI Max+ 395 mini PCs with 128 gigabytes of fast LPDDR5 RAM. There are a lot of Mac Mini models that are reasonably priced with similar or even better specs. They are excited because you can fit a 70B model in there with a ton of context, but a 70B model runs abysmally slowly on these machines, with prompt-processing speeds as low as 100 tokens per second and token-generation speeds below 10 tokens per second.

While these mini PCs with relatively fast RAM can fit large models, they really only have enough memory bandwidth to run models like Qwen 30B A3B at reasonable speeds. The benchmarks say the Ryzen 395 can reach 600 tokens per second of prompt processing speed and generate tokens at better than 60 tokens per second.

I send 3,000 tokens of context to the LLM when I work on blog posts. Waiting 30 seconds for the chat to start working on a conclusion section for a blog post isn’t too bad, and it will only take it another minute or two to generate that conclusion. I am used to my OpenRouter interactions of this nature being fully completed in ten seconds, but this wouldn’t be the worst thing to wait for.

My OpenCode sessions often send 50,000 tokens of context to the LLM, and it will do this several times on its own after only one prompt from me. I cannot imagine waiting ten minutes, or potentially multiples of ten minutes, to start giving me back useful work on my code or blog post.

Waiting ten minutes for a 70B model would stink, while waiting one minute for Qwen 30B A3B would feel quite acceptable to me.
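The arithmetic behind those wait times is simple enough to sanity check. This little loop just divides the context sizes I mentioned by the prompt-processing speeds in question.

# tokens of context / prompt-processing speed = seconds before the first reply token
for entry in "blog-post:3000:100" "70B-opencode:50000:100" "30B-opencode:50000:600"; do
  IFS=: read -r name tokens speed <<< "$entry"
  echo "$name: $tokens / $speed = $(( tokens / speed )) seconds"
done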

On the other end of the local-LLM spectrum are dedicated GPUs. You can spend the same $2,000 on an Nvidia 5090 GPU, but that assumes you already have a computer to install it in. The RTX 5090 should run Qwen 30B A3B at a reasonable quantization with prompt-processing speeds at least five times faster than a Ryzen Max+ 395.

I have a friend in our Discord community who is running Qwen 30B A3B on a Radeon Instinct Mi60 with 32 GB of VRAM. These go for around $500 used on eBay. The older Radeon Instinct Mi50 cards with 32 GB of VRAM used to go for around half that, but their prices have been inching up. There are benchmarks of the Mi50 on Reddit showing Qwen 30B A3B hitting prompt-processing speeds of over 400 tokens per second while generating at 40 tokens per second. That’s not bad for $500!

There just isn’t one good answer. This is all apples, oranges, and bananas here. You can either run big models slowly or mid-size models quickly for $2,000, or you could run mid-size models at a reasonable speed for $500. You would need to figure out which models can meet your needs.

Local LLMs might be fantastic if you can fit within the constraints!

I recently upgraded my computer with a 16 GB Radeon 9070 XT. I upgraded to Bazzite at the same time, and set up Distrobox containers to keep a few things separated. One of those Distrobox containers is a ROCm setup for mucking about with large language models.

I already know that my minimum viable OpenCode model is likely to be Qwen Coder 30B A3B at Q8. That’s around 30 GB of VRAM without context, and OpenCode needs at least 16,000 tokens of context. The only way I am running a model that size would be at a medium pace on a $2,100 Ryzen AI Max 395 mini PC.

I have managed to puzzle out an important nugget of useful information. I can fit Gemma 3 4B at Q6 with its vision model and 4,000 tokens of context in just under 8 gigabytes of VRAM. I can push that up around 16,000 tokens of context if I run the context at Q8 and still fit in around 8 gigabytes of VRAM.

Gemma 3 4B Multimodal running locally on my 9070 XT

I think this is neat. I have been saying in our Discord community that it would be nice to have Gemma 3 4B running locally, and I’ve been betting that it would fit on a used $100 8 GB Radeon RX580. It’d be a tight fit, but I could drop the max context a little and bring the context quantization down to Q8 if I had to.
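Here is a rough sketch of the llama-server invocation I have in mind for that setup. The file names are placeholders, -c sets the context window, and --cache-type-k q8_0 is the knob that quantizes the context cache; quantizing the V cache as well is possible, but it requires flash attention.

# Placeholder model and vision-projector paths; adjust to wherever your GGUFs live.
llama-server \
  -m ~/models/gemma-3-4b-it-Q6_K.gguf \
  --mmproj ~/models/gemma-3-4b-it-mmproj-f16.gguf \
  -ngl 99 \
  -c 16384 \
  --cache-type-k q8_0 \
  --host 127.0.0.1 --port 8080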

A lot of us in the homelab community are likely to have a spare PCIe slot in one of our servers. Spending less than $100 to add an always-on LLM with a decent multimodal model with image recognition capabilities might be awesome. You could ship surveillance camera images to it. You could forward it photos of your receipts. You could tie it into your Home Assistant voice integration.

Having a reasonably capable model that doesn’t fail when your Internet connection drops out might be nice. Sure, it isn’t going to fix your OpenSCAD project’s build script, but it can still do really useful things!

You can try most local models using OpenRouter.ai

I am a huge fan of OpenRouter. I put $10 into my account last year, and I still have $9 in credits remaining. I have been messing around with all sorts of models from Gemma 2B to DeepSeek 671B and everything in between. Every time I have the urge to investigate buying a GPU to install in my homelab, I head straight over to OpenRouter to see if the models I want to run could actually solve the problems that I am hoping to solve!

I used OpenRouter this week to learn that Qwen 30B A3B is indeed a viable LLM for coding with things like Aider, OpenCode, and the Claude Code client. That gives me some confidence that it could actually be worthwhile to invest some of my time and money into getting a Radeon Mi60 up and running.

The only trouble is that the Qwen 30B that I tested in the cloud isn’t as heavily quantized as what I could run at home. I would need to run Qwen 30B at Q4_K_M, and the results will be degraded at that level of quantization. That degradation may be enough to push the model beyond the point where it is even usable.

Testing the small models at OpenRouter helps you zero in on how much hardware you would need to get the job done, but it most definitely isn’t a perfect test!

Tools like OpenCode rip through tokens!

Listen. I am not a software developer. I can write code. I occasionally program my way out of problems. I write little tools to make my life easier. I do not write code eight hours a day, and I certainly don’t write code every single day.

I have found a few excuses to try the open-source alternatives to Claude Code, like Aider and OpenCode. They eat tokens SO FAST.

OpenCode burns through tokens

Don’t trust the cost! Some of those 3.2 million tokens over the two-day period were using various paid models on OpenRouter, while more than half were free via my Z.ai coding plan

It took me 11 months to burn through 80 cents of my $10 in OpenRouter credits. Chatting interactively to help me spice up my blog posts only uses fractions of a penny. One session with OpenCode consumed 18 cents in OpenRouter credits, and I only asked it to make one change to six files. I repeated that with two other models, and I used up as much money in tokens in an hour as I did in the previous 11 months.

This is why subscriptions to things like Google AI, Claude Code, or Z.ai with usage limits and throttling make a lot of sense for coding.

Blogging with OpenCode

This week is the first time I have had any success using one of the LLM coding tools with blog posts. I tried a few months ago with Aider, and I had limited success. It didn’t do a good job checking grammar or spelling, it didn’t do a good job rewording things, and it did an even worse job applying the changes for me.

OpenCode paired with both big and small LLMs has been doing a fantastic job. It can find grammar errors and apply the fixes for me. I can ask OpenCode to write paragraphs. I can ask it to rephrase things.

OpenCode for Blogging

I don’t feel like my blog is turning into AI slop. I don’t use sizable sections of words that the robots feed to me. I ask it to check my work. I sometimes ask it to rewrite entire sections, or sometimes the entire post, and I sometimes find some interesting phrasing in the robot’s work that I will integrate into my own.

I almost always ask the LLM to write my conclusion sections for me. I never use the entire conclusion, but I do use it as a springboard to get me going. The artificial mind in there often says cheerleading things about what I have worked on. These are statements I would never write on my own, but I usually leave at least one of them in my final conclusion. It feels less self-aggrandizing when I didn’t actually write the words myself.

Trying out Z.ai’s coding plan subscription

A handful of things came together around the same day to encourage me to write this blog post. I decided to try OpenCode, it worked well on my OpenSCAD mouse project and my blog, and I learned about Z.ai’s $3-per-month discount. I figured out that it would be easy to spend $1 per week in OpenRouter credits when using OpenCode, and I also assumed that I could plumb my Z.ai account into other places where I was already using OpenRouter.

Z.ai’s Lite plan using GLM-4.6 is not fast. Before subscribing, I was using OpenCode with Cerebras’s 1,000-token-per-second implementation of GLM-4.6 via OpenRouter, where I was only seeing 200 to 400 tokens per second, but that is still way better than the 20 to 30 tokens per second that I am seeing on my Z.ai subscription. They do say that the Coding Pro plan is 60% faster, but I have not tested this.

Z.ai Performance In LobeChat

These are the stats from one interaction with GLM-4.6 on my Z.ai subscription using LobeChat

I wound up plumbing my Z.ai subscription into my local LobeChat instance and Emacs. The latency here is noticeably worse than when I connect to large models on OpenRouter. My gptel interface in Emacs takes more than a dozen seconds to replace a paragraph, whereas DeepSeek V3.2 appears to respond almost instantly.

It isn’t awful, but it isn’t amazing. I would be excited if I could use just one LLM subscription for all my needs, but my LobeChat and Emacs prompts each burn an infinitesimally small fraction of a penny. I won’t be upset if I have to keep a few dollars in my OpenRouter account!

I was concerned that I might be violating the conditions of my subscription when connecting LobeChat and Emacs to my account. Some of the verbiage in the documentation made me think this wouldn’t be OK, but Z.ai has documentation for connecting to other tools.

OpenCode performance is way more complicated. I am not noticing a difference in my limited testing. This may be due to GLM-4.6 being a better coding agent, so I might be using fewer tokens and fewer round trips for OpenCode to get to my answers. I’ve since written a detailed comparison of Devstral 2 with Vibe CLI vs. OpenCode with GLM-4.6 that looks at how these tools feel in practice for casual programmers.

I have only been using my Z.ai subscription for two days. I expect to write a more thorough post with my thoughts after I have had enough time to mess around with things.

Conclusion

Where does all this leave us? After spending so much time digging into both local LLM setups and cloud services, I firmly believe that there isn’t one right answer for everyone.

For my own use case, I might eventually land on a hybrid setup with both a local setup in my homelab and a cloud subscription for the heavy lifting. For now, I’ll keep using OpenRouter for short, fast prompts and testing new models. The inexpensive Z.ai subscription, while a little slower, will do a fantastic job of keeping me from accidentally spending $50 on tokens for OpenCode in a week—that $6 per month ceiling will be nice to have!

The most important thing I learned is that you should test before you buy. OpenRouter has saved me from making at least two expensive hardware purchases by letting me try models first. For anyone else trying to figure out their own LLM setup, I’d recommend the same approach.

If you’re working through these same decisions about hardware, models, or services, I’d love to hear what you’re finding. Come join our Discord community where we’re all sharing what works (and what doesn’t) with our different LLM setups. There are people there running everything from tiny local models, to on-site rigs costing a couple thousand dollars, to running everything in the cloud, and it’s been incredibly helpful to see what others are actually using in the real world.

The LLM landscape changes so fast that what’s true today might be outdated in three months. Having a community to bounce ideas off of makes it much easier to navigate without wasting money on hardware that won’t meet your needs.

Do Refurbished Hard Disks Make Sense For Your Home NAS Server?

This seems like a question that could be easily answered with math, but there is a big problem. This question has a lot in common with the Drake equation, because there are so many important numbers that would need to go into the equation, but we just don’t have the data to plug into those variables.

What was the life of these refurbished hard drives like? Did they get tossed around in shipping? Did they live in a properly cooled datacenter, or were they overheating for five years? Is the reseller being truthful?

Juggling hard drives

We are going to do some simple math in this blog, but we are also going to be leaning at least slightly in the direction of vibes, because I am going to explain to you when I FEEL comfortable using refurbished hard disks.

Refurbished prices and trusted vendors

The people in our Discord community have been buying what feels like a substantial number of refurbished drives from Server Part Deals and GoHardDrive.com. They both tend to have good prices, especially when there is a big sale. They both often offer 2-, 3-, or sometimes 5-year warranties, and friends in our Discord community have had no trouble exercising those warranties.

We have seen 12-terabyte SATA hard disks for $112, or around $9 per terabyte. We have seen 16-terabyte SATA disks for $180, or about $11 per terabyte. This is a pretty heavy discount, because a good sale price for a brand-new 16-terabyte SATA drive is around $250, which works out to near $16 per terabyte.
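If you want to run the same price-per-terabyte math on whatever listing you are looking at, it only takes a few lines of shell.

# Price per terabyte for the deals above; swap in your own numbers.
for deal in "refurb-12TB:112:12" "refurb-16TB:180:16" "new-16TB:250:16"; do
  IFS=: read -r name price tb <<< "$deal"
  printf '%-12s $%s / %s TB = $%.2f per TB\n' "$name" "$price" "$tb" "$(bc -l <<< "$price/$tb")"
done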

Things to remember when choosing the size of your drives

Smaller disks have been available to buy for more years than larger disks. That means your refurbished 12-terabyte drives COULD BE three years older than the oldest refurbished 16-terabyte drives. This is probably one of the reasons why smaller drives tend to be offered at a better price per terabyte.

I believe warranties are important. The statistics that Backblaze publish have always told us that annual failure rates tend to double at somewhere around five years of age. That isn’t a massive jump these days, because you’re only moving from around a 2% failure rate to 4%, but it is relevant.

All the hard disks in my network cupboard

A good warranty isn’t just your safety net; it’s a vote of confidence from the reseller. I feel better about the product when the vendor backs it up with a warranty of three or five years. When that 12-terabyte hard drive only has a one-year warranty, it makes me wonder what they know about the service life of that drive that they aren’t telling me!

I wouldn’t personally buy any hard drives smaller than 12 terabytes.

Plan for failures!

Maybe your plan was to build a 6-drive RAID 5 or RAID-Z1 using 8-terabyte hard drives to net yourself around 40 terabytes of usable storage. A quick Amazon search tells me that new 8-terabyte hard drives cost $200 each, so you would be spending $1,200 on storage.

What if we bought six 12-terabyte refurbished drives during the sale three months ago? These drives came with a 5-year warranty and cost $112 each. We could spend $672 on six drives, put them in a RAID 6 or RAID-Z2 array, and have around 48 terabytes of usable storage.

That is 20% more storage and an entire disk of extra redundancy for barely more than half the price. We have hedged our bets a little, bought a little extra room to grow, and even saved enough money to buy a cold spare to keep on hand.
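For anyone sketching out that six-drive array with ZFS, the RAID-Z2 layout is a single zpool command. The device names below are placeholders; on a real build you would want the stable /dev/disk/by-id paths.

# Placeholder device names; use /dev/disk/by-id paths on a real build.
zpool create tank raidz2 \
  /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg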

You NEED a good backup strategy!

Let’s start from the other end. When can you get away without having backups? When you are collecting movies and TV shows from the high seas. What happens if you accidentally dump your NAS server in the bathtub and lose every episode of Knight Rider that you downloaded? You just download them again next week. No big loss.

What about if the only copy of the pictures of your late grandmother are stored on that server? You’re not going to be taking those photos again.

Whether you are using brand-new hard disks or refurbished drives, RAID is not a backup. It won’t protect you if there’s a bug in Immich that wipes out your photos. It won’t help you if ransomware encrypts all your files. It won’t help you if your SATA controller or driver goes bananas and corrupts every single drive. It won’t help if lightning takes out the entire server.

It is a good thing you’re saving money buying refurbished drives with big warranties. You can use some of that money you saved to build an off-site backup server.

The more redundancy that you have, and the more separate backup copies that you have, the less the quality of your hard drives will matter.

When would I use a fresh hard drive?

I might be in a somewhat unique position. My data is too big to fit on an NVMe drive and too expensive to store in the cloud, but it is still easily small enough to squeeze onto a mid-size mechanical hard drive. This is exciting to me, because you can buy an Intel N150 mini PC for about the same price as one of those hard drives. That means I can just attach a mini PC to every hard drive I buy, and I can always inexpensively add one more online backup to my setup.

That means that any remote hard drives that I have should be as durable as is practical. I don’t want to have to drive two hours to replace a hard drive when it inevitably fails, so I probably shouldn’t use a hard drive that already has five years of mileage on it. It is probably worth an extra $100 to reduce my odds of a remote failure in that case.

My remote backup isn’t that far away, and Brian joins us here for pizza night almost every weekend. If my hard drive dies on a Tuesday, I can have a replacement at my door by Thursday, and Brian can haul my mini PC back to me on Saturday.

My vibe math on this situation only applies because my off-site backup storage is a single hard drive. If you’re building a RAID for your off-site backups, you might be able to leverage refurbished drives to squeeze in an extra drive of redundancy and a hot spare while still spending less money. That would surely feel like a win for me!

Conclusion: Trust the math, but back it up with a plan!

After all that, where do we land on refurbished drives? It’s less about a simple mathematical formula and more about having a holistic strategy. The value is undeniable. Getting robust, high-capacity storage for a fraction of the cost of new drives is a game-changer for cash-constrained homelab situations.

The key is to approach things smartly. By purchasing from reputable vendors with strong warranties, sizing up your drives to avoid the oldest stock, and making sure you have a strong backup plan, you can confidently leverage refurbished drives in your storage setup.

Whether your drives are brand new or refurbished, a drive failure will result in catastrophic loss without a backup. The significant savings from going refurbished can and should be reinvested into building a resilient backup solution. A combination of local AND off-site backups would be ideal.

What are your thoughts? Are you ready to take the plunge on some refurbished drives, or does the idea still make you nervous? I’d love to hear your experiences and plans. Join the conversation in our Discord community. We’re always talking about deals, storage setups, and the best ways to keep our data safe.

The Ultimate Li’l Magnum! 15-gram Fingertip Mouse? Using The Corsair Sabre Pro V2 or Dareu A950 Hardware

I have been patiently waiting for the release of the Corsair Sabre Pro V2. It is a high-performance, ultralight gaming mouse at a reasonable price from a major manufacturer. You could probably pick one up off the counter at Best Buy, Target, or Walmart. I am super excited about the idea of being able to snag a donor mouse for your custom 15-gram fingertip mouse build near your home.

Li'l Magnum! with Corsair Sabbre Pro V2 guts

The specs are great: up to 8-KHz polling, a 30,000-DPI sensor, nice mechanical microswitches, and a web configurator that works on Linux. I said that the price is reasonable, and I do believe $100 is a reasonable price for a gaming mouse. The problem I have here is buying a brand new mouse for $100 only to immediately take it apart to stick it in a 3-gram 3D-printed shell.

You could spend $60 more on a 20-gram G-Wolves Fenrir Asym. The specs are comparable, but you get an injection-molded shell with side buttons. I don’t think the extra five grams are a deal breaker, and you’re getting something that is ready to go. Though you might have to pay a bit for shipping.

If I were in competition, and I do not feel that I am, I would consider the 20-gram G-Wolves mouse my most direct competitor. Probably because it is the mouse I would try next if I had to buy an off-the-shelf mouse.

You don’t have to buy the mouse from Corsair!

I am excited about supporting a modern Corsair mouse. It is $100 today, but there will be sales, and I expect it will be on the shelves for a few years. Someone will stumble across this blog post in four years, realize they already have an old Corsair Sabre collecting dust in their parts bin, and they might breathe new life into that mouse. That is all good news for the future.

What about today? Someone in our Discord community informed me that the Dareu A950 Air probably uses the exact same PCB and electronics as my Corsair mouse. Not only that, but when I posted my progress on Reddit, someone in the comments pointed out that their Dareu A950 Wing also uses the same PCB.

Li'l Magnum! test prints

The blue parts are partial prints to correctly position the screw holes. The yellow prints are complete test prints that I used to align and set the height of the button plungers.

What’s even better than that? The friendly person on Reddit printed a Li’l Magnum! shell and said that their Dareu A950 Wing’s components are a perfect fit for the Li’l Magnum! shell!

The price tracker says the Dareu A950 Wing is usually $64 with 2-day shipping on Amazon, and it has gone as low as $50 in the past. This brings the price down into the territory of the VXE R1 mice, but you get upgraded to lighter electronics, a better sensor, and faster polling rates.

The Dareu A950 at $63 easily makes for the best value Li’l Magnum! with the best specs so far, at least on paper.

Do you need the lightest mouse we can get?

No. I don’t think anyone should be working ridiculously hard and giving up features or strength to make the absolute lightest mouse possible. I am personally just about as happy with my $23 VXE R1 SE Li’l Magnum! at 25.3 grams as I am with my $100 Corsair Sabre Li’l Magnum! at 15.4 grams.

It is difficult to do a completely blind test at my desk, because the heavier mouse is extremely obvious every time you recenter your mouse. It isn’t more challenging to lift the mouse; it is just easy for your brain to register that one mouse weighs 66% more than the other.

The important thing is that I forget that my mouse got heavier after 15 minutes of gaming. My suspicion is that as long as your mouse isn’t too much heavier than your thumb, going any lighter is going to have extremely diminishing returns.

Can it be fun to chase grams? Absolutely. If you enjoy that sort of thing, go for it.

We don’t have a reliable third-party latency test of the Corsair or Dareu mouse!

I don’t think this is terribly important. The cheapest gaming mice manage to come in at something under 1.5 milliseconds of click latency.

There is a full review with latency testing of the MCHOSE L7 Ultra at rtings. It was tested at 0.9 ms of click latency when wired, or 1.4 ms over the 8-KHz wireless link. This is a mouse supported by the Li’l Magnum!, and it is neat that we have a mouse with actual testing.

In practice, I can’t tell the difference between my Li’l Magnum!s with a MCHOSE L7 Ultra, VXE Mad R, or the Corsair Sabre. They all feel the same. If I lost all my Li’l Magnum! builds in a fire tonight, I would order a Dareu A950 Wing from my hotel. I don’t care that it hasn’t been tested by a reputable third party.

I am grumpy about Omron optical switches

My VXE Mad R and MCHOSE L7 both use Omron switches. Out of those four switches, two felt really crummy out of the box. Someone in our Discord community reported a bummer of a right click switch on their L7 as well.

I’ve replaced my disappointing Omron switches with fresh switches, but even the best Omron switches don’t feel great to me. The worst part is that they aren’t compatible with older 3-pin mechanical switches, so I can’t just grab my favorite switches and solder them onto a Mad R or L7. I just have to hope I can find a pair of nice Omron switches.

Li'l Magnum! with Corsair Sabre screw

Those tiny M1.5 screws that ship with the Corsair mouse don’t have a lot of bite, and the Phillips size is tiny and fragile. You do have to screw it down snug and flat, but take your time and make sure you don’t strip the screws!

I have been waiting patiently for a replacement for my 16.4-gram VXE Mad R. I wanted to be down under 20 grams, keep my 8-KHz polling, but I wanted mechanical switches. The Corsair Sabre is definitely the successor to my own Mad R, and it is even more exciting that the Dareu A950 Wing manages to come in at the same price point while beating the Mad R on weight by more than a gram.

I am not an aficionado of mouse switches. My favorite of my collection of budget gaming mice are probably the blue shell red dot switches in my VXE R1 SE, because they are the heaviest and loudest. The clear shell white dot switches in the Corsair sound and feel like they land somewhere between the blue shell switches and the pink shell white dot switches in the VXE R1 Pro.

I am not unhappy with any of these mechanical switches.

Which Li’l Magnum! should you build?!

The tariffs in the US are really bumming me out. They haven’t ruined budget fingertip mice, but they’ve goofed up the floor. You used to be able to build a 25-gram VXE R1 SE for barely over $20 or a 21-gram VXE R1 Pro for just under $30. Either will cost you over $40 today in the United States, and that puts you inches away from a Dareu A950 Wing, which really is looking like the ultimate Li’l Magnum! now.

First of all, I think you should build with what you have. I have designed Li’l Magnum! shells to fit any of the VXE R1 models, the VXE Mad R, all the MCHOSE L7 models, and even a weird $9.60 mouse from Amazon. The best mouse to build your Li’l Magnum! around might be a mouse that you already have!

If you are outside the United States, you might still be able to snag a VXE R1 SE, R1, or R1 Pro for less than half the price of a Dareu A950 Wing. Those all make delightful fingertip mice with fantastic specs, especially for the price, and especially if you can get the models with the smaller 250-mAh battery.

If you are in the United States, I think you should spend the extra $20 or $30 and build your Li’l Magnum! around the Dareu A950 Wing. That is a small price to pay to upgrade to the best available components for the lightest possible Li’l Magnum! build.

I designed the first Li’l Magnum! shell so I could avoid paying $170 for a Zeromouse shell and the Razer mouse to steal the guts from. I didn’t want to pay that much. I expected that I would wind up using it for a week, hating it, and it would wind up collecting dust in the back of a drawer for the next five years. It also helped that the Zeromouse is never in stock.

That isn’t the case, though. I love my ultralight fingertip mouse. I will never give it up, and I am excited that you now have the ability to make the same discovery as I did. You don’t have to pay $160 for a G-Wolves Fenrir or Zeromouse Blade to give it a try.

Version 1.0 was just uploaded!

I wrote a lot of words here the other day, because the version 0.9 upload wasn’t quite ready. It was a serviceable mouse, but I created a problem while fixing another. The Corsair PCB is extremely thin and super easy to accidentally flex, and the microswitch pins were getting hung up on some of the supports when installing the PCB. That made it too easy to break your PCB, so I did my best to move those supports to make some room.

Moving those supports out of the way allowed the PCB to flex too much when pressing the left click, and that made the click feel slightly mushy. Only just barely. I might not have noticed if I didn’t have four other Li’l Magnum! mice near my desk to check it against. It didn’t feel terrible, but it didn’t feel like it should.

I added about 0.1 grams of bracing under the left click, and it is now extremely solid. Version 1.0 is up on Printables and MakerWorld, and it should be available in my Tindie store by the time you are reading this.

If you’re interested in how the Li’l Magnum! project has evolved, I’ve since released the Li’l Magnum! Ultralight Fingertip Gaming Mouse 2.0 with significant improvements and refinements based on all the feedback from the original design.

Why should I even buy this from your Tindie store?!

I would really prefer that you didn’t. You’re a gamer. You’re a geek. You should own a 3D printer, and the Bambu A1 Mini is only $250. If you’ve been looking for an excuse to pick up an awesome new hobby, this might be it.

Maybe you don’t have room for a printer. Maybe that’s out of your price range. Maybe you just don’t want to fart around with figuring this sort of stuff out. Maybe you don’t have a friend with a 3D printer.

Li'l Magnum!

My prices are definitely higher than the random places on the Internet where you can just have any STL file printed for you. I have dialed-in print settings for the Li’l Magnum!, so I get you the lightest shell possible. I use multimaterial supports, so you get perfect clicks. I also promise that when you order the correct shell for your mouse, it will actually fit your mouse’s PCB, and I will attempt to adjust the model or give you a refund if the shell doesn’t work with your mouse.

You are also funding the development of future Li’l Magnum! models and improvements. I am trying very hard not to become a collector of gaming mice, but I already have seven different Li’l Magnum! mice on hand. I don’t want to spend more of my own money on mice that I will never use, but I do want to make ultralight fingertip mice more accessible to everyone.

Conclusion

I am excited. I’ve been waiting for the right mouse to build my ultimate Li’l Magnum!, and it is here. When I looked at the photos of the PCB before the hardware arrived, I expected the Corsair to tick every box except the weight. I figured it would be a gram or two heavier, and I was delighted to learn that this extra-thin PCB wound up being the lightest set of guts that I’ve used so far.

I think every FPS enthusiast should have the opportunity to try an ultralight fingertip mouse. I don’t expect everyone to enjoy the experience as much as I do, but I for one can’t imagine going back to a big, heavy mouse ever again.

What do you think? Do you own a different interesting gaming mouse that you feel deserves a Li’l Magnum! model? I bet we could work out a deal that gets you a free Li’l Magnum! shell while also helping me avoid collecting yet another mouse. Are you already using a fingertip mouse? What do you think of the experience? Tell us about it in the comments, or join the Butter, What?! Discord community to chat with me about it!

Did I Accidentally Build The World’s Most Power Efficient NAS and Homelab Combo Server?

There is a serious problem with the question in the title. It all hinges on what you feel qualifies as a NAS or a homelab. We could serve a README.MD over WebDAV on an ESP32 and call it a power-sipping NAS, and if that is what you had in mind, then the answer to the question in the title is a definitive “No!”

I don’t have Guinness on speed dial, and I doubt that I am literally breaking any actual records either on purpose or by accident, but I am somehow accidentally landing in the top one percent for power efficiency and density after ordering a 6-bay Cenmate USB SATA enclosure back in June.

6-Bay Cenmate USB enclosure with my N100 router mini PC

I have not staged any cool pictures for this blog, but it has been ready to publish for almost a month now. This is a photo from one of the previous blogs. I will attempt to correct this in the near future!

I knew the Cenmate enclosure was dense the first time I picked it up after filling it with 3.5” hard drives, but I didn’t do the math to understand exactly how dense my enclosure paired with an N100 or N150 mini PC actually is until almost two months later. I have a NAS that holds six 3.5” SATA drives and takes up just barely more than six liters. That is less than a third the size of a Jonsbo N2 case.

I may very well have built the lowest-price, lowest-power, most dense homelab and NAS setup. I don’t know that you could beat it unless you buy used parts instead.

NOTE: I don’t ACTUALLY have this NAS built and running in my home, but it isn’t just hypothetical. I do have all the necessary parts on hand to measure the cost, power consumption, and volume. I definitely don’t have the six 26 TB hard disks here to max it out to 156 terabytes!

What about power consumption?

I already know that my Trigkey N100 mini PC that I bought for $143 averages around 7 watts on the power meter. That is running Proxmox with a few idle virtual machines and LXC containers booted.

When I first plugged the empty 6-bay Cenmate enclosure into both my power meter and my mini PC, I learned that the enclosure only uses 0.2 watts of additional power. That is as close to a rounding error as it gets.

At this point I have an empty 6-bay, 6.2-liter Intel N100 NAS with 16 GB of RAM and a 512 GB NVMe that cost me $325, and it is idling away at 7.2 watts.

Plugging in hard disks adds about as much power consumption as you would expect. The meter goes up by 8 watts when you plug a 3.5” hard drive into a bay, and hammering the disks with a mean benchmark brings that up to 9 watts per drive. Your mileage may vary here, because every make and model of hard disk runs a little differently.

My fully loaded 6-disk NAS idles at about 55 watts, and it maxes out at around 62 watts when the CPU or GPU is under maximum load.
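
If you want to plug your own drives and electricity rate into this math, here is a quick back-of-the-envelope sketch in Python using the wattages above. The $0.15 per kWh rate is just a placeholder assumption, and your drives may idle hungrier or leaner than mine.

```python
# Back-of-the-envelope power math for the mini PC plus Cenmate NAS,
# using the rough numbers from my meter. Swap in your own figures.
BASE_IDLE_WATTS = 7.2    # N100 mini PC with the empty enclosure attached
DRIVE_IDLE_WATTS = 8     # per spinning 3.5" hard drive at idle
DRIVE_LOAD_WATTS = 9     # per drive while being hammered by a benchmark
DRIVE_COUNT = 6
KWH_RATE = 0.15          # placeholder electricity rate in dollars per kWh

idle_watts = BASE_IDLE_WATTS + DRIVE_COUNT * DRIVE_IDLE_WATTS
busy_watts = BASE_IDLE_WATTS + DRIVE_COUNT * DRIVE_LOAD_WATTS

kwh_per_year = idle_watts * 24 * 365 / 1000
print(f"Idle: {idle_watts:.1f} W, disks busy: {busy_watts:.1f} W")
print(f"Roughly {kwh_per_year:.0f} kWh per year at idle,")
print(f"or about ${kwh_per_year * KWH_RATE:.0f} per year at ${KWH_RATE}/kWh")
```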

NOTE: These wattages are gathered from my notes and previous blog posts. I’m going to plug six real hard disks back in and power the Trigkey N100 mini PC and Cenmate enclosure through a single power-metering smart outlet to get a proper, real-world number soon. I am in the middle of torture testing the Cenmate enclosure with massive IOPS on a stack of SATA SSDs, and I don’t want to stop that to re-verify these numbers.

Couldn’t we beat this “record” with a Raspberry Pi?

Yes. A Raspberry Pi would drop the price by $50 to $70, and it would drop the idle power consumption by 3 or 4 watts. It might even be slim enough to bring the total volume down to an even six liters!

I don’t think this is a good trade. Proxmox on an x86 machine is fantastic, and gives you a lot more flexibility and way more horsepower. It is hard to beat an Intel N100 or N150 when you’re transcoding with Plex or Jellyfin. Most Intel N150 mini PCs come with twice as much RAM as the most expensive Raspberry Pi, and they ship with a real NVMe installed, so you don’t have to boot off a fragile SD card. The mini PC will also already be installed in a case, and it comes with its own power supply.

We are starting to see Intel N150 mini PCs with one or sometimes two 2.5-gigabit Ethernet ports down near $150. That is a nice feature to get effectively for free, and the best part is that an Intel N150 is fast enough to encrypt Tailscale traffic at around 2.4 gigabits per second. That is something a Raspberry Pi can’t manage, and that is extremely important for my setup.

I don’t like focusing on volume and liters

Volume is not a terribly interesting measurement for most home users. We could build a custom two-liter server that is a few inches wide, an inch tall, and 32” deep. That would be awful! It would hang off the front of your desk!

In the olden days, you would be excited if your physical shop had 100’ of frontage along Main Street. There’s a similar concept that applies to the linear footage of your desk. It almost doesn’t matter how tall something is; as long as it isn’t too wide or too deep, it’ll fit well on the surface of your desk.

A 4’ tall but narrow server might look silly on your desk, and that might be too tall to even hide under your desk.

I feel that my build is very well suited to sitting on the edge of your desk. It is only about five inches wide and eight inches deep, and it is still less than a foot high.

This might be the laziest way to build a DIY NAS!

Two power cables and one USB cable. That’s it. Just place the two boxes on or near each other and plug them in. That’s the hardware setup for this DIY NAS. Slide in as many hard disks as you need, and you’re ready to set up your software.

It almost feels like cheating.

Aren’t USB hard drive enclosures scary?!

I am currently doing my best to torture test my Cenmate enclosure. I have been continuously running fio randread tests averaging 60,000 IOPS across a RAID 0 of old SATA SSDs. The test has been running for 14 days straight without a single error as I am writing this paragraph.
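
If you want to run a similar soak test against your own enclosure, here is a minimal sketch of the kind of random-read fio job I’m talking about, wrapped in a little Python so it can loop until something goes wrong. The device path, queue depth, and job count here are assumptions rather than my exact settings, so tune them for your hardware and point it at a scratch array you don’t mind hammering.

```python
# Minimal sketch of a long-running fio random-read soak test.
# Assumes fio is installed and /dev/md0 is a scratch RAID 0 array
# that holds nothing you care about. Adjust device, depth, and runtime.
import subprocess

FIO_CMD = [
    "fio",
    "--name=randread-soak",
    "--filename=/dev/md0",   # placeholder md RAID 0 of old SATA SSDs
    "--rw=randread",
    "--bs=4k",
    "--direct=1",
    "--ioengine=libaio",
    "--iodepth=32",
    "--numjobs=4",
    "--group_reporting",
    "--time_based",
    "--runtime=3600",        # one-hour passes; the loop below runs for days
]

while True:
    result = subprocess.run(FIO_CMD)
    if result.returncode != 0:
        print("fio reported an error, stopping the soak test")
        break
```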

USB storage was sketchy in the USB 1.1 and USB 2.0 days. Things have gotten a lot more solid in the last few years. Professional-grade video cameras write RAW video directly to USB SSDs. Professional video editors work directly with that footage over USB, or they copy it to other USB SSDs and work from the copy.

That entire world loves Apple laptops, and Apple laptops don’t have any options for large amounts of storage besides the USB and Thunderbolt ports. These things have to be well made now.

You don’t have to follow my Intel N100 blueprint!

Mini PCs, simple external USB hard drives, and 6-bay USB enclosures are a lot like Lego bricks. Need a lot of storage? Plug in a bigger Cenmate enclosure. Still not enough storage? Plug in a second one! Need more RAM or CPU power? Use a beefier mini PC!

An example would be the Acemagician M1 that I use as a Bazzite gaming machine in the living room. It also idles at around 6 watts when running Proxmox. It costs twice as much as an Intel N150 mini PC, but it is also more than three times faster and can hold twice as much RAM.

The price will go up a bit, so we wouldn’t be building the lowest-cost 6-bay NAS anymore, but you definitely get some upgrades for your money. The Intel N100 does manage to beat the Ryzen 6800H in the Acemagician M1 at transcoding by a small margin, and my 6800H uses 50 watts of power while transcoding for Jellyfin. My Intel N100 transcodes faster, and that mini PC uses less than 15 watts while doing it. This is not a big deal unless you watch movies 12 hours every day.

The Acemagician M1 is a good value for your homelab if you can get it on sale. I paid around $330 for mine. It is a good fit because it has two DDR5 SO-DIMM slots, two M.2 NVMe slots, and 2.5-gigabit Ethernet. That’s about as good of a combination as you can get in this price range.

You don’t have to build a NAS; you can attach the Cenmate enclosure directly to your computer!

I could write an entire blog post listing tons of good reasons why you might want to have a NAS on your home network.

I can’t do the topic justice in a couple of paragraphs, but I can say this! When the cost of turning a 6-disk enclosure into a NAS is only an extra $150 or so, there isn’t much excuse not to do it.

Even though it is inexpensive, you don’t have to do it. Maybe you just need a place to store footage when you edit videos at home. Maybe you need storage for your daily or weekly backups. You might already have to plug your laptop into a docking station when you sit at your desk at home, and your Cenmate enclosure can just stay plugged into the dock. This is a fine workflow to have.

What if you want to set things up so you can have remote access to that footage when you aren’t at home? Your home Internet connection may not be fast enough to edit video directly, but being able to grab a video file in a pinch could save you a drive. That’s a good reason to set up a NAS with Tailscale.

Conclusion

Should you build your DIY NAS out of a mini PC and a USB enclosure? I don’t know! My NAS needs are simple to the extreme. I don’t need my NAS to have a management interface. I manually set up my RAID arrays and the two shares or NFS exports I might need. I have absolutely no idea what TrueNAS does when you plug in an enclosure like this. Since it is USB-attached-SATA, I assume TrueNAS will treat them just like any SATA disks, but I haven’t tested this.
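
For the curious, my manual setup amounts to something like the sketch below. The device names, RAID level, filesystem, and mount point are all placeholders for illustration rather than a recipe for your exact hardware, so double-check every device path and make sure the NFS server package is installed before running anything like this.

```python
# Rough sketch of a "no management interface" NAS setup: one mdadm RAID
# array plus one NFS export. Every device name below is a placeholder!
import subprocess

DISKS = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf"]

# Build one RAID array out of the six drives in the enclosure
# (--run skips mdadm's confirmation prompt)
subprocess.run(
    ["mdadm", "--create", "/dev/md0", "--level=5", "--run",
     f"--raid-devices={len(DISKS)}", *DISKS],
    check=True,
)

# Format it, mount it, and export it over NFS to the local network
subprocess.run(["mkfs.ext4", "/dev/md0"], check=True)
subprocess.run(["mkdir", "-p", "/tank"], check=True)
subprocess.run(["mount", "/dev/md0", "/tank"], check=True)

with open("/etc/exports", "a") as exports:
    exports.write("/tank 192.168.1.0/24(rw,no_subtree_check)\n")

subprocess.run(["exportfs", "-ra"], check=True)
```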

I just think it is neat that my lazy and simple set of LEGO-style pieces here wound up being nearly the most power-efficient and storage-dense setup that anyone could even make with off-the-shelf parts, and USB enclosures like the ones from Cenmate fit my use case extremely well. I enjoy having that extreme level of flexibility.

What do you think? Can you build a more densely packed NAS that uses mechanical hard disks? Can you do it without spending too much more money? Will your build sip even less power? Will it sip enough less power to make a difference on my monthly electric bill? You should join our friendly Discord community to tell us about your build, or to give me a link to your write-up so I can point people to it!