Can We Compete With The Zeromouse For Under $25?


I am just going to put the tl;dr right up at the front. I wanted something like Optimum Tech’s Zeromouse. I tried the fake Logitech G304 upgrade, and it was awesome using a 29-gram mouse, but the electronics were jittery and terrible. I wound up discovering the 49-gram VXE Dragonfly R1 Pro for about $45. I ordered one. I love it. And I managed to design a skeletal shell that brings the mouse down to under 24 grams. I would still consider it a prototype, but it works great.

My 3D-printed skeletal fingertip ultralight mouse

I have ordered a $22 VXE Dragonfly R1 SE. It should fit the same 3D-printed skeleton with minor tweaks. It might wind up being a couple of grams heavier, but I am excited that you will be able to print your own Zeromouse-style ultralight mouse for under $25!

I am not even sure how to refer to this style of mouse. “Ultralight” seems appropriate, and I have heard them referred to as fingertip mice. I am partial to calling them skeletal gaming mice, but I may have made that up on my own!

The 49-gram VXE R1 Pro is amazing

I was gaming with a 100-gram Logitech G305 last month. My ultralight-mouse shenanigans clued me in that I could pair a tiny piece of aluminum foil with a 7-gram USB-C rechargeable AAA battery, and that dropped the G305 to 79.5 grams. That was an upgrade that I noticed, and I thought it felt fantastic. After spending a week with the VXE Dragonfly R1, the Logitech G305 feels like a brick.

I don’t think you need to modify your VXE Dragonfly at all. Mine was 49 grams out of the box, and it feels amazing. The battery lasts about a week on a charge for me.

My VXE Dragonfly R1 Pro and my Logitech G305 side by side

My 49-gram VXE R1 Pro next to my 100-gram Logitech G305

The lightweight VXE R1 Pro feels a little cheap, but I don’t think that can be helped. The Logitech feels premium because of the heft. You can’t shave weight off every single part of the mouse without making things feel a little flimsy.

If you are still using a heavy, old-school mouse, I most definitely think you should try one of these mice. I don’t even think you need to swap it into one of my 3.7-gram shells. A 50-gram mouse is a HUGE upgrade, and you don’t even have to adjust to a weird style of mouse. The VXE R1 has a similar enough size and shape to my Logitech G305.

The R1 Pro isn’t even the lightest mouse from VXE. I just gravitated toward this model for this project because it is quite light, and I was tickled by the fact that there is an $18 model. You can buy a 36-gram VXE Mad R mouse with an 8K receiver for $46, and with a 36-gram mouse, do you even need to mod it down to 21 grams?

If you are planning to use my 3D-printed skeletal shell, then you want the VXE R1 Pro and not the Pro Max. The Pro Max has a larger battery that adds an additional 6 grams.

I think Optimum Tech’s Zeromouse Blade is worth every penny

Though I am saying this as someone who hasn’t tried the real Zeromouse. It is 3D printed using laser-sintered nylon. Nylon is a much more premium material, and the printing process generates a more premium-feeling product than my Bambu Lab A1 Mini makes at home. Not only that, but the custom lightweight electronics in the Zeromouse are known to be high quality and extremely low latency. I believe $150 is a reasonable price for a custom piece of hardware like the 19-gram Zeromouse Blade.

I wasn’t even aiming at the new Zeromouse Blade when I ordered the VXE mouse. I was just hoping to make something similar to the previous iteration of the Zeromouse.

Aside from the previous revision of the Zeromouse shell being out of stock, I was disappointed that it required an $80 Razer V2 donor mouse. That is an outdated model now, which does make it cheaper than the Razer Pro V3, but it might not be available to purchase for all that much longer. I’d hate to design a mouse shell around something that is going away!

I also didn’t like the idea of spending $150 on a mouse that is so much different than what I have been using for 35 years. What if I hated it? What if I used it for an evening, threw it in a drawer, and never looked at it again? That would be a bummer!

I already tried a cheap fingertip mouse with a terrible sensor, and I enjoyed it a lot. My VXE Dragonfly mod is my second ultralight mouse. I can tell you that I would happily pay $150 for the new Zeromouse Blade, but that would be a lot less fun than designing my own and sharing it with you!

Where are we so far?

I am honestly amazed at how quickly this is progressing. I messed around in OpenSCAD for a couple of days before even taking the mouse apart, and I had all the screw holes within 0.5 mm of correct before I even had the PCB in hand to test out. My first full print was a little too small, but my second print was an actual functional mouse!

It took two more partial test prints to both perfectly line up the button positions with the paddles and to get a snug lateral fit on the PCB. Hugging the sides of the PCB significantly increased the rigidity of the mouse, and it feels quite solid now. That is the model that I uploaded to Printables and MakerWorld before starting to write this blog post.

Lots of progress on the skeletal fingertip mouse

A couple of days later, yesterday as I am writing this paragraph, I came up with a better solution for keeping the mouse wheel in place. I printed the tiny new part that slips over the microswitch, and I snipped out the old wheel support brackets from my existing print. This makes it SO MUCH easier to assemble a working mouse, it weighs slightly less, and I knew it would allow me to tighten up the wiggle in the button paddles.

I significantly tightened up the paddles this morning, put a slight angle on the finger grips, and eliminated the wheel supports. That print was only 0.1 grams heavier, way sturdier, and seemed to fit great, but the extra rigidity in the paddles meant that my buttons were both stuck in the on position! I am waiting for that part to print as I write this paragraph. I want that plunger length dialed in perfectly!

I feel like I am just about in a place where I can get away with calling the STL file version 1.0, but the OpenSCAD source needs a lot of tidying up.

I think I am doing pretty well at aiming for six days of testing!

We have a 23-gram, 26,000-DPI, low-latency mouse that you can build for less than $50. I think that is fantastic, and I wouldn’t be surprised if I can get that down under 22 grams with a few slicer tweaks and a switch from PLA to ABS filament. The entire skeleton is 5.7 grams when printed in PLA, so there isn’t a ton of room for improvement here.

UPDATE: Printing my skeletal fingertip mouse with single walls and fewer top layers brought my total mouse weight down to 21.4 grams, and the whole thing feels nearly as rigid as it did with default print settings. I haven’t switched to ABS yet, but that ought to save another 0.7 grams or so. I don’t believe this is necessary.
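The ABS estimate can be sanity-checked with a little density arithmetic. This is only a rough sketch: the densities are typical ballpark values rather than measurements of any particular spool, and it assumes the lightened shell weighs about 3.7 grams, which is the shell figure mentioned earlier in this post. The function name is made up for illustration.

```python
# Typical filament densities in g/cm^3. These are assumed ballpark figures,
# not measurements of any specific spool of filament.
PLA_DENSITY = 1.24
ABS_DENSITY = 1.04


def weight_after_material_swap(part_grams: float,
                               old_density: float,
                               new_density: float) -> float:
    """Estimate the weight of the same printed volume in a different material."""
    return part_grams * new_density / old_density


# Assumed shell weight in PLA, taken from the 3.7-gram figure in the post.
shell_pla = 3.7
shell_abs = weight_after_material_swap(shell_pla, PLA_DENSITY, ABS_DENSITY)

print(f"Estimated ABS shell: {shell_abs:.2f} g")
print(f"Estimated savings:   {shell_pla - shell_abs:.2f} g")
```

With those assumptions the savings land around 0.6 grams, which is in the same neighborhood as the guess above.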

I am confident that it won’t take much effort to tweak this design to fit the $18 VXE Dragonfly R1 or the $22 R1 SE. I imagine that those mice will feel indistinguishable during gameplay, except that I don’t know how much their circuit boards weigh. I do know for sure that we will have to print a small, dark piece of plastic to cover the giant LED in those cheaper mice!

I have had to shim the buttons

Some of the skeletons I printed felt fantastic. Others have buttons that felt spongy. One of the skeletons with all the same offsets was so tight that the buttons were constantly pressed without even touching the mouse.

The trouble is that the little plungers sit on top of support material, and they don’t always come out exactly the same. I have tiny support pegs under the switches, and I figured out that I can cut a few tiny pieces of electrical tape to use as shims between the PCB and that peg to dial in the feel of the buttons.

3D-printed skeletal fingertip mouse at 23.33 grams

The mouse I am using at the moment has one layer of tape under the right button and three layers of tape under the left. Each layer is about 0.15 mm, but the rubbery tape is a bit squishy, so it might take up less space in practice.
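To put those shims in perspective, here is a minimal sketch of the arithmetic. The tape thickness and layer height come straight from this post, while the `compression` factor is a hypothetical knob for modeling how much the squishy tape flattens under load.

```python
TAPE_LAYER_MM = 0.15   # nominal thickness of one layer of electrical tape
PRINT_LAYER_MM = 0.16  # layer height used for the prototype prints


def shim_height_mm(layers: int, compression: float = 1.0) -> float:
    """Height of a stack of tape shims.

    compression is a hypothetical factor: 1.0 means the tape keeps its
    nominal thickness, and smaller values model tape that squishes down.
    """
    return layers * TAPE_LAYER_MM * compression


left = shim_height_mm(3)   # three layers under the left button
right = shim_height_mm(1)  # one layer under the right button
difference = left - right

print(f"Left shim:  {left:.2f} mm")
print(f"Right shim: {right:.2f} mm")
print(f"Difference: {difference:.2f} mm, "
      f"or about {difference / PRINT_LAYER_MM:.1f} print layers")
```

The gap between the two buttons works out to nearly two full print layers, which hints at why a finer layer height could matter here.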

I have been printing my prototypes on my Bambu A1 Mini at my usual layer height of 0.16 mm. I can definitely switch to 0.1 mm layers for more precision, and I can also switch to a PETG support interface layer with zero gap to set the height of the plunger with more accuracy. That will help for my own mouse, but I have no idea what the tolerances are on your PCB!

NOTE: I may have mostly figured out the solution to this problem. We’ll see if it holds true. I am excited that the easy fix also dropped the total weight of the mouse to 21.4 grams! The explanation of the problem and why the solution works would be almost as long as this blog post.

My hopes for the OpenSCAD source code

I would love it if we could just punch some new mounting hole locations and button positions into the OpenSCAD code and generate a skeletal mouse for any similar gaming mouse PCB, but I wasn’t nearly careful enough in the layout of the code to allow for that.

Underside of the 3D-printed skeletal fingertip mouse

I ordered a huge pack of 6-mm PTFE skates. I don’t have many of these ancient football-shaped skates on hand. Thankfully, the textured bottom surface makes them easy to remove, or I would have run out of them on the first day!

I will be stoked if I can clean things up enough that you could go to MakerWorld’s parametric model maker so you can enter your own button height and length, thumb and finger pad positions and lengths, and tweak some settings to make the buttons stiffer or looser. I’m not far from that being possible, so I plan to set that up once I put the finishing touches on my own mice.

I would like to add a curve to the button paddles and finger grips, and that should be configurable in the parametric configurator as well.

How do I like gaming with a 23-gram fingertip mouse?

It is weird, and I am most definitely not quite acclimated to it yet. If you think switching to a featherweight mouse will instantly improve your FPS-gaming abilities, you will be extremely disappointed. You are almost definitely going to get worse before you get better!

My old Corsair mouse was nearly 100 grams. My recently purchased Logitech G305 is around 100 grams with a single AA Eneloop battery installed, but I have been gaming with it using a 7-gram USB-C rechargeable lithium-ion battery and a small piece of aluminum foil. That has brought the weight down to 79 grams.

Playing Team Fortress 2 with the stock 49-gram VXE R1 Pro already made the 79-gram Logitech mouse feel like an absolute brick. Switching from the 23-gram skeletal mouse back to the brick feels even more dramatic.

Pat's time played in Team Fortress 2 by class

I have both mice set to 3,200 DPI, and I can execute reasonably precise 90- and 180-degree turns with either mouse, but the skeletal mouse FEELS faster. It is especially noticeable when an enemy gets close, and you have to quickly adjust your aim to lock on and track them. I keep overshooting with the lightweight mouse. I had the same problem when I switched from the G305 to the R1 Pro, but it is more extreme with the skeletal mouse.

It takes so much less effort to get the mouse to start moving, so I am tending to push the mouse too hard for small and medium moves.

I am not good at playing scout in Team Fortress 2, but I assumed this would be the class where I would most notice the missing weight of the mouse, and I am definitely doing worse playing scout at 23 grams than I was at 49 grams. That said, I am pretty good at landing pills with the demoman, and I feel like I am doing a reasonable job there.

I suspect that the answer is going to be lowering my sensitivity. It will be easy to adjust to needing to move my mouse farther to turn around, and a lower sensitivity will help me be more precise on the smaller movements. I just figured that I should give myself more time to acclimate before making changes.

Do you want to know what the weirdest thing is about these light mice? You can’t spin the scroll wheel unless you are gripping the mouse. When you attempt to spin the wheel on a loose mouse, you wind up just pushing the mouse around!

Conclusion

It is exceedingly difficult to not print just one more mouse. I keep saying that I am going to stop, but then a couple of hours go by, and I print a slightly different mouse. I am really going to stop now. At least for a while. I feel like I need to put a week or two into actually using a good skeletal fingertip mouse. That way I will have a better idea of what improvements I should actually be making.

I am aware that I have not yet met the expectations of this blog post’s title. I have only modded a $45 mouse down to 21 grams. The $22 mouse is on its way, though, and I am confident it will be easy to tweak the design to make its hardware fit my shell!

What do you think? Are you excited about the idea of a $25 fingertip mouse? Or do you think something like the 36-gram VXE Mad R is light enough? Do you prefer using a heavier mouse? Are you planning on trying out my model, or are you already using a different fingertip mouse mod? Tell me about it in the comments, or join our friendly Discord community where we talk about 3D printing, video games, and many other geeky topics!

Should You Enable Smart Queue Management To Mitigate Bufferbloat?


I haven’t had Smart Queue Management (SQM) enabled on my OpenWRT router in a very long time. I am not even sure if what I was using in the olden times was called smart queue management, and it has been so long that I don’t even remember exactly what I used to have enabled on all my old asymmetric cable and DSL connections. All I remember for certain is that I used some sort of quality of service (QoS) package for OpenWRT that would let me set numbers for my upload and download speeds that were a little lower than my actual speeds, and this would keep my link from getting saturated and driving up latency.

My bufferbloat test results after dialing in SQM on my OpenWRT router

It was more than a decade ago when I upgraded to my first symmetric fiber Internet connection at home, and that massive boost in upload speed on my 35/35 megabit connection made enabling QoS almost entirely useless. We have had several upgrades over the years, and today we have a symmetric gigabit Ethernet connection at home. We could go higher, but then we would need to pay more AND upgrade the router!

I have once again enabled OpenWRT’s SQM. One of my friends in our Discord community was bragging about his A+ score on the bufferbloat test after configuring fq_codel on his OPNsense router, and that made me curious whether or not I should be doing the same.

Do you need queue management with a fast symmetric Internet connection?

Queue management has a dramatic impact when your Internet connection’s download speed is 10 or 20 times faster than your upload speed. There are several reasons for this, but you can see the most obvious culprit if you watch your network traffic while running an Internet speed test. You will probably see around 2% as much traffic on your uplink while your download test is running.

The additional latency when filling up your cable modem’s download capacity isn’t a bufferbloat problem. This is just a consequence of not having enough upload bandwidth available. If you have a cable modem, you should probably be running some sort of queue management.

Bandwidth utilization during download speed test

You can see my meter showing 16.7 megabits of upload bandwidth being used while the speed test is pulling 831 megabits on the downlink

In my case that is around 17 megabits of uplink while seeing 800 megabits down on the speed test. This is nearly enough upload traffic to fill up the uplink on Comcast’s gigabit cable service, and I’m not even sending any actual useful data up.

I don’t have to worry about this, because I still have more than 800 megabits of unused uplink bandwidth.
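Here is a small sketch of that ratio using the numbers from the meter screenshot. The asymmetric plan at the bottom is purely hypothetical and is only there to show how quickly this overhead can eat a small uplink. The function names are made up for this example.

```python
def uplink_overhead_ratio(upload_mbps: float, download_mbps: float) -> float:
    """Fraction of the download rate that shows up as upload traffic."""
    return upload_mbps / download_mbps


def uplink_needed_mbps(download_mbps: float, ratio: float) -> float:
    """Upload bandwidth consumed just to sustain a download at this rate."""
    return download_mbps * ratio


# Figures from the bandwidth meter during the download speed test.
ratio = uplink_overhead_ratio(16.7, 831)

# Hypothetical asymmetric plan, chosen only for illustration.
plan_down_mbps = 1000
plan_up_mbps = 35

needed = uplink_needed_mbps(plan_down_mbps, ratio)

print(f"Overhead ratio: {ratio:.1%}")
print(f"Uplink needed for a full-speed download: {needed:.1f} Mbps")
print(f"Share of a {plan_up_mbps} Mbps uplink: {needed / plan_up_mbps:.0%}")
```

On a connection like that, a single saturating download would chew through more than half of the uplink before anyone sends real data.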

This is probably the tl;dr for this blog post. I have never noticed additional latency from other people in the house streaming movies or downloading large files. It never shows up on my Smokeping graphs, and I never feel it when playing something latency sensitive like Team Fortress 2.

I doubt I will ever notice that I have enabled SQM on my OpenWRT router, but I am glad that I have it enabled now just in case!

Enabling SQM made a huge difference, so why won’t I notice it?

The ping times only go up when I max out my available upload or download bandwidth. We could stream 10-bit 4K HDR content to every screen in the house and only use a fraction of our available bandwidth. Most sites on the Internet don’t manage to saturate my connection, though I can max out my uplink when publishing a YouTube video, and Steam can just about fill the whole downlink when updating a game.

Bufferbloat results before enabling SQM

I disabled SQM after I had everything dialed in just to run another test and take this screenshot. I don’t remember the original test result being nearly this bad!

Either of these things usually only takes a few minutes. Those few minutes would have to coincide with me playing a latency-sensitive multiplayer game or participating in a video call. The odds of that happening are slim.

What OpenWRT settings did I change?

I think it was worth trading a couple of dozen megabits of bandwidth for much improved worst-case latency, but it was a bit of work to get things dialed in right, and my exact settings won’t work for you. I do think it is worth telling you what worked well for my setup.

You should follow the advice on traffic shaping from the OpenWRT wiki. That’s what I did!

I tried to use the cake module, but my aging Linksys ACM3200 just doesn’t have enough single-core CPU performance. The cake SQM immediately maxed out one of my cores during a speed test, and my download speed was limited to 560 megabits per second.

My router has plenty of horsepower for running fq_codel and the simple.qos script. From there, I just kept bumping up my download and upload speeds on the basic settings tab until the bufferbloat test results showed my latency increasing. When they did, I backed off a bit.
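That bump-and-back-off routine is a simple search, and it can be sketched like this. Everything here is hypothetical: the function names, the 5 ms tolerance, and especially the measurement callback, which stands in for manually running a bufferbloat test after each change. It also assumes the starting rate is already known to be good.

```python
from typing import Callable


def tune_shaper_rate(start_kbit: int,
                     step_kbit: int,
                     ceiling_kbit: int,
                     latency_increase_ms: Callable[[int], float],
                     tolerance_ms: float = 5.0) -> int:
    """Raise the shaper rate until latency under load gets worse.

    Returns the last rate that stayed within the tolerance. The starting
    rate is assumed to be good and is not re-checked if the first test fails.
    """
    best = start_kbit
    rate = start_kbit
    while rate <= ceiling_kbit:
        if latency_increase_ms(rate) > tolerance_ms:
            break  # latency went up, so keep the previous setting
        best = rate
        rate += step_kbit
    return best


def fake_bufferbloat_test(rate_kbit: int) -> float:
    """Pretend measurement: latency jumps once the rate passes 900 Mbps."""
    return 2.0 if rate_kbit <= 900_000 else 40.0


best_rate = tune_shaper_rate(800_000, 25_000, 1_000_000, fake_bufferbloat_test)
print(f"Highest rate without added latency: {best_rate} kbit/s")
```

The upload and download directions would each get their own pass through a loop like this.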

I got curious while writing this blog post, and I asked Google if there were any options for getting cake to use more than one CPU core. I didn’t expect to have much luck, but I figured there was a chance that there is something newer and better than cake. I found nothing of the sort.

Packet Steering in OpenWRT LuCI settings

I did find something useful. There is a checkbox in the global network options to enable packet steering. Ticking that box allowed me to inch my upload and download speeds up by 20 or 30 megabits before my latency started to increase again. That was a nice little bump, but as it suggests in the OpenWRT interface, your mileage may vary here!

Conclusion

I feel that turning on SQM traffic shaping and giving up 5% of my maximum bandwidth to reduce my worst-case latency by dozens or even hundreds of milliseconds is a fantastic trade. That is such a small price to pay to never have to worry about a random download or update somewhere in the house goobering up my aim in Team Fortress 2!

I bet I spent an hour or two dialing in my downlink and uplink numbers to get as much bandwidth as I possibly could without significantly pushing up the latency when congested, but you don’t have to work that hard. Just enabling SQM and entering what you believe are your actual upload and download speeds will probably get you 90% of the way to perfect, and that is probably way better than you were doing before!

If you found this blog post insightful, I would love to hear your thoughts and experiences with Smart Queue Management (SQM) and OpenWRT! Have you noticed a difference in your network performance after enabling SQM? What settings worked best for you, and do you think queue management is necessary for high-speed symmetric connections?

Join our Discord community where we share tips, tricks, and experiences with our own homelab setups! Engaging in discussions with fellow enthusiasts could help you fine-tune your setup and understand your network better. Let’s optimize our networks together!

Being Able To Print ABS On My Bambu A1 Mini Is Delightful


This is going to be a short post, because I don’t have a lot to say. I am sure you have been seeing the hubbub lately about how you can print ABS on top of a layer of PLA while keeping the bed temperature way down at 60C. I am mostly just here to tell you that it works, and it works extremely well!

I have been digging my own rabbit hole the last couple of weeks. I discovered a superlight skeletal mouse shell replacement for a fake Logitech G304 mouse that cuts the weight from around 90 grams to 45 grams.

Skeletal mouse printed in ABS on top of PLA on my Bambu A1 Mini

I modified the design a bit to make it more comfortable, then I removed some parts to drop a few grams. That wasn’t light enough for me, so I started tweaking top layers, bottom layers, and perimeters to shave even more weight. I knew the next step was to print in ABS to save another 2 or 3 grams.

The trouble is that my dedicated ABS printer is my Sovol SV06. It is set up with a 0.6-mm nozzle, so it tends to lay down more plastic than necessary for spindly parts. Even with tweaks that was going to wipe out all my savings from the switch to the lighter ABS filament.

The other problem is that the Sovol doesn’t dial in flow rate automatically like my Bambu A1 Mini. I used to print exclusively with ABS when I owned a wooden 3D printer in the olden days, and I absolutely hated removing support material from ABS prints. I knew dialing in the flow rate perfectly would help, and the Bambu would do that for me.

Why can’t the Bambu A1 Mini print ABS?!

You need to get your print bed up to 100C to keep ABS in place. I tried printing ABS with the bed set to its maximum temperature of 80C when I first got the printer, and I had partial success. Very small parts printed fine, but my parts that were around an inch square at the base came loose from the bed after a dozen layers.

You can get around this problem by setting the first layer to print in PLA. That lets you leave the bed at 60C or 65C, and the PLA has no trouble adhering to the bed for the entire print job. In fact, it seems that the PLA first layer does a better job adhering to the bed than ABS will manage with a 100C bed!

You don’t have to have an AMS Lite to make this happen. You can manually swap filament after the first layer.

Filament use

I wound up reducing the weight of the mouse by almost three grams by switching from PETG to ABS

I don’t even know the correct way to print ABS with the AMS Lite. The printer and slicer didn’t even let me select ABS filament when I tried it last year, and I have been printing my mouse in PETG. I just edited the PETG profile settings that I was already using. I bumped the minimum temperature to 255C, bumped the flow rate up to 19 cubic mm/s, and I just let the machine believe it was printing PETG.

That probably isn’t ideal. The ABS profiles for the Bambu A1 surely have tweaks that do a better job of controlling cooling for bridging, but this definitely got the job done.

I couldn’t believe how easy it was to remove the tree supports!

In the days when I only printed ABS, I avoided supports at all cost. In modern times, supports aren’t so bad with PLA and PETG. Both the slicers and the printers are way better than they used to be, and they’ve improved significantly in just the last couple of years. Supports usually just pop right off of my print on the A1 Mini, while I used to have to attack the same sorts of prints with a pair of pliers on my Prusa MK3S.

I only just realized this week that I may not have printed an ABS part using supports since tree supports were added to Prusa Slicer.

I was amazed. The ABS tree supports almost fell right off my print. It was almost like using a PETG support interface layer on a PLA print. It was a delight!

But my printer’s bed isn’t limited to 80C, is this of any use to me?!

Yes! It seems that printing ABS on a layer of PLA has better adhesion and less chance of warping than printing ABS directly on the bed. This is true with either an open printer or one with a chamber.

I imagine there is a point where a heated chamber and a properly hot bed will help keep a tall and complicated ABS part from splitting at the layer lines halfway up, but I think I am always going to default to using a layer of PLA on ABS prints whenever the option is available.

Conclusion

I am not breaking any new ground here. JanTec Engineering might have been the first to show us the technique, and Thomas Sanladerer did some really good testing to show us which filament combinations do and don’t stick to each other.

I just figured it might be a good idea to write this blog, raise my hand, and say that this also worked well for me!

ABS is an important filament for me. It is one of the few common 3D-printing materials that survives sitting on the dashboard in the Texas sun. I don’t print ABS a lot, but when I need it, I actually do need it. Being able to print ABS on my Bambu A1 Mini just gives me one less excuse for hanging onto my Sovol SV06!

Should You Run A Large Language Model On An Intel N100 Mini PC?


The simple answer is no. The Intel N100 is slow and only has single-channel RAM. You’re not going to get any work done chatting at a reasonably sized and capable LLM on your Celeron N100 mini PC.

A better answer is more complicated, because you can run a large language model on anything with enough memory. I set up a Debian LXC on one of my N100 mini PCs, installed Oobabooga’s text-generation webui, and downloaded Qwen 2.5 in both 1.5B and 0.5B sizes in 5-bit quants to see how things might run.

AI Man talking to a tiny llm robot

What’s the tl;dr? They ran, and they don’t use much RAM, but they are both quite slow. When I just go back and forth asking small questions, the model starts responding in just a couple of seconds. It took nearly four minutes to write me a conclusion section when I pasted in an entire blog post. Short questions average 5 tokens per second with the 1.5B model, while the task with 2,500 tokens of context averaged around 1 token per second.
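Those two speeds translate into very different wait times. Here is a quick sketch of the math, assuming a conclusion section of about 240 tokens. That length is a guess picked to line up with the four-minute wait, and it ignores the time spent processing the prompt.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time spent generating output, ignoring prompt processing time."""
    return output_tokens / tokens_per_second


# Assumed length of a generated conclusion section, not a measured value.
conclusion_tokens = 240

scenarios = [
    ("1.5B with 2,500 tokens of context", 1.0),
    ("1.5B with a short question", 5.0),
]

for label, tps in scenarios:
    seconds = generation_seconds(conclusion_tokens, tps)
    print(f"{label}: {seconds:.0f} seconds ({seconds / 60:.1f} minutes)")
```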

I did not attempt to finagle Oobabooga into using the iGPU. I suspect that might slightly improve time to first token, but the bottleneck here is the slow RAM.

It is probably also important to say right up front that you don’t have to use a Celeron N100. You can get a Ryzen 5600U or 5800U mini PC for around double the price, and those have dual-channel RAM, so they should run language models almost twice as fast. You should also keep in mind that twice as fast as slow is probably still slow!

What do I usually do with my local LLM?

I sometimes use Gemma 2 9B 6-bit to help me write blog posts. It is the first local model that does a nice job generating paragraphs for me, and it runs somewhere between 12 and 15 tokens per second on my 12 GB Radeon 6700 XT. Time to first token is usually just a second or two even when there is a good bit of context, though it does take a little longer when I paste in an entire 2,500-word blog post.

I find myself using OpenAI’s API more often these days. The API is cheap, more capable than my local LLM, and so much faster.

OpenAI’s extra speed isn’t all that important, but it does help. My local LLM will output a conclusion section for my blog a little faster than I can read it, but being able to see the entire output from OpenAI in three seconds is definitely an improvement.

My eyes will skim into the second or third paragraph, and they might see something incorrect, too corporate sounding, or just plain goofy. I don’t have to read every word to know that I need to ask OpenAI to try again. This doesn’t save a ton of time, but it does save time.

Why do I let the robots write the conclusion section for my blog posts?

The truth is that I rarely use more than a paragraph or two of the conclusion that the LLM gives me, and what I do use winds up being moderately edited.

The conclusion section of a blog post tends to be a little repetitive, since it will at least be a partial summary of everything that I already wrote. Letting the artificial mind take a first pass at that saves me some trouble.

My Oobabooga Proxmox LXC stats while running Qwen 2.5 0.5B

Not only that, but sometimes the LLM will write things in the call to action that I would never say on my own. I have been embracing that, and I have been letting some of those sentences through.

Gemma 2 9B is the smallest model I have tried that does a good job at this. In fact, I often feel better about the words coming out of Gemma’s mouth than I do about the paragraphs that ChatGPT writes!

How awful is Qwen 2.5 1.5B compared to Gemma 2 9B?

I don’t want you to think I did some sort of exhaustive test here, and I certainly didn’t run any benchmarks. I just asked Qwen to do the same things I normally ask Gemma.

I will say that I was extremely impressed with how coherent Qwen 1.5B was when I asked it to write a conclusion for one of my blog posts. It included details that were drawn from the entire post, and it wrote paragraphs that mostly made sense.

Qwen 2.5 1.5B trying to write a conclusion for one of my posts

Qwen 2.5 1.5B’s attempt at writing a conclusion for a blog post about a 33-gram mouse

Some of the conclusions that it drew were incorrect, but those could be easily fixed. Qwen 1.5B just didn’t sound as friendly or as real as Gemma 2 9B.

Why not run Gemma 2 9B on the Intel N100?

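Things aren’t quite this simple in the real world, but the speed of an LLM is almost inversely proportional to the amount of RAM or VRAM the weights consume. Every generated token requires reading all of the weights from memory, so a model that is six times larger runs roughly six times slower.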

Qwen 0.5B runs between two and three times as fast on my Celeron N100 as Qwen 1.5B. Odds are pretty good that Qwen 1.5B will run four to six times faster than Gemma 2 9B.
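Here is a rough sketch of what that scaling would mean for Gemma 2 9B on the N100. It assumes speed scales inversely with parameter count, and it treats parameter count as a stand-in for the size of the weights. That is only fair if both models use similar quantization, so treat the output as a ballpark guess and not a benchmark.

```python
def scaled_tokens_per_second(measured_tps: float,
                             measured_params_b: float,
                             target_params_b: float) -> float:
    """Scale a measured speed to a different model size.

    Assumes generation speed is inversely proportional to model size,
    which is a simplification of memory-bound inference.
    """
    return measured_tps * measured_params_b / target_params_b


# Measured Qwen 2.5 1.5B speeds on the N100, taken from this post.
short_prompt = scaled_tokens_per_second(5.0, 1.5, 9.0)
long_context = scaled_tokens_per_second(1.0, 1.5, 9.0)

print(f"Estimated 9B speed, short questions: {short_prompt:.2f} tokens/s")
print(f"Estimated 9B speed, long context:    {long_context:.2f} tokens/s")
```

Under these assumptions, even short questions would run at less than one token per second.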

Qwen 1.5B is already too slow for me to work with.

What was I hoping the Intel N100 would be fast enough to accomplish?

Just because I wouldn’t wait around for four minutes to have Qwen 1.5B write worse words than ChatGPT can write in two seconds doesn’t mean running an LLM on this mini PC would be useless.

I have been grumpy with Google Assistant for a while. I often use it for cooking timers, and it used to show me those timers on my phone, but whether the timer display would show up was inconsistent. Now I can’t remember the last time I saw that work. Sometimes I can ask about the kitchen timer from the bedroom, and sometimes I can’t.

I was livid last week when I asked how much time was left and Google happily responded with, “OK! Consider it cancel.” I already don’t like that her grammar is poor, but now I had to guess how much longer my potatoes needed to be in the oven.

Many of the parts to assemble your own local voice assistant exist today. There are projects building remote WiFi speakers with microphones using inexpensive ESP32 development boards. Whisper AI does a fantastic job converting your speech into text, and it can do it pretty efficiently. There are plenty of text-to-speech engines to choose from.

I know that Qwen 0.5B isn’t built with function calling in mind, but that is all right. I just wanted to know how quickly a model like this could respond to simple questions that I might ask a local voice assistant.

The Celeron N100 did a fine job. She responds to short questions almost as quickly as my cheap Google Assistant devices, though I was definitely cheating because there is no speech to text step happening here!

Does a Celeron N100 have enough horsepower to process audio and run an LLM?

Sort of. This is the first time I have ever bothered to turn on whisper_stt and either the coqui_tts or silero_tts engine. At first, I was a little disappointed. Then I switched whisper_stt from the small.en model to the tiny.en model.

My test was speaking the question, “How is the weather?” into my microphone. The tiny.en model could transcribe that audio in under 1.5 seconds. Then Qwen 2.5 0.5B could generate a response in roughly 2 seconds. The unfortunate part was that either of the included text-to-speech engines required dozens of seconds to generate audio to be played back.
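Adding up the stages makes it obvious where the time goes. Here is a small Python sketch of that latency budget. The first two numbers come from the test above, but the text-to-speech figure is an assumption: "dozens of seconds" is pinned at 24 seconds only so there is something to add up.

```python
# Rough per-stage timings in seconds.
# The text-to-speech value is a placeholder assumption, not a measurement.
stages = {
    "speech to text (whisper tiny.en)": 1.5,
    "LLM response (Qwen 2.5 0.5B)": 2.0,
    "text to speech (assumed)": 24.0,
}

def total_latency(stages):
    """Total time from the end of the question to the start of playback."""
    return sum(stages.values())

def slowest_stage(stages):
    """Name of the stage that takes the longest."""
    return max(stages, key=stages.get)

print(f"Total: {total_latency(stages)} seconds")
print(f"Bottleneck: {slowest_stage(stages)}")
```

With any plausible number plugged in for text to speech, that stage dwarfs the other two combined.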

Asking my local LLM on the Celeron N100 questions that it cannot answer

I am confident that the latter problem can be solved. I remember using tran.exe on my 7.16 MHz Intel 8088 more than thirty years ago. It may not sound as good, but we can definitely generate a voice response WAY faster than coqui_tts in 2024.

I understand that Qwen can’t tell me about the weather, but this is the sort of thing I ask Google Assistant all the time. How else will I know if it is raining? I can’t go ride my electric unicycle in the rain!

I was just trying to figure out if my Intel N100 mini PC would be capable of running a competent enough LLM to be the voice assistant for my home. I do believe that it could do the job. We would just need to find a tiny LLM that is tuned for function calling.

I have high hopes for the future of small models!

First of all, inexpensive computers are getting faster all the time. The Raspberry Pi was the best deal in tiny, power-efficient computers five years ago. Then the Pi got way more expensive, and mini PCs with the Intel N5095 started getting cheap. It is hard to beat an Intel N100 today for low-price, low-power, and relatively high-performance compute, but in a couple of years it will surely be displaced by a newer, faster generation of processor.

Smaller LLMs are getting smarter. I started my machine-learning journey a little more than a year ago messing around with Llama 2 7B, and it was barely competent at the blog-related tasks I was hoping to squeeze out of it. Llama 3 8B and Gemma 2 9B are both miles ahead and extremely useful.

As long as the small LLMs keep improving, and low-power mini PCs keep getting faster, we will reach a point where these meet in the middle before long. Then we will have a fast and capable voice-activated intern sipping 7 watts of electricity.

Conclusion

Did I find an answer to the question I didn’t even ask? Yes.

I was initially curious just how big of a large language model we could run on the Intel N100 mini PCs in my homelab, and that led me to wonder if this was enough horsepower to handle the hardest parts of what our Google Home Mini devices are doing. I think the Intel N100 is up to the task, and it isn’t even offloading the hard work to the cloud like the Google devices are.

I think I am going to leave this running. Having Qwen 0.5B loaded is only eating up 800 megabytes of RAM on one of my Proxmox servers. Maybe that will give me an excuse to poke at it every once in a while to see how useful it is, but I suspect that the high time to first token will keep pushing me away.

What do you think? Do you have a use case where you could tolerate waiting several minutes for a lesser LLM, or would you prefer to use the cloud? Do you have a use case for a small model on a power-sipping server? Tell us about it in the comments, or join our Discord community where we talk about homelab servers, home automation, and other related topics!

Can We Make A 33-Gram Gaming Mouse For Around $12?

UPDATE: This is a fun read, but don’t print this! I have put together a 21.5-gram fingertip mouse for the VXE Dragonfly R1 mice. This is built around a proper high-DPI, low-latency gaming mouse that starts at about $18. The $8 fake Logitech mouse isn’t even close.

I got here in a rather haphazard way. My inexpensive but surprisingly awesome 10,000 DPI low-latency Corsair Katar Pro wireless mouse was having connectivity issues. The same day, there was a deal on a Logitech G305 Lightspeed mouse for $30, so I immediately ordered one. These mice have a similar shape, similar weight, and the Logitech G305 might have slightly better latency and definitely has a way better sensor.

The skeletal fake G304 superlite mouse that I am actually using

This is the mouse I am currently using. It has the extra finger rest on the side, and that 0.17-gram bar across the front makes the buttons feel SO MUCH less flimsy!

It took less than 24 hours for the Logitech mouse to arrive, and I discovered in those 24 hours that my Corsair mouse was stuck in a low-power mode. I did some button-holding shenanigans while turning the mouse back on, and it is now happy. It now has a permanent home in my laptop bag.

One of the awesome things about the Logitech G305 is the ridiculous number of 3D-printed mods available on Printables and Makerworld. One was a mod inspired by Optimum Tech’s 26-gram wireless mouse. The trouble is that this Logitech G305 mod isn’t for a REAL Logitech mouse. I had to order a fake Logitech mouse to try it out!

I immediately printed the superlight skeletal mouse in PETG on my Bambu A1 Mini, and I ordered a pair of fake Logitech G304 mice from Aliexpress for about $8 each. I had to order two in order to get free shipping.

Can you do this mod without a soldering iron?

Maybe. It sure looks like the battery holder has a cutout that is meant for squeezing the spring-loaded battery terminal through, but I couldn’t manage it. Instead of forcing it, I spent 20 seconds desoldering and resoldering the ground wire.

Let’s get the disappointing part out of the way immediately

These are the best $8 mice I have ever bought. I beat the crap out of a Roboquest level on the Guardian 2 difficulty level. The plastic mold isn’t as nice as the real thing, but it is really quite a good fake!

The fake does not have a 12,000 DPI sensor. Not even close. It is also possible that the Logitech G305 has a much higher DPI than claimed. I had to drop my sensitivity in Roboquest from around 150 to 60 when I plugged in the real Logitech mouse. I had to raise the sensitivity all the way up to 450 when I tried playing with the superlight fake Logitech G304.

I don’t have scientific equipment on hand for measuring DPI accurately, but I did do enough messing around with the settings on my real Logitech G305 to say that the fake is probably using a 1,600 DPI sensor.
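The in-game sensitivity numbers point at the same figure. This is a rough sanity check in Python, assuming that cursor speed scales with DPI multiplied by sensitivity, and assuming the real G305 was running at its advertised 12,000 DPI. Neither assumption is confirmed, the sensitivity values are approximate, and `estimate_dpi` is a made-up helper name.

```python
def estimate_dpi(reference_dpi, reference_sensitivity, unknown_sensitivity):
    """Estimate a mouse's DPI from the sensitivity needed to match a known mouse.

    Assumes turn speed is proportional to DPI * sensitivity.
    """
    return reference_dpi * reference_sensitivity / unknown_sensitivity

real_g305_dpi = 12_000        # assumed: the advertised maximum
real_g305_sensitivity = 60    # Roboquest setting with the real G305
fake_g304_sensitivity = 450   # Roboquest setting with the fake G304

print(estimate_dpi(real_g305_dpi, real_g305_sensitivity, fake_g304_sensitivity))  # 1600.0
```

The ratio of 450 to 60 is 7.5, and 12,000 divided by 7.5 lands on 1,600.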

Even so, I played pretty well considering how weird my grip on this new mouse was. I might have been playing better than I was the night before.

The skeletal fake G304 next to a real Logitech G305 and a fake Logitech G304

I will need to play for a few weeks to understand how I feel, but here is the important part. While the raw specs of the $8 mouse are obviously disappointing, they aren’t disappointing enough in real life to immediately give up this project.

I should also mention that the product pages for all the fake Logitech mice made it seem like they work with Logitech’s software, and that made it seem like they might pair with real Logitech Lightspeed USB dongles. If that were the case, then they would also share Logitech’s low latency.

My fakes are in no way compatible with Logitech’s hardware or software.

I was hoping that the fake Logitech mice would be better than Bambu Lab’s mouse kit. It seems like they are quite similar, but at least the fake Logitech mouse uses much nicer switches!

How light is a 45-gram mouse?

I didn’t order any fancy skates, and my cheap PVC leather mouse pad isn’t exactly the most friction-free mousing surface. I just used my heat gun to loosen the adhesive on the skates that came installed on the fake mouse, and I stuck them where it made the most sense on the skeletal mouse.

My 45-gram Logitech G304 fake from Aliexpress

I did one thing that immediately made me understand just how light this skeletal mouse is even at its starting weight of 45.6 grams. I tried to scroll with the mouse wheel, but the mouse moved backwards instead of turning the wheel!

I have to have a bit of grip on the mouse to keep it in place while scrolling. The stiffness of the detents in the fake Logitech mouse’s wheel feels pretty similar to the real Logitech mouse.

My hopes and goals

I was hoping that the mouse would indeed have a 12,000 DPI sensor. I was already ham-fistedly tweaking the model in Orcaslicer to shave a few grams off before my mouse hardware even arrived. Removing the USB dongle holster, carving out a bit of excess material, and adjusting some slicing parameters immediately saved me nearly four grams.

That doesn’t sound like a lot, but my skeletal mouse started at 45 grams. That is already quite light, but that is almost twice as heavy as Optimum Tech’s $150 wireless mouse. I don’t expect to be able to shave 19 grams off this cheap setup, but I would be pleased if I could close half the gap.
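For anyone who wants the goal spelled out as a number, here is the arithmetic in Python. It uses the rounded 45-gram starting weight and the 26-gram figure for Optimum Tech's wireless mouse that appears elsewhere in this post.

```python
start_grams = 45.0    # skeletal mouse before any tweaks (rounded)
target_grams = 26.0   # Optimum Tech's wireless mouse, per this post

gap = start_grams - target_grams       # how far behind the cheap build is
halfway_goal = start_grams - gap / 2   # weight after closing half the gap

print(gap, halfway_goal)  # 19.0 35.5
```

Closing half the gap means landing at about 35.5 grams.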

I think it will be fun to tweak the 3D model to take off some weight, but I am not so sure I want to spend any more money on this project. It sounds like I can shave off another four grams with an upgrade to a USB-C rechargeable AAA battery.

Skeletal Logitech G304 Fake at 35 grams

You can see some printing imperfections on the mouse-button paddles due to pushing the number of top layers a little too low in order to save weight!

I wasn’t sure that I wanted to sink even a few more dollars into this 1,600 DPI mouse, but who am I kidding? Of course I ordered a four pack of 7.6-gram AAA batteries at $4.50 each. If I am going to mess around with an ultralight mouse, I may as well go as far as I can. Especially since it is only going to cost me $5.

I had assumed that I would build my own parametric mouse in OpenSCAD from the ground up if I enjoyed the skeletal mouse enough. It would be nice to be able to enter in angles and positions to help dial in the grip, wouldn’t it?!

While I enjoyed the idea of building that parametric model around the $8 mouse, it just is not a fast enough mouse to put that much design effort into. The real Logitech G305 might be worth that kind of effort. We will have to see how the long-term experiment goes!

I am still excited. Spending $8 to $12 on a 35-gram mouse is amazing, and the fake Logitech is performing well enough that I can’t complain too much.

Why isn’t 45 grams light enough?!

I bought my nice Logitech G305 for $30. I ordered a pair of $8 fake Logitech mice the next day. Then Woot had a refurbished Pulsar X2 Mini for $40. That is a super low-latency 26,000 DPI mouse that weighs only 51 grams, and it cost less than what I paid for my new mouse and the pair of experimental units.

The Pulsar X2 Mini isn’t a weird skeletal mouse. It looks and feels like a normal mouse. I feel that I have to beat that by more than six grams to justify the effort of using a strange mouse!

I imagine 45 grams is light enough. It just isn’t light enough for me.

Difference between 1,600 DPI and 12,000 DPI

My inexpensive Logitech G305 doesn’t even have a high-resolution sensor these days, but the fact is that the returns are diminishing as the DPI goes up. You probably don’t need a 50,000 DPI mouse, but it is definitely a bummer using a 1,600 DPI mouse.

When I set my real Logitech G305 to 1,600 DPI, you can see just how irregular the movement is in game when moving the mouse slowly. It seems identical to the $8 mouse.

3400 hours in Team Fortress 2

I am most definitely not a professional first-person shooter player, but I have put in a lot of time. I hope my skill level is at least a little above average!

I feel that this validates how awesome a lightweight mouse can be. I do not notice the jerkiness of 1,600 DPI while I am dodging bullets and twitching my aim all over the place. I do believe I notice a difference in how easy it is to get my crosshairs pointed where I want.

Maybe. I will play using my 33-gram mouse for a week or so. Then I will switch back to the 97-gram 12,000-DPI mouse.

I might be at the limit at 33 grams!

I removed the dongle holster. I took material off the side buttons. I reduced top and bottom layers throughout the model. I limited the 3D-printed part to one perimeter for the first two millimeters. The only place the mouse has grown is my addition of a spot to grip with my ring finger.

I don’t think I can remove any more material from this design without the buttons getting too wobbly. I have already tried and failed.

I left the battery cover off. That saved 1.5 grams.

I desoldered one of the side buttons that I don’t use. That saved me 0.6 grams. The mouse felt a little too floppy when I removed both. I opted to just solder one button back on instead of modifying the model.

My AAA Eneloop battery weighs 11.7 grams. I replaced that with a 7.42-gram lithium-ion AAA battery.

I reprinted the mouse skeleton in ABS while I was finishing up this blog post. That skeleton came in at 6.89 grams, which is a whopping 1.5 grams lighter than the PETG version!

The ABS version with one of the side buttons desoldered using a lithium-ion rechargeable AAA battery comes in at 33.6 grams. Should I reprint the battery holder to attempt to shave off another 0.25 grams?!
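Here is a rough Python tally of the savings that have numbers attached in this post. It is only a sketch: the "nearly four grams" of model and slicer tweaks is rounded to 4.0, the later layer and perimeter changes have no number at all, and the ABS figure may overlap with some of the earlier tweaks. It is not expected to match the scale exactly.

```python
start_grams = 45.6  # starting weight of the skeletal mouse from this post

# Itemized savings in grams. Only changes with a stated number are listed.
savings = {
    "model and slicer tweaks (roughly)": 4.0,
    "lithium-ion AAA instead of Eneloop": 11.7 - 7.42,
    "no battery cover": 1.5,
    "one side button desoldered": 0.6,
    "ABS skeleton instead of PETG": 1.5,
}

def estimated_weight(start, savings):
    """Starting weight minus every itemized saving."""
    return start - sum(savings.values())

print(f"{estimated_weight(start_grams, savings):.1f} grams")  # 33.7 grams
```

That estimate is within a fraction of a gram of what the scale says.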

Why Roboquest?!

Roboquest is fun, it is extremely replayable, and you have to move constantly and aim for small critical spots on the enemy robots. This seems like the right sort of game to put a 33-gram mouse to the test!

I played a lot of Roboquest when it was released, but I stopped playing because you can’t save your game. It is a fantastic game because it loads fast and you get right to the action, but it is a bummer that you have to commit 30 to 60 minutes on one complete run without stopping.

The recent update saves your game between levels. This means I can play for 5, 10, or 15 minutes then take a break, so I have been playing a lot the last few weeks!

What’s next?

Assembling this lightweight mouse dropped me directly into the center of a conundrum. I thought it might be fun to design my own ultralight mouse to fit my hand, but I sure don’t want to put any work into designing something to fit the $8 mouse if it is only 1,600 DPI.

Designing something around the real Logitech G305 might be interesting. It isn’t the fanciest wireless gaming mouse, but it is no slouch and often goes on sale for $30. Designing a lightweight skeletal frame for a proper low-latency mouse with an actual 12,000 DPI sensor would be fun, and it is even better if you can put the whole thing together for less than $40!

My skeletal superlight fake Logitech G304 at 33.46 grams

NOTE: I looked at some teardown photos of the real Logitech G305, and I am not certain that it’d be a good fit for a mouse of this style. The primary mouse buttons seem to be built into the top half of the shell. It would be easier to build something around a mouse where all the buttons are soldered directly to the PCB!

I guess this is where the conundrum lives. Putting the crappy mouse on a massive diet has somehow made for a fantastic FPS gaming experience. Every time I switch back to the heavy 12,000-DPI low-latency mouse, I feel like it is a way better experience. Then I try the 33-gram mouse again, and aside from the noticeably less fluid motion, I play and aim at least as well.

I am already at the point where I feel that I just HAVE to try a 33-gram skeletal mouse with premium electronics!

There are inexpensive, high-DPI, low-latency, ultralight mice on the market!

I have been searching for a nice, inexpensive mouse with the buttons attached to a single PCB just like the $8 mouse from Aliexpress, and I think I have found some good candidates.

Every mouse in the VXE Dragonfly R1 series looks awesome. The base model has great specs, weighs 55 grams, and only costs $19! The lightest mouse in the series has a better sensor, better electronics, weighs only 48 grams, and only costs twice as much at $40.

I think that sounds pretty fantastic without even modifying the mouse, and I am extremely excited about these prices. I hate the idea of asking you to buy and dismantle something like a $170 Razer V3 Pro only to learn in two weeks that the skeletal mouse frame just isn’t going to work for you.

I feel way better about the idea of asking you to take apart a $20 or $40 mouse!

UPDATE: I bought the R1 Pro. I designed a fingertip shell for the R1 Pro. The R1 Pro is awesome. Skip this fake Logitech mouse mod.

Conclusion

I am only on the first step of this journey, so I don’t have much advice yet. I can say that if you are on a really tight budget, or if you need a spare mouse to keep in your laptop bag, you can definitely do worse than grabbing one of these fake Logitech G304 mice from Aliexpress. I think you’d be better off spending a bit more on the real Logitech G305 if you can.

I can also say with certainty that just printing the skeletal shell from Makerworld and sticking the $8 mouse guts inside has been a fun project all on its own. If you are looking to have some fun and wouldn’t mind spending eight bucks on a 3D-printing experiment, then you shouldn’t wait for me to figure out the next step in my journey. Just go buy the cheap mouse from Aliexpress and have some fun!

I would like to hear from you! Are you using something like Optimum Tech’s 16-gram wired mouse or 26-gram wireless mouse mod? Are you already using a more mainstream lightweight mouse? Or are you like me, and you’ve been using a heavy 95-gram mouse all this time? Tell me about your gaming setup in the comments, or stop by our Discord community and tell me about your mouse!

My Networking and NanoKVM Pouch For My Laptop Bag

In days that have long gone by, I used to carry all sorts of useful gear in my laptop bag. At the time, I was working a job where I would spend most of my time at my desk, some of my time in meetings, but also a not insignificant amount of time fiddling with things in our ice-cold datacenter. I would actually make use of the tools and gadgets fairly regularly.

Then 9/11 happened, the TSA sprang into existence, and I had to be careful to remove things like my Swiss Army CyberTool from my laptop bag before flying.

My network and NanoKVM pouch

Side B of my network and NanoKVM pouch. This is the more interesting side!

Fast-forward twenty more years, and I don’t often need all the cool tools and gear that I used to carry, but I can’t stop myself from wanting to carry them. Not only do I want to tote around everything I might possibly need, but now I use two or three different laptop bags.

When I ride my wheel to the park, I might take my small shoulder bag that only has room for the essentials, and I don’t want extra gear weighing me down. When I take my laptop in the car, I am way more likely to take my bigger backpack that has room for everything I could possibly need, and the car doesn’t care if I pack an extra five pounds.

Should I buy two of everything and keep both bags well equipped? Should I weigh the giant bag down with every possible piece of gear? I’ve decided this is a bad idea. Some of the equipment that I carry isn’t exactly cheap, but I also don’t want to fill the bag any more than I have to, because I often grab my laptop bag when I want to also carry something that ISN’T my laptop. Having a big empty pocket in my bookbag to stick a PlayStation 4, a bag of CNC-cut parts, or a pair of pants is handy!

Pouches seem like the best option

You can use a 3D printer or a CNC router to make awesome hard cases with intricate slots that perfectly fit all your tools, toys, and connectors. The trouble is that when you make things exactly the right size, then they may only fit that one specific version of an item. If you lose your mini screwdriver, or you upgrade your gigabit Ethernet USB dongle to 2.5 gigabit Ethernet, your new device might not fit in your toolkit anymore.

You can squeeze just about anything you need into a pouch. I believe I chose a pouch that is a little too small for my extra networking gear. I probably should have bought the slightly bigger pouch.

Side A of the network and NanoKVM pouch

This is Side A. This is where the cables and couplers live.

If you underpack your pouch, the air doesn’t take up much space. That means you can probably squeeze it into a smaller space inside your backpack. A CNC-machined case made out of hardwood always takes up the same amount of space no matter how much stuff you put inside.

My pouch weighs in right around 18 ounces. That doesn’t seem like a ton of stuff, and it wouldn’t weigh down my small laptop bag all that much. This would have seemed like absolutely no weight at all a decade ago, but this pouch weighs half as much as my entire 14” 2-in-1 laptop. Why lug it around when I don’t need it?

My plan was to pack two separate pouches!

I planned to pack a NanoKVM pouch to replace my massive Pi-KVM kit, and I planned to pack something that I was calling my networking pouch.

If the latter pouch just had a long flat-pack Ethernet cable and a 2.5-gigabit Ethernet USB dongle, I would have just put one of each in both my laptop bags. I was also including a travel router, extra cables, power for the GL.iNet Mango router, and my smallest USB battery bank. That stuff alone grew into a bulky enough assortment that I didn’t want to duplicate it for both laptop bags.

My network pouch next to a Raspberry Pi 5

The new pouch is quite a bit bigger than the pouch from the NanoKVM blog, but it holds so much more gear now!

It made sense to me to separate out the NanoKVM, but then I realized that the NanoKVM is ridiculously tiny compared to what I needed to carry to use a Pi-KVM. I also recognized that there would be a lot of cable overlap between these two kits, and I was also excited about the idea of being able to add WiFi to the NanoKVM using the Mango router in a pinch.

One bigger pouch seemed smart, and this kit would be way more useful to anyone who needs to borrow it!

Where do you get an organizer pouch?

I don’t know about you, but I have all sorts of zippered containers tucked away in my closet. So many toys, gadgets, and headphones come with nice storage and travel containers that I never use.

I was using the large zippered pouch that came with my awesome Amazon box teardown tool to hold my bulky Pi-KVM kit. Two decades ago, I started using the little pouch that came with a Norelco shaver to keep network converters, USB adapters, and other small things from spilling out into my laptop bag.

Test fitting my network pouch

The Mango and NanoKVM sure didn’t want to stay put in that giant pocket! Yuck!

There are handy containers everywhere. When I started organizing my network-only toolkit, I was using a pouch that came with a diabetes blood sugar testing kit. It was actually a fantastic size! The trouble was that it didn’t have enough pockets or straps to hold all my stuff, so I was just kind of stuffing it all in there. When you just stuff things into the pouch, everything wants to fall out when you unzip it!

I wound up buying a $9 BAGSMART organizer from Amazon. I chose it because of the size of the pockets inside, and it was two inches taller but maybe an inch thinner than the pouch I was already using. This worked great as my network-only pouch, but there wasn’t nearly enough room to add a NanoKVM and a thick HDMI cable.

Then I found a pouch that was roughly double the size: same length, same width, but it has two zippers. I chose the “small” model, because the dimensions were almost identical to my BAGSMART pouch. That and the photos made me believe the pockets would be roughly the same size. They aren’t even close. While the BAGSMART pockets were big enough to fit the GL.iNet Mango, the new pouch’s pockets weren’t even close.

3D Printed Gear Rack

My 3D-printed gear rack. The little ears on the sides slide into the large pocket to keep the whole thing in place.

Maybe stepping up to the “medium” organizer pouch would solve this problem, but I was already committed. I borrowed a design idea from Earthling EDC, and I 3D-printed a gear rack that I could strap the Mango, the NanoKVM, and my smallest USB power adapter to. This is great for me, but it might be better for you to just buy the bigger pouch!

What is in my pouch?

Should we just start with a bullet-point list?

I know I will never use the crossover or rollover cables again. I have had them in my laptop bag for nearly 20 years, and they used to come in handy all the time. I can just plug them into one of the RJ-45 couplers, and I can use any available network cable as a long crossover or rollover cable.

Just about every network device has supported automatic MDI-X for a long time, so I couldn’t even guess when I last needed to use a crossover cable. I don’t think I have seen a rollover serial port in just as long, and I may never see one again. Neither of these cables takes up much space.

I imagine the use case for the pair of Ethernet cables would be obvious. Carrying two cables is handy. That way, I can plug my laptop into the router, and I can plug the router into the wall. I could also use one of the handy RJ-45 couplers if I need 15’ of reach.

Flat-pack network cables are awesome at home or on the road. They take up less space in your bag, but they’re also easier to tuck in along the edge of a room if you have to run a cable across your home office.

There are some extremely thin HDMI cables on Amazon, but I haven’t tried them. I believe the HDMI cable that I packed in my kit shipped with the Steam Link. It isn’t quite as thin as the newer cables on Amazon, and I have to admit that I am tempted to tidy up my bag a bit with a new cable, but Valve’s HDMI cable is thin enough that it isn’t taking up a ridiculous amount of space.

Why carry a travel router? And why the GL.iNet Mango?

This entire pouch would be so much smaller and simpler if I didn’t pack the travel router, but this little OpenWRT router is one of the coolest little network tools that I own!

The Mango is one of the cheapest and smallest OpenWRT routers out there. The diminutive size is awesome for this pouch, but sometimes cost is just as important. I used to wind up leaving a lot of things behind. You might replace somebody’s bad network cable with one you have in your bag. You may give away your USB flash drive full of operating system installers just so someone can sneakernet some data. There’s a good chance I will leave a cheap bit-driver set behind for someone who doesn’t have any tools.

I figure that there is a good chance I might leave my travel router behind. I might leave it for friends to use, or I may leave it behind for more nefarious purposes!

I bought my Mango on a whim for $20 a few years ago. It is pretty underpowered. It only has 2.4 GHz WiFi, the LAN and WAN Ethernet ports are only 100 megabit, and it only has enough horsepower to run Tailscale’s encryption at about 8 megabits per second.
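To put 8 megabits per second in perspective, here is a quick Python sketch of how long a transfer would take. It uses decimal units and ignores protocol overhead, so treat the results as best-case estimates. The 1-gigabyte file size and the `transfer_seconds` helper are made up for illustration.

```python
def transfer_seconds(size_gigabytes, link_megabits_per_second):
    """Seconds to move a file, ignoring protocol overhead (decimal units)."""
    bits = size_gigabytes * 8_000_000_000
    return bits / (link_megabits_per_second * 1_000_000)

print(transfer_seconds(1, 8))    # over Tailscale on the Mango: 1000.0
print(transfer_seconds(1, 100))  # over the 100 megabit Ethernet ports: 80.0
```

That is nearly 17 minutes per gigabyte through Tailscale, which is plenty for a remote shell or a KVM session but not for moving big files.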

Mango and Slate AX

Just the GL.iNet Mango and Slate AX for size comparison

That last part is the exciting bit. I can plug this into a network jack at an undisclosed location. I could power the Mango with my USB power bank, connect it to the Starbucks WiFi, and connect to it remotely via Tailscale to use as a jump host or an exit node.

I can set up an impromptu WiFi network so I don’t have to stay close to the wall. I can loan it to a friend if they need remote access to their own network. In fact, I have the admin credentials, WiFi password, and subnet information printed on the bottom of the router. That makes it easy to hand the Mango over to someone else who might need to use it.

The convenient interface that GL.iNet puts in front of OpenWRT is a delight. It takes one or two clicks to do things that would take dozens of clicks in OpenWRT’s LuCI GUI, but they still give you access to LuCI on the advanced tab.

Shoehorning Tailscale onto the Mango was a bit of a challenge. It just doesn’t have enough flash storage on board to even store the binaries! I wound up storing Tailscale on a petite USB flash drive. That wound up being handy. I can just pull the drive out if I don’t want the router to be able to connect to my Tailnet, and I used the remaining 15.8 gigabytes on the flash drive to store ISO images that are all bootable using Ventoy.

The NanoKVM does not have a WiFi chip. This is probably the best reason I had to combine the NanoKVM stuff into the same pouch as the GL.iNet Mango. I can always plug the NanoKVM into the Mango, then configure the Mango to bridge to either nearby WiFi or my phone. That would allow someone remote access to the KVM without needing to find an RJ-45 jack.

Should you buy a GL.iNet Mango?

Probably not, but maybe! You can often find the Mango on sale for $23, but you can usually find the much more modern SFT-1200 Opal router on sale for $31.

I like my Mango. The Mango is still cool, and it is still usually $10 cheaper than the Opal!

My Mango is less than a quarter the size and weight of the Opal, but it is old and slow. The Opal could easily handle router, firewall, and probably QoS duties for my home’s 1000/1000 fiber internet connection. I also bet that pair of external antennas attached to the Opal’s much more modern WiFi chipset offer better range.

My GL.iNet Slate AX and my network pouch

My Slate AX next to my network pouch as a stand-in for the Opal that I don’t own!

The Opal is an upgrade by every measure except size and weight, and those upgrades over the Mango are worth way more than the extra $10 you have to pay for one.

I don’t have an Opal SFT-1200 here, but I do have a similarly sized Slate AX GL-AXT1800. The Slate would take up an entire half of my pouch, and it wouldn’t even be able to close correctly. I feel like this would be a very different and much larger kit if I were to upgrade it to the Opal.

Do you even need a network kit like mine at all?!

Most people don’t. Almost everyone is always going to just connect to WiFi, and if they can’t connect to WiFi, then they are going to just give up. That describes me in 99% of situations.

You are here reading this blog post, and you are still reading. You might do the sort of work that requires you to crawl around under desks, hang out in wiring closets, or spend time in heavily air-conditioned server rooms. I suppose the last one doesn’t happen much anymore, and that makes me a little sad.

If you do this sort of stuff all the time, then your laptop bag is already packed with the tools you need. Maybe you are like me, and you used to need these sorts of items regularly, but now you only need them occasionally. Your pouch may not look exactly like mine, but I bet it is a good idea to separate things into their own container just so you don’t have to carry them around all the time.

Separating these things out into their own pouch is also handy because I can easily lend them to a friend or coworker.

Conclusion

Pouches are awesome. I think that’s the most important thing I have to say here. As long as the things inside aren’t delicate, they are a great way to stuff more stuff in your bookbag.

I am still tempted to create a custom hardwood tool case for my laptop bag with perfectly machined slots for all my important tools. It would be a delightful project for my CNC machine, and it would be so much fun to show something like that off, but it would only be able to hold a particular set of tools.

I think that you should build out your own networking toolkit pouch, but I don’t want you to copy mine! I would like to see you pick and choose the tools you will actually use, and I would love to hear what you equip your toolkit with!

Did I pick out a good set of tools for myself? Or did I miss something important? Is there an awesome device, cable, or tool that I should have included in my NanoKVM and networking pouch? Tell me about it in the comments, or join our Discord community and tell us about the tools we are missing out on!

Degrading My 10-Gigabit Ethernet Link To 5-Gigabit On Purpose!

It has been about eight weeks since I upgraded the core of my home network. I added a Mokerlink 2.5-gigabit switch to both my network cupboard and my home office on the opposite side of the house. The 5-port switch in my office has a pair of 10-gigabit SFP+ ports, while the 8-port switch in the cupboard has only one. This gives me a nice, fast connection between all the devices in my office and my network cupboard, and a cheap 10-gigabit Ethernet card in my desktop PC has given me a full 10gbe link to the other side of the house.

The specifications for 10-gigabit Ethernet require Cat 6A wiring to reach the full distance, but you can use Cat 5e or Cat 6 depending on the quality of your wiring. I made sure to order a set of three Xicom 10-gigabit SFP+ modules with the hope that they would fall back to 5 gigabits if they weren’t able to negotiate 10 gigabits across the existing Cat 5e wiring.

Testing the limits of my 10 gigabit Ethernet connection

I have to admit that I wasn’t terribly surprised when everything lit up at a solid 10 gigabits per second. I definitely have less than 100’ of cable between my two switches, and that isn’t terribly far!

I was packing up a pouch for my NanoKVM kit with things like flat-pack Ethernet cables and RJ-45 couplers. I have way too many couplers, because I needed two or three but had to buy a 10-pack! I also have plenty of extra 10’ flat-pack cables. I wound up having a fun idea while staring at them.

How many 10’ cables and couplers can I add to this 10-gigabit Ethernet run before it degrades to 5gbe? An even more important question might be whether it will even attempt to connect at 5 gigabits per second!

How about a tl;dr?!

I don’t want to keep you here longer than necessary. There is currently something like 70’ of Cat 5 and Cat 5e cable between my two switches, and they have been happily talking at 10 gigabits per second for months.

That trend continued after adding two 10’ flat-pack Ethernet cables and two RJ-45 couplers to that chain of wiring. Adding a third 10’ cable and coupler dropped the connection down to 5gbe.

That doesn’t mean that 90’ or 100’ is the actual limit for my gear. Some of the reasons why are explained farther ahead. I wasn’t trying to prove how long of a Cat 5e run I could have gotten away with. I really just wanted to see that these Xicom SFP+ modules would actually gracefully degrade to 5gbe.

I was more than a little worried that someone would follow my example, buy these SFP+ modules, then learn that they foolhardily attempt to maintain a 10gbe connection even if it was full of errors and retransmissions. I was pleased to learn that they did what they were supposed to do!

The correct way to do this test

I am not prepared to do this the right way because I am pretty sure I left the remainder of my last 1000’ spool of Cat 5e cable at Brian Moses’s house. Even if I still had the spool of cable here, there might not even be enough cable left to successfully break the 10-gigabit connection.

I’d just crimp ends on both sides of the spool and see if adding that length dropped the connection to 5gbe. Then I would cut 10’ or 15’ off the cable, crimp on a new end, and repeat until we see it go back up to 10gbe.

That is about the cleanest way to see exactly how far my setup might be able to go before degrading. I did not do this.

What did I do?!

I grabbed four spare RJ-45 couplers. I collected four spare flat-pack patch cables. I daisy chained them all together, and I put the extra 40’ of cable and couplers in between the switch and the yellow patch cable that connects the 10-gigabit port to the wall.

My iperf3 performance immediately dropped to 4.86 gigabits per second, so I think I did a pretty good job, and I am excited to see that I chose a useful pair of SFP+ modules for this task.
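If you would rather script this check than eyeball it, iperf3 can emit a JSON report with its -J flag. This is only a minimal sketch. It assumes the report carries the receiver-side throughput at end.sum_received.bits_per_second, as TCP tests normally do. The sample report and the nearest_link_rate helper are made up for illustration, and a throughput number can only hint at the negotiated speed.

```python
import json

# Standard Ethernet rates in gigabits per second.
LINK_RATES_GBPS = [1.0, 2.5, 5.0, 10.0]


def throughput_gbps(report: dict) -> float:
    """Pull the receiver-side throughput out of a parsed iperf3 -J report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


def nearest_link_rate(measured_gbps: float) -> float:
    """Guess the link rate: the slowest standard rate that the measured
    throughput still fits under. This is a heuristic, not a measurement."""
    for rate in LINK_RATES_GBPS:
        if measured_gbps <= rate:
            return rate
    return LINK_RATES_GBPS[-1]


# Hypothetical, trimmed-down report. Real iperf3 output has many more fields.
sample = json.loads('{"end": {"sum_received": {"bits_per_second": 4.86e9}}}')

measured = throughput_gbps(sample)
guess = nearest_link_rate(measured)
print(f"{measured:.2f} Gbit/s measured, so the link is probably {guess:g}gbe")
```

A slow CPU or a busy network can also hold throughput below the link rate, so treat the guess as a hint and confirm the negotiated speed on the switch itself.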

AI Excited guy holding Ethernet cables in his home office

This is just about the worst way you can connect two Ethernet ports. Every RJ-45 connection degrades the signal. The pairs of wires inside the RJ-45 couplers ride along a PCB, those pairs almost definitely aren’t going to be twisted, and who knows if my cheap couplers from Amazon are even any good?!

The fresh 10’ cables are in perfect coils, and the older ones are manually wound up by yours truly. That is absolutely horrible for noise. If we had the gumption to straighten these cables out, then we might be able to go farther before degrading to 5gbe.

What were the results of the test?!

Adding any two of my 10’ cables and couplers to my 70’ of existing wiring had no trouble running at 10gbe. Adding a third cable would drop things to 5gbe, and it would stay at 5gbe with a fourth segment. I doubt I have enough cables and couplers to get the connection to drop any lower, but my suspicion is that it would have dropped all the way to 1gbe if it couldn’t connect at 5gbe.
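Here is a tiny sketch that tabulates those results. The 70’ figure is only a rough estimate of the run in the walls, each extra segment counts as one 10’ cable plus one coupler, and the variable names are just for illustration.

```python
EXISTING_RUN_FT = 70   # rough length of the in-wall Cat 5 and Cat 5e run
SEGMENT_FT = 10        # one flat-pack patch cable plus one coupler

# Link speed in gigabits per second observed for each number of added segments.
observed_gbe = {0: 10, 2: 10, 3: 5, 4: 5}


def total_length_ft(segments: int) -> int:
    """Approximate end-to-end cable length with the extra segments added."""
    return EXISTING_RUN_FT + segments * SEGMENT_FT


for segments, speed in sorted(observed_gbe.items()):
    print(f"{segments} extra segments (~{total_length_ft(segments)} ft): {speed}gbe")

# Smallest number of extra segments where the link fell below 10gbe.
first_drop = min(n for n, speed in observed_gbe.items() if speed < 10)
print(f"The link first dropped with {first_drop} extra segments")
```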

I hope I have already emphasized enough that this is a terrible test of the maximum distance I should be able to get over Cat 5e cable. I am most excited that I finally got to see the Xicom modules connect to each other at 5 gigabits per second.

I bought these modules with the hope that my wiring would be able to negotiate 5gbe if it couldn’t manage 10gbe, and I was also hoping that would make these inexpensive SFP+ copper modules a good recommendation for other home gamers.

Being able to demonstrate that they worked at full speed over 70’ of old cable was nice, but I feel better being able to tell you that they do indeed degrade to 5gbe just fine!

Conclusion

I think it is best for me to repeat much of what I said in the MokerLink blog post. I don’t think you should be upgrading lots of your home gear to 10-gigabit Ethernet unless you are running into a particular bottleneck or have money burning a hole in your pocket.

I absolutely think you should be working on upgrading to 2.5gbe. 2.5-gigabit switches, PCIe cards, and USB dongles are almost as cheap as 1gbe hardware now. Even if you aren’t yet replacing old 1gbe stuff, it is probably time to make sure you’re buying hardware with 2.5-gigabit ports. Eventually your chicken-and-egg problem will work itself out naturally.

While I don’t think you should be upgrading everything to 10 gigabits, I do believe you should be shopping for 2.5-gigabit switches with one or two 10-gigabit ports. Being able to connect my home-office to the opposite side of the house with a fast link is awesome, it didn’t cost all that much, and the reasonably priced Xicom 10-gigabit SFP+ modules should still get you at least a 5-gigabit connection even if your aging wiring can’t manage full speed. That is still a nice upgrade!

Curious about trying out your own setup or have questions? We’d love to see you join our Discord community. It’s a friendly space where we chat about home networking, servers, 3D printing, and help each other out. Whether you’re just starting or already a pro, there’s a spot for you!

The Sispeed NanoKVM Lite Is An Amazing Value

I was over at my friend Brian Moses’s house for pizza last night, and as soon as I walked in the door, he handed me one of the NanoKVM units that he bought during Sispeed’s crowd-funding campaign. I ate my delicious pizza, hurried home, and immediately dug out a microSD card to flash with Sispeed’s boot image.

I have been a huge fan of the Pi-KVM for years. Aside from the huge price difference between a NanoKVM and a Pi-KVM setup, I do have one big complaint about my Pi-KVM, but we will talk about that later.

My NanoKVM Lite in a 3D Printed case

This case fits well but is a little tight. I would stretch it on the long axis by about 0.5 mm if I had to print it again. I assume there are slight differences between the earlier units and the production run.

I am most excited about the NanoKVM’s price. We were using USB HDMI capture cards to put together our early Pi-KVM kits. The entire NanoKVM Lite doesn’t cost all that much more than the cheapest HDMI capture cards, and you don’t have to buy an expensive Raspberry Pi to plug the capture card into.

You just spend around $30 on a NanoKVM Lite, and you have a zippy little IP KVM.

What is an IP KVM?

KVM stands for keyboard, video, and mouse. An IP KVM gives you remote access to a computer as if you were sitting at a local USB keyboard. You can pick which kernel you want to boot from Grub. You can enter the BIOS and change settings. You can even connect a virtual USB disk image to load a fresh operating system.

Brian and I talked about the Pi-KVM and what an IP KVM is at length in this very old episode of “The Butter, What?! Show!” It is an older episode, so it isn’t as well polished!

The fancier devices, like the full-size $60 NanoKVM, can be connected to your server’s power and reset buttons.

The goal is often to never use your KVM. It is just there for when something goes wrong.

My initial experience with the NanoKVM

I plugged in my fresh microSD card, connected the NanoKVM’s HDMI and USB ports to one of my Intel N100 mini PCs, plugged the NanoKVM into my 2.5-gigabit switch, and I was ready to go. I was watching the DHCP server on my router for a new IP address to show up. I punched that into my browser, logged in with the default credentials, and the KVM interface was right there.

I did have to upgrade the software before the KVM parts actually started working. That seemed weird because I downloaded the latest disk image. The update only took a minute or two, and after that, I was clicking around on my Steam gaming mini PC via my network.

I don’t have a way to measure the latency accurately, but it feels more than acceptable.

NanoKVM Lite GUI

My first successful connection to my Steam mini PC via the NanoKVM interface

I was excited to see that Tailscale is an option in the menu on the NanoKVM. I just had to click through a few messages before the interface gave me a Tailscale URL to authenticate my NanoKVM to my Tailnet. It probably took less than a minute to go from clicking that Tailscale button to connecting to my NanoKVM via Tailscale.

The hardware in the NanoKVM is quite underpowered. It does a good job doing its KVM duties, but the latency sure builds up when you add Tailscale’s encryption. I wouldn’t have any trouble using this IP KVM at a remote location via Tailscale in an emergency, but I would prefer to only use it just long enough to get proper network access back!

My problem with the Pi-KVM

This might be a problem unique to my own older Pi-KVM. I don’t have the $200 Pi-KVM hat. I have a $12 USB HDMI capture card that has to be plugged into the correct port, and I have a little USB-C power and data splitter board.

That means I need a power brick with a USB-C cable to connect to my splitter. I need a second USB-C cable to power the Raspberry Pi, then I need a third USB-C to USB-A cable to connect the splitter to the server.

Idiocracy GIF

Meme gif from yarn.co

This isn’t terrible, but the labels on the splitter board aren’t great. It is easy to use when you plug things in on a regular basis, but what about when Brian asks you to plug a Pi-KVM into his off-site NAS 18 months after the last time you used your Pi-KVM? You will have no idea what goes where, and you’ll be running back and forth between the server and your laptop, repeatedly checking whether you got things right or not.

The NanoKVM solves this problem so well. It gets power from the server’s USB port, so you don’t even need to carry a power supply, and it is ridiculously obvious where you need to connect the HDMI and Ethernet ports.

I didn’t have to reference any documentation to plug my NanoKVM in last night, and I won’t have to reference any documentation when someone has a problem in six months.

Why not just buy an enterprise server or motherboard with built-in IPMI?!

Brian and I talked about this a lot when the Pi-KVM was brand new, and we definitely discussed it on The Butter, What?! Show! on more than one occasion, but what is IPMI?

Among other neat features, modern IPMI implementations include IP KVM and remote power on and restart functionality. You can basically have what the NanoKVM provides built right into your motherboard.

The trouble is that motherboards with IPMI are expensive. We were excited about the Pi-KVM when it came out because you could buy a Pi-KVM and a consumer-grade motherboard for around the same price as the least costly motherboards with integrated IPMI. The exciting part about the Pi-KVM was that you don’t lose your remote access hardware when you buy your next server, so you don’t have to pay the IPMI tax again. You can just move your Pi-KVM to the next machine you buy.

The NanoKVM is an even better value. Instead of buying a $450 motherboard with integrated IPMI, you can buy a $100 or $200 motherboard and add your own remote KVM access for $30 to $60. Not only does that wind up saving you money, but you will get to use that NanoKVM on your next motherboard as well!

The NanoKVM is priced well for your Mini-PC homelab!

I have migrated from a single large homelab server to a short but growing stack of mini PCs. I currently have three mini PCs running at home, and any single one of those mini PCs is about as powerful as the aging server that they replaced, yet all those mini PCs combined use less than half the electricity of the old server. It is pretty awesome!

There was no way I would ever consider putting a Pi-KVM onto each of those mini PCs. A working Pi-KVM unit costs MORE than any one of those mini PCs. Even a single Pi-KVM with a 4-port KVM switch box attached would cost more than any single mini PC in my homelab.

The NanoKVM Lite is around $30. I wouldn’t be opposed to plugging one of these in to each of my three $150 mini PCs, though I am not excited about adding all those extra cables to my home setup. It would be way cleaner to just walk to the network cupboard with a NanoKVM when there is an emergency.

I want more NanoKVMs!

I am not convinced that I want to add three thick cables to my homelab for every mini PC. That will add to the clutter so fast, but these are so inexpensive that I am considering other weird uses where I would never want to leave a dedicated $300 Pi-KVM.

When I was setting up the NanoKVM and messing around with my spare Trigkey mini PC, I immediately realized how handy it would be to have a permanent NanoKVM wired up in my office. I just want a dedicated space in the corner with power, Ethernet, and the NanoKVM where I can drop a new or problematic PC, server, or laptop so I can work on it from the comfort of my desk.

NanoKVM on my Tailnet

The physical device doesn’t even need to be at my desk. It can be sitting on the other side of the room. How cool would it be if I could install Proxmox on a new machine from the keyboard at my desk without having to finagle a Pi-KVM or run cables across the room?!

One for my office. One to leave partially wired in the network cupboard. One for my laptop bag. One for my emergency network pouch. There are all sorts of places where these would be handy, and they cost so little that I COULD put one in all of these places!

I have some concerns!

First of all, I have no idea which parts of the NanoKVM are open-source. It does seem as though much of the project is open now. The firmware image I downloaded came from a repository on Github that says it is licensed under the GPL version 3. I assume that is correct, but my NanoKVM wouldn’t display any video until I hit the button to download an update. Maybe that means some core part of the KVM stack is still closed-source? This isn’t a huge concern to me at this point, but I plan to keep a closer eye on this as I figure things out.

There are people losing their Tailscale connection when Tailscale gets updated, and there are others who are having their Tailscale client killed by the OOM killer. The NanoKVM only has 128 megabytes of RAM, so this isn’t surprising, but it is extremely concerning.

Omnigen Pat With A Giant NanoKVM

I thought it would be fun to have Omnigen put me in a datacenter with a NanoKVM. Look at the size of that thing!

I’d like to be able to deploy these NanoKVM devices remotely where they can only be accessed via Tailscale. I will be in big trouble if Tailscale decides to crash shortly before an emergency! You might want to leave a NanoKVM running in an accessible location for a few months before you commit to hosting one off-site.

I also noticed that the NanoKVM accepts root logins via SSH using a password on the Ethernet interface. The default password is root, and it isn’t tied in any way to the web interface’s admin password. The web interface and documentation don’t make it obvious that this password even exists. I disabled password logins via SSH, but that breaks the web interface’s terminal access.

This is an easily solvable problem, but it is also a huge gaping hole. How many NanoKVM devices are plugged in right now with an open SSH port and an obvious root password?
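If you want a quick way to see whether a device on your own network is still listening for SSH, something like this minimal sketch would do. It only checks whether the TCP port accepts a connection. It does not attempt a login, and the ssh_port_open name is just something I made up for the example.

```python
import socket


def ssh_port_open(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling ssh_port_open with the address that your DHCP server handed to the NanoKVM tells you whether the port is reachable from where you are sitting. Only point it at devices you own.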

Conclusion

I am excited! My Pi-KVM kit, a.k.a. my portable crash cart, is now a fraction of the size. I was able to squeeze the NanoKVM Lite, an HDMI cable, a network cable, a USB cable, and a few odds and ends into a pouch that is only twice as big as a Raspberry Pi. I am probably down to one third of the size and weight of the old kit, and the whole thing is about one tenth the price of either a TinyPilot or Pi-KVM build. How awesome is that?

My NanoKVM pouch vs. a Raspberry Pi 5

My entire NanoKVM pouch with an HDMI cable, network cables, and all sorts of adapters vs. just a Raspberry Pi 5!

I think Sispeed has a little ways to go on their software with regards to both security and reliability. I can tighten up the security and the firewall, but I need a Tailscale client I can trust if I am going to rely on a NanoKVM at a remote location. I am confident they will get there.

I want to hear what you think! Are you as excited about a $30 IP KVM with a built-in Tailscale client as I am? Are you going to be ordering a few? How many do you think you will need, and what will you be using them for? Let me know in the comments, or join our Discord community to chat about it. We have a few NanoKVM users in there already, and several more waiting for orders to arrive!

Deciding Not To Buy A Radeon Instinct Mi50 With The Help Of Vast.ai!

I am quite grumpy about both cloud AI and local AI. I just can’t figure out where I want to land, and I don’t even want to choose just one or the other. I am happy to use the cloud where it makes sense, then use a local LLM or image generator where that might work out better. The trouble is that none of it makes sense!

I have been interested in the idea of grabbing a used server GPU on eBay for a long time. The 24 GB Tesla P40 used to be a great deal, but they increased in price from $150 to nearly $400 almost overnight. They still might be a reasonable deal at $400 if you really do need 24 GB of VRAM, but the even older 24 GB Tesla M40 can be had for less than $100.

Pat with a GPU

This is an AI-generated image of me holding an imitation of my MSI Radeon 6700 XT generated using OmniGen V1 at fal.ai. I tried to generate a picture of me with a Radeon Instinct Mi50, but that didn’t go so well!

My friend Brian Moses beat me to that, and his results were more than a little underwhelming. Don’t get me wrong here! What you get for $100 is fantastic, but my 12 GB Radeon 6700 XT in my desktop is significantly faster. I feel like I am better off dealing with the LLM startup times just so I can keep my performance.

Why is the 16 GB Radeon Instinct Mi50 interesting to me?

The Tesla GPUs are awesome because they run CUDA. That makes everything related to machine learning so much easier. Some software won’t even run with AMD’s ROCm, and AI software tends to run slower on ROCm even when AMD’s hardware should be faster.

That means that there are things that just won’t currently run on an Instinct enterprise GPU, and getting the things that will work up and running will be more work. I am already running a ROCm GPU, so I am confident that the things that I already have working could be made to work easily enough on a server with a Radeon Instinct GPU.

I would love to have 24 GB of VRAM, but I am mostly happy being stuck with my current 12 GB, so the 16 GB Radeon Instinct ought to be comfortable enough. The Mi50 has a massive 1 terabyte per second of VRAM bandwidth. That is three times more bandwidth than my GPU or either of the reasonably priced used Nvidia Tesla GPU models. That should be awesome, because LLMs love memory bandwidth.

There are currently several 16 GB Instinct Mi50 GPUs listed on eBay for $135. They will require a custom cooling solution, but I have a 3D printer, so that shouldn’t be a problem. There are 32 GB Instinct Mi60 GPUs with slightly more performance starting at around $300. I think that is quite a reasonable price for 32 GB of VRAM, but I don’t even know if these Instinct GPUs will be suitable for my needs, so I have only been looking at the 16 GB cards.

Why is 16 GB enough VRAM for my needs?

I am using a 5-bit quantization of Gemma 2 9B to help me write this blog post. It is definitely up to the task. This model uses around 9 gigabytes of VRAM on ROCm with the context window bumped up to 12,800 tokens. Dropping down to a 4-bit quant and lowering the context window a bit would allow this model to fit comfortably on a GPU with 8 gigabytes of VRAM.

Older Stable Diffusion models run great on my current GPU, and my settings give me decent images at around 20 seconds each. I have to run Flux Schnell’s VAE on the CPU, and I can generate decent 4-step images in around 40 seconds. I believe I am using a 4-bit quant of Flux Schnell.

AI Guy holding a floppy disk in a datacenter

Squeezing all of Flux Schnell into VRAM would be a nice upgrade.

My Radeon 6700 XT has roughly 35% faster gaming performance than an Instinct Mi50 according to Tech Power Up’s rankings. My hope was that the Mi50 would process more tokens per second since LLMs are usually limited by memory bandwidth, and the Mi50 has boatloads of bandwidth.
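The reasoning behind that hope can be sketched with a common rule of thumb: if generating each token means reading every model weight from VRAM once, then memory bandwidth divided by the size of the weights gives a ceiling on tokens per second. The weight size below is my own rough assumption of 9 billion parameters at 5 bits each, and the bandwidth for my GPU is simply one third of the Mi50’s figure. Real-world speeds land well below this ceiling.

```python
def bandwidth_bound_tps(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Ceiling on tokens per second if every token requires reading all of
    the model weights from VRAM exactly once."""
    return bandwidth_gb_s / weights_gb


# Assumption: 9 billion parameters at 5 bits each, ignoring any overhead.
weights_gb = 9e9 * 5 / 8 / 1e9

mi50_bandwidth = 1000.0                 # roughly 1 terabyte per second
my_gpu_bandwidth = mi50_bandwidth / 3   # the Mi50 has about three times as much

for name, bandwidth in (("Instinct Mi50", mi50_bandwidth),
                        ("Radeon 6700 XT", my_gpu_bandwidth)):
    ceiling = bandwidth_bound_tps(bandwidth, weights_gb)
    print(f"{name}: at most ~{ceiling:.0f} tokens per second")
```

The absolute numbers are not predictions. The point is only that the ceiling scales with bandwidth, which is why a card with three times the bandwidth looked so promising on paper.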

I keep waffling back and forth!

I just couldn’t bring myself to order an Instinct Mi50.

I do hate that I have to remember to spin up the Oobabooga webui every time I work on a blog post, and I have to remember to shut it down before playing games. I am also not good at remembering that I even need to spin it up before I start writing, so I wind up having to wait an extra 20 seconds while I am already prepared to paste in my first query.

I would like to have a dedicated GPU with reasonable performance in my homelab. Then I could just hit a key from Emacs and have some magic show up in the buffer five seconds later. I also feel that I would do a better job at even bothering to set things up to automatically query the LLM if I knew the LLM would always be there.

Flux Schnell Waffle Guy

What’s wrong with the Instinct Mi50? It has no fans, yet it needs to be cooled. I would have to choose and buy a blower fan, and I would have to find or design a 3D-printed duct.

I don’t have any PCIe slots in my homelab any longer, because I have downsized to mini PCs. I can definitely plug my old FX-8350 server back in, but I am not confident that there is enough room for the length of an Instinct Mi50 with a blower fan clamped to the end. I can for sure make it work, but it might require swapping to a different case.

I have been on a repeating cycle in my brain for weeks. “I want an LLM available 24/7!” “The Instinct Mi50 is so cheap!” “I don’t want to do the work only to find out it is too slow!” “I want an LLM available 24/7!”

I managed to find a cloud GPU provider with a 16 GB Instinct Mi50 GPU available to rent. That meant that I could test my potential future homelab GPU without spending hours getting the hardware assembled and installed in my homelab.

Trying out the Radeon Pro VII at Vast.ai

I was disappointed at first, because I didn’t see any Instinct GPUs in the list of currently available machines. I only saw some goofy Radeon Pro VII. What the heck is that?

It turns out these are the workstation version of the Instinct Mi50. Same cores. Same VRAM. Same clock speeds. They are almost identical except for the cooler and the extra video output ports. While the Instinct cards rely on a rackmount server’s own airflow to keep them cool, the 16 GB Radeon Pro VII has its own fan. You can get these on eBay, but they cost more than a 32 GB Instinct Mi50. I don’t know about you, but I’d rather put in the effort to finagle a fan onto the 32 GB card than pay the same for a 16 GB card!

The Xeon Gold CPU in the available server is a pretty good match for my old FX-8350. They have similar single-core performance, while the Xeon has twice as much multi-core performance. That is because my old FX-8350 has half as many cores!

My Radeon Pro VII testing was disappointing

I used Vast.ai’s Ubuntu 22.04 image with ROCm 6.2 preinstalled. I cloned Oobabooga’s text-generation-webui repository, and I copied up the exact same Gemma 2 9B 5-bit model that I use locally.

Everything installed fine. My model loaded right up. I also saw that rocm-smi had me at 71% VRAM usage, and it was peaking at 236 watts during inference. Everything seemed to check out. My LLM was running on the GPU, and the GPU was running at full speed.

I pasted in a blog-editing session from my local Oobabooga install’s history, and I got the Radeon Pro VII into precisely the same state as my local machine. Then I had them both regenerate the conclusion section for my last blog post, since that is my most successful use case for my local LLM so far. It also uses as much of the context window as I am likely to ever need to use.

In fact, I clicked the regenerate button three or four times just to be sure the results were consistent.

Instinct Mi50 Gemma 2 9B results

My Radeon 6700 XT managed to crank out results at 14.75 tokens per second. The Radeon Pro VII’s best run managed 7.64 tokens per second.

I think it is fair to say that the Instinct Mi50 equivalent machine that I rented at Vast.ai was roughly half as fast as my own GPU. I tried poking around with the model settings, but I didn’t manage to coax any extra performance out of it.
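Here is the arithmetic behind “roughly half as fast” as a short sketch. The two token rates are the measurements above. The 300-token reply length is a hypothetical figure that I picked just to turn the rates into wait times.

```python
local_tps = 14.75    # Radeon 6700 XT in my desktop
rented_tps = 7.64    # Radeon Pro VII at Vast.ai, best run

ratio = rented_tps / local_tps
print(f"The rented GPU ran at {ratio:.0%} of the local GPU's speed")

# Hypothetical reply length, only used to show what the gap feels like.
reply_tokens = 300
for name, tps in (("Radeon 6700 XT", local_tps), ("Radeon Pro VII", rented_tps)):
    print(f"{name}: about {reply_tokens / tps:.0f} seconds for {reply_tokens} tokens")
```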

Some posts in r/LocalLlama lead me to believe that I might have been able to push something closer to 40 tokens per second with an Instinct Mi50 GPU with a model like Gemma 2 9B. I had high GPU utilization. I was consuming the right amount of VRAM. It is possible something else is going wrong, and it might be something out of my control, but I don’t have a good guess as to what that could be.

I don’t think I have much use for a 24/7 LLM server today if it can only manage 7 tokens per second. I was already having trouble convincing myself that I needed to buy an Instinct GPU to try out. Even though I am concerned that this could just be a flaw in the cloud-GPU setup, this still just might be how fast this GPU will be. I don’t want to put in the effort with a used GPU just to wind up with an LLM that is too slow to be useful.

I have only nice things to say about Vast.ai so far

I have not tried any other cloud GPU services, and I definitely need to tell you that I didn’t do a ton of comparative research here. Vast.ai was on one of three or four price lists that I checked. They are at just about the lowest price point of any cloud GPU provider, but that means they are also one of the more manual options. People were saying nice things about Vast.ai all over Reddit, the prices are great, and most importantly, they had the GPU I was most interested in testing available.

How is Vast.ai more manual than some of their pricier competition? There are companies that will let you fire up a preconfigured LLM, Stable Diffusion, or Flux Schnell service in a matter of seconds. You have to bring your own Docker configuration to Vast.ai, though they do have a library of basic templates that you can choose from.

I added $5 to my account a few days ago, and I have $4.78 left over. It only cost me three cents to do my little benchmark of the Gemma 2 9B on the Radeon Pro VII.

I spent the other 19 cents messing around with the Forge webui and Flux Schnell on a middle-of-the-road 16 GB Nvidia A4000 GPU. Why I chose this GPU will probably be a good topic for another blog post, but telling you how things went will fit well in this blog post!

Flux Schnell Confused Robot

I was generating nice 4-step 1152x864 images at around 20 seconds per image. That’s half the time it takes to generate a 640x512 image on my Radeon 6700 XT. The A4000 didn’t go any faster at lower resolutions. That was a bit of a surprise but is also kind of neat! It is unfortunate that this is still miles away from sites like Mage.space that will generate similar Flux Schnell images in 3 seconds.

Spending 20 cents per hour to generate Flux Schnell images in 20 seconds seems like a good deal compared to Mage.space’s $8 per month subscription fee. The problem is that it took 15 minutes for my Forge Docker image to download Flux Schnell from Huggingface.com. This is mostly a problem with Huggingface, because I have the same complaint here at home.

You can avoid most of that 15-minute wait by storing your disk image at Vast.ai with Forge and Flux Schnell already installed, but that would cost almost $9 per month. Why pay $9 per month for slower image generation when you can pay Mage.space to manage everything for you at $8 per month?

This is even more of a bummer since I discovered that Runware.ai added an image-generation playground to their API service. I can pop in there and generate all the Flux Schnell images that I need for a blog post in less time than it takes a server at Vast.ai to boot. Booting the machine at Vast.ai costs me a nickel. Spending the same time generating images at Runware.ai costs a fraction of that, and I don’t have to wait 15 minutes to get started.
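The nickel comes straight out of the hourly rate. This is a quick sketch using the round numbers from this post: about 20 cents per hour for the A4000, a 15-minute wait for the model download, and 20 seconds per image once everything is running.

```python
hourly_rate = 0.20        # dollars per hour for the rented A4000
startup_minutes = 15      # time spent waiting for Flux Schnell to download
seconds_per_image = 20    # 4-step 1152x864 image

# Money spent before the first image can be generated.
startup_cost = hourly_rate * startup_minutes / 60

# Cost of each image once the machine is up and running.
images_per_hour = 3600 / seconds_per_image
cost_per_image = hourly_rate / images_per_hour

print(f"Startup wait: ${startup_cost:.2f}")
print(f"Each image after that: ${cost_per_image:.4f}")
```

The per-image cost is tiny once the machine is running. The wait before the first image is what hurts.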

I am excited that I bought some credits to use at Vast.ai. When I can’t get something cool to work locally, I can spend a few nickels or dimes messing around on someone else’s GPU. That is a cheap way to find out if it is something that I can’t live without having locally. I can always decide that I just have to buy a 24 GB Radeon 7900 XTX or Nvidia 4090 later, right?!

The costs are small

I am grumpy about all of this. The prices on everything short of a current-generation GPU with 24 GB of VRAM are almost inconsequentially small.

I can add sluggish but not glacially slow LLM hardware to my homelab for $135, and I can power that new hardware for $70 per year. I don’t know how you want to amortize those dollars out, but let’s call it $17 per month for the first year and $6 per month for every year after. That is peanuts if you ignore the work of configuring and maintaining the software.
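Here is that amortization as a short sketch. It charges the whole price of the GPU to the first year, which is just one of many reasonable ways to slice it.

```python
gpu_price = 135.0        # used 16 GB Instinct Mi50 on eBay
power_per_year = 70.0    # estimated electricity cost

# Assumption: the entire purchase price lands in the first twelve months.
first_year_monthly = (gpu_price + power_per_year) / 12
later_years_monthly = power_per_year / 12

print(f"First year: ${first_year_monthly:.2f} per month")
print(f"Every year after: ${later_years_monthly:.2f} per month")
```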

That is about what it would cost to pay for a service like Mage.space for image generation, and they’ll do the work of installing new models and keeping the software up to date for you. I can hit OpenAI’s API as hard as I know how to, and it will only cost me pennies a month. Both of these services are faster than anything I can run locally even if I shelled out $1,800 on an Nvidia RTX 4090.

Flux Schnell Robot Counting Money

The cost of keeping a disk image ready to boot at Vast.ai would be about the same. There is a ton of flexibility available with this option, but they still can’t boot a machine instantly, because they have to copy my disk image across the globe to the correct server. It will still take two or three minutes at best for me to start generating images with Flux Schnell.

You can be flexible, cheap, and slow at home. You can give up instant 24/7 availability to be flexible, cheap, and fast at Vast.ai. Or you can give up flexibility to be fast and cheap with services like Mage.space and OpenAI’s API.

AI in the cloud is almost free when you pay by the image!

Everything in this blog post fit together really well when I started writing it, because a Mage.space subscription, storing a disk image at Vast.ai, or idling your own machine-learning server at home would all cost a bit less than $10.

I didn’t discover that Runware.ai implemented an image-generation playground until I was almost finished writing this entire blog post. Runware.ai didn’t have a playground a month ago!

This fits my needs so much better. I can generate 4-step 1024x1024 Flux Schnell images at Runware.ai in about 8 seconds, and I can log in and start typing the prompt for my first image in about 10 seconds. They will let me generate 166 Flux Schnell images for a dollar. All of this is fantastic, and it meets my personal needs so well.

I don’t think it is possible for me to use up more than 20 cents a month in OpenAI credits, and I would be surprised if I ever manage to use up an entire dollar’s worth of Runware.ai credits in any given month. Both of these services are five to ten times faster and cheaper than running local LLMs or local image generation.
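
To put the per-image pricing in perspective, here is a small sketch that works out the cost of one image and how many images a month it would take to match the roughly $6 per month of running hardware at home after the first year. Both inputs come from this post. The function names are made up, and the comparison ignores speed and setup time entirely.

```python
# Break-even sketch using figures from this post:
# 166 Flux Schnell images per dollar at Runware.ai vs. roughly $6 per month at home.

IMAGES_PER_DOLLAR = 166
HOME_COST_PER_MONTH = 6.0   # dollars, the post-first-year estimate


def cost_per_image_cents() -> float:
    """Price of a single image in cents."""
    return 100 / IMAGES_PER_DOLLAR


def break_even_images(monthly_budget: float = HOME_COST_PER_MONTH) -> int:
    """Number of images per month that the given budget buys."""
    return int(monthly_budget * IMAGES_PER_DOLLAR)


if __name__ == "__main__":
    print(f"{cost_per_image_cents():.2f} cents per image")
    print(f"{break_even_images()} images per month to match the home server")
```

That works out to a bit over half a cent per image, and nearly a thousand images every month before the cloud bill catches up to the home server.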

Runware.ai’s playground isn’t as flexible as Mage.space. There are fewer models to choose from, and there is no way to automatically run a matrix of models, prompts, and scales like I can locally. It is tough to complain about Runware.ai’s price and performance.

I did some more searching just before publishing this blog when I wanted to try out OmniGen. I figured it’d be fun to combine a photo of myself with a GPU in an odd location, and that was fun! My search helped me discover that Fal.ai has much fancier Flux Schnell image generation features than Runware.ai. Fal.ai costs a bit more, but it is still tiny fractions of a penny per image.

I don’t know why I didn’t figure this out sooner. I remember when Brian was using Fal.ai to train a LoRA to generate weird images of himself!

What about sharing an LLM server with your friends?!

I don’t have any good plans here, but I thought the idea was worth mentioning. Tailscale would make it ridiculously easy to share an Oobabooga text-generation webui and a Forge image-generation webui with a bunch of friends. Splitting the energy costs of a small server and the price of a used Tesla M40 or Instinct Mi50 GPU would quickly approach the price of a cheap cup of gas-station coffee every month with only a handful of friends contributing.

Flux Schnell three GPUs

That handful of three or four friends could easily make a used 24 GB Nvidia 3090 in a shared server approach the cost of the cloud services, and that would be fast enough to be properly competitive with OpenAI or Runware.ai.

Just about the most interesting reason to run your AI stuff locally is to protect your privacy. Running on a GPU at your friend’s house may or may not be something you consider private. Is it better to be one of millions of queries passing through OpenAI’s servers, or would I prefer to have all my queries logged by Brian Moses?

Conclusion

I started writing this conclusion section an hour before I discovered that Runware.ai has a playground, and my original plan was to explain how it makes me grumpy that both local and cloud machine-learning services have roughly equal pros, cons, and costs. Now I am suddenly happy to learn that I can run both LLM queries AND generate images with cloud services faster than I ever could at home for literal pocket change.

I am still excited about local LLMs. They have gotten quite capable at sizes that will run reasonably fast on inexpensive hardware. There is a good chance I will still wind up buying an Instinct Mi50 to run a local LLM, but I am waiting until I actually NEED something running 24/7 here in my home.

I would love to replace my Google Assistant devices around the house with something that runs locally, and an in-house LLM that can respond in a few seconds would be a fantastic brain to live near the core of that sort of operation. I would want this to be tied into Home Assistant, so I certainly wouldn’t want it to rely on a working Internet connection.

What do you think? Are you running AI stuff locally or using cloud services? Do you keep flip-flopping back and forth on what you run in the cloud vs. what you run locally? Tell us what you prefer doing in the comments, or join the Butter, What?! Discord community and tell me all about your machine-learning shenanigans. I look forward to hearing about your ML adventures!

Should You Buy An Intel X540-T2 10-Gigabit Ethernet Card?


I bought a couple of 2.5-gigabit Ethernet switches with 10-gigabit SFP+ ports, and if I wanted to be able to utilize the 10-gigabit connection from my home office to the other side of the house, then I needed to buy a 10-gigabit Ethernet NIC for my PC. I did absolutely zero research. I saw a 2-port 10gbe PCIe card on Aliexpress for $17, so I added it to my cart alongside the three Xicom 10-gigabit SFP+ copper modules that I was already ordering.

Stable Diffusion and/or Flux Generated AI Person

The Intel X540 card has been working well, but I have had two problems. I am going to tell you about my problems so that you can make an informed decision. The only thing that I learned while searching the Internet to see if my problems were common is that the general vibe is that these cards can be problematic. Not in any specific way. They just aren’t great cards.

I think it is important for me to say that I am currently having zero problems with my 10-gigabit Intel X540-T2 Ethernet card. It was a good gamble at $17, and I will buy another if I need more 10-gigabit cards in the near future.

Why are these network cards so cheap?

The cards on Aliexpress were originally intended for something not quite compatible with the PCIe standard. There is an extra connector in the back that has been cut off. You wouldn’t be able to put this card in a normal PCIe slot if that connector were still intact. I don’t know what sort of proprietary machine these were originally meant for.

My Intel X540T2 10-gigabit Ethernet Card

The part circled in red notes where the PCB was cut at the factory after manufacturing. You can find nearly identical cards in an image search where there is another edge connector at this location.

People on the Internet would like you to believe that these cards are assembled from salvaged components from used servers. I don’t think that is likely, but I wouldn’t be surprised if they’re assembled from Intel X540 chips that didn’t make the cut after testing by one of the reputable manufacturers.

The Intel X540 also happens to be a rather outdated 10-gigabit Ethernet chipset. This means that it doesn’t support 2.5-gigabit or 5-gigabit Ethernet, and it uses quite a bit more power than something newer. My cable only had to run 10’ to reach the switch, so I was confident that I would get a stable connection at full speed. You might want to find something that supports 5-gigabit Ethernet if your cabling situation is sketchier than mine.

My unique and serious problem

I couldn’t find anyone else with this problem, and I am not even quite 50% certain what actually solved my problem. I don’t expect that you will encounter this problem, but I feel like I need to tell you about it.

When I first installed the X540-T2 in my workstation, I couldn’t get it to boot. My fans and lights would turn on for a second, then everything would shut off. I reseated the card, and everything was happy. At least, I thought everything was happy.

When I rebooted my computer a few days later to check a BIOS setting, it shut off and wouldn’t turn back on again. Reseating the card didn’t help this time, so I scrounged up an old computer and tried the Intel X540 card in there. It worked just fine. It worked over, and over, and over again.

The most likely cause of this symptom is inadequate power delivery. My 850-watt power supply’s fan was making a gentle ticking sound last month, so I decided to replace the fan with a quieter one. I have been running an old 500-watt power supply in the meantime, and since it has been working just fine with my CPU and GPU maxed out, I haven’t been in a hurry to swap the repaired PSU back in.

Stable Diffusion and/or Flux Generated AI Person

So I did the work to swap the correct power supply back into my case. I noticed while doing this that the metal bracket on the Intel card wasn’t lining up well with the back of the case. I thought maybe it was pushing the 8x PCIe card slightly out of alignment in the 16x slot, so I gave the bracket a slight bend, and it fit much better.

A misalignment would explain why the computer wouldn’t boot, but it wouldn’t explain why a reboot would fail. You would expect a misaligned card to cause trouble at some point during those days of uptime, not only at the next reboot.

I checked the 12- and 5-volt lines on a drive power cable of the old power supply with a multimeter, and they weren’t dipping at all when powering up the computer, so I am not terribly confident that swapping the power supply was the fix. Those were the only two rails I could easily test.

One of these two things definitely helped, because I have been rebooting and power cycling my computer over and over again without issue. I am sure someone with a keen eye will notice the missing screw on the PCIe bracket in one of the photos on this blog post. I can only assume that I took that photo while swapping out the half-height bracket that was installed at the factory. There have been two screws installed every time I have had the card installed!

These Intel X540 cards tend to overheat

At least, I think an overheating issue is what I have run into. Heat is the number one complaint from owners of these cards. I don’t care how hot my silicon runs, but it definitely needs to be running within the design specifications.

I noticed the other day that some of my pings in Glances were failing. I pulled up speedtest.net, and the results weren’t consistent. Reddit wasn’t loading full pages of content successfully, and my iperf3 tests to one of my mini PCs were bouncing between 0, 500, and 2,200 megabits per second.
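
Eyeballing iperf3 output works, but the same check can be scripted. iperf3 can emit JSON with its -J flag, and each entry under intervals carries a sum.bits_per_second figure. The sketch below is only an illustration. The function names and the 25% tolerance are assumptions, and the sample report is hand-built from the numbers above, not captured from a real run.

```python
import statistics


def interval_mbps(report: dict) -> list[float]:
    """Pull per-interval throughput (megabits per second) out of an iperf3 -J report."""
    return [iv["sum"]["bits_per_second"] / 1e6 for iv in report["intervals"]]


def looks_unstable(samples: list[float], tolerance: float = 0.25) -> bool:
    """Flag a run whose slowest interval falls more than `tolerance` below the median."""
    median = statistics.median(samples)
    return min(samples) < median * (1 - tolerance)


if __name__ == "__main__":
    # Hand-built stand-in for json.load() on the output of: iperf3 -c <server> -J
    sample_report = {
        "intervals": [
            {"sum": {"bits_per_second": 0.0}},
            {"sum": {"bits_per_second": 500e6}},
            {"sum": {"bits_per_second": 2200e6}},
        ]
    }
    samples = interval_mbps(sample_report)
    print(samples, "unstable" if looks_unstable(samples) else "steady")
```

Comparing against the median instead of the link speed means the same check works whether the far end is a 2.5-gigabit mini PC or a 10-gigabit server.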

I temporarily fixed the problem in an odd way. I fired up enough openssl speed benchmarks to max out all my CPU cores to get the chassis fans choochin’. It didn’t take long before all my network problems just straightened themselves out.

Stable Diffusion and/or Flux Generated AI Person

I have yet to properly address this issue. It hasn’t happened again, but I plan to bump up my minimum chassis fan speeds next time I am in the BIOS. My fan speeds are set really low to keep the noise down in here when I am recording podcasts, but I don’t think an extra 5% or 15% will make an audible difference.

The fans are usually set to spin at the minimum possible RPM. This is enough to keep my Ryzen 5700X and my Radeon RX 6700 XT at or below 45C when idle. That just doesn’t seem to move enough air past that poor Intel NIC.

The $60 Intel X540 cards on Amazon seem to ship with much larger heat sinks, and some cards ship with a tiny fan. I would expect the large heat sink to be a nice upgrade, but I definitely wouldn’t install a card with one of those tiny fans. I am trying to keep my office quiet, and I would expect to hear a high RPM fan like that.

My ACTUAL solution to my overheating Intel X540T2

My 10-gigabit LAN connection started getting weird again. My small set of hosts that I track in Glances was showing timeouts again, so I did an iperf3 test. It came back with the same weird results where it was bouncing between zero, hundreds of megabits, and several gigabits per second.

I know that I explained in the previous section that I bumped up my minimum case fan speeds in an attempt to help with this problem, but I wasn’t so certain that I ACTUALLY went into the BIOS to make that change. I was about to boot into the BIOS to make those tweaks, but I changed my mind. I decided to pull the Intel NIC, pop off the heat sink, and see how things were doing in there.

Smokeping monitoring my Intel X540-T2 NIC

The heat sink was getting really hot. That should have been a good sign. That means that heat was leaving the chip on the NIC and transferring into the heat sink. Even so, whatever material they used for thermal transfer was dry and cracking, and there wasn’t actually any thermal compound directly between the chip and the heat sink.

I slathered on way too much of my old but preferred thermal compound, which is a tube of dielectric grease from Autozone that I have been using since 2008. I will probably be using this stuff for the rest of my life unless I manage to lose this tube. I didn’t know how hard the springs press the heat sink to the chip, so I figured I should err on the side of too much thermal grease. That way there wouldn’t be any air gaps.

Dielectric Grease as Thermal Paste

I honestly didn’t expect this to work, because the heat sink was already pulling what felt like a ton of heat out of the NIC. I have been monitoring the connection with Smokeping for more than ten days now, and I haven’t seen a single missed packet. I guess the upgrade to some nice, soft thermal compound was just enough to make the difference!

I do wish that I was smart enough to add my workstation to my Smokeping server BEFORE I repasted the heat sink.

Alternatives to the Intel X540

There’s no shortage of used Mellanox 10-gigabit Ethernet cards on Amazon between $25 and $50, but they all have SFP+ ports. You will need to add a $20 transceiver to plug in your Cat 6 cables. I could write an entire blog post about retired enterprise 10-gigabit network equipment, but that is a bit out of scope this time. I am trying to make use of the existing cables in my house this time instead of running short 40-gigabit connections between nearby computers.

Old Mellanox cards are probably the best option if you need 10-gigabit Ethernet and don’t want to use the inexpensive Intel X540 cards from Aliexpress.

| Ethernet | megabytes/s | Rough Equivalent |
|---|---|---|
| 100 megabit | 12.5 | Slow |
| gigabit | 100 | 500 GB laptop hard disk |
| 2.5-gigabit | 250 | 3.5” hard disk or older SATA SSD |
| 5-gigabit | 500 | the fastest SATA SSDs |
| 10-gigabit | 1,000 | the slowest NVMe drives |
| 40-gigabit | 4,000 | mid-range NVMe drives |
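
The figures in the table are rough, and most of them sit a little below the raw line rate, which is simply the link speed divided by eight bits per byte. A small sketch of that conversion, plus how long a copy would take at a given speed, looks like this. The function names are made up, and the 100 GB copy is just an example size.

```python
def theoretical_mb_per_s(megabits_per_s: float) -> float:
    """Raw line rate in megabytes per second: eight bits per byte, no protocol overhead."""
    return megabits_per_s / 8


def seconds_to_copy(gigabytes: float, megabytes_per_s: float) -> float:
    """Time to move `gigabytes` of data at a sustained `megabytes_per_s`."""
    return gigabytes * 1000 / megabytes_per_s


if __name__ == "__main__":
    for name, mbps in [("gigabit", 1_000), ("2.5-gigabit", 2_500), ("10-gigabit", 10_000)]:
        print(f"{name}: {theoretical_mb_per_s(mbps):.1f} MB/s raw line rate")
    # Using the table's rough numbers for a 100 GB copy:
    print(f"gigabit: {seconds_to_copy(100, 100):.0f} s, 10-gigabit: {seconds_to_copy(100, 1000):.0f} s")
```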

There are plenty of PCIe 5-gigabit Ethernet adapters on Aliexpress that use the RTL8126 chipset for about the same price as the older Intel X540 cards. The only thing disappointing about these cards seems to be the speed.

The Intel cards consume almost 20 watts of power, while the Realtek 5-gigabit cards claim to use less than 2 watts. That doesn’t seem like too big a deal when you only have one card, but I have mini PC servers in my homelab that use less than 10 watts.
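
To get a feel for what those watts mean over a year of running 24/7, here is a minimal sketch. The 20-watt and 2-watt figures are the ones quoted above. The electricity price is purely an assumption for illustration, so plug in your own rate.

```python
HOURS_PER_YEAR = 24 * 365

# Assumption for illustration only: substitute your own electricity price.
ASSUMED_DOLLARS_PER_KWH = 0.15


def yearly_kwh(watts: float) -> float:
    """Energy used in a year by a device drawing `watts` around the clock."""
    return watts * HOURS_PER_YEAR / 1000


def yearly_cost(watts: float, dollars_per_kwh: float = ASSUMED_DOLLARS_PER_KWH) -> float:
    """Yearly running cost in dollars at the given electricity price."""
    return yearly_kwh(watts) * dollars_per_kwh


if __name__ == "__main__":
    for name, watts in [("Intel X540 (almost 20 W)", 20), ("Realtek 5-gigabit (under 2 W)", 2)]:
        print(f"{name}: {yearly_kwh(watts):.1f} kWh, about ${yearly_cost(watts):.2f} per year")
```

Whatever your rate is, the Intel card costs roughly ten times as much to keep powered as the Realtek card.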

Since the Realtek cards fit in a smaller 1x PCIe slot, they are easier to fit into your build than the bigger Intel X540 cards.

I have a working 10-gigabit Ethernet card, and I am not going to put in work for a downgrade, but I think the right choice for me would have been buying a 5-gigabit Realtek card instead. Those cards just sip power, and I don’t have a use for the extra 5 gigabits today outside of running iperf3 tests to make sure the 10-gigabit link between the switch in my office and the switch in my network cupboard is actually running at 10 gigabits per second!

The problem with 40-gigabit and 56-gigabit network cards

The 40-gigabit Infiniband cards I was using a few years ago were fantastic. Those same Mellanox cards could run either 10-gigabit Ethernet firmware or 40-gigabit Infiniband firmware, so they’re the same family of 10-gigabit Ethernet cards I was recommending in the previous section—except you can get 40 gigabits per second out of them when running Infiniband.

That sounds awesome, except it is extremely challenging to find enough fast PCIe lanes to max out even one of the pair of 40-gigabit ports on those cards. You’re going to be buying older enterprise hardware, so the maximum supported PCIe version of those cards will be a generation or two out of date. If you are anything like me, the servers in your homelab are often made up of old computers that you used to run at your desk.

Those older computers might have even older PCIe slots, and the newest machines on your Infiniband network might have a GPU in the only properly fast PCIe slot on the motherboard. The best I ever managed in my setup was 16 gigabits per second of PCIe bandwidth. That worked out to 12.8 gigabits per second via iperf3 on my IP-over-Infiniband link.
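
For a rough idea of where a 16-gigabit ceiling can come from, here is a sketch of PCIe link bandwidth by generation and lane count. The per-lane rates and encodings are the standard ones for PCIe 1.x through 3.0. The post doesn’t say which slot was in use, so the combinations printed below are only examples that happen to land on 16 gigabits per second.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency by PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def link_gbps(generation: int, lanes: int) -> float:
    """Link bandwidth in gigabits per second after line encoding.

    Packet headers and other protocol overhead are not subtracted here.
    """
    rate, efficiency = PCIE_GENERATIONS[generation]
    return rate * efficiency * lanes


if __name__ == "__main__":
    for generation, lanes in [(1, 8), (2, 4), (2, 8), (3, 8)]:
        print(f"PCIe {generation}.0 x{lanes}: {link_gbps(generation, lanes):.1f} Gbit/s")
```

An older slot, or a newer slot wired with only four lanes, is enough to cap a 40-gigabit port well below its rated speed.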

That’s only 30% faster than 10-gigabit Ethernet, and I had no easy way to extend that connection from one end of the house to the other. I can run plain old 10-gigabit Ethernet over the old Cat 5e cable in my attic.

My first 20-gigabit Infiniband setup was a fantastic deal in 2016. I was getting 8 gigabits per second over iperf, and my entire setup to connect two computers cost less than a single 10-gigabit Ethernet card.

10-gigabit Ethernet is priced much better in 2024.

Conclusion

In the end, the Intel X540-T2 might be tempting with its low price, but it might be a bit of a gamble. I suspect that you will almost definitely get lucky and have a perfectly stable card, but there is the possibility that you will spend days wrestling with issues. While I personally wouldn’t hesitate to grab another X540 for myself, I think you should consider something like the $17 Realtek 5-gigabit cards unless you absolutely require 10-gigabit speeds—just make sure the other end of your connection actually supports 2.5- and 5-gigabit Ethernet!

This journey into the world of 10-gigabit Ethernet has been a reminder that even seemingly simple hardware choices can come with unexpected challenges. It highlights the importance of doing thorough research and understanding the nuances of different technologies, but also of not being afraid to experiment. After all, what’s the fun in building a homelab without a few learning curves along the way?

If you’ve had similar experiences with networking hardware, or if you have any insights to share about overcoming these challenges, please leave a comment below! I’d love to hear your thoughts and experiences. And for even more in-depth discussions about homelab gear and tech troubleshooting, join the Butter, What?! Discord community.