I Am Running XFCE On My Proxmox Host?!

| Comments

I was supposed to buy one power-sipping Celeron N100 mini PC to replace the aging and inefficient AMD FX-8350 in my homelab. That is what I did at first, and it worked out great. My Intel N100 with a single 14-TB USB hard disk averages around 15 watts, while the FX-8350 with four 4-TB hard disks was averaging just over 70 watts. Not only is it more efficient, but the mini PC is very nearly as fast and has nearly double the usable storage of the old server. How awesome is that?

Then I bought a second N100 mini PC and decided to experiment with it before putting it directly into my homelab. I learned that the N100 has enough GPU to play some reasonably modern games, and it can emulate consoles up to and including the Nintendo Wii.

Minisforum UM350 Proxmox Machine In Its Temporary Home

My Proxmox host mini PC with XFCE in its temporary home where I can easily plug and unplug USB devices while I continue testing

That’s what put this weird thought in my head. Why not load Steam and Retroarch on the host OS of one of my Proxmox nodes and leave it on the TV in the living room? That is as far as I got with the question because I don’t have wired Ethernet running to the TV, and I am not going to put one of my Proxmox nodes on WiFi.

Then I bought a third mini PC with a Ryzen 3550H CPU. This older CPU is roughly comparable to the Intel N100 in both power consumption and horsepower, but the Ryzen has an integrated GPU that is 3 or 4 times faster.

These two extra Proxmox nodes have both been sitting on my desk for so many weeks, just waiting for me to install them in their permanent homes, that it was inevitable I would come up with another hare-brained idea. I am at least hoping that my new idea is a good one!

Installing XFCE on the Proxmox install was easy!

I just had to run apt install task-xfce-desktop. Aside from creating a non-root user account for myself, that was it. You will have to either reboot or manually fire up lightdm yourself, and then you'll be able to log into XFCE.
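Here is a rough sketch of the handful of commands involved, run as root on the Proxmox host. The username is just a placeholder:

$ apt update
$ apt install task-xfce-desktop

# Create a non-root user for the desktop session
$ adduser pat

# Either reboot, or start the display manager by hand
$ systemctl start lightdm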

I don’t believe Proxmox uses network-manager, so the network controls in XFCE aren’t going to work. That is fantastic, because I wouldn’t want XFCE goobering up my Proxmox networking settings!

I had to add myself to the sudoers group and log back in before I could install Steam. That was easy.
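If you're following along, the group change looks something like this, and it only takes effect after you log out and back in. The username is a placeholder again:

# Add the desktop user to Debian's sudo group
$ usermod -aG sudo pat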

OBS Studio via VNC

I installed flatpak so I could use it to install OBS Studio with VAAPI. That was also easy, and it went quite smoothly.
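If you want to replicate this, it is only a few commands, assuming you are pulling OBS Studio from Flathub under its usual application ID:

$ apt install flatpak
$ flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
$ flatpak install flathub com.obsproject.Studio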

It took me longer than I want to admit to remember the name of the project that replaced Synergy. I remembered that the modern fork of Synergy is called Barrier. Barrier was in apt, and it only took me a few minutes to get that working. Now I could move my mouse to the right edge of my monitor and start controlling the Proxmox host on my office TV.
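My setup is just the Barrier server running on my desktop and the client on the Proxmox host pointed at it. Roughly, with a placeholder screen name and server IP address:

$ apt install barrier

# On the Proxmox host: connect to the Barrier server on my desktop (run in the foreground with -f)
$ barrierc -f --name proxmox-desktop 192.168.1.50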

One of our lovely friends in our Discord community pointed out that Barrier is also dead now, and that the new fork is called Input Leap. This was not in apt, so I am just going to leave my Barrier setup running for now. It is locked down to my Tailscale network, so I am not worried about it being an older piece of network code.

Why on Earth am I doing this?!

I have two reasons for wanting to run a desktop GUI, and I have one extra reason for wanting one of my Proxmox nodes to live in my office.

The first was inspired by a recent hardware problem on my desktop PC. My PSU fan started making some noise, so I had to shut it down to work on that. I happened to have a spare power supply that was beefy enough to swap in its place while I replaced the power supply’s fan, but only just barely. This could have easily been a situation where I would have had to wait for Amazon to deliver me a new piece of hardware to get up and running again!

Proxmox with desktop logged in

The Proxmox server is using less than 3 GB of RAM with the XFCE desktop running, OBS Studio running, and the Octoprint container running

I thought it would be nice to have a spare mini PC with a desktop GUI at my desk that I could use in an emergency. I could quickly plug in my monitor, keyboard, and mouse so I could use Discord, Emacs, and Firefox. That is plenty of functionality to keep me chugging along, and it is way more comfortable than using my laptop.

This idea all by itself seems like a good enough excuse to try this out.

If you’ve been wondering why I wouldn’t attempt to pass the GPU through to a virtual machine and isolate the desktop GUI inside, here is the first part of the explanation. When my computer stops working, I want to be able to plug my important peripherals right in and use them. I don’t want to be futzing around with passing through all the appropriate USB devices. I wouldn’t even have a good place to sit down and do that work!

A dedicated OBS streaming box might be a nice thing to have!

I am running out of USB 3 ports on my desktop PC. Some of my video gear is plugged in via USB hubs. Sometimes I need to disconnect and reconnect my USB to HDMI adapters to get them to work correctly. Even just reducing the number of USB cables that have to run toward my desktop computer will be an improvement.

I have my podcasting camera mounted upside down. This lets me see my Sony ZV-1 vlogging camera’s little display while pushing the lens as close to my monitor as possible. The USB to HDMI dongle that I use doesn’t allow for simple flipping and inverting via the V4L2 driver, so I have to pass it through something like OBS Studio to transform the camera.

Having a dedicated streaming mini PC to handle my cameras, encoding, and streaming seems like it’d be handy.

My desk with my podcasting camera

The latency and frame rate of the OBS Studio output from my mini PC over VNC are both terrible, but it is more than adequate for making tweaks to the OBS Studio settings!

I am able to encode 1080p HEVC video using the GPU via VAAPI. It only uses around 40% of this tiny iGPU's horsepower, but it has had weird encoder hiccups during the first couple of seconds of recording. I suspect it just isn't ramping up the speed of the GPU quickly enough, but it is fine once it settles in. Worst case, I have to dial that back from H.265 to H.264.
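If you want to sanity-check VAAPI encoding outside of OBS Studio, ffmpeg can exercise the same hardware encoder. This is just a sketch, and it assumes your iGPU shows up as the usual render node:

$ ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mkv \
    -vf 'format=nv12,hwupload' -c:v hevc_vaapi -qp 24 output.mkv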

I haven’t decided exactly how I am going to tie this into my recording and streaming setup, but I am excited to see that this little box is more than capable of handling these tasks.

I simplified my virtualized Octoprint setup

A lot of people run Octoprint on a Raspberry Pi. I’ve always run it in a virtual machine on my homelab server. When I moved my old FX-8350 homelab server out of my office to quiet things down in here, that meant my 3D printer was now 60’ away from the Octoprint virtual machine, and I surely wasn’t going to run a 60’ USB cable across my house!

I brought one of my old OpenWRT routers out of retirement and installed socat. That let me extend a virtual serial port across my network so I was able to keep using my Octoprint instance. It also gave me an extra access point to add to my 802.11r WiFi roaming network, which was a nice bonus.
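The socat bridge boils down to one command on each end. This is the general shape of it; the device path, port, and hostname are placeholders rather than my exact setup:

# On the OpenWRT router next to the printer: share the printer's serial port over TCP
$ socat tcp-listen:8888,reuseaddr,fork file:/dev/ttyUSB0,nonblock,b115200,raw,echo=0

# On the Octoprint virtual machine: create a virtual serial port that tunnels to the router
$ socat pty,link=/dev/ttyPRINTER,raw,echo=0 tcp:openwrt-router:8888

Then Octoprint just connects to /dev/ttyPRINTER as though the printer were plugged in locally.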

Octoprint on my Proxmox host via Tailscale

The socat setup worked most of the time, but every once in a while I would have to restart the socat processes. This didn’t happen often, but these days I only fire up my last remaining Octoprint server once every month or two. The socat process now almost always needs to be kicked before I can start printing.

I was able to plug my Sovol SV06 directly into my new quiet Proxmox server in my office. I did have to cheat, though. If I wanted to run the cable along the wall, I would have needed a 30’ USB cable to make that journey. I decided to run a cable directly across the carpet from my 3D-printer stand to my desk, so there’s 32” of USB cable on the floor with a temporary piece of duct tape helping to make sure I don’t snag it with my foot.

This cable across my floor is a terrible solution, but the Sovol SV06 will definitely be my last printer that is slow enough or old enough to be used with Octoprint. I don’t have a timeline for retiring it, but it is definitely going to happen. That means this short span of USB cable across my office floor is temporary. We just don’t know how temporary!

Is this a good idea?

Should you do this with one of your production Proxmox servers at your company? Absolutely not. Should you do this at home? If you know what you are doing, I would say that it is worth a shot. I am only eating up 2.6 gigabytes of RAM having a desktop session logged in with OBS Studio running. That will be less than 10% of the available RAM in this machine once I am finished shuffling SO-DIMMs around.

Way back when I built my arcade cabinet, I started telling people that the best computer in the house to use as a NAS is the one that already has to be powered up 24 hours a day. An arcade cabinet is way more fun when you can just walk up to it, hit a button, and immediately start jumping on goombas.

Mario Galaxy on an Intel N100 Mini PC

If you’re already paying a tax on your electric bill to keep one computer running all day long, why not give it one more task instead of buying another machine and paying that same tax again?

The important thing to bear in mind is that you may be tying the uptime of these tasks together. When you can’t play Dead Cells because your GPU has somehow gotten itself into a weird state, and the only way to fix it is a reboot, then all the virtual machines on that Proxmox server will also wind up being restarted.

If that means your PiHole VM and Jellyfin container have to be stopped, then nobody in your house will be able to access websites, and someone’s movie stream may stop. Nobody is going to lose any sales, but it is up to you to decide how much money it is worth to avoid this situation.

Conclusion

This little mini PC project has proven to be fun, and it was as easy to get a desktop GUI running on Proxmox as I had hoped! Having a single power-sipping mini PC fitting into two or three roles at the same time seems like a good value, and I am excited to see just how much use I get out of these tertiary use cases.

Odds are pretty good that I will never need to use this Proxmox mini PC as an emergency workstation, but I will feel better knowing that it is available. I do expect to get some use out of the video recording and streaming capabilities. I have some work to do there because I want to be able to use the output from OBS Studio on the mini PC as a virtual webcam in Chrome on my desktop when I connect to Riverside.fm.

What do you think? Do you think it is silly to run a GUI on a server? I would usually be the first person to think so! Are you already doing something similar with one of your Proxmox servers? Or are you running your emergency GUI in a virtual machine? Does it seem like a good idea to overload one of my Proxmox nodes as a video capture, encoding, and streaming machine? Tell me about it in the comments, or join the Butter, What?! Discord community to chat with us about it!

I Added A Refurbished Mini PC To My Homelab! The Minisforum UM350

| Comments

I saw the refurbished Minisforum UM350 on Amazon for $155, and my brain immediately started tingling. Geekbench 5 says that this older Ryzen 3550H is a little faster than an Intel N100. The spec sheet implies that it may use a lot more power than an N100 when you start to push it hard, but surely it can't be too bad at idle! It has TWO DDR4 SO-DIMM slots, and the listing INCORRECTLY says that it can run with a pair of 32 GB SO-DIMMs. Not only that, but this Minisforum box has a 2.5-gigabit Ethernet port.

Minisforum UM350 Ryzen 3550H mini PC

So you’re saying that for $10 to $15 more than a brand new Trigkey N100, I can get a comparable machine that supports twice as much RAM, has a faster network port, and should have a much faster GPU?! The lowest price I have seen on a mini PC with 2.5-gigabit Ethernet is around $200. This seemed like it was worth a try!

Mine doesn’t work with 32 GB SO-DIMMs

It sort of works. Memtest86 sees that there are two sticks of RAM installed, but the BIOS and Memtest86 both only see 32 GB. Memtest86 was getting 15 GB/s throughput with the stock pair of 8 GB SO-DIMMs, but it was only getting 9 GB/s with the pair of 32 GB SO-DIMMs. That makes me suspect that it is really only using one of the DIMMs. It did pass a full run of Memtest86 with the pair of 32 GB SO-DIMMs installed.

The BIOS is pretty basic and didn’t give me much insight here.

My suspicion is that there is something about my particular pair of 32 GB SO-DIMMs that this mini PC doesn’t like. Just the fact that it can fully utilize one 32 GB SO-DIMM is a strong indication that it should be able to use a second, and there isn’t anything about the first-generation Zen CPUs that would suggest that it won’t support 64 GB.

I had a lot of weird problems with the 32 GB SO-DIMMs that went away when I put the stock RAM back in. My USB keyboard had to be unplugged and replugged a lot. I was able to install Proxmox, but it kept locking up shortly before the fresh install finished booting. It didn't lock up every time, though. It booted clean the first time. It was definitely acting weird.

If my Minisforum UM350 had worked with 64 GB of RAM, it would have been an AMAZING value for my homelab. Even so, I feel that the 2.5-gigabit Ethernet port alone was worth the extra $10!

Minisforum vs. Beelink or Trigkey

I am pretty convinced that Beelink and Trigkey are the same company. The packaging has the same corporate address, and their cases are identical.

All of these mini PCs share some common features. They vary in size, but they would all be considered small. Almost every model ships with some sort of hardware to mount the mini PC to the VESA mount on the back of your monitor.

This particular Minisforum PC is a little bigger than any of the Beelink or Trigkey boxes that I have seen, but not by a huge margin. I enjoy that it has a tool-free lid granting you access to the RAM, NVMe, and WiFi card. That kind of feels like a wash because the NVMe is tucked underneath the edge of the shell, and it is a bit of a challenge to get it into position so you can push it straight into the slot.

Minisforum UM350 Ryzen 3550H mini PC

You can see some scuff marks on the tool-free lid of my refurbished Minisforum UM350!

I don’t know why, but I was under the impression that Minisforum wasn’t a Chinese company. I thought they were a European company rebranding hardware shipped from China.

Buying a Minisforum product has all the same advantages and disadvantages as buying from Beelink, Trigkey, Topton, or Sovol. The documentation is thin and often poorly translated. Customer service is probably on a different schedule than you. Even so, you will be getting a lot of hardware for your money.

I am extremely pleased with the Ryzen 3550H!

I expected that the 3550H would be reasonably comparable to the much newer Intel Celeron N100. I didn’t expect the aging Ryzen mobile CPU to be 15% faster, and I also didn’t expect the Minisforum box to idle at a lower wattage than either of my Intel N100 mini PCs!

After installing and updating Proxmox on my new mini PC, I plugged it into my spare Tasmota smart outlet, and I have been logging electricity usage. It is using 0.11 kilowatt hours each day. That works out to an average of only 4.6 watts.

Ryzen 3550 Minisforum UM350 Kilowatt hours per day and watts

That is 1.6 watts less than I have ever seen on my Topton CWWK N100 router-style mini PC!

I also ran a couple dozen concurrent copies of openssl speed to keep all the CPU cores pegged, and the Minisforum UM350 topped out at 28 watts. That is only 2 watts more than I have ever seen my Intel N100 mini PCs use. That isn’t too bad, either. It is using 7% more electricity while being about 15% faster. That is a decent trade!
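If you want to reproduce that kind of load yourself, openssl can spawn its own parallel workers, so you don't even need to launch a couple dozen copies by hand:

# Run one benchmark worker per CPU thread to keep every core pegged
$ openssl speed -multi $(nproc) aes-256-cbc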

How is the iGPU?

I was really curious about this. The Ryzen 3550H has the exact same iGPU silicon with the same core counts as my Ryzen 5700U laptop, but while the Ryzen 3550H’s GPU is clocked at only 1.3 GHz, my laptop’s GPU is clocked at 1.9 GHz.

None of these machines have what you would call powerful GPUs, but the Intel N100 can easily play emulated games up to and including the Nintendo Wii, and I was able to play Gunfire Reborn at just a little under 60 FPS at 720p. My laptop is a much more capable gaming machine than my N100 mini PCs, and I kind of expected the Ryzen 3550H to be closer to my laptop.

Why Gunfire Reborn? It is a fun game that isn’t very old, and it is still being updated and improved. It also runs reasonably well on even the slowest mini PCs.

I am quite pleased with how the Ryzen 3550H stacked up. I swapped my Intel N100 gaming PC’s NVMe right into the UM350, and I fired up Gunfire Reborn with the exact same settings. It runs at 102 FPS at 720p, and it still managed 92 FPS at a full 1080p. Mind you, this is with all the settings dialed down to low.

Intel N100 vs. Ryzen 3550H Geekbench 5

Just for reference, my Ryzen 5700U laptop runs at about 132 FPS at 720p and 118 FPS at 1080p. That is about 30% faster than the Ryzen 3550H. That is in the ballpark of what I expected.

Does this matter? Probably not. I don’t think anyone is buying a refurbished Minisforum UM350 to play games. It is likely more interesting to you just how well this thing transcodes video for Jellyfin or Plex. The silicon used for encoding video is entirely separate from the silicon used for playing games, so the fact that this Ryzen iGPU runs Gunfire Reborn better doesn’t mean it will transcode video faster than the Celeron N100!

I had a little trouble getting the Jellyfin Proxmox helper script working. I do know from friends in our Discord community that the Plex and Jellyfin hardware encoding on this Ryzen iGPU is more than up to the task, but I am quite curious how it stacks up to the Intel QuickSync hardware on the Celeron N100. I am going to give this another try soon, but I didn’t want to let it slow down the publishing of this blog post!

Did Pat get hosed by not being able to use 2x32 GB of RAM in the Minisforum UM350?!

Nope! I am going to turn this into a really good value for me!

My $155 Minisforum UM350 came with 2x8 GB SO-DIMMs. That is technically enough for my needs, but at the same time it feels like a bummer.

My $143 Trigkey N100 came with a single 16 GB SO-DIMM. I then spent $105 on 2x32 GB of DDR4 SO-DIMMs. The intention was to buy a second N100 mini PC and upgrade both to 32 GB.

Minisforum UM350 Ryzen 3550H Mini PC

Can you see how the NVMe with the thick heat sink is partially blocked by the chassis?

Are you keeping score? That means I was going to pay $390 for a pair of N100 mini PCs each with 32 GB of RAM, and I would have had an extra pair of 16 GB SO-DIMMs left over.

What if I buy another Trigkey N100? I can upgrade it to 32 GB, then I can use my leftover pair of 16 GB SO-DIMMs to upgrade the Minisforum UM350.

This may mean my homelab is getting out of control, but it WILL give me three comparable mini PCs, each with 32 GB of RAM, for $545.

It might just be an excuse to add one extra mini PC to my homelab, but it’d only lower the cost of each mini PC with 32 GB of RAM from $195 to $180.

Conclusion

I could write twice as many words about how pleased I am with this Ryzen 3550H mini PC from Minisforum, but it won’t do you any good. This is an older model that you won’t likely be able to buy, and that bums me out!

That said, you CAN still buy a Beelink or Trigkey mini PC with a Ryzen 5560U or Ryzen 5700U, and those models are priced quite aggressively. They both have more CPU and GPU chooch than my Ryzen 3550H, but they are missing the 2.5-gigabit Ethernet. That might be just fine for your use case, and you can add 2.5-gigabit Ethernet for about $7. My suspicion is that the 5560U and 5700U would both idle just as well as my refurbished Minisforum PC because my Ryzen 5700U laptop uses 5 or 6 watts at idle, and that includes the extra power for the screen.

What do you think? Did I get a good deal on my UM350 mini PC? Do you own one, and are you happy with it? Do you think I need to grab a Ryzen 5560U mini PC to test its power efficiency? Let me know in the comments, or join the Butter, What?! Discord community to tell us about your experience or see what we are up to with our homelab mini PCs!

I Bought The Cheapest 2.5-Gigabit USB Ethernet Adapters And They Are Awesome!

| Comments

I very nearly typed “so you don’t have to” as the end of this blog post’s title, but that wouldn’t be the least bit accurate! I bought these because I posted a link to a deal for a Sabrent 2.5-gigabit USB Ethernet adapter for $20 on our Discord server, and someone piped up and said they buy theirs from Aliexpress because the cheap ones over there use the same Realtek R8169 Ethernet chip. I fully expected these inexpensive network dongles to work great!

I now have two mini PCs in my homelab with 2.5-gigabit Ethernet ports, so it feels like it is going to be time to start ordering some 2.5-gigabit switches. I figured I could order a couple of 2.5-gigabit dongles to upgrade some other machines around the house, and I could also use those dongles to make sure that the Cat 5e cable that Brian and I pulled through this house a decade ago is up to the task of pushing faster speeds from my homelab on the opposite side of the house to my home office!

Aliexpress 2.5 Gigabit Ethernet

I also don’t know that I literally chose the cheapest dongles on Aliexpress. Some cost more but had free shipping. Others never had free shipping. Some vendors had free shipping over $10. I figured that last one would be where I’d get a deal, so I bought a USB-A and USB-C dongle from the same store for a total of about $14 for the pair.

Oddly enough, these are the most premium-feeling USB network adapters I have ever owned. My 1-gigabit Ethernet dongle in my laptop bag is all plastic and weighs 20 grams. These new dongles are in an aluminum shell, and they weigh over 30 grams each!

Why am I excited about USB network adapters? Aren’t PCIe cards better?!

The best bang for the buck in mini PC gear for your homelab cluster will have DDR4 memory and almost always have 1-gigabit Ethernet ports. Part of the reason the older DDR4 mini PCs are such a good value is because large DDR5 SO-DIMMs cost 40% more. That isn’t a big difference if you’re only buying one mini PC or if you aren’t maxing out the RAM, but it can add up to a sizable amount of cash if you’re building a small cluster.

Not only that, but some of us already have mini PCs in our homelab that only have gigabit Ethernet. These $7 dongles are a fantastic way to turbocharge our older mini PCs, and they are such an inexpensive way to add 2.5-gigabit Ethernet without paying an extra $80 to $120 for a more expensive mini PC.

I now have two mini PCs in my homelab with built-in 2.5-gigabit Ethernet ports. The third Celeron N100 mini PC in my cluster only has gigabit Ethernet. I will definitely be adding one of these $7 dongles to that $143 mini PC once I get around to upgrading my network cupboard to 2.5-gigabit Ethernet!

The cheap 2.5-gigabit dongles passed their first test without any hiccups!

I plugged the USB-A dongle into my workstation, then I plugged the USB-C dongle into my Minisforum UM350 mini PC, and I connected them up with one of my new 10’ flat-pack Cat 6 cables.

I have been a fan of flat Ethernet cables for many years, but I haven’t bought any in ages, and I needed a reasonably long cable for my network toolkit. The cables I decided to try were excited to tell me that they were Cat 6 cables at Cat 5e prices, so I was a little skeptical. Seeing them move data via iperf at 2.36 gigabits per second made me feel better about my purchase!

Aliexpress 2.5 Gigabit Ethernet

There isn’t much to say. The kernel modules automatically loaded on both my Ubuntu workstation and my Proxmox mini PC. Pings were fast, and throughput was right where it should be.

I would call that a successful first test.

How about testing the Cat 5e in my walls?!

Brian and I ran Cat 5e to every room in this house about five years before I moved in. I bet Brian wrote down which ports on the patch panel lead to each room, but I don’t have that document. I was smart enough to make a note of some of the important ports when I was setting things up, so I knew that my office was on port 13 on the 48-port patch panel.

I plugged one of the 2.5-gigabit USB dongles into my laptop and connected my laptop directly to the RJ-45 jack on my office’s wall using one of those 10’ flat-pack Cat 6 cables.

Then I walked across the house with a second 10’ flat-pack cable and plugged one of the extra 2.5-gigabit Ethernet ports on my router-style mini PC Proxmox server into port 13 on the patch panel.

I configured each of those network interfaces with new IP addresses on an unused subnet, and then I ran my usual iperf tests. I am not surprised to report that I measured 2.36 gigabits per second in both directions. Huzzah!
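There isn't much to the test itself. It looks something like this, using iperf3 and placeholder interface names and addresses:

# On the laptop: address the USB dongle on an unused test subnet (interface name is a placeholder)
$ ip addr add 10.55.0.1/24 dev enx00e04c680001

# On the mini PC: address its 2.5-gigabit port on the same subnet, then run the server
$ ip addr add 10.55.0.2/24 dev enp2s0
$ iperf3 -s

# Back on the laptop: test both directions
$ iperf3 -c 10.55.0.2
$ iperf3 -c 10.55.0.2 -R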

It was nice to see my Realtek R8169 USB adapter communicating with an Intel i226-V chipset, and it was also great to see that adding a second inexpensive 10’ flat-pack cable to the connection didn’t degrade my signal.

Will the cheap 2.5-gigabit Ethernet adapters be stable over the long term?

I put both 2.5-gigabit adapters away after my tests were completed. That is kind of a bummer because I definitely want to make sure they continue to work for weeks, months, and years.

I am pondering which Mokerlink 2.5-gigabit switches I need to buy, and I don’t have much use for extra 2.5-gigabit network interfaces until I order and install a new switch in both my network cupboard and my home office. That work is at least a few weeks away.

We have several people on our Discord server who have had good luck with their R8169 USB adapters, and Brian Moses tells me he had his homelab server connected with an R8169 USB dongle for a few months. I don’t expect to see any long-term problems once I put these into service, but I will definitely report back with anything I see!

What’s next?

It is a good sign that the cabling in my walls is in good enough shape to run at 2.5 gigabits per second, but the Mokerlink 2.5-gigabit switches I want to put in my network cupboard and home office have 10-gigabit SFP+ ports. I do not have proof that my wiring will get me that far!

The next step is definitely to buy a pair of Mokerlink switches, and I will definitely shop around for 10-gigabit modules that actually support 2.5-gigabit and 5-gigabit Ethernet. My hope is that they will negotiate down if my office can’t support a full 10 gigabit.

Want to follow along on this journey? We’re building a homelab and DIY NAS community on the Butter, What?! Discord server where we share tips, tricks, and troubleshoot issues. Swing by, join the conversation, and share your homelab setup with us! I’m also curious to hear your thoughts. What kind of network gear are you using in your own setups? Are you using inexpensive USB 2.5-gigabit Ethernet adapters? How are they working out for you? Are you using the Mokerlink switches that I am currently shopping for? Let me know how things are working out for you in the comments below!

When An Intel N100 Mini PC Isn’t Enough, Build a Compact Mini-ITX Server!

| Comments

This really depends on what you mean when you say that a Celeron N100 mini PC isn’t enough. If an N100 mini PC isn’t powerful enough, you can spend a bit more on a Ryzen 5560U mini PC that is more than twice as fast, or splurge on a mini PC with a Ryzen 7840HS that is more than four times faster. Heck! You could buy a stack of four N100 mini PCs, or you can mix and match models to fit your unique needs!

Maybe that isn’t what you mean. Maybe you need to cram in more hard drives to store backups, the videos you record yourself on your vlogging camera, or movies and TV shows you acquire on the high seas. You should have no shame in hanging a big USB hard disk off the back of your mini PC, but if that isn’t enough, what are you supposed to do?

Flux Schnell Small Form Factor NAS Guy

You can definitely build something for your homelab that is much better equipped than a mini PC, but you can also keep your up-front costs and electricity bill down by sticking with a mobile CPU like the Intel Celeron N100. You just have to upgrade your form factor!

My friend Brian Moses has been using, buying, and selling all sorts of mini-ITX motherboards with low-power CPUs, lots of 2.5-gigabit Ethernet ports, and plenty of SATA ports.

You can probably build what you need, and it might only be as big as two or three stacked shoe boxes. That isn’t as small as a stack of mini PCs, but it isn’t exactly huge!

Why is the Intel Celeron N100 so popular now?

Brian and I have both been excited about the various small form factor motherboards built around mobile CPUs for a long, long time. I built my homelab server around a motherboard that was rocking an AMD 5350 processor nine years ago, and Brian put together one of his DIY NAS builds a few years ago based around an Intel N5105 CPU.

I could write an entire blog post about how much of a bummer it was that there weren’t any good homelab or NAS motherboard choices with mobile CPUs in the seven years in between our two builds. Instead, I am just going to say that it is fantastic that power-sipping CPUs are showing up on motherboards again, and they have been showing up on motherboards with plenty of SATA ports. There’s no shortage of homelab-friendly mini-ITX motherboards to choose from today!

The Celeron N5095, N5105, and N6005 had some minor popularity, but not nearly as much as the Celeron N100. I can’t explain why. The Intel N100 is an incremental evolution over the N5095 and not some sort of revolution. It is only better in the ways that you would expect from a processor released two years later: it is a little faster and a bit more power efficient.

The previous generation is every bit as useful as the Intel N100. I imagine it just took a few years for everyone to start catching on, and the Intel N5095 that started the party is now just too old to be as relevant.

You didn’t answer the question, Pat!

You are right. Sorry about that!

The Celeron N100 is awesome because it is a little less than half as fast as a Ryzen 5600 desktop CPU, and you can get an entire mini PC with an N100, RAM, and storage for about the same price as that desktop CPU alone! The Intel N100 has a fantastic little GPU with more than enough video-encoding capacity to transcode several 4K Jellyfin or Plex streams. You can upgrade them to 48 gigabytes of RAM. They are also reasonably priced. You can usually buy a mini PC with RAM and storage for $145.

UPDATE: The Trigkey N100 with 16 GB of RAM and 500 GB NVMe has been going on sale almost once a week for less than $140. Sometimes as low as $134 shipped!

My own Intel N100 router-style mini PC idles at around 8 watts. That is the whole thing: motherboard, network interfaces, RAM, NVMe, and the CPU. It is using half as much electricity as a modern LED light bulb. Adding the external 14-terabyte hard disk drive brings that up to about 15 watts.

All of this applies to the Intel N5095 and its siblings.

You can go a long way with just a mini PC!

Everyone’s needs will be a little different, but 32 GB or 48 GB of RAM paired with an N100 is a pretty reasonable ratio of RAM to CPU horsepower, and so is 64 GB paired with a Ryzen 5560U. You can fit a lot of light- or medium-duty virtual machines and containers on these tiny boxes! What about storage?

I had a RAID array at home from the late nineties right up until just a few years ago. They just didn’t make large enough hard drives to hold all my data. The options were to manually split things between drives, or bundle them all together in a RAID and just dump everything in one place. I chose the latter because it also adds redundancy.

Hard drives have been getting bigger at a rate that exceeds my needs, so I am using single USB hard disks on each of my mini PCs and in my personal desktop computer. A copy of all my data lives on a 14 TB USB hard drive attached to the N100 mini PC at my house, and another copy lives on a 14 TB USB hard drive attached to a Raspberry Pi at Brian Moses’s house. The original copy of that data lives on a 12 TB SATA hard disk in the computer I am sitting at right now.

Assuming they survive long enough, those 14-terabyte disks will be rather old by the time I fill them up. When I do, I will be able to replace them with 30 terabyte disks.

What if I don’t like the idea of USB hard drives? What if I need way more storage than that?!

I haven’t had any real issues with the USB hard drives on my servers in the last four years. The USB disk on my N100 Proxmox server can read and write data at just over 200 megabytes per second, which is pretty close to the maximum speed of the drive that came in Western Digital’s enclosure. I have no qualms about recommending this configuration to other people.

I also don’t want to twist your arm! If you want to build a RAID using Linux’s mdadm or ZFS, I surely don’t want to stop you!
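If you do go the software-RAID route, the mdadm side of things is only a few commands. This is a bare-bones sketch with placeholder device names and a RAID 5 layout; pick whatever level and filesystem suit you:

# Build a 4-disk RAID 5 array (device names are placeholders)
$ mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Put a filesystem on it and mount it
$ mkfs.ext4 /dev/md0
$ mount /dev/md0 /srv/storage

# Keep an eye on the initial resync
$ cat /proc/mdstat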

You don’t have to give up the power-sipping Intel N5095, N6005, or N100 to do that. If those offer enough horsepower for your needs, you can start shopping for a mini-ITX motherboard and a nice little case. There are plenty of options to choose from.

I like Brian Moses’s new off-site buddy NAS build!

I am not just saying this because I host Brian’s off-site NAS at my house. Many of the choices Brian makes in his builds can be attributed to my own recommendations, thoughts, and insights. In the case of the current iteration of his off-site NAS, my contribution was me exclaiming, “OMG Brian! You are absolutely correct! That 3D-printed NAS case from makerunit looks awesome!”

Brian has been selling Celeron N5105/N6005 and N100 motherboards in his eBay store for quite a while now. He wound up pairing one of those Celeron N6005 motherboards with six refurbished 12-terabyte SATA hard drives. He is running TrueNAS Scale and connecting to his other devices using the awesome Tailscale mesh VPN.

These Celeron N5105, N6005, and N100 motherboards from Topton are all awesome. You get six SATA ports, multiple m.2 NVMe slots, and four 2.5-gigabit Ethernet ports. They squeeze a lot of hardware into a small space, and they manage to utilize every one of the limited number of available PCIe lanes on these mobile processors.

It is a fantastic little server. Brian printed his case using two different shades of blue PETG filament, and he has his face logo on the front.

The makerunit 6-bay NAS case’s front cover is held on by magnets, and you can slide any of the six hard drives right out if you need to replace them. It might not be quite as slick as a case with real hot-swap drive bays, but it costs less than two rolls of filament, and it looks super cool!

The case has room for a full-height single-slot PCIe card, so you might be able to sneak in a slim GPU for some machine learning shenanigans. You really need at least 24 gigabytes of VRAM to run the more serious LLMs, but you can fit some really useful stuff into 8 or 12 gigabytes.

I don’t believe that you will find room to squeeze a GPU into the Topton N100 motherboards. They do have a PCIe x1 slot, and that would actually be just fine for running large language models that completely fit on one GPU, but you’re going to have clearance issues with other ports on the motherboard.

You don’t have to 3D-print a case

Finding cool new uses for your 3D printer is fun. So is finding a good excuse to buy a 3D printer, and I think all of makerunit’s cases qualify here. Even so, you still don’t have to 3D-print a case. There are no shortage of awesome mini-ITX and micro-ATX cases to build a compact NAS or homelab server around.

I am a fan of all the NAS cases from Jonsbo. The Jonsbo N1 and Jonsbo N2 look equivalent according to the spec sheet, but they each have a very different vibe, and they lay out the major components in very different ways. The Jonsbo N1 looks like something out of a sci-fi movie, while the Jonsbo N2 is a more functional and usable NAS case. I couldn’t argue with going with either one.

The Jonsbo N3 looks a lot like the Jonsbo N2. The Jonsbo N3 bumps you up to 8 hot-swap 3.5” hard drive bays instead of 5, and it gives you room for a dual-slot full-height PCIe GPU. Then there’s the even larger Jonsbo N4 that gives you room for a micro-ATX motherboard and a wide half-height GPU.

Or you can follow Brian Moses again and go even larger. He migrated his personal DIY NAS to a SilverStone CS382. It has 8 hot-swap 3.5” drive bays, room for a micro-ATX motherboard and full-size ATX power supply, and he is using an Icy Dock to squeeze eight additional 2.5” SATA SSDs into the single available 5.25” optical drive bay.

The SilverStone CS382 is a big case. That not only buys you room to fit your hands inside the machine when you need to work on things, but it also gives you room for a huge GPU and a less expensive and more capable ATX power supply.

There’s an option between a full mini-ITX NAS build and USB hard disks!

Topton makes a nifty little server, the R1 Pro, that we lovingly refer to as “the toaster NAS”. Why do we call it that? Because the pair of 3.5” SATA hard disks slide in from the top like two pieces of bread! It is reasonably priced, quite compact, and it sips power. Brian stocks them in his eBay store, but they are also available with different branding on Amazon and Aliexpress.

Stable Diffusion NAS Guy

The toaster NAS features the Intel N100 processor, one DDR4 SO-DIMM slot, one m.2 slot, and a pair of 2.5 gigabit Ethernet ports. It is basically a mini PC sandwiched between two hard drives but wrapped up in an attractive shell.

This is a fantastic middle ground between a basic mini PC and a 5-bay mini-ITX home server. Depending on how you are measuring things, you could almost fit four of these in the space that Brian’s 6-bay makerunit NAS is taking up under my network cupboard.

Can we trust these motherboards from Topton?

I have to admit that when I first heard about these motherboards a few years ago, I was worried that they’d be sketchy. You had to order them from Aliexpress. Who the heck is Topton? Do they make a reliable product? Will I have to ship it back to China if it doesn’t power on? What if it stops working in a week?

Then Brian bought one and used one. A few of our friends tried them, too, and we got more confident. We started hearing DIY NAS success stories. Then Brian started importing them by the dozen and selling them in his eBay store, so we didn’t have to worry about something going wrong in transit between China and the United States, and we didn’t have to wait two or three weeks for the motherboard to show up. This helped a lot!

Today we know that Brian has shipped hundreds of Topton mini-ITX motherboards to customers. I spoke with Brian about failure rates and returns on more than one of our weekend pizza nights here at my house. I don’t want to blabber about specific numbers that I may or may not have been told in confidence and that I may not be remembering precisely, but I think it is safe for me to say that he has had some returned motherboards, and I believe every one of his very small number of returned motherboards worked just fine when he tested them.

We’ve gone from an anecdote, to anecdata, to real data over the last few years. At this point, I would try any of Topton’s motherboards without hesitation.

All of this advice works in reverse

I am not entirely certain that whatever insights this blog post provides qualify as advice, but I am going to refer to it as such anyway!

Maybe you have a huge but aging NAS in your home. Maybe you have a short stack of rack-mount servers in your closet. Perhaps you’ve been planning to downsize during your next upgrade, but you were planning to move from one or more giant 4U servers into something like the Jonsbo N2 or makerunit’s 6-bay 3D-printed case.

Stable Diffusion Mini PC Guy

Is it possible that this is bigger than you need? You might be better served with a toaster NAS and a couple of Trigkey or Beelink Intel N100 or Ryzen 5560U boxes in a little Proxmox cluster that would all fit inside your childhood Return of the Jedi lunch box. You may be better off with just a stack of mini PCs.

The neat thing is that there are no wrong answers. As long as you are treating your valuable data correctly and you have good backups, then getting everything else perfectly optimal is much less important. You can just do whatever seems fun, interesting, or useful to you!

Conclusion

We’ve explored a world of options here, from the humble USB hard drive to the spacious SilverStone CS382, and many options in between. There’s no single right answer when it comes to building your ideal homelab or NAS setup. The best choice depends on your unique needs, budget, and even your personal aesthetic preferences. Maybe you’re drawn to the sleek minimalism of a mini PC, or perhaps you crave the expandability of a larger case.

Whatever path you choose, remember that the journey is as important as the destination. Don’t be afraid to experiment, to try something new, and to learn along the way. Embrace the challenges and celebrate your successes. And most importantly, share your experiences with others!

Join the Butter, What?! Discord community and connect with a passionate group of individuals who are just as excited about homelabs as you are. Come share your builds, ask for advice, and learn from the collective wisdom of the group. Let’s build something awesome together! And don’t forget to leave a comment below and tell us what you think of this exploration of mini-ITX options. What are your favorite builds? What challenges have you faced?

GPT-4o Mini Vs. My Local LLM

| Comments

I am not some sort of machine-learning enthusiast. I am not trying to shoehorn large language models into every problem I encounter. I am just a dude who writes a blog post every week or so, and I have been messing around with various things to see how they might help streamline my workflow.

I am probably only scratching the surface, but I figured this is a reasonable time to write down what I have learned so far.

Stable Diffusion man using an llm

I am almost definitely not using the most optimal local models. I have tried a handful of different things available on Huggingface, and I settled in on things that seem to work well for my needs.

Large language models aren’t replacements for search engines

I used to see a lot of comments on Reddit and Hacker News excitedly posting what ChatGPT had to say about the topic at hand, or what ChatGPT said the answer was. More recently I am seeing questions asked on Reddit that aren’t answered to the poster’s satisfaction, so the original poster leaves a comment with the better advice they believe they got from ChatGPT.

Large language models make stuff up all the time. I asked one of the local models I was trying out about the specifications of my first childhood computer: the Texas Instruments TI 99/4a. Much to my surprise, that rather tiny large language model seemed to be correct! I didn’t verify that everything was perfect, but it matched my memory, and that was more than enough to impress me.

Then I asked it for the specs of an Apple 2e. It confidently told me that the Apple 2e was a dual Intel Xeon machine with 192 GB of RAM.

There is a lot of information hiding in a lossy compressed state inside these models. GPT-4o Mini definitely has more real-world information hiding in its weights than my local LLM, and the full GPT-4 model has way more information than that. Either has a better chance of being correct than my tiny local model, but they all suffer from the same problem.

Even if full GPT-4 will be correct more often, it will still be just as confidently wrong as my local model.

I have asked ChatGPT to summarize some of my 3D printing blog posts, and it has given me back lots of bullet points that are exactly the opposite of what I actually said. I only know this because I wrote the words. I’d be careful basing any important decisions off of a summary from ChatGPT if I were you.

Large language models can definitely supplement your use of search engines!

When Windows Copilot+ Recall was announced, I almost immediately started writing a blog post about how I felt about this sort of software. I remembered with absolute certainty that there was a Gnome project more than twenty years ago that attempted to keep track of your information for you in a similar way.

The project was called Dashboard. It monitored your email, instant messages, and text files in an attempt to gather related information before you needed it. It didn’t use AI, but it felt a lot like what Copilot is hoping to accomplish. The trouble is that I couldn’t remember the name of the project, and my Google searches were coming up completely empty.

Stable Diffusion man using an llm

I had a back-and-forth with ChatGPT about it. It went off the rails a lot, and ChatGPT was extremely confident of some wrong answers, but I did eventually get it to mention both Dashboard and Nat Friedman in one of its responses.

Finding that nugget of information made it easy for me to find some old references to the actual Dashboard project!

This applies to OpenAI’s models and local models. They are so often breathtakingly incorrect, but we used to say the same thing about search engines twenty years ago. If you’re not a subject matter expert, you had better dig a little deeper to verify the responses from whichever chatbot you decide to use!

OpenAI’s API is stupidly inexpensive

OpenAI’s most expensive API costs $30 per million tokens, and their newest and cheapest model, GPT-4o Mini, costs only $0.60 per million tokens.
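For a sense of scale, every one of those requests is just a single HTTPS call. Here is a minimal sketch of a rephrasing request against the chat completions endpoint with GPT-4o Mini; the prompt is obviously a placeholder:

$ curl https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-4o-mini",
      "messages": [
        {"role": "user", "content": "Rephrase this paragraph in a friendlier tone: ..."}
      ]
    }'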

I have been using an Emacs package to help me quickly and easily send paragraphs and entire blog posts up to their API to have things rewritten, rephrased, or have introductions written for me. I added $6.61 to my OpenAI account in November of 2023, and I have $5.83 left in my account as of July 2024.

I have no idea why I wound up adding $6.61 to my account. That is a really odd number, isn’t it?!

Sending a paragraph or two at a time for GPT-3.5 Turbo to rephrase was costing me a few pennies per blog post. Things got way more expensive when I started using the huge context window of GPT-4 to start each session with entire 2,500 word blog posts. That was costing me nearly an entire dime to work on a single blog post!

I used a local LLM to help me write an intro, conclusion, and call to action for my most recent blog post. When GPT-4o Mini went live, I sent it the exact same text, and I gave it the exact same prompt. That work cost me less than a penny. Not just less than a penny, but less than a tenth of a penny.

This was fun, because I have been messing around with various local models using the oobabooga webui for a few weeks, and I had forgotten how lightning fast the OpenAI API can be!

My local LLM is slower and not as good as GPT-4o Mini

I didn’t buy some big, expensive LLM-friendly workstation with a $2,000 24 GB Nvidia RTX 4090. I have a little $300 12 GB Radeon 6700 XT. I didn’t buy it for running AI stuff. I bought it to play games! The AI stuff is a free bonus.

My local LLM can ingest an entire blog post in a handful of seconds, and it can respond to my prompt about that blog at around 25 tokens per second. That winds up settling down to something around 15 tokens per second as I get deeper into the chat and the context goes up.

How much better is GPT-4o Mini?

I gave both my local Gemma 2 9B LLM and GPT-4o Mini the following prompt:

If I give you the text of a blog post I am writing, can you write me a “conclusion” section with a call to action that includes joining our discord server and leaving a comment?

I pasted them both the contents of the blog post. I got this response back from GPT-4o Mini:

Conclusion

In the world of 3D printing, choosing the right machine can be both exciting and daunting. While the Sovol SV06 has its merits, especially with the recent firmware upgrades that enhance its performance, it’s essential to recognize that the Bambu A1 Mini excels in various aspects, including user-friendliness and print quality. For those who value speed and ease of use, the A1 Mini is hard to beat, while the Sovol SV06 may appeal to enthusiasts looking for a larger build volume or the joy of tuning and customizing their machines.

Ultimately, your choice should align with your printing needs and preferences. But whether you opt for the Sovol or the Bambu, remember to keep your expectations realistic and informed.

I’d love to hear your thoughts! Have you had experiences with either the Sovol SV06 or the Bambu A1 Mini? Share your insights in the comments below. And if you’re passionate about 3D printing or want to connect with other enthusiasts, consider joining our Discord server! It’s a fantastic space to share tips, get advice, and stay updated on the latest in the 3D printing community. Let’s keep the conversation going!

This was the response I got from Gemma 2 9B:

Great! Here’s a conclusion with a call to action incorporating your Discord server and a comment section:

Sovol’s SV06 is a capable machine, especially when you consider the price. But, if you’re looking for an out-of-the-box experience with exceptional print quality and ease of use, the Bambu A1 Mini is hard to beat.

Want to explore the world of 3D printing and share your own projects? Join our Discord server where you can connect with fellow makers, get advice, and show off your creations! [Link to your Discord server]

Let me know in the comments what you think of the Sovol SV06 – would you consider it for your next 3D printing project?

I could never use any of these words without heavy editing. Every AI has a weird voice that feels way too pretentious for me.

I also don’t want to say that one of these did a better job than the other. The response from OpenAI is rather long winded, but it does contain a lot of useful information that Gemma’s response is missing. That said, the response I got from Gemma 2 9B is much more like the sort of conclusion I would write on my own.

While both GPT-4o Mini and my local LLM do appear to slowly type a response to your query, the difference is that GPT-4o Mini is only pretending to be as slow as it appears. It actually took GPT-4o Mini about two seconds to complete the entire task, whereas my local LLM took 25 seconds to ingest my blog post and return its entire first response.

That is definitely a usable pace, but I have spent a lot of money on hardware to get lesser responses that take longer. I could work with GPT-4o Mini to help me rewrite all 800,000 words of blog posts I have ever written and it wouldn’t even cost 5% of what I paid for my GPU.

Do you know what I think is neat? I had an LLM generate a conclusion for one of my recent blogs, and I decided to leave in a couple of sentences and phrases that I absolutely never would have written on my own. I didn’t feel entirely comfortable speaking the way the machine wrote the words, but they were useful and relevant statements. If it didn’t work out, I could blame the robot!

The GPT-4o Mini API is extremely cheap, but my local LLM is almost free

The electricity required to have an LLM churning away on my workstation is a rounding error. My GPU maxes out at 174 watts, but llama.cpp doesn’t seem capable of utilizing all of the capacity, so it rarely goes far past 100 watts. It would cost less than twenty cents if you could somehow coax oobabooga into running for 10 hours straight with no downtime. That would be somewhere between 500,000 and 900,000 tokens.

The hardware is a sunk cost. I need my GPU to edit videos with DaVinci Resolve and play games. I need my CPU, RAM, and monitor just to be able to do any work at all. I am not investing in hardware to run a language model. I already own it. Running the model is effectively free.

Stable Diffusion neon pcb

Free isn’t cheap enough to justify the effort. My back catalog of blog posts should be somewhere around a million tokens. It would cost me 30 cents every time I have GPT-4o Mini ingest the whole thing, and it’d only cost $1.20 to get a full rewrite of my blog back out. If I were working with the API to actually do a good job of reworking my old posts, then I would most definitely have to go back and forth more than once with many of the paragraphs.

I can’t imagine having to spend more than $10 or so conversing with the OpenAI API to rewrite my entire blog. The OpenAI API would respond faster than my local API via oobabooga, and that alone would save me way more than $10 worth of my own time.

I would never actually do this, but this is by far the most extreme use case I can come up with for using an LLM, and it would only cost me ten bucks!

What if I don’t want to upload all my words to a third party?

This has to be the best reason by far to avoid using an LLM in the cloud. Would Stephen King want to risk the chapters of his latest novel leaking? Maybe he wouldn’t care. I don’t think that a leak would have a significant impact on his life, but I think you understand the idea.

I have no qualms about uploading my words to OpenAI before I publish them. It feels icky in principle, but I’m not some sort of investigatory journalist trying to beat everyone to the latest scoop. The majority of the words that I write wind up on the public Internet anyway. Who cares if they manage to sneak out a week early?

You might not be as fortunate as I am. You might have actual private work that needs to stay private. I could totally see investing in a small server with an Nvidia 3090 to run Gemma 2 27B so your small business can have access to a reasonably powerful LLM. Spending a few thousand dollars to not leak your secrets is pretty inexpensive!

What is Pat actually running locally?

The two models that I have really settled on are CodeQwen1.5-7B-Chat-GGUF and gemma-2-9b-it-GPTQ, both of which are available at Hugging Face. I think Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 is also fantastic, but Gemma 2 seems to fit my blogging needs better.

$ du -sh *GPTQ*
5.4G  hugging-quants_Meta-Llama-3.1-8B-Instruct-GPTQ-INT4
5.8G  marcsun13_gemma-2-9b-it-GPTQ

$ ls -sh *.gguf
4.5G codeqwen-1_5-7b-chat-q4_k_m.gguf
4.0G internlm2_5-7b-chat-IQ4_XS.gguf
4.4G internlm2_5-7b-chat-q4_k_m.gguf
6.0G internlm2_5-7b-chat-Q6_K.gguf
4.2G Llama-3-Instruct-8B-SPPO-Iter3-IQ4_XS.gguf

I was mostly using InternLM 2.5 before Gemma 2 9B and Llama 3.1 8B were released. I tried three different quantizations of InternLM with the context window set to about 12,000 tokens. They eat up about 6, 6.5, and 7 gigabytes of VRAM on my Radeon 6700 XT, and they seem to have the same speed and quality for my use cases.

InternLM supports a context window as large as one million. The more VRAM I leave free, the larger I can adjust the context window. I haven’t needed more than six or seven thousand tokens of context yet.

I had trouble getting CodeQwen1.5 to load. I recall the errors in the stack trace seeming a little nonspecific, so I originally assumed there was just a compatibility issue somewhere. I dialed back CodeQwen’s massive default context window, and it loaded right up. I was just running out of VRAM!
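The same two knobs apply if you skip the webui and load one of the GGUF quantizations directly with llama.cpp's command-line tool. This is just a sketch, and the binary name depends on how recent your build is:

# Load a 7B GGUF with a 12k-token context window and offload every layer to the GPU
$ ./llama-cli -m internlm2_5-7b-chat-q4_k_m.gguf -c 12288 -ngl 99 \
    -p "Summarize the following blog post: ..."

Shrinking -c or offloading fewer layers with -ngl is how you squeeze a model that is a little too big into 12 GB of VRAM.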

I have not properly used an LLM to help write any code. I don’t tend to write much code. I just figured I should try out one of the programming-focused LLMs, and CodeQwen seemed to do a nice job of spitting out the short shell scripts that I asked for.

Stable Diffusion is also awesome!

I feel like I have to say this every time I write any words about machine learning. I have been using Stable Diffusion via the automatic1111 webui since I upgraded to my current GPU. It is so much fun generating silly images to break up the walls of words that are my blogs.

I can queue up a bunch of images with different CFG scales using a handful of different checkpoints, then I can wander off and make my morning latte. I will have several hundred images waiting for me when I get back, and I usually just pick the ones that make me giggle the most. The more fingers the better!

Why am I messing around with running these things locally at all?!

I’ve already said that it doesn’t make a lick of difference if my blog posts go up into the cloud before they are published, and OpenAI’s API is much faster and costs almost nothing. Why bother with any of this?

I think it is neat, and I am having some fun. I am excited that I know where the limits of local models seem to be, and I now understand how much GPU you need to buy to run something useful.

It is awesome that things are moving so quickly. When I bought my 12 GB GPU just over a year ago, I looked around to see what sort of value a large language model small enough to fit in my VRAM might add. At the time, those small models seemed to do a rather poor job.

A year later, and these small models are starting to feel quite capable! I imagine that things will continue to improve in the future.

Getting Llama 3.1 and Gemma 2 working with my Radeon GPU and oobabooga webui required a bit of a kludge!

At least, I think I am using Llama 3.1 8B. You have to roll back llama.cpp to an older version if you want to use the latest releases of oobabooga with AMD’s ROCm, and that version of llama.cpp doesn’t work with Llama 3.1 or Gemma 2, so I am running Llama 3.1 and Gemma 2 9B via ExLlama V2. I have no idea if I am doing this correctly.

Conclusion

As I continue to explore the capabilities of large language models and local alternatives, it’s clear that these tools have the potential to assist with my creative processes in interesting ways. My hope is that machine learning can take some of the monotonous work off my shoulders.

I believe it is doing that to a small extent, but it is also creating more work for me while improving my writing. I’m not sure how much my blog posts are improving when I bounce things off of the artificial intelligence monstrosities, but it is interesting enough that I am going to continue to do so just to see where it leads.

This is the part where GPT-4o Mini suggested that I invite you to join the Butter, What?! Discord community, where we share our experiences, insights, and tips on leveraging AI in creative projects. I can’t say that, because it isn’t even true! There is hardly any machine learning nonsense going on in there, but it is a fun place where a lot of homelab, NAS, and 3D printing projects are often being talked about!

Additionally, I’d love to hear your thoughts on this topic! Have you experimented with local models or found innovative ways to integrate LLMs into your work? Leave me a comment below and let’s start a conversation!

The Sovol SV06 Got Much More Interesting In 2024

| Comments

First of all, I am NOT saying that you should buy a Sovol SV06. I wholeheartedly believe that the Bambu A1 Mini is by far the best $200 3D printer, and I really do think everyone should own one. The only measure where the Sovol SV06 beats the Bambu A1 Mini is size.

If you click one of my links and buy a Sovol, I will make a few dollars. If you buy a Bambu printer, I will make absolutely nothing. I still think you should buy a printer from Bambu.

Stable Diffusion 3D printer guy

I DO believe there are good reasons to buy a Sovol SV06, SV06 Plus, or SV08. I also believe that Sovol makes a decent printer, but you have to understand what you’re buying.

Now that we got that out of the way, let’s talk about why the Sovol SV06 recently got more awesome.

The Sovol SV06 now ships with Marlin’s input shaper

Do you already own a Sovol SV06? If you do, then you should definitely head over to Sovol’s website and download the latest firmware and Orcaslicer profiles. Your prints will run two or three times faster.

I updated my Sovol SV06 to support Marlin’s input shaping a little over 12 months ago. I had to dial it in on my own and create my own PrusaSlicer and Orcaslicer profiles. That work resulted in me being able to print a 21-minute Benchy with a 0.6 mm nozzle.

This was my Sovol SV06 going about as fast as Octoprint will allow!

This is neat, but I have to note here that I have seen my Bambu A1 Mini print a Benchy in less than 14 minutes at a higher resolution with its 0.4 mm nozzle. The A1 Mini can print faster than my Sovol SV06. Even so, if I slice a Benchy myself with Bambu’s stock profiles, it won’t print much faster than the 35-minute Benchy that Sovol is advertising for the SV06. The 14-minute Benchy on Bambu’s SD card is super optimized!

You don’t have to do any of the work I did to make my own Sovol SV06 fast. If you buy a Sovol SV06 today, it will ship with officially supported firmware that has the input shaper configured for you. Once you figure out how to get your Sovol up and running, you will be printing almost as fast as a Bambu printer.

If you already own a Sovol SV06, you are one firmware update away from printing almost as fast as a Bambu Lab printer!

The Sovol SV06 is ancient technology

The Sovol SV06 is an open-source clone of the venerable Prusa MK3. It is a solid, reliable, and well proven design, but it is built around technology that was state of the art in 2016.

It is also extremely important to note that while Prusa uses premium components, Sovol seems to be using the cheapest parts they can get away with. This isn’t necessarily a bad thing, but this is one of the reasons why a Prusa MK3S kit was $750 while an almost fully assembled Sovol SV06 costs $199.

Fast benchy on my Sovol SV06

One of my first 30-minute Benchy prints on my Sovol SV06

Almost every single Prusa ships with straight rods, good bearings, and a decent print bed. Some people get a good roll of the dice, and they wind up with a nearly perfect printer from Sovol. A few people will wind up rolling poorly, and they will receive a dud.

There are a lot of little things that a Prusa MK4, Prusa XL, or any printer from Bambu can do that a Sovol SV06 can’t, but the most important upgrade would be their modern bed-leveling systems.

Sovol’s printers are still using an inductive probe to locate the heated bed. These sense the position of the print surface indirectly, and their accuracy varies with temperature. That means they aren’t detecting the PEI surface that plastic has to stick to—they are sensing the metal underneath, which means it is up to you to explain to the printer where the nozzle is in relation to the metal surface.

Bambu Lab printers and the most recent Prusa printers detect when the nozzle makes actual contact with the surface. These printers know exactly where the nozzle is in relation to the PEI sheet. Temperature doesn’t have any impact on this like it does with a PINDA probe, and heat slightly changing the shape of the bed doesn’t matter. These more modern printers know how to get a perfect first layer almost every single time.

0.2mm nozzle benchy

The Bambu A1 Mini prints a very nice tiny Benchy with zero tuning or effort, using the 0.2 mm nozzle!

I only use my Sovol SV06 once or twice a month, and I always wonder if I am going to have to adjust the Z-offset for my next print. I have been doing this for more than a decade. I know how to make this adjustment quite well, but the trouble is that I don’t know if I will need to do it until after the start of my next print fails. This is a waste of my time.

If you are new to 3D printing, then this is the thing you are most likely to mess up. Buying a printer that doesn’t require you to dial in a Z-offset may save you several hours of wasted time.

You get a lot of nice features when you buy a Bambu A1 Mini. This feature in particular could be the difference between immediately seeing a successful print vs. hours of frustration.

Sovol is cutting as many corners as they can get away with

There are people all over r/Sovol and r/3DPrinting complaining about bent linear rods, crunchy bearings, and warped beds on their Sovol printers.

It is challenging to determine which folks are actually having a problem, and which folks are expecting perfection. Nobody is getting a perfectly flat print bed on a Sovol SV06, and you don’t need one. Marlin’s mesh bed leveling can compensate for a bed that is 0.4 mm or so out of flat.

Some people have DEFINITELY received beds that are too bent to be usable, but Reddit is flooded with people who think they need perfection. You don’t need perfection to get a good first layer, and Sovol isn’t likely to ship you perfection.

It isn’t just that Sovol chooses cheap components. Their customer service department is tiny, they are slow to respond to emails or tickets, and they don’t keep people updated about the status of their preorders.

Every time Sovol releases a new printer, you will see one or two people every day on Reddit complaining that Sovol isn’t answering their emails about when their printer is going to ship. If you’re expecting good customer service, then you need to spend more money to get it.

You are getting precisely what you pay for. You just need to understand that you aren’t paying for a premium product, and you aren’t paying for a cadre of customer support people. These are some of the reasons that you are getting a bigger printer from Sovol for less money.

Refurbished Sovol SV06 printers are where things get interesting

The Bambu A1 Mini and Sovol SV06 are both about $200. Just about the only reason to pick the Sovol is the bigger build volume. The Bambu is better in every single other way.

Refurbished Sovol SV06 printers are listed at $149. Now you get a bigger build volume AND you save about $50. Is that a good deal?

I’d still go with the A1 Mini. At the time I am writing this, the Sovol SV06 refurbs don’t seem to ship with the input-shaping firmware installed. That is something you will have to spend time upgrading, and it is something that you can mess up.

But that is just, like, my opinion, man! The hours I save using my Bambu A1 Mini instead of futzing with my Sovol SV06 are easily $50 to me. In fact, those hours are worth way more than $50 to me. The results that come out of my printer enable my other hobbies, so I prefer that my printers just work.

A whole lot of people enjoy tuning and upgrading their 3D printers. This is a valid hobby all on its own, and it makes a ton of sense to buy a Sovol SV06 or Sovol SV08 if that is what you enjoy. I certainly don’t want to deprive you of something that you would consider fun!

A few more complaints about the Sovol SV06

When I bought my refurbished Sovol SV06 in April of 2023, my biggest complaint was how loud these printers are. My Prusa MK3S was almost as quiet as a whisper when printing in stealth mode, while the SV06 has the drone of loud, cheap fans as soon as you flip the power switch, and things only get louder when you start printing.

This is still true today. My Sovol SV06 is louder than my Bambu Lab A1 Mini, but neither printer is quiet. You just can’t print fast and quiet. A 3D printer has to move a lot of air to cool the object you are printing, and the faster you are printing, the faster those fans need to go.

Sovol SV06 cable clips

My SV06 has inadequate cooling for high speeds toward the front left. These are printed with the same settings, but with the overhang oriented in different directions.

Keep in mind that I have upgraded every fan on my Sovol SV06, and I have dialed in these bigger fans to spin as slowly as I can get away with to keep the noise down. My Sovol SV06 is probably quieter than stock.

A lot of people will tell you that there are upgrades and other things that you HAVE to do to your Sovol SV06. They are exaggerating. You don’t have to do anything. The Sovol SV06 prints just fine with 100% stock hardware.

That said, it is hard to resist the urge to upgrade. Those crappy 4010 fans on the extruder assembly are loud and underpowered. Upgrading the part-cooling fan will improve overhangs and bridges, and it will allow you to print a little faster. Swapping out the power-supply fan will make the printer quieter. Spending two bucks on a knock-off CHT-style nozzle will increase your maximum print speed by up to 35%.

If you are planning on immediately spending $20 or $30 on upgrades and investing hours of your time into getting those things working, it might be worth considering putting another $100 in and just skipping ahead to the even bigger and better 256-mm Bambu A1.

I feel like I have to mention the Sovol SV08

The Sovol SV08 is an impressive machine. It is an open-source clone of the Voron. Its massive 350x350x340 mm build volume is very nearly as big as the $1,999 Prusa XL, but the Sovol SV08 is currently priced at $579.

If you need a machine as big as the Prusa XL, then the Sovol SV08 seems like a steal!

I am looking at things from the other direction. I rarely need a printer larger than my Bambu Lab A1 Mini, but I would enjoy owning a printer larger than my Sovol SV06. That printer SHOULD be a full-size Bambu A1 with an AMS Lite, but there is a tickle in my brain telling me I should put a Sovol SV08 in place of my Sovol SV06.

I liked the value proposition from this angle a lot better when the Sovol SV08 launched with early-bird pricing of $450 while the Bambu A1 was $399 or $559 with an AMS Lite. The trouble is that the Sovol SV08 has crept up in price, and Bambu sure seems to have made their anniversary sale pricing permanent.

Now it is $579 for the Sovol SV08, $339 for the Bambu A1, or $489 for the Bambu A1 with AMS Lite. That has me leaning back toward the convenience of a Bambu Lab printer with an AMS.

Conclusion

In the world of 3D printing, choosing the right machine can be a daunting task. While the Sovol SV06 does have its merits, especially with the recent firmware upgrades that enhance its performance, it is essential to recognize that the Bambu A1 Mini excels in various ways, including user-friendliness, build quality, print quality, and performance. For those who value ease of use, speed, and price, the A1 Mini is hard to beat, while the Sovol SV06 may appeal to enthusiasts on an even tighter budget who are looking for a larger build volume or the joy of tuning and customizing their own printers. Not all of us find joy in working on our 3D printers, and that is OK!

Ultimately, your choice should align with your printing needs, preferences, and budget. Whether you opt for a Sovol or a Bambu 3D Printer, remember to keep your expectations realistic.

I’d love to hear your thoughts! Have you had experiences with either the Sovol SV06 or the Bambu Lab A1 Mini? It would be awesome if you would share your insights in the comments below. And if you’re passionate about 3D printing or want to connect with other enthusiasts, consider joining our Discord server! It’s a fantastic space to share tips, get advice, and stay updated on the latest in the 3D-printing community.

Everyone Struggles to Identify Bottlenecks

| Comments

Identifying bottlenecks is a skill that many of us struggle with. Learning to pinpoint bottlenecks is important, and learning which bottlenecks to ignore can save you a lot of money.

In this blog post, we’ll explore the concept of bottlenecks and how they can impact your computing setup. We’ll discuss common types of bottlenecks, such as CPU, GPU, network, and IOPS, and provide tips and strategies for identifying and addressing them. Whether you’re a seasoned homelab enthusiast or just starting out, understanding bottlenecks is crucial for optimizing your system’s performance and getting the most out of your hardware investments.

What is a bottleneck?

You have seen a bottle. When you pop the top and tip that bottle over, the liquid won’t instantly exit the container. Its exit is slowed by the narrow neck of the bottle. The wider the neck, the faster the liquid can flow.

Stable Diffusion Bottleneck Guy

You see bottlenecks everywhere. The door to your house will only let one person through at a time. The escalator at the mall only has room for two people on each step. The ramp onto the 4-lane highway is one lane wide. Your faucet is only an inch wide, so it takes a long time to fill a pot of water.

Every IT department faces similar constrictions at every level of their operation.

You will always have a bottleneck, and that is OK!

This section almost came in right before the conclusion, but I think it is a better idea to discuss this closer to the top. No system will be perfectly balanced. There will always be a bottleneck somewhere. Every time you eliminate one bottleneck, that will just move the bottleneck elsewhere.

What matters most is that your system is designed in such a way that your bottleneck is acceptable. Performance may be the primary driver behind your design, but cost is usually a significant factor as well.

10-gigabit Ethernet has gotten pretty cheap, while 40-gigabit Ethernet hardware is still extremely expensive. Just because your 10-gigabit network is slower than the disks in your NAS doesn’t mean that the bottleneck is a problem. Would upgrading to 40-gigabit Ethernet ACTUALLY improve your workflow? Four times faster than fast enough is also fast enough.

You may learn that plain old gigabit Ethernet will do the job well enough, and that 2.5-gigabit Ethernet won’t cost you much more.

Your CPU might be bottlenecking your GPU!

PC gamers seem to love using the word “bottleneck,” and they really seem to enjoy using it as a verb. I dislike the verbing of the word bottleneck because it seems like a significant percentage of the people using “bottleneck” as a verb aren’t using the term correctly.

There are real bottlenecks in a gaming PC. The CPU, the GPU, and your monitor each have to do work for every frame that is displayed. Every game needs a different balance, but whichever component is maxed out first is your bottleneck.

If your GPU is near 100% utilization and your FPS isn’t keeping up with your monitor, then your GPU is a bottleneck.

If your GPU is significantly below 100% utilization and your FPS isn’t keeping up with your monitor, that means your CPU is the bottleneck.

When your CPU and GPU are both barely breaking a sweat while running your game at your monitor’s maximum refresh rate, then it might be time for a monitor upgrade! Your 60 Hz 1080p monitor might be a bottleneck, because your other components have room to render at a higher resolution or more frames per second. It might be time for a nice 34” 3440x1440 165 Hz upgrade!

I didn’t have a terribly appropriate photo to use in this section, so I dropped in a test clip of Severed Steel running on my Ryzen 5700X with a Radeon 6700 XT. You can see that the game can’t quite manage to maintain a constant 144 frames per second, and my GPU is at 97% utilization and max wattage, so the GPU is what is holding me back from keeping up with my monitor!

The network is always the bottleneck of your NAS

I hate to use the word “always.” It always feels like a lie. As long as your NAS is ONLY being used as a NAS, then this is almost definitely correct. Sharing files tends to be light-duty work.

A 10-gigabit Ethernet connection can move data at about 1 gigabyte per second. That sounds fast, but it really isn’t! Three 20-terabyte hard disks can nearly max that out. A pair of SATA SSDs will be a little faster, and any NVMe you can buy will be a lot faster.
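
The napkin math backs that up. This little sketch assumes roughly 280 megabytes per second of sequential throughput per big modern disk, which is in the right ballpark:

# Rough sanity check: a few big hard disks versus a 10-gigabit link.
# The per-disk throughput is an assumption, not a measurement.
DISK_MB_PER_S = 280            # assumed sequential speed of one 20 TB disk
TEN_GBE_MB_PER_S = 10_000 / 8  # 10 gigabits per second is 1,250 MB/s before overhead

disks = 3
aggregate = disks * DISK_MB_PER_S
print(f"{disks} disks: ~{aggregate} MB/s vs ~{TEN_GBE_MB_PER_S:.0f} MB/s of raw 10 GbE")
# 3 disks: ~840 MB/s vs ~1250 MB/s of raw 10 GbE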

I paid less than $100 for a low-power motherboard and CPU combo ten years ago, and it could share files over CIFS or NFS as fast as the PCIe slot could go via my 20-gigabit Infiniband network. The available PCIe slot was my bottleneck then because it maxed out at around 8 gigabits per second, so I was definitely underutilizing my network at the time!

Stable Diffusion Bottleneck Guy 2

This is what encouraged me to start writing this post. There was a tiny Celeron N100 server posted on r/homelab recently that had slots for four NVMe drives. It is the more compact sibling of my own Celeron N100 homelab box with 5 NVMe slots.

So many comments on that Reddit post were complaining that each NVMe slot only has a single 1x PCIe 3.0 connection. These folks are all bad at finding bottlenecks! That little server only has a pair of 2.5-gigabit Ethernet ports, so any single NVMe installed in that miniature server will be twice as fast as all the network ports combined.

What if your NAS isn’t just a NAS?

I wholeheartedly believe that you should cram as much stuff onto your homelab server hardware as possible. Serving files over CIFS or NFS might occasionally max out your network port, but it usually leaves you with a ton of CPU and disk I/O left to burn. You might as well put it to good use!

Stable Diffusion stack of hard disks

Running Jellyfin or Plex in a virtual machine or container on a NAS is quite common. I am fortunate enough that my Jellyfin server rarely has to transcode any video, because most of my Jellyfin clients can directly play back even the 2160p 10-bit files in my meager collection.

I do have one device that requires transcoding. My Jellyfin server can transcode 2160p 10-bit video at 77 frames per second, and it can transcode 1080p 8-bit video at well over 200 frames per second. That means the GPU on my little Celeron N100 is the bottleneck when transcoding video, and I will be in trouble if I need to feed more than three devices a 10-bit 2160p video at the same time.

That bottleneck is so wide that I will never need to worry about it, and the transcoding runs a little faster when two or more videos are going simultaneously, so I wouldn’t be surprised if I could squeeze in a fourth playback session!
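
If you want to guess how many devices a given transcode rate can feed, the math is just frames produced divided by frames consumed. A quick sketch with my own numbers:

# How many real-time playback sessions can one transcode rate feed?
transcode_fps = 77   # my measured 2160p 10-bit transcode speed on the N100
playback_fps = 24    # a typical movie frame rate

streams = transcode_fps / playback_fps
print(f"Roughly {streams:.1f} simultaneous 24 fps streams")  # roughly 3.2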

Sometimes, the bottleneck is IOPS and not throughput

Old mechanical hard disks have reasonably fast sequential throughput speeds. A 12-terabyte hard disk will be able to push at least 100 megabytes per second on the slow inside tracks, and a 22-terabyte disk has enough platter density to push nearly 300 megabytes per second on the fast outer tracks.

Every 7200 RPM hard disk has a worst-case seek time of 8 milliseconds. That works out to only 120 I/O operations per second (IOPS). A single hard disk has enough throughput to easily stream a dozen 4K Blu-Ray movies, but it might only be able to insert 120 records per second into your PostgreSQL database.
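
The IOPS figure is just the reciprocal of the seek time, which is easy to check:

# Worst-case random I/O on a spinning disk is limited by seek time.
seek_ms = 8.0
iops = 1000 / seek_ms
print(f"~{iops:.0f} random operations per second")  # ~125 IOPS, call it 120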

The cheapest SATA SSD can handle tens of thousands of IOPS, while the fastest NVMe drives are starting to reach 1,000,000 IOPS. These drives are fast when streaming a Blu-Ray, and they don’t slow down when you start updating random people’s phone numbers in your 750-gigabyte customer database.

The vast majority of people adding a NAS to their home network are storing video files from the high seas, or they are storing backups. If you fit either of these descriptions, then you probably only need inexpensive 3.5” hard disks.

My personal video storage is mostly footage taken straight off my own cameras, and I work with that footage in DaVinci Resolve. I layered a few hundred gigabytes of lvmcache on top of slow video storage, because 8-millisecond seek times add up to noticeable lag when you are bouncing around a timeline that references three or four videos.

One seek to get to the correct video frame, at least one more seek to backtrack to the previous keyframe, then maybe a third seek to pull in enough video to start rendering—that adds up to around 100 milliseconds on a mechanical hard disk before the GPU even gets to start decoding and rendering the video, while it would take less than one millisecond on any solid-state storage device. That is a difference you can feel!

Caching to an SSD is a great way to smooth out some of the rough edges. The SSD can catch thousands of those database updates and flush them back to the slow disk later on. My SSD cache is big enough to hold one or two projects’ worth of video files, so it is usually only holding on to the data that I need to work with this week.
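
Even a modest cache hit rate changes the math in a big way. This is a toy model rather than a measurement, and all three numbers are assumptions for illustration:

# Toy model of average access latency with an SSD cache in front of a hard disk.
hdd_latency_ms = 8.0   # one random seek on a 7200 RPM disk
ssd_latency_ms = 0.1   # a random read from a SATA SSD
hit_rate = 0.9         # assumed fraction of reads served from the cache

average_ms = hit_rate * ssd_latency_ms + (1 - hit_rate) * hdd_latency_ms
print(f"Average latency: {average_ms:.2f} ms instead of {hdd_latency_ms} ms")
# Average latency: 0.89 ms instead of 8.0 ms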

Conclusion

In summary, understanding and addressing bottlenecks is crucial for optimizing the performance of your NAS and homelab setup. Identifying which component is constraining your system can make a world of difference, and recognizing these limitations can help you make informed decisions about upgrades and configurations, or even whether or not you should worry about upgrading anything at all!

It is your turn to contribute to the conversation! Share your insights, experiences, or questions related to this topic in the comments below. Have you encountered any unexpected bottlenecks in your own setup? How did you overcome them? Was the upgrade to reduce your bottleneck worth the expense? Let’s learn from each other and continue to refine our systems.

If you’re interested in connecting with a community of homelab, NAS, and even gaming enthusiasts, consider joining the *Butter, What?! Discord community. You can engage in discussions, share knowledge, and stay up to date on the latest trends, developments, and deals in the world of homelabbing.

Is Machine Learning Finally Practical With An AMD Radeon GPU In 2024?

| Comments

I don’t know for certain that I have enough to say here to justify writing an entire blog post, but let’s see what happens!

It has been a little over a year since I upgraded this computer from an ancient Nvidia GTX 970 to an AMD Radeon RX 6700 XT. I really needed that upgrade, but I was stubborn, and I didn’t want to pay those inflated crypto-mining GPU prices, so I hung on to that GTX 970 for way longer than I wanted to.

Stable Diffusion Nervous LLM Kid

I think I did a good job. Fifteen months later, and I have only seen my new GPU go on sale for at most $120 less than what I paid. I am happy to have paid less than $10 per month for the privilege of being able to have so much more fun playing games, so I think the timing of my upgrade was pretty decent!

I am not some sort of software developer trying to teach an LLM to learn how to read and do other stuff good too. I am just a guy hoping to use existing models in some sort of useful way.

One of the first things I learned immediately after installing my new GPU was that running AI models with an AMD GPU on Linux was a crapshoot.

At the time of my upgrade, getting Stable Diffusion to run with an Nvidia GPU was as easy as running one command and waiting for things to download. Getting it running on my Radeon took several attempts, and I felt like I was lucky to get it working at all. Every time I wanted to update my Stable Diffusion install, it was almost certain that something would break.

Getting Llama up and running seemed like it would be nearly impossible, but things are much improved today!

I had Oobabooga’s text-generation-webui up and running in thirty minutes

Since you are reading this, you can probably do it in less than half the time.

The first problem I had was picking out a model or two to download. I don’t know if I will find something better, but so far I have been pleased with MistralMakise-Merged-13B. It seems reasonably capable, and it fits well in my 12 GB of VRAM.

NOTE: So far, I am happier with DiscoPOP-zephyr-7b-gemma, and I am using it to help me put the finishing touches on this blog post before I send it to my editor for review.

My second problem was easily solved by punching some error messages into Google, but it took a few clicks before I found the solution. It is mentioned in their documentation under the AMD heading, but that section is way down near the bottom, and I managed to miss it.

# I had to uncomment those two lines in one_click.py

# Remove the '# ' from the following lines as needed for your AMD GPU on Linux
# os.environ["ROCM_PATH"] = '/opt/rocm'
os.environ["HSA_OVERRIDE_GFX_VERSION"] = '10.3.0'
os.environ["HCC_AMDGPU_TARGET"] = 'gfx1030'

If you have a functioning Nvidia GPU, CUDA will just work. If you have a working AMD GPU, things are a little more complicated. When you have the RIGHT Radeon GPU with ROCm correctly installed, Oobabooga’s text-generation-webui will also probably just work.

When you have a different Radeon, you have to give pytorch some hints as to which ROCm bits to actually use. This is a pain point, but if this is the only real problem we’re going to be running into today, then things are already infinitely better than they were a year ago!

Installing ROCm and OpenCL might be a pain!

The documentation says that I need ROCm 5.6 to use text-generation-webui, but I already have ROCm 5.4.6 installed. That is the version that seems to work well with DaVinci Resolve Studio 19, my RX 6700 XT, and Mesa 24. It seems to be working just fine for text-generation-webui as well!

I would love to tell you the correct way to install ROCm and OpenCL, but I always goof something up, and I wind up jumping through hoops to fix it. That means I don’t REALLY know how to install these things. I know how to goof it up, correct at least some of my mistakes, then wind up with a working installation. I am not even confident that doing things in what seems to be the correct way would even get me to the correct destination!

The newer versions of ROCm can’t easily exist alongside the bleeding-edge versions of Mesa. If you install ROCm 5.6 or newer you can expect to not be able to play games or run DaVinci Resolve. At least, that was the case when I set things up last. This should be a problem that will eventually straighten itself out.

I don’t think this is currently any better or worse than it was a year ago. This is something AMD really, really, really needs to do better. Really.

Should you bother running an LLM on your local machine?

I am sure there are some fantastic reasons to avoid using the ChatGPT API. I do not enjoy the idea of sending all my words up to some company’s servers, but all the words I write are published to one of my blogs, so that doesn’t really matter for me.

The ChatGPT API is quite inexpensive. It didn’t even cost me an entire dime when I was messing around with sending every paragraph of my blog posts up to ChatGPT for rewrites and suggestions. That was with GPT-3.5-Turbo.

Stable Diffusion Man With Tiny Robot

GPT-4o is still inexpensive, but I could easily get up into dollars instead of cents. One of the problems is that GPT-4 offers a much bigger context window, so I can send entire blog posts up as context. Even though GPT-4o is still inexpensive per token, it encourages me to send up 1,500 tokens of context with each query.

OpenAI’s API is FAST. I forgot just how fast it was until I decided to ask GPT-4o and my local Mistral LLM to rephrase the same paragraphs. I ran the paragraph through ChatGPT first because I have some shortcuts to make that happen in Emacs, and I was surprised that it was able to ingest my context and give me a full answer almost instantly. The local LLM on my $300 GPU took a noticeable number of seconds to give me a response.

OpenAI’s API isn’t just ridiculously fast—it also gives significantly better responses than my tiny GPU ever could. I can’t even run an equivalent to GPT-3.5-Turbo locally, and GPT-4 and GPT-4o are orders of magnitude bigger than that.

Speed doesn’t matter if you don’t want to send your private information to a third party.

Sometimes, quantity beats quality!

The game shifts a little when you can run something locally and do not have to pay for every single transaction.

My GPU can consume an additional 175 watts when running at full tilt. It would take something like four hours of me constantly interacting with a local LLM to add 10 cents to my electric bill, and I certainly can’t ask her enough questions to keep her running without lots of breaks. My cost to keep the LLM running and answering all my questions is effectively zero.
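
Here is the napkin math, assuming an electric rate of around $0.14 per kilowatt-hour. Your rate will be different, but the conclusion won’t change much:

# What does it cost to keep the GPU pegged for a few hours?
# The electric rate is an assumption; plug in your own.
extra_watts = 175
hours = 4
rate_per_kwh = 0.14  # assumed rate in dollars per kWh

kwh = extra_watts * hours / 1000
cost = kwh * rate_per_kwh
print(f"{kwh:.2f} kWh costs about ${cost:.2f}")  # 0.70 kWh costs about $0.10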

I absolutely love being able to run Stable Diffusion locally. I can try out a handful of weird prompts to find something that makes me giggle. Then I can ask Stable Diffusion to generate eight images at two different guidance scales using six different checkpoints. It will grind away for ten to fifteen minutes while I make a latte, and I will have 96 images to evaluate when I sit down. Usually one will be goofy enough to break up a wall of words in a blog post.

I can’t max out my GPU with an LLM for long, but asking Stable Diffusion to generate 96 images will keep my GPU maxed out for ten minutes. That means I can generate more than 2,000 images for a dime.

I can see myself doing something similar for my blog-writing workflow in Emacs. Right now, I just send a paragraph or two to GPT-4o when I can’t find a synonym I like, can’t decide how to start the next paragraph, or just don’t like the flow of a sentence. OpenAI’s API is almost always just a lazy thesaurus for me. ChatGPT’s writing feels either way too pretentious or too corporate for my taste, but it does often inspire me to reorder sentence fragments into something that reads more pleasantly.

Stable Diffusion Man with Robot

When the LLM doesn’t cost me anything to run, why not throw everything into that blender to see what comes out? I could write some Emacs Lisp that will send every paragraph to the OobaBooga interface as soon as I hit the Enter key. I’ve already tried connecting my Emacs automations to my local LLM’s API, and it works great even if it feels so much slower than GPT-4o!
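
The plumbing for that sort of automation is already simple. This is a minimal sketch that assumes text-generation-webui is running with its OpenAI-compatible API enabled on the default local port; the prompt, port, and timeout are all placeholders for whatever your setup needs:

# Minimal sketch: send a paragraph to a local text-generation-webui instance
# through its OpenAI-compatible API. Assumes the API is enabled and listening
# on localhost port 5000; adjust the URL for your own setup.
import requests

paragraph = "Rewrite this paragraph so it flows better, but keep my casual tone."

response = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": paragraph}],
        "max_tokens": 300,
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])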

Maybe it could show me the rephrased paragraph next to the window I am typing in. Maybe I could puzzle out a prompt that would coax the robot into only speaking up if its rewrite or suggestion seems like it would be helpful to me. Perhaps I could send it the last two or three paragraphs and give it a chance to write the next one?

I think this sort of thing would have to be done one paragraph at a time, or at least be limited to a few paragraphs. When I asked ChatGPT to turn six columns of a Google Sheet into a Markdown table, it gave me back the results in a few seconds. It LOOKED like it was typing the results slowly, but I was able to hit the copy code button right away, and the entire table was available.

It took my local Mistral robot 90 seconds to give me the complete table of mini PC prices and performance. The latency would be too high if my local artificial brain works with too much text at once!

Not every employee needs to be Albert Einstein

My little Radeon 6700 XT with 12 GB of VRAM will never run an LLM that can compete with what can be run on even a single AMD MI300X with 192 GB of RAM, and it certainly can’t compete with a server full of those cards!

That is OK. I don’t need to hire Albert Einstein to figure out how fast my FPV drone falls when I dive down the side of a building. A high school student should be equipped to handle that task, just like my little Mistral 7B LLM can give me a handful of synonyms.

I don’t need to hire Douglas Adams to fix up my words, even if I wish I still could!

Let’s get back on topic

We are supposed to be talking about how much easier it is now to run machine learning stuff on a Radeon GPU. I feel like automatic1111’s stable-diffusion-webui and oobabooga’s text-generation-webui cover something like 90% of the machine learning tasks we might want to do at home. These are both reasonably easy to get going with ROCm.

The other popular machine learning project is the Whisper speech-to-text engine. There is a webui for this, but it doesn’t seem to make it simple to get going with a Radeon GPU. Even so, I am not certain that a webui would be the right place to use Whisper.

Whisper feels like it needs to be built into something else. I want it to transcribe my video footage and turn the text into subtitles. I want to automatically transcribe any audio files that land in a particular directory. I don’t want to be doing any of this manually.
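
Here is a minimal sketch of the kind of hands-off transcription I have in mind, using the open-source openai-whisper Python package. The directory, file extension, and model size are placeholders, and I haven’t wired this into anything real yet:

# Watch-folder-style transcription sketch using openai-whisper.
# The paths and the model size are placeholders for illustration.
from pathlib import Path

import whisper

model = whisper.load_model("base")

for audio in Path("~/incoming-audio").expanduser().glob("*.mp3"):
    result = model.transcribe(str(audio))
    transcript = audio.with_suffix(".txt")
    transcript.write_text(result["text"])
    print(f"Transcribed {audio.name} -> {transcript.name}")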

DaVinci Resolve Studio has a fantastic speech-to-text workflow. You can delete words from the transcription, and Resolve will cut it right out of the video timeline. How cool is that?!

I very nearly had to delete this entire blog post!

The version 1.8 release of text-generation-webui showed up in my feeds right in the middle of writing the previous section. I did the thing that any competent professional writer might do, and I upgraded to the latest release!

My GPU acceleration immediately stopped working. That took my performance down from about 12 to 25 tokens per second to an abysmal 2 to 5 tokens per second.

Stable Diffusion man with GPU

Someone already filed a bug report. I decided to put this blog on hold, and I figured I could check back in a few days. The stickied AMD bug thread had a fix that worked. I had to edit the requirements_amd.txt file to replace one of the packages with an older version.

There were two lines with two slightly different versions. I assume that they weren’t supposed to be there, so I deleted both before pasting in the URL from the comment.

Llama 3.1 and Gemma 2 on an AMD GPU with Oobabooga

All the recent releases of oobabooga ship with broken support for llama.cpp when using AMD’s ROCm.

I forged ahead and installed the latest version anyway. I wound up getting GPTQ versions of Llama 3.1 8B and Gemma 2 9B running using the ExLlamav2_HF loader. They both seem to run at comparable speeds to the Llama 3 and InternLM GGUF models I was using before, so that is exciting!

I was a bit bummed out because not having a working llama.cpp meant that I couldn’t use any of the GGUF files I have been running. The new models are better than what I was using, but I didn’t want to miss out on using CodeQwen 1.5 7B.

I wound up editing the requirements-amd.txt file once again, and I downgraded the llama.cpp packages just like I did before. That means I can run all my old GGUF files exactly as I was, and I can now also run the newer models via ExLlamav2_HF. That’ll do!

Conclusion

I was eager to write this blog. It was exciting to no longer feel like a second-class citizen in the world of machine learning with my budget-friendly AMD Radeon GPU. Then I found out that no one attempted to run the text-generation-webui with a ROCm GPU in the two days between that dependency being updated and the release of version 1.8, and my citizenship level was once again demoted.

Is that the end of the world? Definitely not. Finding and applying a fix wasn’t a challenge, but even so, everything would have just worked if I had bought an Nvidia GPU, and everything would have just worked for the entirety of this past year. My 6700 XT is comparable in gaming performance and price to an RTX 4060 Ti, but I have 50% more VRAM. In theory, I should have as much or more machine-learning performance as well, except that there is so much less optimization work happening outside of the world of Nvidia’s CUDA.

What do you think? Are you running automatic1111’s stable-diffusion-webui or OobaBooga’s text-generation-webui on an AMD GPU? How have things been working out for you? Do you think it is worth the extra effort and problems to be able to own a Radeon GPU with 24 GB of VRAM for less than half the price of an Nvidia RTX 4090? Let me know in the comments, or stop by the Butter, What?! Discord server to chat with me about it!

Do All Mini PCs For Your Homelab Have The Same Bang For The Buck?

| Comments

I am writing way too many blogs about mini PCs lately, so I definitely wasn’t planning on publishing this. As I was staring at the graph, I just thought that I had to write about what I was seeing.

The simplified answer to the question in the title is yes. There are outliers that are a better deal. All the common mini PC models, from the lowly $129 N5095 up to an 8845HS box for $649, have similar enough value per unit of CPU horsepower or unit of RAM, though they do drift toward being a little more expensive as you climb the ladder. Even so, you do get other upgrades along the way, like more NVMe slots, faster network interfaces, and better integrated GPUs.

Trigkey N100 mini PC and Trudy the cat

There is some variation. The math doesn’t work out perfectly, and the models don’t wind up being fractions that divide evenly with each other. You can’t literally say that one Ryzen 5560U Beelink box is exactly equivalent to two Celeron N100 Trigkey boxes, but it is quite close.

I am not even going to try to sneak an extra subheading in here before giving you the table.

| CPU | RAM | Price | Geekbench | Geekbench per $100 | RAM per $100 |
|-------------|--------|---------|-----------|--------------------|--------------|
| n5095 | 16 GB | $129.00 | 2100 | 1628 * | 12.40 GB |
| n5095 | 32 GB | $187.00 | 2100 | 1123 | 17.11 GB * |
| n100 1gbe | 16 GB | $134.00 | 2853 | 2129 * | 11.94 GB |
| n100 1gbe | 32 GB | $192.00 | 2853 | 1486 | 16.67 GB * |
| n100 2.5gbe | 16 GB | $239.00 | 2786 | 1166 * | 6.69 GB |
| n100 2.5gbe | 32 GB | $329.00 | 2786 | 847 | 9.73 GB |
| n100 2.5gbe | 48 GB | $380.00 | 2786 | 847 | 12.63 GB * |
| 5560u | 16 GB | $206.00 | 6200 | 3010 * | 7.77 GB |
| 5560u | 64 GB | $321.00 | 6200 | 1931 | 19.94 GB ** |
| 5700u | 16 GB | $233.00 | 7200 | 3090 ** | 6.87 GB |
| 5700u | 64 GB | $348.00 | 7200 | 2069 | 18.39 GB * |
| 5800u | 16 GB | $239.00 | 7900 | 3305 | 14.94 GB |
| 6900hx | 16 GB | $439.00 | 10200 | 2323 * | 3.64 GB |
| 6900hx | 64 GB | $614.00 | 10200 | 1661 | 10.42 GB |
| 6900hx | 96 GB | $721.00 | 10200 | 1415 | 13.31 GB * |
| 7840hs | 32 GB | $550.00 | 12000 | 2182 * | 5.82 GB |
| 7840hs | 64 GB | $725.00 | 12000 | 1655 | 8.83 GB |
| 7840hs | 96 GB | $832.00 | 12000 | 1442 | 11.54 GB * |
| 8745h | 32 GB | $442.00 | 12000 | 2715 * | 7.24 GB |
| 8745h | 64 GB | $617.00 | 12000 | 1945 | 10.37 GB |
| 8745h | 96 GB* | $724.00 | 12000 | 1657 | 13.26 GB * |
| 8845hs | 32 GB | $649.00 | 12000 | 1849 * | 4.93 GB |
| 8845hs | 64 GB | $824.00 | 12000 | 1456 | 7.77 GB |
| 8845hs | 96 GB* | $931.00 | 12000 | 1289 | 10.31 GB * |

NOTE: Heavily rounded Geekbench 5 scores are guesstimated from searching Geekbench’s database. Precise numbers are from my own personal tests.
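
The two value columns in the table are simple ratios, so you can run the same math on any new mini PC deal that shows up in your feeds:

# The value columns are just performance or RAM per $100 of mini PC.
def value_per_100(price_dollars, geekbench, ram_gb):
    return {
        "geekbench_per_100": round(geekbench / price_dollars * 100),
        "ram_gb_per_100": round(ram_gb / price_dollars * 100, 2),
    }

print(value_per_100(129, 2100, 16))  # the $129 N5095 with 16 GB of RAM
# {'geekbench_per_100': 1628, 'ram_gb_per_100': 12.4}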

UPDATE: I am sneaking in this Trycoo HA-2 5800U mini PC, because it is priced close to a Trigkey or Beelink 5700U while adding a 2.5-gigabit Ethernet port. I don’t know the brand, but that sure sounds like a good deal!

UPDATE: There are two mini PCs with the much more recent Ryzen 8745H and 8745HS showing up in my deal feeds. The Minisforum UM870 and the Beelink SER8 are both starting to show up with 32 GB of RAM for less than $450. That price moves them way up near the front of the pack in bang-for-the-buck.

These mini PCs are discounted almost every single week, so the prices listed are the lowest usual sale prices that I have seen. Don’t buy at full price, but you may have to wait a while to see a sale as low as what I have listed in the table!

The lines with an asterisk next to the RAM are priced with the biggest RAM kit that they are capable of using. You can’t buy most of these without preinstalled RAM, so you will have a spare stick or two of RAM if you upgrade your own mini PC.

I have been keeping this data in a messy Google Sheet. If you want to be able to sort or do your own math, you should definitely check it out!

I was surprised to see that 48 GB DDR5 SO-DIMMs are a good deal!

First of all, you can still save more than a few dollars on RAM if you choose an older mini PC that uses DDR4 RAM. DDR4 is still abundant, so the prices are still slowly going down.

You can only get DDR4 SO-DIMMs in 8 GB, 16 GB, or 32 GB sizes, but DDR5 SO-DIMMs are also available in 12 GB, 24 GB, and 48 GB sizes! That means you can squeeze 48 GB of RAM in a mini PC with only a single SO-DIMM slot or 96 GB in a mini PC with two SO-DIMM slots. You will have to pay a small premium per gigabyte compared to 32 GB DDR5 SO-DIMMs.

Old sticks of DDR4 2666

My brain immediately said, “That can’t be a good value!” but I was wrong. If your goal is to pack as much RAM into your homelab cluster as possible, then it is very much worth paying that premium.

NOTE: I haven’t personally tested 48 GB SO-DIMMs in any of these mini PCs. We have friends on our Discord server who have used 48 GB DDR5 SO-DIMMs with the Topton N100 motherboard and N100 mini PCs. Your mileage may vary, but I expect these will work in the higher end Ryzen mini PCs as well. To play it safe, I would make sure you buy your RAM from a store with a good return policy!

There are some outliers on that table that just don’t fit in!

It isn’t too surprising that both the oldest and newest CPUs on the list are the worst value. The N5095 is not only the weakest CPU in the table, but it has the worst performance per dollar. Even so, it is one of the best deals around if you want lots of RAM in your homelab!

The Ryzen 8845HS and 7840HS are very nearly the exact same piece of hardware. The newer CPU has a stronger NPU, but they are identical in every other way. You’re paying a premium for the absolute newest CPU here, and you are paying a premium with either for DDR5 RAM.

Both models of the Beelink SER5 lean in the opposite direction. Their Ryzen 5560U and 5700U processors are getting pretty long in the tooth, so they are priced really well for their performance capability. Not only that, but they require DDR4 SO-DIMMs, so they also have the advantage of using less costly RAM.

Sometimes you get six of one, half a dozen of the other

Half of the table has RAM prices in the 6, 9, or 12 gigabytes of RAM per $100 range. Each mini PC starts out expensive with the stock RAM, and the pricing gets better as you upgrade. The value for RAM gets slightly worse the higher you go up the list, but Geekbench scores also tend to go up.

The big, messy spreadsheet has a column with the Geekbench score per gigabyte of RAM. That might be a good number to look at if you are working hard to minmax your mini PC cluster, but I can save you some time. The winners here again are the Beelink SER5 with a Ryzen 5560U or Ryzen 5700U.

The Ryzen 5700U being the value champ seems to be the theme of this blog, but I think it is important to note that almost everywhere on this list you are trading RAM for CPU chooch.

Prices go up at the high end, but you do get something for your money!

I like simplifying everything down to RAM or benchmark scores per dollar. I want my virtual machines to only need to be assigned RAM and CPU to do their job, and I don’t want them to worry about where they are running. I prefer to avoid passing in PCIe or USB devices unless I absolutely have to. Sometimes you need more than just RAM and CPU to get things done.

Stable Diffusion Mini PC Guy 1

The Beelink SER6 MAX starts to drift into the more expensive territory if you are simply breaking it down by RAM per dollar, but it adds some useful upgrades. It has a 2.5 gigabit Ethernet port, a pair of m.2 slots for NVMe storage, and a 40-gigabit USB4 port. I don’t currently need this, but these upgrades definitely add value, and these could be extremely useful for someone.

The Beelink SER7 and SER8 both have the 2.5 gigabit Ethernet port, and they both have dual 40-gigabit USB4 ports, but they lack the second NVMe slot.

On the lower end, there are mini PCs like the CWWK N100 mini PC router in my own homelab. It has four 2.5 gigabit Ethernet ports and FIVE NVMe slots. It costs more than a Beelink or Trigkey mini PC with the same CPU, but if you need more than one network port or NVMe, then it might be worth the extra cost.

As far as I am concerned, RAM is the king of virtualization

If you’re going to run Proxmox in your homelab, and you don’t know what to buy, then you should put RAM at the top of your list.

You can over provision disk images. Things will work. Things will run. Things may run forever, but you may run out of disk space some day. That is something that can be fixed.

You can over provision your CPU, and you definitely should! The odds of all your virtual machines needing to do hard work at the same time ought to be low. Even if they aren’t, the worst thing that happens if you don’t have enough CPU is that your tasks will run slower. The important thing is that they will run even if they run slowly.

Pat's Proxmox Machines

From a purely technical standpoint, you can over provision RAM. There are mechanisms that let your machines release RAM to the host to be used by other machines when they need it, but if you push this too far, processes or virtual machines will start to be killed.

Memory is a resource that even idle virtual machines need. A gigantic PostgreSQL server eats up a lot of RAM whether you are asking it to run queries or not. Your dozens of Nginx processes waiting hours, days, or weeks for a connection may not be using any CPU, but they are sitting somewhere on a stick of RAM, ready to respond.

Unallocated RAM on a Proxmox server is just RAM waiting to be allocated to a new virtual machine tomorrow. It is handy to have some spare!

How big of a mini PC do I need?

This blog is making me feel like a liar, because a few weeks ago I wrote an entire blog saying that it was cheaper to buy a second N100 mini PC than to upgrade the RAM in my existing N100 mini PC. For my own purposes, it was a better value, but I am beginning to think I didn’t do the math correctly with a 48 GB DDR5 SO-DIMM!

The variety and combination of mini PCs you buy for your homelab will depend on your needs. My whole world fits on a single Intel N100 with 16 GB of RAM, and adding a second similar machine was a luxury. Adding a second server has also been fun. That’s a good enough excuse to do pretty much anything!

My biggest virtual machine is using 3 GB of RAM, and I could tighten that up quite a bit if I had to. Most of my virtual machines and LXC containers are using 1 GB or less. These are small enough that I could easily juggle them around on a Proxmox cluster where each physical server only has 8 GB of RAM.

Stable Diffusion Network Control Center

Your setup might be different. You might have a handful of Windows virtual machines that really need at least 8 GB of RAM to spread out into. If that’s the case, you’d have trouble with 16 GB mini PCs like I am using. There’s only 14 or 15 GB of RAM left over once you boot Proxmox, so you might only fit one of these bigger virtual machines on each mini PC. That’d make playing musical chairs with virtual machines a challenge, so you might be better off with mini PCs with at least 32 GB of RAM.

There’s a balance to be had here. I keep saying that I like the ratio of CPU horsepower to RAM in an N100 mini PC with 16 GB of RAM or a Ryzen 5560U mini PC with 32 GB of RAM. I am comfortable with that ratio without knowing anything at all about what you’re doing with your homelab. It just feels like a good middle ground.

My own homelab wouldn’t be short on CPU even with 48 GB in the N100 or 64 GB in the 5560U. We all have different needs.

One big mini PC, or lots of small ones?!

One big server is easier to manage, and you won’t have to worry about splitting up your large virtual machines while leaving room to fit your smaller virtual machines in between the cracks.

These mini PCs are fantastic because they are so power efficient. The Intel N5095 and N100 mini PCs DO idle more than a few watts lower than all the Ryzen mini PCs on this list, but the difference isn’t all that dramatic. My pair of Intel N100 mini PCs burn at least as much electricity as a single Beelink SER5, and the most expensive mini PCs with Ryzen 8845HS CPUs are probably comparable to the Beelink SER5. That means consolidating down to one beefy mini PC will almost definitely save you money on your electric bill.

Splitting things up provides redundancy. I am not doing anything this fancy, but you can set up a high-availability Proxmox cluster with three or four N100 mini PCs. If one fails, those virtual machines will magically pick up where they left off on one of the remaining mini PCs.

That is pretty cool, but even if you don’t set that up, it is nice knowing that all your services won’t go down due to a single hardware failure, and you could always restore the unavailable virtual machines from backup and run them on the remaining mini PCs. That is assuming you have enough free RAM!

That brings us to another neat feature of running lower-spec mini PCs. When your cluster gets low on RAM or CPU, you can buy another inexpensive mini PC. This works even if you start with only one tiny mini PC.

I prefer the idea of multiple smaller mini PCs

I don’t think there are any wrong choices to make here, but here’s what I have been thinking about for my own setup.

There are plenty of uses outside of our homelabs for these mini PCs. My Trigkey N100 mini PC that I bought for my homelab hasn’t made it into my homelab yet. I have had the Trigkey N100 plugged into the living room TV playing Steam, NES, SNES, and Nintendo Wii games for the last few weeks.

You can plug these into an old monitor and have an extra PC at your workbench or CNC machine, and most of them have brackets that let you hang them from your monitor’s VESA mount. Almost anything you might do with a Raspberry Pi can be done with an Intel N5095 or N100 mini PC.

Stable Diffusion Homelab

If you buy one big mini PC that can meet all your homelab needs, then it is going to be stuck in your homelab for a very long time.

If you have a variety of less powerful mini PCs, you can decide to upgrade and retire one early to reuse that slightly outdated mini PC for something else. You could send it to a friend’s house to use as an off-site Proxmox server and backup host. You could retire it to the living room to stream your Steam library. You could set it up to control your 3D-printer farm.

I expect to get better overall use out of three $150 mini PCs than I would out of a single $450 mini PC, and it is nice knowing that I can upgrade my homelab slowly, one node at a time, instead of having to replace nearly everything all at once.

Your stack of mini PCs doesn’t have to be identical

I fall into this trap a lot when I stare at the spreadsheet. I want to say things like, “Three N100 mini PCs are equivalent to one 6900HX mini PC,” but I shouldn’t be thinking that way. The obvious problem is that the various mini PCs don’t quite divide evenly into each other, but that isn’t the only thing wrong with this line of thinking.

Maybe you have a single virtual machine that needs 16 GB of RAM, then a whole mess of other machines that only need 1 or 2 GB. That seems like a good excuse to build your homelab out of one Ryzen 5700U mini PC and one or two Intel N100 mini PCs.

You can mix and match whatever makes sense for you. Heck! Your homelab doesn’t even have to be 100% mini PCs!

We didn’t talk about iGPUs!

From a gaming perspective, the table at the top of this blog goes from worst to best. The Ryzen 7840U and 8845U have ridiculously fast integrated GPUs! They are a generation ahead of the Steam Deck, and they have even more GPU compute units. They can run a lot of games quite well, and they can even manage to run many of the most demanding games as long as you turn the settings down.

A common use case for the iGPU in your homelab would be video transcoding for Plex or Jellyfin. There isn’t nearly as big of a gap in transcoding performance as there is in gaming performance between these boxes. They only set aside a small corner of these GPUs for the video encoding and decoding hardware, and they are all really only aiming to make sure you have enough horsepower to encode and decode 4K video in real time.

Trigkey N100 Mini PC at my desk

My Intel N100 mini PC can transcode 2160p HDR video with Jellyfin at 77 frames per second. That is enough to keep three movies playing on three televisions. I don’t know where my upper limit is on the N100. Transcoding two movies at the same time utilizes more of the GPU, and each stream maintains better than half of that 77 frames per second. I wouldn’t be surprised if there is enough room for me to feed five televisions with transcoded 10-bit 2160p video.

Even better, my Jellyfin server rarely has to transcode because the majority of my devices can play back the raw video streams straight out of the video files.

If transcoding is your goal, every mini PC on the list is up to the task. I wouldn’t be surprised if the performance goes up a bit towards the higher end of the list, but I bet it doesn’t go up all that much.

Conclusion

I guess the real conclusion is that the best bang for your buck is easily the Beelink SER5 with the Ryzen 5560U, with the N5095 and N100 boxes coming in a close second, but the important thing to remember is that these aren’t miles ahead on value. You can spend a little more on beefier mini PCs without paying too much of a premium.

Sometimes that premium even buys you something you can use. The Beelink SER6 MAX has a 2.5 gigabit Ethernet interface and room for two NVMe drives, and if that isn’t enough for you, the CWWK N100 mini PC in my homelab has four 2.5 gigabit Ethernet ports and room for FIVE NVMe drives. There are so many options available in small packages to piece together a nifty homelab cluster for yourself that might fit in a lunchbox or shoebox.

What do you think? Are you building out a full-size 42U server rack at home? Or are you trying to build something quieter and more efficient? Are you already running a short stack of mini PCs, or are you looking to build up a diminutive server farm at home? What’s the most unexpected use case you’ve found for a mini PC? Tell us about it in the comments, or stop by the Butter, What?! Discord server to chat with me about it!

https://docs.google.com/spreadsheets/d/1cHuUnASp_mdcFfLnlvMJdhTCfzGfXC9ehUjTs9pWCik/edit?usp=sharing “Pat’s big messy mini PC homelab spreadsheet”

Microsoft Copilot+ Recall AI Looks Awesome!

| Comments

First of all, I don’t trust Microsoft to do a good job keeping our data safe. My recollection of history makes me feel that Microsoft’s execution on security often starts off poor, and the potential for horribly incriminating or financially costly information to accidentally leak would be immense.

Stable Diffusion Kid at a terminal

There are a lot of posts on social media and news stories implying that the sky is falling. These posts all remind me of when people started syncing everything to the cloud, and of the arguments that were made when people started posting everything to social media.

I don’t think this is fear mongering. I don’t think these posts are wrong. There is absolutely potential here for both new types of cyber crime and user error, and I can understand being fearful that your employees’ entire time spent on their computers could be subpoenaed. It is even more terrifying that this could also enable employers to scrub through years of each employee’s work looking for excuses to fire them.

Not only that, but I think a lot of the fears about storing your data in the cloud are extremely valid! I only store my important, private data in my own private cloud. Seafile works very much like Dropbox, except I own my Seafile server, and all my data is encrypted by the open-source Seafile client before being transmitted to my server.

I am not worried about any of this today. I don’t run Windows. I don’t use any Microsoft products.

I won’t be able to use Recall, but I am extremely excited about the concept and the future potential. This is the first time I have seen anything come out of the machine learning hype that could realistically save me significant time and help me avoid monotony.

I lost my trust in Microsoft in the early nineties when they put Stac Electronics out of business, and they left us stuck with Microsoft’s worse and much less reliable DoubleSpace in Stacker’s place.

Microsoft did a lot of shady things during the next couple of decades, and I have no idea if they could ever do enough to regain my trust.

Even though the title seems to imply otherwise, this post isn’t really about Microsoft or Microsoft’s specific new product. This is about the idea of compiling a database of all the text and images from essentially every moment of your computing life, and what that might be able to do for you.

I could save so much time researching blog posts!

You have absolutely no idea how much time I spend flipping between Emacs and a dozen browser tabs while writing a blog post. There are so many numbers involved in many of the things I write about, and I have to get those numbers correct.

I have to remember which tab has the documentation or notes. I have to skim the page to find the numbers that I need. Sometimes I have to do math.

Can you imagine if I had a local large language model (LLM) that had access to the contents of all the recent web pages I visited, and all of my notes? How awesome would it be if that assistant was voice enabled?

  • “Hey, You! What is the max clock speed of an Intel N100?”
  • “How many kilowatt hours did I measure on my N100 server at idle?”
  • “Paste a Markdown table with the Geekbench scores of the N5095, N100, and Ryzen 5700U.”
  • “How long have I been running the Seafile Pi at Brian Moses’s house?”
  • “How long ago did I write about upgrading to a Radeon 6700 XT?”

Any of these functions would save me a minute or two, and that adds up fast!

This is a fantasy today, and I don’t think there are any open-source projects working on exactly this yet, but it sure feels like this is where things are headed.

This isn’t an entirely new idea

There was a Gnome project 20 years ago that used inotify to watch your emails, chat logs, and the text files you were editing to pop up related information. It might even be so old that it used dnotify!

If you got an ICQ message from your friend Brian asking you to meet up at the new pizza shop he’s been telling you about, it might have shown you a link to Brian’s recent email that talked about that pizza shop.

This was a really cool idea two decades ago. It was the very first thing that came to mind when I read the headline about Windows Recall, but it took me SO MUCH digging to find any sort of information about this project. I couldn’t even remember the name! It was Nat Friedman’s Dashboard.

Finding ways to connect instant messages to emails, and emails to text documents, and text documents to instant messages was extremely rudimentary when all the logic had to be built by hand with regular expressions.

I hope we see a new Gnome Dashboard in 2024!

I am hopeful that there will be a modern implementation of something like Gnome Dashboard. I want to collect regular screenshots. I want that information converted to text. I want it indexed, and I want to be able to query it!
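That pipeline is simple enough to sketch out. This is just a toy, and it assumes an X11 desktop that mss can grab, the tesseract binary on your PATH, and the mss and pytesseract Python packages installed; the database file name is a placeholder:

```python
#!/usr/bin/env python3
# Capture -> OCR -> index loop: a minimal sketch of the pipeline described above.
import sqlite3
import time

import mss
import pytesseract
from PIL import Image

INTERVAL = 30  # seconds between captures
DB_PATH = "recall.db"  # placeholder database file

db = sqlite3.connect(DB_PATH)
# SQLite's FTS5 extension gives us a full-text index for free.
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS screens USING fts5(captured_at, text)")

with mss.mss() as sct:
    while True:
        shot = sct.grab(sct.monitors[1])  # monitor 1 is the primary display
        image = Image.frombytes("RGB", shot.size, shot.rgb)
        text = pytesseract.image_to_string(image)
        db.execute(
            "INSERT INTO screens (captured_at, text) VALUES (?, ?)",
            (time.strftime("%Y-%m-%d %H:%M:%S"), text),
        )
        db.commit()
        time.sleep(INTERVAL)
```

With FTS5 doing the indexing, the query side is just a MATCH statement away.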

I don’t need something as powerful as GPT-4o processing queries of my personal database. Give me a little LLM that fits in 4 GB or 8 GB of VRAM. It is OK if it sometimes thinks that March has thirty days. Even the best assistants goof up like this from time to time.

I don’t need my assistant to be Albert Einstein, and the quality of these small LLMs will improve over time. The software will be optimized, and I will upgrade to a newer GPU with more VRAM every few years. Even an assistant with a simple brain would save me time today.

There are already some open-source projects popping up!

The first one I found is the rem project for macOS. Their first release was in 2023. Rem has a pretty good list of features that includes capturing the screen and recognizing text every 2 seconds. There is no built-in LLM capability, but there is a way to grab recent text from your screenshots to use as context with an LLM.

Stable Diffusion guy with a robot

There’s also Windrecorder for Microsoft Windows. The feature list seems less complete than rem’s, and I didn’t see a roadmap. Rem definitely has some of Recall’s features on their roadmap.

I don’t have macOS or Windows here, so I can’t try either of these. I haven’t yet found a project like this that runs on Linux, though I did notice that the terminology these projects use to describe themselves is very different from Recall’s, even though they all do very similar things!

I understand that it isn’t that simple

I know that I can’t just feed a tiny local LLM all the text I have ever written, all the text from months of screenshots of my activities, and all the text of the Wikipedia pages and other sites I have visited in that time. That is just way too much context to load into my little 12 GB Radeon 6700 XT.

All sorts of work has to be done to make the software summarize this stuff, then summarize the summaries, and organize those summaries of summaries. Someone probably needs to figure out how to tie in old-school text search to find the relevant documents, summarize those documents, and feed them into the LLM every time you ask a question.
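The retrieval half of that is the easiest part to imagine. Here is a hedged sketch that leans on the hypothetical screens FTS table from the capture example above and hands the best matches to a small local model. It assumes a local Ollama server listening on its default port, the requests package, and whatever model name you happen to have pulled:

```python
import re
import sqlite3

import requests  # assumes a local Ollama server on port 11434


def ask(question: str, db_path: str = "recall.db") -> str:
    """Old-school full-text search first, then hand the hits to a small local LLM."""
    db = sqlite3.connect(db_path)
    # Turn the question into a simple OR query so FTS5 doesn't choke on punctuation.
    fts_query = " OR ".join(re.findall(r"\w+", question))
    rows = db.execute(
        "SELECT captured_at, text FROM screens WHERE screens MATCH ? "
        "ORDER BY rank LIMIT 5",
        (fts_query,),
    ).fetchall()
    excerpts = "\n\n".join(f"[{ts}]\n{text}" for ts, text in rows)
    prompt = (
        "Answer the question using only these excerpts from my screen history:\n\n"
        f"{excerpts}\n\nQuestion: {question}"
    )
    reply = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=120,
    )
    return reply.json()["response"]


print(ask("What is the max clock speed of an Intel N100?"))
```

The summarizing and re-summarizing would slot in between the full-text search and the prompt, so the excerpts stay small enough to fit in a 12 GB card.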

What if someone steals your Microsoft Windows Recall database?!

No one will steal MY Recall database, because I won’t be able to run Windows Recall at all. I am running with the assumption that if there is any sort of open-source equivalent, I will have some amount of control over when it is active and what gets stored in the database.

It seems that Windows Recall makes no effort to keep usernames, passwords, and login screens out of the recording of your keystrokes and screenshots. If you log into your bank, it will most likely be in that database, and this makes the Recall database a juicy target.

I wouldn’t complain too much if I had to manually turn my AI tracker on when I start working, then turn it off when I start doing personal things. Maybe it would be interesting to have different databases for different projects and for personal stuff.

I would also bet that it would be pretty easy to tie certain activities into the tracker. Maybe when you click on Bitwarden, your tracker would stop recording, and hopefully it would show you that in an extremely obvious way. If it detects that the processed text looks like it might be a private encryption key, your tracker could scrub that out of your database immediately.

Even just having an “oops button” that you can hit to have the last five minutes of activity expunged from the record would be nice.
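Both of those safety valves are easy to imagine as code. Building on the hypothetical screens table from the capture sketch above, a scrub filter and an oops button might look something like this; the patterns are just a couple of obvious examples, not a complete list:

```python
import re
import sqlite3
import time

# Patterns that should never land in the index -- a couple of obvious
# examples; a real tool would need a much longer and smarter list.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----", re.S),
    re.compile(r"(?i)\b(password|passphrase)\s*[:=]\s*\S+"),
]


def scrub(text: str) -> str:
    """Redact anything that looks like a secret before it gets indexed."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def oops(db_path: str = "recall.db", minutes: int = 5) -> None:
    """The 'oops button': expunge everything captured in the last few minutes."""
    # Relies on captured_at being stored as "%Y-%m-%d %H:%M:%S", which sorts
    # lexicographically in chronological order.
    cutoff = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time() - minutes * 60))
    db = sqlite3.connect(db_path)
    db.execute("DELETE FROM screens WHERE captured_at >= ?", (cutoff,))
    db.commit()
```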

I am ready to start recording every aspect of my computing life!

I have an incongruity here in my brain. I don’t like surveillance cameras. I don’t understand why so many people are pointing cameras at their backyards and living rooms, then plumbing those into Home Assistant. It feels icky. I don’t want to record myself lumbering around the house!

Yet I am excited about recording every second of my digital life. I believe this is because I can see an immediate value in being able to ask a virtual assistant to mine this data, and it will provide me with valuable time savings and possibly even improvements to the quality of my work from day one.

I will almost definitely be jumping on board with the first open-source screen-recording virtual assistant that shows up. I am excited to try something like this out to see how much it might improve my workflow, and I am excited to see where this technology will go as it improves!

What do you think? Are you skeeved out by Microsoft recording everything you do? Would you be willing to let an open-source AI assistant mine your screen for nuggets of useful data? Are you as excited about the prospects as I am? Let me know in the comments, or stop by the Butter, What?! Discord server to chat with me about it!