I Finally Repaired my Baratza Preciso Coffee Grinder


I am trying to figure out when I shelved my Baratza Preciso. I suspect that it has been more than a couple of years! It stopped grinding fine enough to pull a slow enough shot of espresso from my Rancilio Silvia, so I started using my wife’s Baratza Encore and ordered some replacement parts.

Baratza Preciso and Baratza Encore

If my memory is correct, I started working on the repairs and then the big plastic drive gear gave up, so I also gave up. I have been limping along making lattes using a grinder that isn’t up to the task ever since. At least until last week!

These words won’t help much if you are shopping for a coffee grinder!

I am quite pleased with how my Baratza Preciso has held up. I have had it for eight years, and I am still using it today. It did need some inexpensive repairs over the years, but it is chugging along.

Baratza no longer makes the Preciso. It has been replaced by one of the grinders in the Baratza Sette lineup, but I am not sure which one.

I can’t speak for the Sette, but I can say that Baratza has amazing support, and they offer so many replacement parts that you could probably assemble an entire grinder from spare parts. If you buy a Sette or an Encore today, I have confidence that you can repair it in 10 years.

tl;dr: The reason I just had to write this!

The Baratza Preciso is SO FAST! It takes the Encore well over a minute to grind 18 grams of coffee for an espresso. The Preciso seems to grind that much in less than twenty seconds. I have watched the Encore with a stopwatch, but I only counted out the Preciso to 15 hippopotamuses, and I tend to count out time a bit more slowly than it actually passes.

I forgot how slowly the Encore grinds. I wrote a blog many years ago about how it takes less than six minutes for me to make a latte. I often head to the kitchen before sitting down at my desk to record a podcast. I look at the clock, and it says I have way more than six minutes available! This should be no problem!

Then, by the time I am finished doing my barista duties, I am almost late! Did I time myself incorrectly in 2013? Nope. I just didn’t realize how much time the Encore was adding to my routine.

This might be worth keeping in mind when you’re shopping for a grinder. I make two lattes every day, so I am probably saving three minutes a day. That’s 21 minutes each week, 90 minutes each month, or over 1,000 minutes each year.
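If you want to plug in your own numbers, the arithmetic is simple enough to sketch in a few lines of shell. The three minutes per day is the estimate from the paragraph above, and the 30-day month is an assumption made purely to keep the math tidy.

```shell
#!/bin/bash
# Rough time savings from a faster grinder.
# Arguments: minutes saved per day, number of days in the period.
minutes_saved() {
    local per_day=$1 days=$2
    echo $(( per_day * days ))
}

minutes_saved 3 7     # one week
minutes_saved 3 30    # a 30-day month (assumed)
minutes_saved 3 365   # one year
```

Swap the 3 for however many minutes your own grinder costs you each day.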

I tend to make a latte either right before or in the middle of a task. Today I sat down, created an empty blog from a template, and wrote the opening paragraphs that you just saw. Then I got up, wandered to the kitchen, and made a delicious latte with Yemen Mokha Matari from Sweet Maria’s.

One might consider wasting a few extra minutes taking a break to be valuable. Others might find it more important that they’re saving 21 minutes each week when making coffee for their friends.

The problem with the Baratza Encore

There’s more than one problem. We are just going to ignore that the Encore probably produces an inconsistent grind for espresso. That is for sure a problem, but it is minor compared to the real problem.

What do you do when a shot pulls too slowly on setting 3 and too fast on setting 4? There’s no option in between, so all you can do is adjust the dose. If you adjust the dose, then you also need to adjust the volume of extraction. Adjusting two things at once makes tuning more difficult.

The Baratza Encore’s grind is more of a problem than I thought!

I wrote this entire blog two months ago, but forgot to put the finishing touches on it so I could publish it. Two months of being back to using the Preciso again have shown me another problem with the Encore. The cheaper grinder produces so many more fines!

I try to pop my shower screen off once a month for cleaning. This time the screen didn’t look any worse for the wear. Hardly any coffee grounds are sneaking past the tiny holes in the screen now that I am using the Preciso again. It was always so much dirtier with the Encore.

This surprises me a bit. Both grinders are obviously related, and they use the exact same upper burr. The Encore has a slower, weaker motor, and the lower burr is different.

I assumed the difference in the lower burr would just be the number of blades or how aggressive the angles on those blades might be. This is probably correct, but I didn’t realize that this could make a difference in the quality of the grind!

What happened to Pat’s Baratza Preciso?

You are going to have to forgive me. I don’t even know for sure what the order of problems and solutions has been, but I will do my best to give you a timeline.

I know for sure that I got to a point where I was grinding at the Preciso’s absolute finest setting, and shots were pulling in less than 10 seconds. It was awful!

I ordered replacement parts. I have several of the replacement plastic adjustment doodads and the plastic burr holders on hand now. There’s a tiny screw you can use to fine-tune the adjustment ring. I pushed that fine-tuning screw as far as it could go, and I was still getting 10-second shots of espresso.

At that point, I even tried to cut a shim out of a business card. I put that shim between the burr and the plastic retaining ring in an attempt to get the burrs closer together. I’m pretty sure this helped a bit, and I am almost positive that I brewed espresso for a few months like this. I bet this is what caused me to strip the drive gear and finish completely chewing up the upper burr.

The Baratza grinders have a lot of plastic parts

And I have managed to break every single one of them. This was the second time I’d stripped the drive gear, and I happened to order two the first time it happened. I don’t think I knew this at the time. I just threw my hands up in the air, put the Preciso out in the garage, and limped along with the Encore.

The plastic parts are probably a good thing. The idea is that you’re supposed to blow out a fragile $3 plastic part if there’s ever a rock in your bag of coffee. That’s better than destroying an expensive burr.

With the price of the burrs for the Preciso, it might be better if everything were overbuilt and I had to replace a $16 burr every few years. Your mileage may vary.

I wasn’t sure if I should order the replacement burr

There are two burrs in the Preciso. The upper burr is $16 and is extremely easy to replace. The lower burr is $45, and it looks like it is challenging to get it unscrewed from the drive assembly.

I could tell that the upper burr was damaged, and it looked like the lower burr was fine. I am assuming that the bigger burr on the bottom doesn’t do much cutting. It probably pushes the beans into the upper burr as it spins.

I spent about two hours when I replaced the drive gear. Replacing the gear wasn’t too difficult, but when the machine was still grinding too coarse, I wound up taking it apart two more times to move shims around and try adjusting things.

When that didn’t work out, I wished I had spent those two hours working toward getting our LumenPNP pick-and-place machine up and running. When the pick and place works, we can start selling OoberLights boards. When we sell OoberLights boards, there will be cash flow. I could use that cash flow to buy a grinder upgrade.

I do want a grinder upgrade. I’ve had my eye on the flat-burred Turin DF64 grinder for a while. It would be a really nice upgrade!

I didn’t know if I should repair the Preciso. The burr was about $25 after shipping and tax. That’s 5% of the cost of my next grinder. Not only that, but I suspected I would have to spend an hour taking the Preciso apart again. Maybe it would be better to put that $25 toward the price of a grinder upgrade and not waste an hour of my time getting angry at the old grinder.

I spent the $25. I did take the Preciso apart again to undo my adjustments to make sure the new burrs would never touch each other. It wound up being a good choice.

I am drinking a latte right now. I did not quite hit the right grind today. With 18 grams in with the Preciso set to 4F, my light-roast Yemen gave me 31 grams out in 45 seconds on my Rancilio Silvia.

This isn’t far off from my ideal shot for a latte. I could easily write 2,000 words about why I aim for a longer pull with a ratio of a little more than 1.5, but that would be drifting way off topic.

The important thing to note is that I can still go three entire clicks finer on an extremely light bean. I am calling this a successful repair.

UPDATE: A slightly darker roast Ethiopian coffee was able to completely choke the machine with the grind set to 4F! I had to bump it up to about half-way past the 6 setting to get a good pull. I think this means I have done a good job!

The thrilling conclusion?!

I don’t know if the conclusion is thrilling, but I am excited to have my Baratza Preciso working again, and I am quite happy that I didn’t have to spend $400 or more on a grinder upgrade this year. I expect that I will get at least a few more years of service out of the Preciso before I need to upgrade.

What do you think? Should I have junked the Preciso and splurged on something like a Turin DF64? There’s a good chance that next grinder will outlast me. When I eventually upgrade, will I be kicking myself for not doing it a few years sooner? Will my coffee taste that much better?!

Let me know in the comments, or stop by the Butter, What?! Discord server to chat with me about it!

Can You Save Money By Changing the CPU Frequency Governor on Your Servers?


I am sure there will be somewhere around 2,000 words by the time I get done saying everything I want to say, but I most definitely will not make you read them all to learn the answer to this question.

The answer is yes. At least on my own aging homelab server, in my own home, with my particular workload. I am probably going to be saving just a hair under $10 per year by switching to the conservative CPU governor with some minor tweaks to the governor settings.

You don’t even have to wait to see my extra tweaks. Here’s the script I run at boot to switch to the conservative governor and tweak its settings:

#! /bin/bash

/usr/bin/cpufreq-set -g conservative

echo 40 > /sys/devices/system/cpu/cpufreq/conservative/down_threshold
echo 1 > /sys/devices/system/cpu/cpufreq/conservative/sampling_down_factor
echo 150000 > /sys/devices/system/cpu/cpufreq/conservative/sampling_rate
echo 85 > /sys/devices/system/cpu/cpufreq/conservative/up_threshold
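
If you want to confirm that the values actually stuck after boot, a small helper can read them back. This is only a sketch: the function name is made up for illustration, and it assumes the tunables live in the same sysfs directory that the script above writes to. On some systems they may sit under a per-policy directory instead, which is why the path is an argument.

```shell
#!/bin/bash
# Print each conservative-governor tunable next to its current value.
# show_conservative_tunables is a hypothetical helper name.
# The directory argument defaults to the path used by the boot script.
show_conservative_tunables() {
    local dir=${1:-/sys/devices/system/cpu/cpufreq/conservative}
    local name
    for name in down_threshold sampling_down_factor sampling_rate up_threshold; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            # The file is missing when a different governor is active.
            printf '%s=unavailable\n' "$name"
        fi
    done
}
```

Running `show_conservative_tunables` with no arguments after boot should list the four settings. If every line says unavailable, the conservative governor is probably not the active one.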

My motivation

I moved my homelab server out of my office to the opposite side of the house. It now lives on a temporary table underneath my network cupboard. This network cupboard used to belong to my friend Brian Moses, but it is mine now. Should I write up what I’ve done with it since acquiring the house?!

I had to unplug the server before moving it and its UPS across the house, so I figured I could plug it into a Cloudfree outlet and monitor its power usage in Home Assistant. Once that happened, I couldn’t help but monitor power usage with various max clock frequencies, and during benchmarks, and testing all sorts of other things.

Power is heat, and heat is the enemy here in Texas

I am quite a few years late for this change to have a significant impact on my comfort. In our old apartment, my home office was on the second floor on the south side of the building, and that room had particularly poor airflow from the HVAC system. Heat rises, the sun shines from the south, and you need airflow to stay cool.

I know we didn’t get to the numbers yet, but my changes may have dropped my heat generation by nearly 60 BTUs. That would have made a noticeable impact to the temperature of my old office in July and August.

My new office at our house has fantastic airflow. The only time I get warm is when I close the door and turn off the air conditioning to keep the noise down while recording a podcast.

That was the real motivation for moving the homelab server out of the room. Sure, that got 300 unnecessary BTUs out of here, but the important thing is that there are now four fewer hard drives and nearly as many fans spinning away near my microphone.

Here in Plano, TX, we wind up running our air conditioning eight or nine months of the year. I wouldn’t be surprised if we would spend $5 per year to cool the heat that would have been generated by that extra $10 of electricity.

The specs of my homelab server

I wrote a blog post in 2017 about upgrading my homelab server to a Ryzen 1600. I almost made a similar upgrade to my desktop the next year, but instead I decided to save some cash and I just swapped motherboards. My desktop machine is a Ryzen 1600 now, and my homelab is an old AMD FX-8350. Here are the specs:

  • AMD FX-8350 at 4.0 GHz
  • 32 GB DDR3 RAM
  • Nvidia GT710 GPU
  • 2x 240 GB Samsung EVO 850 drives
  • 4x 4 TB 7200 RPM drives in RAID 10

When the FX-8350 was in my desktop machine, I had it overclocked to 4.8 GHz. I don’t think I have the exact numbers written down anywhere, but I recall that squeezing the last 300 MHz out of the chip would use an extra 90 or 100 watts on the Kill-A-Watt meter. The first thing I did on the homelab was turn the clock down to 4 GHz in the BIOS. I think it is supposed to be able to boost to 4.2 GHz when only two cores are active, but I had boost disabled when I was overclocking, and it is still disabled today.

This is what I know from measuring power over the last few weeks and scouring old blog posts for power data. My power-hungry FX-8350 machine never goes below 76 watts at the smart outlet. Old blog posts suggest that about 20 watts of that is going to the four hard disks, and up to another 19 watts could be consumed by the overly complicated GPU.

I am not done collecting data. I will clean up this table when I am finished. In the meantime, though, here are all the things I know so far:

   geekbench             kwh per day         tailscale
    X8  X1         ondemand   conservative     mbps
                              stock  custom
4.0ghz 222 117 watts   2.2        2.03*    1.97    608
3.6ghz 184 106 watts   2.1        2.0*             533
2.8ghz 145  93 watts   2.04                        472
2.0ghz 118  87 watts   1.99                        377
1.4ghz  97  84 watts   1.96                        260

All the power numbers are measured at the Cloudfree smart plug.

I am not really saving $10 per year

I have had the clock speed of my FX-8350 capped at 2.0 GHz ever since I removed the Infiniband cards from my network. You can probably see from the chart that the cap only uses 0.02 kWh per day more than my tweaked conservative governor, and I have learned that 0.02 kWh per day for a year only works out to about $0.50.

I was already saving $9.50 per year by capping the CPU speed at the absolute minimum, but I was also slowing everything down. Switching to the conservative governor and making a few tweaks both made my homelab server faster and saved me the next $0.50. I think that is a nice win!
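
Turning a daily kWh difference into dollars per year is one multiplication, so here is a rough sketch you can adapt. The function name is made up, and the electricity rate is something you have to supply yourself. This post never states the rate behind its dollar figures, so don't expect the output to match them exactly.

```shell
#!/bin/bash
# annual_cost KWH_PER_DAY DOLLARS_PER_KWH
# Prints the yearly cost of a constant daily energy use, to the cent.
# annual_cost is a hypothetical helper; the rate is whatever your utility charges.
annual_cost() {
    awk -v kwh="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", kwh * 365 * rate }'
}
```

Feed it the difference between two rows of the table and the rate from your own power bill to see what a governor change might be worth to you.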

The motivation for wanting a faster homelab server

My personal network and computing environment is heavily reliant on Tailscale. Tailscale is a mesh VPN that effectively makes my computers seem like they’re all on the same local network no matter where each machine is located in the world. I have been trying to leverage the security aspect of this more and more as time goes on, and one of the things I have been doing is locking down my network services so they are only available on my Tailnet.

Homelab Power Utilization

NOTE: The first graph is misleading, because pixels are so wide! The graph always shows both the highest and lowest reading during that time period, but it can’t show you just how little time the server spent at the peak.

I have almost entirely eliminated my reliance on my NAS, but every once in a while I need to move some data around the network. As you can probably see in my charts, Tailscale tops out at around 350 megabits per second when I limit the server to 2.0 GHz. It is capable of going twice as fast as this, and even though that isn’t saturating my gigabit Ethernet port, it is still faster!

My testing methodology

My Cloudfree smart outlet runs the open-source Tasmota firmware. Tasmota keeps track of the previous day’s total power usage. I don’t know if you can set the time of day when this resets, but my outlets all cross over to the next day at 5:00 p.m. This is a handy time of day for checking results and setting new values for the next run.

All that data is stored in Home Assistant, so I can always go back and verify my numbers.

All of the most important tests were run for a full 24 hours. Some of the numbers in the middle are probably lazy. If I didn’t get a chance to adjust the governor until 6:00 p.m., I figured that extra hour at the previous setting wouldn’t skew the data significantly.

I would always wait for a full day when switching between the extreme ends of the scale.

NOTE: You probably shouldn’t buy an old-school Kill-A-Watt meter today. Lots of smart outlets have power meters, and you can set those up to log your data for you, and you can even check on them remotely. They also cost less than a Kill-A-Watt. The Cloudfree plugs that I use are only $12 and ship with open-source firmware.

My goals when tweaking the conservative CPU governor

I wanted to make it difficult for the CPU to sneak past the minimum clock speed. If something was really going to need CPU for a long time, I most definitely wanted the CPU to push its clock speed up.

I don’t have a good definition for what constitutes a long time. I figured that if I am going to scp a smaller volume of data around, that I don’t really care if the task runs for 10 seconds at 1.4 GHz instead of 5 seconds at 4.0 GHz.

pat@zaphod:~$ iperf -c nas;iperf -c nas;iperf -c nas;iperf -c nas;iperf -c nas
------------------------------------------------------------
Client connecting to nas, TCP port 5001
TCP window size: 67.5 KByte (default)
------------------------------------------------------------
[  1] local 100.88.23.40 port 39508 connected with 100.75.238.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0719 sec   452 MBytes   377 Mbits/sec
------------------------------------------------------------
Client connecting to nas, TCP port 5001
TCP window size: 67.5 KByte (default)
------------------------------------------------------------
[  1] local 100.88.23.40 port 59728 connected with 100.75.238.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0579 sec   657 MBytes   548 Mbits/sec
------------------------------------------------------------
Client connecting to nas, TCP port 5001
TCP window size: 67.5 KByte (default)
------------------------------------------------------------
[  1] local 100.88.23.40 port 56344 connected with 100.75.238.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0374 sec   719 MBytes   600 Mbits/sec
------------------------------------------------------------
Client connecting to nas, TCP port 5001
TCP window size: 67.5 KByte (default)
------------------------------------------------------------
[  1] local 100.88.23.40 port 35472 connected with 100.75.238.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0465 sec   734 MBytes   613 Mbits/sec
------------------------------------------------------------
Client connecting to nas, TCP port 5001
TCP window size: 67.5 KByte (default)
------------------------------------------------------------
[  1] local 100.88.23.40 port 58154 connected with 100.75.238.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0403 sec   716 MBytes   598 Mbits/sec
pat@zaphod:~$ 
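
Eyeballing five separate results gets tedious, so here is a small sketch that averages them. It assumes the iperf output format shown above, where each result line ends in `Mbits/sec`. Lines reported in other units would be skipped, and the function name is made up for illustration.

```shell
#!/bin/bash
# Read iperf client output on stdin and print the mean Mbits/sec.
# average_mbits is a hypothetical helper; it only counts lines whose
# last field is exactly "Mbits/sec".
average_mbits() {
    awk '$NF == "Mbits/sec" { sum += $(NF - 1); n++ }
         END { if (n) printf "%.1f\n", sum / n; else print "no results" }'
}
```

Something like `for i in 1 2 3 4 5; do iperf -c nas; done | average_mbits` would do the job. For the five runs above, the average works out to 547.2 megabits per second, though an average hides how different the first run is from the rest.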

I will probably start to care if a 20-minute file copy takes 40 minutes.

I wasn’t able to plot a precise point on that continuum. I am able to slow down the clock speed increase by raising the sampling_rate, but if I push it too far, my iperf tests are never able to push past 2.8 GHz.

My tweaks to the conservative governor settings

I raised both the down_threshold and up_threshold above the defaults. I figured this would add some friction on the way up while making the trip back down to 1.4 GHz a little faster.

I bumped the sampling_rate from the default of 4,000 microseconds to 140,000 microseconds. In my tests, anything at 180,000 microseconds or higher wouldn’t let the CPU reach full speed. I may try lowering this value, but it takes 24 hours to verify the results.

Every time I made a change to the conservative governor, I would run three consecutive iperf tests. Why did I run three 10-second tests instead of a single 30-second test?

It seemed like the small delay when reconnecting between the tests would allow the CPU to clock down a notch or two. That seemed like a helpful simulation of what the real world might be like.

I didn’t use a stopwatch. I didn’t set up a script to watch the clock speed to let me know when we were reaching the maximum. I just ran cpufreq-aperf and counted hippopotamuses before seeing 4,000,000 in the output. I guess it helps that cpufreq-aperf updates once every second!

I didn’t really need an exact number, but it was easy to see how quickly the CPU was ramping up. I think I wound up at a point where the CPU bumps up one frequency notch faster than once every two seconds, but slower than once per second.

That means my iperf test reaches full speed in about five seconds. It also doesn’t dip more than one notch between runs.
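
For anyone who would rather not count hippopotamuses, a clock-speed watcher could look something like this sketch. It is not what was used for the numbers in this post. It assumes the kernel exposes the current frequency in kHz at the usual `scaling_cur_freq` path for the first core, and the function name is made up.

```shell
#!/bin/bash
# watch_cpu_freq SAMPLES [INTERVAL_SECONDS] [FREQ_FILE]
# Prints "sample_number frequency_in_khz" once per interval.
# watch_cpu_freq is a hypothetical helper; the default file is cpu0's
# current frequency as reported by the cpufreq subsystem.
watch_cpu_freq() {
    local samples=$1
    local interval=${2:-1}
    local file=${3:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq}
    local i=0
    while [ "$i" -lt "$samples" ]; do
        printf '%s %s\n' "$i" "$(cat "$file")"
        i=$((i + 1))
        # Skip the sleep after the final sample.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

Start `watch_cpu_freq 15` in one terminal and kick off iperf in another. The sample number where the second column first reads 4000000 tells you roughly how many seconds the ramp took. It only watches one core, so a busy process scheduled elsewhere might not show up.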

I think that is pretty reasonable. I can reach or at least approach 600 megabits per second via Tailscale in less than 10 seconds while only using 0.01 kWh more throughout the day than if I locked the machine at 1.4 GHz.

Is all this work worth $10 per year?!

The important thing is that you don’t have to do all this work. I’ve spent a few hours making tweaks and recording data to learn that just switching to the conservative governor would save me 75% as much as if I just forced my lowest clock speed, and I spent even more hours tweaking the governor to claw back the next 24%.

All you have to do is spend two or three minutes switching governors or applying my changes. You don’t have to hook your server up to a power meter to see if it is actually working, but you aren’t running my ancient and power-hungry FX-8350, so your mileage will almost certainly vary.

I imagine this helps less if you already have an extremely efficient processor.

Why wouldn’t you want to make this change?

My ancient server is overkill for my needs. I run a handful of virtual machines that are sitting idle most of the day, and they could all manage to do their job just fine with the smallest mini PC.

Maybe you run servers that need to be extremely responsive. Google measures your web server response time as part of their search rankings. If your web server is locked at full speed, it might be able to push your blog up a notch in the search rankings, and that would be worth so much more than $10 per year!

Most of us aren’t going to notice things slowing down. Especially if you just switch governors instead of applying my extreme tweaks. It isn’t like your CPU is going to be in sleep states more often with the conservative governor. It can still do work while cruising at a low speed.

The best server hardware is almost always the hardware you already have

Every time my home server gear starts getting old, I start thinking about buying upgrades. This old FX-8350 box is eating up $93 in electricity every year. $73 of that is the compute side, and about $20 is the storage.

If I wait for a deal, I could swap out the old 4 TB hard drives for a 14 TB hard drive for around $200. If we ignore the fact that I need more storage and these drives are getting scary old, I can save $14 per year here. That’d pay for the new hard drive sometime in the next decade.

NOTE: We post hard drive, SSD, and NVMe deals in the #deals channel on the Butter, What?! Discord server almost every day!

When I do this math, I always assume I am going to be buying something bigger, faster, and better. A new motherboard will be $100. A new CPU will be at least $200. New RAM will be at least $150. I am assuming I can reuse my case and power supply.

Even if this magical new machine uses zero electricity, it would take six years to pay for itself in power savings. If it only uses half as much power, it will take 12 years.

I think this is the first year where I can pay for a server in energy savings!

My next upgrade might be very different! I am very seriously considering replacing my homelab server with the slowest Beelink mini PC you can buy. The Celeron N5095 model sometimes goes on sale for $140. I have some free RAM here to upgrade it, and the little Beelink would probably use less than $10 in electricity every year.

It would cost me $340 for a Beelink and a 14 TB USB hard drive to hang off the back. The two pieces of hardware combined might only cost me about $16 per year in electricity. That would completely pay for itself in power savings in about 4.5 years. Maybe less than three years if we include the costs of cooling.

I don’t like lumping in the 14 TB hard drive with the Beelink. I am quickly running out of storage on my server, and I am planning on replacing those four drives with a single 14 TB drive after my next inevitable disk failure. I will be retiring those hard drives soon whether I retire the FX-8350 or not!

The Beelink would pay for itself in electricity savings in two years. No problem.
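
The payback math in this section is just the hardware price divided by the yearly power savings. Here is a rough sketch of it in shell. The function name is made up, and the way the dollar figures are paired up in the examples is one reading of the paragraphs above, not an exact accounting.

```shell
#!/bin/bash
# payback_years HARDWARE_COST OLD_ANNUAL_POWER_COST NEW_ANNUAL_POWER_COST
# Prints how many years of power savings it takes to cover the hardware.
# payback_years is a hypothetical helper name.
payback_years() {
    awk -v cost="$1" -v old_cost="$2" -v new_cost="$3" 'BEGIN {
        saved = old_cost - new_cost
        if (saved <= 0) { print "never"; exit }
        printf "%.1f\n", cost / saved
    }'
}

payback_years 450 73 0    # new motherboard, CPU, and RAM using zero power
payback_years 340 93 16   # Beelink plus a 14 TB drive replacing everything
payback_years 140 73 10   # Beelink alone replacing the compute side
```

Those three calls print 6.2, 4.4, and 2.2, which line up with the six years, about 4.5 years, and two years mentioned above. Cooling costs are left out, so the real payback may be a bit quicker.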

It is exciting that this is even a possibility, but it is a bummer because this is a downgrade in many ways. The Beelink doesn’t have six SATA ports and the bays to hold those drives. The Beelink doesn’t have PCIe slots for upgrades. My FX-8350 is 50% faster than the N5095, but it is possible that the N5095’s AES instructions would give it a significant Tailscale boost!

But the Beelink is tiny, quiet, and capable of doing the work I need. I am excited that it is literally small enough to fit in the network cupboard!

There are more capable Beelink boxes. The Beelink model with a Ryzen 5 5560U would be a pretty good CPU and GPU upgrade, but I don’t need more horsepower, and that $400 Beelink wouldn’t save me enough power to pay for itself before I’d likely retire it.

Of course this gets complicated because I have no idea how to account for prematurely turning my FX-8350 server into e-waste.

Conclusion

I am just one guy testing his one ancient homelab server. I’ll probably find a way to do a comparable test on at least one more piece of hardware, but this is still just me. I want to hear from you!

Are you going to try this out? Do you have an old Kill-A-Watt meter or a smart outlet capable of measuring power usage? If you happen to do a before-and-after test with and without the conservative Linux CPU governor, I would absolutely love to hear about it! You can leave a comment, or you can stop by the Butter, What?! Discord server to chat with me about it!

Trying Out Tailscale Funnel and Tailscale’s New Proxy


I am excited about the new Tailscale Funnel feature. I have been asking for an option like this ever since the very first time anyone at Tailscale asked me for my opinion. What the folks at Tailscale have done isn’t exactly what I asked for. Funnels are better than the feature I was envisioning, and they’re only worse in a few minor ways.

My wish was to be able to click a button in the Tailscale admin console to have an IPv4 address assigned to a machine on my Tailnet. The use case floating in my mind was that I could leave a clone of my Digital Ocean droplet running on my homelab server. If something went sideways with Digital Ocean, I could immediately point Cloudflare away from the old server and toward my new Tailscale IP address and have all our blogs back up and on the Internet with very little effort.

Tailscale has done a good job with their Funnel. I only need to touch the admin console once to set up the ACLs to allow my machines to create funnels. Everything else is handled on the nodes that want traffic to be funneled their way. That’s awesome!

How is this different than the feature I’ve been patiently hoping for over the last year or two? Tailscale Funnel can only tunnel TLS connections. It can forward traffic to a web server on your Tailnet just fine, but I don’t expect you can use a Funnel to let the entire world connect to your Minecraft server.

I am OK with that.

What functionality did I replace (and upgrade!) with Tailscale Funnel?

Brian Moses and I have some rather simple infrastructure set up so we can both push new blog posts to butterwhat.com, and they will be automatically published. The blog posts are stored in a private repository on Gitlab.com. I have a virtual machine here on my homelab server that regularly pulls the latest changes down and publishes if there are any changes.

That virtual server also runs continuous previews of the jekyll blog so that Brian can see what he’s working on.

Instead of connecting to Gitlab every two minutes looking for changes, I figured I could configure a webhook on that repository and Gitlab could just tell me when there is a change.

Setting up the Funnel and webhook was easy!

I have never done any of this before. I was chatting about my progress on our Discord server, and it sure looks like it took me less than 30 minutes. That even includes tearing down most of what I saw in the webhook tutorial I was following so I could reconfigure things so the webhook server wouldn’t run as root. I also goofed up and accidentally configured the webhook server to listen for Github instead of Gitlab, and I had to spend a few minutes scratching my head before I noticed.

Aside from the Tailscale funnel, the star of the show is the little webhook server available in Ubuntu’s apt repositories. I didn’t really know how I was going to listen for Gitlab’s connection and kick off a shell script. The answer wound up being quite simple!

I followed a nice tutorial on how to use Gitlab webhooks. The tutorial tells you how to install and configure webhook, how to configure webhook for Github, and how to configure Github to call your webhook. There are links to sample configurations for Gitlab as well.

Tailscale’s documentation is always fantastic, but the documentation for the Funnel is a little sparse. That is OK. Funnel is a new feature, and it isn’t even a production feature.

To use a funnel, you need to enable HTTPS certificates in your Tailscale admin console, allow funnels in your ACLs, and then turn on the funnel at the client. It was the second step there that I wasn’t sure about. I was wondering if I could enable funnels based on ACL tags, and it turns out that you can.

"nodeAttrs": [
  {
      "target": ["tag:blogdev"],
      "attr":   ["funnel"],
  },
],

The webhook server defaults to listening on port 9000. I had to run two commands to pass that through to the funnel:

root@butterwhat:~# tailscale serve / proxy 9000
root@butterwhat:~# tailscale serve funnel on

I was going to leave out my webhook config file. I just pasted in the sample from the documentation and edited a few lines. I may as well include it, so here it is:

[
  {
    "id": "butterwhat-gitlab",
    "execute-command": "/home/pat/bin/webhook.sh",
    "command-working-directory": "/home/pat",
    "pass-arguments-to-command":
    [
      {
        "source": "payload",
        "name": "user_name"
      }
    ],
    "response-message": "Executing redeploy script",
    "trigger-rule":
    {
      "match":
      {
        "type": "value",
        "value": "footballislifebutalsodeath",
        "parameter":
        {
          "source": "header",
          "name": "X-Gitlab-Token"
        }
      }
    }
  }
]

It has been slightly modified to protect my little webhook server.
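A config like this can be checked by hand before Gitlab is involved. The following is a minimal sketch, assuming webhook is serving hooks under its default `/hooks/<id>` path on port 9000. The `hook_url` and `send_test_hook` helper names and the sample payload are made up for illustration, and the token is whatever you put in your own trigger rule.

```shell
#!/bin/sh
# Hypothetical helpers for poking the webhook server by hand.

# Build the URL for a hook id, assuming webhook's default /hooks/ prefix.
hook_url() {
  host="$1"; port="$2"; id="$3"
  printf 'http://%s:%s/hooks/%s\n' "$host" "$port" "$id"
}

# Send a fake push event. The JSON body only carries user_name, because
# that is the one payload field the config passes to the script.
send_test_hook() {
  url="$1"; token="$2"
  curl --silent --show-error \
    --header 'Content-Type: application/json' \
    --header "X-Gitlab-Token: $token" \
    --data '{"user_name": "test-user"}' \
    "$url"
}

# Example (not run automatically):
#   send_test_hook "$(hook_url localhost 9000 butterwhat-gitlab)" "your-secret-token"
```

If the token matches, the server should answer with the configured response message. A wrong token should fail the trigger rule.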

Did I do a good job with my blog publishing script updates?

Not really! I did set things up so that the pull-and-publish job only runs one process at a time. The bad news is that the wrong thing will happen if we manage to push another commit before the job finishes. We are a small team. The odds of this happening are not exactly high.
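One common way to get "only one process at a time" in a shell script is a lock directory. This is a minimal sketch of that technique, not necessarily how the publishing script does it. The `run_exclusive` name and the paths in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command only if no other copy holds the lock.
# mkdir is atomic, so only one caller can create the lock directory.
run_exclusive() {
  lockdir="$1"; shift
  if ! mkdir "$lockdir" 2>/dev/null; then
    echo "already running, skipping" >&2
    return 75
  fi
  status=0
  "$@" || status=$?
  rmdir "$lockdir"      # release the lock even if the command failed
  return "$status"
}

# Example (hypothetical paths):
#   run_exclusive /tmp/publish.lock /home/pat/bin/pull-and-publish.sh
```

Note that this sketch skips a run that arrives while another is in progress. It does not queue it, so a push that lands mid-publish would have to wait for the next run.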

When I started setting this up, I assumed I would be able to completely eliminate the publish loop. Then I remembered that the script is smart enough to only publish blog posts with timestamps in the past. The loop still needs to run so that posts scheduled for the future eventually get published.

Instead of pulling from Gitlab and publishing every three minutes, it is now doing so only once an hour. That seems much more reasonable, though I am certain I can better optimize this.

The webhook over the funnel works great, but it is the wrong solution for us!

Our blog-publishing infrastructure seems out of place today. Not only is it possible to have Github or Gitlab publish our Jekyll and Octopress blogs directly every time we push to the appropriate repo, but it was already possible when Butter, What?! was born. We had a virtual machine publishing previews of upcoming posts anyway, so making that machine rsync to our web server didn’t add much in the way of friction.

Brian Moses and I set up our collaboration on Butter, What?! before we heard of Tailscale, and possibly even before Tailscale released a product. We used a private repo on Gitlab so that we would both easily have access to the Markdown files.

I am excited about trying out Tailscale Funnel today to streamline our process, but it also made me realize that using Gitlab for this is an artifact left over from the days when we didn’t use Tailscale.

The Butter, What?! test server is on my Tailnet, and I already share it with Brian. He could just as easily be pushing directly to a repo on that server. We don’t need to host anything on the Internet. We don’t need to be punching a hole to our development box for Gitlab to send us a notification.

Everyone at Butter, What?! is all-in on Tailscale. I don’t like to redo things that are already working, but the idea of moving the blog repo onto our Tailnet is in my mind now.

I might be most excited about Tailscale serve

The serve command is awesome. It seems to be new, and it looks like the whole thing arrived with the Funnel update.

I am not entirely certain of everything you can accomplish with this new TCP forwarding and proxy system. It has plenty of seemingly artificial limits so far, but even so, it sure looks like it will come in handy. Right now you can’t point the proxy at a different machine on the local network. It can only proxy localhost.

You can use the proxy to wrap an HTTP server on the Tailscale node in TLS. This seems neat. You won’t have to figure out how to correctly install your certificates after running tailscale cert. As far as I can tell, the Tailscale proxy will just use the generated certificates.

Why is this a big deal? I already know how to set up my certificates with nginx or apache. Why would I need Tailscale to do it for me?

Octoprint doesn’t run as root, so it can’t run its embedded web server on port 80 or port 443. I also have no idea what web server software it uses or how to configure it. I can just run tailscale cert octoprint.humpback-rooster.ts.net then tailscale serve / proxy 5000 on my Octoprint server, and it will wrap my Octoprint instance up in some sweet encrypted TLS.

I don’t need to know how my appliances configure their web servers. I don’t have to worry about an update wiping out my TLS config changes. I can just wrap them in TLS without installing any extra software on the node, and I think that is awesome. I can set up that proxy in less time than it would take me to figure out that Octoprint isn’t using nginx or apache.

Forget the funnels. The real gem that was made available to me this week is the proxy built into tailscale serve!

Conclusion?!

I am excited. Did I already say that? I really am. Tailscale just keeps making my life easier. Tailscale makes it easier to keep all my machines connected and safely off the public Internet. Then Tailscale made it really easy to let friends and associates connect to my private servers. Then Tailscale started managing 90% of my SSH keys for me.

Now they’re letting me poke tiny holes in my mesh firewall and even stick a TLS proxy in front of both my public and private web servers. This is all fantastic stuff, and I expect Tailscale will just continue to become even more awesome and convenient!

Can I Use Octoprint with a Networked Serial Port?

The short answer is yes! Holy crap! I have my Prusa MK3S plugged into the USB port on my oldest OpenWRT router, and I am running Octoprint in a virtual machine on my homelab server. I have run five print jobs for a total of around five hours of print time, and I even slept through the night before starting the later jobs.

Everything is working great. Almost everything.

This is not a tutorial!

I don’t think I am doing this correctly. I am using socat on each end to move the serial port data across the network, and it is doing an absolutely fantastic job when the connection is active.

If the Octoprint server’s socat process dies, the socat process on the OpenWRT box just keeps on running, and it won’t let a new socat process connect. That means if I reboot the Octoprint server, I will have to ssh to the OpenWRT end to restart socat.

The man page for socat is a mile long. It is likely that I am missing something obvious. If I ever do find it, that might be the time for a tutorial.

Why on Earth would I run Octoprint this way?

The majority of Octoprint users run Octoprint on a Raspberry Pi. I have never done this here at home. When I bought my first 3D printer, I already had my little virtual machine host sitting next to the table where the printer was going to live. No reason to add a Raspberry Pi to the mix when I can just spin up a fresh VM for Octoprint.

I am trying to make my home office quieter for recording podcasts, so I have been considering the idea of moving my homelab server into the room with my network cupboard. How can I move my Octoprint server to the opposite side of the house?!

I could buy a Raspberry Pi, but they’re pretty expensive and hard to come by. I also thought about trying Octo4a on an old Android phone, but I’d need to buy cabling to let me charge the phone and plug it into my Prusa MK3S at the same time.

Then I remembered that I now have a spare OpenWRT router left over from my WiFi upgrade shenanigans, and it has a USB port. I was hoping I could add some extra gigabit switch ports to my office, gain the use of an extra WiFi access point to roam to, and extend my Octoprint server’s serial port all the way across the house.

The old D-Link tube router that I am using is roughly equivalent to the GL.iNet Mango OpenWRT travel router that I carry in my laptop bag. They have similar CPU, RAM, and storage specs, though the Mango lacks a 5.8 GHz radio and only has a pair of 10/100 Ethernet ports. It is nice to know that if I didn’t already have the D-Link ready to go, I could have bought another Mango for about $20.

Should anyone else do this?

Do you know what is awesome about the OctoPi disk images for the Raspberry Pi? You don’t need to be proficient with Linux, command lines, or shell scripts to get it up and running. You follow a short tutorial, plug a micro SD card into your Pi, and you’re off to the races.

I’ve been running Linux at home since the mid-nineties, and I was doing professional Linux work by the late nineties. Getting this up and printing didn’t take much more than ten minutes.

My Prusa MK3S and OpenWRT Serial Port Extender

Getting my init scripts to do the right thing on the router took longer than I’d like to admit. I needed to make sure socat didn’t try to start before Tailscale was downloaded and running, and I needed to make sure socat would restart itself if things went weird.

The number of times I had a typo in a path or accidentally left a wrong socat option on a command line was ridiculously high, and I had to wait two minutes every time I rebooted the router for a test. Those two-minute waits add up fast!

Is this a workout for the OpenWRT router from a decade ago?!

Not in the slightest. My old 16 MHz 80386SX in the early nineties didn’t even break a sweat transferring data over a modem at 115,200 baud. Sure, the fastest modem I had on that machine was a 14,400 baud, but it could go quite a lot faster when compressing text with v.42bis.

The MIPS processor in my D-Link DIR-860L is orders of magnitude faster. We could probably hang several printers off the back of this router if we used a USB hub, but I only have the one 3D printer.

socat on the D-Link doesn’t even register as using any CPU according to top.

Could I put a camera on the OpenWRT router?!

Maybe, but I don’t even want to try. I haven’t bothered with having a camera on my 3D printer in years. I don’t use Octolapse. I just print stuff.

You can connect a UVC webcam to an OpenWRT router, and there’s even an mjpeg-streamer package available. I don’t know how much horsepower you need to run mjpeg-streamer on a MIPS router, but it should be easy to point Octoprint at mjpeg-streamer on the router. If it works.

I would be trying this out today if I knew where my spare USB hub went to!

How long can your virtual serial cable be?!

I had to be silly, so I figured I should pretend to be one of those hackers from the movies who bounces their connection through all sorts of cities and satellites. I don’t have any satellites, and I don’t have all that many physical locations on my Tailscale network, but I pushed things as far as I could!

I added two more nodes to my socat loop. I ran a 3D printing job from Octoprint on my homelab server in Plano to a Digital Ocean droplet in New York. That node pointed its instance of socat at my Raspberry Pi at Brian Moses’s house, and that one finished the loop by pointing to my OpenWRT router here in Plano, TX.

What should have been a 13-minute print job took about an hour. My virtual serial cable would have been something like 3,000 miles long. Octoprint was about 80 ms away from the Prusa MK3S. Math says that we may have literally added 80 ms of print time to each of the 34,000 lines of gcode. How cool is that?!
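The math is easy to check. This is a back-of-the-envelope sketch using only the numbers above, and it assumes every line of gcode pays the full 80 ms exactly once, which is a simplification.

```shell
#!/bin/sh
# Back-of-the-envelope check using the numbers from the post.
lines=34000        # lines of gcode in the job
latency_ms=80      # delay assumed to be added to every line
normal_min=13      # print time over a local serial connection

added_min=$(( lines * latency_ms / 1000 / 60 ))   # integer minutes
total_min=$(( normal_min + added_min ))

echo "added: ${added_min} minutes, total: ${total_min} minutes"
```

That works out to about 45 extra minutes, or 58 minutes in total, which lines up nicely with a print that took about an hour.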

The printer stuttered a lot, and the print was extremely ugly on account of all the short pauses. There were no smooth printing moves.

Was it slow and stuttering because of flow control on the serial port? Do you think finding a way to disable flow control in Octoprint would help with this symptom? Do you think the problem was that I added in an extra hop? I imagine things would be a lot better if I were running Octoprint in New York and printing in Plano with only a single socat connection between them. That would at least cut the latency in half!

This was silly. There’s probably no good reason to do this other than to say I 3D-printed an object over a 3,000-mile serial cable.

How do you make socat work?

I’ll say it one more time. I don’t think I am doing this quite correctly. These commands are working exceedingly well as long as I don’t have to reboot anything. I am pretty sure that if the power went out, everything would start up and connect correctly, but if I reboot one half, I almost definitely have to go in and restart the other machine’s socat process.

This is the script I am running on the OpenWRT router:

# Restart socat whenever it exits, including while Tailscale is still starting.
while true; do
  /usr/bin/socat -d -d /dev/ttyACM0,b115200,raw,echo=0 TCP-LISTEN:9998,bind=100.84.13.80,fork
  sleep 5
done

You can leave out the bind option. I am using that to make sure socat is only listening on my Tailscale interface.

My socat process will error out over and over again until Tailscale starts. I thought about adding some logic to wait for the tailscale0 interface to show up before firing up socat, but I want socat restarting if it dies. I figured it’d be OK to be lazy and kill two birds with just the one loop.
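For anyone who would rather wait for the interface than let socat fail in a loop, this is a minimal sketch of that logic. It assumes a Linux system where each network interface shows up as an entry under `/sys/class/net`. The `wait_for_iface` name and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: wait until a network interface exists.
# Linux exposes each interface as an entry under /sys/class/net.
wait_for_iface() {
  iface="$1"
  tries="${2:-60}"                 # how many times to check, one second apart
  root="${3:-/sys/class/net}"      # overridable so the function can be tested
  while [ "$tries" -gt 0 ]; do
    if [ -e "$root/$iface" ]; then
      return 0
    fi
    tries=$(( tries - 1 ))
    if [ "$tries" -gt 0 ]; then
      sleep 1
    fi
  done
  return 1
}

# Example (not run automatically): only start socat once tailscale0 exists.
#   wait_for_iface tailscale0 120 && /usr/bin/socat ...
```

The restart loop is still worth keeping around this, since waiting for the interface does nothing to bring socat back if it dies later.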

Here is the script on my Octoprint server:

sleep 20

# Reconnect to the router whenever socat exits.
while true; do
  socat TCP:100.84.13.80:9998 PTY,link=/dev/ttyACM1,raw,crnl,mode=666
  sleep 2
done

I have forgotten why there is a 20-second sleep at the beginning of this script. Was I working around a problem, or was I hoping this would cover up a problem that doesn’t even exist anymore? I would comment that out and reboot everything to test it out for you, but my printer is in the middle of a job!

UPDATE: Maybe this is less fragile than I think it is!

I moved my homelab server out of my office last night. I don’t have a shelf for it yet, the cables are all just running out of the network cupboard’s doors, and the machine is just sitting on top of a cardboard box to keep it off the floor. We wouldn’t want a leaky washing machine getting my server wet!

It took me a long time to power the server down, unplug everything from the UPS, and finagle the UPS out from behind my desk. By the time I got the server booted up and the Octoprint virtual machine going, the socat process in the Octoprint server’s init script was able to connect to the OpenWRT router without a problem.

I don’t know how long it takes for the socat server to time out, but it definitely times out for me in some number of minutes. That means that as long as I am not too efficient in restarting my Octoprint server, I shouldn’t have to restart the socat server manually.
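One untested possibility for shortening that timeout is TCP keepalive. socat documents `keepalive`, `keepidle`, `keepintvl`, and `keepcnt` options for TCP addresses, so the listening side could be asked to notice a dead peer sooner. This is only a sketch of how the listen address might be built. The `listen_addr` helper is hypothetical, the timing values are arbitrary, and it assumes the socat build on the router supports these options.

```shell
#!/bin/sh
# Hypothetical helper: build a TCP-LISTEN address with keepalive enabled.
# With these values a dead peer should be noticed after roughly
# keepidle + keepintvl * keepcnt = 30 + 10 * 3 = 60 seconds.
listen_addr() {
  port="$1"; bind_ip="$2"
  printf 'TCP-LISTEN:%s,bind=%s,fork,reuseaddr,keepalive,keepidle=30,keepintvl=10,keepcnt=3\n' \
    "$port" "$bind_ip"
}

# Example (not run automatically):
#   /usr/bin/socat -d -d /dev/ttyACM0,b115200,raw,echo=0 "$(listen_addr 9998 100.84.13.80)"
```

Whether this actually fixes the stuck connection is an open question. It would need the same reboot testing as everything else here.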

Conclusion

I am happy enough with how this all worked out. When it is working, it sure seems to work absolutely perfectly. I no longer have any reason to keep my homelab server in my office, and I am super excited that I will be able to eliminate the noise of four extra hard drives constantly seeking. Not only that, but I’ve added an extra 450 megabit of WiFi bandwidth to this corner of my house.

I will do my best to remember to report back in a few months if things manage to keep working this smoothly. If there’s a problem, I’ll be sure to report back immediately!

Is It Safe to Use a Big, Honking USB Hard Drive on Your Raspberry Pi Server?

The short answer seems to be yes, but why did I decide to write about this question today?!

Through the magic of Tailscale, my Seafile server, a Dropbox-like cloud-storage server, has been running remotely on a Raspberry Pi in Brian Moses’s home office for 22 months. The Pi boots off of a cheap micro SD card with all logging turned off, and Seafile stores all its data on a 14 TB Seagate USB hard drive.

Last week I had a scare! The ext4 filesystem on my USB hard drive set itself to read-only, and there were errors in the kernel’s ring buffer. Did my big storage drive fail prematurely?!

What happened?!

We may never know. I rebooted my little server, and then I ran an fsck. When that finished, I mounted the LUKS encrypted filesystem on the 14 TB USB hard drive, fired Seafile back up, and everything was happy. I haven’t seen another error in 13 days.

dmesg output

What do you think might have gone wrong? Is my USB cable flaky? Did the USB-to-SATA hardware in the USB drive get into a weird state? Did my little friend Zoe run through Brian’s office and bang into the cabinet and jostle the USB cable enough at just the right time to generate an error?

This is the first error I’ve seen in 22 months outside of a couple of random power outages.

This is good enough for me to say that it is safe to run a USB hard drive on my Raspberry Pi server. Maybe that’s not good enough for you. If not, then I hope this is at least a useful data point in your research.

The power outages all wind up being long Seafile outages, because I have to SSH in to punch in the passphrase to mount the encrypted filesystem before Seafile can start. Not only that, but I have to notice that Seafile is down before I can spend two minutes firing it back up again!

I am pleased that my USB hard drive didn’t fail!

I have been comparing my Dropbox-style server to Google Drive’s pricing. Google sells 2 TB of cloud sync storage for $100 per year, and I paid about $300 for my 14 TB drive and my Raspberry Pi. I am getting my off-site colocation for free by storing my Pi at Brian Moses’s house on his 1-gigabit fiber connection. You can just barely see my Pi server in the background of our Butter, What?! Show episodes!

I have done a bad job of properly keeping track of how much money I haven’t paid Google Drive or Dropbox. When I started out, I would have needed to rent 4 TB of storage. At some point during the first year, I would have had to add another 2 TB, and then I would have had to add another 2 TB at some point during this year.

I am currently using 6.8 TB out of a possible 14 TB.

Is it OK if I simplify the payments I may have had to pay to Google Drive? How about we just say I would have spent $200 last year, $400 this year, and I would be spending another $400 in two months? I expect that’s underestimating by something around $100.

I have already paid for all the hardware, and I would still be at least $50 ahead even if I had to order a replacement 14 TB drive last week.
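Here is that simplified math as a quick sketch. It only uses the rounded numbers above, so it inherits the same underestimate.

```shell
#!/bin/sh
# Rough savings math using the simplified payments from the post.
google_paid=$(( 200 + 400 ))   # last year + this year
google_next=400                # the payment that would be due in two months
hardware=300                   # 14 TB drive + Raspberry Pi

ahead_today=$(( google_paid - hardware ))
ahead_soon=$(( google_paid + google_next - hardware ))

echo "ahead today: \$${ahead_today}"
echo "ahead in two months: \$${ahead_soon}"
```

That puts the setup about $300 ahead today and about $700 ahead once the next payment would have come due.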

That would have been fine, but I would have been bummed out if I had to do the work of resyncing my data to my new Seafile server. I also might have had to set up accounts for the friends I share work with, and they might have had to get their data synced back up. That would all be a bummer.

I don’t account for the cost of labor in my “savings”

I know what I used to charge in the short spans of time when I did hourly consulting. I understand that my time is valuable, and that value almost certainly exceeds the cost of a few years of Google Drive storage.

There are quite a number of advantages to what I am doing, and they are very difficult to put a price on.

  • I own all my data
  • My server is only accessible from my Tailscale network
  • Everything except Tailscale is blocked on the LAN port
  • My hard disk is LUKS encrypted
  • My Seafile libraries are encrypted client side
  • I don’t get stopped to pay more at every 2 TB

Aside from all of this, it is a bit disingenuous for me to imply that getting my Seafile setup running requires time and effort while setting up Google Drive or Dropbox requires none. I have my data split up into a dozen Seafile libraries, and I have no idea what the equivalent functionality would look like with Dropbox or Drive. I’d still have to make sure that the correct data is synced to the correct devices.

Even though getting Seafile running on a Pi was more persnickety and time consuming than I anticipated, I still spent the majority of the time setting up clients on all my machines and syncing the correct data to each one.

I didn’t intend for this to be an update on the Seafile Pi with Tailscale

Even though this wasn’t my intention, this seemed like a good time for an update, since the server has been in operation for nearly two years. I am quite pleased with how things have been working.

A few months ago, I started booting Raspbian’s 64-bit kernel, and I replaced the 32-bit Tailscale binaries with 64-bit static binaries. That increased my Tailscale throughput from 70 megabits per second to just over 200 megabits per second.

I also swapped out the 4 GB Pi with a 2 GB model. They’re identical except for the RAM, but the 2 GB Pi will not negotiate gigabit Ethernet. It is stuck at 100 megabit, so I have given up some of my Tailscale encryption speed boost!

Would I do this with a Raspberry Pi today?

Not a chance! The hardware is working just fine, but Raspberry Pi boards are never in stock. When they are in stock, they are overpriced.

My friend Brian Moses recently grabbed a Beelink box for $140 from Amazon. It has a Celeron N5095, 8 GB of RAM, and a 256 GB SSD.

You get so much more with the Beelink! There’s a real SSD inside. The CPU is several times faster, and so is the GPU. You get full-size HDMI ports. You get 802.11ac. The Beelink isn’t just a bare board. It comes in a nice case, and it comes with a power supply!

The Raspberry Pi 4 is more than enough for my purposes, and I would gladly use one for this purpose again if they were still under $50.

One of the neat things about the Beelink boxes is that you can get a lot more horsepower than the Celeron version that Brian bought. You can also get a Beelink with 16 GB RAM and a 6-core Ryzen 5 5560U that is about three times faster. One of our friends on our Discord server bought one of those when it was on sale for only $300.

Conclusion

I am relieved that my unexpected and unexplained SATA error didn’t wind up being a drive failure, and I am super excited that the minor gamble that is this entire Seafile Pi experiment is paying off. I have invested less than $300 in hardware, and in two months I will have not had to pay Google $1,000 or Dropbox $1,200 for cloud storage.

Even if we assign me a rather high hourly rate, I am confident that I have crossed the point where I am saving money. Isn’t that awesome?!

Enabling WiFi Fast Transition Between Access Points with OpenWRT

This isn’t a tutorial. The steps to enable 802.11r on two or more access points running any reasonably recent release of OpenWRT are pretty simple. I will spell those steps out shortly, but there won’t be screenshots walking you through the process. I am writing today to document what I have done, what I might do next, and how things are working out so far.

As I fact check this post, it just keeps getting longer and longer. So much longer than I expected! It feels like every time I look up specifics to make sure my memories of various events are reasonably accurate I wind up finding something that I had an incorrect understanding or memory of.

Everything in here should be pretty accurate now, but I am definitely not an expert when it comes to any of the newer WiFi roaming specifications or technologies. This is just my experience while messing around with this stuff over the last week or so.

Why is Pat monkeying with his WiFi setup?!

I have not been entirely unhappy with the WiFi signal around my house for the last few years, so how did we get here? Why have I upgraded things?

I was at Brian Moses’s house a couple of weeks ago. He’d just replaced one of his OpenWRT access points, a TP-Link Archer A7 WiFi 5 router, with a GL.iNet WiFi 6 router. That got my gears turning. I was thinking I could use Brian’s old router to solve my poor connectivity issues with my Raspberry Pi Zero W on my CNC machine in the garage. It’d get me a switch port to plug the laptop into as well, so I asked him how much he wanted for it.

TP-Link Archer A7 v5 with OpenWRT

The next week he sent me home with two of his old OpenWRT-compatible routers: that TP-Link and a Linksys WRT3200ACM. Now that I had an overabundance of WiFi routers, I figured it was time to reengineer my home network and eliminate my only access point that can’t run OpenWRT.

NOTE: You shouldn’t buy the TP-Link or Linksys router. The Linksys seems terribly overpriced, and it feels like the TP-Link should cost less, too. You can get two better equipped WiFi 6 routers from GL.iNet for the price of the Linksys WRT3200ACM WiFi 5 router, and the GL.iNet routers ship from the factory with OpenWRT.

The layout of our house

There are much bigger homes in our neighborhood, but the homes with significantly more square footage will have a second story. Our house is shaped like the letter U, and Zillow claims our house is around 2,200 square feet. I also need to cover the garage, which adds another 500 square feet. Let’s just say I need to cover 2,700 square feet with WiFi, and reaching out into the yard would be a bonus!

My house and its wifi access points

I have two access points, and each access point has a separate SSID on each radio. Chicanery2.4 and Chicanery5 are on the gateway router that lives in the network cupboard. Shenanigans2.4 and Shenanigans5 are on an access point in the living room. Both devices are D-Link DIR-860L routers. Both have faster WiFi when running D-Link firmware, but the one in the living room can’t even run OpenWRT.

Shenanigans is named after Farva’s favorite restaurant, and Chicanery is named after its Canadian equivalent from the sequel.

This is how things were configured last week.

Why not just put the same SSID on every access point?

We can call this my poor man’s WiFi steering. The access point in the living room does a good job reaching the entire house, but there are some important devices that are close to the network cupboard.

There is a television in the room opposite the network cupboard, and it is connected to chicanery5, and the television in the living room is also connected to chicanery5.

Why is the TV in the living room not connected to the access point in the living room? It is close enough to the cupboard to get a good connection, so there is no reason to waste 20 or 30 megabit on the house’s primary access point.

These devices are still connected to the access point in the cupboard today.

What have I changed?

I wound up putting the Linksys WRT3200ACM in the cupboard as our new Internet gateway and our upgraded chicanery access point. I did some tests, and this router can easily manage routing and NAT at 920 megabits per second. That’s only about 20 megabits shy of the maximum speed of gigabit Ethernet, so we won’t have to worry about swapping routers if we upgrade the speed of our FiOS service.
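For anyone wondering where a ceiling of roughly 940 megabits comes from, here is a rough sketch of the arithmetic. It assumes a standard 1500-byte MTU, IPv4, and TCP with the timestamp option enabled. Other setups will land on a slightly different number.

```shell
#!/bin/sh
# Rough ceiling for TCP throughput over gigabit Ethernet (assumptions above).
mtu=1500
wire_overhead=38    # preamble 8 + Ethernet header 14 + FCS 4 + interframe gap 12
ip_tcp_headers=52   # IPv4 20 + TCP 20 + TCP timestamp option 12

payload=$(( mtu - ip_tcp_headers ))      # useful bytes in each frame
on_wire=$(( mtu + wire_overhead ))       # bytes each frame occupies on the wire
goodput=$(( 1000 * payload / on_wire ))  # megabits per second, rounded down

echo "${goodput} megabits per second"
```

That comes out to 941 megabits per second, so a router moving 920 is leaving very little on the table.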

I put the TP-Link Archer A7 in the living room. Both routers are running the latest release of OpenWRT, and they have all their old SSID settings copied over.

I also added a new SSID called kerchow to every radio. I set up 802.11r fast transition on that SSID for nearly instantaneous roaming.

How do you enable 802.11r roaming in OpenWRT?

I gave every radio that I wanted participating in the roaming the same SSID and WPA2 key. I also had to choose a mobility domain for my roaming group. Then I had to:

  • Tick the checkbox on that interface to enable 802.11r
  • Enter my mobility domain
  • Choose FT over the Air for the FT protocol
  • Set DTIM Interval to 3 (supposedly helps iOS devices roam)

The first two are the only steps that are truly required. Lots of advice on the Internet recommended using FT over the Air, but I haven’t tested both options. They said to do it, it seems to work well, so I am sticking to it.

I have no iOS devices, so I have no idea about the DTIM Interval.
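The same checkboxes can be set from the command line with uci. This is a minimal sketch, assuming the usual OpenWRT wireless option names (`ieee80211r`, `mobility_domain`, `ft_over_ds`, and `dtim_period`). The `ft_uci_commands` helper is hypothetical, and the section name in the example is an assumption, so check `uci show wireless` for the real one on your router. The function only prints commands, so nothing changes until you feed them to uci yourself.

```shell
#!/bin/sh
# Hypothetical helper: print the uci commands that enable 802.11r
# on one wireless interface section.
ft_uci_commands() {
  section="$1"   # for example wireless.default_radio0 (assumed name)
  domain="$2"    # mobility domain, four hex digits
  cat <<EOF
set $section.ieee80211r='1'
set $section.mobility_domain='$domain'
set $section.ft_over_ds='0'
set $section.dtim_period='3'
EOF
}

# Example (not run automatically, OpenWRT only):
#   ft_uci_commands wireless.default_radio0 beef | uci batch
#   uci commit wireless && wifi reload
```

Setting `ft_over_ds` to 0 is what corresponds to FT over the Air, and `dtim_period` is the DTIM Interval field.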

How do I know if 802.11r is working?

I am not entirely sure that I have tested things correctly, but I am mostly confident that I’ve tested well.

I used ssh to connect to both routers and ran logread -f to follow the logs to see when machines were connecting to WiFi. I had the OpenWRT LuCI GUI open in two browser tabs, and I would go in and click the disconnect button for my laptop.

I would immediately see output in one of the logs saying the laptop reconnected. I had a ping running on the laptop the entire time, and I never missed a ping. Sometimes one or two responses would bump up to 200 or 300 milliseconds, but they always made it.

This seems like a success to me!

How do you know if your 802.11r setup is working well?!

I had a glitch. My Windows 11 laptop very much preferred 2.4 GHz connections. Once it picked a 2.4 GHz radio, it would always connect to another 2.4 GHz radio after being disconnected. I had to turn WiFi off and on again on the laptop to get it to connect to 5.8 GHz again.

This is a bummer, because the 5.8 GHz radios are about four times faster. This happens because 802.11r only makes the handoff faster. The client still decides where to roam, and it seems to care about signal strength, not bandwidth. A 2.4 GHz signal has a much easier time penetrating walls than a 5.8 GHz signal, so the slower radio often has a better radio signal even if throughput will be lower.

I solved this problem in a simple, ham-fisted, but very effective way. I now have three SSIDs and three different mobility domains.

I have kerchow with a mobility domain of beef on every single radio. I have kerchow2.4 with a mobility domain of b0b0 on the 2.4 GHz radios, and I have kerchow5 with a mobility domain of b00b on the 5.8 GHz radios.
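A mobility domain is a four-digit hexadecimal ID, which is why names like beef and b00b work. This is a small sketch that checks candidates before you type them into every router. The `is_valid_mobility_domain` helper name is made up.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for exactly four hex digits.
is_valid_mobility_domain() {
  case "$1" in
    [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the three domains used for the kerchow SSIDs.
for domain in beef b0b0 b00b; do
  if is_valid_mobility_domain "$domain"; then
    echo "$domain: ok"
  else
    echo "$domain: not four hex digits"
  fi
done
```

Something like kerchow would be rejected, since it is too long and most of its letters are not hex digits.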

I wish I had discovered both DAWN and usteer before setting this up.

I added a GL.iNet WiFi 6 router to the mix

I have a GL.iNet GL-AXT1800 travel router here, and I was curious if the 802.11r configuration would even work on this hardware. These particular routers have radio chips that aren’t yet supported by OpenWRT, so they are shipped with an oddball proprietary Qualcomm fork of OpenWRT.

Brian said his GL.iNet WiFi 6 router got pretty goofed up when he configured the WiFi through LuCI instead of through GL.iNet’s interface, but it did let him add additional SSIDs to each interface through the LuCI GUI. I was worried that other things like 802.11r wouldn’t work.

GL.iNet Mango and AXT1800 on my desk

I ran out of Farva-related restaurants, so I put the SSID of slammin on my GL-AXT1800. Then I added the three kerchow SSIDs with the correct mobility domains.

I don’t really have a good location for a third access point. I just found an open network jack near a power outlet that is about halfway between the other two access points. My laptop rarely decides to roam to this access point. I had to click disconnect in the OpenWRT GUI so many times, but it did eventually connect right up to the WiFi 6 router.

As far as I can tell, 802.11r fast transition does work with the oddball OpenWRT firmware.

What about DAWN and usteer?

I am bummed out about this. A few weeks ago I learned the name of one of these packages, but I didn’t remember to make a note of it. I tried finding it several times over the last week, but I failed miserably.

Do you know when I managed to find one of them? Minutes after I set up nine different variations of kerchow on three different routers.

I don’t have a lot to say about either one at this point. They are both available as OpenWRT packages. They are tools that help you configure how devices roam between your WiFi access points.

Both seem fiddly and complicated, and success seems to involve a lot of trial and error. I’d be happy if either package could easily manage to accomplish two things.

I’d like to be able to eliminate two of the three kerchow variations. It sounds like configuring one of these tools to make your 2.4 GHz radios less appealing only requires a few lines of configuration. That would be awesome.

Why am I waiting before trying DAWN or usteer?!

Both require replacing the default wpad-basic OpenWRT package with the more feature-complete version. I’m sure this will go smoothly, but I am just not ready to try something that might require that much effort!

If I were just monkeying around with a few test routers, this would be fine. That’s not what I am doing. These are the routers running my home network. I need them to work, and they are working right now.

NOTE: I might be confused about this. My fresh installs of OpenWRT 22.03 all have wpad-basic installed, and the description for this package claims it supports 802.11r. I am guessing that wpad-basic has replaced wpad-mini, which lacks this support. I am not certain if wpad-basic has the features required for DAWN or usteer, but I suspect that it will work fine!

Your setup is probably more complicated than mine!

I have learned that I can cover my entire house with acceptable WiFi speeds using a single access point. Sure, the speed drops to 20 or 30 megabits per second in the bathroom and garage, but that’s more than enough to watch YouTube videos. Anywhere with a desk can manage 200 megabits per second.

I don’t truly need multiple access points with fast roaming, but maybe you actually do! If you do, I imagine things get more complicated!

My access points have their radios cranked up to maximum output. If you’re trying to cover a slightly larger area with two access points, you probably don’t want to be blasting quite so hard. You want your devices to notice that one access point is weaker than the other so they can switch over.

I am also cheating because there is a gigabit Ethernet port at every desk in the house. I only need just enough WiFi reaching every comfy chair for my phone and laptop to at most play YouTube videos, and I need a couple of dozen megabits to get 4K video streaming to two televisions. Things get more complicated if you really need a solid 200 megabit wireless connection at specific workstations in the house.

Why does Pat keep using numbers like 200 megabit?! Can’t modern WiFi do gigabit?!

I’ve been doing iperf tests to various access points from all sorts of distances and different rooms all week. The fastest WiFi 6 router I have access to has managed to pull numbers really close to 700 megabit, but that only happens in the same room, and it only happens some of the time. I am way more likely to see 400 megabits per second while in the same room.

Once you start putting a couple of walls in between your devices, these numbers drop. It seems like 250 megabits per second is about what I can expect to see from about one room away from the access point. The signal either has to bounce around corners through doorways, or it has to go through the pair of walls that make up the hallway.

The fastest advertised WiFi speeds are only possible under ideal conditions.

Something that surprised me!

First of all, if you are buying a fresh set of OpenWRT routers for your project, you should probably be buying GL.iNet routers that ship from the factory with OpenWRT. At least that way you know what you’re going to get. One of my old D-Link tube routers isn’t running OpenWRT because I had no way to know which revision of the hardware I was going to get before it arrived.

Not only that, but many WiFi routers that can run OpenWRT have radio chipsets that don’t run as well with the open source drivers. My older revision D-Link router’s WiFi is about 50% faster than the newer revision with OpenWRT.

I was handed a box of free OpenWRT-compatible WiFi hardware. I already knew the hardware would work. If this didn’t all happen accidentally, I would have bought GL.iNet routers instead.

I guess I should get to the weird part. The Linksys WRT3200ACM can be set to 200 mW on 5.8 GHz and 1,000 mW on 2.4 GHz. The TP-Link Archer is the opposite! It can be set to 1,000 mW on 5.8 GHz and 200 mW on 2.4 GHz!

Does the OpenWRT driver really know how much power the radio is going to put out? Is it anywhere near correct? We have no idea. I never trust numbers like this.

Radio is weird. You have to quadruple the transmit power to double your range. I have no way to properly verify just how much power the Archer A7 is pumping out, but the WiFi analyzer seems to think the TP-Link 5.8 GHz signal is quite a bit stronger than the WRT3200ACM!
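To put rough numbers on the quadrupling rule: in free space, received power falls off with the square of distance, so range scales with the square root of transmit power. Here is a quick sketch, assuming an ideal inverse-square model with no walls, antenna differences, or receiver differences. The function names are made up for illustration, and the 200 mW and 1,000 mW inputs are only the values the drivers claim.

```python
import math


def mw_to_dbm(milliwatts: float) -> float:
    """Convert transmit power in milliwatts to dBm."""
    return 10 * math.log10(milliwatts)


def relative_range(power_mw: float, baseline_mw: float) -> float:
    """Free-space range multiplier versus a baseline power.

    Assumes ideal inverse-square falloff, so range scales with the
    square root of the power ratio. Real houses are messier.
    """
    return math.sqrt(power_mw / baseline_mw)


if __name__ == "__main__":
    for mw in (200, 800, 1000):
        print(f"{mw:>5} mW = {mw_to_dbm(mw):.1f} dBm, "
              f"{relative_range(mw, 200):.2f}x the range of 200 mW")
```

Under that idealized model, going from 200 mW to 1,000 mW is five times the power but only a little more than double the range, assuming the reported numbers are even accurate.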

Conclusion

I don’t know if there is a conclusion. I set up 802.11r. I pointed a few of my WiFi devices at the 802.11r SSIDs. I think I am at the point where I wait and see how things work out.

Maybe the conclusion is that 802.11r was really easy to set up on my OpenWRT hardware. It isn’t much more work than checking a box, though it does get a bit more fiddly when you are trying to put the correct mobility domain on three different SSIDs on three different routers. I know I goofed at least one up on my first attempt!

OpenWRT, Two GL.iNet Routers, and Tailscale: Successes and Failures

This blog isn’t going to lead to some keen piece of insight or an interesting conclusion. I’ve been messing around with a pair of very different OpenWRT routers from GL.iNet: the GL-MT300N-V2 Mango and the GL-AXT1800 Slate AX.

I learned some things I didn’t know about last week. I learned just a little more about Tailscale. This is mostly just to write down what has worked and what hasn’t worked for me.

GL.iNet Mango and Slate AX 1800

I figured I should write some of these things down both for future me and for anyone else who might be trying to accomplish something similar. Mostly for future me.

How did we get here?

A friend of ours is moving to Ireland. He just wants to watch Jeopardy on his Hulu Live TV subscription. He’s been watching Brian and me monkey with Tailscale on our GL.iNet Mango routers for ages. I know our friend has had his Mango for a while, quite possibly for about as long as Brian and I have had our Mangos, but now he’s been trying to use it to connect his Apple TV to an exit node in the United States.

The story is more complicated than this, but the Mango didn’t work out for him, so he tried the much beefier GL.iNet Slate AX, and he couldn’t make that work. That’s how I wound up having a Slate AX here in my possession. He is currently using a Raspberry Pi 4 with Tailscale as a router to forward his Apple TV to America.

I am able to get my tiny Mango to route traffic of connected devices through one of my Tailscale exit nodes. I am not able to do the same with the much nicer Slate AX router.

Who is Pat blaming?!

I don’t think anyone needs to accept the blame here. I don’t believe that running Tailscale on an OpenWRT router is officially supported. Running Tailscale on the Mango is a bit of a hack because the Mango doesn’t have enough storage to even hold Tailscale.

Not only that, but as I learned long after I started trying to figure this out, the GL.iNet WiFi6 routers aren’t even built upon official upstream OpenWRT!

We can probably blame me for not doing science correctly. I am also only a sample size of one, so just because something is working well for me several times on the day I happen to test it does not always mean it works smoothly next week.

Keeping track of what is going on is problematic. The logs on the OpenWRT routers are ephemeral. Not only that, but I didn’t think I would need to troubleshoot anything on my Mango, so my startup scripts redirect all tailscaled output to /dev/null!

Pat is bad at science

I did a bad job writing things down. I didn’t know I was going to have to. All I really managed to do was write down some things after the fact on our Discord server. Those were usually the weirdest happenings.

We also aren’t helped by the fact that not everything happened at the same time. I updated Tailscale and OpenWRT on my Mango a few weeks ago, and everything there was fine. Then, when I tried booting up the router a few days later, things weren’t going as smoothly there as I thought. Then even more time passed before I got to fart around with the beefier WiFi 6 router.

Science is hard. I was kind of expecting to make things work, wipe everything clean, and make it work again. That last step would have given me all the documentation I needed, but I never made things work on the beefy router.

Connecting my GL.iNet Mango to a Tailscale exit node

First of all, I am cheating when I load Tailscale on my tiny Mango. It only has 16 megabytes of flash, and most of that is taken up by GL.iNet’s OpenWRT firmware. The official Tailscale binaries are bigger than that, so I wound up installing them on a USB flash drive.

It was a rather manual process, and I documented it as best as I could. I had to make a few tweaks in the OpenWRT LuCI GUI to get exit node traffic routing correctly, but once I did, I thought I had it working pretty well. I powered the router off and on several times, watched it succeed every time, then I packed the Mango away in my laptop bag to use in an emergency.

The Mango is pretty slow. It can only encrypt traffic over my Tailscale link at about 4 megabits per second. That’s way slower than Wireguard in the kernel on this machine, and probably only just barely enough for video streaming. I am guessing Netflix would bump you down to standard definition via a Tailscale exit node on the Mango.

A few weeks later my friend was having trouble, so I pulled my Mango out of the bag and plugged it in. My Mango didn’t want to connect to my Tailnet unless I killed and restarted tailscaled. Adding a long sleep to my Tailscale script seems to help with this, but it isn’t perfect.

I should also note that I only managed to get the Mango to route traffic through my exit node if the Mango is using Ethernet for its WAN connection. If I set up the Mango to use WiFi as its WAN, it won’t route traffic via the exit node.

I didn’t notice this right away, and I haven’t investigated what is going on here. This is only a problem when trying to route traffic through an exit node. The Mango works fine for me with WiFi WAN as long as it is just a regular node or subnet router.

NOTE: On the Mango, I have to create an interface in LuCI for the tailscale0 interface and make sure it is attached to the WAN firewall rules. This coaxes OpenWRT’s firewall scripting to apply the correct iptables rules for packets to flow.

The GL.iNet Slate AX is built on Qualcomm’s OpenWRT fork

Everything that went wrong for me when attempting to route the GL-AXT1800 through a Tailscale exit node was completely different than on the Mango. I assumed something was just a little different in the newer release of OpenWRT that this router was built on, so I investigated the idea of installing a different version.

I couldn’t downgrade the GL-AXT1800 to match the Mango. This made sense to me. Why would GL.iNet start crafting their build for a new piece of hardware with an older version of OpenWRT?

I also noticed that I couldn’t upgrade the Mango to match the Slate. I knew there were official OpenWRT builds for the Mango, so I checked the OpenWRT site for firmware for the Slate, but I couldn’t find it. There wasn’t any open-source firmware for any of GL.iNet’s WiFi6 devices.

I thought I read that there wasn’t yet any OpenWRT support for WiFi6 devices

At least I thought I read this, but GL.iNet sells a few OpenWRT routers with WiFi6 chips. How can that be?! We just figured it out, and GL.iNet doesn’t hide what they’re doing. They spell it out right in the product description.

Qualcomm fork of OpenWRT

OpenWRT doesn’t yet have much support for WiFi6 devices, but Qualcomm has a fork of OpenWRT that supports their latest WiFi6 chipsets.

I haven’t decided precisely how I feel about this, but it is definitely making my troubleshooting more difficult. That is assuming you’re willing to call my ham-fisted attempts at messing about here troubleshooting!

I understand that I don’t truly understand how Tailscale works

I understand well enough how packets get from a Tailscale device at my house to a Tailscale device at another location. What I don’t understand is how packets on my Linux machines find their way into the Tailscale process, but I do understand that Tailscale has its own little IP stack hiding in there.

There are no entries in my routing tables listing any of my Tailscale addresses. They should all be matched by my default gateway, yet they manage to get snagged by Tailscale and routed appropriately.

What works for me with Tailscale on the GL.iNet GL-AXT1800?

I installed the Tailscale OpenWRT package and immediately noticed that it is built with an old enough version of Tailscale that it doesn’t support exit nodes. I cheated and replaced the package’s binaries with the latest Tailscale binaries. It fires right up, connects to my Tailnet, and I can connect to things.

I figured cheating was the right thing to do. It seemed smart to let the OpenWRT Tailscale package install startup scripts and maybe even LuCI GUI configuration bits and pieces for me, and I could just replace the outdated binaries.

What happens when I try to route traffic through an exit node?

When I run the command to route traffic through an exit node, things get weird.

As soon as the exit node is enabled, I can no longer reach the exit node’s IP address, but I can reach other nodes just fine. At least I thought I could reach other nodes just fine.

One time I was accidentally pinging the wrong node in the background while enabling the exit node. I got really excited when it continued to respond, but then I noticed that the round-trip time increased quite a bit. A few seconds later I noticed that I was pinging the wrong address.

You can pass Tailscale the --netfilter-mode=off option to prevent Tailscale from creating any firewall rules. This gave me the same results.

What is the solution?

Our friend’s solution is Tailscale on a Raspberry Pi. That’s a fine solution, but I did something that made me feel a bit dirty. I set up a Wireguard server in a Docker container.

I followed somebody’s guide. I don’t think it was this exact guide, but it was similar. I was disappointed when I saw that I had to hit my page-down key twelve times to get to the end of the documentation. This is an order of magnitude more work than setting up Tailscale and a handful of usable exit nodes.

Not only was it long, but the documentation didn’t work for me. I had to install the wireguard-dkms package on the Docker machine. This makes perfect sense, but it took longer than I’d like to admit for me to figure out that I needed Wireguard support in my host kernel.

The good news is that GL.iNet’s awesome Domino interface makes it extremely easy to connect to a Wireguard or OpenVPN server.

The Domino GUI on the Slate AX is fancier than the tiny Mango!

These are very different travel routers. The Mango cost me $20 two years ago, only has 2.4 GHz WiFi, only a pair of 10/100 Ethernet ports, and it happily boots up when plugged into a 0.5 amp USB hub.

The Slate AX has a beefy heat sink with a little fan. It comes with a 5-volt 4-amp power supply. It has near bleeding-edge WiFi speeds. It is a dense beast of a little machine, and it costs nearly six times more than the Mango. I am not surprised that there are significantly more options in the stock Domino interface.

Their Domino GUI is a really nice feature on both routers. Either will let you do things like connect to WiFi or tether your phone to use as your WAN. You could do this with OpenWRT and LuCI, but you’d have to click dozens of buttons and change so many settings, and you better not mess any of it up!

The Domino GUI lets you do this with just a few clicks. I’m not excited about having this at home, but it is extremely handy on a travel router.

The Slate AX has options to allow you to use Ethernet, WiFi, and a tethered cell phone as a weighted multiport WAN. I haven’t had a chance to test this, but I am impressed that the option is available, and it looks easy to configure. You can do this sort of thing with stock OpenWRT, but you will have to work very hard to do it!

Conclusion

It is a bummer that we couldn’t get a GL-AXT1800 routing through a Tailscale exit node, but I am mostly only bummed out about it because we couldn’t get our friend watching Jeopardy through an OpenWRT router with Tailscale. The things that we are currently able to do with Tailscale on an OpenWRT router are still quite impressive to me.

When I bought the Mango two years ago, loading Tailscale on it was an afterthought, and exit nodes didn’t even exist yet. I just thought it was neat that I could leave my Mango behind and still be able to ssh to it. That option alone has a ton of value to me, and there are dozens of times when something like this would have come in handy over the last twenty years.

Two years later, and my Mango can route traffic through an exit node or route outside traffic to its own local subnet. I am hopeful that things will improve over time from every angle. GL.iNet seems to really want to get the IPQ6000 support ported to upstream OpenWRT. OpenWRT should eventually have a more recent version of Tailscale in their package repositories. And of course Tailscale is constantly improving.

This conclusion has gotten away from me. What I am trying to say is that it sure seems like everyone involved is probably doing a good job.

Is It Time For You to Set Up Tailscale ACLs?

If you’re a lone Tailscale user like me, there’s a good chance that you have no pressing need to set up Tailscale’s access control lists (ACLs). Until quite recently, I didn’t feel there was much reason to lock anything down.

Pretty much every computer I own has been running Tailscale for more than a year now. They could all ping each other. In fact, most of them are on the same LAN, and they could ping each other before I had Tailscale. Tailscale already locked them down a bit more thoroughly for me. Why lock them down any more?

Then I started using Tailscale SSH

As soon as I started enabling Tailscale SSH, I needed to set up some access controls. I wanted to emulate my previous setup.

My desktop and two laptops had their own SSH private keys, and their matching public keys were distributed to all my other machines. That meant these three computers could connect to any computer I own.

"ssh": [
    // I don't actually use this rule anymore!
    {
        "action": "accept",
        "src":    ["tag:workstation"],
        "dst":    ["tag:server", "tag:workstation"],
        "users":  ["autogroup:nonroot", "root"],
    },
],

I gave those three devices a tag of workstation, and I stuck a server tag on everything else. Then I set up an ssh rule in Tailscale to allow any workstation to ssh into any server or any other workstation.

So far, so good. This configuration does happen on Tailscale’s Access Controls tab, but it isn’t in the acls section of the file. At this point, my Tailnet was still wide open.

I got worried when I added my public web server to my Tailnet

I have a tiny Digital Ocean droplet running nginx hosting patshead.com, briancmoses.com, and butterwhat.com. I always said I should install Tailscale out there, but my web server droplet has been running an outdated operating system for a while, so I knew I would be creating a fresh VPS at some point.

I finally did that. I spun up one of the new $4 per month droplets, copied my nginx config over, and installed Tailscale. I am super excited about this because it means I don’t even have to have an ssh port open to the Internet on my web server.

However, this means that a scary server that I don’t personally own that is sitting out there listening for connections on the Internet is connected directly to my Tailnet. Yikes!

Tagging all your machines for use in ACLs is hard!

It isn’t hard because you have to click on every machine to add tags. It is challenging because choosing names for your tags is easy to goof up!

My original decision that a workstation would connect to a server and never the other way around was too simple. It wasn’t the right way for me to break things down, and as I started adding more tags, I wasn’t able to easily set things up the way I wanted.

I’ve been doing my best to make sure my Tailscale nodes don’t have any services open on their physical network adapters. My workstations are mostly locked down well, and I moved things like my Octoprint virtual machine behind the NAT interface of KVM instead of being bridged to my LAN.

Even so, I have two servers at home that need to be accessible from outside my Tailnet. My NAS shares video to my Fire TV devices just in case I need to watch Manimal, and I have lots of unsafe devices around the house that need to connect to my Home Assistant server.

This seemed easy. I immediately tagged my NAS, my Home Assistant server, and my public web server with a tag of dmz.

What was the problem with this?

I want my workstations to be able to see everything. I want my servers to be able to communicate with each other, but I don’t want my servers in the dmz to be able to connect to my internal servers or workstations.

This all seemed simple and smart until I realized that everything in my dmz already had a server tag. I also very quickly realized that my Home Assistant server listening to my LAN is much less threatening than my web server listening to the public Internet. One of those should be on an even more restricted tag!

Where did I actually land?

I have four main tags now:

  • workstation
  • server-ts
  • server-dmz
  • server-external

My personal workstations can connect to anything. Machines tagged server-ts can connect to machines tagged server-ts and server-dmz, while the server-dmz servers can only talk to other server-dmz machines.

  "acls": [
    {"action": "accept", "src": ["tag:workstation"],  "dst": ["*:*"]},
    {"action": "accept", "src": ["tag:server-ts"],    "dst": ["tag:server-ts:*", "tag:server-dmz:*", "autogroup:internet:*"]},
    {"action": "accept", "src": ["tag:server-dmz"],   "dst": ["tag:server-dmz:*"]},
    {"action": "accept", "src": ["tag:blogdev"],      "dst": ["tag:blogprod:22"]},
    {"action": "accept", "src": ["nas"],              "dst": ["seafile:*"]},
    {"action": "accept", "src": ["autogroup:shared"], "dst": ["tag:shared:22,80,443"]},
  ],

These are all my ACLs as of writing this. There are a couple of more specific rules there that I didn’t talk about yet.

There’s a rule there that allows one of my virtual machines here at home to publish content to my public web server.

My NAS is in the dmz, so I had to give it its own rule to allow it to connect to my Seafile Pi. My NAS syncs extra copies of some of my data for use as a local backup!
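The tag rules boil down to a small lookup table. Here is a rough Python sketch of that logic. It is not Tailscale's actual implementation: it only models the tag-to-tag rules, it ignores ports, hosts, and autogroups, and the `RULES` table and `can_connect` function are names invented for this illustration.

```python
# A toy model of the tag-based rules. This is NOT how Tailscale
# evaluates policy internally; it ignores ports, hosts, and autogroups.
RULES = {
    "workstation": {"*"},
    "server-ts": {"server-ts", "server-dmz"},
    "server-dmz": {"server-dmz"},
}


def can_connect(src_tag: str, dst_tag: str) -> bool:
    """Return True if any accept rule lets src_tag reach dst_tag.

    Tailscale ACLs are deny-by-default, so a source with no matching
    rule cannot reach anything.
    """
    allowed = RULES.get(src_tag, set())
    return "*" in allowed or dst_tag in allowed


if __name__ == "__main__":
    tags = ["workstation", "server-ts", "server-dmz", "server-external"]
    for src in tags:
        reachable = [dst for dst in tags if can_connect(src, dst)]
        print(f"{src:>15} -> {', '.join(reachable) or 'nothing'}")
```

Printing the whole matrix like this is a cheap way to spot a tag that can reach more, or less, than intended before pasting rules into the Access Controls tab.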

I goobered up my exit nodes!

I am more than a little embarrassed by how many times I had to go back and forth between desks to figure out why the exit node on my GL.iNet Mango stopped passing traffic to the Internet.

The Mango had a tag that allowed it to access the exit node. If I took that tag away, it couldn’t ping the exit node. I’d add it back, and while it could ping the exit node, it couldn’t route any farther. If I dropped the original ACL that leaves everything wide open, the Mango could route traffic just fine. What was going wrong?!

It seems like I had this idea in my head that Tailscale’s ACLs only applied to Tailscale nodes and addresses. I didn’t immediately realize that I had to explicitly allow access to the Internet or even other subnets I might be routing!

{"action": "accept", "src": ["tag:server-ts"], "dst": ["tag:server-ts:*", "tag:server-dmz:*", "autogroup:internet:*"]},

I just had to add autogroup:internet to the allowed destinations for the appropriate tag. Duh!
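One way to picture the problem: traffic headed through an exit node has a destination outside the Tailscale address range, so a rule that only names tagged nodes never matches it. The sketch below is only an illustration. It assumes the 100.64.0.0/10 IPv4 range that Tailscale assigns node addresses from, the function name is made up, and it ignores IPv6 and advertised subnet routes.

```python
import ipaddress

# Tailscale assigns node IPv4 addresses out of the CGNAT range.
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")


def needs_internet_rule(dst: str) -> bool:
    """True if dst is outside the tailnet range.

    Such traffic is only allowed when the ACLs include
    autogroup:internet (or a matching subnet) as a destination.
    Simplified: ignores IPv6 and subnet routes.
    """
    return ipaddress.ip_address(dst) not in TAILNET_V4


if __name__ == "__main__":
    for dst in ("100.101.102.103", "1.1.1.1", "192.168.1.10"):
        kind = "internet/subnet rule" if needs_internet_rule(dst) else "tailnet rule"
        print(f"{dst:>16}: needs a {kind}")
```

That matches the symptom: pinging the exit node's own Tailscale address was covered by the tag rule, while everything beyond it was not.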

Don’t think too hard before implementing your ACLs

This is especially true if you are down here at my scale with a couple dozen nodes and only a few shared nodes. Just drop some tags on things and set up some access controls that allow nodes access to what they need.

You probably won’t set things up optimally. I know I didn’t on my first try, and I am already seeing things I’d like to do differently. Even if my initial attempt left things more open than I might like, it was still a huge win just because it blocked my public web server from connecting to the rest of my Tailnet. Any other improvements are minor by comparison.

If money and other people’s livelihoods are on the line, maybe you should spend some time having meetings and planning things out on whiteboards. It only takes a few seconds to switch back to the single default ACL that leaves your Tailnet wide open, so if you do find a problem, you can at least revert your changes quickly and easily!

Tailscale SSH is affected by Tailscale network ACLs!

This seems obvious, but I wasn’t positive that this would be the case! Tailscale seems to always make the best possible default choices, so I wondered whether Tailscale’s own SSH server might ignore the network ACLs whenever a connection is allowed in the ssh section of the access control configuration.

This does not seem to be the case. If you want to use Tailscale SSH, then your networking ACLs have to allow it. To be clear, I think this was the correct thing for Tailscale to do.

Shared nodes are allowed access by default

I wasn’t sure about this. The default single ACL just has one line that allows everyone access to everything. The first thing you do when designing your own ACLs is delete that entry. At that point nobody has access to anything, so I assumed I would need to add a line similar to this:

{"action": "accept", "src": ["autogroup:shared"], "dst": ["tag:shared:*"]},

We tested this. This wasn’t necessary, but I figured it would be a good idea to lock down my shared nodes just a bit, so I wound up using this ACL:

{"action": "accept", "src": ["autogroup:shared"], "dst": ["tag:shared:22,80,443"]},

It is a bit lazy. Three people need access to ports 80 or 443 on the Seafile server, and Brian needs SSH access to rsync files to his blog. It gets the job done.

I did test out removing ports 80 and 443 from this ACL, and I watched the connections on my Seafile server. All the Tailscale IP addresses that I didn’t own dropped off the netstat list, and when I put those ports back in the ACLs, everyone connected back up immediately.
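The destination strings in these rules pack a target and a port list together. As a rough illustration of how `tag:shared:22,80,443` breaks down, here is a small parser. It is a sketch based only on the format as it appears in the rules here, it handles just `*`, comma-separated ports, and simple ranges, and both function names are hypothetical.

```python
from typing import Optional, Set, Tuple


def parse_dst(dst: str) -> Tuple[str, Optional[Set[int]]]:
    """Split 'target:ports' on the last colon.

    Returns (target, ports); ports is None when the port spec is '*'.
    """
    target, _, spec = dst.rpartition(":")
    if spec == "*":
        return target, None
    ports: Set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            ports.update(range(int(low), int(high) + 1))
        else:
            ports.add(int(part))
    return target, ports


def port_allowed(dst: str, port: int) -> bool:
    """Check whether a destination string permits the given port."""
    _, ports = parse_dst(dst)
    return ports is None or port in ports


if __name__ == "__main__":
    for rule in ("tag:shared:22,80,443", "tag:shared:22"):
        print(rule, "allows 443:", port_allowed(rule, 443))
```

Dropping 80 and 443 from the rule leaves only SSH reachable, which is consistent with the Seafile connections falling off the netstat list.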

I am sure the documentation explains this, but I doubt I am the only one who likes to see things work in practice just to make sure!

Forgetting you have Tailscale ACLs configured makes troubleshooting a real challenge!

This happened to me yesterday! A friend sent me a GL.iNet GL-AXT1800 router to help him get his identical router to pass local traffic through a Tailscale exit node.

I installed the ancient OpenWRT Tailscale package, replaced the binaries with the official Tailscale static ARM binaries, ran tailscale up, and it gave me the URL to open to authenticate this new node. Everything went smoothly, except I couldn’t ping any of my other Tailscale devices!

Derp. Since I didn’t remember to put any tags on my new Tailscale device, it wasn’t matching any of my Tailscale ACLs, so it couldn’t actually connect to anything!

This was a simple mistake, but I walked back and forth between two desks and rebooted the GL.iNet router at least twice before remembering that I even configured any Tailscale ACLs in the first place!

Conclusion

If you’re just a home gamer like I am, you probably don’t need to worry about Tailscale ACLs. If you have one or more nodes on your Tailnet that have services running on the open Internet, you may want to lock things down a bit. It would be a real bummer if someone managed to crack open your public web server, because they might be able to ride Tailscale past your other routers and firewalls.

One of the awesome things about Tailscale is that I have absolutely no idea what you’re doing with it. You might just be one person sharing a Minecraft server with some friends. You might be sharing a couple of servers with business partners like I am. You might even be managing a massive and complicated Tailnet at a giant corporation.

You and your Minecraft server probably don’t need to worry about ACLs, but if you are in a position where you should be thinking about tightening up your access controls, I hope my thoughts have been helpful!

The OpenWRT Routers from GL.iNet Are Even Cooler Than I Thought!

I have had my little GL.iNet Mango router for about two years now. It was an impulse buy. It was on sale for less than $20 on Amazon, and I just couldn’t pass it up. It was exciting for me to learn that there is a manufacturer that ships OpenWRT on their routers, and I really wanted to mess around with one.

I rarely use my Mango router. It lives in my laptop bag. If I ever need a WiFi extender, it is there. If my home router fails, it would be my emergency spare. My Mango is a Tailscale subnet router, so if I am ever away from home and need to reach an odd device via my Tailscale network, then I can. It is pretty awesome!

I bought a new laptop a few months ago, and I have been tidying up my various laptop bags. I realized that I hadn’t updated OpenWRT on my Mango in two years, and my Tailscale install is just as old. It seems like it is time to update things!

I had some problems along the way, and I managed to lose all access to my Mango. It could hand out DHCP addresses. It could route traffic. It wouldn’t respond to pings, HTTP, or SSH.

I am really excited that I had problems, because I learned that the GL.iNet routers are even more awesome than I thought!

NOTE: I didn’t really have any problems with my Mango! Something weird was happening on my Windows 11 laptop.

There’s more available than just the stock firmware!

When I could no longer ping my Mango router, I first tried resetting to factory defaults. That didn’t work. Then I tried re-flashing the latest firmware, and it still didn’t work.

Then I noticed that GL.iNet supplies several different firmware images for their routers. There’s the stock image with their own GUI called Domino. There’s another that skips Domino and just has the official OpenWRT LuCI GUI. Then there’s a third firmware that routes all your traffic through the Tor network. How cool is that?!

I flashed the LuCI-only firmware and my Mango started working correctly. All the official GL.iNet firmware images for the Mango are based on OpenWRT 19.07.08. That’s not too bad. The OpenWRT folks are still updating version 19, but the first release of version 21 happened last year.

You can definitely download a version 21 build or a release candidate of version 22 for the Mango directly from the OpenWRT site.

Should I just run LuCI, or do I want the Domino GUI?

I love LuCI. If I were permanently installing a GL.iNet router in my home, I would most definitely skip GL.iNet’s Domino GUI. I would most likely be installing that release candidate of OpenWRT 22.03 just to avoid a major upgrade in the near future.

My Mango doesn’t have a permanent home. It is a tool that lives in my laptop bag. There’s a very good chance that I might let a friend borrow it. The Domino GUI is WAY more friend-friendly than LuCI!

The Domino GUI also makes some difficult things as easy as clicking a button.

The Domino GUI has a simple dialog that lets you use another WiFi network as your WAN port. It has an equally simple dialog to configure the Mango as a WiFi repeater.

Either of those configurations would require dozens of clicks in OpenWRT’s LuCI GUI, and Domino even lets you tie those configuration settings to a physical switch on the router.

I definitely want the Domino GUI on my toolkit’s router.

Should I have bought a higher-end GL.iNet router?

Two really cool things came into my life at about the same time two years ago: the GL.iNet Mango and Tailscale. The Mango only has three or four megabytes of free disk space, and the Tailscale static binaries add up to more than 20 megabytes. One cool thing doesn’t fit on the other cool thing!

Two years ago, the only way to get Tailscale onto an OpenWRT router was to install it manually. Now you can just install it with the OpenWRT package manager, and that is awesome!

I cheated and put the Tailscale binary on a USB flash drive when I set things up two years ago. It’d be nice to not have to do this, but in a way, I am pleased with this configuration.

What if I loan my Mango to a friend? What if they’re less than trustworthy? I can just pop the USB drive out! All the Tailscale configuration and keys live on that drive. If they don’t have that, they can’t access my Tailnet.

I am pretty sure the OpenWRT Tailscale package will work on the Mango

The Tailscale package is only around 2.8 megabytes. That would nearly fit on a fresh Mango router with the stock GL.iNet firmware!

The GL.iNet firmware is running OpenWRT 19, and there don’t seem to be any Tailscale packages in the OpenWRT 19 repositories. Even if you could squeeze the package in, you’re going to have trouble getting an official OpenWRT package.

I did notice that there’s around 7 megabytes of free space after installing the clean OpenWRT 19 image from GL.iNet. That’s plenty of room to install the Tailscale package!

You should be in good shape if you download the latest version of OpenWRT for your Mango straight from the OpenWRT site. It sure looks as though you’ll have enough room, and the packages will be in the repository for you to install right away.
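On a stock OpenWRT image, the install could look something like the sketch below. It checks the free space on the overlay before calling the package manager. The function names and the free-space threshold are hypothetical, and the package name `tailscale` is an assumption; `opkg list | grep tailscale` will show what your release actually offers.

```shell
#!/bin/sh
# Print the free space, in KiB, on the filesystem holding the given path.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical guard: only install when the overlay has room to spare.
# $1 = KiB you want free before installing (a guess; pick your own margin)
install_tailscale_if_room() {
    avail=$(free_kib /overlay)
    if [ "$avail" -ge "$1" ]; then
        # Package name assumed to be "tailscale"; verify with opkg list.
        opkg update && opkg install tailscale
    else
        echo "only ${avail} KiB free on /overlay, skipping install" >&2
        return 1
    fi
}

# Example (not run here): require about 4 MiB free before installing.
# install_tailscale_if_room 4096
```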

I didn’t want to give up the Domino GUI. Being able to connect to the router and click a few buttons to switch modes between routing, repeating, and other things is ridiculously handy.

How do I run Tailscale on the Mango if the Mango doesn’t have enough storage?

I have been arguing with myself for five minutes about how much information to include in this section. A step-by-step guide would make this blog way too long, and a 10,000’ overview seems too broad. Let’s see if I can land in a good spot near the middle.

I mostly repeated what I did to install Tailscale on my Mango in 2020, but I made room on the diminutive SanDisk flash drive for Ventoy. I also cleaned things up so I can modify the Tailscale startup job without logging in to the Mango.

Ventoy is occupying the first two partitions on my USB drive, so I added a small ext3 filesystem as the third partition. This has a copy of my tsup.sh script, the state file for Tailscale, and it is where I unpacked the Tailscale mipsle package. For the convenience of future upgrades, I created a symlink pointing to the current version of Tailscale. This is the root directory of the ext3 filesystem:

pat@zaphod:~$ ls -l /mnt/sda3
total 17744
drwx------ 2 root root    16384 Jul 24 16:49 lost+found
lrwxrwxrwx 1 root root       25 Sep 18 06:52 tailscale -> tailscale_1.31.71_mipsle/
drwxr-xr-x 3 root root     4096 Jul 18 12:58 tailscale_1.28.0_mipsle
drwxr-xr-x 3 root root     4096 Sep 15 22:54 tailscale_1.31.71_mipsle
-rw------- 1 root root     1418 Sep 18 07:05 tailscale.state
-rwxr-xr-x 1 root root      676 Sep 18 07:12 tsup.sh
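The symlink is what keeps upgrades cheap: unpack the new release next to the old one, then repoint the link. Here is a minimal sketch of that step. The helper name `switch_tailscale_version` is hypothetical and not part of the setup described in this post.

```shell
#!/bin/sh
# Hypothetical helper: repoint the "tailscale" symlink at an unpacked release.
# $1 = directory holding the releases (for example /mnt/sda3)
# $2 = release directory name (for example tailscale_1.31.71_mipsle)
switch_tailscale_version() {
    base="$1"
    release="$2"
    # Refuse to point the link at a directory that is not there.
    [ -d "$base/$release" ] || { echo "missing: $base/$release" >&2; return 1; }
    # -s symbolic, -f replace the existing link,
    # -n treat the old link as a file instead of descending into its target.
    ln -sfn "$release" "$base/tailscale"
}

# Example (not run here):
# switch_tailscale_version /mnt/sda3 tailscale_1.31.71_mipsle
```

Rolling back is the same call with the older directory name, which is a good reason to leave the previous release on the drive.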

This is my tsup.sh:

#! /bin/sh

# Not sure if the sleep is necessary!
sleep 10

/mnt/sda3/tailscale/tailscaled -state /mnt/sda3/tailscale.state > /dev/null 2>&1 &
 
# Make sure my bootable USB partition is unmounted cleanly
/bin/umount /mnt/sda2
/bin/umount /mnt/Ventoy

To make this work, I used the advanced settings tab to add this one line to the end of OpenWRT’s startup script:

(sleep 15; /mnt/sda3/tsup.sh) &

This could all be better, but it works. I did have to sign in once via ssh to run tailscaled and tailscale up manually so I could authorize the Mango on my Tailnet.

The various sleep commands sprinkled around are just laziness. You can probably guess why each of them exists.
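A less lazy alternative would be to poll for the thing you are waiting on. The sketch below assumes the sleeps are there to give the USB drive time to mount, which is only a guess. `wait_for_path` is a hypothetical helper, not something shipped with OpenWRT.

```shell
#!/bin/sh
# Hypothetical helper: poll until a path exists instead of sleeping a fixed time.
# $1 = path to wait for
# $2 = maximum number of seconds to wait
wait_for_path() {
    waited=0
    while [ ! -e "$1" ]; do
        # Give up once the time budget is spent.
        [ "$waited" -ge "$2" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example (not run here): start Tailscale as soon as the flash drive is mounted.
# (wait_for_path /mnt/sda3/tsup.sh 60 && /mnt/sda3/tsup.sh) &
```

This returns immediately when the drive is already mounted and gives up after the timeout when the drive has been pulled, so a Mango without its flash drive doesn't wait forever.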

I purposely chose to store the tailscale.state file on the flash drive. If I loan out my Mango to a friend, I might not want them connecting to my Tailscale network. If I pop the flash drive out, they won’t have any of the data needed to make a connection.

My GL.iNet Mango can’t use Tailscale as an exit node

And I am not sure exactly why! Tailscale routes packets without issue. I have this node configured as a Tailscale subnet router for its own local subnet. That seems to work correctly, so it is able to route packets from WiFi clients to nodes on my Tailnet.

I was hoping to be able to have the Mango route traffic through an exit node. That way, a FireTV or AppleTV or something similar could watch American Netflix from Ireland, but it isn’t cooperating with me.

At first I tried tailscale up --exit-node=seafile, but that immediately cut off all access to local clients connected to the Mango. I was able to ssh in via Tailscale and verify that the Mango was using the exit node.

I updated that command to tailscale up --exit-node=seafile --exit-node-allow-lan-access, and my Mango’s local devices were able to talk to the Mango again, but they weren’t able to pass traffic any farther than the Mango.

I am close, but not quite close enough!

UPDATE: I got my Mango routing properly through an exit node just a few hours after publishing this blog! This should most likely get a proper write-up, but here’s the short answer. I added the tailscale0 interface as an unmanaged interface in the LuCI interface and made sure it was attached to the WAN firewall group. I am guessing this let the OpenWRT NAT rules do their thing!
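For anyone who would rather not click through LuCI, the same change can probably be expressed with uci commands. This is only a sketch under several assumptions: the interface section name `tailscale` is made up, the WAN zone is assumed to be the second firewall zone (`@zone[1]`) as in a default OpenWRT config, and OpenWRT 19 is assumed to use `ifname` where newer releases use `device`. Check `uci show firewall` before trusting the zone index.

```shell
#!/bin/sh
# Sketch only: roughly what the LuCI clicks would write, as uci commands.
# The section name "tailscale" and the zone index are assumptions.
attach_tailscale_to_wan() {
    uci set network.tailscale=interface
    uci set network.tailscale.proto='none'         # "unmanaged" in LuCI
    uci set network.tailscale.ifname='tailscale0'  # newer releases use "device"
    uci add_list "firewall.@zone[1].network=tailscale"
    uci commit network
    uci commit firewall
}

# Example (not run here), followed by a reload of both services:
# attach_tailscale_to_wan
# /etc/init.d/network reload
# /etc/init.d/firewall reload
```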

What else can I do with my 32 gigabyte Tailscale USB drive?!

When I tested the viability of running Tailscale on a USB flash drive, I used a drive I had on hand. It was an extremely large drive in the physical sense. Once I knew it was working, I bought the smallest SanDisk Cruzer Fit that I could find. It was 32 GB, which was nearly 32 GB more storage than I needed!

While I was redoing things this week, I decided that I should find a use for the rest of that space. I installed Ventoy and a whole mess of bootable disk images. Ventoy should let the drive boot on both UEFI and legacy BIOS systems. Ventoy’s installation script even had an option to leave some space on the end of the drive, so I added a little 512 megabyte ext3 partition for OpenWRT to use.

My little Ventoy drive has images for:

  • Memtest86
  • FreeDOS
  • Ubuntu 22.04 installer
  • Xubuntu 22.04 installer
  • Windows 10 installer
  • Windows 11 installer

None of this is terribly exciting. I only boot up a computer with a USB drive once every few years now, but I did have to make several USB boot drives over the last few months. I had to reinstall Windows 10 on a laptop with a dead NVMe. I had to install Xubuntu 22.04 on my desktop when I upgraded to an NVMe. I had to run Memtest86 when I bought new RAM a few weeks ago.

I wish I had thought to set this up sooner!

I should be carrying an identical bootable drive in my laptop bag, but I figure it can’t hurt to have spare boot images squirrelled away in my travel router’s USB port!

Conclusion

I think I made the correct choice by continuing to use the stock GL.iNet firmware on my Mango. If this were my permanent home router, it would be way more valuable having an extra 10 megabytes of flash for packages, but this isn’t my home firewall. This is a Swiss Army Knife that I keep in my laptop bag.

Being able to quickly configure the Mango to be a router using a wired connection, a router using WiFi, or a WiFi extender is so much more valuable in my laptop bag! Why can’t I do this easily with stock OpenWRT? Is there a package I don’t know about?!

How Much RAM Do You Need in 2022?


I probably wouldn’t have given this much thought if I didn’t have a stick of RAM fail on me last year. I don’t know that I can remember another time when a stick of RAM that passed a long Memtest86+ test failed on me, and this was a total failure. The machine locked up and wouldn’t boot until I found and removed the bad stick.

Four sticks of RAM, One is Dead!

I couldn’t figure out whether I could do an advance RMA of my memory, and Corsair wanted me to RMA all four sticks as a set. I didn’t want to deal with downtime, and I didn’t want to buy RAM to hold me over while I waited, so I figured I’d limp along with this single-channel 24 GB configuration until it caused problems.

Running while short 8 GB really didn’t cause problems. Everything that I do fit pretty well into 24 GB of RAM. Even so, I bought a faster pair of inexpensive 16 GB DDR4-3200 DIMMs a few weeks ago, so I am back at 32 GB, back to a dual-channel configuration, and my slightly overclocked Ryzen 1600’s RAM is running at 2933 instead of 2666.

Some benchmarks are quite a bit faster with dual-channel RAM, but I’m not noticing the extra 8 GB.

I’ve always bought as much RAM as possible

Within reason. There are always diminishing returns, but any extra RAM will be used for disk caching. For the last two or three decades, disks have been slow. Really slow. Especially when it comes to random access.

A 7200 RPM disk can perform between 100 and 200 random reads or writes per second. That was true for my 13 GB IBM Deskstar drives twenty years ago, and is true even for the latest 18 TB 7200 RPM drives today. The heads in a mechanical disk have to wait until the data they need passes underneath. Any given point on the disk only passes under the read head 120 times each second.
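The arithmetic behind that number is short enough to sketch. The function name is hypothetical, and the calculation only covers rotational delay. Head seek time comes on top of it, which is roughly how you land in the 100 to 200 operations per second range.

```shell
#!/bin/sh
# Back-of-the-envelope numbers for a spinning disk.
# $1 = spindle speed in RPM
rotational_stats() {
    awk -v rpm="$1" 'BEGIN {
        revs_per_sec = rpm / 60                  # full rotations per second
        avg_wait_ms  = 1000 / revs_per_sec / 2   # on average, half a rotation
        printf "%d rev/s, %.2f ms average rotational latency\n", revs_per_sec, avg_wait_ms
    }'
}

rotational_stats 7200
# prints: 120 rev/s, 4.17 ms average rotational latency
```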

In those days, extra RAM was the only thing hiding those slow seek times.

pat@zaphod:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi       9.0Gi       1.6Gi       917Mi        20Gi        20Gi
Swap:           23Gi        93Mi        23Gi

My memory usage today is definitely higher than it was a decade ago, but I have always bought enough RAM to make sure at least 50% would be used for disk cache.
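If you want to check your own machine against that 50% rule of thumb, something like this sketch would do it. It assumes the column layout of the procps `free` shown above, where buff/cache is the sixth field on the `Mem:` line. The function name is made up for illustration.

```shell
#!/bin/sh
# Read `free -m` style output on stdin and print the share of total RAM
# currently held as buff/cache, as a whole percentage (rounded down).
cache_percent() {
    awk '/^Mem:/ { printf "%d\n", $6 * 100 / $2 }'
}

# Example (not run here):
# free -m | cache_percent
```

The output above has 20 GiB of buff/cache out of 31 GiB, which is close to two thirds.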

My last two workstations have had 32 GB of RAM

I am doing my best to think back to all past desktop computers. The timeline is pretty fuzzy, but it seems like I approximately doubled the memory each time I have upgraded. I almost listed each machine off here, but that feels unnecessary.

My FX-8350 had 32 GB of RAM, which was double the RAM in the giant HP laptop that it replaced. I put 32 GB into the Ryzen 1600 when I built it in 2017, and I left it at 32 GB when I replaced the RAM last month.

What’s different today?

NVMe and SSD drives are really fast!

We’ve needed to cache our disks with RAM for decades because our disks had been stuck at 200 random I/O operations per second. My first SSD could manage more than 5,000 I/O operations per second, and my new NVMe can handle 500,000 operations per second.
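To put those three numbers side by side, here is a quick sketch of how long a batch of random I/O operations would take at each rate. The function name and the batch size of 100,000 operations are arbitrary choices for illustration.

```shell
#!/bin/sh
# How long would N random I/O operations take at a given rate?
# $1 = number of operations
# $2 = operations per second
seconds_for_ops() {
    awk -v n="$1" -v iops="$2" 'BEGIN { printf "%.1f\n", n / iops }'
}

seconds_for_ops 100000 200      # mechanical disk: prints 500.0
seconds_for_ops 100000 5000     # first SSD:       prints 20.0
seconds_for_ops 100000 500000   # NVMe:            prints 0.2
```

That is more than eight minutes on a mechanical disk, twenty seconds on an early SSD, and a fifth of a second on the NVMe.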

We don’t have to spackle over slow disks any longer. If I push my machine to the point where it only has a couple of gigabytes of free RAM to use as disk cache, it doesn’t matter. I won’t notice the difference.

While I am just sitting here writing this blog, my machine is using around nine gigabytes of memory for actual work. If I fire up a game, that will likely eat up another 10 or 12 gigabytes.

While I was limping along with only 24 GB in my machine, this was never a problem. Running a game might bring me down to just several gigabytes of RAM for disk cache, yet I didn’t notice any sort of slowdowns or stutters when I would switch back and forth between my game and productivity tasks.

My SATA SSD was fast enough!

I forgot to mention that the vast majority of the months where I was limping along with 24 GB of RAM happened before I upgraded to an NVMe. My 280 megabyte per second Crucial SSD that could only manage a few tens of thousands of I/O operations per second was plenty fast enough for me to never notice when I was down to just a couple of gigabytes of memory available for caching.

In the old days before solid-state drives, this would never have worked out. In the days when I only had eight gigabytes of RAM, my workstation would have felt like it was struggling if I only had a gigabyte or two of free RAM available for caching. If I didn’t have half my RAM available for cache, I would have been shopping for a memory upgrade!

The future we are living in is fantastic.

Your mileage may vary!

I was getting by just fine with 24 gigabytes, and I bet I could just barely squeak by with just 16 gigabytes of memory, but I wouldn’t want to bother trying. I definitely wouldn’t want to give up dual-channel RAM, but if it were possible to buy 12 gigabyte DIMMs, I might have enjoyed having a dual-channel 24-gigabyte setup!

I’m using a handful of gigabytes of memory for Firefox, Thunderbird, Discord, and various other programs. The stuff that is normally running eats up around 9 gigabytes of memory.

The heaviest things I run are Davinci Resolve and games, but never at the same time. I don’t have enough GPU memory for that.


In the old days, I would have at least one or two virtual machines running on my workstation. Today, I have a separate server in my office handling that job.

It used to be handy having my virtual machines on my laptop in the days when my only workstation was my laptop. It was awesome having everything with me at home, at the office, or on the road.

I get more value today having those virtual machines on a dedicated box. I don’t want Home Assistant or Octoprint to reboot just because my Nvidia driver goobers things up forcing me to reboot my desktop!

Besides which, it isn’t 2008 anymore. I don’t have to hope a coffee shop has WiFi. I don’t have to wiggle through an ssh tunnel to get to my data at home or in the office. I can share my phone’s 500-megabit Internet connection and connect to my machines around the world using Tailscale, and they’ll work just like they do when I’m at home.

You might need more memory for those tasks that I don’t have!

If you’re running a mess of virtual machines on your workstation, then you probably already know how much memory you need for those to comfortably fit. If the VM disk images live on an SSD or NVMe, maybe those machines don’t need as much RAM allocated as you think they do.

Those virtual machines are still computers, even if they are sharing processors and disks. Just like my desktop PC, our virtual machines don’t need to rely on cache memory nearly as heavily as they did before we had NVMe drives. The old rules of thumb from the days of slow mechanical disks just don’t apply anymore.

If you’re running make -j 32 on your 16-core Ryzen 5950X, you know you might need a lot of memory just to support all those compiler tasks, but it almost definitely isn’t a big deal if your whole source tree doesn’t stay cached all day. Your NVMe can touch hundreds of thousands of files every second without breaking a sweat!

Is swapping to an NVMe fast?!

I spent a week unscientifically messing with various swap and dirty page settings. I figured that Apple must be leaning on fast NVMe swap and paging to make their 8 GB M1 MacBook Air a usable machine. If they can do it, maybe I can force Linux to dip deeper into swap.

I was able to get about 5 or 6 GB onto my swap partition. When I did, things usually acted just fine. I couldn’t even tell you that my machine was swapping.

Every once in a while, though, things would get really goofy. The whole machine would just grind to a halt without much warning. I never timed it, but if I walked away, it would usually have worked itself out by the time I got back.

There’s probably a sweet spot somewhere between my current settings and the problematic settings that would work alright. Maybe the defaults would push me a few gigabytes into swap if I disabled half my RAM.
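If you want to repeat the experiment, it helps to watch how deep into swap the machine actually is. This sketch reads text in the format of /proc/meminfo and reports swap in use. The function name is hypothetical. `vm.swappiness` is mentioned only as one well-known kernel knob, since the exact settings that were changed aren't listed here.

```shell
#!/bin/sh
# Read /proc/meminfo style text on stdin and print swap in use, in MiB.
# The kernel reports SwapTotal and SwapFree in kB.
swap_used_mib() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free  = $2 }
         END { printf "%d\n", (total - free) / 1024 }'
}

# Example (not run here):
# swap_used_mib < /proc/meminfo
# sysctl vm.swappiness    # current value; higher makes the kernel swap sooner
```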

This was fun to experiment with a bit, but not worth spending a real amount of time working on.

Conclusion

It is better to err on the side of caution. If you need to round your memory requirements to the nearest pair of sticks of RAM, you should definitely round up instead of down. If you’re like me, and you think you can get by with 24 GB of RAM, then you had better buy 32 GB!

For decades I always made the same choice. If you asked me to choose between more memory or faster memory, I would always choose more memory. It wasn’t always a problem you could spend your way out of. Sometimes DIMMs with double the capacity were only available in slower speeds. Sometimes your chipset only supports faster RAM speeds with two DIMMs and not four.

My FX-8350 build in 2013 had 32 GB of RAM. My Ryzen 1600 build from 2017 has 32 GB of RAM. If I upgrade to a Ryzen 7800X, it will also have 32 GB of RAM. Before my FX-8350, every major computer upgrade I have gone through has at least doubled my RAM. This feels weird, but it is also amazing and awesome!