Trying Out Tailscale Funnel and Tailscale’s New Proxy


I am excited about the new Tailscale Funnel feature. I have been asking for an option like this ever since the very first time anyone at Tailscale asked me for my opinion. What the folks at Tailscale have done isn’t exactly what I asked for. Funnels are better than the feature I was envisioning, and they’re only worse in a few minor ways.

My wish was to be able to click a button in the Tailscale admin console to have an IPv4 address assigned to a machine on my Tailnet. The use case floating in my mind was that I could leave a clone of my Digital Ocean droplet running on my homelab server. If something went sideways with Digital Ocean, I could immediately point Cloudflare away from the old server and toward my new Tailscale IP address and have all our blogs back up and on the Internet with very little effort.

Tailscale has done a good job with their Funnel. I only need to touch the admin console once to set up the ACLs to allow my machines to create funnels. Everything else is handled on the nodes that want traffic to be funneled their way. That’s awesome!

How is this different than the feature I’ve been patiently hoping for over the last year or two? Tailscale Funnel can only tunnel TLS connections. It can forward traffic to a web server on your Tailnet just fine, but I don’t expect you can use a Funnel to let the entire world connect to your Minecraft server.

I am OK with that.

What functionality did I replace (and upgrade!) with Tailscale Funnel?

Brian Moses and I have some rather simple infrastructure set up so we can both push new blog posts to butterwhat.com, and they will be automatically published. The blog posts are stored in a private repository on Gitlab.com. I have a virtual machine here on my homelab server that regularly pulls the latest changes down and publishes if there are any changes.

That virtual server also runs continuous previews of the Jekyll blog so that Brian can see what he’s working on.

Instead of connecting to Gitlab every two minutes looking for changes, I figured I could configure a webhook on that repository and Gitlab could just tell me when there is a change.

Setting up the Funnel and webhook was easy!

I have never done any of this before. I was chatting about my progress on our Discord server, and it sure looks like it took me less than 30 minutes. That even includes tearing down most of what I saw in the webhook tutorial I was following so I could reconfigure things so the webhook server wouldn’t run as root. I also goofed up and accidentally configured the webhook server to listen for Github instead of Gitlab, and I had to spend a few minutes scratching my head before I noticed.

Aside from the Tailscale funnel, the star of the show is the little webhook server available in Ubuntu’s apt repositories. I didn’t really know how I was going to listen for Gitlab’s connection and kick off a shell script. The answer wound up being quite simple!

I followed a nice tutorial on how to use Gitlab webhooks. The tutorial tells you how to install and configure webhook, how to configure webhook for Github, and how to configure Github to call your webhook. There are links to sample configurations for Gitlab as well.

Tailscale’s documentation is always fantastic, but the documentation for the Funnel is a little sparse. That is OK. Funnel is a new feature, and it isn’t even a production feature.

To use a funnel, you need to enable HTTPS certificates in your Tailscale admin console, allow funnels in your ACLs, and then turn on the funnel at the client. It was the second step there that I wasn’t sure about. I was wondering if I could enable funnels based on ACL tags, and it turns out that you can.

"nodeAttrs": [
  {
      "target": ["tag:blogdev"],
      "attr":   ["funnel"],
  },
],

The webhook server defaults to listening on port 9000. I had to run two commands to pass that through to the funnel:

root@butterwhat:~# tailscale serve / proxy 9000
root@butterwhat:~# tailscale serve funnel on

I was going to leave out my webhook config file. I just pasted in the sample from the documentation and edited a few lines. I may as well include it, so here it is:

[
  {
    "id": "butterwhat-gitlab",
    "execute-command": "/home/pat/bin/webhook.sh",
    "command-working-directory": "/home/pat",
    "pass-arguments-to-command":
    [
      {
        "source": "payload",
        "name": "user_name"
      }
    ],
    "response-message": "Executing redeploy script",
    "trigger-rule":
    {
      "match":
      {
        "type": "value",
        "value": "footballislifebutalsodeath",
        "parameter":
        {
          "source": "header",
          "name": "X-Gitlab-Token"
        }
      }
    }
  }
]

It has been slightly modified to protect my little webhook server.

Did I do a good job with my blog publishing script updates?

Not really! I did set things up so that the pull-and-publish job only runs one process at a time. The bad news is that the wrong thing will happen if we manage to push another commit before the job finishes. We are a small team. The odds of this happening are not exactly high.
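I won’t share my exact setup, but a one-process-at-a-time guard can be as small as a flock wrapper. The lock file and script paths here are stand-ins, not my real ones:

#!/bin/sh
# take an exclusive lock, or exit immediately if a publish run is already in progress
exec flock -n /tmp/butterwhat-publish.lock /home/pat/bin/publish-butterwhat.sh

The catch is exactly the one I just described: with the -n flag, a webhook that fires while a run is in progress gets dropped instead of queued.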

When I started setting this up, I assumed I would be able to completely eliminate the publish loop. Then I remembered that the script is smart enough to only publish blog posts with timestamps in the past. The loop still needs to run to eventually publish blog posts in the future.

Instead of pulling from Gitlab and publishing every three minutes, it is now doing so only once an hour. That seems much more reasonable, though I am certain I can better optimize this.
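The schedule change itself is just one line in a crontab. Something like this, with a made-up path for the publish script:

# old: pull from Gitlab and publish every three minutes
# */3 * * * * /home/pat/bin/publish-butterwhat.sh
# new: the webhook handles pushes, so the loop only sweeps hourly for future-dated posts
0 * * * * /home/pat/bin/publish-butterwhat.sh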

The webhook over the funnel works great, but it is the wrong solution for us!

Our blog-publishing infrastructure seems out of place today. Not only is it possible to have Github or Gitlab publish our Jekyll and Octopress blogs directly every time we push to the appropriate repo, it was already possible when Butter, What?! was born. We had a virtual machine publishing previews of upcoming posts anyway, though, and making that machine rsync to our web server didn’t add much in the way of friction.

Brian Moses and I set up our collaboration on Butter, What?! before we heard of Tailscale, and possibly even before Tailscale released a product. We used a private repo on Gitlab so that we would both easily have access to the Markdown files.

Butter What on Coffee

Trying out Tailscale Funnel to streamline our process today was exciting, but it also made me realize that using Gitlab for this is an artifact left over from the days when we didn’t use Tailscale.

The Butter, What?! test server is on my Tailnet, and I already share it with Brian. He could just as easily be pushing directly to a repo on that server. We don’t need to host anything on the Internet. We don’t need to be punching a hole to our development box for Gitlab to send us a notification.

Everyone at Butter, What?! is all-in on Tailscale. I don’t like to redo things that are already working, but the idea of moving the blog repo onto our Tailnet is in my mind now.

I might be most excited about Tailscale serve

The new serve command is awesome. It seems to be new, and it looks like the whole thing arrived with the Funnel update.

I am not entirely certain of everything you can accomplish with this new TCP forwarding and proxy system. It seems to have plenty of seemingly artificial limits so far, but even so, it sure looks like it will come in handy. Right now you can’t point the proxy at a different machine on the local network. It can only proxy localhost.

You can use the proxy to wrap an HTTP server on the Tailscale node in TLS. This seems neat. You won’t have to figure out how to correctly install your certificates after running tailscale cert. As far as I can tell, the Tailscale proxy will just use the generated certificates.

Why is this a big deal? I already know how to set up my certificates with nginx or apache. Why would I need Tailscale to do it for me?

Octoprint doesn’t run as root, so it can’t run its embedded web server on port 80 or port 443. I also have no idea what web server software it uses or how to configure it. I can just run tailscale cert octoprint.humpback-rooster.ts.net then tailscale serve / proxy 5000 on my Octoprint server, and it will wrap my Octoprint instance up in some sweet encrypted TLS.
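Spelled out at a prompt on the Octoprint server, that is the entire job:

root@octoprint:~# tailscale cert octoprint.humpback-rooster.ts.net
root@octoprint:~# tailscale serve / proxy 5000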

I don’t need to know how my appliances configure their web servers. I don’t have to worry about an update wiping out my TLS config changes. I can just wrap them in TLS without installing any extra software on the node, and I think that is awesome. I can set up that proxy in less time than it would take me to figure out that Octoprint isn’t using nginx or apache.

Forget the funnels. The real gem that was made available to me this week is the proxy built into tailscale serve!

Conclusion?!

I am excited. Did I already say that? I really am. Tailscale just keeps making my life easier. Tailscale makes it easier to keep all my machines connected and safely off the public Internet. Then Tailscale made it really easy to let friends and associates connect to my private servers. Then Tailscale started managing 90% of my SSH keys for me.

Now they’re letting me poke tiny holes in my mesh firewall and even stick a TLS proxy in front of both my public and private web servers. This is all fantastic stuff, and I expect Tailscale will just continue to become even more awesome and convenient!

Can I Use Octoprint with a Networked Serial Port?


The short answer is yes! Holy crap! I have my Prusa MK3S plugged into the USB port on my oldest OpenWRT router, and I am running Octoprint in a virtual machine on my homelab server. I have run five print jobs for a total of around five hours of print time, and I even slept through the night before starting the later jobs.

Everything is working great. Almost everything.

This is not a tutorial!

I don’t think I am doing this correctly. I am using socat on each end to move the serial port data across the network, and it is doing an absolutely fantastic job when the connection is active.

Hacker 3 Stable Diffusion

If the Octoprint server’s socat process dies, the socat process on the OpenWRT box just keeps on running, and it won’t let a new socat process connect. That means if I reboot the Octoprint server, I will have to ssh to the OpenWRT end to restart socat.

The man page for socat is a mile long. It is likely that I am missing something obvious. If I ever do find it, that might be the time for a tutorial.

Why on Earth would I run Octoprint this way?

The majority of Octoprint users run Octoprint on a Raspberry Pi. I have never done this here at home. When I bought my first 3D printer, I already had my little virtual machine host sitting next to the table where the printer was going to live. No reason to add a Raspberry Pi to the mix when I can just spin up a fresh VM for Octoprint.

I am trying to make my home office quieter for recording podcasts, so I have been considering the idea of moving my homelab server into the room with my network cupboard. How can I move my Octoprint server to the opposite side of the house?!

I could buy a Raspberry Pi, but they’re pretty expensive and hard to come by. I also thought about trying Octo4a on an old Android phone, but I’d need to buy cabling to let me charge the phone and plug it into my Prusa MK3S at the same time.

Then I remembered that I now have a spare OpenWRT router left over from my WiFi upgrade shenanigans, and it has a USB port. I was hoping I could add some extra gigabit switch ports to my office, gain the use of an extra WiFi access point to roam to, and extend my Octoprint server’s serial port all the way across the house.

The old D-Link tube router that I am using is roughly equivalent to the GL.iNet Mango OpenWRT travel router that I carry in my laptop bag. They have similar CPU, RAM, and storage specs, though the Mango lacks a 5.8 GHz radio and only has a pair of 10/100 Ethernet ports. It is nice to know that if I didn’t already have the D-Link ready to go, I could have bought another Mango for about $20.

Should anyone else do this?

Do you know what is awesome about the OctoPi disk images for the Raspberry Pi? You don’t need to be proficient with Linux, command lines, or shell scripts to get it up and running. You follow a short tutorial, plug a micro SD card into your Pi, and you’re off to the races.

I’ve been running Linux at home since the mid-nineties, and I was doing professional Linux work by the late nineties. Getting this up and printing didn’t take much more than ten minutes.

My Prusa MK3S and OpenWRT Serial Port Extender

Getting my init scripts to do the right thing on the router took longer than I’d like to admit. I needed to make sure socat didn’t try to start before Tailscale was downloaded and running, and I needed to make sure socat would restart itself if things went weird.

The number of times I had a typo in a path or accidentally left a wrong socat option on a command line was ridiculously high, and I had to wait two minutes every time I rebooted the router for a test. Those two-minute waits add up fast!

Is this a workout for the OpenWRT router from a decade ago?!

Not in the slightest. My old 16 MHz 80386SX in the early nineties didn’t even break a sweat transferring data over a modem at 115,200 baud. Sure, the fastest modem I had on that machine was a 14,400 baud model, but it could go quite a lot faster when compressing text with V.42bis.

The MIPS processor in my D-Link DIR-860L is orders of magnitude faster. We could probably hang several printers off the back of this router if we used a USB hub, but I only have the one 3D printer.

socat on the D-Link doesn’t even register as using any CPU according to top.

Could I put a camera on the OpenWRT router?!

Maybe, but I don’t even want to try. I haven’t bothered with having a camera on my 3D printer in years. I don’t use Octolapse. I just print stuff.

You can connect a UVC webcam to an OpenWRT router, and there’s even an mjpeg-streamer package available. I don’t know how much horsepower you need to run mjpeg-streamer on a MIPS router, but it should be easy to point Octoprint at mjpeg-streamer on the router, if it works.

I would be trying this out today if I knew where my spare USB hub went to!

How long can your virtual serial cable be?!

I had to be silly, so I figured I should pretend to be one of those hackers from the movies who bounces their connection through all sorts of cities and satellites. I don’t have any satellites, and I don’t have all that many physical locations on my Tailscale network, but I pushed things as far as I could!

I added two more nodes to my socat loop. I ran a 3D printing job from Octoprint on my homelab server in Plano to a Digital Ocean droplet in New York. That node pointed its instance of socat at my Raspberry Pi at Brian Moses’s house, and that one finished the loop by pointing to my OpenWRT router here in Plano, TX.

What should have been a 13-minute print job took about an hour. My virtual serial cable would have been something like 3,000 miles long. Octoprint was about 80 ms away from the Prusa MK3S. Math says that we may have literally added 80 ms of print time to each of the 34,000 lines of gcode: 34,000 × 0.08 seconds works out to roughly 45 extra minutes, which lines up with what I saw. How cool is that?!

The printer stuttered a lot, and the print was extremely ugly on account of all the short pauses. There were no smooth printing moves.

Was it slow and stuttering because of flow control on the serial port? Do you think finding a way to disable flow control in Octoprint would help with this symptom? Do you think the problem was that I added in an extra hop? I imagine things would be a lot better if I were running Octoprint in New York and printing in Plano with only a single socat connection between them. That would at least cut the latency in half!

This was silly. There’s probably no good reason to do this other than to say I 3D-printed an object over a 3,000-mile serial cable.

How do you make socat work?

I’ll say it one more time. I don’t think I am doing this quite correctly. These commands are working exceedingly well as long as I don’t have to reboot anything. I am pretty sure that if the power went out, everything would start up and connect correctly, but if I reboot one half, I almost definitely have to go in and restart the other machine’s socat process.

This is the script I am running on the OpenWRT router:

# share the printer's serial port over TCP on the Tailscale interface,
# and restart socat after five seconds if it ever exits
while true; do
  /usr/bin/socat -d -d /dev/ttyACM0,b115200,raw,echo=0 TCP-LISTEN:9998,bind=100.84.13.80,fork
  sleep 5
done

You can leave out the bind option. I am using that to make sure socat is only listening on my Tailscale interface.

My socat process will error out over and over again until Tailscale starts. I thought about adding some logic to wait for the tailscale0 interface to show up before firing up socat, but I want socat restarting if it dies. I figured it’d be OK to be lazy and kill two birds with just the one loop.

Here is the script on my Octoprint server:

# I no longer remember why this sleep is here (see the note below!)
sleep 20

# connect to the router's socat and present it locally as /dev/ttyACM1
# for Octoprint, reconnecting a couple of seconds after any failure
while true; do
  socat TCP:100.84.13.80:9998 PTY,link=/dev/ttyACM1,raw,crnl,mode=666
  sleep 2
done

I have forgotten why there is a 20-second sleep at the beginning of this script. Was I working around a problem, or was I hoping this would cover up a problem that doesn’t even exist anymore? I would comment that out and reboot everything to test it out for you, but my printer is in the middle of a job!
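If you want to experiment with the stale-listener problem, the knobs I would try first are socat’s -T inactivity timeout and the reuseaddr listener option. This is an untested guess on my part, not a verified fix:

# -T 120 makes socat give up on a connection after two minutes of silence,
# and reuseaddr lets a fresh listener grab the port back immediately
/usr/bin/socat -d -d -T 120 /dev/ttyACM0,b115200,raw,echo=0 TCP-LISTEN:9998,bind=100.84.13.80,reuseaddr,fork

Octoprint polls the printer while it is connected, so two minutes of total silence should only happen when the other end is really gone.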

UPDATE: Maybe this is less fragile than I think it is!

I moved my homelab server out of my office last night. I don’t have a shelf for it yet, the cables are all just running out of the network cupboard’s doors, and the machine is just sitting on top of a cardboard box to keep it off the floor. We wouldn’t want a leaky washing machine getting my server wet!

It took me long enough to power the server down, unplug everything from the UPS, and finagle the UPS out from behind my desk that by the time I got the server booted up and the Octoprint virtual machine going, the socat process in the Octoprint’s init script was able to connect to the OpenWRT router without a problem.

I don’t know how long it takes for the socat server to time out, but it definitely times out for me in some number of minutes. That means that as long as I am not too efficient in restarting my Octoprint server, I shouldn’t have to restart the socat server manually.

Conclusion

I am happy enough with how this all worked out. When it is working, it sure seems to work absolutely perfectly. I no longer have any reason to keep my homelab server in my office, and I am super excited that I will be able to eliminate the noise of four extra hard drives constantly seeking. Not only that, but I’ve added an extra 450 megabits of WiFi bandwidth to this corner of my house.

I will do my best to remember to report back in a few months if things manage to keep working this smoothly. If there’s a problem, I’ll be sure to report back immediately!

Is It Safe to Use a Big, Honking USB Hard Drive on Your Raspberry Pi Server?


The short answer seems to be yes, but why did I decide to write about this question today?!

Through the magic of Tailscale, my Seafile server, a Dropbox-like cloud-storage server, has been running remotely on a Raspberry Pi in Brian Moses’s home office for 22 months. The Pi boots off of a cheap micro SD card with all logging turned off, and Seafile stores all its data on a 14 TB Seagate USB hard drive.

Last week I had a scare! The ext4 filesystem on my USB hard drive set itself to read-only, and there were errors in the kernel’s ring buffer. Did my big storage drive fail prematurely?!

What happened?!

We may never know. I rebooted my little server, and then I ran an fsck. When that finished, I mounted the LUKS encrypted filesystem on the 14 TB USB hard drive, fired Seafile back up, and everything was happy. I haven’t seen another error in 13 days.

dmesg output

What do you think might have gone wrong? Is my USB cable flaky? Did the USB-to-SATA hardware in the USB drive get into a weird state? Did my little friend Zoe run through Brian’s office and bang into the cabinet and jostle the USB cable enough at just the right time to generate an error?

This is the first error I’ve seen in 22 months outside of a couple of random power outages.

This is good enough for me to say that it is safe to run a USB hard drive on my Raspberry Pi server. Maybe that’s not good enough for you. If not, then I hope this is at least a useful data point in your research.

The power outages all wind up being long Seafile outages, because I have to SSH in to punch in the passphrase to mount the encrypted filesystem before Seafile can start. Not only that, but I have to notice that Seafile is down before I can spend two minutes firing it back up again!
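Those two minutes of firing it back up look roughly like this. The device names and Seafile paths are stand-ins for my real ones:

ssh pi@seafile-pi
sudo cryptsetup luksOpen /dev/sda1 seafile-data   # this is where I punch in the passphrase
sudo mount /dev/mapper/seafile-data /mnt/seafile
/opt/seafile/seafile-server-latest/seafile.sh start
/opt/seafile/seafile-server-latest/seahub.sh start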

I am pleased that my USB hard drive didn’t fail!

I have been comparing my Dropbox-style server to Google Drive’s pricing. Google sells 2 TB of cloud sync storage for $100 per year, and I paid about $300 for my 14 TB drive and my Raspberry Pi. I am getting my off-site colocation for free by storing my Pi at Brian Moses’s house on his 1-gigabit fiber connection. You can just barely see my Pi server in the background of our Butter, What?! Show episodes!

I have done a bad job of properly keeping track of how much money I haven’t paid Google Drive or Dropbox. When I started out, I would have needed to rent 4 TB of storage. At some point during the first year, I would have had to add another 2 TB, and then I would have had to add another 2 TB at some point during this year.

I am currently using 6.8 TB out of a possible 14 TB.

Is it OK if I simplify what I would have been paying Google Drive? How about we just say I would have spent $200 last year, $400 this year, and I would be spending another $400 in two months? I expect that’s underestimating by something around $100.

I have already paid for all the hardware, and I would still be at least $50 ahead even if I had to order a replacement 14 TB drive last week.

That would have been fine, but I would have been bummed out if I had to do the work of resyncing my data to my new Seafile server. I might also have had to set up accounts for the friends I share work with, and they might have had to get data synced back up. That would all be a bummer.

I don’t account for the cost of labor in my “savings”

I know what I used to charge in the short spans of time when I did hourly consulting. I understand that my time is valuable, and that value almost certainly exceeds the cost of a few years of Google Drive storage.

There are quite a number of advantages to what I am doing, and they are very difficult to put a price on.

  • I own all my data
  • My server is only accessible from my Tailscale network
  • Everything except Tailscale is blocked on the LAN port
  • My hard disk is LUKS encrypted
  • My Seafile libraries are encrypted client side
  • I don’t get stopped to pay more at every 2 TB

Aside from all of this, it is a bit disingenuous for me to imply that getting my Seafile setup running requires time and effort while setting up Google Drive or Dropbox requires none. I have my data split up into a dozen Seafile libraries, and I have no idea what the equivalent functionality would look like with Dropbox or Drive. I’d still have to make sure that the correct data is synced to the correct devices.

Even though getting Seafile running on a Pi was more persnickety and time consuming than I anticipated, I still spent the majority of the time setting up clients on all my machines and syncing the correct data to each one.

I didn’t intend for this to be an update on the Seafile Pi with Tailscale

Even though this wasn’t my intention, this seemed like a good time for an update, since the server has been in operation for nearly two years. I am quite pleased with how things have been working.

A few months ago, I started booting Raspbian’s 64-bit kernel, and I replaced the 32-bit Tailscale binaries with 64-bit static binaries. That increased my Tailscale throughput from 70 megabits per second to just over 200 megabits per second.
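The kernel half of that swap is a documented one-line change to the Pi’s config.txt, and the 64-bit static binaries came straight from Tailscale’s download page:

# /boot/config.txt -- ask the Pi firmware to load the 64-bit kernel
arm_64bit=1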

I also swapped out the 4 GB Pi with a 2 GB model. The two are identical except for the RAM, but the 2 GB Pi will not negotiate gigabit Ethernet. It is stuck at 100 megabit, so I have given up some of my Tailscale encryption speed boost!

Would I do this with a Raspberry Pi today?

Not a chance! The hardware is working just fine, but Raspberry Pi boards are never in stock. When they are in stock, they are overpriced.

My friend Brian Moses recently grabbed a Beelink box for $140 from Amazon. It has a Celeron N5095, 8 GB of RAM, and a 256 GB SSD.

You get so much more with the Beelink! There’s a real SSD inside. The CPU is several times faster, and so is the GPU. You get full-size HDMI ports. You get 802.11ac. The Beelink isn’t just a bare board. It comes in a nice case, and it comes with a power supply!

The Raspberry Pi 4 is more than enough for my needs, and I would gladly use one for this purpose again if they were still under $50.

One of the neat things about the Beelink boxes is that you can get a lot more horsepower than the Celeron version that Brian bought. You can also get a Beelink with 16 GB RAM and a 6-core Ryzen 5560U that is about three times faster. One of our friends on our Discord server bought one of those when it was on sale for only $300.

Conclusion

I am relieved that my unexpected and unexplained SATA error didn’t wind up being a drive failure, and I am super excited that the minor gamble that is this entire Seafile Pi experiment is paying off. I have invested less than $300 in hardware, and in two months I will have avoided paying Google $1,000 or Dropbox $1,200 for cloud storage.

Even if we assign me a rather high hourly rate, I am confident that I have crossed the point where I am saving money. Isn’t that awesome?!

Enabling WiFi Fast Transition Between Access Points with OpenWRT


This isn’t a tutorial. The steps to enable 802.11r on two or more access points running any reasonably recent release of OpenWRT are pretty simple. I will spell those steps out shortly, but there won’t be screenshots walking you through the process. I am writing today to document what I have done, what I might do next, and how things are working out so far.

As I fact-check this post, it just keeps getting longer and longer. So much longer than I expected! It feels like every time I look up specifics to make sure my memories of various events are reasonably accurate, I wind up finding something that I had misunderstood or misremembered.

Everything in here should be pretty accurate now, but I am definitely not an expert when it comes to any of the newer WiFi roaming specifications or technologies. This is just my experience while messing around with this stuff over the last week or so.

Why is Pat monkeying with his WiFi setup?!

I have not been entirely unhappy with the WiFi signal around my house for the last few years, so how did we get here? Why have I upgraded things?

I was at Brian Moses’s house a couple of weeks ago. He’d just replaced one of his OpenWRT access points, a TP-Link Archer A7 WiFi 5 router, with a GL.iNet WiFi 6 router. That got my gears turning. I was thinking I could use Brian’s old router to solve my poor connectivity issues with my Raspberry Pi Zero W on my CNC machine in the garage. It’d get me a switch port to plug the laptop into as well, so I asked him how much he wanted for it.

TP-Link Archer A7 v5 with OpenWRT

The next week he sent me home with two of his old OpenWRT-compatible routers: that TP-Link and a Linksys WRT3200ACM. Now that I had an overabundance of WiFi routers, I figured it was time to reengineer my home network and eliminate my only access point that can’t run OpenWRT.

NOTE: You shouldn’t buy the TP-Link or Linksys router. The Linksys seems terribly overpriced, and it feels like the TP-Link should cost less, too. You can get two better-equipped WiFi 6 routers from GL.iNet for the price of the Linksys WRT3200ACM WiFi 5 router, and the GL.iNet routers ship from the factory with OpenWRT.

The layout of our house

There are much bigger homes in our neighborhood, but the homes with significantly more square footage will have a second story. Our house is shaped like the letter U, and Zillow claims our house is around 2,200 square feet. I also need to cover the garage which adds another 500 square feet. Let’s just say I need to cover 2,500 square feet with WiFi, and reaching out into the yard wouldn’t be a bonus!

My house and its wifi access points

I have two access points, and each access point has a separate SSID on each radio. Chicanery2.4 and Chicanery5 are on the gateway router that lives in the network cupboard. Shenanigans2.4 and Shenanigans5 are on an access point in the living room. Both devices are D-Link DIR-860L routers. Both have faster WiFi when running D-Link firmware, but the one in the living room can’t even run OpenWRT.

Shenanigans is named after Farva’s favorite restaurant, and Chicanery is named after its Canadian equivalent from the sequel.

This is how things were configured last week.

Why not just put the same SSID on every access point?

We can call this my poor man’s WiFi steering. The access point in the living room does a good job reaching the entire house, but there are some important devices that are close to the network cupboard.

There is a television in the room opposite the network cupboard that is connected to chicanery5, and the television in the living room is also connected to chicanery5.

Why is the TV in the living room not connected to the access point in the living room? It is close enough to the cupboard to get a good connection, so there is no reason to waste 20 or 30 megabits on the house’s primary access point.

These devices are still connected to the access point in the cupboard today.

What have I changed?

I wound up putting the Linksys WRT3200ACM in the cupboard as our new Internet gateway and our upgraded chicanery access point. I did some tests, and this router can easily manage routing and NAT at 920 megabits per second. That’s only about 20 megabits shy of the maximum speed of gigabit Ethernet, so we won’t have to worry about swapping routers if we upgrade the speed of our FiOS service.

I put the TP-Link Archer A7 in the living room. Both routers are running the latest release of OpenWRT, and they have all their old SSID settings copied over.

I also added a new SSID called kerchow to every radio. I set up 802.11r fast transition on that SSID for nearly instantaneous roaming.

How do you enable 802.11r roaming in OpenWRT?

I gave every radio that I wanted participating in roaming the same SSID and WPA2 key. I also had to choose a mobility domain for my roaming group. Then I had to:

  • Tick the checkbox on that interface to enable 802.11r
  • Enter my mobility domain
  • Choose FT over the Air for the FT protocol
  • Set DTIM Interval to 3 (supposedly helps iOS devices roam)

The first two are the only steps that are truly required. Lots of advice on the Internet recommended using FT over the Air, but I haven’t tested both options. They said to do it, and it seems to work well, so I am sticking with it.

I have no iOS devices, so I have no idea about the DTIM Interval.
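If you would rather skip LuCI, I believe the checkboxes above map to these options in /etc/config/wireless. Treat this as a sketch; the wifi-iface index is a placeholder for whichever SSID you are editing:

uci set wireless.@wifi-iface[0].ieee80211r='1'
uci set wireless.@wifi-iface[0].mobility_domain='beef'
uci set wireless.@wifi-iface[0].ft_over_ds='0'    # FT over the Air
uci set wireless.@wifi-iface[0].dtim_period='3'   # the supposed iOS helper
uci commit wireless
wifi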

How do I know if 802.11r is working?

I am not entirely sure that I have tested things correctly, but I am mostly confident that I’ve tested well.

I used ssh to connect to both routers and ran logread -f to follow the logs to see when machines were connecting to WiFi. I had the OpenWRT LuCI GUI open in two browser tabs, and I would go in and click the disconnect button for my laptop.

Stable Diffusion Hacker

I would immediately see output in one of the logs saying the laptop reconnected. I had a ping running on the laptop the entire time, and I never missed a ping. Sometimes one or two responses would bump up to 200 or 300 milliseconds, but they always made it.

This seems like a success to me!

How do you know if your 802.11r setup is working well?!

I had a glitch. My Windows 11 laptop very much preferred 2.4 GHz connections. Once it picked a 2.4 GHz radio, it would always connect to another 2.4 GHz radio after being disconnected. I had to turn WiFi off and on again on the laptop to get it to connect to 5.8 GHz again.

This is a bummer, because the 5.8 GHz radios are about four times faster. This happens because 802.11r doesn’t care about bandwidth. It only cares about signal strength. A 2.4 GHz signal has a much easier time penetrating walls than a 5.8 GHz signal, so the slower radio often has a better radio signal even if throughput will be lower.

I solved this problem in a simple, ham-fisted, but very effective way. I now have three SSIDs and three different mobility domains.

I have kerchow with a mobility domain of beef on every single radio. I have kerchow2.4 with a mobility domain of b0b0 on the 2.4 GHz radios, and I have kerchow5 with a mobility domain of b00b on the 5.8 GHz radios.

I wish I had discovered both DAWN and usteer before setting this up.

I added a GL.iNet WiFi 6 router to the mix

I have a GL.iNet GL-AXT1800 travel router here, and I was curious if the 802.11r configuration would even work on this hardware. These particular routers have radio chips that aren’t yet supported by OpenWRT, so they are shipped with an oddball proprietary Qualcomm fork of OpenWRT.

Brian said his GL.iNet WiFi 6 router got pretty goofed up when he configured the WiFi through LuCI instead of through GL.iNet’s interface, but it did let him add additional SSIDs to each interface through the LuCI GUI. I was worried that other things like 802.11r wouldn’t work.

GL.iNet Mango and AXT1800 on my desk

I ran out of Farva-related restaurants, so I put the SSID of slammin on my GL-AXT1800. Then I added the three kerchow SSIDs with the correct mobility domains.

I don’t really have a good location for a third access point. I just found an open network jack near a power outlet that is about halfway between the other two access points. My laptop rarely decides to roam to this access point. I had to click disconnect in the OpenWRT GUI so many times, but it did eventually connect right up to the WiFi 6 router.

As far as I can tell, 802.11r fast transition does work with the oddball OpenWRT firmware.

What about DAWN and usteer?

I am bummed out about this. A few weeks ago I learned the name of one of these packages, but I didn’t remember to make a note of it. I tried finding it several times over the last week, but I failed miserably.

Do you know when I managed to find one of them? Minutes after I set up nine different variations of kerchow on three different routers.

I don’t have a lot to say about either one at this point. They are both available as OpenWRT packages. They are tools that help you configure how devices roam between your WiFi access points.

Both seem fiddly and complicated, and success seems to involve a lot of trial and error. I’d be happy if either package could easily accomplish two things.

I’d like to be able to eliminate two of the three kerchow variations. It sounds like configuring one of these tools to make your 2.4 GHz less appealing only requires a few lines of configuration. That would be awesome.

Why am I waiting before trying DAWN or usteer?!

Both require upgrading the default wpad-basic OpenWRT package to the more feature-complete version. I’m sure this will go smoothly, but I am just not ready to try something that might require that much effort!

If I were just monkeying around with a few test routers, this would be fine. That’s not what I am doing. These are the routers running my home network. I need them to work, and they are working right now.

NOTE: I might be confused about this. My fresh installs of OpenWRT 22.03 all have wpad-basic installed, and the description for this package claims it supports 802.11r. I am guessing that wpad-basic has replaced wpad-mini, which lacks this support. I am not certain that wpad-basic has the features required for DAWN or usteer, but I suspect that it will work fine!

Your setup is probably more complicated than mine!

I have learned that I can cover my entire house with acceptable WiFi speeds using a single access point. Sure, the speed drops to 20 or 30 megabits per second in the bathroom and garage, but that’s more than enough to watch YouTube videos. Anywhere with a desk can manage 200 megabits per second.

I don’t truly need multiple access points with fast roaming, but maybe you actually do! If you do, I imagine things get more complicated!

Stable Diffusion hacker

My access points have their radios cranked up to maximum output. If you’re trying to cover a slightly larger area with two access points, you probably don’t want to be blasting quite so hard. You want your devices to notice that one access point is weaker than the other so they can switch over.

I am also cheating because there is a gigabit Ethernet port at every desk in the house. I only need enough WiFi at every comfy chair for my phone and laptop to play YouTube videos, and I need a couple of dozen megabits to get 4K video streaming to two televisions. Things get more complicated if you really need a solid 200 megabit wireless connection at specific workstations in the house.

Why does Pat keep using numbers like 200 megabit?! Can’t modern WiFi do gigabit?!

I’ve been doing iperf tests to various access points from all sorts of distances and different rooms all week. The fastest WiFi 6 router I have access to has managed to pull numbers really close to 700 megabits per second, but that only happens in the same room, and only some of the time. I am way more likely to see 400 megabits per second while in the same room.

Once you start putting a couple of walls between your devices, these numbers drop. It seems like 250 megabits per second is about what I can expect to see from about one room away from the access point. The signal either has to bounce around corners through doorways, or it has to go through the pair of walls that make up the hallway.

The fastest advertised WiFi speeds are only possible under ideal conditions.

Something that surprised me!

First of all, if you are buying a fresh set of OpenWRT routers for your project, you should probably be buying GL.iNet routers that ship from the factory with OpenWRT. At least that way you know what you’re going to get. One of my old D-Link tube routers isn’t running OpenWRT because I had no way to know which revision of the hardware I was going to get before it arrived.

Not only that, but many WiFi routers that can run OpenWRT have radio chipsets that don’t run as well with the open source drivers. My older revision D-Link router’s WiFi is about 50% faster than the newer revision with OpenWRT.

I was handed a box of free OpenWRT-compatible WiFi hardware. I already knew the hardware would work. If this didn’t all happen accidentally, I would have bought GL.iNet routers instead.

I guess I should get to the weird part. The Linksys WRT3200ACM can be set to 200 mW on 5.8 GHz and 1,000 mW on 2.4 GHz. The TP-Link Archer is the opposite! It can be set to 1,000 mW on 5.8 GHz and 200 mW on 2.4 GHz!

Does the OpenWRT driver really know how much power the radio is going to put out? Is it anywhere near correct? We have no idea. I never trust numbers like this.

Radio is weird. You have to quadruple the transmit power to double your range. I have no way to properly verify just how much power the Archer A7 is pumping out, but the WiFi analyzer seems to think the TP-Link 5.8 GHz signal is quite a bit stronger than the WRT3200ACM!

Conclusion

I don’t know if there is a conclusion. I set up 802.11r. I pointed a few of my WiFi devices at the 802.11r SSIDs. I think I am at the point where I wait and see how things work out.

Maybe the conclusion is that 802.11r was really easy to set up on my OpenWRT hardware. It isn’t much more work than checking a box, though it does get a bit more fiddly when you are trying to put the correct mobility domain on three different SSIDs on three different routers. I know I goofed at least one up on my first attempt!

OpenWRT, Two GL.Inet Routers, and Tailscale: Successes and Failures


This blog isn’t going to lead to some keen piece of insight or an interesting conclusion. I’ve been messing around with a pair of very different OpenWRT routers from GL.iNet: the GL-MT300N-V2 Mango and the GL-AXT1800 Slate AX.

I learned some things I didn’t know about last week. I learned just a little more about Tailscale. This is mostly just to write down what has worked and what hasn’t worked for me.

GL.iNet Mango and Slate AX 1800

I figured I should write some of these things down both for future me and for anyone else who might be trying to accomplish something similar. Mostly for future me.

How did we get here?

A friend of ours is moving to Ireland. He just wants to watch Jeopardy on his Hulu Live TV subscription. He’s been watching Brian and me monkey with Tailscale on our GL.iNet Mango routers for ages. I know our friend has had his Mango for a while, quite possibly for about as long as Brian and I have had our Mangos, but now he’s been trying to use it to connect his Apple TV to an exit node in the United States.

The story is more complicated than this, but the Mango didn’t work out for him, so he tried the much beefier GL.iNet Slate AX, and he couldn’t make that work. That’s how I wound up having a Slate AX here in my possession. He is currently using a Raspberry Pi 4 with Tailscale as a router to forward his Apple TV to America.

I am able to get my tiny Mango to route traffic of connected devices through one of my Tailscale exit nodes. I am not able to do the same with the much nicer Slate AX router.

Who is Pat blaming?!

I don’t think anyone needs to accept the blame here. Running Tailscale on an OpenWRT router isn’t officially supported, and running Tailscale on the Mango is a bit of a hack because the Mango doesn’t have enough storage to even hold Tailscale.

Not only that, but as I learned long after I started trying to figure this out, the GL.iNet WiFi6 routers aren’t even built upon official upstream OpenWRT!

We can probably blame me for not doing science correctly. I am also only a sample size of one, so just because something is working well for me several times on the day I happen to test it does not always mean it works smoothly next week.

Keeping track of what is going on is problematic. The logs on the OpenWRT routers are ephemeral. Not only that, but I didn’t think I would need to troubleshoot anything on my Mango, so my startup scripts redirect all tailscaled output to /dev/null!

Pat is bad at science

I did a bad job writing things down. I didn’t know I was going to have to. All I really managed to do was write down some things after the fact on our Discord server. Those were usually the weirdest happenings.

We also aren’t helped by the fact that not everything happened at the same time. I updated Tailscale and OpenWRT on my Mango a few weeks ago, and everything there was fine. Then, when I tried booting up the router a few days later, things weren’t going as smoothly there as I thought. Then even more time passed before I got to fart around with the beefier WiFi 6 router.

Science is hard. I was kind of expecting to make things work, wipe everything clean, and make it work again. That last step would have given me all the documentation I needed, but I never made things work on the beefy router.

Connecting my GL.iNet Mango to a Tailscale exit node

First of all, I am cheating when I load Tailscale on my tiny Mango. It only has 16 megabytes of flash, and most of that is taken up by GL.iNet’s OpenWRT firmware. The official Tailscale binaries are bigger than that, so I wound up installing them on a USB flash drive.

It was a rather manual process, and I documented it as best as I could. I had to make a few tweaks in the OpenWRT LuCI GUI to get exit node traffic routing correctly, but once I did, I thought I had it working pretty well. I powered the router off and on several times, watched it succeed every time, then I packed the Mango away in my laptop bag to use in an emergency.

The Mango is pretty slow. It can only encrypt traffic over my Tailscale link at about 4 megabits per second. That’s way slower than Wireguard in the kernel on this machine, and probably only just barely enough for video streaming. I am guessing Netflix would bump you down to standard definition via a Tailscale exit node on the Mango.

A few weeks later, when my friend was having trouble, I pulled my Mango out of the bag and plugged it in. My Mango didn’t want to connect to my Tailnet unless I killed and restarted tailscaled. Adding a long sleep to my Tailscale script seems to help with this, but it isn’t perfect.

I should also note that I only managed to get the Mango to route traffic through my exit node if the Mango is using Ethernet for its WAN connection. If I set up the Mango to use WiFi as its WAN, it won’t route traffic via the exit node.

I didn’t notice this right away, and I haven’t investigated what is going on here. This is only a problem when trying to route traffic through an exit node. The Mango works fine for me with WiFi WAN as long as it is just a regular node or subnet router.

NOTE: On the Mango, I have to create an interface in LuCI for the tailscale0 interface and make sure it is attached to the WAN firewall rules. This coaxes OpenWRT’s firewall scripting to apply the correct iptables rules for packets to flow.
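The uci equivalent of that LuCI clicking should look something like this. The wan zone index and the ifname option are assumptions; newer OpenWRT releases use device instead of ifname:

uci set network.tailscale=interface
uci set network.tailscale.proto='none'
uci set network.tailscale.ifname='tailscale0'
uci add_list firewall.@zone[1].network='tailscale'   # assuming @zone[1] is your wan zone
uci commit
/etc/init.d/network restart
/etc/init.d/firewall restart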

The GL.iNet Slate AX is built on Qualcomm’s OpenWRT fork

Everything that went wrong for me when attempting to route the GL-AXT1800 through a Tailscale exit node was completely different than on the Mango. I assumed something was just a little different in the newer release of OpenWRT that this router was built on, so I investigated the idea of installing a different version.

I couldn’t downgrade the GL-AXT1800 to match the Mango. This made sense to me. Why would GL.iNet build the firmware for a brand-new device on an older version of OpenWRT?

I also noticed that I couldn’t upgrade the Mango to match the Slate. I knew there were official OpenWRT builds for the Mango, so I checked the OpenWRT site for firmware for the Slate, but I couldn’t find it. There wasn’t any open-source firmware for any of GL.iNet’s WiFi6 devices.

I thought I read that there wasn’t yet any OpenWRT support for WiFi6 devices

At least I thought I read this, but GL.iNet sells a few OpenWRT routers with WiFi6 chips. How can that be?! We just figured it out, and GL.iNet doesn’t hide what they’re doing. They spell it out right in the product description.

Qualcomm fork of OpenWRT

OpenWRT doesn’t yet have much support for WiFi6 devices, but Qualcomm has a fork of OpenWRT that supports their latest WiFi6 chipsets.

I haven’t decided precisely how I feel about this, but it is definitely making my troubleshooting more difficult. That is assuming you’re willing to call my ham-fisted attempts at messing about here troubleshooting!

I understand that I don’t truly understand how Tailscale works

I understand well enough how packets get from a Tailscale device at my house to a Tailscale device at another location. What I don’t understand is how packets on my Linux machines find their way into the Tailscale process, but I do understand that Tailscale has its own little IP stack hiding in there.

There are no entries in my routing tables listing any of my Tailscale addresses. They should all be matched by my default gateway, yet they manage to get snagged by Tailscale and routed appropriately.

What works for me with Tailscale on the GL.iNet GL-AXT1800?

I installed the Tailscale OpenWRT package and immediately noticed that it is built with an old enough version of Tailscale that it doesn’t support exit nodes. I cheated and replaced the package’s binaries with the latest Tailscale binaries. It fires right up, connects to my Tailnet, and I can connect to things.

I figured cheating was the right thing to do. It seemed smart to let the OpenWRT Tailscale package install startup scripts and maybe even LuCI GUI configuration bits and pieces for me, and I could just replace the outdated binaries.
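The swap itself is nothing fancy. It went roughly like this, with the version and architecture as placeholders for whatever is current for your router:

/etc/init.d/tailscale stop
cd /tmp
wget https://pkgs.tailscale.com/stable/tailscale_1.34.1_arm.tgz
tar xzf tailscale_1.34.1_arm.tgz
# the OpenWRT package put the binaries in /usr/sbin on my router
cp tailscale_1.34.1_arm/tailscale /usr/sbin/
cp tailscale_1.34.1_arm/tailscaled /usr/sbin/
/etc/init.d/tailscale start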

What happens when I try to route traffic through an exit node?

When I run the command to route traffic through an exit node, things get weird.

As soon as the exit node is enabled, I can no longer reach the exit node’s IP address, but I can reach other nodes just fine. At least I thought I could reach other nodes just fine.

One time I was accidentally pinging the wrong node in the background while enabling the exit node. I got really excited when it continued to respond, but then I noticed that the round-trip time increased quite a bit. A few seconds later I noticed that I was pinging the wrong address.

You can pass Tailscale the --netfilter-mode=off option to prevent Tailscale from creating any firewall rules. This gave me the same results.
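For the record, these are the kinds of commands I was testing, with a placeholder exit-node address:

tailscale up --exit-node=100.64.0.1
tailscale up --exit-node=100.64.0.1 --netfilter-mode=off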

What is the solution?

Our friend’s solution is Tailscale on a Raspberry Pi. That’s a fine solution, but I did something that made me feel a bit dirty. I set up a Wireguard server in a Docker container.

I followed somebody’s guide. I don’t think it was this exact guide, but it was similar. I was disappointed when I saw that I had to hit my page-down key twelve times to get to the end of the documentation. This is an order of magnitude more work than setting up Tailscale and a handful of usable exit nodes.

Not only was it long, but the documentation didn’t work for me. I had to install the wireguard-dkms package on the Docker machine. This makes perfect sense, but it took longer than I’d like to admit for me to figure out that I needed Wireguard support in my host kernel.

The good news is that GL.iNet’s awesome Domino interface makes it extremely easy to connect to a Wireguard or OpenVPN server.

The Domino GUI on the Slate AX is fancier than the tiny Mango!

These are very different travel routers. The Mango cost me $20 two years ago, has only 2.4 GHz WiFi and a pair of 10/100 Ethernet ports, and it happily boots up when plugged into a 0.5-amp USB hub.

The Slate AX has a beefy heat sink with a little fan. It comes with a 5-volt 4-amp power supply. It has near bleeding-edge WiFi speeds. It is a dense beast of a little machine, and it costs nearly six times more than the Mango. I am not surprised that there are significantly more options in the stock Domino interface.

Their Domino GUI is a really nice feature on both routers. Either will let you do things like connect to WiFi or tether your phone to use as your WAN. You could do this with OpenWRT and LuCI, but you’d have to click dozens of buttons and change so many settings, and you’d better not mess any of it up!

The Domino GUI lets you do this with just a few clicks. I’m not excited about having this at home, but it is extremely handy on a travel router.

The Slate AX has options to allow you to use Ethernet, WiFi, and a tethered cell phone as a weighted multiport WAN. I haven’t had a chance to test this, but I am impressed that the option is available, and it looks easy to configure. You can do this sort of thing with stock OpenWRT, but you will have to work very hard to do it!

Conclusion

It is a bummer that we couldn’t get a GL-AXT1800 routing through a Tailscale exit node, but I am mostly only bummed out about it because we couldn’t get our friend watching Jeopardy through an OpenWRT router with Tailscale. The things that we are currently able to do with Tailscale on an OpenWRT router are still quite impressive to me.

When I bought the Mango two years ago, loading Tailscale on it was an afterthought, and exit nodes didn’t even exist yet. I just thought it was neat that I could leave my Mango behind and still be able to ssh to it. That option alone has a ton of value to me, and there are dozens of times when something like this would have come in handy over the last twenty years.

Two years later, and my Mango can route traffic through an exit node or route outside traffic to its own local subnet. I am hopeful that things will improve over time from every angle. GL.iNet seems to really want to get the IPQ6000 support ported to upstream OpenWRT. OpenWRT should eventually have a more recent version of Tailscale in their package repositories. And of course Tailscale is constantly improving.

This conclusion has gotten away from me. What I am trying to say is that it sure seems like everyone involved is probably doing a good job.

Is It Time For You to Set Up Tailscale ACLs?


If you’re a lone Tailscale user like me, there’s a good chance that you have no pressing need to set up Tailscale’s access control lists (ACLs). Until quite recently, I didn’t feel there was much reason to lock anything down.

Pretty much every computer I own has been running Tailscale for more than a year now. They could all ping each other. In fact, most of them are on the same LAN, and they could ping each other before I had Tailscale. Tailscale already locked them down a bit more thoroughly for me. Why lock them down any more?

Then I started using Tailscale SSH

As soon as I started enabling Tailscale SSH, I needed to set up some access controls. I wanted to emulate my previous setup.

My desktop and two laptops had their own SSH private keys, and their matching public keys were distributed to all my other machines. That meant these three computers could connect to any computer I own.

"ssh": [
        // I don't actually use this rule anymore!
      {
          "action": "accept",
          "src":    ["tag:workstation"],
          "dst":    ["tag:server", "tag:workstation"],
          "users":  ["autogroup:nonroot", "root"],
      },
  ],

I gave those three devices a tag of workstation, and I stuck a server tag on everything else. Then I set up an ssh rule in Tailscale to allow any workstation to ssh into any server or any other workstation.

So far, so good. This configuration does happen on Tailscale’s Access Controls tab, but it isn’t in the acls section of the file. At this point, my Tailnet was still wide open.

I got worried when I added my public web server to my Tailnet

I have a tiny Digital Ocean droplet running nginx hosting patshead.com, briancmoses.com, and butterwhat.com. I always said I should install Tailscale out there, but my web server droplet has been running an outdated operating system for a while, so I knew I would be creating a fresh VPS at some point.

I finally did that. I spun up one of the new $4 per month droplets, copied my nginx config over, and installed Tailscale. I am super excited about this because it means I don’t even have to have an ssh port open to the Internet on my web server.

However, this means that a scary server that I don’t personally own, sitting out there listening for connections on the Internet, is connected directly to my Tailnet. Yikes!

Tagging all your machines for use in ACLs is hard!

It isn’t hard because you have to click on every machine to add tags. It is challenging because choosing names for your tags is easy to goof up!

My original decision that a workstation would connect to a server and never the other way around was too simple. It wasn’t the right way for me to break things down, and as I started adding more tags, I wasn’t able to easily set things up the way I wanted.

I’ve been doing my best to make sure my Tailscale nodes don’t have any services open on their physical network adapters. My workstations are mostly locked down well, and I moved things like my Octoprint virtual machine behind the NAT interface of KVM instead of being bridged to my LAN.

Even so, I have two servers at home that need to be accessible from outside my Tailnet. My NAS shares video to my Fire TV devices just in case I need to watch Manimal, and I have lots of unsafe devices around the house that need to connect to my Home Assistant server.

This seemed easy. I immediately tagged my NAS, my Home Assistant server, and my public web server with a tag of dmz.

What was the problem with this?

I want my workstations to be able to see everything. I want my servers to be able to communicate with each other, but I don’t want my servers in the dmz to be able to connect to my internal servers or workstations.

This all seemed simple and smart until I realized that everything in my dmz already had a server tag. I also very quickly realized that my Home Assistant server listening to my LAN is much less threatening than my web server listening to the public Internet. One of those should be on an even more restricted tag!

Where did I actually land?

I have four main tags now:

  • workstation
  • server-ts
  • server-dmz
  • server-external

My personal workstations can connect to anything. Machines tagged server-ts can connect to machines tagged server-ts and server-dmz, while the server-dmz servers can only talk to other server-dmz machines.

  "acls": [
    {"action": "accept", "src": ["tag:workstation"],   "dst": ["*:*"]},
    {"action": "accept", "src": ["tag:server-ts"],     "dst": ["tag:server-ts:*", "tag:server-dmz:*", "autogroup:internet:*"]},
    {"action": "accept", "src": ["tag:server-dmz"],    "dst": ["tag:server-dmz:*"]},
    {"action": "accept", "src": ["tag:blogdev"],       "dst": ["tag:blogprod:22"]},
    {"action": "accept", "src": ["nas"],               "dst": ["seafile:*"]},
    {"action": "accept", "src": ["autogroup:shared"],  "dst": ["tag:shared:22,80,443"]},
  ],

These are all of my ACLs as of this writing. There are a couple of more specific rules in there that I haven't talked about yet.

There’s a rule there that allows one of my virtual machines here at home to publish content to my public web server.

My NAS is in the dmz, so I had to give it its own rule to allow it to connect to my Seafile Pi. My NAS syncs extra copies of some of my data for use as a local backup!

I goobered up my exit nodes!

I am more than a little embarrassed by how many times I had to go back and forth between desks to figure out why the exit node on my GL.iNet Mango stopped passing traffic to the Internet.

The Mango had a tag that allowed it to access the exit node. If I took that tag away, it couldn’t ping the exit node. I’d add it back, and while it could ping the exit node, it couldn’t route any farther. If I dropped the original ACL that leaves everything wide open, the Mango could route traffic just fine. What was going wrong?!

It seems like I had this idea in my head that Tailscale’s ACLs only applied to Tailscale nodes and addresses. I didn’t immediately realize that I had to explicitly allow access to the Internet or even other subnets I might be routing!

{"action": "accept", "src": ["tag:server-ts"], "dst": ["tag:server-ts:*", "tag:server-dmz:*", "autogroup:internet:*"]},

I just had to add autogroup:internet to the allowed destinations for the appropriate tag. Duh!

Don’t think too hard before implementing your ACLs

This is especially true if you are down here at my scale with a couple dozen nodes and only a few shared nodes. Just drop some tags on things and set up some access controls that allow nodes access to what they need.

You probably won’t set things up optimally. I know I didn’t on my first try, and I am already seeing things I’d like to do differently. Even if my initial attempt left things more open than I might like, it was still a huge win just because it blocked my public web server from connecting to the rest of my Tailnet. Any other improvements are minor by comparison.

If money and other people’s livelihoods are on the line, maybe you should spend some time having meetings and planning things out on whiteboards. It only takes a few seconds to switch back to the single default ACL that leaves your Tailnet wide open, so if you do find a problem, you can at least revert your changes quickly and easily!

Tailscale SSH is affected by Tailscale network ACLs!

This seems obvious, but I wasn't positive it would be the case! Tailscale seems to always make the best possible default choices, and that got me wondering whether Tailscale's own SSH server might ignore the network ACLs as long as the connection was allowed in the ssh section of the access control configuration.

This does not seem to be the case. If you want to use Tailscale SSH, then your networking ACLs have to allow it. To be clear, I think this was the correct thing for Tailscale to do.

Shared nodes are allowed access by default

I wasn’t sure about this. The default single ACL just has one line that allows everyone access to everything. The first thing you do when designing your own ACLs is delete that entry. At that point nobody has access to anything, so I assumed I would need to add a line similar to this:

{"action": "accept", "src": ["autogroup:shared"], "dst": ["tag:shared:*"]},

We tested this, and it turns out that line wasn't necessary. Still, I figured it would be a good idea to lock down my shared nodes just a bit, so I wound up using this ACL:

{"action": "accept", "src": ["autogroup:shared"], "dst": ["tag:shared:22,80,443"]},

It is a bit lazy. Three people need access to ports 80 or 443 on the Seafile server, and Brian needs SSH access to rsync files to his blog. It gets the job done.

I did test out removing ports 80 and 443 from this ACL, and I watched the connections on my Seafile server. All the Tailscale IP addresses that I didn’t own dropped off the netstat list, and when I put those ports back in the ACLs, everyone connected back up immediately.
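
If you want to run the same experiment, ss can show you the established connections on those ports; this is just one way to watch them refresh:

# refresh the list of established connections to ports 80 and 443
watch -n 2 "ss -tn state established '( sport = :80 or sport = :443 )'"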

I am sure the documentation explains this, but I doubt I am the only one who likes to see things work in practice just to make sure!

Forgetting you have Tailscale ACLs configured makes troubleshooting a real challenge!

This happened to me yesterday! A friend sent me a GL.iNet GL-AXT1800 router to help him get his identical router to pass local traffic through a Tailscale exit node.

I installed the ancient OpenWRT Tailscale package, replaced the binaries with the official Tailscale static ARM binaries, ran tailscale up, and it gave me the URL to open to authenticate this new node. Everything went smoothly, except I couldn’t ping any of my other Tailscale devices!

Derp. Since I didn’t remember to put any tags on my new Tailscale device, it wasn’t matching any of my Tailscale ACLs, so it couldn’t actually connect to anything!

This was a simple mistake, but I walked back and forth between two desks and rebooted the GL.iNet router at least twice before remembering that I even configured any Tailscale ACLs in the first place!

Conclusion

If you’re just a home gamer like I am, you probably don’t need to worry about Tailscale ACLs. If you have one or more nodes on your Tailnet that have services running on the open Internet, you may want to lock things down a bit. It would be a real bummer if someone managed to crack open your public web server, because they might be able to ride Tailscale past your other routers and firewalls.

One of the awesome things about Tailscale is that I have absolutely no idea what you’re doing with it. You might just be one person sharing a Minecraft server with some friends. You might be sharing a couple of servers with business partners like I am. You might even be managing a massive and complicated Tailnet at a giant corporation.

You and your Minecraft server probably don’t need to worry about ACLs, but if you are in a position where you should be thinking about tightening up your access controls, I hope my thoughts have been helpful!

The OpenWRT Routers from GL.iNet Are Even Cooler Than I Thought!

| Comments

I have had my little GL.iNet Mango router for about two years now. It was an impulse buy. It was on sale for less than $20 on Amazon, and I just couldn’t pass it up. It was exciting for me to learn that there is a manufacturer that ships OpenWRT on their routers, and I really wanted to mess around with one.

I rarely use my Mango router. It lives in my laptop bag. If I ever need a WiFi extender, it is there. If my home router fails, it would be my emergency spare. My Mango is a Tailscale subnet router, so if I am ever away from home and need to reach an odd device via my Tailscale network, then I can. It is pretty awesome!

I bought a new laptop a few months ago, and I have been tidying up my various laptop bags. I realized that I hadn’t updated OpenWRT on my Mango in two years, and my Tailscale install is just as old. It seems like it is time to update things!

I had some problems along the way, and I managed to lose all access to my Mango. It could hand out DHCP addresses. It could route traffic. It wouldn’t respond to pings, HTTP, or SSH.

I am really excited that I had problems, because I learned that the GL.iNet routers are even more awesome than I thought!

NOTE: I didn’t really have any problems with my Mango! Something weird was happening on my Windows 11 laptop.

There’s more available than just the stock firmware!

When I could no longer ping my Mango router, I first tried resetting to factory defaults. That didn’t work. Then I tried re-flashing the latest firmware, and it still didn’t work.

Then I noticed that GL.iNet supplies several different firmware images for their routers. There's the stock image with their own GUI called Domino. There's another that skips Domino and just has the official OpenWRT LuCI GUI. Then there's a third firmware that routes all your traffic through the Tor network. How cool is that?!

I flashed the LuCI-only firmware, and my Mango started working correctly. All the official GL.iNet firmware images for the Mango are based on OpenWRT 19.07.08. That's not too bad. The OpenWRT folks are still updating version 19, but the first release of version 21 happened last year.

You can definitely download a version 21 build or a release candidate of version 22 for the Mango directly from the OpenWRT site.
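
If you go that route, the upgrade is a sysupgrade away. A sketch, assuming you've already copied the correct image for the Mango (it is the GL-MT300N-V2) to /tmp on the router; the filename here is a placeholder:

# flash a current OpenWRT image; -n discards the old GL.iNet settings
sysupgrade -n /tmp/openwrt-sysupgrade.bin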

Should I just run LuCI, or do I want the Domino GUI?

I love LuCI. If I were permanently installing a GL.iNet router in my home, I would most definitely skip GL.iNet's Domino GUI. I would most likely install that release candidate of OpenWRT 22.03 just to avoid a major upgrade in the near future.

My Mango doesn’t have a permanent home. It is a tool that lives in my laptop bag. There’s a very good chance that I might let a friend borrow it. The Domino GUI is WAY more friend-friendly than LuCi!

The Domino GUI also makes some difficult things as easy as clicking a button.

The Domino GUI has a simple dialog to let you use another WiFi network as your WAN connection, and it has an equally simple dialog to configure the Mango as a WiFi repeater.

Either of those configurations would require dozens of clicks in OpenWRT's LuCI GUI, and Domino even lets you tie those configuration settings to a physical switch on the router.

I definitely want the Domino GUI on my toolkit’s router.

Should I have bought a higher-end GL.iNet router?

Two really cool things came into my life at about the same time two years ago: the GL.iNet Mango and Tailscale. The Mango only has three or four megabytes of free disk space, and the Tailscale static binaries add up to more than 20 megabytes. One cool thing doesn’t fit on the other cool thing!

Two years ago, the only way to get Tailscale onto an OpenWRT router was to install it manually. Now you can just install it with the OpenWRT package manager, and that is awesome!
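
On a current OpenWRT release, that install is pleasantly boring. A sketch, assuming your router has the space; the exact package split can vary between releases:

# update the package lists and pull in Tailscale
opkg update
opkg install tailscale        # some releases split out a separate tailscaled package
/etc/init.d/tailscale start   # assuming the package ships this init script
tailscale up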

I cheated and put the Tailscale binary on a USB flash drive when I set things up two years ago. It’d be nice to not have to do this, but in a way, I am pleased with this configuration.

What if I loan my Mango to a friend? What if they’re less than trustworthy? I can just pop the USB drive out! All the Tailscale configuration and keys live on that drive. If they don’t have that, they can’t access my Tailnet.

I am pretty sure the OpenWRT Tailscale package will work on the Mango

The Tailscale package is only around 2.8 megabytes. That would nearly fit on a fresh Mango router with the stock GL.iNet firmware!

The GL.iNet firmware is running OpenWRT 19, and there don't seem to be any Tailscale packages in the OpenWRT 19 repositories. Even if you could squeeze the package in, there's no official OpenWRT package for you to install.

I did notice that when I installed the clean OpenWRT 19 image from GL.iNet, there's around 7 megabytes of free space. That's plenty of room to install the Tailscale package!

You should be in good shape if you download the latest version of OpenWRT for your Mango straight from the OpenWRT site. It sure looks as though you’ll have enough room, and the packages will be in the repository for you to install right away.

I didn’t want to give up the Domino GUI. Being able to connect to the router and click a few buttons to switch modes between routing, repeating, and other things is ridiculously handy.

How do I run Tailscale on the Mango if the Mango doesn’t have enough storage?

I have been arguing with myself for five minutes about how much information to include in this section. A step-by-step guide would make this blog way too long, and a 10,000’ overview seems too broad. Let’s see if I can land in a good spot near the middle.

I mostly repeated what I did to install Tailscale on my Mango in 2020, but I made room on the diminutive SanDisk flash drive for Ventoy. I also cleaned things up so I can modify the Tailscale startup job without logging in to the Mango.

Ventoy is occupying the first two partitions on my USB drive, so I added a small ext3 filesystem as the third partition. This has a copy of my tsup.sh script, the state file for Tailscale, and it is where I unpacked the Tailscale mipsle package. For the convenience of future upgrades, I created a symlink pointing to the current version of Tailscale. This is the root directory of the ext3 filesystem:

pat@zaphod:~$ ls -l /mnt/sda3
total 17744
drwx------ 2 root root    16384 Jul 24 16:49 lost+found
lrwxrwxrwx 1 root root       25 Sep 18 06:52 tailscale -> tailscale_1.31.71_mipsle/
drwxr-xr-x 3 root root     4096 Jul 18 12:58 tailscale_1.28.0_mipsle
drwxr-xr-x 3 root root     4096 Sep 15 22:54 tailscale_1.31.71_mipsle
-rw------- 1 root root     1418 Sep 18 07:05 tailscale.state
-rwxr-xr-x 1 root root      676 Sep 18 07:12 tsup.sh
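
That symlink makes future upgrades a two-minute job. A sketch, with a made-up version number:

# grab a newer static mipsle build and repoint the symlink
cd /mnt/sda3
wget https://pkgs.tailscale.com/stable/tailscale_1.32.0_mipsle.tgz
tar xzf tailscale_1.32.0_mipsle.tgz
ln -sfn tailscale_1.32.0_mipsle tailscale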

This is my tsup.sh:

#! /bin/sh

# Not sure if the sleep is necessary!
sleep 10

# Start the daemon in the background; the redirect keeps it quiet
/mnt/sda3/tailscale/tailscaled -state /mnt/sda3/tailscale.state > /dev/null 2>&1 &

# Make sure my bootable USB partition is unmounted cleanly
/bin/umount /mnt/sda2
/bin/umount /mnt/Ventoy

To make this work, I used the advanced settings tab to add this one line to the end of OpenWRT’s startup script:

(sleep 15; /mnt/sda3/tsup.sh) &

This could all be better, but it works. I did have to sign in once via ssh to run tailscaled and tailscale up manually so I could authorize the Mango on my Tailnet.

The various sleep commands sprinkled around are just laziness. You can probably guess why each of them exists.

I purposely chose to store the tailscale.state file on the flash drive. If I loan out my Mango to a friend, I might not want them connecting to my Tailscale network. If I pop the flash drive out, they won’t have any of the data needed to make a connection.

My GL.iNet Mango can’t use Tailscale as an exit node

And I am not sure exactly why! Tailscale routes packets without issue. I have this node configured as a Tailscale subnet router for its own local subnet. That seems to work correctly, so it is able to route packets from WiFi clients to nodes on my Tailnet.

I was hoping to be able to have the Mango route traffic through an exit node. That way, a Fire TV or Apple TV or something similar could watch American Netflix from Ireland, but it isn't cooperating with me.

At first I tried tailscale up --exit-node=seafile, but that immediately cut off all access to local clients connected to the Mango. I was able to ssh in via Tailscale and verify that the Mango was using the exit node.

I updated that command to tailscale up --exit-node=seafile --exit-node-allow-lan-access, and my Mango's local devices were able to talk to the Mango again, but they weren't able to pass traffic any farther than the Mango.

I am close, but not quite close enough!

UPDATE: I got my Mango routing properly through an exit node just a few hours after publishing this blog! This should most likely get a proper write-up, but here's the short answer: I added tailscale0 as an unmanaged interface in LuCI and made sure it was attached to the WAN firewall zone. I am guessing this lets the OpenWRT NAT rules do their thing!
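
If you'd rather do that from the shell than from LuCI, the UCI equivalent looks something like this. The zone index is an assumption, so check uci show firewall first, and note that OpenWRT 19.07 wants ifname where 21.02 and later use device:

# declare tailscale0 as an unmanaged interface
uci set network.tailscale=interface
uci set network.tailscale.proto='none'
uci set network.tailscale.device='tailscale0'
# attach it to the wan firewall zone (assumed to be zone index 1)
uci add_list firewall.@zone[1].network='tailscale'
uci commit
/etc/init.d/network restart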

What else can I do with my 32 gigabyte Tailscale USB drive?!

When I tested the viability of running Tailscale on a USB flash drive, I used a drive I had on hand. It was an extremely large drive in the physical sense. Once I knew it was working, I bought the smallest SanDisk Cruzer Fit that I could find. It was 32 GB, which was nearly 32 GB more storage than I needed!

While I was redoing things this week, I decided that I should find a use for the rest of that space. I installed Ventoy and a whole mess of bootable disk images. Ventoy should let the drive boot on both UEFI and legacy BIOS systems. Ventoy’s installation script even had an option to leave some space on the end of the drive, so I added a little 512 megabyte ext3 partition for OpenWRT to use.
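
The reserved-space option is a flag on Ventoy's installer script. A sketch, with a placeholder device name:

# install Ventoy and keep 512 MB unallocated at the end of the drive;
# /dev/sdX is whatever your flash drive shows up as
sudo sh Ventoy2Disk.sh -i -r 512 /dev/sdX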

My little Ventoy drive has images for:

  • Memtest86
  • FreeDOS
  • Ubuntu 22.04 installer
  • Xubuntu 22.04 installer
  • Windows 10 installer
  • Windows 11 installer

None of this is terribly exciting. I only boot up a computer with a USB drive once every few years now, but I did have to make several USB boot drives over the last few months. I had to reinstall Windows 10 on a laptop with a dead NVMe. I had to install Xubuntu 22.04 on my desktop when I upgraded to an NVMe. I had to run Memtest86 when I bought new RAM a few weeks ago.

I wish I thought to set this up sooner!

I should be carrying an identical bootable drive in my laptop bag, but I figure it can’t hurt to have spare boot images squirrelled away in my travel router’s USB port!

Conclusion

I think I made the correct choice by continuing to use the stock GL.iNet firmware on my Mango. If this were my permanent home router, it would be way more valuable having an extra 10 megabytes of flash for packages, but this isn’t my home firewall. This is a Swiss Army Knife that I keep in my laptop bag.

Being able to quickly configure the Mango to be a router using a wired connection, a router using WiFi, or a WiFi extender is so much more valuable in my laptop bag! Why can’t I do this easily with stock OpenWRT? Is there a package I don’t know about?!

How Much RAM Do You Need in 2022?

| Comments

I probably wouldn’t have given this much thought if I didn’t have a stick of RAM fail on me last year. I don’t know that I can remember another time when a stick of RAM that passed a long Memtest86+ test failed on me, and this was a total failure. The machine locked up and wouldn’t boot until I found and removed the bad stick.

Four sticks of RAM, One is Dead!

I couldn’t figure out whether I could do an advance RMA of my memory, and Corsair wanted me to RMA all four sticks as a set. I didn’t want to deal with downtime, and I didn’t want to buy RAM to hold me over while I waited, so I figured I’d limp along with this single-channel 24 GB configuration until it caused problems.

Running while short 8 GB really didn’t cause problems. Everything that I do fit pretty well into 24 GB of RAM. Even so, I bought a faster pair of inexpensive 16 GB DDR4-3200 DIMMs a few weeks ago, so I am back at 32 GB, back to a dual-channel configuration, and my slightly overclocked Ryzen 1600’s RAM is running at 2933 instead of 2666.

Some benchmarks are quite a bit faster with dual-channel RAM, but I'm not noticing the extra 8 GB.

I’ve always bought as much RAM as possible

Within reason. There are always diminishing returns, but any extra RAM will be used for disk caching. For the last two or three decades, disks have been slow. Really slow. Especially when it comes to random access.

A 7200 RPM disk can perform between 100 and 200 random reads or writes per second. That was true for my 13 GB IBM Deskstar drives twenty years ago, and is true even for the latest 18 TB 7200 RPM drives today. The heads in a mechanical disk have to wait until the data they need passes underneath. Any given point on the disk only passes under the read head 120 times each second.

In those days, extra RAM was the only thing hiding those slow seek times.

pat@zaphod:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi       9.0Gi       1.6Gi       917Mi        20Gi        20Gi
Swap:           23Gi        93Mi        23Gi
pat@zaphod:~$

My memory usage today is definitely higher than it was a decade ago, but I have always bought enough RAM to make sure at least 50% would be used for disk cache.

My last two workstations have had 32 GB of RAM

I am doing my best to think back over all my past desktop computers. The timeline is pretty fuzzy, but it seems like I approximately doubled the memory each time I upgraded. I almost listed each machine off here, but that feels unnecessary.

My FX-8350 had 32 GB of RAM, which was double the RAM in the giant HP laptop that it replaced. I put 32 GB into the Ryzen 1600 when I built it in 2017, and I left it at 32 GB when I replaced the RAM last month.

What’s different today?

NVMe and SSD drives are really fast!

We’ve needed to cache our disks with RAM for decades because our disks had been stuck at 200 random I/O operations per second. My first SSD could manage more than 5,000 I/O operations per second, and my new NVMe can handle 500,000 operations per second.

We don’t have to spackle over slow disks any longer. If I push my machine to the point where it only has a couple of gigabytes of free RAM to use as disk cache, it doesn’t matter. I won’t notice the difference.

While I am just sitting here writing this blog, my machine is using around nine gigabytes of memory for actual work. If I fire up a game, that will likely eat up another 10 or 12 gigabytes.

While I was limping along with only 24 GB in my machine, this was never a problem. Running a game might bring me down to just several gigabytes of RAM for disk cache, yet I didn’t notice any sort of slowdowns or stutters when I would switch back and forth between my game and productivity tasks.

My SATA SSD was fast enough!

I forgot to mention that the vast majority of the months where I was limping along with 24 GB of RAM happened before I upgraded to an NVMe. My 280 megabyte per second Crucial SSD that could only manage a few tens of thousands of I/O operations per second was plenty fast enough for me to never notice when I was down to just a couple of gigabytes of memory available for caching.

In the old days before solid-state drives, this would never have worked out. In the days when I only had eight gigabytes of RAM, my workstation would have felt like it was struggling if I only had a gigabyte or two of free RAM available for caching. If I didn’t have half my RAM available for cache, I would have been shopping for a memory upgrade!

The future we are living in is fantastic.

Your mileage may vary!

I was getting by just fine with 24 gigabytes, and I bet I could just barely squeak by with 16 gigabytes of memory, but I wouldn't want to bother trying. I definitely wouldn't want to give up dual-channel RAM, but if it were possible to buy 12 gigabyte DIMMs, I might have enjoyed having a dual-channel 24-gigabyte setup!

I’m using a handful of gigabytes of memory for Firefox, Thunderbird, Discord, and various other programs. The stuff that is normally running eats up around 9 gigabytes of memory.

The heaviest things I run are Davinci Resolve and games, but never at the same time. I don’t have enough GPU memory for that.

gigabytes of ram meme

In the old days, I would have at least one or two virtual machines running on my workstation. Today, I have a separate server in my office handling that job.

It used to be handy having my virtual machines on my laptop in the days when my only workstation was my laptop. It was awesome having everything with me at home, at the office, or on the road.

I get more value today having those virtual machines on a dedicated box. I don’t want Home Assistant or Octoprint to reboot just because my Nvidia driver goobers things up and forces me to reboot my desktop!

Besides which, it isn’t 2008 anymore. I don’t have to hope a coffee shop has WiFi. I don’t have to wiggle through an ssh tunnel to get to my data at home or in the office. I can share my phone’s 500-megabit Internet connection and connect to my machines around the world using Tailscale, and they’ll work just like they do when I’m at home.

You might need more memory for those tasks that I don’t have!

If you’re running a mess of virtual machines on your workstation, then you probably already know how much memory you need for those to comfortably fit. If the VM disk images live on an SSD or NVMe, maybe those machines don’t need as much RAM allocated as you think they do.

Those virtual machines are still computers, even if they are sharing processors and disks. Just like my desktop PC, our virtual machines don’t need to rely on cache memory nearly as heavily as they did before we had NVMe drives. The old rules of thumb from the days of slow mechanical disks just don’t apply anymore.

If you’re running make -j 32 on your 16-core Ryzen 5950X, you know you might need a lot of memory just to support all those compiler tasks, but it almost definitely isn’t a big deal if your whole source tree doesn’t stay cached all day. Your NVMe can touch hundreds of thousands of files every second without breaking a sweat!

Is swapping to an NVMe fast?!

I spent a week unscientifically messing with various swap and dirty page settings. I figured that Apple must be leaning on fast NVMe swap and paging to make their 8 GB M1 MacBook Air a usable machine. If they can do it, maybe I can force Linux to dip deeper into swap.
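
For the curious, the knobs I was fiddling with are sysctls along these lines; treat the values as examples of what I poked at, not recommendations:

# tell the kernel to prefer swapping over dropping cache
sudo sysctl vm.swappiness=100
# start writing dirty pages out sooner
sudo sysctl vm.dirty_background_ratio=5
sudo sysctl vm.dirty_ratio=10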

I was able to get about 5 or 6 GB onto my swap partition. When I did, things usually acted just fine. I couldn’t even tell you that my machine was swapping.

Every once in a while, though, things would get really goofy. The whole machine would just grind to a halt without much warning. I never timed it, but if I walked away, it would usually have worked itself out by the time I got back.

There’s probably somewhere between my current settings and the problematic settings that would work alright. Maybe the defaults would push me a few gigabytes into swap if I disabled half my RAM.

This was fun to experiment with a bit, but not worth spending a real amount of time working on.

Conclusion

It is better to err on the side of caution. If you need to round your memory requirements to the nearest pair of sticks of RAM, you should definitely round up instead of down. If you’re like me, and you think you can get by with 24 GB of RAM, then you had better buy 32 GB!

For decades I always made the same choice. If you asked me to choose between more memory or faster memory, I would always choose more memory. It wasn’t always a problem you could spend your way out of. Sometimes DIMMs with double the capacity were only available in slower speeds. Sometimes your chipset only supports faster RAM speeds with two DIMMs and not four.

My FX-8350 build in 2013 had 32 GB of RAM. My Ryzen 1600 build from 2017 has 32 GB of RAM. If I upgrade to a Ryzen 7800X, it will also have 32 GB of RAM. Before my FX-8350, every major computer upgrade I have gone through has at least doubled my RAM. This feels weird, but it is also amazing and awesome!

Six Months of lvmcache on My Desktop

| Comments

I admit it. It hasn’t quite been six full months since I put a fresh NVMe into my desktop machine and turned on lvmcache. I am nearly three weeks short of that date as I am writing this sentence, but it will probably be another week before I finish this blog, and if I wait any longer I might miss the target by a few months!

I believe I only have good news to report. I’ve torn down, rebuilt, or reconfigured the cache at least three times: once when I installed the NVMe drive, once when I split my slow storage volume into two pieces, and again when I replaced the ancient 4 TB drive with a 12 TB drive.

Here’s the tl;dr!

The cache is fantastic. It works well enough to cache all the games that I play, and they load just as fast as they would if they were installed directly on the NVMe.

I am no longer the least bit concerned about wearing out my flash storage. It sure looks like I won’t run out of write endurance until 10 years after Samsung’s 5-year warranty expires.

I split my slow storage into two separately cached volumes

I did some math a few weeks after setting up my lvmcache. My lvmcache partition on the NVMe is 300 GB, and I process around 200 GB of video files each month. The cache can absorb that much video just fine.

Quite a few of my Steam games are over 100 GB in size.

Testing says that the video files I am working on do indeed wind up nearly 100% cached. If we oversimplify the way lvmcache works, and we assume that the cache will be smart enough to always evict the older video files that I won’t be working on in the near future, this only leaves me enough room in cache for a single game.

-------------------------------------------------------------------------
LVM [2.03.11(2)] cache report of given device /dev/mapper/zaphodvg-slow
-------------------------------------------------------------------------
- Cache Usage: 99.9% - Metadata Usage: 6.6%
- Read Hit Rate: 66.2% - Write Hit Rate: 56.7%
- Demotions/Promotions/Dirty: 27129/27165/0
- Feature arguments in use: metadata2 writeback no_discard_passdown
- Core arguments in use : migration_threshold 8192 smq 0
  - Cache Policy: stochastic multiqueue (smq)
- Cache Metadata Mode: rw
- MetaData Operation Health: ok

-------------------------------------------------------------------------
LVM [2.03.11(2)] cache report of given device /dev/mapper/zaphodvg-churn
-------------------------------------------------------------------------
- Cache Usage: 85.2% - Metadata Usage: 24.4%
- Read Hit Rate: 50.1% - Write Hit Rate: 3.1%
- Demotions/Promotions/Dirty: 0/183205/0
- Feature arguments in use: metadata2 writethrough no_discard_passdown
- Core arguments in use : migration_threshold 8192 smq 0
  - Cache Policy: stochastic multiqueue (smq)
- Cache Metadata Mode: rw
- MetaData Operation Health: ok

The math says I should have used 300 GB for my operating system and 700 GB for the cache. Resizing the encrypted root filesystem and juggling everything around felt like too much effort, so I just set up a separate cache on my old Crucial 480 GB SATA SSD.
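
Attaching that second cache is the same dance as the first one. Here is a rough sketch of the commands involved; the partition path is a placeholder, and the pool size leaves the SSD a little breathing room:

# add the old SATA SSD to the volume group (placeholder path)
vgextend zaphodvg /dev/sdb1
# carve a cache pool out of it and attach it to the churn LV;
# the report above shows this cache runs in writethrough mode
lvcreate --type cache-pool -L 440G -n churn_cache zaphodvg /dev/sdb1
lvconvert --type cache --cachepool zaphodvg/churn_cache --cachemode writethrough zaphodvg/churn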

I call my cached volumes slow and churn

I probably need better names. The slow volume’s name has been grandfathered in, but the churn volume’s name is rather appropriate.

One of those volumes is where I churn through data. I dump 100 GB of video onto that volume at a time, work on it for a few weeks, then I dump another 100 GB of video. This just keeps happening every few weeks. This sounds a bit like churning, doesn’t it?!

The churn volume is an 8 TB slice of my new 12 TB hard drive, and it is cached by the old 480 GB SSD. That SSD is plenty fast enough to handle the 50 and 100 megabit files our Sony ZV-1 and Sony A7S3 cameras create.

The slow volume got its name because it just isn’t fast like the NVMe. This is where my Steam library lives. I have installed just over 2.3 TB of games that are being cached by a 300 GB partition on my 1 TB NVMe.

Is my mid-range NVMe going to survive being a cache?!

I was going to tell you that it depends on who you ask, but both the conservative and the pessimistic answers are positive!

My 1 TB Samsung 980 has a 5-year warranty with a guarantee of 320 TB of written data. I am right around 10 TB of writes after six months. That means Samsung thinks I will make it 10 years past their warranty period.

Percentage Used:                    1%
Data Units Read:                    22,669,873 [11.6 TB]
Data Units Written:                 19,392,238 [9.92 TB]
Host Read Commands:                 188,764,027
Host Write Commands:                308,581,800

The data in the SMART report says I have only used 1% of my writes. If that’s correct, then this NVMe will outlive me.
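
If you want to pull the same numbers for your own drive, smartmontools will oblige:

# NVMe endurance shows up under "Percentage Used"
sudo smartctl -a /dev/nvme0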

I assume that my writes have slowed considerably. I had to drop the lvmcache every time I resized my slow and churn volumes. That means 1 TB of those writes had to happen just to refill the cache.

I’m also no longer passing 300 GB of video files through the NVMe’s cache partition every month. My old Crucial SSD is bearing that weight now.

That little old SSD is doing a good job. It spent eight years as the primary storage device in my desktop computer, and SMART says it has 33% of its life remaining. The data sheet says the Crucial SSD is rated for 72 TB of writes, so it will probably make it through the next couple of years!

The conundrum of two caches

Is it better for me to have two separate caches? Would I be better off with one 800 GB cache instead of a 300 GB cache and a 480 GB cache? It is complicated, and I just can’t make up my mind about how I feel about this.

I do know for certain that I would much rather have both these caches on my NVMe. If they were both on the NVMe, I would adjust the proportions.

On one hand, dealing with a single storage volume can be much more convenient. When I installed my 12 TB hard drive, I had to decide how much space I needed for my Steam volume and how much space I would need for video files.

If I made the wrong choices, I will have to shrink one volume and extend the other in a year or two. I will have to disable and recreate both lvmcache caches to make that happen.
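
If that day comes, the dance will look something like this. The -r flag asks lvresize to resize the filesystem along with the volume, but shrinking an ext4 filesystem still means unmounting it first:

# drop both caches so the origin LVs can be resized
lvconvert --uncache zaphodvg/slow
lvconvert --uncache zaphodvg/churn
# shuffle a terabyte from churn to slow (unmount before shrinking!)
lvresize -r -L -1T zaphodvg/churn
lvresize -r -L +1T zaphodvg/slow
# then recreate both cache pools as before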

On the other hand, having two different caches handling two different kinds of data is a much more effective use of cache space. My Steam games that I play regularly tend to just stay put in the cache, and it doesn’t matter how long the older videos stay in their separate cache, because they won’t be pushing games out!

If I had one unified cache I bet it would probably take a month or more for old videos to get demoted. It wouldn’t surprise me if that means I’d have 300 GB of unnecessary video that I’ve already finished editing clogging up my cache at any given time.

One could argue that I could have sidestepped that problem by buying a 2 TB NVMe and using a bigger cache, but that doesn’t eliminate the issue. It just makes it a lot smaller, right? Besides, the goal was to save money by buying less flash storage!

I’m not running Linux! Is there some way I can do this on Windows?!

Yes. Maybe. Most likely.

My friend Brian Moses has been watching me talk about lvmcache for ages, and he’s been watching me post screenshots full of cache data for just as long on our Discord server. When he built his new gaming PC, he did some research and wound up buying a copy of Primocache.

I don’t think he’s run much in the way of benchmarks, and if he has, he hasn’t posted the results of his tests. I asked Google about Primocache for gaming, and the first hit is Leonard Putra’s video showing side-by-side footage of a few games loading with and without Primocache.

Primocache seems to be doing the job for Leonard. Three out of four games had a 98% cache hit rate. Just Cause 4 had a slightly lower hit rate, and it didn’t load much faster.

Some games just don’t benefit from faster disks. Most of the games I tested load just as quickly from my SATA SSD as they do from my much faster NVMe.

I have no first-hand experience with Primocache, but it certainly looks like it is worth checking out.

Conclusion

This sort of caching is only a Band-Aid. In five years, we will likely have more NVMe storage than we know what to do with.

In the meantime, I am excited to have lvmcache available. I only have a 1 terabyte NVMe, but I have 2.4 terabytes of games installed. How awesome is that?!

Are you thinking about using a solid-state disk cache in front of a slow disk on your desktop or workstation? Are you already caching your workstation with lvmcache or something similar? How is it working out for you? Let me know in the comments, or stop by the Butter, What?! Discord server to chat with me about it!

So Many Tailscale Exit Nodes!

| Comments

I don’t know how I managed to notice this, because I almost never open the Google Play store on my phone, but I did open it a few nights ago, and there was a Tailscale update waiting. I clicked the update button, and I think I might have had to open Tailscale to fire the VPN connection back up.

That’s when I noticed a menu option to enable using my phone as an exit node. What?! My phone is set to install Tailscale beta releases. This says it is a release candidate, so I guess this feature has been hiding on my phone for a little while already.

Of course I had to try it out. It works just fine. This did make me realize that I have yet to set up any exit nodes on my Tailnet, so it must be time to put exit nodes on all the things.

I set up an exit node on one of my virtual servers in the house, my Android phone, and on my Raspberry Pi server at Brian Moses’s house.

Then I got an email telling me that I paid $4.26 for the month for my Digital Ocean droplet that runs the Nginx server for several of our blogs. Why didn’t I think to enable the droplet as an exit node?! It is an exit node now.

What is an exit node? Why would you need one?

An exit node is how you get yourself some of the functionality of something like NordVPN or Private Internet Access for free. Once a machine is configured to be an exit node, any other machine on your Tailnet can force all their Internet traffic through that node.

Exit Nodes Everywhere!

What if you’re on your laptop at Starbucks and want to make sure the barista who owns the WiFi can’t snoop on your traffic? What if the network in your hotel is blocking access to YouTube? What if you’re in Ireland and want to watch shows that are only on American Netflix?

You just click on your Tailscale icon, choose the exit node option, and choose which exit node you want to route this computer’s Internet traffic through. All your traffic will flow through an encrypted Wireguard connection from your laptop in Ireland to your other computer in Plano, TX, and from there it will travel the unencrypted Internet to Netflix.
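
If you prefer the command line, the same thing is a pair of tailscale up flags. The 100.x.y.z address is a placeholder for your exit node's Tailscale IP, and the node still has to be approved as an exit node in the admin console:

# on the machine that will be the exit point
sudo tailscale up --advertise-exit-node
# on the laptop that should route through it
sudo tailscale up --exit-node=100.x.y.z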

Tailscale does the right thing again

It wasn’t until the next morning that I worried I had committed an offense! It seemed sensible to turn on at least one exit node at every physical location where I have a Tailscale node, and one of those nodes is my Seafile server at Brian Moses’s house.

I remembered that I am sharing the Seafile Pi with Jeremy Cook and my wife. Neither of these are nefarious characters that I would expect to abuse Brian’s Internet connection, but I certainly hadn’t thought about this, and I most definitely didn’t want to abuse my free colocation facility!

I didn’t need to worry. Tailscale does the right thing. If you activate an exit node after you’ve already shared the node, they won’t have access to the exit node. Not only that, but you can’t give your friends access to the exit node after the fact without their knowledge.

Tailscale Sharing Dialog

You have to send them a new share invite with the exit node enabled. I verified this by having Brian check to see if my Seafile server showed up in his list of available exit nodes.

Conclusion

Tailscale exit nodes are neat. Sometimes you need Netflix to think you’re in a different country. Sometimes you want to hide your traffic from Starbucks or your employer. Sometimes you just need to test that your website is working as expected from another physical location. A Tailscale exit node can cover all these situations and more.

I am not sure when I will need an exit node on an Android phone, but I am excited that I have the option, and I am excited about the idea of repurposing old Android hardware. You can run Octoprint on a phone using Octo4a, someone has set up a backup server on their old cracked Android phone, and now you can throw Tailscale on a cheap old phone from your junk drawer and leave an exit node behind anywhere you want. That’s awesome!