Everyone Struggles to Identify Bottlenecks

Identifying bottlenecks is a skill that many of us struggle with. Learning to pinpoint bottlenecks is important, and learning which bottlenecks to ignore can save you a lot of money.

In this blog post, we’ll explore the concept of bottlenecks and how they can impact your computing setup. We’ll discuss common types of bottlenecks, such as CPU, GPU, network, and IOPS, and provide tips and strategies for identifying and addressing them. Whether you’re a seasoned homelab enthusiast or just starting out, understanding bottlenecks is crucial for optimizing your system’s performance and getting the most out of your hardware investments.

What is a bottleneck?

You have seen a bottle. When you pop the top and tip that bottle over, the liquid won’t instantly exit the container. Its exit is slowed by the narrow neck of the bottle. The wider the neck, the faster the liquid can flow.

You see bottlenecks everywhere. The door to your house will only let one person through at a time. The escalator at the mall only has room for two people on each step. The ramp onto the 4-lane highway is one lane wide. Your faucet is only an inch wide, so it takes a long time to fill a pot of water.

Every IT department faces similar constrictions at every level of their operation.

You will always have a bottleneck, and that is OK!

This section almost came in right before the conclusion, but I think it is a better idea to discuss this closer to the top. No system will be perfectly balanced. There will always be a bottleneck somewhere. Every time you eliminate one bottleneck, that will just move the bottleneck elsewhere.

What matters most is that your system is designed in such a way that your bottleneck is acceptable. Performance may be the primary driver behind your design, but cost is usually a significant factor as well.

10-gigabit Ethernet has gotten pretty cheap, while 40-gigabit Ethernet hardware is still extremely expensive. Just because your 10-gigabit network is slower than the disks in your NAS doesn’t mean that the bottleneck is a problem. Would upgrading to 40-gigabit Ethernet ACTUALLY improve your workflow? Four times faster than fast enough is also fast enough.

You may learn that plain old gigabit Ethernet will do the job well enough, and that 2.5-gigabit Ethernet won’t cost you much more.

Your CPU might be bottlenecking your GPU!

PC gamers seem to love using the word “bottleneck,” and they really seem to enjoy using it as a verb. I dislike the verbing of the word bottleneck because it seems like a significant percentage of the people using “bottleneck” as a verb aren’t using the term correctly.

There are real bottlenecks in a gaming PC. The CPU, the GPU, and your monitor each have to do work for every frame that is displayed. Every game needs a different balance, but whichever component is maxed out first is your bottleneck.

If your GPU is near 100% utilization and your FPS isn’t keeping up with your monitor, then your GPU is a bottleneck.

If your GPU is significantly below 100% utilization and your FPS isn’t keeping up with your monitor, that means your CPU is the bottleneck.

When your CPU and GPU are both barely breaking a sweat while running your game at your monitor’s maximum refresh rate, then it might be time for a monitor upgrade! Your 60 Hz 1080p monitor might be a bottleneck, because your other components have room to render at a higher resolution or more frames per second. It might be time for a nice 34” 3440x1440 165 Hz upgrade!
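The three rules above boil down to a short decision procedure. Here is a minimal Python sketch of it. The function name and the 95% "GPU is maxed out" threshold are my own assumptions for illustration, and real games will be messier than this.

```python
def find_gaming_bottleneck(gpu_util_pct, fps, refresh_hz, gpu_busy_pct=95.0):
    """Guess which component is limiting the frame rate.

    gpu_busy_pct is an assumed cutoff for "near 100% utilization".
    """
    if fps < refresh_hz:
        # Not keeping up with the monitor: blame the GPU if it is
        # maxed out, otherwise the CPU is the likely limit.
        return "gpu" if gpu_util_pct >= gpu_busy_pct else "cpu"
    # Already matching the refresh rate: the monitor is the limit.
    return "monitor"


# Hypothetical readings: 97% GPU utilization, 140 FPS on a 144 Hz monitor.
print(find_gaming_bottleneck(gpu_util_pct=97, fps=140, refresh_hz=144))  # gpu
```

A single-threaded game can pin one CPU core while overall CPU utilization looks low, which is why this sketch only looks at GPU utilization and frame rate.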

I didn’t have a terribly appropriate photo to use in this section, so I dropped in a test clip of Severed Steel running on my Ryzen 5700X with a Radeon 6700 XT. You can see that the game can’t quite manage to maintain a constant 144 frames per second, and my GPU is at 97% utilization and max wattage, so the GPU is holding me back from keeping up with my monitor!

The network is always the bottleneck of your NAS

I hate to use the word “always.” It always feels like a lie. As long as your NAS is ONLY being used as a NAS, then this is almost definitely correct. Sharing files tends to be light-duty work.

A 10-gigabit Ethernet connection can move data at about 1 gigabyte per second. That sounds fast, but it really isn’t! Three 20-terabyte hard disks can nearly max that out. A pair of SATA SSDs will be a little faster, and any NVMe drive you can buy will be a lot faster.
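Finding the bottleneck in a chain like this is just finding the slowest stage. The sketch below does that for a few hypothetical NAS builds. Only the 1 gigabyte per second network figure comes from the paragraph above. The per-drive speeds are round numbers I am assuming for illustration, so plug in your own measurements.

```python
def slowest_stage(stages):
    """Return (name, MB/s) for the stage with the lowest throughput."""
    name = min(stages, key=stages.get)
    return name, stages[name]


# All figures are rough sequential speeds in megabytes per second.
NETWORK_10G = 1000   # "about 1 gigabyte per second" from the text
HDD = 280            # assumed speed of one large hard disk
SATA_SSD = 550       # assumed speed of one SATA SSD
NVME = 3500          # assumed speed of one NVMe drive

builds = {
    "three hard disks": {"disks": 3 * HDD, "network": NETWORK_10G},
    "two SATA SSDs": {"disks": 2 * SATA_SSD, "network": NETWORK_10G},
    "one NVMe drive": {"disks": NVME, "network": NETWORK_10G},
}

for label, stages in builds.items():
    name, speed = slowest_stage(stages)
    print(f"{label}: limited by {name} at {speed} MB/s")
```

With these assumed numbers, only the three-disk build is limited by its disks, and even then only barely.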

I paid less than $100 for a low-power motherboard and CPU combo ten years ago, and it could share files over CIFS or NFS as fast as the PCIe slot could go via my 20-gigabit InfiniBand network. The available PCIe slot was my bottleneck then because it maxed out at around 8 gigabits per second, so I was definitely underutilizing my network at the time!

This is what encouraged me to start writing this post. There was a tiny Celeron N100 server posted on r/homelab recently that had slots for four NVMe drives. It is the more compact sibling of my own Celeron N100 homelab box with 5 NVMe slots.

So many comments on that Reddit post were complaining that each NVMe slot only has a single PCIe 3.0 x1 connection. These folks are all bad at finding bottlenecks! That little server only has a pair of 2.5-gigabit Ethernet ports, so any single NVMe drive installed in that miniature server will be more than one and a half times as fast as all the network ports combined.
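You can check the math on that little server yourself. PCIe 3.0 signals at 8 gigatransfers per second per lane with 128b/130b encoding. The sketch below compares one x1 slot against both Ethernet ports. It ignores protocol overhead on both sides, so treat the results as ceilings rather than benchmarks.

```python
PCIE3_GT_PER_LANE = 8.0       # PCIe 3.0 raw signaling rate per lane
PCIE3_ENCODING = 128 / 130    # 128b/130b line encoding efficiency


def pcie3_gbit(lanes):
    """Usable PCIe 3.0 bandwidth in gigabits per second, before protocol overhead."""
    return lanes * PCIE3_GT_PER_LANE * PCIE3_ENCODING


nvme_gbit = pcie3_gbit(1)      # one NVMe slot with a single lane
network_gbit = 2 * 2.5         # two 2.5-gigabit Ethernet ports

print(f"one x1 NVMe slot:  {nvme_gbit:.2f} Gbit/s")
print(f"both network ports: {network_gbit:.2f} Gbit/s")
```

Even with a single lane, one drive has more bandwidth than the network can ever ask of it.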

What if your NAS isn’t just a NAS?

I wholeheartedly believe that you should cram as much stuff onto your homelab server hardware as possible. Serving files over CIFS or NFS might occasionally max out your network port, but it usually leaves you with a ton of CPU and disk I/O left to burn. You might as well put it to good use!

Running Jellyfin or Plex in a virtual machine or container on a NAS is quite common. I am fortunate enough that my Jellyfin server rarely has to transcode any video, because most of my Jellyfin clients can directly play back even the 2160p 10-bit files in my meager collection.

I do have one device that requires transcoding. My Jellyfin server can transcode 2160p 10-bit video at 77 frames per second, and it can transcode 1080p 8-bit video at well over 200 frames per second. That means the GPU on my little Celeron N100 is the bottleneck when transcoding video, and I will be in trouble if I need to feed more than three devices a 10-bit 2160p video at the same time.

That bottleneck is so wide that I will never need to worry about it, and the transcoding runs a little faster when two or more videos are going simultaneously, so I wouldn’t be surprised if I could squeeze in a fourth playback session!
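The "three devices" figure falls out of simple division. This sketch assumes each client plays back at 24 frames per second, which is typical for movies but is my assumption and not a measured number. The function name is made up for illustration.

```python
def max_realtime_streams(transcode_fps, playback_fps=24.0):
    """How many streams can be transcoded at least as fast as they play back."""
    return int(transcode_fps // playback_fps)


print(max_realtime_streams(77))    # 2160p 10-bit at 77 FPS -> 3
print(max_realtime_streams(200))   # 1080p 8-bit at 200 FPS -> 8
```

If total transcoding throughput climbs a bit with multiple sessions running, it would only need to reach 96 frames per second to cover a fourth 24 FPS stream.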

Sometimes, the bottleneck is IOPS and not throughput

Old mechanical hard disks have reasonably fast sequential throughput speeds. A 12-terabyte hard disk will be able to push at least 100 megabytes per second on the slow inside tracks, and a 22-terabyte disk has enough platter density to push nearly 300 megabytes per second on the fast outer tracks.

Every 7200 RPM hard disk has a worst-case seek time of roughly 8 milliseconds (one full rotation of the platter takes about 8.3 milliseconds). That works out to only about 120 I/O operations per second (IOPS). A single hard disk has enough throughput to easily stream a dozen 4K Blu-ray movies, but it might only be able to insert 120 records per second into your PostgreSQL database.
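The 120 IOPS number lines up with one random access per full platter rotation at 7200 RPM. Here is that arithmetic as a small sketch. It assumes every operation pays one full rotation and nothing else, which is a simplification of how a real disk behaves.

```python
def rotation_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000 / rpm


def iops_from_latency(latency_ms):
    """Operations per second when each operation waits latency_ms."""
    return 1000 / latency_ms


per_access = rotation_ms(7200)
print(f"{per_access:.2f} ms per rotation")              # 8.33 ms
print(f"{iops_from_latency(per_access):.0f} IOPS")      # 120 IOPS
```

A 5400 RPM disk lands at about 90 IOPS by the same math, so spinning faster doesn't get you anywhere near solid-state numbers.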

The cheapest SATA SSD can handle tens of thousands of IOPS, while the fastest NVMe drives are starting to reach 1,000,000 IOPS. These drives are fast when streaming a Blu-ray, and they don’t slow down when you start updating random people’s phone numbers in your 750-gigabyte customer database.

The vast majority of people adding a NAS to their home network are storing video files from the high seas, or they are storing backups. If you fit either of these descriptions, then you probably only need inexpensive 3.5” hard disks.

My personal video storage is mostly footage taken straight off my own cameras, and I work with that footage in DaVinci Resolve. I layered a few hundred gigabytes of lvmcache on top of slow video storage, because 8-millisecond seek times add up to noticeable lag when you are bouncing around a timeline that references three or four videos.

That is one seek to get to the correct video frame, at least one more seek to backtrack to the previous keyframe, then maybe a third seek to pull in enough video to start rendering. Repeat that for each of the three or four videos on the timeline, and it adds up to around 100 milliseconds on a mechanical hard disk before the GPU even gets to start decoding and rendering the video, while it would take less than one millisecond on any solid-state storage device. That is a difference you can feel!
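Here is the rough math behind that 100-millisecond figure. The hard disk numbers come from the text: about three seeks per video, 8 milliseconds per seek, and up to four videos on the timeline. The 0.05-millisecond SSD access time is an assumed round number, not a measurement.

```python
def scrub_latency_ms(clips, seeks_per_clip, seek_ms):
    """Total seek delay before decoding can start after a timeline jump."""
    return clips * seeks_per_clip * seek_ms


hdd_ms = scrub_latency_ms(clips=4, seeks_per_clip=3, seek_ms=8.0)
ssd_ms = scrub_latency_ms(clips=4, seeks_per_clip=3, seek_ms=0.05)  # assumed SSD access time

print(f"hard disk: {hdd_ms:.0f} ms")   # 96 ms
print(f"SSD:       {ssd_ms:.1f} ms")   # 0.6 ms
```

This treats the seeks as strictly serial. A real editor may overlap some of them, so think of it as a worst case.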

Caching to an SSD is a great way to smooth out some of the rough edges. The SSD can catch thousands of those database updates and flush them back to the slow disk later on. My SSD cache is big enough to hold one or two projects’ worth of video files, so it is usually only holding on to the data that I need to work with this week.

Conclusion

In summary, understanding and addressing bottlenecks is crucial for optimizing the performance of your NAS and homelab setup. Identifying which component is constraining your system can make a world of difference, and recognizing these limitations can help you make informed decisions about upgrades and configurations, or even whether or not you should worry about upgrading anything at all!

It is your turn to contribute to the conversation! Share your insights, experiences, or questions related to this topic in the comments below. Have you encountered any unexpected bottlenecks in your own setup? How did you overcome them? Was the upgrade to reduce your bottleneck worth the expense? Let’s learn from each other and continue to refine our systems.

If you’re interested in connecting with a community of homelab, NAS, and even gaming enthusiasts, consider joining the Butter, What?! Discord community. You can engage in discussions, share knowledge, and stay up to date on the latest trends, developments, and deals in the world of homelabbing.
