Measuring Battery Runtime Improvement With the Intel X25-M

There seem to be plenty of X25 performance benchmarks all over the Internet. Performance may have been one of the major reasons I upgraded my laptop to an X25, but it most certainly wasn't the only one.

Most of the power consumption benchmarks I have found don't seem to align very well with my usual on-battery workload. I am armed with a fresh battery and a new 80 GB X25-M, so I have to do some testing!

The Test Hardware

The laptop is a Dell Inspiron 6400 with a 1.66 GHz Core 2 Duo, an ATI Radeon X1400 (using the open source Radeon driver), an Intel 3945 wireless card, and 4 GB RAM (only 3.16 GB usable). The laptop is running 64-bit Ubuntu 9.04.

My Testing Workload

I made sure most of my usual applications were up and running before I unplugged the power cord. That would include emacs, Firefox, Thunderbird, Pidgin, and Powertop. Bluetooth is turned off, Wi-Fi is connected to an 802.11a network, and the LCD brightness is set to 48%.

I spent the battery's runtime doing some or all of the following:

  • Editing Perl code with emacs
  • Reading email
  • Reading my RSS feeds with Google Reader
  • Stumbling
  • Trying to beat Hard creeps in the Desktop Tower Defense Sandbox mode
  • Editing this blog entry
  • Cooking a frozen pizza
  • Brewing coffee in my Moka Express pot

The graphs!

For some reason my custom-built 2.6.31 kernels are very power-hungry. They are built from the options in Ubuntu's 2.6.28 kernel config from my /boot directory, so they should not be configured much differently from the Ubuntu 2.6.28 kernel. Even so, I don't have enough evidence to believe that 2.6.31 itself is any less energy efficient than 2.6.28.

It looks like I'm getting 10% more runtime with the SSD. I'm pretty happy with that. My 3-year-old laptop has lots of outdated and power-hungry components. The lowest wattage number I have seen out of powertop during these tests was in the 18.5 watt range. The long-term averages powertop was giving me were in the 22-23 watt range.

I'm under the impression that modern laptops with LED backlights, better chipsets, and newer, faster processors can get into the 16 watt range with the LCD brightness turned all the way up (I was running mine at half). It wouldn't surprise me at all if a newer laptop got a 15-20% improvement in runtime from this upgrade, since its mechanical disk would account for a larger share of total power draw.

Two Surprises Pointed Out By powertop

My laptop is already tweaked pretty heavily to save power, so I was surprised to see these two programs causing wake-ups.

PostgreSQL was causing about 5% of my CPU wake-ups. I really didn't want to have to shut down PostgreSQL, so I was very happy to learn that there is a configuration option that can be tweaked in postgresql.conf. My postgresql.conf had the option commented out:

#bgwriter_delay = 200ms                        # 10-10000ms between rounds

I changed mine to 10000ms, and now I rarely see it show up in powertop.
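
For reference, the change is a one-line edit to postgresql.conf followed by a reload; the init script name below is a guess, and it varies by distribution and PostgreSQL version:

bgwriter_delay = 10000ms                       # maximum allowed value; default is 200ms

sudo /etc/init.d/postgresql reload             # service name varies by distro/version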

The other surprise was gnome-power-manager. I thought I remembered it being fixed a few years ago; it used to wake up many times per second the entire time it was running. The version that came with Ubuntu 9.04 seems to be only partially fixed: if it is running when the AC power state changes, it reverts to its old behavior of waking up 10-20 times per second.

Killing gnome-power-manager and restarting it will fix it until the next time the AC power state changes. I need to work on making that happen automatically.
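
One untested idea is a udev rule that fires on power_supply change events and bounces gnome-power-manager. Everything below (rule file name, script path, user name, and display number) is a placeholder sketch, not something I have actually deployed:

# /etc/udev/rules.d/99-restart-gpm.rules
SUBSYSTEM=="power_supply", ACTION=="change", RUN+="/usr/local/bin/restart-gpm.sh"

# /usr/local/bin/restart-gpm.sh
#!/bin/sh
# Untested: bounce gnome-power-manager for the desktop user so it stops
# its post-AC-change wake-up storm. Assumes a single session on display :0.
pkill -u pat gnome-power-manager
su pat -c "DISPLAY=:0 gnome-power-manager" &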

Some Final Thoughts

The Intel X25-M is supposed to use 150 mW when active and about half that when idle. Most mechanical drives don't even get down to 150 mW when idle, and they require 1 to 2 watts or more when active.

A 10% increase in runtime on my 20-watt laptop fits those numbers pretty well: shaving 1 to 2 watts off a 22-watt average draw works out to roughly 5-10% of the total.

I would imagine that power gap grows wider as the disk load increases. My usual on-battery workload is very light on the disks but it sure doesn’t give them much chance to spin down.

Intel X25-M G2 vs. Old Laptop Drive Benchmarks

I thought it would be a good idea to run some quick benchmarks on my old mechanical drive before I wipe it and use it in another laptop. I have some pseudo-scientific charts and numbers here, but I also have some "me with a stopwatch" boot time numbers. They mostly show how abysmal my boot-up times are, largely because of all the extra junk I have starting up.

I clocked the time from hitting the power button to seeing grub at well under 2 seconds. For both tests, I brought up the grub menu and I did my best to hit enter and start my stopwatch (a.k.a. my old Treo 650) at the same time. When the GDM login appeared, I stopped the clock.

The 120 GB 7200 RPM laptop drive came in at a whopping 36 seconds. The X25 gave a much better result of 15 seconds. The software installed on both drives is exactly the same, but the root file system on the mechanical drive is ext3 instead of ext4. It isn't quite apples to apples, but it is what I have here in front of me.

Some Real Bonnie++ Benchmarks

Charts are nice, but here are the actual numbers to go along with them:

Version 1.03c                           ------Sequential Output------ --Sequential Input- --Random-
                                        -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine                              Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Laptop PIIX Deadline X25-M          6464M 48265  96 81809  23 37340  11 36500  70 108486  18  5502  12
Laptop AHCI Deadline X25-M          6528M 45836  96 74973  15 36691  10 43531  78 139407  17 13431  39
Laptop PIIX CFQ X25-M               6464M 47577  96 82931  21 38706  11 46920  91 116400  20  4482  26
Laptop PIIX CFQ 120 GB 7200 RPM     6464M 45043  91 53992  14 19104   5 37053  81  46116   7 161.7   0
Laptop AHCI CFQ 120 GB 7200 RPM     6528M 41883  87 49245  10 17348   4 43423  82  56122   7 178.1   0
Xen Server Deadline RAID 10         1080M 68909  97 128789 45 48402  18 55770  91 106948  24 326.2   0
Laptop PIIX DL X25-M v1.4*          6280M 41521  77 69559  15 23754   6 30857  78 142922  13  5309   0
Laptop PIIX DL X25-M v1.4 TRIMed**  6280M 50354  98 66394  16 24923   8 31937  80 119348  19  4569  10
Laptop AHCI DL X25-M v1.4 TRIMed*** 6528M 42587  81 49367  11 23277   7 41588  81 150217  21 14115  35
Core I7 Laptop AHCI DL X25-M        8192M 56415  89 87157  11 39827   9 69707  98 298590  29 16150  45
KVM to FreeNAS                      4096M   349  92 15619   4  6191   2  1884  95  57328   8   590  37

I included an old benchmark of my web server. It has four 7200 RPM SATA drives in a Linux software RAID 10. I am very happy with how well the X25-M compares to the array.

I tested both the CFQ and Deadline I/O schedulers on the X25. The machine felt more responsive while the test was running under the Deadline scheduler. I started to test the noop scheduler, but the machine felt much worse than it did during the CFQ test, so I didn't bother letting that test finish.
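
Switching schedulers for a quick test like this doesn't require a reboot; each block device exposes its scheduler through sysfs. Something like:

# Show the available schedulers; the active one is in brackets
cat /sys/block/sda/queue/scheduler

# Switch sda to the deadline scheduler on the fly
echo deadline > /sys/block/sda/queue/scheduler

Adding elevator=deadline to the kernel command line makes it the default at boot.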

Having AHCI and NCQ improves read performance pretty significantly and it seems to improve seek time dramatically, especially for the SSD. If you are able to use AHCI, I would recommend it.

I wanted to run some bonnie++ benchmarks against the X25 because I couldn't find any anywhere else. Bonnie++ doesn't do a very good job of highlighting the biggest advantage of the X25 because it only has one random access test.

Results From the "Butt Dyno"

Every time you make a performance modification to your car you have to take it for a test on the "butt dyno." It just means you get in the car and see how much faster it feels. Sometimes the performance boost is mostly in your head, like when you upgrade to a K&N air filter or a bigger exhaust.

Sometimes you install a bigger turbocharger. I remember the last time I did that. I bet I had a great big smile on my face during that first shift from first to second gear. It made a very noticeable difference in the performance of the car.

That's the way the X25-M feels. Everything loads a bit faster and I/O intensive background tasks are much less likely to bog down the rest of the machine. I sure didn't enjoy booting up the old platter drive to run the benchmarks on it…

How Hot is the X25-M?

When I pulled the X25 out to swap in the old drive, I noticed that it was warmer than I expected it to be. I imagine the metal housing makes it feel a bit warmer than an overworked SD card would. I thought it was almost as warm as the old drive used to get.

I was very wrong. I pulled the mechanical drive out almost immediately after the bonnie++ runs. I would describe it as actually being hot. I put it down pretty quickly; it was uncomfortable to hold.

*Update 2009-10-27: X25-M v1.4 Firmware Update

I reran bonnie++ with the latest firmware update. My test methodology is a little unfair: the previous tests were done on a very fresh drive, while the drive is now over 80% full and probably has hundreds of gigs of rewrites on it.

I probably should have run a quick benchmark before I updated the firmware, but it is too late for that now! I am not surprised that most of the numbers went down. I am very surprised the sequential input speed is up by 30%. That is outperforming even my previous speeds with AHCI enabled.

**Update 2009-11-15: X25-M after a TRIM

I was able to TRIM my X25-M, so I reran bonnie++. Per-character sequential output improved. I have no idea why sequential block input was up so high on the last test.

***Update 2009-12-15: X25-M freshly TRIMed with AHCI and 2.6.31 kernel with BFS

I was pointed to a better AHCI quirks patch in a comment by felix krull.

The patch is working very well so far, and I was itching to run a benchmark with AHCI and the newer v1.4 X25-M firmware. This benchmark may not be quite apples to apples, though… I'm currently running Linux kernel version 2.6.31 patched with the BFS scheduler and most of the preempt options turned on.

Update 2010-03-24: X25-M transplanted into my new HP DV8T Core i7 laptop

The SATA 150 bottleneck is now gone. I expected it to break 200 MB/s on the sequential input test, but I certainly didn't expect to be approaching 300 MB/s.

Update 2012-02-08: Bonnie++ in a KVM virtual machine…

This one is quite unrelated to the original topic of this article, but this seems to be the place where I've been dumping all my random disk benchmarks…

I recently helped a friend build his FreeNAS box, and I figured I'd try out its iSCSI functionality and hammer on his new home network a little bit. I'm running a KVM virtual machine on my laptop, connected to an iSCSI target on the FreeNAS server over gigabit Ethernet. The disk shows up as an ordinary KVM/QEMU disk inside the virtual machine.

The iSCSI target is a file-based extent sitting on a four-drive RAID-Z2 volume. The results are less than stellar, but not really any worse than I was expecting. I'm impressed that the IOPS are nearly twice as high as in my benchmark of my old RAID 10 Xen server.
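
For anyone curious about the plumbing, this is roughly what it looks like using the host's open-iscsi initiator; the target name, address, and device node below are all made up for the example:

# Discover and log in to the target on the FreeNAS box
iscsiadm -m discovery -t sendtargets -p 192.168.1.50
iscsiadm -m node -T iqn.2011-03.example:extent0 -p 192.168.1.50 --login

# Hand the resulting block device straight to the guest
# (check dmesg for the actual device node; /dev/sdb is an assumption)
kvm -m 1024 -drive file=/dev/sdb,if=virtio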

Intel X25-M G2 Upgrade, and a Lack of AHCI

My shiny new 80 GB second-generation Intel X25-M has arrived.  So far I am very happy with it, and it is very fast.  I don't have many real numbers yet, just a couple of bonnie++ benchmarks:

    Version 1.03c                ------Sequential Output------ --Sequential Input- --Random-
                                 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine                 Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
    cfq                    6464M 47577  96 82931  21 38706  11 46920  91 116400  20  4482  26
    deadline               6464M 48265  96 81809  23 37340  11 36500  70 108486  18  5502  12

    xenhost                1080M 68909  97 128789 45 48402  18 55770  91 106948  24 326.2   0

The first two entries are my laptop with the cfq and deadline schedulers. The third entry is an old benchmark of the server hosting all my Xen virtual machines.  It is running Linux software RAID 10 on four 400 GB 7200 RPM SATA disks.

At first glance I was pretty happy with how well the X25 kept up with the RAID 10, and the numbers certainly beat my old laptop disk by a huge margin.  I was especially happy with the 4500-5500 random seeks per second. The sequential numbers seemed a bit low to me, though, so I tried a simpler test:

    root@zaphod:~# dd if=/dev/sda2 of=/dev/null bs=2M count=500
    500+0 records in
    500+0 records out
    1048576000 bytes (1.0 GB) copied, 10.5582 s, 99.3 MB/s

All my simple tests with dd are pegged out at around 100 MB/sec.  After doing some research, I learned that the BIOS in my laptop is setting my ICH7 chipset to compatibility mode.  This is limiting the drive to UDMA/133 speeds, which probably puts a real world upper limit in the 100MB/sec range.

The BIOS in my Dell Inspiron 6400 does not let me change the mode of the ICH7.  There seems to be at least one kernel patch that attempts to enable AHCI after boot up.  I might give it a try in the next few days and see what the numbers look like.

This has still been a huge performance increase over my 120 GB, 7200 RPM laptop drive, even without being able to use the full potential of the X25.  My unscientific "one-hippopotamus, two-hippopotamus" boot-up test easily comes in at under 10 seconds from grub to login screen (I'm running Ubuntu 9.04).  I am pretty certain that the old drive was in the 15-16 second range.  I'll have to boot the old disk, use a stopwatch, and run some benchmarks later in the week.

Forcing AHCI To Increase Intel X25-M Performance

The BIOS in my Dell Inspiron 6400 does not allow me to set the ICH7 SATA controller to AHCI mode. I grabbed a fresh copy of the Linux 2.6.31 kernel source and applied this AHCI quirks patch to it. I copied the Ubuntu /boot/config-2.6.28-15-generic to the new source directory and ran a make oldconfig.
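
The whole process boils down to a few commands; roughly (the patch file name is whatever you saved it as):

# Apply the quirks patch to a fresh kernel tree
cd linux-2.6.31
patch -p1 < ahci-quirks.patch

# Start from the Ubuntu 2.6.28 configuration
cp /boot/config-2.6.28-15-generic .config
make oldconfig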

With the stock Ubuntu 2.6.28-15-generic amd64 kernel, dmesg showed that the controller was using the ata_piix driver and NCQ was disabled:

[    1.696401] scsi0 : ata_piix
[    1.696594] scsi1 : ata_piix
[    1.860550] ata1.00: 156301488 sectors, multi 8: LBA48 NCQ (depth 0/32)

With the 2.6.31 kernel with the patch applied, dmesg showed:

[    1.918218] scsi0 : ahci
[    1.918414] scsi1 : ahci
[    2.400540] ata1.00: 156301488 sectors, multi 8: LBA48 NCQ (depth 31/32)

This is definitely an improvement, so I ran another quick dd to see if there was a change:

root@zaphod:~# dd if=/dev/sda2 of=/dev/null bs=2M count=500
500+0 records in
500+0 records out
1048576000 bytes (1.0 GB) copied, 8.45229 s, 124 MB/s

That was definitely an improvement, so I thought it was time to run bonnie++ again:

Version 1.03c                     ------Sequential Output------ --Sequential Input- --Random-
                                  -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine                      Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
2.6.31 ahci deadline        6528M 45836  96 74973  15 36691  10 43531  78 139407  17 13431  39
2.6.28 piix deadline        6464M 48265  96 81809  23 37340  11 36500  70 108486  18  5502  12
2.6.28 piix cfq             6464M 47577  96 82931  21 38706  11 46920  91 116400  20  4482  26

Enabling AHCI and NCQ gave me a small decrease in sequential output performance, a very nice increase in sequential input performance, and an insane increase in seeks per second. I can only assume that the seeks per second was helped so tremendously by NCQ.

So far I have found two major issues with using this kernel patch: the laptop won't resume from a suspend, and my optical drive has disappeared. From what I can tell, these two problems vary by machine.

On an unrelated note, my Wi-Fi (iwl3945) connects much faster with 2.6.31 than it did with 2.6.28.

Experimenting with Compressed Swap

The Intel X25 has once again encouraged me to find something very interesting. I've been reading a bit about the X25's "endurance management" feature. It sounds as though the drive switches to some sort of slower write strategy if writes continually exceed an average of 20 GB per day.

My laptop doesn't tend to swap at all most of the time. Even under my heaviest memory usage, when I have multiple test virtual machines running, I only seem to dip about 200 MB or so into my swap.

Testing Out compcache

Ubuntu 9.04 seems to ship with an older version of compcache, so I decided to build my own modules from the latest tarball. The instructions on the website were very straightforward, and I had things up and running in no time.

So far I have not compiled a kernel with the 'swap free notify' patch applied. It does look like it would be very useful, since it would allow compcache to free up unused swap space much faster.

For testing purposes I disabled my original swap partition. I initialized a ramzswap device with the default size (25% of total memory) and activated it. Here is what free showed before I tried to fill up more memory than I ever would in practice:

             total       used       free     shared    buffers     cached
Mem:       3310124    3105004     205120          0     194312    1777532
-/+ buffers/cache:    1133160    2176964
Swap:       827524          0     827524

swapon -s shows:

Filename                            Type            Size    Used    Priority
/dev/ramzswap0                          partition   827524  0       100
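
Getting to this point only takes a few commands. A rough sketch, assuming the modules were built from the compcache tarball (option names may differ between compcache versions):

swapoff /dev/sda1                    # take the on-disk swap offline
insmod ./ramzswap.ko                 # module built from the compcache source
rzscontrol /dev/ramzswap0 --init     # default size: 25% of RAM
swapon -p 100 /dev/ramzswap0         # activate it at a high priority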

I fired up six QEMU machines with memory sizes between 384 and 512 MB for a total of 2432 MB.

Here is what free looked like after letting the machines boot up and settle down for a bit:

             total       used       free     shared    buffers     cached
Mem:       3310124    3293424      16700          0       1768      86740
-/+ buffers/cache:    3204916     105208
Swap:       827524     600916     226608

At this point, cached data dropped from 1.7 GB down to under 100 MB and the poor little laptop was pushing 600 MB into swap. The rzscontrol program can show us some interesting statistics about the swap space that is in use:

OrigDataSize:     575584 kB
ComprDataSize:    157969 kB
MemUsedTotal:     159260 kB

rzscontrol is showing that 575 MB of data is currently swapped out and it is only taking up 159 MB of RAM. If that ratio holds steady, I could fill that 800 MB swap space and only eat up a little over 200 MB of RAM.

I even got a little meaner and started running memtest86+ in the virtual machines to make sure I wasn't getting unusually good results because of zeroed-out pages. I couldn't get the compression ratio to drop below 3:1.

I also tried a similar experiment using physical swap… It brought my laptop to a crawl. When swapping to compcache, the machine didn't feel any different than when it wasn't swapping at all.

Compcache, Suspend to Disk, and Swap Files

I am currently running compcache on my laptop. I don't plan to use it as my only swap space, though. I activated my old swap partition with a lower priority than my compcache swap space:

Filename                            Type            Size    Used    Priority
/dev/ramzswap0                          partition   827524  0       100
/dev/sda1                               partition   4000176 0       1
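
Re-adding the disk swap at a lower priority is one more swapon call:

swapon -p 1 /dev/sda1    # only touched once higher-priority swap fills up (in theory)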

I haven't done much testing of how this will behave. I'm hoping it won't reach for the physical swap space until the ramzswap device fills up. I don't believe I will be that lucky, though. Compcache has an option to let you back a ramzswap device with a physical swap device. When I tested this, dstat showed disk activity on my swap partition the whole time swap was in use, which was pretty much what I expected based on the documentation.

Suspend to disk seemed to "Just Work" as long as I had my old swap partition activated. Ubuntu uses swsusp to suspend to disk, which means it can't suspend to a swap file. I was hoping I could just create a swap file for those rare occasions when I actually need to suspend to disk instead of to RAM.

How I Reduced My Virtual Machine Disk Images By Over 75% With QEMU

Lately I've been wanting to replace the hard drive in my laptop with a nice fast second-generation Intel X25-M. Unfortunately, my disk footprint is just a little too high to comfortably fit on the disk I want to buy, so I have been looking to free up a little space wherever I can.

What I Started With

I had three test VirtualBox images totaling around 8.5 GB. I figured that since two of the images were running the same version of Ubuntu, QEMU's qcow2 copy-on-write disk images would be able to save me a bit of room. My original three machines looked like this:

total 8.5G
3.8G OpenVZ 1 Test.vdi
2.8G OpenVZ 2 Test.vdi
1.9G patshead.com Test.vdi

The first two were Ubuntu 8.04 servers; the last was running Ubuntu 9.04. I decided that it would be easiest to rebuild these machines from scratch and migrate over whatever data was needed.

Creating The Base Disk Images

In order to stay organized, I created a pair of directories to keep disk images in: one for the read-only base images, and another for the writable images:

~/qemu/qcow-ro
~/qemu/qcow-rw

I created two temporary disk images to hold fairly stripped-down installations of Ubuntu Hardy and Jaunty server, and I created the initial machines without swap space. I made sure to install all the latest updates and remove any extra outdated kernel packages (they are surprisingly large!). I also made sure to install any software that I knew I would want available in all my future machines.

Once everything was installed, I needed to clean things up, and I think I made a small error here. I ran apt-get clean to remove any downloaded packages. I probably should have zeroed out the files with a command like shred -n 1 -z /var/cache/apt/archives/*.deb instead.

I also made sure to delete the ssh host keys (rm /etc/ssh/ssh_host_*). When I boot a new image, I make sure to run dpkg-reconfigure openssh-server to generate a new set of keys for the new server.
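
Condensed into one place, the cleanup pass on each base image looked something like this (the shred step is the one I skipped and now wish I hadn't):

# Zero out the cached packages before deleting them so the space compresses well
shred -n 1 -z /var/cache/apt/archives/*.deb
apt-get clean

# Remove the ssh host keys; each clone regenerates its own on first boot with
# dpkg-reconfigure openssh-server
rm /etc/ssh/ssh_host_*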

At this point I was left with two images that needed to be shrunk down. The compression of qcow2 images is only applied when the image is created; any later writes to the same disk image will be uncompressed. I used qemu-img convert to recompress the images.
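
The -c flag to qemu-img convert does that in one step; for example (the file names here are just illustrative):

qemu-img convert -c -O qcow2 hardy-tmp.qcow2 ~/qemu/qcow-ro/hardy-root.qcow2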

This left me with two smaller base images:

total 834M
438M hardy-root.qcow2
397M jaunty-root.qcow2

From here I just had to create more qcow2 images using one of these two as the base. Specifying the full path for the base_image seemed to be important. For example:

qemu-img create -b ~/qemu/qcow-ro/hardy-root.qcow2 -f qcow2 ~/qemu/qcow-rw/new-hardy-root.qcow2

To save a bit more space, I created an empty swap disk image and ran mkswap on it. The scripts I wrote for starting my QEMU machines make a copy of the empty swap image as needed, and the copies are deleted when the machine shuts down.
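
One way to build such an image (the 512 MB size is just an example) is to format a raw file as swap and then compress it into qcow2, since an empty swap area is almost entirely zeros:

# Create a sparse raw image and write a swap signature to it
qemu-img create -f raw EMPTYSWAP.raw 512M
mkswap EMPTYSWAP.raw

# Compress it down to a tiny qcow2 image
qemu-img convert -c -O qcow2 EMPTYSWAP.raw EMPTYSWAP.qcow2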

Squeezing Out a Bit More Space

After I finished loading all the data into my test images, I decided to try recompressing them with qemu-img convert. In one case I saved about 200 MB, which was about 40%. In most cases, though, the images got bigger! Zeroing out files instead of deleting them probably would have helped quite a bit in this case as well.

What I Ended Up With

After setting up equivalents of my original three machines, my disk footprint was just under 2 GB. I was very happy with that result. I have added another server and I am now right around 2.5 GB. Here are the images I am currently using:

qcow-ro:
total 834M
438M hardy-cow-root.qcow2
397M jaunty-cow-root.qcow2

qcow-rw:
total 1.6G
 28K EMPTYSWAP.qcow2
757M OpenVZ-Test-1-root.qcow2
348M OpenVZ-Test-2-root.qcow2
3.0M Patshead-dev-root.qcow2
517M movabletype-root.qcow2
 28K movabletype-swap.raw

Overall, I am very happy with the savings in disk space. It is also less costly in both time and disk space to throw up a temporary test machine if I need one.

The biggest drawback is that QEMU, even with the KQEMU accelerator, is quite a bit slower than VirtualBox or VMware Server. Fortunately, the performance is more than acceptable for what I'm using it for. Your mileage may vary, of course.

Now I just have to wait for the G2 X25-M drives to ship again…