From ArchWiki
Revision as of 09:13, 15 November 2013 by Mikeserv (Talk | contribs) (Installing the Kernel)


Example installation

Here is my Bcache installation. Maybe you can use some of the ideas. :)

Prepare the storage drive

Bcache-tools installation

Only fakeroot and binutils are strictly needed, but the build will fail with an error if you install just those. wget is left over from a failed ArchBang install :(

pacman -Syy
pacman -S base-devel fakeroot binutils wget git
cd ~/
tar -xzvf bcache-tools-git.tar.gz
cd bcache-tools-git
makepkg -si --asroot

Making the partitions

For /boot I use the slower HDD, because I had trouble with my SSD loading the kernel while my HDD was still spinning up :P

Starting with the faster SSD drive

fdisk /dev/sda

Partition type: p
Partition-number: 1
First-sector: [enter]
Last-sector: +90G

Partition type: p
Partition-number: 2
First-sector: [enter]
Last-sector: [enter]
write and exit

Now the slower 2 TB hard drive

fdisk /dev/sdb

Partition type: p
Partition-number: 1
First-sector: [enter]
Last-sector: +1G

Partition type: p
Partition-number: 2
First-sector: [enter]
Last-sector: +16G

Partition type: p
Partition-number: 3
First-sector: [enter]
Last-sector: [enter]

write and exit

Making the Bcache device

The --wipe-bcache option is there to remove an error :) and the dd commands are for cleaning out old partition data.

make-bcache --wipe-bcache -B /dev/sdb3 -C /dev/sda2
dd if=/dev/zero count=1 bs=1024 seek=1 of=/dev/sda2
dd if=/dev/zero count=1 bs=1024 seek=1 of=/dev/sdb3
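To make explicit what those dd lines do: each writes a single 1024-byte block of zeros at byte offset 1024, i.e. over the spot where an ext4 superblock would sit. A safe demonstration on a scratch file (conv=notrunc is only needed for regular files; on a block device it makes no difference):

```shell
# Demonstrate the dd invocation on a scratch file instead of a real device.
# bs=1024 seek=1 count=1 => zero exactly bytes 1024..2047, nothing else.
printf 'A%.0s' $(seq 1 4096) > /tmp/blk.img     # 4096 bytes of 'A'
dd if=/dev/zero of=/tmp/blk.img bs=1024 seek=1 count=1 conv=notrunc 2>/dev/null
od -An -tx1 -j 0    -N 1 /tmp/blk.img           # byte 0 is still 0x41 ('A')
od -An -tx1 -j 1024 -N 1 /tmp/blk.img           # byte 1024 is now 0x00
rm /tmp/blk.img
```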

If the next commands give the error "Invalid argument", the device is already registered.

echo /dev/sda2 > /sys/fs/bcache/register
echo /dev/sdb3 > /sys/fs/bcache/register

Formatting the drives

mkfs.ext4 -E discard /dev/sda1
mkfs.ext4 -E discard /dev/sdb1
mkfs.ext4 /dev/bcache0
mkswap /dev/sdb2
swapon /dev/sdb2

Mount the partitions

cd ~/
mkdir -p /mnt/install
mount /dev/sda1 /mnt/install
mkdir -p /mnt/install/{boot,home,srv,data,var,mnt/.hdd}
mount /dev/sdb1 /mnt/install/boot
mount /dev/bcache0 /mnt/install/mnt/.hdd
mkdir -p /mnt/install/mnt/.hdd/{home,srv,data,var}
mount -o bind /mnt/install/mnt/.hdd/home /mnt/install/home
mount -o bind /mnt/install/mnt/.hdd/srv /mnt/install/srv
mount -o bind /mnt/install/mnt/.hdd/data /mnt/install/data
mount -o bind /mnt/install/mnt/.hdd/var /mnt/install/var

Bcache check after mounting

The reason for putting /boot on the HDD is to avoid the boot failures caused by HDD spin-up. Check the layout with lsblk:


sda                 8:0    0 167.7G  0 disk 
|-sda1              8:1    0    90G  0 part /mnt/install
`-sda2              8:2    0  77.7G  0 part 
  `-bcache0       253:0    0   1.8T  0 disk /mnt/install/mnt/.hdd
sdb                 8:16   0   1.8T  0 disk 
|-sdb1              8:17   0     1G  0 part /mnt/install/boot
|-sdb2              8:18   0    16G  0 part [SWAP]
`-sdb3              8:19   0   1.8T  0 part 
  `-bcache0       253:0    0   1.8T  0 disk /mnt/install/mnt/.hdd

Generate an fstab

Generate an fstab file with the following command. UUIDs will be used because they stay valid even when device names change.

genfstab -U -p /mnt/install >> /mnt/install/etc/fstab
sed -i '/dev\/sda/ s/rw,relatime,data=ordered/defaults,noatime,discard/g' /mnt/install/etc/fstab
sed -i '/mnt\/install/ {s/\/mnt\/install//g; s/rw,relatime,data=ordered/defaults/g; s/none/bind/g}' /mnt/install/etc/fstab
more /mnt/install/etc/fstab
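For orientation, after those sed commands the fstab should come out looking roughly like this. The UUIDs below are made up, and the exact fields depend on what genfstab emitted; the point is that the SSD entries gain noatime,discard and the bind-mount entries lose their /mnt/install prefix.

```
# /dev/sda1 (root on the SSD)
UUID=aaaaaaaa-1111-2222-3333-444444444444  /          ext4  defaults,noatime,discard  0 1
# /dev/bcache0
UUID=bbbbbbbb-5555-6666-7777-888888888888  /mnt/.hdd  ext4  defaults                  0 2
# bind mounts, rewritten by the second sed
/mnt/.hdd/home  /home  bind  defaults  0 0
/mnt/.hdd/var   /var   bind  defaults  0 0
```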

Install and configure kernel and bootloader

Adding Bcache module and hook

Edit /etc/mkinitcpio.conf: add the "bcache" module to MODULES, add the "bcache_udev" hook between "block" and "filesystems" in HOOKS, and then regenerate the initramfs image:

sed -i '/^MODULES=/ s/"$/ bcache"/' /etc/mkinitcpio.conf
sed -i '/^HOOKS=/ s/block filesystems/block bcache_udev filesystems/' /etc/mkinitcpio.conf
more /etc/mkinitcpio.conf
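After the two sed commands, the relevant lines in /etc/mkinitcpio.conf should read approximately as follows (the exact module and hook lists will vary with your setup):

```
# MODULES gains the bcache module:
MODULES=" bcache"
# HOOKS gains bcache_udev between block and filesystems:
HOOKS="base udev autodetect modconf block bcache_udev filesystems keyboard fsck"
```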

Registering and attaching Bcache

This part is a pain in the A... The problem is that Bcache will give an error at startup if this is not done:

ERROR: Unable to find root device 'UUID=b6b2d82b-f87e-44d5-bbc5-c51dd7aace15'.
You are being dropped to a recovery shell
Type 'exit' to try and continue booting

First get the correct "SET UUID" with the ls command below, and use that UUID as in the example underneath. Also check that you are using the correct devices. The error "echo: write error: Invalid argument" is good!!! It means that your value was already stored.

ls /sys/fs/bcache/
2300f944-0eb5-4a9e-b052-8d5ea63cbb8f  register  register_quiet
echo /dev/sda2 | sudo tee /sys/fs/bcache/register
echo /dev/sdb3 | sudo tee /sys/fs/bcache/register
echo 2300f944-0eb5-4a9e-b052-8d5ea63cbb8f | sudo tee /sys/block/sdb/sdb3/bcache/attach
mv /usr/lib/initcpio/hooks/bcache ~/bcache.old
cat > /usr/lib/initcpio/hooks/bcache << 'EOF'
run_hook() {
    echo /dev/sda2 > /sys/fs/bcache/register
    echo /dev/sdb3 > /sys/fs/bcache/register
}
EOF

Installing the Kernel

mkinitcpio -p linux

I hope some stuff is helpful :P 17:58, 11 November 2013 (UTC)

I'm curious about two things. First, though I forget what the '-E' switch signifies, it seems you're formatting the raw, underlayer partitions themselves. I don't understand why.
Also, the "pain in the a?" So udev is not assembling your array automatically as the devices are discovered? That is unexpected behavior, and I suspect it has to do with the formatting you performed and udev's superblock discovery. Check the very bottom of the bcache wiki on their site. At least try lsblk to verify "bcache" partition types are not registering as "ext4" before assembling the array.
Oh. And please sign talk pages.

Mikeserv (talk) 03:35, 12 November 2013 (UTC)

Well, this is an experiment with Bcache on my own wiki page after testing the install 100 times. I am not a Linux expert, just a newbie who has been reading a lot of howtos. Just see it as a compilation of different web pages and some logical thinking. :)

The -E is just copy and paste... not sure if it was needed. I just thought it was needed to set the discard flag while formatting.

The pain in the A... this was before udev... I was busy getting Bcache to work 3 months ago... 75% of cold boots failed. Later I found out that the SSD was too fast at booting while the normal HDD was still spinning up... After I split up the SSD into a root partition and bcache, and moved /boot to the HDD, I have been able to boot 100% of the time without any error... This was before the udev thingy (which I don't understand and need to read up on) :)

So please understand that I am just a Linux noob trying to contribute :)

Emesix (talk) 00:02, 13 November 2013 (UTC)

No, I get it, and I didn't mean to insinuate that you were doing anything wrong, exactly, just that, as designed, it could operate better.
Basically, and you should check the man-pages and the wiki page here to be sure just in case I'm not entirely correct (which does happen), udev is the program that runs during boot (and pretty much all of the rest of the time, too) to detect your hardware and load drivers as necessary. Udev is a relatively recent addition to the Linux boot process and is unique in that it discovers and handles hardware as it shows up, can handle multiple devices simultaneously with non-blocking parallel operations, and, as a result, is significantly faster than the methods it has replaced.
I was the one who added the udev rules and step 7b to this wiki, and I was the one who wrote the "Rogue Superblock" section of "Troubleshooting" at the bottom of bcache's wiki. I only gained the knowledge to write either after spending several frustrating days with an issue that appears very similar to your "pain in the a."
As step 7a shows, the prevailing Arch Linux method for reassembling a bcache array at boot used to look something like:
1) Wait 5 secs.
2) Check to see if udev has discovered and added {disk UUIDS}. If yes goto 3), no, repeat 1) and 2).
3) echo {pain in the a stuff} /sys/bla/bla/bla
This does work, of course, but you're adding an unnecessary wait loop to your boot process and you're performing a task with brittle shell script that depends on specific UUIDs that could be performed flexibly and at discovery time.
That's what the udev rules do. Basically, the rules instruct udev to treat as part of your bcache array any partition it finds that reports itself as a "bcache" partition type. It builds the bcache array from disk partitions at every boot only as soon as those partitions are ready, which would of course resolve the race condition you mentioned and allow you to boot from SSD if you liked.
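A sketch of what such a rule looks like, modeled on the 69-bcache.rules distributed with bcache-tools; treat the exact paths and rule text as approximate rather than the shipped file:

```
# Register any block device whose filesystem signature reports "bcache"
# as soon as udev discovers it, instead of polling from a shell script.
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="bcache", \
    RUN+="/usr/lib/udev/bcache-register $tempnode"
```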
And here's where our experience might converge: I added a partition to my bcache array after previously formatting it as ext4. If you don't know about superblocks (or magic blocks or whatever they're called) then you're just like I was a few months ago. In short the filesystem has to have a way to report on itself, so it sets aside a small marker on disk that says, "Hey, OS, I'm a disk partition of type EXT4 and I start at block offset whatever and end at block offset whatever. Also, I enjoy long walks on the beach and prefer red wine to white. See you around."
The problem I experienced was that by default ext4 tended to begin at an earlier offset than bcache, and so even though I formatted over the previously ext4 partition with bcache as instructed, bcache never overwrote ext4's first superblock. When udev attempted to build my array, it would always miss my first partition because it would read the first superblock, identify the partition as ext4, then skip it. Echoing the add instruction to /sys/ after boot was my fix for a couple of days, but eventually I figured it all out and wrote that short bit on superblocks in bcache's wiki.
Maybe that helps?
EDIT: Just looked back at the wiki proper and somebody has performed some major changes recently. While it does appear cleaner, it no longer makes sense. There are references to non-existent steps throughout, and the actual why's that were included at least to some degree seem to have been occluded in favor of dubious, though possibly more efficient how's. Anyway, step 7b no longer exists, but it used to include the bcache_udev mkinitcpio hook I (very barely) adapted from mdadm_udev when trying to fix Arch's bcache boot problems. It's now included in the AUR package the wiki recommends I guess, despite the wiki also telling us we must "echo ... /sys/... at every bootup."

Mikeserv (talk) 07:08, 14 November 2013 (UTC)

Sorry about that, I missed a couple of references back to the sections I changed. Hopefully it makes more sense now. Mikeserv's udev rule is crucial to having a working bcache device on boot; without it, as Emesix had discovered, you will often get a failed boot. The echo into /sys/bcache* step was missed when I updated the page to mention the udev rule as the 'default' method.

--Justin8 (talk) 10:45, 14 November 2013 (UTC)

For clarity's sake, the udev rule was never mine. The rule used has been included in the AUR package's git pull from at least the time I first installed bcache some months ago. The package maintainer at that time opted not to use it, I guess, and instead installed the legacy-type shell-scripted for-loop mkinitcpio hook I outlined above, which was, to be fair, also the process recommended by every other source of information I was able to dig up at the time, including bcache's own wiki. Frustrated with having to wait for such things, which seemed to me to defeat the whole purpose of buying and configuring an SSD boot device, I eventually stumbled upon the mdadm_udev hook in /usr/lib/initcpio and (only slightly) modified it to apply the 69-bcache.rules instead.
Justin, thanks for your recent change. NEVERMIND FOLLOWING: I'm curious why you specify "unless assembling after boot?" Why include anything to do with bcache in the initramfs at all if you're building the array after the boot sequence? MAKES PERFECT SENSE OF COURSE. GOT A 1 TRACK MIND SOMETIMES I GUESS. SORRY.

Mikeserv (talk) 11:11, 14 November 2013 (UTC)

Emesix, just looked closer at your instructions, and you actually do the thing I had to do and outlined in the bcache wiki to overwrite the rogue superblock here:
dd if=/dev/zero count=1 bs=1024 seek=1 of=/dev/sd{a2,b3}
And the ext4 formatting you do after that, which I earlier incorrectly assumed to be for the "raw, underlayer partitions", is revealed on closer inspection to be for non-bcache partitions, so it isn't even relevant. I apologize for jumping to conclusions before reviewing it fully; the quick check I did a couple of days ago just struck me as really familiar.
The udev stuff above should still be relevant, and it might be worth noting that the particular dd command above is by no means a cure-all. It should only be used as written if you have previously identified a superblock located at byte offset 1024 that needs overwriting, though the actual superblock location can vary by filesystem. I've since learned that a much more flexible and useful tool for this is wipefs, which is much less dangerous than it sounds.
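To illustrate wipefs on something harmless, here is a sketch run against a scratch file image rather than a real device. It assumes mkfs.ext4 (e2fsprogs) and wipefs (util-linux) are available:

```shell
# Create a small ext4 image, list its signature, then wipe it.
truncate -s 64M /tmp/fs.img
mkfs.ext4 -q -F /tmp/fs.img
wipefs /tmp/fs.img        # lists the ext4 signature and its byte offset
wipefs -a /tmp/fs.img     # erase all detected signatures
wipefs /tmp/fs.img        # no ext4 signature is reported any more
rm /tmp/fs.img
```

Unlike the fixed-offset dd above, wipefs finds each signature wherever the filesystem actually put it.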
Mikeserv (talk) 14:28, 14 November 2013 (UTC)

Thanks for the explanation of the superblock, Mikeserv, it was very insightful. But that udev stuff is still a layer too high for me :P I don't know if it's possible to order things a little differently to avoid an unwanted wait loop:

1) Check to see if udev has discovered and added {disk UUIDS}. If yes goto 3)
2) Wait 5 secs and goto 1.
3) echo {pain in the a stuff} /sys/bla/bla/bla

The dd command is just a safety step. :) And because formatting /dev/bcache0 was soon to follow, it didn't hurt if I accidentally killed the filesystem on the bcache device :) I got the feeling that resizing the bcache partition was also causing problems, which the dd command fixed... BTW I like the piece added with this command set:

# cd /sys/fs/bcache
# echo "UUID" > /sys/block/bcache0/bcache/attach

P.S. I don't get why the wiki page is full of btrfs stuff. Is this not about a combination of commands? And wouldn't a more conventional partition/filesystem scheme be more useful for novice users (like me)??

Emesix (talk) 22:47, 14 November 2013 (UTC)
Regarding btrfs and its usefulness for new users... I don't know. With btrfs, for instance, many normally high-level filesystem concepts like snapshotting and subvolumes are rendered nearly as mundane as directory maintenance, and still others, like autodefrag and compress=lzo, require only a single edit to a boot config to deliver inline filesystem compression and cleanup. In practice, you really don't have to understand the hows of these things at all to get incredible use out of them.
Consider just subvolumes: they can deliver, at the very least, the separated partition structure for the various segments of the Linux filesystem that people are often recommended to set up at installation, with syntax and simplicity like unto cd and mkdir, only the btrfs --help implementation is probably a little friendlier. With traditional filesystems, this same setup required knowledge of disk geometry, extent counts and all manner of other arcane things to configure in any sanely optimized way. Btrfs will create you a RAID in less than 30 seconds.
I guess what I'm saying is the research vs reward ratio is pretty significantly in your favor with a btrfs disk. Of course, should the btrfs filesystem crash, well, you should probably call someone else is all I'll say about that.
In this wiki, I definitely see your point, especially considering the added complexity bcache brings. When I was trying to set mine up I spent a good deal of time in both the btrfs and bcache IRC dev channels and both sides were less than optimistic about friendly cooperation between the two. Still, it works and has done for months. I know, don't call you, right?
Now about udev, I'll try again. So, your script implements a wait loop to check occasionally if udev has done a thing (bring up the disks and pull in basic drivers like SCSI for block devices and ext4 for their component filesystems) so it can do a thing (register the disks as small parts of a larger whole and pull in the bcache driver). The udev rules eschew the notion of a secondary monitor script such as yours and instead have udev do it all (any disk it brings up as a "bcache" type is registered as a small part of a larger whole and the bcache driver is pulled in).
So you couldn't boot from SSD reliably before because you couldn't reliably script the order in which udev would bring up the disks (a consequence of its faster parallel operations), but udev doesn't have to script its order, udev just udevs as udev udevs best.
See? You will always boot reliably from your chosen device because udev won't misinterpret its own report on your chosen device's readiness. And that's all - there's nothing wrong with the shell script exactly, except that it's just plain unnecessary and added complication.

Mikeserv (talk) 23:38, 14 November 2013 (UTC)

I have to say, I want to try out using btrfs instead of LVM on my desktop sometime, but the cache article is not the place for it. Mentioning it as a good alternative on the installation and beginners guide and having details in the btrfs article is going to be more productive and logical. I added the section near the top of the page for configuring a simple cache setup with no regards to file systems and partition layouts as they should be beyond the scope of this article.

--Justin8 (talk) 02:13, 15 November 2013 (UTC)

Agreed. There's a lot of it, too. I guess that sometimes happens on less visited pages. Probably it will level out as the module is more established. I compromised on some of the grub stuff in a minor edit yesterday after you rightly pointed out how much unrelated stuff is in there. Never thought about it before, but I've only ever written 7b and 2 notes here and a short section on UEFI kernel update syncing.
Above is just about everything I know relatable to bcache by now, I guess. I wish there was some more in the wiki here on benchmarking. The bcache wiki itself reads like a stereo repair manual on the subject. I never have managed to wrap my mind round it before my eyes would shut of their own accord.

Mikeserv (talk) 03:00, 15 November 2013 (UTC)

Ha, that is the perfect description of their wiki; it's all failure modes and fixes but no reasoning. From what I could see, benchmarking of bcache is really only useful for writes; reads will vary wildly depending on whether or not you get a cache hit. On top of that, sequential reads/writes bypass the cache. The only real way to benchmark it would be 'real-world' style tests instead of synthetic benchmarks, i.e. how long it takes to boot your system, or how long it takes to compile X, Y or Z. I put a little bit up on my blog when testing out bcache, mostly focusing on IOPS performance changes, since everything else is relatively the same with or without bcache (other than the smart re-ordering of metadata changes). I was going to just paste the graphs, but they are all js objects; you can see the results I got here:

Justin8 (talk) 04:05, 15 November 2013 (UTC)

Justin: so you're the AUR bcache-tools maintainer now? I guess you dropped enough hints. That's cool. Reading through your blog (pretty colors) I was reminded of another little bcache fact that may not be immediately apparent to all. It seems most tend to assume that the cache:backing-store must be on a 1:1 ratio. This is not the case. A single cache can serve as many storage devices as you might like. Conversely, there's nothing preventing one from configuring several mechanical discs in a RAID and then backing it on a 1:1 ratio. Regarding a cached RAID, though, I do recall reading a bcache IRC conversation in which a dev mentioned hardware configuration of the device would be desirable as bcache is designed to interface with a block layer and dropping a filesystem below it can have unexpected results.
I only bring this up because your blog mentions you will switch from a 1tb to a 3tb soon. Personally, I use a 120GB Kingston SSD to cache 2 3tb WD Greens which are then formatted as a single btrfs RAID1 metadata/RAID0 data. The data is mostly downloaded video for a home theater server and as such is largely expendable in the event of catastrophe. Probably you know this, but just in case, I thought I'd point out that there's nothing preventing you from merely adding the 3tb to the 1tb as you please.
Lastly, you can format and configure an entire array with basically one line like: make-bcache -B /dev/{1tb} /dev/{new_3tb} -C /dev/{60GB_SSD}. Then just reboot (or even reload udev) if you've got the udev rules loaded - udev will register the devices for you based on the auto-attach data recorded in the partition's superblocks as a result of the all-in-one add command. You wouldn't even need the bcache_udev hook or anything else in initramfs at this point, because obviously you shouldn't expect to boot to the bcache array yet as you haven't had a chance to format it with a usable filesystem or to copy any files to it, but it can simplify things.
Mikeserv (talk) 07:04, 15 November 2013 (UTC)
I did notice a reference to that from the previous maintainer in the AUR comments. I currently have a 9x2TB RAID6 mdadm array that I really don't want to have to back up and restore, so I did the tests as it would be in my server (just on a smaller scale). I don't believe it would be a simple task to remove a device, bcache-ify it and put it back in. The blocks utility says that it supports any block device, but mdadm stores data about the offset in the superblocks instead of using the partition layout (which I discovered the hard way in the past, and had to restore 10TB of backups from off-site :( ). It should be fine, though, doing it to the mdadm device itself. I wouldn't mind knowing the performance difference, so once I get an SSD for my server I might test it with both methods to do a bit of a comparison.
I didn't realize you could specify the backing and cache devices all in one line like that. I didn't see it in the wiki at all when I did my first lot of edits; seems useful. Does it automatically attach the cache to the backing devices as well?
Although I reboot fairly often for things like kernel updates/making sure pacman hasn't broken things, I am somewhat averse to rebooting when I don't have to. I migrated from a root device on an SSD and /home and /opt on a 3TB disk, on to a spare 1TB disk, did the tests for my blog on a different computer and then created the bcache device with the 3TB disk and 60GB SSD and migrated back all without a reboot. Sometimes rebooting can be easier, but it's more 'fun' not to :)
Justin8 (talk) 08:28, 15 November 2013 (UTC)
Well, as I mentioned, you could opt to restart udev rather than reboot, or alternatively you could choose to throw a udevadm trigger, or to have some fun with kexec, or merely to echo the block-device register commands to /sys/ as per usual. You have correctly surmised that the one-liner auto-attaches the backing-stores to the cache, but it does not perform the registration, which is necessary anyway every time an array is re/assembled as it is.
I added it to the bcache Arch wiki page myself after discovering it quite by accident between nods while browsing bcache's own documentation a few months ago. The command is represented in the wiki even now, if rather humbly as a side note to the initial backing-store formatting process. I mentioned it on your talk page as well, because you had apparently previously edited it not to fix its then anachronistic reference to no longer needing a now completely different step 4, but to note that it would no longer be necessary at boot. At first I was bewildered, but I've since corrected it to note that it now obviates step 6.
By the way, I might know of another thing worth saying, of which kexec's mention has reminded me: because of the way SystemD performs its unmounts at shutdown, it can be very slightly dangerous to run a multi-device write-back cache outside of its purview, as you must do if you boot to a bcache array but have not incorporated SystemD into your initramfs. If this is your situation currently, then likely you will be able to catch an occasional error message printed to the console about a forced unmount in the second or two just before your screen blanks its last. Of course bcache is supposed to be able to handle this and much more, and it never caused me any issues (though I run in write-through mode), but I did switch to the new SystemD hook just as soon as I noticed it. Specifically the SystemD kexec function should almost certainly be avoided whenever possible when operating a write-back cache without a true PID1 SystemD configuration.
Mikeserv (talk) 09:13, 15 November 2013 (UTC)