Talk:Bcache

Example installation

Here is my Bcache installation. Maybe you can use some ideas. :)

Prepare the storage drive

Bcache-tools installation

Only fakeroot and binutils should be needed, but the build will fail with an error if you install just those two. wget is left over from a failed ArchBang install :(

pacman -Syy
pacman -S base-devel fakeroot binutils wget git
cd ~/
wget https://aur.archlinux.org/packages/bc/bcache-tools-git/bcache-tools-git.tar.gz
tar -xzvf bcache-tools-git.tar.gz
cd bcache-tools-git
makepkg -si --asroot

Making the partitions

For /boot I use the slower HDD, because I had trouble where my SSD loaded the kernel while my HDD was still spinning up :P

Starting with the faster SSD drive

fdisk /dev/sda

o
y
n
Partition type: p
Partition-number: 1
First-sector: [enter]
Last-sector: +90G

n
Partition type: p
Partition-number: 2
First-sector: [enter]
Last-sector: [enter]
 
w (write and exit)

Now the slower 2 TB hard drive

fdisk /dev/sdb

o
y
n
Partition type: p
Partition-number: 1
First-sector: [enter]
Last-sector: +1G
a

n
Partition type: p
Partition-number: 2
First-sector: [enter]
Last-sector: +16G
t
2
82

n
Partition type: p
Partition-number: 3
First-sector: [enter]
Last-sector: [enter]

w (write and exit)

Making Bcache Device

The --wipe-bcache option is there to remove an error :) and the dd commands are for cleaning out old partition data.

# HDD partition becomes the backing device, SSD partition the cache device
make-bcache --wipe-bcache -B /dev/sdb3 -C /dev/sda2
# zero 1 KiB at byte offset 1024 on each member to wipe any stale ext4 superblock
dd if=/dev/zero count=1 bs=1024 seek=1 of=/dev/sda2
dd if=/dev/zero count=1 bs=1024 seek=1 of=/dev/sdb3

If the next commands give the error "invalid argument", then the device is already registered.

echo /dev/sda2 > /sys/fs/bcache/register
echo /dev/sdb3 > /sys/fs/bcache/register

Formatting the drives

mkfs.ext4 -E discard /dev/sda1
mkfs.ext4 -E discard /dev/sdb1
mkfs.ext4 /dev/bcache0
mkswap /dev/sdb2
swapon /dev/sdb2

Mount the partitions

cd ~/
mkdir -p /mnt/install
mount /dev/sda1 /mnt/install
mkdir -p /mnt/install/{boot,home,srv,data,var,mnt/.hdd}
mount /dev/sdb1 /mnt/install/boot
mount /dev/bcache0 /mnt/install/mnt/.hdd
mkdir -p /mnt/install/mnt/.hdd/{home,srv,data,var}
mount -o bind /mnt/install/mnt/.hdd/home /mnt/install/home
mount -o bind /mnt/install/mnt/.hdd/srv /mnt/install/srv
mount -o bind /mnt/install/mnt/.hdd/data /mnt/install/data
mount -o bind /mnt/install/mnt/.hdd/var /mnt/install/var

Bcache Check After mounting

The reason for /boot on the HDD is the boot-up failure caused by HDD spin-up.

lsblk

NAME              MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                 8:0    0 167.7G  0 disk 
|-sda1              8:1    0    90G  0 part /mnt/install
`-sda2              8:2    0  77.7G  0 part 
  `-bcache0       253:0    0   1.8T  0 disk /mnt/install/mnt/.hdd
sdb                 8:16   0   1.8T  0 disk 
|-sdb1              8:17   0     1G  0 part /mnt/install/boot
|-sdb2              8:18   0    16G  0 part [SWAP]
`-sdb3              8:19   0   1.8T  0 part 
  `-bcache0       253:0    0   1.8T  0 disk /mnt/install/mnt/.hdd

Generate an fstab

Generate an fstab file with the following commands. UUIDs will be used because they remain stable even if device names change.

genfstab -U -p /mnt/install >> /mnt/install/etc/fstab
# give the SSD partitions noatime and discard instead of the generated defaults
sed -i '/dev\/sda/ s/rw,relatime,data=ordered/defaults,noatime,discard/g' /mnt/install/etc/fstab
# strip the /mnt/install prefix from the bind-mount entries and mark them as binds
sed -i '/mnt\/install/ {s/\/mnt\/install//g; s/rw,relatime,data=ordered/defaults/g; s/none/bind/g}' /mnt/install/etc/fstab
more /mnt/install/etc/fstab

Install and configure kernel and bootloader

Adding Bcache module and hook

Edit /etc/mkinitcpio.conf to add the "bcache" module and to add the "bcache_udev" hook between block and filesystems, then re-generate the initramfs image:

sed -i '/^MODULES=/ s/"$/ bcache"/' /etc/mkinitcpio.conf
sed -i '/^HOOKS=/ s/block filesystems/block bcache_udev filesystems/' /etc/mkinitcpio.conf
more /etc/mkinitcpio.conf
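
After those edits, the relevant lines should look roughly like this (the exact module and hook lists vary per system; only bcache and bcache_udev matter here):

MODULES="bcache"
HOOKS="base udev autodetect modconf block bcache_udev filesystems keyboard fsck"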

Registering and attaching Bcache

This part is a pain in the A.. The problem is that Bcache will give an error at startup if this is not done:

ERROR: Unable to find root device 'UUID=b6b2d82b-f87e-44d5-bbc5-c51dd7aace15'.
You are being dropped to a recovery shell
Type 'exit' to try and continue booting

First get the correct set UUID with the ls command below, and use that UUID like in the example underneath. Also check that the correct devices are used. The error "echo: write error: Invalid argument" is good !!! It means that your value was already stored.

ls /sys/fs/bcache/
2300f944-0eb5-4a9e-b052-8d5ea63cbb8f  register  register_quiet
echo /dev/sda2 > /sys/fs/bcache/register
echo /dev/sdb3 > /sys/fs/bcache/register
echo 2300f944-0eb5-4a9e-b052-8d5ea63cbb8f > /sys/block/sdb/sdb3/bcache/attach
mv /usr/lib/initcpio/hooks/bcache ~/bcache.old
cat >> /usr/lib/initcpio/hooks/bcache << EOF
#!/usr/bin/ash
run_hook() {
    echo /dev/sda2 > /sys/fs/bcache/register
    echo /dev/sdb3 > /sys/fs/bcache/register
}
EOF

Installing the Kernel

mkinitcpio -p linux

I hope some stuff is helpful :P 17:58, 11 November 2013 (UTC)

I'm curious about two things. First, though I forget what the '-E' switch signifies, it seems you're formatting the raw, underlayer partitions themselves. I don't understand why.
Also, the "pain in the a?" So udev is not assembling your array automatically as the devices are discovered? That is unexpected behavior, and I suspect it has to do with the formatting you performed and udev's superblock discovery. Check the very bottom of the bcache wiki on their site. At least try lsblk to verify "bcache" partition types are not registering as "ext4" before assembling the array.
Oh. And please sign talk pages.

Mikeserv (talk) 03:35, 12 November 2013 (UTC)


Well, this is an experiment with Bcache on my own wiki page after 100x testing the install; I am not a Linux expert, just a newbie who was reading a lot of howtos. Just see it as a compilation of different web pages and some logical thinking. :)

The -E is just a copy and paste ... not sure if it was needed. I just thought it was needed to get the discard flag while formatting.

The pain in the A.... this was before udev ... I was busy getting Bcache to work 3 months ago ... 75% of boot-ups failed when booting cold. Later I found out that the SSD was too fast at booting while the normal HDD was still spinning up.... After I split the SSD between the root directory and bcache and moved /boot to the HDD, I can boot up 100% of the time without any error... This was before the udev thingy (which I don't understand and need to read up on) :)

So please understand that I am just a Linux noob trying to contribute :)

Emesix (talk) 00:02, 13 November 2013 (UTC)

No, I get it, and I didn't mean to insinuate that you were doing anything wrong, exactly, just that, as designed, it could operate better.
Basically, and you should check the man-pages and the wiki page here to be sure just in case I'm not entirely correct (which does happen), udev is the program that runs during boot (and pretty much all of the rest of the time, too) to detect your hardware and load drivers as necessary. Udev is a relatively recent addition to the Linux boot process and is unique in that it discovers and handles hardware as it shows up, can handle multiple devices simultaneously with non-blocking parallel operations, and, as a result, is significantly faster than the methods it has replaced.
I was the one who added the udev rules and step 7b to this wiki, and I was the one who wrote the "Rogue Superblock" section of "Troubleshooting" at the bottom of bcache's wiki. I only gained the knowledge to write either after spending several frustrating days with an issue that appears very similar to your "pain in the a."
As step 7a shows, the prevailing Arch Linux method for reassembling a bcache array at boot used to look something like:
1) Wait 5 secs.
2) Check to see if udev has discovered and added {disk UUIDS}. If yes goto 3), no, repeat 1) and 2).
3) echo {pain in the a stuff} /sys/bla/bla/bla
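Rendered as an actual initcpio hook, that approach looked something like the following; this is a reconstruction from memory, with your device paths standing in for the UUID checks:

#!/usr/bin/ash
run_hook() {
    modprobe bcache
    # 1) and 2): poll until udev has brought the member devices up
    while [ ! -b /dev/sda2 ] || [ ! -b /dev/sdb3 ]; do
        sleep 5
    done
    # 3): register them with the bcache driver
    echo /dev/sda2 > /sys/fs/bcache/register_quiet
    echo /dev/sdb3 > /sys/fs/bcache/register_quiet
}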
This does work, of course, but you're adding an unnecessary wait loop to your boot process and you're performing a task with brittle shell script that depends on specific UUIDs that could be performed flexibly and at discovery time.
That's what the udev rules do. Basically, the rules instruct udev to treat as part of your bcache array any partition it finds that reports itself as a "bcache" partition type. It builds the bcache array from disk partitions at every boot only as soon as those partitions are ready, which would of course resolve the race condition you mentioned and allow you to boot from SSD if you liked.
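For reference, the rules themselves are tiny; they amount to approximately the following (check 69-bcache.rules in the bcache-tools package for the authoritative version):

# register any partition udev identifies as bcache with the bcache driver
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="bcache", RUN+="/usr/lib/udev/bcache-register $tempnode"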
And here's where our experience might converge: I added a partition to my bcache array after previously formatting it as ext4. If you don't know about superblocks (or magic blocks or whatever they're called) then you're just like I was a few months ago. In short the filesystem has to have a way to report on itself, so it sets aside a small marker on disk that says, "Hey, OS, I'm a disk partition of type EXT4 and I start at block offset whatever and end at block offset whatever. Also, I enjoy long walks on the beach and prefer red wine to white. See you around."
The problem I experienced was that by default ext4 tended to begin at an earlier offset than bcache, and so even though I formatted over the previously ext4 partition with bcache as instructed, bcache never overwrote ext4's first superblock. When udev attempted to build my array, it would always miss my first partition because it would read the first superblock, identify the partition as ext4, then skip it. Echoing the add instruction to /sys/ after boot was my fix for a couple of days, but eventually I figured it all out and wrote that short bit on superblocks in bcache's wiki.
Maybe that helps?
EDIT: Just looked back at the wiki proper and somebody has performed some major changes recently. While it does appear cleaner, it no longer makes sense. There are references to non-existent steps throughout, and the actual why's that were included at least to some degree seem to have been occluded in favor of dubious, though possibly more efficient how's. Anyway, step 7b no longer exists, but it used to include the bcache_udev mkinitcpio hook I (very barely) adapted from mdadm_udev when trying to fix Arch's bcache boot problems. It's now included in the AUR package the wiki recommends I guess, despite the wiki also telling us we must "echo ... /sys/... at every bootup."

Mikeserv (talk) 07:08, 14 November 2013 (UTC)

Sorry about that, I missed a couple of references back to the sections I changed. Hopefully it makes more sense now. Mikeserv's udev rule is crucial to having a working bcache device on boot; without it, as Emesix had discovered, boot will often fail. The echo into /sys/bcache* step was missed when I updated the page to mention the udev rule as the 'default' method.

--Justin8 (talk) 10:45, 14 November 2013 (UTC)

For clarity's sake, the udev rule was never mine. The rule has been included in the AUR package's git pull from at least the time I first installed bcache some months ago. The package maintainer at that time opted not to use it, I guess, and instead installed the legacy-type shell-scripted wait-loop mkinitcpio hook I outlined above, which was, to be fair, also the process recommended by every other source of information I was able to dig up at the time, including bcache's own wiki. Frustrated with having to wait for such things, which seemed to me to defeat the whole purpose of buying and configuring an SSD boot device, I eventually stumbled upon the mdadm_udev hook in /usr/lib/initcpio and (only slightly) modified it to apply the 69-bcache.rules instead.
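A rough sketch of that adaptation, using the standard mkinitcpio install-hook functions (the contents are approximated from the mdadm_udev pattern, not copied from the package):

# /usr/lib/initcpio/install/bcache_udev (approximate)
build() {
    add_module bcache
    add_binary /usr/lib/udev/bcache-register
    add_file /usr/lib/udev/rules.d/69-bcache.rules
}
help() {
    echo "Registers bcache member devices via udev rules at early boot."
}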
Justin, thanks for your recent change. I'm curious why you specify "unless assembling after boot?" Why include anything to do with bcache in the initramfs at all if you're building the array after the boot sequence?

Mikeserv (talk) 11:11, 14 November 2013 (UTC)

Emesix, just looked closer at your instructions, and you actually do the thing I had to do and outlined in the bcache wiki to overwrite the rogue superblock here:
dd if=/dev/zero count=1 bs=1024 seek=1 of=/dev/sda2
dd if=/dev/zero count=1 bs=1024 seek=1 of=/dev/sdb3
And the ext4 formatting you do after that, which I earlier incorrectly assumed to be for the "raw, underlayer partitions", turns out on closer inspection to be for non-bcache partitions, so it isn't even relevant; I apologize for jumping to conclusions before reviewing it fully; the quick check I did a couple of days ago just struck me as really familiar.
The udev stuff above should still be relevant, and it might be worth noting that the particular dd command above is by no means a cure-all. It should only be used as written if you have previously identified a superblock located at byte offset 1024 that needs overwriting, though the actual superblock location can vary by filesystem. I've since learned that a much more flexible and useful tool for this is wipefs, which is much less dangerous than it sounds.
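For example (reusing the device name from the commands above; list first, then erase only what you have identified):

wipefs /dev/sdb3             # list every filesystem signature found on the partition
wipefs -o 0x438 /dev/sdb3    # erase a single signature at an offset taken from the listing
wipefs -a /dev/sdb3          # or erase all of them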
Mikeserv (talk) 14:28, 14 November 2013 (UTC)

Thanks for the explanation of the super-block, Mikeserv, it was very insightful. But that udev stuff is still a layer too high for me :P I don't know if it's possible to do it a little differently to avoid an unwanted wait loop:

1) Check to see if udev has discovered and added {disk UUIDS}. If yes goto 3)
2) Wait 5 secs and goto 1.
3) echo {pain in the a stuff} /sys/bla/bla/bla

The dd command is just a safety step. :) And because formatting /dev/bcache0 was soon to follow, it didn't hurt if I accidentally killed the filesystem on the bcache :) I got the feeling that resizing the bcache partition was also giving problems, which the dd command fixed..... BTW I like the piece added with this command set:

# cd /sys/fs/bcache
# echo "UUID" > /sys/block/bcache0/bcache/attach

p.s. I don't get why the wiki page is full of btrfs stuff. Isn't this just a combination of commands? And wouldn't a more conventional partition/filesystem scheme be more useful for novice users (like me)??

Emesix (talk) 22:47, 14 November 2013 (UTC)
Regarding btrfs and its usefulness for new users... I don't know. With btrfs, many normally high-level filesystem concepts, like snapshotting and subvolumes, are rendered nearly as mundane as directory maintenance, and still others, like autodefrag and compress=lzo, require only a single edit to a boot config to deliver inline filesystem cleanup and compression. In practice, you really don't have to understand the hows of these things at all to get incredible use out of them.
Consider just subvolumes: they can deliver, at the very least, the separated partition structure for the various segments of the Linux filesystem that people are often recommended to set up at installation, with syntax and simplicity like unto cd and mkdir, only the btrfs --help implementation is probably a little friendlier. With traditional filesystems this same setup required knowledge of disk geometry, extent counts and all manner of other arcane things to configure in any sanely optimized way. Btrfs will create you a RAID in less than 30 seconds.
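A minimal sketch of that, with hypothetical devices and mountpoints, nothing here being from Emesix's setup:

# a two-disk RAID1 for both data and metadata in one command
mkfs.btrfs -d raid1 -m raid1 /dev/sdc /dev/sdd
# inline compression and cleanup from a single mount-option edit
mount -o compress=lzo,autodefrag /dev/sdc /mnt
# subvolumes stand in for the separate partitions people usually carve out
btrfs subvolume create /mnt/home
btrfs subvolume create /mnt/var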
I guess what I'm saying is the research vs reward ratio is pretty significantly in your favor with a btrfs disk. Of course, should the btrfs filesystem crash, well, you should probably call someone else is all I'll say about that.
On this wiki, I definitely see your point, especially considering the added complexity bcache brings. When I was trying to set mine up I spent a good deal of time in both the btrfs and bcache IRC dev channels and both sides were less than optimistic about friendly cooperation between the two. Still, it works and has done for months. I know, don't call you, right?
Now about udev, I'll try again. So, your script implements a wait loop to check occasionally if udev has done a thing (bring up the disks and pull in basic drivers like SCSI for block devices and ext4 for their component filesystems) so it can do a thing (register the disks as small parts of a larger whole and pull in the bcache driver). The udev rules eschew the notion of a secondary monitor script such as yours and instead have udev do it all (any disk it brings up as a "bcache" type is registered as a small part of a larger whole and the bcache driver is pulled in).
So you couldn't boot from SSD reliably before because you couldn't reliably script the order in which udev would bring up the disks (a consequence of its faster parallel operations), but udev doesn't have to script its order, udev just udevs as udev udevs best.
See? You will always boot reliably from your chosen device because udev won't misinterpret its own report on your chosen device's readiness. And that's all - there's nothing wrong with the shell script exactly, except that it's just plain unnecessary and added complication.

Mikeserv (talk) 23:38, 14 November 2013 (UTC)