User:Cvlc/Storage layout and alignment

From ArchWiki

This article is being considered for archiving.

Reason: page created (Discuss in User talk:Cvlc/Storage layout and alignment)

Newer hard drives normally use 4 KiB (4096-byte) sectors instead of the conventional 512-byte sectors. Although their technology is different, SSDs can also support one or both of these formats.

If the device and the file systems or stacked block devices on top of it do not use the same sector size, the drive firmware has to map file system sectors to physical drive sectors. Although this translation layer usually works transparently, it adds overhead that can be avoided, and avoiding it improves performance.
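The overhead can be illustrated with a small POSIX shell sketch. The function names are hypothetical and the model is deliberately simplified: it only checks whether a write starts and ends on a physical sector boundary, and ignores caching and whatever the firmware actually does internally.

```shell
# Hypothetical helpers illustrating the cost of a sector size mismatch.
# Simplified model: only sector boundaries are considered.

# needs_rmw OFFSET LENGTH PHYS
# Succeeds (exit status 0) if a write of LENGTH bytes at byte OFFSET does not
# both start and end on a PHYS-byte physical sector boundary. In that case the
# drive must read the partially covered physical sector(s), merge in the new
# data and write them back (read-modify-write).
needs_rmw() {
    [ $(( $1 % $3 )) -ne 0 ] || [ $(( ($1 + $2) % $3 )) -ne 0 ]
}

# phys_sectors_touched OFFSET LENGTH PHYS
# Prints how many physical sectors the write spans (LENGTH must be > 0).
phys_sectors_touched() {
    echo $(( ($1 + $2 - 1) / $3 - $1 / $3 + 1 ))
}
```

For example, needs_rmw 512 512 4096 succeeds: a single 512-byte logical sector written at offset 512 lies within one 4096-byte physical sector (phys_sectors_touched 512 512 4096 prints 1), but covers only part of it, so that sector has to be read first. needs_rmw 4096 8192 4096 fails, because that write covers two whole physical sectors.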

HDDs - Advanced Format

Advanced Format is a generic term for any disk sector format used to store data on the magnetic disks of hard disk drives (HDDs) that uses 4096-byte sectors instead of the traditional 512-byte sectors. The main idea behind 4096-byte sectors is to increase the bit density on each track by reducing the number of gaps between data sectors, which hold Sync/DAM and ECC (Error Correction Code) information. The old format gave a format efficiency of 88.7%, whereas Advanced Format achieves 97.3%.

There are two types of AF drives:

  • Advanced Format 512e drives, marked with an orange "AF" logo: internally they use 4 KiB sectors, but they provide a 512-byte emulation layer for compatibility with operating systems that lack support for 4 KiB sectors.
  • Advanced Format 4K native drives, marked with a blue "4Kn" logo: they require OS support (Windows 8 or later, or Linux 2.6.31 or later). Because they do not need a translation layer they are cheaper, but they may be incompatible with older tools.

Check supported sector sizes

The physical and logical sector sizes of a hard disk /dev/sdX can be determined by reading the following sysfs entries:

$ cat /sys/class/block/sdX/queue/physical_block_size
$ cat /sys/class/block/sdX/queue/logical_block_size

Drives with a translation layer (see above) will usually report a logical block size of 512 (for backwards compatibility) and a physical block size of 4096 (indicating they are AF drives).
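The same sysfs entries can be read for every block device at once. The following POSIX shell sketch prints both sizes and labels each pair with the common shorthand 512n, 512e or 4Kn. The function names and the SYSFS_BLOCK override (used only to point the script at a test directory) are hypothetical and not part of any existing tool.

```shell
# classify_sectors LOGICAL PHYSICAL: label a logical/physical size pair.
# The labels are industry shorthand, not values reported by sysfs.
classify_sectors() {
    if [ "$1" -eq 512 ] && [ "$2" -eq 512 ]; then
        echo 512n
    elif [ "$1" -eq 512 ] && [ "$2" -eq 4096 ]; then
        echo 512e
    elif [ "$1" -eq 4096 ] && [ "$2" -eq 4096 ]; then
        echo 4Kn
    else
        echo other
    fi
}

# report_sector_sizes: print both sizes for every whole block device.
# SYSFS_BLOCK is a hypothetical override (default /sys/class/block).
report_sector_sizes() {
    rss_root=${SYSFS_BLOCK:-/sys/class/block}
    for rss_dev in "$rss_root"/*; do
        # Partitions have no queue/ directory, so they are skipped here.
        [ -r "$rss_dev/queue/logical_block_size" ] || continue
        rss_log=$(cat "$rss_dev/queue/logical_block_size")
        rss_phys=$(cat "$rss_dev/queue/physical_block_size")
        printf '%-12s logical=%-5s physical=%-5s %s\n' \
            "${rss_dev##*/}" "$rss_log" "$rss_phys" \
            "$(classify_sectors "$rss_log" "$rss_phys")"
    done
}
```

After defining the functions, run report_sector_sizes; a drive with a translation layer shows up as logical=512 physical=4096 and is labelled 512e.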

The following tools also report the sector sizes of a drive (provided the drive reports them correctly):

# fdisk -l /dev/sdX | grep 'Sector size'
# smartctl -a /dev/sdX | grep 'Sector Size'
# hdparm -I /dev/sdX | grep 'Sector size:'
Note: If your system locale is not English, prefix the fdisk command with LANG=en, because the output of fdisk is localized.

Note that smartctl and hdparm work even for USB-attached disks, provided the USB bridge supports SAT (SCSI/ATA Translation, ANSI INCITS 431-2007).

SSDs

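Most SSDs report a sector size of 512 bytes even though they use larger sectors internally, typically 4 KiB, 8 KiB, or sometimes larger. As a result, file systems cannot automatically optimize for the native sector size. To avoid sub-optimal performance, one can either:

  • change the sector size reported by the device, if it supports this (see #Setting native sector size), or
  • manually override the auto-detected sector size when creating LUKS containers and file systems (see #Dm-crypt and #Filesystems).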

Check supported sector sizes

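Use smartctl (from smartmontools) to check the supported sector sizes. For NVMe drives they are listed as LBA formats, and a + in the Fmt column marks the format currently in use: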

# smartctl -a device
...
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1
...
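The table can be filtered with awk to pick a candidate format. This is a minimal sketch that assumes the layout shown above (a Supported LBA Sizes header followed by Id/Fmt/Data/Metadt/Rel_Perf rows), which may differ between smartctl versions. It skips formats with metadata and prints the Id and data size of the one with the lowest Rel_Perf value, a lower value meaning better performance. The function name is hypothetical.

```shell
# best_lba_format: read `smartctl -a` output on stdin and print "ID SIZE" for
# the supported LBA format that has no metadata (Metadt 0) and the lowest
# Rel_Perf value. Assumes the table layout shown above.
best_lba_format() {
    awk '
        /^Supported LBA Sizes/ { in_table = 1; next }   # table starts
        in_table && $1 == "Id" { next }                 # column header
        in_table && NF == 5 && $1 ~ /^[0-9]+$/ {
            # $1=Id $2=Fmt $3=Data $4=Metadt $5=Rel_Perf
            if ($4 == 0 && (best == "" || $5 < best_perf)) {
                best = $1 " " $3
                best_perf = $5
            }
            next
        }
        in_table { in_table = 0 }                       # any other line ends it
        END { if (best != "") print best }
    '
}
```

With the output above, smartctl -a device | best_lba_format prints 1 4096, i.e. format 1 (4096-byte sectors) is rated faster than the 512-byte format currently in use.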

Setting native sector size

As an alternative to manually overriding the auto-detected sector size, some SSDs can have their sector size changed during formatting, so that they report a number closer to their true sector size.

NVMe

To see whether a given NVMe device supports this, use the Identify Namespace command.

# nvme id-ns /dev/nvme0n1
nlbaf   : 0
[...]
lbaf  0 : ms:0   lbads:9  rp:0 (in use)

nlbaf is the number of LBA formats minus 1, so here only one format is supported. The list of formats is at the end of the output. Here, lbaf 0 means LBA format #0. It has an lbads (LBA data size) of 9, which means sectors are 2^9 = 512 bytes. If the device supports 4 KiB sectors, there will be another entry with an lbads of 12 (2^12 = 4096 bytes). The rp (Relative Performance) value indicates which format provides the best performance, with 0 being the best. ms is the number of extra metadata bytes per sector; metadata is not well supported under Linux, so it is best to select a format with an ms value of 0.

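To change the sector size, use nvme format and pass the index of the preferred LBA format (the number after lbaf, not the sector size itself) with the --lbaf parameter.

Warning: Formatting erases all data on the namespace.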
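Choosing the --lbaf value can be scripted by parsing the lbaf lines of nvme id-ns. This sketch only prints the command, since running it destroys data. It assumes lines of the form lbaf N : ms:M lbads:L rp:R as shown above, with rp possibly printed in hexadecimal (e.g. rp:0x1). Breaking ties on rp in favour of the larger sector size is a choice made for this example, not something nvme-cli does; the function name is hypothetical.

```shell
# suggest_nvme_format DEVICE: read `nvme id-ns DEVICE` output on stdin and
# print (never run) an `nvme format` command for the LBA format with ms:0 and
# the lowest rp value, preferring the larger sector size on a tie.
suggest_nvme_format() {
    awk -v dev="$1" '
        $1 == "lbaf" {
            id = $2; ms = ""; lbads = ""; rp = ""
            for (i = 3; i <= NF; i++) {
                if ($i ~ /^ms:/)    { ms = substr($i, 4) }
                if ($i ~ /^lbads:/) { lbads = substr($i, 7) }
                if ($i ~ /^rp:/)    { rp = substr($i, 4); sub(/^0x/, "", rp) }
            }
            if (ms != "0") next        # skip formats that carry metadata
            size = 2 ^ lbads           # lbads is log2 of the sector size
            if (best == "" || rp + 0 < best_rp || (rp + 0 == best_rp && size > best_size)) {
                best = id; best_rp = rp + 0; best_size = size
            }
        }
        END {
            if (best != "")
                printf "nvme format --lbaf=%s %s  # %d-byte sectors\n", best, dev, best_size
        }
    '
}
```

For the single-format output shown above, nvme id-ns /dev/nvme0n1 | suggest_nvme_format /dev/nvme0n1 suggests --lbaf=0 with 512-byte sectors, which is the format already in use, so there is nothing to change.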

SATA

For SATA devices, manufacturer-specific programs must be used. Not all SATA devices support changing the sector size.

Alignment

Aligning partitions correctly avoids excessive read-modify-write cycles. A typical practice for personal computers is to align each partition to start at a 1 MiB (1,048,576-byte) boundary. This covers all common page and block size scenarios, because 1 MiB is divisible by all commonly used sizes: 1 MiB, 512 KiB, 128 KiB, 4 KiB, and 512 B. fdisk, gdisk and parted handle alignment automatically. See GNU Parted#Check alignment if you want to verify your alignment after partitioning.
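The 1 MiB rule can also be checked by hand: a partition is aligned if its start offset in bytes is a multiple of 1048576. The sketch below does this arithmetic. partition_alignment reads the partition start from sysfs, which reports it in 512-byte units regardless of the drive's logical sector size. The function names and the SYSFS_BLOCK override (for testing) are hypothetical.

```shell
# is_mib_aligned START [SECTOR_SIZE]
# Succeeds if START sectors of SECTOR_SIZE bytes (default 512) is a whole
# multiple of 1 MiB (1048576 bytes).
is_mib_aligned() {
    [ $(( $1 * ${2:-512} % 1048576 )) -eq 0 ]
}

# partition_alignment PART (e.g. sda1 or nvme0n1p1)
# sysfs reports a partition's start in 512-byte units, so 512 is passed
# explicitly. SYSFS_BLOCK is a hypothetical override used for testing.
partition_alignment() {
    pa_start=$(cat "${SYSFS_BLOCK:-/sys/class/block}/$1/start") || return 2
    if is_mib_aligned "$pa_start" 512; then
        echo "$1: start sector $pa_start is 1 MiB aligned"
    else
        echo "$1: start sector $pa_start is NOT 1 MiB aligned"
        return 1
    fi
}
```

For example, a partition whose sysfs start value is 2048 begins at 2048 × 512 = 1,048,576 bytes and is aligned, whereas the old DOS-era start sector 63 is not.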

Dm-crypt

As of cryptsetup 2.4.0, luksFormat automatically detects the optimal encryption sector size for the LUKS2 format [1].

However, for this to work, the device must report the correct default sector size; see #Setting native sector size.

After running cryptsetup luksFormat, you can check the sector size used by the LUKS volume with:

# cryptsetup luksDump device | grep sector
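To compare this value with the drive, the sector line can be extracted and checked against the physical_block_size entry from #Check supported sector sizes. This sketch assumes the LUKS2 luksDump layout, in which the data segment contains a line such as sector: 4096 [bytes]; the function names are hypothetical.

```shell
# luks_sector_size: read `cryptsetup luksDump` output on stdin and print the
# number from the first "sector: N [bytes]" line (LUKS2 data segment).
luks_sector_size() {
    awk '$1 == "sector:" { print $2; exit }'
}

# compare_luks_sector LUKS_SECTOR PHYSICAL
# Fails if the encryption sector size is smaller than the physical block size
# reported by the drive, e.g. 512 on a drive with 4096-byte physical sectors.
compare_luks_sector() {
    if [ "$1" -lt "$2" ]; then
        echo "LUKS sector size $1 < physical block size $2: consider --sector-size=$2"
        return 1
    fi
    echo "LUKS sector size $1 >= physical block size $2"
}
```

Usage, with /dev/sdX2 being the LUKS partition on disk /dev/sdX:

# compare_luks_sector "$(cryptsetup luksDump /dev/sdX2 | luks_sector_size)" "$(cat /sys/class/block/sdX/queue/physical_block_size)"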

If the default sector size is incorrect, you can force the creation of a LUKS2 container with a 4096-byte sector size (and otherwise default options) with:

# cryptsetup luksFormat --sector-size=4096 device

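If you encrypted your device with the wrong sector size, you can re-encrypt it with a 4096-byte sector size by running:

Warning: This can break existing file systems on the device, for example a file system created with a block size smaller than the new encryption sector size.
# cryptsetup reencrypt --sector-size=4096 device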

Filesystems

mkfs.btrfs, mkfs.jfs, mkfs.nilfs2 and mkfs.reiserfs default to a 4096-byte block size.

mkfs.ext4, mkfs.f2fs, and mkfs.xfs set the optimal block size by default.

If the storage device does not report the correct sector size, you can explicitly set the block or sector size of the file system to match the physical sector size when formatting the partitions.

In particular, firmware-managed shingled magnetic recording (SMR) drives suffer severely if a 512-byte logical sector size is used on a drive whose physical sector size is 4096 bytes. These drives have write zones with different performance, and remapping and reallocation happen while the drive is idle. During heavy sustained writes (e.g. RAID resilvering, backups, rsync, or writing many small files), the faster write areas get depleted and the sector translation layer gets overworked on the shingled areas, so a mismatched file system sector size can drop write speed to single-digit megabytes per second.

For example, to explicitly set a 4096-byte block size (ext4) or sector size (XFS):

# mkfs.ext4 -F -b 4096 /dev/device
# mkfs.xfs -f -s size=4096 /dev/device
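The result can be verified after formatting. This sketch assumes that tune2fs -l prints a "Block size:" line and that xfs_info prints a sectsz= field (the first one belonging to the metadata section); the function names are hypothetical.

```shell
# ext4_block_size: read `tune2fs -l DEVICE` output on stdin and print the
# value of the "Block size:" field.
ext4_block_size() {
    awk -F: '$1 == "Block size" { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# xfs_sector_size: read `xfs_info` output on stdin and print the first
# sectsz= value found (the metadata sector size).
xfs_sector_size() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^sectsz=/) { print substr($i, 8); exit }
    }'
}
```

Usage, with /mountpoint being where the XFS file system is mounted:

# tune2fs -l /dev/device | ext4_block_size
# xfs_info /mountpoint | xfs_sector_size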

See also