Advanced Format

From ArchWiki

This article or section is a candidate for moving to Storage layout and alignment.

Notes: "Advanced Format" is HDD-specific. (Discuss in Talk:Advanced Format#Rewrite Advanced Format to a new Sector Sizes page)

All storage devices have a minimum storage unit that they can use. This smallest usable unit is generally referred to as a sector, as this was the smallest subdivision of the rotating parts of traditional storage devices (solid state drives use memory cells, and their smallest unit is a page). See [1]

Different storage devices use different sector sizes. Since 2011, modern hard disk drives normally use 4 KiB sectors, instead of the conventional 512 bytes. Solid state drives can sometimes support multiple formats.

The different layers (the device, stacked block devices, and file systems) should all use the same sector size. If they do not, the firmware has to map file system sectors to physical drive sectors. Although the translation layer usually does this transparently, it is an avoidable overhead, and avoiding it improves performance.
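
To illustrate where the overhead comes from, the following minimal sketch checks whether a write covers only whole physical sectors. If it does not, the drive has to read the partially covered sector, modify it, and write it back. The helper name needs_rmw is hypothetical and not part of any tool.

```shell
# needs_rmw OFFSET LENGTH PHYSICAL
# Hypothetical helper: succeeds (exit status 0) if a write of LENGTH bytes
# at byte OFFSET only partially covers a physical sector of PHYSICAL bytes,
# meaning the drive has to do a read-modify-write.
needs_rmw() {
    [ $(( $1 % $3 )) -ne 0 ] || [ $(( ($1 + $2) % $3 )) -ne 0 ]
}

# A single 512-byte logical write on a drive with 4096-byte physical sectors:
if needs_rmw 512 512 4096; then
    echo "read-modify-write needed"
fi

# A 4096-byte write that starts on a physical sector boundary:
if ! needs_rmw 8192 4096 4096; then
    echo "whole sectors only"
fi
```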

Hard disk drives – Advanced Format

This article or section needs expansion.

Reason: Some HDDs with 4 KiB physical sector size support changing their logical sector size. E.g. Seagate FastFormat. (Discuss in Talk:Advanced Format)

Advanced Format is a generic term for any disk sector format that stores data on the magnetic disks of hard disk drives in 4 KiB (4096-byte) sectors instead of the traditional 512-byte sectors. The main idea behind using 4096-byte sectors is to increase the bit density on each track by reducing the number of gaps which hold Sync/DAM and error correction code (ECC) information between data sectors. The old format gave a format efficiency of 88.7%, whereas Advanced Format results in a format efficiency of 97.3%.
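
The efficiency figures can be reproduced with a short calculation. Note that the per-sector overheads used below (65 bytes for a 512-byte sector, 115 bytes for a 4096-byte sector) are commonly cited values and an assumption here, since this article does not state them. The helper name format_efficiency is hypothetical.

```shell
# format_efficiency DATA_BYTES OVERHEAD_BYTES
# Prints the share of a sector's on-disk footprint that holds user data,
# as a percentage with one decimal place.
format_efficiency() {
    LC_ALL=C awk -v data="$1" -v overhead="$2" \
        'BEGIN { printf "%.1f\n", 100 * data / (data + overhead) }'
}

# The overhead values are assumptions, not taken from this article.
format_efficiency 512 65     # traditional 512-byte sectors
format_efficiency 4096 115   # Advanced Format 4096-byte sectors
```

With these assumed overheads, the two calls print 88.7 and 97.3.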

There are two types of Advanced Format drives:

  • Advanced Format drives, marked with an orange "AF" logo: internally, they use 4k sectors, but provide an emulation layer for compatibility with operating systems which lack support for them.
  • Advanced Format 4k native drives, marked with a blue "4Kn" logo: they require support from the operating system (Windows 8+, or Linux 2.6.31+). Because they do not need a translation layer, they are cheaper; however, they might be incompatible with old tools.

Check supported sector sizes

The physical and logical sector size of hard disk /dev/sdX can be determined by reading the following sysfs entries:

$ cat /sys/class/block/sdX/queue/physical_block_size
$ cat /sys/class/block/sdX/queue/logical_block_size

Drives with a translation layer (see above) will usually report a logical block size of 512 (for backwards compatibility) and a physical block size of 4096 (indicating they are Advanced Format drives).
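
As a rough sketch, the two sysfs values can be turned into the usual shorthand for each block device. The helper name classify_sectors is hypothetical, and "512n" (native 512-byte sectors) is a common label that this article does not otherwise use.

```shell
# classify_sectors LOGICAL PHYSICAL
# Hypothetical helper mapping the two sysfs values to a short label.
classify_sectors() {
    if [ "$1" -eq 512 ] && [ "$2" -eq 512 ]; then
        echo "512n"    # native 512-byte sectors
    elif [ "$1" -eq 512 ] && [ "$2" -eq 4096 ]; then
        echo "512e"    # 4096-byte sectors behind a 512-byte translation layer
    elif [ "$1" -eq 4096 ] && [ "$2" -eq 4096 ]; then
        echo "4Kn"     # native 4096-byte sectors
    else
        echo "other"
    fi
}

# Report every block device that exposes a queue directory in sysfs.
for queue in /sys/class/block/*/queue; do
    [ -r "$queue/logical_block_size" ] || continue
    dev=${queue%/queue}
    logical=$(cat "$queue/logical_block_size")
    physical=$(cat "$queue/physical_block_size")
    echo "${dev##*/}: $(classify_sectors "$logical" "$physical")"
done
```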

Tools that report the sector size of a drive (provided the drive reports it correctly) include:

  • fdisk:
    # LC_ALL=C fdisk -l /dev/sdX | grep 'Sector size'
  • smartmontools:
    # smartctl -a /dev/sdX | grep 'Sector Size'
  • hdparm:
    # hdparm -I /dev/sdX | grep 'Sector size:'

Note that these work even for USB-attached disks (if the USB bridge supports SAT, also known as SCSI/ATA Translation, ANSI INCITS 431-2007).

Solid state drives

Most solid state drives (SSDs) report their sector size as 512 bytes, even though they use larger sectors - typically 4 KiB, 8 KiB, or sometimes larger. As a result, file systems cannot automatically optimize for the native sector size. To avoid sub-optimal performance, one can either:

  • manually specify the sector size when creating a file system, or
  • change the native sector size reported by the device.

Check supported sector sizes of NVMe drives

Use smartmontools to check supported sector sizes:

# smartctl -a device
...
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1
...
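
A minimal sketch for extracting the sector size currently in use from this table, assuming the column layout shown above, where "+" in the Fmt column marks the active format. The helper name current_lba_size is hypothetical.

```shell
# current_lba_size
# Reads `smartctl -a` output on standard input and prints the data size of
# the LBA format marked with "+" in the Fmt column.
current_lba_size() {
    awk '$1 ~ /^[0-9]+$/ && $2 == "+" && $3 ~ /^[0-9]+$/ { print $3 }'
}

# Sample input copied from the output above. On a real system use:
#   smartctl -a device | current_lba_size
current_lba_size <<'EOF'
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1
EOF
```

For the sample input, this prints 512.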

Setting native sector size

As an alternative to manually overriding the auto-detected sector size, some SSDs can have their sector size changed during formatting, so that they report a number closer to their true sector size.

NVMe

To see whether a given NVMe device supports this, use the Identify Namespace command.

# nvme id-ns -H /dev/nvme0n1 | grep "Relative Performance"
LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good (in use)
LBA Format  1 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0x1 Better
  • Metadata Size is the number of extra metadata bytes per sector. This is not well supported under Linux, so it is best to select a format with a value of 0 here.
  • Relative Performance indicates which format will provide good, better or the best performance.
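
The selection described in these two points can be sketched as a small filter that picks the metadata-free format with the lowest Relative Performance value (lower values mean better performance). It assumes the exact field layout of the sample output above, and the helper name best_lbaf is hypothetical.

```shell
# best_lbaf
# Reads `nvme id-ns -H` output on standard input and prints the index and
# data size of the LBA format that has a metadata size of 0 and the lowest
# Relative Performance value.
best_lbaf() {
    awk '
        $1 == "LBA" && $2 == "Format" && $7 == 0 {
            rp = substr($17, 3) + 0        # "0x1" -> 1
            if (best == "" || rp < best) {
                best = rp; id = $3; size = $12
            }
        }
        END { if (best != "") print id, size }
    '
}

# Sample input copied from the output above. On a real system use:
#   nvme id-ns -H /dev/nvme0n1 | best_lbaf
best_lbaf <<'EOF'
LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good (in use)
LBA Format  1 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0x1 Better
EOF
```

For the sample input, this prints "1 4096", which is the value to pass to --lbaf and the resulting sector size.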

To change the sector size, use nvme format and specify the preferred value with the --lbaf parameter:

# nvme format --lbaf=1 /dev/nvme0n1
You are about to format nvme0n1, namespace 0x1.
WARNING: Format may irrevocably delete this device's data.
You have 10 seconds to press Ctrl-C to cancel this operation.

Use the force [--force] option to suppress this warning.
Sending format operation ... 
Success formatting namespace:1

This should take just a few seconds to complete.

The factual accuracy of this article or section is disputed.

Reason: Before fiddling with power state or BIOS settings, users are encouraged to use native nvme commands to check operation logs, or soft reset the controller with e.g. nvme reset. (Discuss in Talk:Advanced Format)

If nvme format fails, try putting the machine to sleep (e.g., with systemctl suspend) and then try running nvme format again after waking it. If nvme format still fails, fiddling with your BIOS settings might help.

SATA

The factual accuracy of this article or section is disputed.

Reason: Do SATA SSDs that support changing their sector size even exist?[2] (Discuss in Talk:Advanced Format)

For SATA devices, manufacturer specific programs must be used. Not all SATA devices support having the sector size changed.

Intel

This article or section is out of date.

Reason: Ever since Intel's SSD business was bought by SK Hynix under the Solidigm brand, Intel MAS cannot be used to manage SSDs, only Optane products. Solidigm provides solidigm-sst-storage-tool-cliAUR to perform the same functions Intel MAS used to offer: see Solid_state_drive/NVMe#Intel/Solidigm. (Discuss in Talk:Advanced Format)

For Intel use the Intel Memory and Storage (MAS) Tool (intel-mas-cli-toolAUR) with the -set PhysicalSectorSize=4096 option.

Seagate

For Seagate use seagate-seachestAUR.

Scan all drives to find the correct one, and print info from the one you found:

# SeaChest_Basics --scan
# SeaChest_Basics -d /dev/sgX -i

This should print out information about the drive. Make sure to check the serial number.

Check the logical block sizes supported by the drive:

# SeaChest_Format -d /dev/sgX --showSupportedFormats

If 4096 is listed, you can change the logical sector size to it as follows:

# SeaChest_Format -d /dev/sgX --setSectorSize=4096 --confirm this-will-erase-data

This will take a couple of minutes, after which the drive uses a 4K native sector size.

Partition alignment

Aligning partitions correctly avoids excessive read-modify-write cycles. A typical practice for personal computers is to have each partition's start and size aligned to 1 MiB (1 048 576 bytes) marks. This covers all common page and block size scenarios, as 1 MiB is divisible by all commonly used sizes: 1 MiB, 512 KiB, 128 KiB, 4 KiB, and 512 B.

Warning: Misaligned partitions prevent the use of 4096-byte sectors with dm-crypt/LUKS. See [3].
  • fdisk, cfdisk and sfdisk handle alignment automatically.
  • gdisk and cgdisk handle alignment automatically.
    • sgdisk by default only aligns the start of partitions. Use the -I/--align-end option to additionally enable partition size/end alignment.
  • Parted only aligns the start of the partition, but not the size/end. When creating partitions, make sure to specify the partition end in mebibytes or a larger IEC binary unit.
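
To verify the result after partitioning, the start and size of a partition can be checked against the 1 MiB mark. The following is a minimal sketch with hypothetical helper names. It relies on sysfs reporting the start and size of a partition in 512-byte units, independently of the drive's logical sector size.

```shell
# is_mib_aligned BYTES
# Succeeds if BYTES is a multiple of 1 MiB (1 048 576 bytes).
is_mib_aligned() {
    [ $(( $1 % 1048576 )) -eq 0 ]
}

# check_partition NAME
# Hypothetical helper: reads the start and size of a partition such as sda1
# from sysfs (both in 512-byte units) and reports their alignment.
check_partition() {
    start=$(( $(cat "/sys/class/block/$1/start") * 512 ))
    size=$(( $(cat "/sys/class/block/$1/size") * 512 ))
    if is_mib_aligned "$start"; then
        echo "$1: start is aligned to 1 MiB"
    else
        echo "$1: start is NOT aligned to 1 MiB"
    fi
    if is_mib_aligned "$size"; then
        echo "$1: size is aligned to 1 MiB"
    else
        echo "$1: size is NOT aligned to 1 MiB"
    fi
}
```

For example, check_partition sda1 reports on the first partition of /dev/sda. A partition starting at 512-byte sector 2048 is aligned, while one starting at sector 63 is not.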

dm-crypt

This article or section needs expansion.

Reason: Add example for plain dm-crypt. (Discuss in Talk:Advanced Format)

As of Cryptsetup 2.4.0, luksFormat automatically detects the optimal encryption sector size for LUKS2 format [4].

However, for this to work, the device needs to report the correct default sector size. See #Setting native sector size.

After using cryptsetup luksFormat, you can check the sector size used by the LUKS2 volume with

# cryptsetup luksDump device | grep sector

If the default sector size is incorrect, you can force the creation of a LUKS2 container with a 4K sector size and otherwise default options with:

# cryptsetup luksFormat --sector-size=4096 device

The command will abort with an error if the requested size does not match your device:

# cryptsetup luksFormat --sector-size 4096 device
(...)
Verify passphrase: 
Device size is not aligned to requested sector size.
Note: See cryptsetup issue 585 for why the command may fail while the underlying drive does use 4K physical sectors.
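
As a rough pre-check before running luksFormat, one can test whether the device size is a whole multiple of the requested sector size. This sketch assumes that such a mismatch is what triggers the message above, and it does not cover the case described in the linked issue. The helper name fits_sector_size and the two sizes are made up for illustration.

```shell
# fits_sector_size DEVICE_BYTES SECTOR_BYTES
# Succeeds if the device size is a whole number of encryption sectors.
fits_sector_size() {
    [ $(( $1 % $2 )) -eq 0 ]
}

# On a real system the size in bytes could be obtained with:
#   blockdev --getsize64 device
# The two sizes below are made-up examples (1 GiB, and 1 GiB plus 512 bytes).
for bytes in 1073741824 1073742336; do
    if fits_sector_size "$bytes" 4096; then
        echo "$bytes bytes: multiple of 4096"
    else
        echo "$bytes bytes: not a multiple of 4096"
    fi
done
```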

Warning: The contained file system must have a block size of 4096 bytes or a multiple of it, otherwise it will break.

If you encrypted your device with the wrong sector size, the device can be re-encrypted by running:

# cryptsetup reencrypt --sector-size=4096 device

File systems

This article or section needs expansion.

Reason: Differentiate which mkfs utilities use 4096 explicitly and which use the page size (getconf PAGESIZE). (Discuss in Talk:Advanced Format)

mkfs.btrfs(8), mkfs.jfs(8), mkfs.nilfs2(8), mkfs.reiserfs(8) and mkswap(8) default to a 4096 byte sector size.

mkfs.ext4(8) defaults to 1024 byte sectors for file systems smaller than 512 MiB and 4096 byte sectors for 512 MiB and larger.

mkfs.xfs(8) defaults to 512 byte sectors, but will use 4096 for 512e and 4Kn disks.

mkfs.f2fs(8), mkfs.fat(8), mkfs.ntfs(8) and mkfs.udf(8) use the backing device's logical sector size. I.e. they will use 512 byte sectors for 512e disks and 4096 byte sectors for 4Kn disks.

zpool-create(8) (from ZFS) defaults to 512 (2⁹) byte sectors. When using Advanced Format disks, the sector size should be set explicitly at pool creation with the parameter -o ashift=12 (2¹², 4096 bytes).
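
Since ashift is the base-2 logarithm of the sector size, the value for other power-of-two sizes can be derived the same way. A minimal sketch, with a hypothetical helper name:

```shell
# ashift_for SECTOR_BYTES
# Prints the base-2 logarithm of a power-of-two sector size. The function
# body runs in a subshell so its variables do not leak into the caller.
ashift_for() (
    size=$1
    count=0
    while [ "$size" -gt 1 ]; do
        size=$(( size / 2 ))
        count=$(( count + 1 ))
    done
    echo "$count"
)

ashift_for 512     # prints 9
ashift_for 4096    # prints 12
```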

If the storage device does not report the correct sector size, you can explicitly format the partitions according to the physical sector size.

In particular, firmware-managed shingled magnetic recording (SMR) drives are severely impacted when a logical sector size of 512 bytes is used on top of a physical sector size of 4096 bytes. These drives have zones with different write performance, and remapping normally takes place while the drive is idle. During heavy sustained writes (e.g., RAID resilvering, backups, writing many small files, rsync), a mismatched file system sector size can drop the write speed to single-digit megabytes per second, as the higher-performance write areas are depleted and the sector translation layer is overworked on the shingled areas.

Here are some examples to set the 4096-byte sector size explicitly:

  • ext4:
    # mkfs.ext4 -b 4096 /dev/device
  • XFS:
    # mkfs.xfs -s size=4096 /dev/device
  • FAT:
    # mkfs.fat -S 4096 /dev/device
  • NTFS-3G:
    # mkfs.ntfs -Q -s 4096 /dev/device
  • UDF:
    # mkfs.udf -b 4096 /dev/device
  • ZFS:
    # zpool create -o ashift=12 poolname raidz device0 ... deviceN
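
The mkfs examples above can be wrapped in a small helper that fills in a sector size, for instance one read from sysfs. This is only a sketch: mkfs_command is a hypothetical name, it prints the command instead of running it, and /dev/sdX1 is a placeholder.

```shell
# mkfs_command FSTYPE DEVICE SECTOR_BYTES
# Hypothetical helper: prints (does not run) the matching command from the
# examples above with the given sector size filled in.
mkfs_command() {
    case $1 in
        ext4) echo "mkfs.ext4 -b $3 $2" ;;
        xfs)  echo "mkfs.xfs -s size=$3 $2" ;;
        fat)  echo "mkfs.fat -S $3 $2" ;;
        ntfs) echo "mkfs.ntfs -Q -s $3 $2" ;;
        udf)  echo "mkfs.udf -b $3 $2" ;;
        *)    echo "unsupported file system: $1" >&2; return 1 ;;
    esac
}

# Example with a placeholder device and an assumed 4096-byte physical sector
# size. On a real system the size could be read from
# /sys/class/block/sdX/queue/physical_block_size.
mkfs_command xfs /dev/sdX1 4096
```

Printing the command first makes it possible to review it before running it as root.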

See also