Securely wipe disk

Wiping a disk is done by writing new data over every single bit.

Note: References to "disks" in this article also apply to loopback devices.

Common use cases

Wipe all data left on the device

The most common use case for completely and irrevocably wiping a device is when the device is going to be given away or sold. There may be (unencrypted) data left on the device, and you may want to protect against simple forensic investigation, which is mere child's play with, for example, File recovery software.

If you want to quickly wipe everything from the disk, /dev/zero or simple patterns allow maximum performance, while adequate randomness can be advantageous in some cases covered in #Data remanence.

Overwriting every single bit provides a level of data erasure that does not allow recovery with normal system functions (like standard ATA/SCSI commands) and hardware interfaces. Any file recovery software mentioned above would then need to be specialized in proprietary storage-hardware features.

In the case of an HDD, data recovery will not be possible without at least undocumented drive commands, or tampering with the device's controller or firmware to make it read out, for example, reallocated sectors (bad blocks that S.M.A.R.T. retired from use).

There are different wiping issues with different physical storage technologies, most notably all Flash memory based devices and older magnetic storage (old HDDs, floppy disks, tape).

Preparations for block device encryption

If you want to prepare your drive to securely set up Disk encryption#Block device encryption inside the wiped area afterwards, you should use #Random data generated by a trusted, cryptographically strong random number generator (referred to as RNG in this article from now on).

See also Wikipedia:Random number generation.

Warning: If Block device encryption is mapped on a partition that contains anything other than random/encrypted data, disclosure of usage patterns on the encrypted drive is possible, weakening the encryption to a level comparable with filesystem-level encryption. Never use /dev/zero, simple patterns (e.g. from badblocks) or other non-random data before setting up Block device encryption if you are serious about it!

Select a target

Note: Fdisk will not work on GPT formatted devices. Use gdisk (gptfdisk) instead.

Use fdisk to locate all read/write devices the user has read access to.

Check the output for lines that start with devices such as /dev/sdX.

This is an example for an HDD formatted to boot a Linux system:

# fdisk -l
Disk /dev/sda: 250.1 GB, 250059350016 bytes, 488397168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00ff784a

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048      206847      102400   83  Linux
/dev/sda2          206848   488397167   244095160   83  Linux

Or the Arch Install Medium written to a 4GB USB thumb drive:

# fdisk -l
Disk /dev/sdb: 4075 MB, 4075290624 bytes, 7959552 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x526e236e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           0      802815      401408   17  Hidden HPFS/NTFS

If you are worried about unintentionally damaging important data on the primary computer, consider using an isolated environment, such as a virtual machine (VirtualBox, VMware, QEMU, etc.) with the disk drives connected directly to it, or a dedicated computer containing only the disk(s) that need to be wiped, booted from live media (USB, CD, PXE, etc.).
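For example, a minimal sketch of booting a live ISO in QEMU with only the target disk attached (archlinux.iso here stands in for whatever live image you use):

# qemu-system-x86_64 -m 1024 -cdrom archlinux.iso -boot d -drive file=/dev/sdX,format=raw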

Select a block size

See also Wikipedia:Dd (Unix)#Block size, blocksize io-limits.

If you have an Advanced Format hard drive, it is recommended that you specify a block size larger than the default 512 bytes. To speed up the overwriting process, choose a block size matching your drive's physical geometry by appending the block size option to the dd command (e.g. bs=4096 for 4 KiB).

fdisk prints physical and logical sector size for every disk.
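For example (the sizes shown are illustrative):

# fdisk -l /dev/sdX | grep "Sector size"
Sector size (logical/physical): 512 bytes / 4096 bytes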

Alternatively, sysfs exposes the information:

/sys/block/sdX/size
/sys/block/sdX/queue/physical_block_size
/sys/block/sdX/queue/logical_block_size
/sys/block/sdX/sdXY/alignment_offset
/sys/block/sdX/sdXY/start
/sys/block/sdX/sdXY/size
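These files can be read directly, for example (values are illustrative):

# cat /sys/block/sdX/queue/physical_block_size
4096
# cat /sys/block/sdX/queue/logical_block_size
512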

Calculate blocks to wipe manually

The following example shows how to determine the data area to wipe.

A block storage device contains a number of sectors. The total size of the device in bytes can be calculated by multiplying the number of sectors by the size of a single sector.

As an example, we use the following parameters with the dd command to wipe a partition:

# dd if=data_source of=/dev/sdX bs=sector_size count=sector_number seek=partitions_start_sector

Below is part of the output of fdisk -l /dev/sdX run as root, showing the example partition information:

Device     Boot      Start        End         Sectors     Size  Id Type
/dev/sdXA            2048         3839711231  3839709184  1,8T  83 Linux
/dev/sdXB            3839711232   3907029167  67317936    32,1G  5 Extended

The first line of the fdisk output shows the disk size in bytes and logical sectors:

Disk /dev/sdX: 1,8 TiB, 2000398934016 bytes, 3907029168 sectors

To calculate the size of a single logical sector, use echo $((2000398934016 / 3907029168)) or use the data from the second line of the fdisk output:

Units: sectors of 1 * 512 = 512 bytes

To calculate the number of physical sectors, which makes the wipe faster, we can use the third line:

Sector size (logical/physical): 512 bytes / 4096 bytes

To get the size in physical sectors, divide the known disk size in bytes by the physical sector size: echo $((2000398934016 / 4096)). The size in bytes of the storage device, or of a partition on it, can also be obtained with the blockdev --getsize64 /dev/sdX(Y) command.
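As a sketch, the same numbers can be obtained directly in the shell (the output matches the example disk above):

# blockdev --getsize64 /dev/sdX
2000398934016
# echo $((2000398934016 / 4096))
488378646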

Note: In the examples below we will use the logical sector size.

To wipe partition /dev/sdXA the example parameters with logical sectors would be used like this:

# dd if=data_source of=/dev/sdXA bs=512 count=3839709184 seek=2048

Or, to wipe the whole disk (count= optional):

# dd if=data_source of=/dev/sdX bs=512 count=3907029168 seek=0
Warning: If count= is omitted, or misconfigured to point beyond the end of the device, dd will report an error ("No space left on device") when it reaches the end of the device and no further writes are possible. For a whole-disk wipe this is expected and harmless.

Select a data source

As noted above, if you want to wipe sensitive data you can use any source matching your needs.

If you want to set up block device encryption afterwards, you should always wipe at least with an encryption cipher as the source, or even with pseudorandom data.

For data that is not truly random, your disk's writing speed should be the only limiting factor. If you need random data, the system performance required to generate it may depend heavily on what you choose as the source of entropy.

Note: Everything regarding Benchmarking disk wipes should be merged there.

Non-random data

Overwriting with /dev/zero or simple patterns is considered secure by most resources. In the case of current HDDs, it should be sufficient for fast disk wipes.

Warning: A drive that is abnormally fast at writing patterns or zeroes may be transparently compressing the data, in which case presumably not all blocks actually get wiped. Some #Flash memory devices "feature" this.

Pattern write test

#Badblocks can write simple patterns to every block of a device and then read and check them searching for damaged areas (just like memtest86* does with memory).

As the pattern is written to every accessible block, this effectively wipes the device.

Random data

For differences between random and pseudorandom data as source, please see Random number generation.

Note: Data that is hard to compress (random data) will be written more slowly if the drive logic mentioned in the #Non-random data warning tries to compress it. This should not lead to #Data remanence, though. Since maximum write speed is not the performance bottleneck when wiping disks with random data, the effect can be neglected entirely.

Encrypted data

When preparing a drive for full-disk encryption, sourcing high quality entropy is usually not necessary. An alternative is to use an encrypted data stream: for example, if you will use AES for your encrypted partition, wipe it with an equivalent encryption cipher prior to creating the filesystem, to make the empty space indistinguishable from the used space.

Overwrite the target

The chosen drive can be overwritten with any of several utilities; make your choice.

dd

See also Core utilities#dd.

Warning: There is no confirmation prompt for this command, so repeatedly check that the correct drive or partition is targeted. Make certain that the of=... option points to the target drive and not to a system disk.

Zero-fill the disk by writing a zero byte to every addressable location on the disk using the /dev/zero stream. The iflag and oflag options below attempt to disable buffering, which is pointless for a constant stream:

# dd if=/dev/zero of=/dev/sdX iflag=nocache oflag=direct bs=4096

Or the /dev/urandom stream:

# dd if=/dev/urandom of=/dev/sdX bs=4096

The process is finished when dd reports "No space left on device":

dd: writing to ‘/dev/sdb’: No space left on device
7959553+0 records in
7959552+0 records out
4075290624 bytes (4.1 GB) copied, 1247.7 s, 3.3 MB/s
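To check the progress of a dd process that is still running, send it the USR1 signal; GNU dd prints its current statistics on receipt (this assumes only one dd process is running):

# kill -USR1 $(pidof dd)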

Advanced example

Get the number of sectors (NUM_BLOCKS), the sector size (LOGIC_BLOCK_SIZE) and, optionally, the total disk size in bytes (DISK_SIZE) of the device to be wiped:

# fdisk -l /dev/sdX 
Disk /dev/sdX: 21,5 GiB, 23045603328 bytes, 45010944 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

and use them in the following command to randomize the drive/partition using a randomly-seeded AES cipher from OpenSSL (displaying the optional progress meter with pv):

# openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt </dev/zero \
    | pv -bartpes <DISK_SIZE> | dd bs=<LOGIC_BLOCK_SIZE> count=<NUM_BLOCKS> of=/dev/sdX

The command above derives an encryption key from 128 bytes of /dev/urandom output. AES-256 in CTR mode is used to encrypt /dev/zero's output with that key. Utilizing the cipher instead of a pseudorandom source results in very high write speeds, and the result is a device filled with AES ciphertext.

See also Dm-crypt/Drive preparation#dm-crypt wipe before installation for a similar approach.

shred

shred is a Unix command that can be used to securely delete individual files or full devices so that they can be recovered only with great difficulty using specialised hardware, if at all. By default, shred uses three passes, writing pseudo-random data to the device during each pass. The number of passes can be reduced or increased.

The following command invokes shred with its default settings and displays the progress.

# shred -v /dev/sdX

Alternatively, shred can be instructed to do only one pass, with entropy from e.g. /dev/urandom.

# shred --verbose --random-source=/dev/urandom -n1 /dev/sdX

Badblocks

To let badblocks perform a disk wipe, a destructive read-write test has to be done:

# badblocks -c <NUMBER_BLOCKS> -wsv /dev/<drive>

hdparm

hdparm supports ATA Secure Erase, which is functionally equivalent to zero-filling a disk. It is, however, handled by the hard-drive firmware itself and includes "hidden data areas". As such, it can be seen as a modern-day "low-level format" command. SSDs reportedly achieve factory performance after issuing this command, but may not be sufficiently wiped (see #Flash memory).

Some drives support Enhanced Secure Erase, which uses distinct patterns defined by the manufacturer. If the output of hdparm -I for the device indicates a dramatically shorter time for the Enhanced erasure, the device probably has a hardware encryption feature, and the wipe will be performed on the encryption keys only.

For detailed instructions on using ATA Secure Erase, see the Linux ATA wiki.
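As a rough sketch of the typical procedure (PasSWorD is an arbitrary throwaway password; the drive must not be in a "frozen" security state, and the Linux ATA wiki covers the full, safe procedure): inspect the drive's security state, set a temporary user password, then issue the erase:

# hdparm -I /dev/sdX | grep -A8 "Security:"
# hdparm --user-master u --security-set-pass PasSWorD /dev/sdX
# hdparm --user-master u --security-erase PasSWorD /dev/sdX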

secure-delete

The secure-delete package from the AUR provides several utilities for secure erasure, including sfill, which wipes only the free space of a specified mount point. For example:

# sfill -v /

See the tools list for more info.

Data remanence

This article or section needs expansion.

Reason: This section is too dependent on links to Wikipedia. Links to diverse and high quality resources should be added.

See also Wikipedia:Data remanence.

Recoverable data and the residual magnetism myth

It is a widespread, but for modern drives almost entirely unfounded, belief that recoverable data remains after a single full overwrite with zeroes, or even after several overwrites with random data. This myth can be traced back to a 1996 paper by Dr. Peter Gutmann, many points of which have been widely misinterpreted. Gutmann's article referred to the possibility of recovering single residual bits from older low-density drives. In an addendum to his original paper, Gutmann wrote:

“…with modern high-density drives, even if you’ve got 10KB of sensitive data on a drive and can’t erase it with 100% certainty, the chances of an adversary being able to find the erased traces of that 10KB in 200GB of other erased traces are close to zero.”

From security.stackexchange.com:

..[It has been] demonstrated that correctly wiped data cannot reasonably be retrieved even if it is of a small size or found only over small parts of the hard drive.. The forensic recovery of data using electron microscopy is infeasible. This was true both on old drives and has become more difficult over time.

If you are still worried about billion-dollar black-ops agencies recovering your data, degaussing is a commonly practiced countermeasure.

For articles detailing the misconceptions and incorrect information surrounding Gutmann's paper, see this NBER article and this How-To Geek article.

Hardware-specific issues

There are, however, a few real possibilities for forensic data recovery.

Marked bad sectors

If a hard drive marks a sector as bad, it cordons it off, and the sector becomes impossible to write to via software, so a full overwrite would not reach it. However, because of block sizes, such sectors would only amount to a few theoretically recoverable KiB.
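To see whether a drive has reallocated sectors at all, query its S.M.A.R.T. attributes, for example with smartctl from smartmontools (attribute names and values vary by vendor; the output below is illustrative):

# smartctl -A /dev/sdX | grep -i reallocated
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0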

Flash memory

Write amplification and other characteristics make Flash memory a stubborn target for reliable wiping. Because there is a lot of transparent abstraction between the data as seen by the device's controller chip and the data as seen by the operating system, data is never overwritten in place, and wiping particular blocks or files is not reliable.

Other "features" like transparent compression (all SandForce SSD's) can compress your /dev/zero or pattern stream so if wiping is fast beyond belief this might be the case.

Disassembling Flash memory devices, unsoldering the chips and analyzing the data content without the controller in between is feasible without difficulty using simple hardware. Data recovery companies do it cheaply.

Sources

[1] Securely erasing flash memory, Reliably Erasing Data From Flash-Based Solid State Drives: http://www.usenix.org/events/fast11/tech/full_papers/Wei.pdf

[2] How-To Geek, Why You Only Have to Wipe a Disk Once to Erase It: http://www.howtogeek.com/115573/htg-explains-why-you-only-have-to-wipe-a-disk-once-to-erase-it/ (see also The H Online, Secure deletion: a single overwrite will do it: https://web.archive.org/web/20120102004746/http://www.h-online.com/newsticker/news/item/Secure-deletion-a-single-overwrite-will-do-it-739699.html)

[3] Can Intelligence Agencies Read Overwritten Data?: http://www.nber.org/sys-admin/overwritten-data-guttman.html

[4] The Myth of Data Remanence: https://kromey.us/2013/04/the-myth-of-data-remanence-484.html

[5] security.stackexchange.com, Is data remanence a myth?: http://security.stackexchange.com/questions/26132/is-data-remanence-a-myth

[6] Wikipedia:Data remanence