ZFS Reference & Cheat Sheet

A working ZFS cheat sheet covering pools, datasets, snapshots, send/recv, compression and the bits you forget at 2am. Started on OpenSolaris around 2008, kept current with OpenZFS on FreeBSD and Proxmox.

!! NOTE
This was originally written for OpenSolaris back when I picked up ZFS around 2007-2008, and I’ve been updating it ever since. Most of these commands work across Solaris, FreeBSD and OpenZFS on Linux, but I’ve called out version-specific bits where it matters.

A quick history

ZFS landed in Solaris 10 in June 2006 and the moment I read about copy-on-write, end-to-end checksums and pooled storage I knew RAID5 on a hardware controller was on borrowed time. Sun open-sourced it shortly after and OpenSolaris was where most of us tinkered with it.

Then came the unfortunate Oracle bit in 2010. Pool version 28 was the last open release. The community forked it as OpenZFS which is what powers ZFS on FreeBSD, Linux, illumos and macOS today. Modern OpenZFS uses feature flags instead of monotonic version numbers so you can mix and match what your platform supports.

A rough timeline of the bits you’ll actually care about:

ZFS Feature Timeline
| Year | Release | Notable |
| ---- | ------- | ------- |
| 2006 | Solaris 10 6/06 | ZFS arrives |
| 2008 | FreeBSD 7.0 | First non-Solaris port |
| 2009 | Pool v17 | RAIDZ3 (triple parity) |
| 2010 | Pool v28 | Last open Solaris version |
| 2013 | ZoL 0.6.1 | First stable ZFS on Linux, lz4 compression |
| 2019 | OpenZFS 0.8 | Native encryption, TRIM, special vdevs, sequential resilver |
| 2020 | OpenZFS 2.0 | Linux and FreeBSD on one codebase, zstd compression |
| 2021 | OpenZFS 2.1 | dRAID |
| 2023 | OpenZFS 2.2 | Block cloning, BLAKE3 checksums |
| 2024 | OpenZFS 2.3 | RAIDZ expansion (finally!), Direct IO |

Most of what follows works on anything from pool v15 onwards. Where it doesn’t, I’ve flagged the minimum version.

Pool Topology

Before you touch a single command, pick your topology. You can’t change a pool’s redundancy level once it’s created (well, you couldn’t until RAIDZ expansion in 2.3 which is still a fairly limited operation).

vdev Types
| Type | Min Disks | Parity | Notes |
| ---- | --------- | ------ | ----- |
| stripe | 1 | 0 | No redundancy, lose a disk lose the pool |
| mirror | 2 | n-1 | Best for IOPS, n-way mirrors supported |
| raidz1 | 3 | 1 | Tolerates 1 disk loss, like RAID5 |
| raidz2 | 4 | 2 | Tolerates 2 disk loss, like RAID6, the sensible default |
| raidz3 | 5 | 3 | Tolerates 3 disk loss (pool v17+) |
| draid | varies | varies | Distributed parity (OpenZFS 2.1+) for very large arrays |

A few rules of thumb that have served me well:

  • For drives bigger than 4TB, use raidz2. Resilver times on big disks are scary, and raidz1 leaves you exposed for that whole window.
  • If you care about random IOPS (VM storage, databases), use mirrors. Each mirrored pair adds roughly another disk’s worth of random IOPS, whereas a whole raidz vdev performs like a single disk for random I/O.
  • You can stripe across multiple vdevs, but avoid mixing raidz vdevs of different widths in one pool.
  • dRAID is for the folks running 30+ disks. If you’re not, stick with raidz.

Device Naming

This bit changed a lot over the years. On Solaris/OpenSolaris we had the lovely c0t0d0 controller-target-disk style. FreeBSD uses ada0, da0. Linux has /dev/sdX, which is awful because the letters can shuffle on reboot.

Always reference disks by their stable identifier. On Linux that’s /dev/disk/by-id/ (the WWN or serial number ones). On FreeBSD use the gptid or diskid. Saves you a lot of pain when a controller renumbers things.

bash
ls -l /dev/disk/by-id/
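
On FreeBSD the same job is done by the GEOM label machinery; assuming the disks carry GPT labels, this maps the gptid/diskid names back to physical devices:

bash
glabel status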

Pool Creation

The basics, with examples mirroring my own kit (gandalf is one of my Proxmox boxes, zeus is the FreeBSD machine).

A simple stripe (don’t do this for anything you care about):

bash
zpool create tank /dev/disk/by-id/ata-WDC_WD40EFRX-1

A two-way mirror, which is what I run for VM storage:

bash
zpool create -o ashift=12 vmstore mirror \
    /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S1 \
    /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S2

A six-disk raidz2 for bulk storage (movies, photos, the ~2PB of Aero/Astro test data I’ve been smashing through with Smash):

bash
zpool create -o ashift=12 archive raidz2 \
    /dev/disk/by-id/ata-WDC_WD80EFAX-1 \
    /dev/disk/by-id/ata-WDC_WD80EFAX-2 \
    /dev/disk/by-id/ata-WDC_WD80EFAX-3 \
    /dev/disk/by-id/ata-WDC_WD80EFAX-4 \
    /dev/disk/by-id/ata-WDC_WD80EFAX-5 \
    /dev/disk/by-id/ata-WDC_WD80EFAX-6

Striped mirrors (three vdevs of two-way mirrors), great for VM storage with 6 SSDs:

bash
zpool create -o ashift=12 fast \
    mirror /dev/disk/by-id/ata-A1 /dev/disk/by-id/ata-A2 \
    mirror /dev/disk/by-id/ata-B1 /dev/disk/by-id/ata-B2 \
    mirror /dev/disk/by-id/ata-C1 /dev/disk/by-id/ata-C2

About ashift

ashift=12 means 4K sectors (2^12 = 4096 bytes). Almost every modern drive is 4K native or 4K-emulated, so ashift=12 is what you want. NVMe is sometimes happier on ashift=13 (8K). You cannot change ashift after pool creation, so get it right the first time.

If you’re not sure, peek at the drive:

bash
smartctl -a /dev/sda | grep -i sector
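
To confirm what ashift an existing pool actually got (a sanity check I do right after creation), zdb can dump the pool config, assuming the pool is imported and in the default cachefile:

bash
zdb -C archive | grep ashift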

Useful create-time properties

Set these at creation rather than fiddling later:

bash
zpool create -o ashift=12 \
             -O compression=zstd \
             -O atime=off \
             -O xattr=sa \
             -O acltype=posixacl \
             tank raidz2 /dev/disk/by-id/...
  • compression=zstd (OpenZFS 2.0+), or lz4 for older. Always on, it’s faster than no compression for most workloads.
  • atime=off stops every read updating access times, which is pointless write amplification.
  • xattr=sa stores extended attributes inline (Linux), much faster.
  • acltype=posixacl enables POSIX ACLs (Linux).
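
A quick check that those all landed where you expect (same property names as above):

bash
zfs get compression,atime,xattr,acltype tank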

Pool Status & Inspection

The commands you’ll run a hundred times.

Show all pools at a glance:

bash
zpool list
NAME      SIZE   ALLOC   FREE  FRAG    CAP  DEDUP  HEALTH
archive  43.7T  18.2T  25.5T    8%    41%  1.00x  ONLINE
fast     2.91T   847G  2.08T   12%    28%  1.00x  ONLINE
rpool     446G   112G   334G    4%    25%  1.00x  ONLINE

The full status with vdev tree, errors and resilver progress:

bash
zpool status -v

I/O statistics, refreshed every 2 seconds:

bash
zpool iostat -v 2

Pool history (every command run against the pool, ever, kept on-pool):

bash
zpool history archive

That last one has saved my bacon more than once when trying to remember what someone (me) did to a pool six months ago.

Datasets

Datasets are the ZFS equivalent of filesystems but they’re cheap to create, can be nested and inherit properties from their parent. Make liberal use of them.

Create a dataset:

bash
zfs create archive/photos
zfs create archive/photos/raw
zfs create archive/photos/edited

Set a property (children inherit unless overridden):

bash
zfs set compression=zstd-9 archive/photos/raw
zfs set quota=2T archive/photos
zfs set recordsize=1M archive/movies

recordsize matters for performance. Big media files love 1M. Databases want 8K or 16K to match their page size. The default of 128K is fine for general purpose stuff.
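
For instance, a couple of hypothetical database datasets created with the record size matched to the engine’s page size (16K for InnoDB, 8K for Postgres):

bash
zfs create -o recordsize=16K fast/mysql
zfs create -o recordsize=8K fast/postgres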

Inspect properties (the ones that aren’t default):

bash
zfs get -s local,received,inherited all archive/photos
NAME            PROPERTY     VALUE   SOURCE
archive/photos  compression  zstd    local
archive/photos  quota        2T      local
archive/photos  atime        off     inherited from archive

List all datasets with size info:

bash
zfs list -o name,used,avail,refer,mountpoint

Volumes (zvols) are block devices backed by ZFS, used by Proxmox for VM disks:

bash
zfs create -V 32G -b 16k fast/vm-100-disk-0

The -b is the volblocksize, similar concept to recordsize but for zvols. Set it to match the guest filesystem block size for best performance.
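
volblocksize is fixed at creation time, so it’s worth checking what an existing zvol actually has before blaming the guest for slow I/O:

bash
zfs get volblocksize fast/vm-100-disk-0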

Snapshots

Snapshots are essentially free (copy-on-write means they cost nothing until blocks change) and they’re the killer feature.

Take a snapshot:

bash
zfs snapshot archive/photos@2026-05-02

Recursive snapshot of a dataset and all children:

bash
zfs snapshot -r archive@nightly-2026-05-02

List snapshots for a dataset:

bash
zfs list -t snapshot -o name,used,refer,creation -s creation archive/photos

Browse the contents of a snapshot (it’s right there at .zfs/snapshot/<name> in the dataset root):

bash
ls /archive/photos/.zfs/snapshot/2026-05-02/

If .zfs isn’t visible, set snapdir=visible:

bash
zfs set snapdir=visible archive/photos

Roll back to a snapshot (loses all changes since):

bash
zfs rollback archive/photos@2026-05-02

If there are newer snapshots between now and the target, you’ll need -r to discard them.
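
Something like this (the intermediate snapshots get destroyed, so make sure you really don’t want them):

bash
zfs rollback -r archive/photos@2026-05-02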

Clone a snapshot to a new writeable dataset (useful for spinning up a VM disk from a known-good template):

bash
zfs clone fast/vm-100-disk-0@golden fast/vm-201-disk-0

Promote the clone if you want to delete the original:

bash
zfs promote fast/vm-201-disk-0

Delete a snapshot:

bash
zfs destroy archive/photos@2026-05-02

For automatic snapshots I’ve used zfs-auto-snapshot on Linux for years and zfsnap on FreeBSD. On Proxmox I let it manage VM/CT snapshots itself and run a cron for dataset-level ones.
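
If you’d rather roll your own, a minimal nightly cron entry looks something like this (purely a sketch; note the escaped % signs, which cron would otherwise treat as newlines):

bash
# /etc/cron.d/zfs-nightly: recursive snapshot of archive at 02:00
0 2 * * * root /sbin/zfs snapshot -r archive@nightly-$(date +\%Y-\%m-\%d)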

Send and Receive (the magic bit)

zfs send and zfs receive are how you get data off one box and onto another, byte-for-byte, with full ZFS metadata. This is replication, backup and migration all in one tool.

Send a snapshot to a file:

bash
zfs send archive/photos@2026-05-02 | gzip > /backup/photos-2026-05-02.zfs.gz

Restore from that file:

bash
gzcat /backup/photos-2026-05-02.zfs.gz | zfs receive newtank/photos

Send to another box over SSH (this is the bread and butter):

bash
zfs send archive/photos@2026-05-02 | \
    ssh root@zeus zfs receive backup/photos

Incremental send (only the changes between two snapshots):

bash
zfs send -i archive/photos@2026-05-01 archive/photos@2026-05-02 | \
    ssh root@zeus zfs receive backup/photos

Recursive incremental of a whole tree (use -R):

bash
zfs send -R -I archive@2026-05-01 archive@2026-05-02 | \
    ssh root@zeus zfs receive -F backup

Some flags worth knowing:

  • -c send compressed-on-disk blocks as-is, no decompress/recompress (OpenZFS 0.8+)
  • -w raw send, including encrypted blocks without decrypting (OpenZFS 0.8+)
  • -L use large blocks
  • -e use embedded data blocks
  • -p include properties

For ongoing replication I use syncoid by Jim Salter which wraps all of this up nicely. It even handles the resume tokens if a transfer dies halfway. Highly recommended.
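
A typical syncoid run for the setup above might look like this (hostnames and flags are illustrative; check the syncoid docs for the full option list):

bash
syncoid --no-sync-snap archive/photos root@zeus:backup/photos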

Compression

Always on. Modern CPUs decompress faster than disks can read uncompressed data, and the algorithm is per-block so incompressible data just stores raw.

The options:

Compression Algorithms
| Algorithm | Ratio | Speed | When |
| --------- | ----- | ----- | ---- |
| off | 1.00x | fastest | almost never the right answer |
| lz4 | ~1.5x | very fast | the default since 2013, fine for everything |
| zstd | ~2x | fast | OpenZFS 2.0+, what I reach for now |
| zstd-N | up to 3x | slower as N climbs | levels 1-19, use high levels for archive datasets |
| gzip-N | ~2x | slow | mostly historical, prefer zstd |
| zle | minimal | very fast | only collapses zero runs |

Set it (recursively if you want):

bash
zfs set compression=zstd archive
zfs set compression=zstd-9 archive/photos/raw

Check the compression ratio you’re actually getting:

bash
zfs get compressratio archive/photos
NAME            PROPERTY       VALUE  SOURCE
archive/photos  compressratio  1.42x  -

Note that compression only applies to newly-written data. Existing data stays at whatever compression was active when it was written. Rewrite (e.g. via send/receive into a new dataset) if you change the algorithm and want the existing data recompressed.
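
A rough sketch of that rewrite done locally (dataset names are placeholders; you’d destroy or rename things once you’re happy with the copy):

bash
zfs snapshot archive/photos@recompress
zfs send archive/photos@recompress | zfs receive archive/photos-new
# data gets rewritten through the normal write path on receive, so
# archive/photos-new compresses it with whatever archive is set to (zstd here)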

Deduplication

Don’t.

Alright, that’s a bit harsh. ZFS dedupe is real but the rule of thumb is 5GB of RAM per TB of deduped data because the dedupe table (DDT) needs to live in ARC for performance. If your DDT spills to disk, performance falls off a cliff and recovering is painful.

If you genuinely have a high-dedup-ratio workload (VM disk images of identical OSes, mostly) then it’s worth considering. Otherwise use compression and call it a day.

OpenZFS 2.3 is bringing Fast Dedup which fixes most of the historical pain. Worth revisiting once that’s stable on your platform.

If you must:

bash
zfs set dedup=on tank/vm-images

Check your current DDT size:

bash
zpool status -D tank

Encryption (OpenZFS 0.8+)

Native encryption arrived in 0.8 and it’s brilliant because it’s per-dataset, not per-pool. Different keys for different datasets, raw-send to an untrusted backup target without decrypting, etc.

Create an encrypted dataset:

bash
zfs create -o encryption=on \
           -o keyformat=passphrase \
           tank/secrets

It will prompt for a passphrase. To use a key file instead:

bash
zfs create -o encryption=on \
           -o keyformat=raw \
           -o keylocation=file:///root/keys/secrets.key \
           tank/secrets

Load the key after a reboot:

bash
zfs load-key tank/secrets
zfs mount tank/secrets

Or load all keys at once:

bash
zfs load-key -a
zfs mount -a

Raw send keeps everything encrypted in transit and at rest on the destination:

bash
zfs send -w tank/secrets@daily | ssh root@zeus zfs receive backup/secrets

ARC, L2ARC, ZIL and SLOG

Confusing acronyms, simple ideas.

  • ARC is the read cache in RAM. ZFS will use most of your free RAM for it. This is normal and good.
  • L2ARC is a second-level read cache on a fast SSD. Useful only if you have specific workloads with hot data that doesn’t fit in RAM.
  • ZIL (ZFS Intent Log) is where synchronous writes get logged. Every pool has one, by default it’s part of the pool itself.
  • SLOG (Separate intent LOG) is a dedicated fast device (NVMe, Optane) for the ZIL. Only helps synchronous workloads (NFS, databases). For async writes it does nothing.

Limit ARC size on Linux (it’s hungry by default):

bash
echo "options zfs zfs_arc_max=8589934592" >> /etc/modprobe.d/zfs.conf
update-initramfs -u

That sets it to 8GB. Reboot for it to take effect, or set it live via /sys/module/zfs/parameters/zfs_arc_max.
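
Setting it live instead (takes effect immediately, but won’t survive a reboot without the modprobe line above):

bash
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max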

Add an L2ARC to a pool:

bash
zpool add archive cache /dev/disk/by-id/nvme-Samsung_SSD_980_PRO

Add a SLOG (mirror it, because losing the SLOG together with a crash loses the last few seconds of synchronous writes):

bash
zpool add archive log mirror \
    /dev/disk/by-id/nvme-Optane-1 \
    /dev/disk/by-id/nvme-Optane-2

Check ARC stats (Linux):

bash
arcstat 1
arc_summary

Special vdevs (OpenZFS 0.8+)

Special vdevs hold metadata (and optionally small blocks) on faster storage. Massive performance win for metadata-heavy workloads (lots of small files, snapshot listings, etc).

Heads up
A special vdev becomes part of the pool. If it dies, the pool dies. Mirror them. Always.

Add a mirrored special vdev:

bash
zpool add archive special mirror \
    /dev/disk/by-id/nvme-A \
    /dev/disk/by-id/nvme-B

Steer small blocks to the special vdev as well:

bash
zfs set special_small_blocks=64K archive

That sends any block of 64K or smaller to the special vdev. Set it per-dataset for finer control, and keep it below the dataset’s recordsize or every block will land on the special vdev.
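
You can keep an eye on how much is actually landing on the special vdev with the per-vdev listing:

bash
zpool list -v archive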

Maintenance

Scrub

Reads all data, verifies checksums, repairs from parity if anything’s bad. Run monthly on consumer drives, quarterly on enterprise.

bash
zpool scrub archive

Watch progress:

bash
zpool status archive

Stop a scrub:

bash
zpool scrub -s archive

On Linux, the zfs-zed and zfsutils-linux packages typically install a cron entry that scrubs on the second Sunday of each month. Check /etc/cron.d/zfsutils-linux.

Resilver

Happens automatically when you replace a failed disk. Sequential resilver (0.8+) is dramatically faster than the old block-pointer-tree walk for full-disk replacements.

TRIM (OpenZFS 0.8+)

For SSDs. Tells the drive about free blocks so it can do its garbage collection properly.

bash
zpool trim fast

Enable autotrim if you’d rather not remember:

bash
zpool set autotrim=on fast

Replacing a failed disk

The drill, more or less. Replace identifiers with your actual stable IDs.

Find the failed disk:

bash
zpool status archive

Offline it (if it’s not already faulted automatically):

bash
zpool offline archive /dev/disk/by-id/ata-WDC_WD80EFAX-3

Physically swap the disk. Then replace it in the pool:

bash
zpool replace archive \
    /dev/disk/by-id/ata-WDC_WD80EFAX-3 \
    /dev/disk/by-id/ata-WDC_WD80EFAX-NEW

Watch the resilver:

bash
zpool status archive

If the new disk has the same name (because the slot got reused), the second argument is optional.
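
In that case the short form is enough:

bash
zpool replace archive /dev/disk/by-id/ata-WDC_WD80EFAX-3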

Properties Cheat Sheet

The ones I tweak most often.

Common Dataset Properties
| Property | Values | Notes |
| -------- | ------ | ----- |
| compression | lz4, zstd, zstd-N, off | Always on, default to zstd on 2.0+ |
| atime | on, off | Off unless you actually need access times |
| recordsize | 512B to 16M | Match workload, 1M for media, 16K for DBs |
| xattr | on, sa | sa on Linux, much faster |
| mountpoint | path or none | none for parents that just hold children |
| quota | size or none | Hard cap including children |
| reservation | size or none | Guaranteed space |
| refquota | size | Cap on the dataset itself, excluding snapshots and children |
| sync | standard, always, disabled | Don’t disable in production |
| dedup | off, on, etc | Don’t, see above |
| copies | 1, 2, 3 | Extra copies of your data, useful on single-disk |
| checksum | on, sha256, blake3 | blake3 on 2.2+ is fast and strong |

Useful one-liners

What’s actually using my space, sorted:

bash
zfs list -o name,used,referenced,compressratio -s used

How much space would I free if I deleted snapshots:

bash
zfs list -t snapshot -o name,used -s used

Find datasets without compression set:

bash
zfs get -H -o name,value compression | awk '$2 == "off"'

Total of all snapshots for a dataset:

bash
zfs list -t snap -o used -p archive/photos | awk 'NR>1 {s+=$1} END {printf "%.2f GB\n", s/1024/1024/1024}'
14.27 GB

Pool fragmentation (high frag means writes will get slower):

bash
zpool list -o name,size,frag,cap

Common gotchas

A handful of things that have bitten me over the years.

  • Don’t fill pools past 80%. ZFS gets dramatically slower as it approaches full because the allocator has to work harder. 80% is the soft limit, 90% is the panic limit.
  • Mirror your SLOG and special vdevs. Losing a special vdev loses the pool; losing a SLOG at the wrong moment loses your most recent synchronous writes.
  • ashift is forever. Get it right at create time, you can’t change it.
  • zpool import -f if a pool was last used elsewhere. Don’t do this if the original system still has it imported, you’ll corrupt the pool.
  • Send streams aren’t backups by themselves. Keep the base snapshot on both ends of an incremental chain so you can re-send if a stream turns out to be corrupt.
  • Watch your free space on send/recv targets. A receive that runs out of space leaves a partial dataset that needs cleaning up.
  • atime=off everywhere. I haven’t found a workload where the access time write amplification was worth it.
