ZFS Reference & Cheat Sheet
A working ZFS cheat sheet covering pools, datasets, snapshots, send/recv, compression and the bits you forget at 2am. Started on OpenSolaris around 2008, kept current with OpenZFS on FreeBSD and Proxmox.
!! NOTE
This was originally written for OpenSolaris back when I picked up ZFS around 2007-2008 and have been updating it ever since. Most of these commands work across Solaris, FreeBSD and OpenZFS on Linux but I’ve called out version-specific bits where it matters.
A quick history
ZFS landed in Solaris 10 in June 2006 and the moment I read about copy-on-write, end-to-end checksums and pooled storage I knew RAID5 on a hardware controller was on borrowed time. Sun open-sourced it shortly after and OpenSolaris was where most of us tinkered with it.
Then came the unfortunate Oracle bit in 2010. Pool version 28 was the last open release. The community forked it as OpenZFS which is what powers ZFS on FreeBSD, Linux, illumos and macOS today. Modern OpenZFS uses feature flags instead of monotonic version numbers so you can mix and match what your platform supports.
A rough timeline of the bits you’ll actually care about:
| Year | Release | Notable |
|---|---|---|
| 2006 | Solaris 10 6/06 | ZFS arrives |
| 2008 | FreeBSD 7.0 | First non-Solaris port |
| 2009 | Pool v17 | RAIDZ3 (triple parity) |
| 2010 | Pool v28 | Last open Solaris version |
| 2013 | ZoL 0.6.1 | First stable ZFS on Linux, lz4 compression |
| 2019 | OpenZFS 0.8 | Native encryption, TRIM, special vdevs, sequential resilver |
| 2020 | OpenZFS 2.0 | Linux and FreeBSD on one codebase, zstd compression |
| 2021 | OpenZFS 2.1 | dRAID |
| 2023 | OpenZFS 2.2 | Block cloning, BLAKE3 checksums |
| 2024 | OpenZFS 2.3 | RAIDZ expansion (finally!), Direct IO |
Most of what follows works on anything from pool v15 onwards. Where it doesn’t, I’ve flagged the minimum version.
Pool Topology
Before you touch a single command, pick your topology. You can’t change a pool’s redundancy level once it’s created (well, you couldn’t until RAIDZ expansion in 2.3 which is still a fairly limited operation).
| Type | Min Disks | Parity | Notes |
|---|---|---|---|
| `stripe` | 1 | 0 | No redundancy, lose a disk lose the pool |
| `mirror` | 2 | n-1 | Best for IOPS, n-way mirrors supported |
| `raidz1` | 3 | 1 | Tolerates 1 disk loss, like RAID5 |
| `raidz2` | 4 | 2 | Tolerates 2 disk loss, like RAID6, the sensible default |
| `raidz3` | 5 | 3 | Tolerates 3 disk loss (pool v17+) |
| `draid` | varies | varies | Distributed parity (OpenZFS 2.1+) for very large arrays |
A few rules of thumb that have served me well:
- For anything bigger than 4TB drives use `raidz2`. Resilver times on big disks are scary and `raidz1` leaves you exposed during that window.
- If you care about random IOPS (VM storage, databases) use mirrors. A pool of mirrored pairs gives you the IOPS of N drives versus 1 for a raidz vdev.
- You can stripe across vdevs, but never stripe across raidz vdevs of different widths.
- `dRAID` is for the folks running 30+ disks. If you're not, stick with raidz.
Device Naming
This bit changed a lot over the years. On Solaris/OpenSolaris we had the lovely `c0t0d0` controller-target-disk style. FreeBSD uses `ada0`, `da0`. Linux has `/dev/sdX`, which is awful because the letters can shuffle on reboot.
Always reference disks by their stable identifier. On Linux that’s /dev/disk/by-id/ (the WWN or serial number ones). On FreeBSD use the gptid or diskid. Saves you a lot of pain when a controller renumbers things.
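On Linux, a quick way to see the stable names (the exact IDs will of course differ on your hardware):

```sh
ls -l /dev/disk/by-id/ | grep -v part    # whole-disk IDs only, skip partition links
```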
Pool Creation
The basics, with examples mirroring my own kit (gandalf is one of my Proxmox boxes, zeus is the FreeBSD machine).
A simple stripe (don’t do this for anything you care about):
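```sh
# single disk, no redundancy -- pool name and device path are placeholders
zpool create -o ashift=12 tank /dev/disk/by-id/ata-DISK1
```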
A two-way mirror, which is what I run for VM storage:
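```sh
# two-way mirror (swap in your own by-id paths)
zpool create -o ashift=12 tank mirror \
    /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2
```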
A six-disk raidz2 for bulk storage (movies, photos, the ~2PB of Aero/Astro test data I’ve been smashing through with Smash):
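```sh
# six-disk raidz2 (placeholder device IDs)
zpool create -o ashift=12 tank raidz2 \
    /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 \
    /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4 \
    /dev/disk/by-id/ata-DISK5 /dev/disk/by-id/ata-DISK6
```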
Striped mirrors (three vdevs of two-way mirrors), great for VM storage with 6 SSDs:
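```sh
# three two-way mirror vdevs striped together (placeholder device IDs)
zpool create -o ashift=12 tank \
    mirror /dev/disk/by-id/nvme-SSD1 /dev/disk/by-id/nvme-SSD2 \
    mirror /dev/disk/by-id/nvme-SSD3 /dev/disk/by-id/nvme-SSD4 \
    mirror /dev/disk/by-id/nvme-SSD5 /dev/disk/by-id/nvme-SSD6
```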
About ashift
ashift=12 means 4K sectors (2^12 = 4096 bytes). Almost every modern drive is 4K native or 4K-emulated, so ashift=12 is what you want. NVMe is sometimes happier on ashift=13 (8K). You cannot change ashift after pool creation, so get it right the first time.
If you’re not sure, peek at the drive:
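```sh
# physical sector size on Linux; 4096 means ashift=12 (/dev/sda is a placeholder)
lsblk -o NAME,PHY-SEC,LOG-SEC
smartctl -i /dev/sda | grep -i 'sector size'
```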
Useful create-time properties
Set these at creation rather than fiddling later:
- `compression=zstd` (OpenZFS 2.0+), or `lz4` for older. Always on, it's faster than no compression for most workloads.
- `atime=off` stops every read updating access times, which is pointless write amplification.
- `xattr=sa` stores extended attributes inline (Linux), much faster.
- `acltype=posixacl` enables POSIX ACLs (Linux).
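Pulled together at create time it looks roughly like this (`-O` sets filesystem properties on the root dataset; pool and device names are placeholders):

```sh
zpool create -o ashift=12 \
    -O compression=zstd -O atime=off -O xattr=sa -O acltype=posixacl \
    tank mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2
```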
Pool Status & Inspection
The commands you’ll run a hundred times.
Show all pools at a glance:
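```sh
zpool list
```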
The full status with vdev tree, errors and resilver progress:
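```sh
zpool status -v          # add a pool name to limit it to one pool
```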
I/O statistics, refreshed every 2 seconds:
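```sh
zpool iostat -v 2
```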
Pool history (every command run against the pool, ever, kept on-pool):
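```sh
zpool history tank       # 'tank' is a placeholder pool name
```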
That last one has saved my bacon more than once when trying to remember what someone (me) did to a pool six months ago.
Datasets
Datasets are the ZFS equivalent of filesystems but they’re cheap to create, can be nested and inherit properties from their parent. Make liberal use of them.
Create a dataset:
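```sh
# dataset names are placeholders
zfs create tank/media
zfs create tank/media/photos     # nested child, inherits from tank/media
```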
Set a property (children inherit unless overridden):
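```sh
zfs set compression=zstd tank/media
```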
recordsize matters for performance. Big media files love 1M. Databases want 8K or 16K to match their page size. The default of 128K is fine for general purpose stuff.
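For example (dataset names are placeholders):

```sh
zfs set recordsize=1M tank/media     # big sequential files
zfs set recordsize=16K tank/db       # match the database page size
```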
Inspect properties (the ones that aren’t default):
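```sh
zfs get all -s local tank/media      # only properties set locally on this dataset
```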
List all datasets with size info:
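```sh
zfs list -o name,used,avail,refer,mountpoint
```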
Volumes (zvols) are block devices backed by ZFS, used by Proxmox for VM disks:
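```sh
# 32G zvol with 16K volblocksize (name and sizes are placeholders)
zfs create -V 32G -b 16K tank/vm-100-disk-0
```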
The -b is the volblocksize, similar concept to recordsize but for zvols. Set it to match the guest filesystem block size for best performance.
Snapshots
Snapshots are essentially free (copy-on-write means they cost nothing until blocks change) and they’re the killer feature.
Take a snapshot:
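```sh
zfs snapshot tank/media@before-upgrade     # dataset and snapshot names are placeholders
```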
Recursive snapshot of a dataset and all children:
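```sh
zfs snapshot -r tank@nightly               # -r includes every child dataset
```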
List snapshots for a dataset:
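```sh
zfs list -t snapshot -r tank/media
```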
Browse the contents of a snapshot (it’s right there at .zfs/snapshot/<name> in the dataset root):
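```sh
ls /tank/media/.zfs/snapshot/before-upgrade/    # path depends on your mountpoint
```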
If .zfs isn’t visible, set snapdir=visible:
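```sh
zfs set snapdir=visible tank/media
```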
Roll back to a snapshot (loses all changes since):
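```sh
zfs rollback tank/media@before-upgrade
```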
If there are newer snapshots between now and the target, you’ll need -r to discard them.
Clone a snapshot to a new writeable dataset (useful for spinning up a VM disk from a known-good template):
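```sh
# clone a known-good template into a fresh dataset (names are placeholders)
zfs clone tank/templates@golden tank/vm-101-disk-0
```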
Promote the clone if you want to delete the original:
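```sh
zfs promote tank/vm-101-disk-0
```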
Delete a snapshot:
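```sh
zfs destroy tank/media@before-upgrade
# ranges work too: zfs destroy tank/media@snap1%snap5
```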
For automatic snapshots I’ve used zfs-auto-snapshot on Linux for years and zfsnap on FreeBSD. On Proxmox I let it manage VM/CT snapshots itself and run a cron for dataset-level ones.
Send and Receive (the magic bit)
zfs send and zfs receive is how you get data off one box and onto another, byte-for-byte, with full ZFS metadata. This is replication, backup and migration all in one tool.
Send a snapshot to a file:
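```sh
zfs send tank/media@before-upgrade > /backup/media.zfs    # names and paths are placeholders
```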
Restore from that file:
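```sh
zfs receive tank/media-restored < /backup/media.zfs
```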
Send to another box over SSH (this is the bread and butter):
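```sh
# 'zeus' and the dataset names are placeholders
zfs send tank/media@before-upgrade | ssh zeus zfs receive backup/media
```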
Incremental send (only the changes between two snapshots):
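```sh
zfs send -i @monday tank/media@tuesday | ssh zeus zfs receive backup/media
```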
Recursive incremental of a whole tree (use -R):
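```sh
# -F lets the receive roll back to the matching snapshot before applying the stream
zfs send -R -i @monday tank@tuesday | ssh zeus zfs receive -F backup/tank
```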
Some flags worth knowing:
- `-c` send compressed-on-disk blocks as-is, no decompress/recompress (OpenZFS 0.8+)
- `-w` raw send, including encrypted blocks without decrypting (OpenZFS 0.8+)
- `-L` use large blocks
- `-e` use embedded data blocks
- `-p` include properties
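Combined, an efficient replication run looks something like this (host and dataset names are placeholders):

```sh
zfs send -c -L -i @monday tank/media@tuesday | ssh zeus zfs receive backup/media
```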
For ongoing replication I use syncoid by Jim Salter which wraps all of this up nicely. It even handles the resume tokens if a transfer dies halfway. Highly recommended.
Compression
Always on. Modern CPUs decompress faster than disks can read uncompressed data, and the algorithm is per-block so incompressible data just stores raw.
The options:
| Algorithm | Ratio | Speed | When |
|---|---|---|---|
| `off` | 1.00x | fastest | almost never the right answer |
| `lz4` | ~1.5x | very fast | the default since 2013, fine for everything |
| `zstd` | ~2x | fast | OpenZFS 2.0+, better ratio than lz4 and still fast |
| `zstd-N` | up to 3x | tunable 1-19 | higher = slower, use for archive datasets |
| `gzip-N` | ~2x | slow | mostly historical, prefer zstd |
| `zle` | minimal | very fast | only collapses zero runs |
Set it (recursively if you want):
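```sh
zfs set compression=zstd tank            # children inherit it
zfs set compression=zstd-9 tank/archive  # heavier level for cold data
```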
Check the compression ratio you’re actually getting:
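```sh
zfs get compressratio tank/media
zfs list -o name,used,compressratio      # the whole tree at a glance
```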
Note that compression only applies to newly-written data. Existing data stays at whatever compression was active when it was written. Rewrite (e.g. via send/receive into a new dataset) if you change the algorithm and want the existing data recompressed.
Deduplication
Don’t.
Alright, that’s a bit harsh. ZFS dedupe is real but the rule of thumb is 5GB of RAM per TB of deduped data because the dedupe table (DDT) needs to live in ARC for performance. If your DDT spills to disk, performance falls off a cliff and recovering is painful.
If you genuinely have a high-dedup-ratio workload (VM disk images of identical OSes, mostly) then it’s worth considering. Otherwise use compression and call it a day.
OpenZFS 2.3 brings Fast Dedup, which fixes most of the historical pain. Worth revisiting once it's stable on your platform.
If you must:
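```sh
zfs set dedup=on tank/vm-images          # dataset name is a placeholder
```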
Check your current DDT size:
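```sh
zpool status -D tank                     # DDT histogram and in-core size
```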
Encryption (OpenZFS 0.8+)
Native encryption arrived in 0.8 and it’s brilliant because it’s per-dataset, not per-pool. Different keys for different datasets, raw-send to an untrusted backup target without decrypting, etc.
Create an encrypted dataset:
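```sh
zfs create -o encryption=on -o keyformat=passphrase tank/secure
```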
It will prompt for a passphrase. To use a key file instead:
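```sh
# key file path is a placeholder; keyformat=raw wants exactly 32 random bytes
dd if=/dev/urandom of=/root/tank-secure.key bs=32 count=1
zfs create -o encryption=on -o keyformat=raw \
    -o keylocation=file:///root/tank-secure.key tank/secure
```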
Load the key after a reboot:
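```sh
zfs load-key tank/secure
zfs mount tank/secure
```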
Or load all keys at once:
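```sh
zfs load-key -a
zfs mount -a
```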
Raw send keeps everything encrypted in transit and at rest on the destination:
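```sh
zfs send -w tank/secure@nightly | ssh zeus zfs receive backup/secure
```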
ARC, L2ARC, ZIL and SLOG
Confusing acronyms, simple ideas.
- ARC is the read cache in RAM. ZFS will use most of your free RAM for it. This is normal and good.
- L2ARC is a second-level read cache on a fast SSD. Useful only if you have specific workloads with hot data that doesn’t fit in RAM.
- ZIL (ZFS Intent Log) is where synchronous writes get logged. Every pool has one, by default it’s part of the pool itself.
- SLOG (Separate intent LOG) is a dedicated fast device (NVMe, Optane) for the ZIL. Only helps synchronous workloads (NFS, databases). For async writes it does nothing.
Limit ARC size on Linux (it’s hungry by default):
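```sh
# 8 GiB cap, applied at module load
echo "options zfs zfs_arc_max=8589934592" >> /etc/modprobe.d/zfs.conf
```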
That sets it to 8GB. Reboot for it to take effect, or set it live via /sys/module/zfs/parameters/zfs_arc_max.
Add an L2ARC to a pool:
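```sh
zpool add tank cache /dev/disk/by-id/nvme-SSD1     # placeholder device ID
```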
Add a SLOG (mirror it, because losing your SLOG mid-write loses synchronous data):
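```sh
zpool add tank log mirror \
    /dev/disk/by-id/nvme-SSD1 /dev/disk/by-id/nvme-SSD2
```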
Check ARC stats (Linux):
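```sh
arc_summary                              # exact command name varies slightly by distro
cat /proc/spl/kstat/zfs/arcstats         # the raw counters
```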
Special vdevs (OpenZFS 0.8+)
Special vdevs hold metadata (and optionally small blocks) on faster storage. Massive performance win for metadata-heavy workloads (lots of small files, snapshot listings, etc).
Heads up
A special vdev becomes part of the pool. If it dies, the pool dies. Mirror them. Always.
Add a mirrored special vdev:
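```sh
zpool add tank special mirror \
    /dev/disk/by-id/nvme-SSD1 /dev/disk/by-id/nvme-SSD2
```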
Steer small blocks to the special vdev as well:
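```sh
zfs set special_small_blocks=64K tank
```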
That sends any block smaller than 64K to the special vdev. Set per-dataset for finer control.
Maintenance
Scrub
Reads all data, verifies checksums, repairs from parity if anything’s bad. Run monthly on consumer drives, quarterly on enterprise.
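Kick one off (`tank` is a placeholder pool name):

```sh
zpool scrub tank
```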
Watch progress:
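```sh
zpool status tank        # progress shows in the scan: line
```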
Stop a scrub:
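```sh
zpool scrub -s tank      # -p pauses instead of cancelling
```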
On Linux, the zfs-zed and zfsutils-linux packages typically install a cron entry that scrubs on the second Sunday of each month. Check /etc/cron.d/zfsutils-linux.
Resilver
Happens automatically when you replace a failed disk. Sequential resilver (0.8+) is dramatically faster than the old block-pointer-tree walk for full-disk replacements.
TRIM (OpenZFS 0.8+)
For SSDs. Tells the drive about free blocks so it can do its garbage collection properly.
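A one-off manual pass (pool name is a placeholder):

```sh
zpool trim tank
```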
Enable autotrim if you’d rather not remember:
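```sh
zpool set autotrim=on tank
```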
Replacing a failed disk
The drill, more or less. Replace identifiers with your actual stable IDs.
Find the failed disk:
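```sh
zpool status -x          # only lists pools with problems
```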
Offline it (if it’s not already faulted automatically):
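```sh
zpool offline tank /dev/disk/by-id/ata-OLD-DISK     # placeholder IDs throughout
```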
Physically swap the disk. Then replace it in the pool:
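```sh
zpool replace tank /dev/disk/by-id/ata-OLD-DISK /dev/disk/by-id/ata-NEW-DISK
```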
Watch the resilver:
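```sh
zpool status -v tank     # or: watch -n 10 zpool status tank
```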
If the new disk has the same name (because the slot got reused), the second argument is optional.
Properties Cheat Sheet
The ones I tweak most often.
| Property | Values | Notes |
|---|---|---|
| `compression` | lz4, zstd, zstd-N, off | Always on, default to zstd on 2.0+ |
| `atime` | on, off | Off unless you actually need access times |
| `recordsize` | 512B to 16M | Match workload, 1M for media, 16K for DBs |
| `xattr` | on, sa | sa on Linux, much faster |
| `mountpoint` | path or none | none for parents that just hold children |
| `quota` | size or none | Hard cap including children |
| `reservation` | size or none | Guaranteed space |
| `refquota` | size | Quota excluding snapshots/clones |
| `sync` | standard, always, disabled | Don't disable in production |
| `dedup` | off, on, verify | Don't, see above |
| `copies` | 1, 2, 3 | Extra copies of your data, useful on single-disk |
| `checksum` | on, sha256, blake3 | blake3 on 2.2+ is fast and strong |
Useful one-liners
What’s actually using my space, sorted:
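```sh
zfs list -o space -S used | head -20
```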
How much space would I free if I deleted snapshots:
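```sh
zfs list -o name,usedbysnapshots -S usedbysnapshots
```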
Find datasets without compression set:
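```sh
zfs get -t filesystem -o name,value compression | grep -w off
```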
Total of all snapshots for a dataset:
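```sh
# 'tank/media' is a placeholder dataset
zfs list -Hp -t snapshot -r -o used tank/media | awk '{s+=$1} END {printf "%.1f GiB\n", s/2^30}'
```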
Pool fragmentation (high frag means writes will get slower):
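```sh
zpool list -o name,size,capacity,fragmentation
```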
Common gotchas
A handful of things that have bitten me over the years.
- Don’t fill pools past 80%. ZFS gets dramatically slower as it approaches full because the allocator has to work harder. 80% is the soft limit, 90% is the panic limit.
- Mirror your SLOG and special vdevs. Losing them loses the pool.
- `ashift` is forever. Get it right at create time, you can't change it.
- `zpool import -f` if a pool was last used elsewhere. Don't do this if the original system still has it imported, you'll corrupt the pool.
- Send streams aren't backups by themselves. Always keep two snapshots either side of an incremental, in case the stream is corrupt.
- Watch your free space on send/recv targets. A receive that runs out of space leaves a partial dataset that needs cleaning up.
- `atime=off` everywhere. I haven't found a workload where access times were worth the write amplification.
References
- OpenZFS Wiki - the canonical docs
- OpenZFS Feature Flags - what’s supported where
- Jim Salter’s ZFS articles on Ars Technica - the best explainers I’ve read
- `sanoid`/`syncoid` - automated snapshots and replication
- `zfs-auto-snapshot` - simpler automatic snapshots
- Aaron Toponce's ZFS Administration series - dated but still excellent fundamentals
- Proxmox ZFS Documentation - if you’re on Proxmox like me
