Aug 31, 2014 zfs

If I had to use one and only one file system, there is only one choice: ZFS. It’s robust, stable, and has some amazing features.
And with that, I’d most likely use FreeBSD because it’s free and it has the most mature implementation of ZFS outside of Oracle/Sun Solaris. ZFS on Linux isn’t as stable, and I hope that Btrfs will eventually be ready for production.

Also see Becoming a ZFS Ninja part 2 (Youtube)

Intro

Combines volume manager (Raid) and File System.

There are only 2 commands:

zpool : manage storage pools
zfs : manage datasets (filesystems and volumes)

Partitioning disks for ZFS is not recommended; give ZFS whole disks.

pools

Pools are created from one or more ‘vdevs’ (virtual devices). 1/64th of pool capacity is reserved (to protect COW).

ex: RAIDZ2 (4 x 100GB) = 400GB - 2 parity drives = 200GB - 1/64th ≈ 197GB
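The capacity arithmetic above can be sanity-checked with plain shell integer math (the exact figure depends on rounding, but it lands just under the post-parity capacity):

```shell
raw=$((4 * 100))              # four 100GB disks
usable=$((raw - 2 * 100))     # RAIDZ2 loses two disks' worth to parity
reserved=$((usable / 64))     # ~1/64th reserved to protect copy-on-write
echo $((usable - reserved))   # prints 197
```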

VDevs components

Becoming a ZFS Ninja part 1 (Youtube)

file VDevs

Becoming a ZFS Ninja part 1 (Youtube)

RAID concepts

Becoming a ZFS Ninja part 1 (Youtube)

zpool

zpool create

zpool create mypool c0t0d0 c0t1d0 c0t2d0

RAID 1 - simple mirror

zpool create tank1 mirror c0t0d0 c0t1d0

RAID 1+0

zpool create mypool mirror c0t1d0 c1t1d0 mirror c0t2d0 c1t2d0

And this example creates a new pool out of two vdevs that are RAID-Z groups with 2 data disks and one parity disk each:

zpool create tank2 raidz c0t0d0 c0t1d0 c0t2d0 raidz c0t3d0 c0t4d0 c0t5d0
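Before committing to a layout, `zpool create -n` does a dry run, printing the configuration that would be used without actually creating the pool (device names below are just examples):

```shell
# dry run: show the vdev layout without touching the disks
zpool create -n tank2 raidz c0t0d0 c0t1d0 c0t2d0 raidz c0t3d0 c0t4d0 c0t5d0
```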

zpool add (grow zpool)

zpool add <pool> <vdev>: when you decide to grow your pool, you just add one or more vdevs to it

zpool add tank1 mirror c0t2d0 c0t3d0
zpool add tank2 raidz c0t6d0 c0t7d0 c0t8d0

add 2 spare disks
zpool add mypool spare c1t0d0 c2t0d0

add log (ZIL)
zpool add mypool log c5t0d0
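A second-level read cache (L2ARC) device can be added the same way; the device name here is hypothetical:

```shell
# add an SSD as an L2ARC read cache device
zpool add mypool cache c6t0d0
```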

Also see https://pthree.org/2012/04/17/install-zfs-on-debian-gnulinux/ for a complete guide on Debian.

On my server, I created the pool from a single mirror vdev

$ sudo zpool create data mirror /dev/ada1.eli /dev/ada2.eli


$ zpool status
  pool: data
 state: ONLINE
  scan: none requested
config:

    NAME          STATE     READ WRITE CKSUM
    data          ONLINE       0     0     0
      mirror-0    ONLINE       0     0     0
        ada1.eli  ONLINE       0     0     0
        ada2.eli  ONLINE       0     0     0

errors: No known data errors

And later added (grew) more hard drives

$ sudo zpool add data mirror /dev/ada3.eli /dev/ada4.eli

$ zpool status
  pool: data
 state: ONLINE
 scan: none requested
config:

    NAME          STATE     READ WRITE CKSUM
    data          ONLINE       0     0     0
      mirror-0    ONLINE       0     0     0
        ada1.eli  ONLINE       0     0     0
        ada2.eli  ONLINE       0     0     0
      mirror-1    ONLINE       0     0     0
        ada3.eli  ONLINE       0     0     0
        ada4.eli  ONLINE       0     0     0

errors: No known data errors


$ df -h
Filesystem      Size    Used   Avail Capacity  Mounted on
/dev/ada0s1a     19G    9.1G    9.0G    50%    /
devfs           1.0k    1.0k      0B   100%    /dev
/dev/ada0s2d    263G    235G    7.0G    97%    /usr/home
linprocfs       4.0k    4.0k      0B   100%    /compat/linux/proc
/dev/da0.eli     28M    144k     26M     1%    /mnt/usbkey
data            913G    349G    564G    38%    /data

zpool scrub

Becoming a ZFS Ninja part 1 (Youtube)

* should be run periodically
* verifies on-disk contents and fixes problems if possible
* schedule via cron to run weekly, on off hours
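A minimal sketch of the workflow (pool name is just an example):

```shell
# start a scrub; it runs in the background
zpool scrub mypool
# check progress and any repaired errors
zpool status mypool
```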

zpool import/export

I had multiple pools named tank, and the pool wouldn’t import without specifying its id

zpool import tank #--> wouldn't work
zpool import 12152999....234   #--> works, to see id#, do zpool import

Sometimes the pool also needs to be mounted manually

sudo zfs mount tank

Also, sometimes I had to delete the directory /tank, as its existence
prevented ZFS from mounting.
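The export/import cycle, e.g. when moving disks to another machine, is roughly (pool name is an example):

```shell
# cleanly export the pool before pulling the disks
zpool export tank
# with no arguments, list pools available for import (shows each pool's numeric id)
zpool import
# import by name, or by id when names collide
zpool import tank
```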

zpool maintenance

Manually replace disk

zpool replace <pool> <old_vdev> <new_vdev>

Bring devices online or offline; useful for testing, or just before replacing a hard disk

zpool online <pool> <vdev>
zpool offline <pool> <vdev>

Add or remove devices from a mirror

zpool attach <pool> <existing_vdev> <new_vdev>
zpool detach <pool> <vdev>

Destroy pool

zpool destroy tank

zpool properties
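Pool-level properties can be inspected and changed much like dataset properties; a sketch (pool name and property are examples):

```shell
# list all properties of a pool
zpool get all mypool
# set a property, e.g. grow automatically when disks are replaced with larger ones
zpool set autoexpand=on mypool
```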

zpool history

View the entire command history of a pool

zpool history mypool

zdb

zdb -C : show the contents of /etc/zfs/zpool.cache (the cached pool configurations)

other features

Display I/O statistics every 10 seconds

zpool iostat 10

RESULT with GELI while WRITING files

$ zpool iostat 1
            capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data         352G   576G      0     15  22.7K   998K
data         352G   576G      0    366      0  45.2M
data         352G   576G      0    347      0  42.9M
data         352G   576G      0    407      0  50.1M
data         352G   576G      0    443      0  45.1M
data         352G   576G      0    263      0  31.3M
data         352G   576G      0    312      0  37.8M
data         352G   576G      0    317      0  39.0M
data         352G   576G      0    318      0  38.8M
data         352G   576G      0    348      0  41.3M
data         352G   576G      0    303      0  38.0M

Not bad for ZFS on SATA 300 on top of GELI on an old AMD X2 BE-2300 1.9GHz with 4GB RAM.

Reading should be faster, with more IOPS as more mirror vdevs are added.

For better performance, consider putting the ZIL (log device) on an SSD.

replace disk in zpool

$ zpool status
pool: data
state: DEGRADED
status: One or more devices has been removed by the administrator.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Online the device using 'zpool online' or replace the device with
    'zpool replace'.
scan: resilvered 84K in 0h0m with 0 errors on Tue May  8 17:37:26 2012
config:

    NAME                      STATE     READ WRITE CKSUM
    data                      DEGRADED     0     0     0
      mirror-0                ONLINE       0     0     0
        ada1.eli              ONLINE       0     0     0
        ada2.eli              ONLINE       0     0     0
      mirror-1                DEGRADED     0     0     0
        17537411141092671575  REMOVED      0     0     0  was /dev/ada3.eli
        ada4.eli              ONLINE       0     0     0

errors: No known data errors

Replace hard disk

$ sudo zpool replace data ada3.eli ada3.eli

Check the status of resilvering

$ zpool status -v
pool: data
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun May 20 20:46:27 2012
    140G scanned out of 558G at 2.81G/s, 0h2m to go
    3.56M resilvered, 25.13% done
config:

    NAME                        STATE     READ WRITE CKSUM
    data                        DEGRADED     0     0     0
      mirror-0                  ONLINE       0     0     0
        ada1.eli                ONLINE       0     0     0
        ada2.eli                ONLINE       0     0     0
      mirror-1                  DEGRADED     0     0     0
        replacing-0             REMOVED      0     0     0
          17537411141092671575  REMOVED      0     0     0  was /dev/ada3.eli/old
          ada3.eli              ONLINE       0     0     0  (resilvering)
        ada4.eli                ONLINE       0     0     0

errors: No known data errors

* Aside: one of the best comprehensive talks on ZFS is by Oracle.

After a coffee break

$ zpool status
pool: data
state: ONLINE
scan: resilvered 151G in 1h11m with 0 errors on Sun May 20 21:57:34 2012
config:

    NAME          STATE     READ WRITE CKSUM
    data          ONLINE       0     0     0
      mirror-0    ONLINE       0     0     0
        ada1.eli  ONLINE       0     0     0
        ada2.eli  ONLINE       0     0     0
      mirror-1    ONLINE       0     0     0
        ada3.eli  ONLINE       0     0     0
        ada4.eli  ONLINE       0     0     0

errors: No known data errors

It took 1 hour 11 minutes to resilver about 151G on an encrypted hard disk.

DEGRADED STATE (removed 2 drives)

$ zpool status
pool: data
state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-2Q
scan: scrub repaired 0 in 4h12m with 0 errors on Tue Jun 19 20:29:46 2012
config:

    NAME                      STATE     READ WRITE CKSUM
    data                      DEGRADED     0     0     0
      mirror-0                DEGRADED     0     0     0
        ada1.eli              ONLINE       0     0     0
        14079505549490135106  UNAVAIL      0     0     0  was /dev/ada2.eli
      mirror-1                DEGRADED     0     0     0
        ada3.eli              ONLINE       0     0     0
        10106669688987601550  UNAVAIL      0     0     0  was /dev/ada4.eli

zfs commands

zfs list

$ zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
data   560G   354G   560G  /data

Datasets

* either a filesystem or a volume (ZVol)
* properties are inherited by sub-filesystems

zfs create <pool>/<dataset>

zfs create mypool/dataset

Dataset properties

zfs get all <pool | ds>
zfs set key=value <pool|ds>
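For example (pool and dataset names, and the chosen property, are illustrative):

```shell
# inspect all properties of a dataset
zfs get all mypool/dataset
# enable lz4 compression, a commonly recommended default
zfs set compression=lz4 mypool/dataset
# verify the change (and see where the value is inherited from)
zfs get compression mypool/dataset
```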

ZVols / volume datasets
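A ZVol is a dataset exposed as a block device rather than a filesystem, useful for swap or iSCSI backing stores. A sketch with hypothetical names:

```shell
# create a 10GB volume
zfs create -V 10G mypool/myvol
# the block device appears under /dev/zvol/
ls -l /dev/zvol/mypool/myvol
```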

On my server

I use the following commands to bring up the ZFS pool on servers.

FreeBSD

zpool create -f tank raidz ada1p4 ada0 ada2
# had to use -f (force) because partitions were slightly different in size

Linux

Encryption

Linux

Install

Misc

It is recommended to always use 4K sectors instead of 512B, for future hard drives and better performance.
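On ZFS on Linux this is done with `ashift=12` at pool creation (2^12 = 4096 bytes; ashift cannot be changed after the vdev exists). FreeBSD instead uses the `vfs.zfs.min_auto_ashift` sysctl. Device names here are examples:

```shell
# force 4K sectors on a new pool (ZFS on Linux)
zpool create -o ashift=12 mypool mirror /dev/ada1 /dev/ada2
# verify
zpool get ashift mypool
```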

Disable atime

zfs set atime=off tank

Disable dedup (FreeBSD recommends disabling it)

zfs set dedup=off tank

Scrub often via cron

# crontab -e
...
30 19 * * 5 zpool scrub <pool>

Re-enable ZFS after Debian upgrade

To re-enable ZFS after upgrading the Debian release, kernel, etc.

sudo apt-get install --reinstall debian-zfs
sudo reboot

tip for Linux ZFS

see comment on http://louwrentius.com/74tb-diy-nas-based-on-zfs-on-linux.html

zfs set xattr=sa <pool> 

Snapshots and clones
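A minimal sketch of the basic snapshot/clone workflow (dataset and snapshot names are hypothetical):

```shell
# take a read-only, point-in-time snapshot of a dataset
zfs snapshot mypool/dataset@before-upgrade
# list existing snapshots
zfs list -t snapshot
# roll back to the snapshot, discarding later changes
zfs rollback mypool/dataset@before-upgrade
# or create a writable clone from the snapshot instead
zfs clone mypool/dataset@before-upgrade mypool/dataset-clone
```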

ARC

Misc Issues

Fixing arc lock contention