THURSDAY, MARCH 28, 2024
Thursday, 18 July, 2013 15:41

A quick demo of using the ZFS hot spare feature.

We cover ZFS in the Oracle University course at our Minneapolis location.

After the install completed, I added four 2 GB drives so ZFS had some disks to work with.

bash-3.00# format
Searching for disks...done

          AVAILABLE DISK SELECTIONS:
       0. c0d0 
          /pci@0,0/pci-ide@7,1/ide@0/cmdk@0,0
       1. c0d1 
          /pci@0,0/pci-ide@7,1/ide@0/cmdk@1,0
       2. c1d1 
          /pci@0,0/pci-ide@7,1/ide@1/cmdk@1,0
       3. c2t0d0 
          /pci@0,0/pci1000,30@10/sd@0,0
       4. c2t1d0 
          /pci@0,0/pci1000,30@10/sd@1,0

There were no existing ZFS pools:

bash-3.00# zpool list
no pools available

So I created a pool named brian as a two-way mirror of c0d1 and c1d1, with c2t0d0 added as a hot spare:

bash-3.00# zpool create brian mirror c0d1 c1d1 spare c2t0d0

bash-3.00# zpool status brian
  pool: brian
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        brian       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0d1    ONLINE       0     0     0
            c1d1    ONLINE       0     0     0
        spares
          c2t0d0    AVAIL

errors: No known data errors

Note that a spare now appears in the zpool status output. Spares can be shared by multiple pools. Eric Schrock, who wrote the code for this feature, tells us that there is now an FMA agent, zfs-retire, which subscribes to vdev failure faults and automatically initiates a replacement if a hot spare is available.
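To illustrate the point about sharing, here is a minimal sketch of offering the same spare to a second pool. The pool name tank and the share_spare helper are hypothetical and not part of the original demo. By default the helper only prints the zpool add commands so they can be reviewed first; setting RUN=1 would execute them.

```shell
#!/bin/sh
# Sketch only: "tank" is a hypothetical second pool, not one from this demo.
SPARE=c2t0d0

# share_spare SPARE POOL...: add SPARE as a hot spare to each POOL.
# Prints the commands unless RUN=1 is set in the environment.
share_spare() {
    spare=$1
    shift
    for pool in "$@"; do
        if [ "${RUN:-0}" = 1 ]; then
            zpool add "$pool" spare "$spare"
        else
            echo "zpool add $pool spare $spare"
        fi
    done
}

share_spare "$SPARE" tank
```

The pool brian is left out of the list because it already received c2t0d0 as a spare when it was created.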

Now I force a failure by taking c0d1 offline and use zpool replace so the spare takes over:

bash-3.00# zpool offline brian c0d1
Bringing device c0d1 offline
bash-3.00# zpool replace brian c0d1 c2t0d0

bash-3.00# zpool status brian
  pool: brian
 state: DEGRADED
status: One or more devices has been taken offline by the
        administrator.
        Sufficient replicas exist for the pool to continue functioning
        in a degraded state.
action: Online the device using 'zpool online' or replace the
        device with 'zpool replace'.
 scrub: resilver completed with 0 errors on Sun Jun 22 11:55:46 2008
config:

        NAME          STATE     READ WRITE CKSUM
        brian         DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            spare     DEGRADED     0     0     0
              c0d1    OFFLINE      0     0     0
              c2t0d0  ONLINE       0     0     0
            c1d1      ONLINE       0     0     0
        spares
          c2t0d0      INUSE     currently in use

errors: No known data errors

Note that the spare is now marked as INUSE but is still listed under spares. The replacement is only temporary: once the original device is replaced, the spare returns to the list of available spares.
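If you want to check the spare's state from a script rather than by eye, something like the following might do. It is a rough sketch: spare_state is a hypothetical helper, and it assumes the status layout shown in this post (a "spares" heading followed by one line per device), which may differ on other ZFS releases. The sample text is copied from the output above so the snippet runs without a pool.

```shell
#!/bin/sh
# spare_state DEVICE: read `zpool status` text on stdin and print the state
# (for example AVAIL or INUSE) listed for DEVICE under the "spares" heading.
spare_state() {
    awk -v dev="$1" '
        $1 == "spares"         { in_spares = 1; next }  # spares section starts
        in_spares && NF == 0   { in_spares = 0 }        # blank line ends the table
        in_spares && $1 == dev { print $2; exit }
    '
}

# Sample input copied from the degraded-pool output in this post.
sample='        NAME          STATE     READ WRITE CKSUM
        brian         DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            spare     DEGRADED     0     0     0
              c0d1    OFFLINE      0     0     0
              c2t0d0  ONLINE       0     0     0
            c1d1      ONLINE       0     0     0
        spares
          c2t0d0      INUSE     currently in use

errors: No known data errors'

printf '%s\n' "$sample" | spare_state c2t0d0
```

On a live system the input would come from the pool itself, as in `zpool status brian | spare_state c2t0d0`. Only lines after the spares heading are examined, so the c2t0d0 entry inside the mirror (state ONLINE) is ignored.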

Now I replace the “failed” drive with c2t1d0, and the spare returns to the AVAIL state.

bash-3.00# zpool replace brian c0d1 c2t1d0

bash-3.00# zpool status brian
  pool: brian
 state: ONLINE
 scrub: resilver completed with 0 errors on Sun Jun 22 11:58:02 2008
config:

        NAME        STATE     READ WRITE CKSUM
        brian       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c1d1    ONLINE       0     0     0
        spares
          c2t0d0    AVAIL

errors: No known data errors
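Before removing a spare it can be worth confirming that the resilver has finished and the spare is idle again. The sketch below does that by reading status text. The safe_to_remove helper is hypothetical, and it assumes the "scrub: resilver completed" wording and the spares layout shown in this post; other ZFS versions word the line differently.

```shell
#!/bin/sh
# safe_to_remove DEVICE: read `zpool status` text on stdin. Exit 0 only if the
# scrub line reports a completed resilver and DEVICE is listed as AVAIL under
# the "spares" heading; exit 1 otherwise.
safe_to_remove() {
    awk -v dev="$1" '
        $1 == "scrub:" && /resilver completed/  { resilvered = 1 }
        $1 == "spares"                          { in_spares = 1; next }
        in_spares && $1 == dev && $2 == "AVAIL" { avail = 1 }
        END { exit !(resilvered && avail) }
    '
}

# Sample input copied from the status output above.
sample=' scrub: resilver completed with 0 errors on Sun Jun 22 11:58:02 2008
config:

        NAME        STATE     READ WRITE CKSUM
        brian       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c1d1    ONLINE       0     0     0
        spares
          c2t0d0    AVAIL

errors: No known data errors'

if printf '%s\n' "$sample" | safe_to_remove c2t0d0; then
    echo "spare c2t0d0 is idle"
else
    echo "spare c2t0d0 is busy or the resilver has not finished"
fi
```

Against a real pool the check would read `zpool status brian | safe_to_remove c2t0d0`.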

And finally, I remove the spare from the pool, since it is no longer required:

bash-3.00# zpool remove brian c2t0d0

bash-3.00# zpool status brian
  pool: brian
 state: ONLINE
 scrub: resilver completed with 0 errors on Sun Jun 22 11:58:02 2008
config:

        NAME        STATE     READ WRITE CKSUM
        brian       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c1d1    ONLINE       0     0     0

errors: No known data errors