ZFS - Troubleshooting - Replace a Disk
Check the Pool
Verify that a disk is bad and that it needs to be replaced.
zpool status
returns:
  pool: testpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 2h4m with 0 errors on Sun Jun 9 00:28:24 2013
config:

        NAME                         STATE     READ WRITE CKSUM
        testpool                     DEGRADED     0     0     0
          raidz1-0                   DEGRADED     0     0     0
            ata-ST3300620A_5QF0MJFP  ONLINE       0     0     0
            ata-ST3300831A_5NF0552X  UNAVAIL      0     0     0
            ata-ST3200822A_5LJ1CHMS  ONLINE       0     0     0
            ata-ST3200822A_3LJ0189C  ONLINE       0     0     0

errors: No known data errors
NOTE: This shows that one disk is unavailable.
- This is ata-ST3300831A_5NF0552X.
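On a pool with many disks it can help to filter the status output down to the rows that are not ONLINE. The following is a minimal sketch, not part of the original procedure: the function name list_bad_devices is hypothetical, and it assumes the config table starts at the NAME/STATE header and ends at the errors: line, as in the output above. It also prints the pool and vdev rows when they are DEGRADED, and would print spares listed as AVAIL.

```shell
# List every row of the `zpool status` config table whose STATE is not ONLINE.
# Reads the status text on stdin. The function name is hypothetical.
list_bad_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # table header
        $1 == "errors:"               { in_config = 0 }        # end of table
        in_config && NF >= 2 && $2 != "ONLINE" { print $1, $2 }
    '
}

# Usage (requires a system with ZFS installed):
#   zpool status testpool | list_bad_devices
```

For the pool shown above this would report testpool and raidz1-0 as DEGRADED and ata-ST3300831A_5NF0552X as UNAVAIL.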
Add a New Disk
- Add a new disk.
- Optionally remove the old disk.
NOTE: The new disk is ata-ST3500320AS_9QM03ATQ.
- This can be checked at /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ.
- Only remove the old drive at this point if it is a redundant setup.
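Before running any zpool command against the new disk, it is worth confirming that the system actually sees it under /dev/disk/by-id. The sketch below is one way to do that. The function name resolve_disk_id and its optional directory argument are hypothetical, and it assumes readlink -f is available (GNU coreutils, as on most Linux systems).

```shell
# Print the device node that a by-id name points to, or fail if it is absent.
# The second argument defaults to /dev/disk/by-id; it exists only so that the
# function can be tried against a scratch directory.
resolve_disk_id() {
    id=$1
    dir=${2:-/dev/disk/by-id}
    if [ -e "$dir/$id" ]; then
        readlink -f "$dir/$id"      # follow the symlink to the real device node
    else
        echo "not found: $dir/$id" >&2
        return 1
    fi
}

# Usage:
#   resolve_disk_id ata-ST3500320AS_9QM03ATQ
```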
Replace the Old Device
zpool replace testpool c1t1d0 c2t0d0
zpool offline testpool c1t1d0
zpool remove testpool c1t1d0
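NOTE: The old device is specified first, followed by the new device. The names c1t1d0 and c2t0d0 are examples; for the pool shown above, the old device is ata-ST3300831A_5NF0552X and the new device is ata-ST3500320AS_9QM03ATQ.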
- If the pool is a redundant configuration, data will be copied from other good disks to the new disk.
- If the pool is not redundant, data will be copied from the old device to the new device.
- The old drive should also become detached.
- Once that is complete, the old device can be physically removed.
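Because swapping the two arguments would target the wrong disk, it can be useful to print the exact command and review it before running it. The wrapper below is a hedged sketch: the function name replace_disk and the DRY_RUN variable are hypothetical, and only zpool replace itself is a real command. By default it prints the command; it runs it only when DRY_RUN is set to 0.

```shell
# Print (default) or run `zpool replace` with the old device first, new second.
# replace_disk and DRY_RUN are hypothetical names used for illustration.
replace_disk() {
    if [ "$#" -ne 3 ]; then
        echo "usage: replace_disk POOL OLD_DEVICE NEW_DEVICE" >&2
        return 2
    fi
    pool=$1 old=$2 new=$3
    if [ "${DRY_RUN:-1}" = "0" ]; then
        zpool replace "$pool" "$old" "$new"
    else
        echo "zpool replace $pool $old $new"   # dry run: show the command only
    fi
}

# Usage with the device names from this page:
#   replace_disk testpool ata-ST3300831A_5NF0552X ata-ST3500320AS_9QM03ATQ
#   DRY_RUN=0 replace_disk testpool ata-ST3300831A_5NF0552X ata-ST3500320AS_9QM03ATQ
```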
NOTE: If the old disk has already been removed from the system and a new device has taken its place under the same device name, the following commands can be used instead:
zpool offline testpool sdd
zpool remove testpool sdd
zpool attach -f testpool sdc sdd
Wait For Resilvering to Complete
Before the pool returns to normal, it must sync data over to the new disk.
- It will remain in a degraded status while the data syncs.
- This data syncing process is called resilvering.
- It may take a very long time depending on the size of the disks and on how much data is on them.
The status of the resilvering can be checked:
zpool status testpool
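Rather than re-running the status command by hand, the check can be polled in a loop. This is a minimal sketch: the function names are hypothetical, the 60-second default interval is arbitrary, and it assumes the scan line of zpool status contains the text "resilver in progress" while the resilver is running.

```shell
# Succeeds while the status text on stdin reports an active resilver.
# Assumes the scan line contains the phrase "resilver in progress".
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Poll the pool until the resilver finishes, then print the final status.
# The interval (seconds) is optional and defaults to an arbitrary 60.
wait_for_resilver() {
    pool=$1
    interval=${2:-60}
    while zpool status "$pool" | resilver_in_progress; do
        sleep "$interval"
    done
    zpool status "$pool"
}

# Usage (requires a system with ZFS installed):
#   wait_for_resilver testpool 300
```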
Physically Remove the Old Drive
Physically remove the old drive.
- If it is hot-swappable then just pull it out.
- Otherwise, shut down the system before removing the device.
Potential Issues
If the bad disk has already been removed from the system you might not be able to specify it by ID.
- If this is the case try specifying it by device name or by GUID:
zdb                # Find GUID.
zdb -l /dev/sda1   # In case the 'zdb' command does not work.
zpool status -g    # Find GUID.
zpool status -L    # Find device name, resolving links.
NOTE: If zdb on its own does not output anything, try specifying the device with zdb -l.