====== Replace a Disk ======

===== Check the Pool =====
Verify the state of the pool:
<code bash>
zpool status
</code>

returns:

<code bash>
  pool: testpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 2h4m with 0 errors on Sun Jun  9 00:28:24 2013
config:

        NAME                         STATE     READ WRITE CKSUM
        testpool                     DEGRADED     0     0     0
          raidz1-0                   DEGRADED     0     0     0
            ata-ST3300620A_5QF0MJFP  ONLINE       0     0     0
            ata-ST3300831A_5NF0552X  UNAVAIL      0     0     0
            ata-ST3200822A_5LJ1CHMS  ONLINE       0     0     0
            ata-ST3200822A_3LJ0189C  ONLINE       0     0     0

errors: No known data errors
</code>

<WRAP info>
**NOTE:** Take note of the name of the failed drive (a quick way to filter for it is sketched below).

  * This is ata-ST3300831A_5NF0552X.
</WRAP>
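
To pick the failed member out of a longer status listing, filter for the unhealthy states (a minimal sketch; the state names are the standard ZFS ones):

<code bash>
# Show the pool, vdev, and disk lines that are not ONLINE.
zpool status testpool | grep -E 'DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED'
</code>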
----

===== Replace the Old Device =====

Attach the new drive to the system.

<WRAP info>
**NOTE:** Take note of the ID of the new drive.

  * This can be seen at /dev/disk/by-id/ (listed in the sketch below).

  * Only remove the old drive at this point if it is a redundant setup.
</WRAP>
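
To map the by-id names onto the kernel's sdX device nodes before touching any hardware, listing the symlinks is usually enough (a sketch; the ata- prefix assumes SATA drives):

<code bash>
# Each by-id symlink points at the current sdX device node.
ls -l /dev/disk/by-id/ata-*
</code>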
<code bash>
# Start the replacement; the pool resilvers onto the new device.
zpool replace testpool ata-ST3300831A_5NF0552X /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ
# Then take the old device offline and detach it from the pool.
zpool offline testpool ata-ST3300831A_5NF0552X
zpool detach testpool ata-ST3300831A_5NF0552X
</code>

  * If the pool is a redundant configuration, the new device will be rebuilt (resilvered) from the remaining devices.
  * If the pool is not redundant, data will be copied from the old device to the new device.
  * Once that is complete, the old device can be physically removed.

----

<WRAP important>
==== Potential Issues ====

If the bad device has already been physically removed from the system, attempting to offline or replace it by ID fails:

<code bash>
zpool offline testpool ata-ST3300831A_5NF0552X
cannot offline ata-ST3300831A_5NF0552X: no such device in pool
</code>

  * This is because the label of the drive that died no longer exists in the system.
  * Therefore the bad device cannot be specified by ID.
  * In this case, try specifying it by device name or by GUID.

----

There are various ways to determine a GUID:

<code bash>
zdb                   # Find GUID from the cached pool configuration.
zdb -l /dev/<device>  # Find GUID by reading a device's ZFS label.
zpool status -g       # Find GUID (shown in place of device names).
zpool status -L       # Find device name, resolving links.
</code>
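
For example, the GUID of the failed drive can be grepped straight out of the zdb listing shown below (a sketch using this page's drive name; in the listing, each child's guid line appears just above its path line):

<code bash>
# Print the id and guid lines immediately preceding the matching path line.
zdb | grep -B 2 'ata-ST3300831A_5NF0552X'
</code>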

----

Try to get the GUID using zdb:

<code bash>
zdb
testpool:
    version: 28
    name: 'testpool'
    state: 0
    txg: 162804
    pool_guid: 14829240649900366534
    hostname: '...'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 14829240649900366534
        children[0]:
            type: 'raidz'
            id: 0
            guid: 5355850150368902284
            nparity: 1
            metaslab_array: ...
            metaslab_shift: ...
            ashift: 9
            asize: 791588896768
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 11426107064765252810
                path: '/dev/disk/by-id/ata-ST3300620A_5QF0MJFP-part1'
                phys_path: '...'
                whole_disk: 1
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 15935140517898495532
                path: '/dev/disk/by-id/ata-ST3300831A_5NF0552X-part1'
                phys_path: '...'
                whole_disk: 1
                create_txg: 4
            children[2]:
                type: 'disk'
                id: 2
                guid: 7183706725091321492
                path: '/dev/disk/by-id/ata-ST3200822A_5LJ1CHMS-part1'
                phys_path: '...'
                whole_disk: 1
                create_txg: 4
            children[3]:
                type: 'disk'
                id: 3
                guid: 17196042497722925662
                path: '/dev/disk/by-id/ata-ST3200822A_3LJ0189C-part1'
                phys_path: '...'
                whole_disk: 1
                create_txg: 4
    features_for_read:
</code>

<WRAP info>
**NOTE:** The GUID of the failed drive ata-ST3300831A_5NF0552X is 15935140517898495532.
</WRAP>

Use the GUID to offline the old device:

<code bash>
zpool offline testpool 15935140517898495532
</code>

Check that this has worked:

<code bash>
zpool status
  pool: testpool
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 2h4m with 0 errors on Sun Jun  9 00:28:24 2013
config:

        NAME                         STATE     READ WRITE CKSUM
        testpool                     DEGRADED     0     0     0
          raidz1-0                   DEGRADED     0     0     0
            ata-ST3300620A_5QF0MJFP  ONLINE       0     0     0
            ata-ST3300831A_5NF0552X  OFFLINE      0     0     0
            ata-ST3200822A_5LJ1CHMS  ONLINE       0     0     0
            ata-ST3200822A_3LJ0189C  ONLINE       0     0     0

errors: No known data errors
</code>

Then replace the device:

<code bash>
zpool replace testpool 15935140517898495532 /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ
</code>

Check again that this has worked:

<code bash>
zpool status
  pool: testpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jun  9 01:44:36 2013
        408M scanned out of 419G at 20,4M/s, 5h50m to go
        101M resilvered, 0,10% done
config:

        NAME                           STATE     READ WRITE CKSUM
        testpool                       DEGRADED     0     0     0
          raidz1-0                     DEGRADED     0     0     0
            ata-ST3300620A_5QF0MJFP    ONLINE       0     0     0
            replacing-1                OFFLINE      0     0     0
              ata-ST3300831A_5NF0552X  OFFLINE      0     0     0
              ata-ST3500320AS_9QM03ATQ ONLINE       0     0     0  (resilvering)
            ata-ST3200822A_5LJ1CHMS    ONLINE       0     0     0
            ata-ST3200822A_3LJ0189C    ONLINE       0     0     0

errors: No known data errors
</code>

</WRAP>

<WRAP info>
**NOTE:** The device can also be specified by its plain device name, e.g. sdd:

<code bash>
zpool offline testpool sdd
zpool remove testpool sdd
zpool attach -f testpool sdc sdd
</code>

</WRAP>
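
If the replacement disk was previously part of another ZFS pool, zpool replace may refuse it; clearing the stale label first is one way around that (a destructive sketch, so double-check the device name, which here is this page's new drive):

<code bash>
# Remove any leftover ZFS label from the replacement disk before reusing it.
zpool labelclear -f /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ
</code>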

----

===== Wait For Resilvering to Complete =====

Before the pool is back to normal, it needs to sync data over to the new disk.

  * It will remain in a degraded state while the data syncs.
  * This data syncing process is called resilvering.
  * It may take a __very__ long time depending on the size of the disks and on how much data is on them.

The status of the resilvering can be checked:

<code bash>
zpool status testpool
</code>
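
To follow the progress without retyping the command, the status can be polled (a sketch, assuming the watch utility is available):

<code bash>
# Refresh the pool status every 30 seconds until the resilver finishes.
watch -n 30 zpool status testpool
</code>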

----

===== Physically Remove the Old Drive =====

Physically remove the old drive.

  * If it is hot-swappable then it can simply be pulled out (a cleaner detach is sketched below).
  * Otherwise, shut down the system before removing the device.
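
On a hot-swap system it can be cleaner to tell the kernel to release the device before pulling it (a sketch; sdd is a placeholder for the old drive's current device node):

<code bash>
# Ask the Linux SCSI layer to detach the old drive cleanly, then pull it.
echo 1 | sudo tee /sys/block/sdd/device/delete
</code>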