zfs:troubleshooting:replace_a_disk
Differences
This shows you the differences between two versions of the page.
zfs:troubleshooting:replace_a_disk [2021/10/13 23:28] – [Replace the Old Device] peter | zfs:troubleshooting:replace_a_disk [2021/10/13 23:58] (current) – [Replace the Old Device] peter | ||
---|---|---|---|
Line 59: | Line 59: | ||
<code bash> | <code bash> | ||
- | zpool replace testpool | + | zpool replace testpool |
- | zpool offline testpool | + | zpool offline testpool |
- | zpool remove | + | zpool detach | ||
</ | </ | ||
Line 74: | Line 74: | ||
</ | </ | ||
+ | <WRAP important> | ||
+ | ==== Potential Issues ==== | ||
+ | |||
+ | If the bad device has already been removed from the system, the offline command might fail with the following error. | ||
+ | |||
+ | <code bash> | ||
+ | cannot offline / | ||
+ | </ | ||
+ | |||
+ | * This is because the label of the drive that died does not exist in the system any more. | ||
+ | * Therefore the bad device cannot be specified by ID. | ||
+ | * If this is the case, try specifying it by device name or by GUID. | ||
+ | |||
+ | ---- | ||
+ | |||
+ | There are various ways to determine a GUID: | ||
+ | |||
+ | <code bash> | ||
+ | zdb # Find GUID. | ||
+ | zdb -l / | ||
+ | zpool status -g # Find GUID. | ||
+ | zpool status -L # Find device name, resolving links. | ||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | Try to get the GUID using zdb: | ||
+ | |||
+ | <code bash> | ||
+ | zdb | ||
+ | testpool: | ||
+ | version: 28 | ||
+ | name: ' | ||
+ | state: 0 | ||
+ | txg: 162804 | ||
+ | pool_guid: 14829240649900366534 | ||
+ | hostname: ' | ||
+ | vdev_children: | ||
+ | vdev_tree: | ||
+ | type: ' | ||
+ | id: 0 | ||
+ | guid: 14829240649900366534 | ||
+ | children[0]: | ||
+ | type: ' | ||
+ | id: 0 | ||
+ | guid: 5355850150368902284 | ||
+ | nparity: 1 | ||
+ | metaslab_array: | ||
+ | metaslab_shift: | ||
+ | ashift: 9 | ||
+ | asize: 791588896768 | ||
+ | is_log: 0 | ||
+ | create_txg: 4 | ||
+ | children[0]: | ||
+ | type: ' | ||
+ | id: 0 | ||
+ | guid: 11426107064765252810 | ||
+ | path: '/ | ||
+ | phys_path: '/ | ||
+ | whole_disk: 1 | ||
+ | create_txg: 4 | ||
+ | children[1]: | ||
+ | type: ' | ||
+ | id: 1 | ||
+ | guid: 15935140517898495532 | ||
+ | path: '/ | ||
+ | phys_path: '/ | ||
+ | whole_disk: 1 | ||
+ | create_txg: 4 | ||
+ | children[2]: | ||
+ | type: ' | ||
+ | id: 2 | ||
+ | guid: 7183706725091321492 | ||
+ | path: '/ | ||
+ | phys_path: '/ | ||
+ | whole_disk: 1 | ||
+ | create_txg: 4 | ||
+ | children[3]: | ||
+ | type: ' | ||
+ | id: 3 | ||
+ | guid: 17196042497722925662 | ||
+ | path: '/ | ||
+ | phys_path: '/ | ||
+ | whole_disk: 1 | ||
+ | create_txg: 4 | ||
+ | features_for_read: | ||
+ | </ | ||
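Picking the right GUID out of a long zdb listing by eye is error-prone, so the following is a minimal sketch that pairs each leaf device's GUID with its path. It assumes the zdb output has the shape shown above, where every leaf vdev prints a ''guid:'' line followed by a ''path:'' line; the function name ''list_vdev_guids'' is made up for this example.

```shell
#!/bin/sh
# Print "GUID PATH" for every leaf vdev found in zdb-style text on stdin.
# Assumption: each leaf vdev prints "guid:" before its "path:" line,
# as in the listing above. Pool and raidz entries have no path and are skipped.
list_vdev_guids() {
    awk '
        $1 == "guid:" { guid = $2 }      # remember the most recent guid
        $1 == "path:" {                  # only leaf vdevs carry a path
            gsub(/\047/, "", $2)         # strip the single quotes zdb prints
            print guid, $2
        }
    '
}
```

Possible usage: ''zdb | list_vdev_guids''. The device that died should still be listed with its old path, which identifies the GUID to pass to zpool.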
+ | |||
+ | <WRAP info> | ||
+ | **NOTE: | ||
+ | </ | ||
+ | |||
+ | Use the GUID to offline the old device: | ||
+ | |||
+ | <code bash> | ||
+ | zpool offline testpool 15935140517898495532 | ||
+ | </ | ||
+ | |||
+ | And check that this has worked: | ||
+ | |||
+ | <code bash> | ||
+ | zpool status | ||
+ | pool: testpool | ||
+ | | ||
+ | status: One or more devices has been taken offline by the administrator. | ||
+ | Sufficient replicas exist for the pool to continue functioning in a | ||
+ | degraded state. | ||
+ | action: Online the device using 'zpool online' or replace the device with | ||
+ | 'zpool replace'. | ||
+ | scan: scrub repaired 0 in 2h4m with 0 errors on Sun Jun 9 00:28:24 2013 | ||
+ | config: | ||
+ | |||
+ | NAME | ||
+ | testpool | ||
+ | raidz1-0 | ||
+ | ata-ST3300620A_5QF0MJFP | ||
+ | ata-ST3300831A_5NF0552X | ||
+ | ata-ST3200822A_5LJ1CHMS | ||
+ | ata-ST3200822A_3LJ0189C | ||
+ | |||
+ | errors: No known data errors | ||
+ | </ | ||
+ | |||
+ | Then replace the old device with the new one: | ||
+ | |||
+ | <code bash> | ||
+ | zpool replace testpool 15935140517898495532 / | ||
+ | </ | ||
+ | |||
+ | And check again that this has worked: | ||
+ | |||
+ | <code bash> | ||
+ | zpool status | ||
+ | pool: testpool | ||
+ | | ||
+ | status: One or more devices is currently being resilvered. The pool will | ||
+ | continue to function, possibly in a degraded state. | ||
+ | action: Wait for the resilver to complete. | ||
+ | scan: resilver in progress since Sun Jun 9 01:44:36 2013 | ||
+ | 408M scanned out of 419G at 20,4M/s, 5h50m to go | ||
+ | 101M resilvered, 0,10% done | ||
+ | config: | ||
+ | |||
+ | NAME STATE READ WRITE CKSUM | ||
+ | testpool | ||
+ | raidz1-0 | ||
+ | ata-ST3300620A_5QF0MJFP | ||
+ | replacing-1 | ||
+ | ata-ST3300831A_5NF0552X | ||
+ | ata-ST3500320AS_9QM03ATQ | ||
+ | ata-ST3200822A_5LJ1CHMS | ||
+ | ata-ST3200822A_3LJ0189C | ||
+ | |||
+ | errors: No known data errors | ||
+ | </ | ||
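Since the resilver can take hours, it may be handy to poll for completion instead of re-running the status command by hand. The sketch below keys on the "resilver in progress" text from the scan line shown above; the function names and the 60 second default interval are assumptions made for this example.

```shell
#!/bin/sh
# Read `zpool status` text on stdin.
# Exit status 0 if a resilver is running, non-zero otherwise.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# wait_for_resilver POOL [INTERVAL_SECONDS]
# Poll `zpool status POOL` until the scan line no longer reports a resilver.
wait_for_resilver() {
    pool=$1
    interval=${2:-60}
    while zpool status "$pool" | resilver_in_progress; do
        sleep "$interval"
    done
}
```

Possible usage: ''wait_for_resilver testpool 300 && zpool status testpool''.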
+ | |||
+ | </ | ||
<WRAP info> | <WRAP info> | ||
Line 111: | Line 268: | ||
* If it is hot-swappable then just pull it out. | * If it is hot-swappable then just pull it out. | ||
* Otherwise, shut down the system before removing the device. | * Otherwise, shut down the system before removing the device. | ||
- | |||
- | ---- | ||
- | |||
- | ===== Potential Issues ===== | ||
- | |||
- | If the bad disk has already been removed from the system you might not be able to specify it by ID. | ||
- | * If this is the case try specifying it by device name or by GUID: | ||
- | |||
- | <code bash> | ||
- | zdb # Find GUID. | ||
- | zdb -l / | ||
- | zpool status -g # Find GUID. | ||
- | zpool status -L # Find device name, resolving links. | ||
- | </ | ||
- | |||
- | <WRAP info> | ||
- | **NOTE: | ||
- | </ | ||
---- | ---- |
zfs/troubleshooting/replace_a_disk.1634167724.txt.gz · Last modified: 2021/10/13 23:28 by peter