Failures on RAID-5 Volumes

Failures are seen in two varieties: system failures and disk failures. A system failure means that the system has abruptly ceased to operate due to an operating system panic or power failure. Disk failures imply that the data on some number of disks has become unavailable due to a hardware failure (such as a head crash, an electronics failure on the disk, or a disk controller failure).

System Failures

RAID-5 volumes are designed to remain available, with a minimum of disk space overhead, when disk failures occur. However, many forms of RAID-5 can lose data after a system failure. Data loss occurs because a system failure leaves the data and parity in the RAID-5 volume unsynchronized: the status of writes that were outstanding at the time of the failure cannot be determined. If a loss of sync occurs while a RAID-5 volume is being accessed, the volume is described as having stale parity. The parity must then be reconstructed by reading all the non-parity columns within each stripe, recalculating the parity, and writing out the parity stripe unit in the stripe. This must be done for every stripe in the volume, so it can take a long time to complete.

Caution: While a RAID-5 volume without log plexes is being resynchronized, any failure of a disk within the volume causes its data to be lost.

Besides this vulnerability to failure, the resynchronization process can tax system resources and slow down system operation.

RAID-5 logs reduce the damage that can be caused by system failures because they maintain a copy of the data being written at the time of the failure. Resynchronization then consists of reading the data and parity from the logs and writing them to the appropriate areas of the RAID-5 volume. This greatly reduces the amount of time needed to resynchronize the data and parity. It also means that the volume never becomes truly stale: the data and parity for all stripes in the volume are known at all times, so the failure of a single disk cannot result in the loss of the data within the volume.

Disk Failures

An uncorrectable I/O error occurs when disk failure, cabling, or other problems cause the data on a disk to become unavailable. For a RAID-5 volume, this means that a subdisk becomes unavailable. The subdisk cannot be used to hold data and is considered stale and detached. If the underlying disk later becomes available or is replaced, the subdisk is still considered stale and is not used.

If an attempt is made to read data contained on a stale subdisk, the data is reconstructed from the data on all other stripe units in the stripe. This operation is called a reconstructing-read. It is more expensive than simply reading the data, and can result in degraded read performance. When a RAID-5 volume has stale subdisks, it is considered to be in degraded mode.

A RAID-5 volume in degraded mode can be recognized from the output of the vxprint -ht command, as shown in the following display:

V  NAME         RVG/VSET/CO  KSTATE   STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
SV NAME         PLEX         VOLNAME  NVOLLAYR LENGTH   [COL/]OFF AM/NM    MODE
...
v  r5vol        -            ENABLED  DEGRADED 204800   RAID      -        raid5
pl r5vol-01     r5vol        ENABLED  ACTIVE   204800   RAID      3/16     RW
sd disk01-01    r5vol-01     disk01   0        102400   0/0       c2t9d0   ENA
sd disk02-01    r5vol-01     disk02   0        102400   1/0       c2t10d0  dS
sd disk03-01    r5vol-01     disk03   0        102400   2/0       c2t11d0  ENA
pl r5vol-02     r5vol        ENABLED  LOG      1440     CONCAT    -        RW
sd disk04-01    r5vol-02     disk04   0        1440     0         c2t12d0  ENA
pl r5vol-03     r5vol        ENABLED  LOG      1440     CONCAT    -        RW
sd disk05-01    r5vol-03     disk05   0        1440     0         c2t14d0  ENA

The volume r5vol is in degraded mode, as shown by the volume state, which is listed as DEGRADED. The failed subdisk is disk02-01, as shown by the MODE flags: d indicates that the subdisk is detached, and S indicates that the subdisk's contents are stale.

Note: Do not run the vxr5check command on a RAID-5 volume that is in degraded mode.

A disk containing a RAID-5 log plex can also fail. The failure of a single RAID-5 log plex has no direct effect on the operation of a volume, provided that the RAID-5 log is mirrored. However, loss of all RAID-5 log plexes in a volume makes it vulnerable to a complete failure. In the output of the vxprint -ht command, failure within a RAID-5 log plex is indicated by the plex state being shown as BADLOG rather than LOG. This is shown in the following display, where the RAID-5 log plex r5vol-02 has failed:

V  NAME         RVG/VSET/CO  KSTATE   STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
SV NAME         PLEX         VOLNAME  NVOLLAYR LENGTH   [COL/]OFF AM/NM    MODE
...
v  r5vol        -            ENABLED  ACTIVE   204800   RAID      -        raid5
pl r5vol-01     r5vol        ENABLED  ACTIVE   204800   RAID      3/16     RW
sd disk01-01    r5vol-01     disk01   0        102400   0/0       c2t9d0   ENA
sd disk02-01    r5vol-01     disk02   0        102400   1/0       c2t10d0  ENA
sd disk03-01    r5vol-01     disk03   0        102400   2/0       c2t11d0  ENA
pl r5vol-02     r5vol        DISABLED BADLOG   1440     CONCAT    -        RW
sd disk04-01    r5vol-02     disk04   0        1440     0         c2t12d0  ENA
pl r5vol-03     r5vol        ENABLED  LOG      1440     CONCAT    -        RW
sd disk05-01    r5vol-03     disk05   0        1440     0         c2t14d0  ENA

Default Startup Recovery Process for RAID-5

VxVM may need to perform several operations to restore fully the contents of a RAID-5 volume and make it usable. Whenever a volume is started, any RAID-5 log plexes are zeroed before the volume is started. This prevents random data from being interpreted as a log entry and corrupting the volume contents. Also, some subdisks may need to be recovered, or the parity may need to be resynchronized (if RAID-5 logs have failed). VxVM takes the following steps when a RAID-5 volume is started:

- If valid RAID-5 log plexes exist, they are replayed to bring the data and parity back into synchronization.
- If no valid logs exist and the parity is stale, the parity is resynchronized.
- Any stale subdisks are recovered.
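Because RAID-5 log plexes both shorten resynchronization and prevent the parity from becoming stale, it is worth confirming that a volume has at least one log before a failure occurs. The following is a minimal sketch rather than a prescribed procedure: it reuses the disk group mydg and volume r5vol from the examples in this chapter, and assumes a hypothetical disk disk06 with free space (the exact vxassist storage attributes accepted may vary by release). To list the volume's plexes and check for any in the LOG state:

# vxprint -g mydg -ht r5vol

To add a RAID-5 log plex, allocated on disk06:

# vxassist -g mydg addlog r5vol disk06

Running the addlog operation a second time adds a second log plex, which keeps the log itself redundant.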
Recovering a RAID-5 Volume

The types of recovery that may typically be required for RAID-5 volumes are the following:

- Parity resynchronization
- Log plex recovery
- Stale subdisk recovery
Parity resynchronization and stale subdisk recovery are typically performed when the RAID-5 volume is started, or shortly after the system boots. They can also be performed by running the vxrecover command. For more information on starting RAID-5 volumes, see Starting RAID-5 Volumes.

If hot-relocation is enabled at the time of a disk failure, system administrator intervention is not required unless no suitable disk space is available for relocation. Hot-relocation is triggered by the failure, and the system administrator is notified of the failure by electronic mail. Hot-relocation automatically attempts to relocate the subdisks of a failing RAID-5 plex. After any relocation takes place, the hot-relocation daemon (vxrelocd) also initiates a parity resynchronization. In the case of a failing RAID-5 log plex, relocation occurs only if the log plex is mirrored; the vxrelocd daemon then initiates a mirror resynchronization to recreate the RAID-5 log plex. If hot-relocation is disabled at the time of a failure, the system administrator may need to initiate a resynchronization or recovery manually.

Note: Following severe hardware failure of several disks or other related subsystems underlying a RAID-5 plex, it may be impossible to recover the volume using the methods described in this chapter. In this case, remove the volume, recreate it on hardware that is functioning correctly, and restore the contents of the volume from a backup.

Parity Resynchronization

In most cases, a RAID-5 array does not have stale parity. Stale parity occurs only after all RAID-5 log plexes for the RAID-5 volume have failed, and then only if there is a system failure. Even if a RAID-5 volume has stale parity, it is usually repaired as part of the volume start process. If a volume without valid RAID-5 logs is started and the process is killed before the volume is resynchronized, the result is an active volume with stale parity. The following example shows the output of the vxprint -ht command for such a stale RAID-5 volume:

V  NAME         RVG/VSET/CO  KSTATE   STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
SV NAME         PLEX         VOLNAME  NVOLLAYR LENGTH   [COL/]OFF AM/NM    MODE
...
v  r5vol        -            ENABLED  NEEDSYNC 204800   RAID      -        raid5
pl r5vol-01     r5vol        ENABLED  ACTIVE   204800   RAID      3/16     RW
sd disk01-01    r5vol-01     disk01   0        102400   0/0       c2t9d0   ENA
sd disk02-01    r5vol-01     disk02   0        102400   1/0       c2t10d0  dS
sd disk03-01    r5vol-01     disk03   0        102400   2/0       c2t11d0  ENA
...

This output lists the volume state as NEEDSYNC, indicating that the parity needs to be resynchronized. The state could also have been SYNC, indicating that a synchronization was attempted at start time and that a synchronization process should be doing the synchronization. If no such process exists, or if the volume is in the NEEDSYNC state, a synchronization can be started manually by using the resync keyword for the vxvol command. For example, to resynchronize the RAID-5 volume shown in the figure Invalid RAID-5 Volume, use the following command:

# vxvol -g mydg resync r5vol

Parity is regenerated by issuing VOL_R5_RESYNC ioctls to the RAID-5 volume. The resynchronization process starts at the beginning of the RAID-5 volume and resynchronizes a region equal to the number of sectors specified by the -o iosize option. If the -o iosize option is not specified, the default maximum I/O size is used. The resync operation then moves on to the next region until the entire length of the RAID-5 volume has been resynchronized.
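For example, a resynchronization can be started with an explicit region size and its progress then watched. This is an illustrative sketch, not a prescribed procedure: it reuses the disk group mydg and volume r5vol from the examples above, and the region size of 2048 sectors is an arbitrary choice.

# vxvol -g mydg -o iosize=2048 resync r5vol

If the operation is registered as a task, its progress can be followed with the vxtask utility:

# vxtask list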
For larger volumes, parity regeneration can take a long time, and it is possible that the system could be shut down or crash before the operation completes. Unless the progress of parity regeneration is preserved across reboots, the process has to start over from the beginning after a system shutdown. To avoid such a restart, parity regeneration is checkpointed: the offset up to which the parity has been regenerated is saved in the configuration database. The -o checkpt=size option controls how often the checkpoint is saved. If the option is not specified, the default checkpoint size is used. Because saving the checkpoint offset requires a transaction, making the checkpoint size too small can extend the time required to regenerate parity. After a system reboot, a RAID-5 volume that has a checkpoint offset smaller than the volume length starts a parity resynchronization at the checkpoint offset.

Log Plex Recovery

RAID-5 log plexes can become detached due to disk failures. These RAID-5 logs can be reattached by using the att keyword for the vxplex command. To reattach a failed RAID-5 log plex, use the following command:

# vxplex -g mydg att r5vol r5vol-l1

Stale Subdisk Recovery

Stale subdisk recovery is usually done at volume start time. However, the process doing the recovery can crash, or the volume may be started with an option such as -o delayrecover that prevents subdisk recovery. In addition, the disk on which the subdisk resides can be replaced without any recovery operations being performed. In such cases, you can perform subdisk recovery by using the vxvol recover command. For example, to recover the stale subdisk in the RAID-5 volume shown in the figure Invalid RAID-5 Volume, use the following command:

# vxvol -g mydg recover r5vol disk05-00

A RAID-5 volume that has multiple stale subdisks can be recovered in one operation. To recover multiple stale subdisks, use the vxvol recover command on the volume, as follows:

# vxvol -g mydg recover r5vol

Recovery After Moving RAID-5 Subdisks

When RAID-5 subdisks are moved and replaced, the new subdisks are marked as STALE in anticipation of recovery. If the volume is active, the vxsd command may be used to recover the volume. If the volume is not active, it is recovered when it is next started. The RAID-5 volume is degraded for the duration of the recovery operation, and any failure in the stripes involved in the move makes the volume unusable. The RAID-5 volume can also become invalid if its parity becomes stale. To avoid this, vxsd does not allow a subdisk move in the following situations:

- A stale subdisk occupies any of the same stripes as the subdisk being moved.
- The RAID-5 volume is stopped but was not shut down cleanly; that is, the parity is considered stale.
- The RAID-5 volume is active and has no valid log areas.
Only the third case can be overridden by using the -o force option.

Subdisks of RAID-5 volumes can also be split and joined by using the vxsd split command and the vxsd join command. These operations work the same way as those for mirrored volumes.

Note: RAID-5 subdisk moves are performed in the same way as subdisk moves for other volume types, but without the penalty of degraded redundancy.

Starting RAID-5 Volumes

When a RAID-5 volume is started, it can be in one of many states. After a normal system shutdown, the volume should be clean and require no recovery. However, if the volume was not closed, or was not unmounted before a crash, it can require recovery when it is started, before it can be made available. This section describes the actions that can be taken under certain conditions. Under normal conditions, volumes are started automatically after a reboot, and any recovery takes place automatically or is done through the vxrecover command.

Unstartable RAID-5 Volumes

A RAID-5 volume is unusable if some part of the RAID-5 plex does not map the volume length:

- The RAID-5 plex is sparse in relation to the RAID-5 volume length.
- The RAID-5 plex does not map a region where two subdisks have failed within a stripe, either because they are stale or because they are built on a failed disk.
When this occurs, the vxvol start command returns the following error message:

VxVM vxvol ERROR V-5-1-1236 Volume r5vol is not startable; RAID-5 plex does not map entire volume length.

At this point, the contents of the RAID-5 volume are unusable.

Another way that a RAID-5 volume can become unstartable is if the parity is stale and a subdisk becomes detached or stale. This occurs because, within the stripes that contain the failed subdisk, the parity stripe unit is invalid (because the parity is stale) and the stripe unit on the bad subdisk is also invalid. The figure Invalid RAID-5 Volume illustrates a RAID-5 volume that has become invalid due to stale parity and a failed subdisk.

[Figure: Invalid RAID-5 Volume]

This example shows four stripes in the RAID-5 array. All parity is stale, and subdisk disk05-00 has failed. Stripes X and Y are unusable because two failures have occurred within each of those stripes, which prevents the use of the volume. In this case, the vxvol start command fails with the following error message:

VxVM vxvol ERROR V-5-1-1237 Volume r5vol is not startable; some subdisks are unusable and the parity is stale.

This situation can be avoided by always using two or more RAID-5 log plexes in RAID-5 volumes, because RAID-5 log plexes prevent the parity within the volume from becoming stale (see System Failures for details).

Forcibly Starting RAID-5 Volumes

You can start a volume even if subdisks are marked as stale: for example, if a stopped volume has stale parity and no RAID-5 logs, and a disk becomes detached and then reattached. The subdisk is considered stale even though the data is not out of date (because the volume was in use when the subdisk was unavailable), and the RAID-5 volume is considered invalid. To prevent this case, always have multiple valid RAID-5 logs associated with the array whenever possible.

To start a RAID-5 volume with stale subdisks, you can use the -f option with the vxvol start command. This causes all stale subdisks to be marked as non-stale. Marking takes place before the start operation evaluates the validity of the RAID-5 volume and what is needed to start it. You can also mark individual subdisks as non-stale by using the following command:

# vxmend [-g diskgroup] fix unstale subdisk
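As a sketch of both approaches, reusing the disk group mydg and volume r5vol from the examples in this chapter, and borrowing the stale subdisk name disk02-01 from the degraded-mode display earlier (substitute the actual subdisk shown by vxprint -ht):

To force the volume to start, marking all of its stale subdisks as non-stale:

# vxvol -g mydg -f start r5vol

Or, to clear the stale flag on a single subdisk before starting the volume normally:

# vxmend -g mydg fix unstale disk02-01
# vxvol -g mydg start r5vol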
If some subdisks are stale and need recovery, and if valid logs exist, the volume is enabled by placing it in the ENABLED kernel state, and the volume is available for use during the recovery process. Otherwise, the volume kernel state is set to DETACHED, and the volume is not available while it is being recovered. This is done because if the system were to crash, or the volume were ungracefully stopped while it was active, the parity would become stale, making the volume unusable. If this is undesirable, the volume can be started with the -o unsafe start option.

Caution: The -o unsafe start option is considered dangerous, as it can make the contents of the volume unusable. It is therefore not recommended.

If any subdisk recovery fails and there are no valid logs, the volume start is aborted, because the subdisk remains stale and a system crash would make the RAID-5 volume unusable. This too can be overridden by using the -o unsafe start option, subject to the same caution. If the volume has valid logs, subdisk recovery failures are noted but do not stop the start procedure.
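After any recovery completes, it is worth confirming that the volume is healthy again. A minimal check, reusing the mydg examples above:

# vxinfo -g mydg
# vxprint -g mydg -ht r5vol

vxinfo reports whether each volume is started, and the vxprint output should show the volume state as ACTIVE, with no subdisks flagged dS.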