Product: Volume Manager Guides
Manual: Volume Manager 4.1 Administrator's Guide
How Hot-Relocation Works

Hot-relocation allows a system to react automatically to I/O failures on redundant (mirrored or RAID-5) VxVM objects, and to restore redundancy and access to those objects. VxVM detects I/O failures on objects and relocates the affected subdisks to disks designated as spare disks or to free space within the disk group. VxVM then reconstructs the objects that existed before the failure and makes them redundant and accessible again.

When a partial disk failure occurs (that is, a failure affecting only some subdisks on a disk), redundant data on the failed portion of the disk is relocated. Existing volumes on the unaffected portions of the disk remain accessible.

Note: Hot-relocation is only performed for redundant (mirrored or RAID-5) subdisks on a failed disk. Non-redundant subdisks on a failed disk are not relocated, but the system administrator is notified of their failure.

Hot-relocation is enabled by default and takes effect without the intervention of the system administrator when a failure occurs.

The hot-relocation daemon, vxrelocd, detects and reacts to VxVM events that signify the following types of failures:
When vxrelocd detects such a failure, it performs the following steps:
If relocation is not possible, vxrelocd notifies the system administrator and takes no further action.

Note: Hot-relocation does not guarantee the same layout of data or the same performance after relocation. The system administrator can make configuration changes after hot-relocation occurs.

Relocation of failing subdisks is not possible in the following cases:
See the vxrelocd(1M) manual page for more information about the hot-relocation daemon.

The figure, Example of Hot-Relocation for a Subdisk in a RAID-5 Volume, illustrates the hot-relocation process in the case of the failure of a single subdisk of a RAID-5 volume.

[Figure: Example of Hot-Relocation for a Subdisk in a RAID-5 Volume]

Partial Disk Failure Mail Messages

If hot-relocation is enabled when a plex or disk is detached by a failure, mail indicating the failed objects is sent to root. If a partial disk failure occurs, the mail identifies the failed plexes. For example, if a disk containing mirrored volumes fails, you can receive mail information as shown in the following example:

    To: root
    Subject: Volume Manager failures on host teal

    Failures have been detected by the VERITAS Volume Manager:

    failed plexes:
      home-02
      src-02

See Modifying the Behavior of Hot-Relocation for information on how to send the mail to users other than root.

You can determine which disk is causing the failures in the above example message by using the following command:

    # vxstat -g mydg -s -ff home-02 src-02

The -s option asks for information about individual subdisks, and the -ff option displays the number of failed read and write operations. The following output display is typical:

                      FAILED
    TYP NAME        READS   WRITES
    sd  mydg01-04       0        0
    sd  mydg01-06       0        0
    sd  mydg02-03       1        0
    sd  mydg02-04       1        0

This example shows failures on reading from subdisks mydg02-03 and mydg02-04 of disk mydg02.

Hot-relocation automatically relocates the affected subdisks and initiates any necessary recovery procedures. However, if relocation is not possible or the hot-relocation feature is disabled, you must investigate the problem and attempt to recover the plexes. Errors can be caused by cabling failures, so check the cables connecting your disks to your system.
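When many subdisks are listed, it can help to total the failed I/O counts per disk before deciding which disk and cables to inspect. The following is a minimal sketch, not part of VxVM: it parses text shaped like the vxstat -s -ff display shown earlier in this section. The function name failed_io_by_disk and the SAMPLE text are hypothetical, and the sketch assumes that subdisk names follow the diskname-NN pattern seen in the example (renamed subdisks would not map back to a disk this way).

```python
from collections import Counter

# Sample text copied from the vxstat display in this section (spacing approximated).
SAMPLE = """\
                  FAILED
TYP NAME        READS   WRITES
sd  mydg01-04       0        0
sd  mydg01-06       0        0
sd  mydg02-03       1        0
sd  mydg02-04       1        0
"""


def failed_io_by_disk(vxstat_output):
    """Return {disk_name: failed_reads + failed_writes} for disks with failures."""
    totals = Counter()
    for line in vxstat_output.splitlines():
        fields = line.split()
        # Keep only subdisk records: "sd <name> <failed reads> <failed writes>".
        if len(fields) != 4 or fields[0] != "sd":
            continue
        _, subdisk, reads, writes = fields
        # Assumption: the subdisk is named <disk>-<NN>, as in mydg02-03.
        disk = subdisk.rsplit("-", 1)[0]
        totals[disk] += int(reads) + int(writes)
    # Report only the disks that actually recorded failed operations.
    return {disk: count for disk, count in totals.items() if count > 0}


suspects = failed_io_by_disk(SAMPLE)
print(suspects)  # {'mydg02': 2}
```

For the sample display, only mydg02 is reported, which matches the reading of the output given above.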
If there are obvious problems, correct them and recover the plexes using the following command:

    # vxrecover -b -g mydg home src

This starts recovery of the failed plexes in the background (the command prompt reappears before the operation completes). If an error message appears later, or if the plexes become detached again and there are no obvious cabling failures, replace the disk (see Removing and Replacing Disks).

Complete Disk Failure Mail Messages

If a disk fails completely and hot-relocation is enabled, the mail message lists the disk that failed and all plexes that use the disk. For example, you can receive mail as shown in this example display:

    To: root
    Subject: Volume Manager failures on host teal

    Failures have been detected by the VERITAS Volume Manager:

    failed disks:
      mydg02

    failed plexes:
      home-02
      src-02
      mkting-01

    failing disks:
      mydg02

This message shows that mydg02 was detached by a failure. When a disk is detached, I/O cannot get to that disk. The plexes home-02, src-02, and mkting-01 were also detached (probably because of the failure of the disk).

As described in Partial Disk Failure Mail Messages, the problem can be a cabling error. If the problem is not a cabling error, replace the disk (see Removing and Replacing Disks).

How Space is Chosen for Relocation

A spare disk must be initialized and placed in a disk group as a spare before it can be used for replacement purposes. If no disks have been designated as spares when a failure occurs, VxVM automatically uses any available free space in the disk group in which the failure occurs. If there is not enough spare disk space, a combination of spare space and free space is used.

The free space used in hot-relocation must not have been excluded from hot-relocation use. Disks can be excluded from hot-relocation use by using vxdiskadm, vxedit or the VERITAS Enterprise Administrator (VEA).

You can designate one or more disks as hot-relocation spares within each disk group.
Disks can be designated as spares by using vxdiskadm, vxedit, or the VEA. Disks designated as spares do not participate in the free space model and should not have storage space allocated on them.

When selecting space for relocation, hot-relocation preserves the redundancy characteristics of the VxVM object to which the relocated subdisk belongs. For example, hot-relocation ensures that subdisks from a failed plex are not relocated to a disk containing a mirror of the failed plex. If redundancy cannot be preserved using any available spare disks and/or free space, hot-relocation does not take place. If relocation is not possible, the system administrator is notified and no further action is taken.

From the eligible disks, hot-relocation attempts to use the disk that is "closest" to the failed disk. The value of "closeness" depends on the controller, target, and disk number of the failed disk. A disk on the same controller as the failed disk is closer than a disk on a different controller. A disk under the same target as the failed disk is closer than one on a different target.

Hot-relocation tries to move all subdisks from a failing drive to the same destination disk, if possible.

When hot-relocation takes place, the failed subdisk is removed from the configuration database, and VxVM ensures that the disk space used by the failed subdisk is not recycled as free space.
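The selection rules in this section can be illustrated with a small model. The sketch below is not VxVM code and does not reproduce the actual algorithm used by vxrelocd. It assumes hypothetical c#t#d# style device names, a hypothetical Disk record, and an assumed ordering in which a shared controller outranks a shared target, which outranks the distance between disk numbers. Disks excluded from hot-relocation use are assumed to have been left out of the candidate list already.

```python
import re
from typing import NamedTuple, Optional, Sequence


class Disk(NamedTuple):
    name: str           # disk media name, e.g. "mydg04"
    device: str         # hypothetical c#t#d# device name
    free_blocks: int    # spare or free space usable for relocation
    holds_mirror: bool  # True if it already holds another plex of the same volume


def address(device: str) -> tuple:
    """Split a c#t#d# device name into (controller, target, disk) numbers."""
    match = re.fullmatch(r"c(\d+)t(\d+)d(\d+)", device)
    if match is None:
        raise ValueError(f"unrecognized device name: {device!r}")
    return tuple(int(part) for part in match.groups())


def closeness_key(failed: str, candidate: str) -> tuple:
    """Smaller keys mean 'closer'; the exact weighting is an assumption."""
    fc, ft, fd = address(failed)
    cc, ct, cd = address(candidate)
    # False sorts before True, so a matching controller or target ranks first.
    return (cc != fc, (cc, ct) != (fc, ft), abs(cd - fd))


def pick_destination(failed: str, needed_blocks: int,
                     candidates: Sequence[Disk]) -> Optional[Disk]:
    """Return the closest disk that preserves redundancy, or None."""
    eligible = [
        d for d in candidates
        if not d.holds_mirror and d.free_blocks >= needed_blocks
    ]
    if not eligible:
        return None  # relocation is not possible; the administrator is notified
    return min(eligible, key=lambda d: closeness_key(failed, d.device))


candidates = [
    Disk("mydg03", "c1t0d3", 5000, False),  # different controller
    Disk("mydg04", "c0t1d0", 5000, False),  # same controller, different target
    Disk("mydg05", "c0t0d1", 5000, True),   # holds a mirror of the failed plex
    Disk("mydg06", "c0t0d2", 100, False),   # not enough space
]
best = pick_destination("c0t0d0", 1000, candidates)
print(best.name if best else "no eligible disk")  # mydg04
```

In this model, mydg05 and mydg06 are nearer to the failed disk than mydg04, but they are rejected first because one would break redundancy and the other lacks space. Closeness is applied only among the disks that remain eligible.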
VERITAS Software Corporation
www.veritas.com