27 September 2011

VxVM DG Disabled

During recent SAN maintenance, a few hosts that were unfortunately
single-pathed to the SAN lost connectivity to it.  The SAN-presented
disk devices were under Veritas Volume Manager (VxVM) control; as a
result, the filesystems on them began returning I/O errors and 'vxdisk'
output showed the disks as "dgdisabled".  The following details this
situation and the recovery.
For our setup, we have:
        HOST:                   apollo
        OS:                     Solaris 9
        VxVM VERSION:           4.1 (also relevant in other versions)
        DISK GROUPS (DGs):      appdg, storedg
        VOLUMES:                appvol, storevol
Upon logging into apollo, 'df' returned the errors below:
        apollo [0] /usr/sbin/df -h
        Filesystem             size   used  avail capacity  Mounted on
        /dev/md/dsk/d2         4.9G   975M   3.9G    20%    /
        <snip...>
        swap                   7.4G   176K   7.4G     1%    /var/run
        dmpfs                  7.4G     0K   7.4G     0%    /dev/vx/dmp
        dmpfs                  7.4G     0K   7.4G     0%    /dev/vx/rdmp
        df: cannot statvfs /usr/appvol: I/O error
        df: cannot statvfs /usr/storevol: I/O error
Checking the output of 'vxdisk' and 'vxdg' shows the DGs as disabled:
        apollo [0] /usr/sbin/vxdisk -o alldgs list
        DEVICE       TYPE            DISK         GROUP        STATUS
        <snip...>
        c1t2d0s2     auto:sliced     disk01       rootdg       online
        c1t3d0s2     auto:sliced     disk02       rootdg       online
        c2t0d0s2     auto:sliced     appdg02      appdg        online dgdisabled
        c2t0d1s2     auto:sliced     appdg01      appdg        online dgdisabled
        c2t0d2s2     auto:cdsdisk    storedg01    storedg      online dgdisabled
        apollo [0] /usr/sbin/vxdg list
        NAME         STATE           ID
        appdg        disabled        1090904042.1047.apollo
        rootdg       enabled         1090964640.1025.apollo
        storedg      disabled        1197441805.18.apollo
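With more than a handful of DGs, picking the disabled ones out by eye gets
tedious.  Below is a minimal sketch of a filter for the 'vxdg list' output
shown above.  The function name disabled_dgs is my own (hypothetical), and
it assumes the three-column NAME / STATE / ID layout seen here.  It reads
the listing on stdin, so it can be tried without VxVM installed:

```shell
# Print the names of disk groups whose STATE column is exactly "disabled".
# Input: 'vxdg list' output on stdin (header line first, then one DG per line).
# disabled_dgs is a hypothetical helper name, not part of VxVM.
disabled_dgs() {
    awk 'NR > 1 && $2 == "disabled" { print $1 }'
}
```

On a host with VxVM, it would be used as '/usr/sbin/vxdg list | disabled_dgs'.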
Since the DGs were disabled yet the volumes were still mounted, I did a
quick check of the mount properties (mount), forcefully unmounted the
volumes (umount -f), then ran a final check (mount) to verify that the
volumes were unmounted:
        apollo [0] /usr/sbin/mount | /usr/bin/grep 'vx/dsk'
        /usr/appvol on /dev/vx/dsk/appdg/appvol \
           read/write/setuid/delaylog/largefiles/ioerror=mwdisable/dev=44501d0 \
           on Wed Apr 13 00:15:34 2011
        /usr/storevol on /dev/vx/dsk/storedg/storevol \
           read/write/setuid/delaylog/largefiles/ioerror=mwdisable/dev=444bf68 \
           on Wed Apr 13 00:15:39 2011
        apollo [0] /usr/sbin/umount -f /usr/appvol
        apollo [0] /usr/sbin/umount -f /usr/storevol
        apollo [0] /usr/sbin/mount | /usr/bin/grep 'vx/dsk'
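The same 'mount' check can be reduced to just the mount points, which is
handy when feeding them to 'umount -f'.  This is a rough sketch only: the
function name vx_mounts is hypothetical, and it assumes Solaris-style
'mount' output with each mount on a single line (the listing above was
wrapped by hand with backslashes):

```shell
# Print the mount points of mounted VxVM volumes.
# Input: Solaris-style 'mount' output on stdin, one mount per line:
#   <mount point> on <device> <options> on <date>
# vx_mounts is a hypothetical helper name, not a system command.
vx_mounts() {
    awk '$2 == "on" && $3 ~ /^\/dev\/vx\/dsk\// { print $1 }'
}
```

On the host itself this would be run as '/usr/sbin/mount | vx_mounts'.  An
empty result corresponds to the empty 'grep' output in the final check
above.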
At this point, we can refocus on VxVM.  To re-enable the DGs, I deported
them (vxdg deport DGNAME) and then imported them again (vxdg import
DGNAME).  After the import, I verified their new state ('vxdg list',
'vxdisk list'):
        apollo [1] /usr/sbin/vxdg deport storedg
        apollo [0] /usr/sbin/vxdg deport appdg
        apollo [0] /usr/sbin/vxdg import storedg
        apollo [0] /usr/sbin/vxdg import appdg
        apollo [0] /usr/sbin/vxdg list
        NAME         STATE           ID
        rootdg       enabled         1090964640.1025.apollo
        appdg        enabled         1090904042.1047.apollo
        storedg      enabled,cds     1197441805.18.apollo
        apollo [0] /usr/sbin/vxdisk -o alldgs list
        DEVICE       TYPE            DISK         GROUP        STATUS
        <snip...>
        c1t2d0s2     auto:sliced     disk01       rootdg       online
        c1t3d0s2     auto:sliced     disk02       rootdg       online
        c2t0d0s2     auto:sliced     appdg02      appdg        online
        c2t0d1s2     auto:sliced     appdg01      appdg        online
        c2t0d2s2     auto:cdsdisk    storedg01    storedg      online
        apollo [0]
Excellent, all DGs are now enabled and online.  The next step is to
restart the volumes.  Rather than starting all volumes in a DG, you
could start them individually (vxvol -g DGNAME start VOLNAME).  For
simplicity, I've started them all:
        apollo [0] /usr/sbin/vxvol -g storedg startall
        apollo [0] /usr/sbin/vxvol -g appdg startall
        apollo [0] /usr/sbin/mount /usr/storevol
        UX:vxfs mount: ERROR: V-3-21268: /dev/vx/dsk/storedg/storevol is corrupted. needs checking
In my haste, I tried mounting one of the volumes (by the mount point
listed in /etc/vfstab) without first checking and repairing the
filesystem, which returned the above error.  Below, 'fsck' is run against
both volumes and, once they are marked clean, the volumes are remounted:
        apollo [28] /usr/sbin/fsck -F vxfs -y /dev/vx/dsk/storedg/storevol
        log replay in progress
        replay complete - marking super-block as CLEAN
        apollo [0] /usr/sbin/fsck -F vxfs -y /dev/vx/dsk/appdg/appvol
        log replay in progress
        replay complete - marking super-block as CLEAN
        apollo [0] /usr/sbin/mount /usr/storevol
        apollo [0] /usr/sbin/mount /usr/appvol
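To keep the order straight next time (and avoid mounting before 'fsck'),
the steps used above can be sketched as a small helper that only prints
the commands for one DG and one volume.  The name recovery_plan and its
three arguments are my own (hypothetical); the commands and paths are the
ones used in this write-up, and nothing is executed:

```shell
# Print (do not run) the recovery commands for one disk group and one
# volume, in the order used in this write-up: unmount, deport, import,
# start volumes, fsck, mount.
# Usage: recovery_plan DGNAME VOLNAME MOUNTPOINT
# recovery_plan is a hypothetical helper name; it executes nothing.
recovery_plan() {
    dg=$1
    vol=$2
    mnt=$3
    echo "/usr/sbin/umount -f $mnt"
    echo "/usr/sbin/vxdg deport $dg"
    echo "/usr/sbin/vxdg import $dg"
    echo "/usr/sbin/vxvol -g $dg startall"
    echo "/usr/sbin/fsck -F vxfs -y /dev/vx/dsk/$dg/$vol"
    echo "/usr/sbin/mount $mnt"
}
```

For example, 'recovery_plan storedg storevol /usr/storevol' prints the six
commands for storedg.  I'd review the output and run the steps by hand
rather than pipe it to a shell, since a failed deport or import should
stop the sequence.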
A final check using 'df' shows the volumes mounted and usable:
        apollo [0] /usr/sbin/df -h /usr/storevol /usr/appvol
        Filesystem             size   used  avail capacity  Mounted on
        /dev/vx/dsk/storedg/storevol
                                80G    11G    69G    14%    /usr/storevol
        /dev/vx/dsk/appdg/appvol
                               1.1T   359G   695G    35%    /usr/appvol
        apollo [0]