Replacing a physical disk
OS layer
First step is to remove the disk from the O/S layer. In this example, hdisk1 is removed:
    unmirrorvg rootvg hdisk1
    reducevg rootvg hdisk1
    bootlist -m normal hdisk0    (to remove hdisk1 from the startup order)
This may fail because the disk is busy. Try taking the system into single user mode with telinit S. If that doesn’t work, reboot and try again. If it still fails, boot into single user mode: reboot, and keep hitting 1 at the AIX splash screen until you get to the ‘system management menu’. There is an option (5, I think) to boot into single user; I think IBM calls it “diagnostic shell” or something similar. Boot up, log in as root, and rerun the commands. If none of that works, boot from CD 1 of the AIX install disk set, start a maintenance shell from there, and repair the damaged LV.
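Since these commands change the rootvg, it can help to print them before running them. The following is a minimal sketch, not a tested procedure: `run` and `remove_from_rootvg` are hypothetical helper names, and the disk names are the ones from the example above. With `DRY_RUN` left at its default of 1, nothing is executed.

```shell
#!/bin/sh
# Sketch only: print (DRY_RUN=1, the default) or execute (DRY_RUN=0)
# the removal steps. Helper names are hypothetical.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"          # dry run: just show the command
    else
        "$@"               # real run: execute it
    fi
}

remove_from_rootvg() {
    bad=$1     # disk being replaced, e.g. hdisk1
    good=$2    # surviving rootvg disk, e.g. hdisk0
    # Stop at the first failing step so later steps never run on a bad state.
    run unmirrorvg rootvg "$bad" &&
    run reducevg rootvg "$bad" &&
    run bootlist -m normal "$good"
}
```

Calling `remove_from_rootvg hdisk1 hdisk0` prints the three commands in order; setting `DRY_RUN=0` first would execute them, stopping at the first one that fails.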
GOTCHA
If the rootvg has another dump device set up, the reducevg command will fail. If you run lspv hdiskX you will notice that some PPs are still in use, probably a small number like 3. Either way, lsvg -l rootvg will list all of the LVs. Look for the dump device; this should be what is stopping you from removing the disk from the VG. To fix this, you have to remove the reference to the dump device (so it closes the LV), and then remove the LV. As in the example above, I’ll do this for hdisk1.
    sysdumpdev -l
    primary              /dev/lvdump1
    secondary            /dev/lvdump2
    copy directory       /var/adm/ras
    forced copy flag     TRUE
    always allow dump    FALSE
    dump compression     ON

As we know hdisk1 is bad, it is the secondary dump device we want removed. So do the following:

    sysdumpdev -s /dev/sysdumpnull
    rmlv -f lvdump2
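To avoid misreading the listing, the device for a given role can be pulled out of the `sysdumpdev -l` output with a small filter. This is a sketch that assumes the output looks like the listing above (role name in the first column, device in the second); `dump_device` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: reads `sysdumpdev -l` output on stdin and prints the
# device configured for the given role (primary or secondary).
# Assumes the two-column layout shown in the listing above.

dump_device() {
    role=$1
    awk -v role="$role" '$1 == role { print $2 }'
}

# Possible use on the live system:
#   sysdumpdev -l | dump_device secondary
```

Confirming which disk actually holds that LV (for example with lslv -l on the LV name) before removing it is a sensible extra check.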
Physical disk layer
This is done by IBM. Drives are almost always hot swappable. However, some steps are performed by the Unix Administrator to expedite the process.
    diag (go into the diagnostic routine, and hit enter)
    select Task Selection, Hot Plug Task, SCSI and SCSI RAID Hot Plug Manager
    select Identify a Device Attached to a SCSI Hot Swap Enclosure Device
    select which drive is affected (if not displayed, use the location code from errpt if necessary)
Once the Identify LED is enabled, it shows which Drive Array Unit and which HDD slot hold the problematic device.
    go back one screen
    select Replace/Remove a Device Attached to a SCSI Hot Swap Enclosure Device
    select the disk you want, and hit enter
    the drive can then be physically removed from the Drive Array Unit
    hit enter once that step is done
Once the drive has been removed, you have to go through diag again to prepare the system before installing the new hard drive.
    go back one screen
    select Attach a Device to a SCSI Hot Swap Enclosure Device (select slot)
    install the new hard drive
    hit enter
Perform the same steps from any other LPARs as applicable, depending on how many drives are being replaced. Verify that the Identify LEDs have been turned off. If not, go back into diag and turn them off. Alternatively:

    diag
    select Task Selection, Identify and Attention Indicators, Set ALL Identify Indicators to NORMAL
OS post-install steps
After replacing the physical disk, you need to make the drive visible to the OS. Once the disk is installed, type cfgmgr to scan the bus and pick up the new hardware, then run lspv to make sure the drive is listed. Run lscfg -vpl hdiskX against the new drive to make sure it’s the PV you think it is (match up the serial number, etc.); it should be pretty obvious which it is if everything went smoothly. Assuming we got lucky and the new PV came up as hdisk1, do:
    extendvg rootvg hdisk1
    mirrorvg rootvg
The mirrorvg will take a while to run and may appear to have hung. This is fine; it’s copying the entire disk across. Disk activity can be monitored in nmon with either the adapter (“a”) or disk (“d”) views.
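Besides nmon, the stale partition count reported by lsvg gives a rough idea of how far the sync has progressed. The sketch below assumes the `lsvg rootvg` output contains a field labelled "STALE PPs:" followed by a number; `stale_pps` is a hypothetical helper name, and the exact layout may differ between AIX levels.

```shell
#!/bin/sh
# Sketch only: reads `lsvg <vg>` output on stdin and prints the number
# that follows "STALE PPs:". Prints nothing if the label is not found.

stale_pps() {
    sed -n 's/.*STALE PPs:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Possible use from a second terminal while mirrorvg runs:
#   lsvg rootvg | stale_pps
```

A count that falls over time and ends at 0 suggests the mirror copies are in sync.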
Now that the mirror is in place, the boot records need updating. hdisk0 is usually the other PV in the rootvg, but check first with lsvg -p rootvg:
    bosboot -a
    bootlist -m normal hdisk0 hdisk1
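After setting the boot list, it is worth confirming that both disks are in it. The sketch below assumes `bootlist -m normal -o` prints one entry per line with the disk name in the first column (for example `hdisk0 blv=hd5`); `in_bootlist` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: reads `bootlist -m normal -o` output on stdin and returns
# success (0) if the given disk name appears in the first column.

in_bootlist() {
    disk=$1
    awk -v d="$disk" '$1 == d { found = 1 } END { exit !found }'
}

# Possible use on the live system:
#   for d in hdisk0 hdisk1; do
#       bootlist -m normal -o | in_bootlist "$d" || echo "$d missing from boot list"
#   done
```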
Troubleshooting
If you have a disk with bad sectors, moving data off it may fail. Use lspv -l hdiskX to check what is on there. If it is a non-filesystem LV (e.g. a dump LV or paging LV), recreate it on the alternate disk (using sysdumpdev or mkps) and then just delete the LV. If it is a filesystem LV that for some reason is not mirrored, a restore from backup is required; otherwise, break the mirror of that specific LV and delete the copy on the bad disk.
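To get just the LV names left on the bad disk, the lspv -l listing can be filtered. This sketch assumes the usual layout: a first line with the disk name, a column header line, then one line per LV with the LV name in the first column. `lvs_on_disk` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: reads `lspv -l <disk>` output on stdin, skips the
# "hdiskN:" line and the column header, and prints each LV name.

lvs_on_disk() {
    awk 'NR > 2 && NF > 0 { print $1 }'
}

# Possible use on the live system:
#   lspv -l hdisk1 | lvs_on_disk
```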
552 error
A 552 error on the front panel means the IPL (boot) device is missing (this text may even be shown on the front panel). Boot off install disk 1 of the same major AIX version, at a level no newer than the installed system (to avoid firmware corruption issues), then go into diags and start up the rootvg. Do any maintenance work necessary to get the system into a functional state (unmirrorvg, reducevg, extendvg, etc.). Assuming the boot device is hdisk0, run ls -l /dev/ipldevice /dev/rhdisk0. If the major and minor numbers do not match, rm /dev/ipldevice. If /dev/ipldevice does not exist, create it with ln /dev/rhdisk0 /dev/ipldevice. Run bosboot -ad /dev/ipldevice ; bootlist -m normal hdisk0 hdisk1 and the system should reboot cleanly.
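The major/minor comparison can be done by eye, but a small filter makes it harder to misread. This sketch assumes each `ls -l` line for a character device shows the major number (with a trailing comma) in the fifth field and the minor number in the sixth, separated by whitespace; `same_device` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: reads the two lines of `ls -l /dev/ipldevice /dev/rhdisk0`
# on stdin and returns success (0) only if exactly two lines were read
# and their major,minor fields are identical.

same_device() {
    awk '{ id[NR] = $5 $6 } END { exit !(NR == 2 && id[1] == id[2]) }'
}

# Possible use from the maintenance shell:
#   ls -l /dev/ipldevice /dev/rhdisk0 | same_device || echo "ipldevice needs fixing"
```

If /dev/ipldevice is missing, ls prints only one line to stdout, so the check also fails in that case.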
MPIO SETTINGS
Ensure the MPIO settings have been implemented on the hdisk device after attaching it (if applicable).