Good day. I recently discovered that one of our AIX servers had accumulated a multitude of PowerPath devices. Running lsdev | grep hdiskpower | wc -l returned over 3000 entries, yet lspv | grep power showed only about half a dozen actually in use.
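To see how those devices break down by state rather than just counting them, a small helper along these lines may be useful. It is only a sketch: the function name summarize_power_devices is made up for illustration, and it assumes the usual lsdev column layout (name, state, location, description).

```shell
# Summarize hdiskpower devices by state.
# Reads `lsdev -Cc disk` output on stdin; assumes column 1 is the
# device name and column 2 is the state (Available or Defined).
summarize_power_devices() {
    awk '$1 ~ /^hdiskpower[0-9]+$/ { count[$2]++; total++ }
         END {
             printf "total=%d available=%d defined=%d\n",
                    total, count["Available"], count["Defined"]
         }'
}

# On the AIX host:
#   lsdev -Cc disk | summarize_power_devices
```

A large "defined" count next to a small "available" count points at stale definitions left in the ODM rather than live paths.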
Upgrading the ODM to a newer version didn’t help much. It took over 2.5 hours to remove all of the hdiskpower devices, followed by installing 3 additional ones. A reboot of the AIX system didn’t help either. Upon scouring the web, I have found a few places which indicate the following procedure should fix up the issue (at the moment this is untested). I’ll be validating this information within the next week.
Procedure
* Shut down the application(s), database(s), etc. and vary off all Volume Groups (VGs) except rootvg. This can be confirmed with lsvg -o (only rootvg should be listed)
* If EMC Solutions Enabler is running, disable with stordaemon shutdown all -immediate
* Remove paths from the PowerPath Configuration –> powermt remove hba=all
* Delete all Symmetrix Disks –> lsdev -CtSYMM* -Fname |xargs -n1 rmdev -dl
* Delete all hdiskpower devices –> rmdev -dl powerpath0
* Confirm they’re gone with –> lsdev -Cc disk
(no Symmetrix or hdiskpower devices should exist)
* Remove all fibre device instances –> rmdev -Rdl fscsi0
(repeat for others like fscsi1 etc)
* Verify fibre adapters are gone –> lsdev -Cc adapter
(no fscsi should exist)
* Put the hba devices into a defined state –> rmdev -l fcsX
(replace X with 0, 1, etc.)
* Scan the bus –> emc_cfgmgr
or cfgmgr -vl fcsX
NOTE: emc_cfgmgr is a script downloadable from EMC’s website
* Configure all of the EMC devices into PowerPath –> powermt config
* Some final checks –> powermt display
& powermt display dev=all
& lsdev -Cc disk
* Finally save your changes with –> powermt save
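Before starting on the list above, the first step's check (only rootvg still varied on) can be scripted rather than eyeballed. The following is a minimal sketch; the function name active_non_root_vgs is hypothetical, and it assumes lsvg -o prints one active volume group name per line.

```shell
# Print every active volume group other than rootvg.
# Reads `lsvg -o` output (one VG name per line) on stdin.
# Empty output means it is safe to continue with the procedure.
active_non_root_vgs() {
    awk 'NF && $1 != "rootvg" { print $1 }'
}

# On the AIX host:
#   left=$(lsvg -o | active_non_root_vgs)
#   [ -z "$left" ] || echo "Still varied on: $left"
```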
MPIO settings (if applicable) may have to be put in again. If so, they can be changed like so:
chdev -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail
(repeat for other adapters)
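Rather than repeating the chdev by hand for each adapter, the commands can be generated from the device list. This is a sketch only: fscsi_chdev_commands is a made-up helper name, and it deliberately prints the commands instead of running them so they can be reviewed first.

```shell
# Print one chdev command per fscsi device.
# Reads device names on stdin (one per line) and ignores anything
# that is not named fscsi<N>. Dry run by design: nothing is executed.
fscsi_chdev_commands() {
    awk '$1 ~ /^fscsi[0-9]+$/ {
        print "chdev -l " $1 " -a dyntrk=yes -a fc_err_recov=fast_fail"
    }'
}

# On the AIX host, review the output first:
#   lsdev -C -F name | fscsi_chdev_commands
# and only then pipe it to sh:
#   lsdev -C -F name | fscsi_chdev_commands | sh
```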
A reboot should NOT be necessary for this procedure. However, I’ll confirm and update within a week.
= Varying degrees of success =
No issues up until “rmdev -dl powerpath0”. Got this response instead:
rmdev -dl powerpath0
Method error (/etc/methods/ucfgpower):
0514-062 Cannot perform the requested function because the
specified device is busy.
So I ran lsdev -Cc disk. It listed the two local SAS drives and the 3000+ hdiskpower devices, all of which were in a Defined state. I then attempted a manual removal of those with the following one-liner:
lsdev -Cc disk | grep hdiskpower | awk '{print "rmdev -dl " $1}' | sh
This slowly started to delete each of them one at a time. Time for a coffee break apparently!
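A slightly more defensive variant of that one-liner is sketched below. It only touches hdiskpower devices in the Defined state and reports any that fail instead of silently moving on. The function name remove_defined_power_devices and the optional argument for swapping out the removal command are my own additions for illustration, not part of PowerPath or AIX.

```shell
# Remove hdiskpower devices that are in the Defined state.
# Reads `lsdev -Cc disk` output on stdin. The optional first argument
# names the removal command (default: rmdev), which makes it possible
# to rehearse the run with a harmless stand-in.
remove_defined_power_devices() {
    remover=${1:-rmdev}
    awk '$1 ~ /^hdiskpower[0-9]+$/ && $2 == "Defined" { print $1 }' |
    while read -r dev; do
        # stdin is redirected so the command cannot eat the device list
        "$remover" -dl "$dev" </dev/null || echo "failed: $dev" >&2
    done
}

# Rehearsal on the AIX host (prints the arguments, removes nothing):
#   lsdev -Cc disk | remove_defined_power_devices echo
# Real run:
#   lsdev -Cc disk | remove_defined_power_devices
```

Devices are still removed one at a time, so with thousands of entries this will not be any faster. It just makes the run easier to audit.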
Once the 3135 hdiskpower devices were deleted, the rmdev -dl powerpath0 command worked as expected. The rest of the procedure worked as planned. Lastly, I set the MPIO settings with these commands:
chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P
chdev -l fscsi1 -a dyntrk=yes -a fc_err_recov=fast_fail -P
Because the -P flag only records the change in the ODM, the MPIO settings took effect after a reboot.
NOTE: If you wish to make the settings active without a reboot, you have to remove the hdisk and hdiskpower devices (and the HBA child device fscsiX), make the MPIO settings change, and then run emc_cfgmgr. This will rediscover all of the hdisk and hdiskpower devices. Personally, as everything is already down, it’s probably a lot easier to just reboot the system.