Finding and fixing a corrupt ODM install

Good day. I recently discovered that one of our AIX servers had an excessive number of PowerPath devices. Running lsdev | grep hdiskpower | wc -l returned over 3000 entries, yet lspv | grep power showed that only about half a dozen were actually in use.
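To see how those thousands of devices split between states, the count can be broken down per state. The helper below is only a sketch: the function name is made up, and it assumes the usual lsdev column layout (name, state, location, description).

```shell
# Summarise hdiskpower devices by state from `lsdev -Cc disk` output on stdin.
# Assumption: column 1 is the device name and column 2 is its state.
count_hdiskpower_by_state() {
    awk '$1 ~ /^hdiskpower[0-9]+$/ { n[$2]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Usage on the AIX host:
#   lsdev -Cc disk | count_hdiskpower_by_state
```

A large Defined count next to a handful of Available devices points at stale ODM entries rather than real LUNs.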

Upgrading the ODM to a newer version didn’t help much. It took over 2.5 hours to remove all of the hdiskpower devices, followed by installing 3 additional ones. A reboot of the AIX system didn’t help either. After scouring the web, I found a few places which indicate that the following procedure should fix the issue (at the moment this is untested). I’ll be validating this information within the next week.

Procedure
* Shut down the application(s), database(s), etc. and vary off all Volume Groups (VGs) except rootvg. This can be confirmed with lsvg -o, which should then list only rootvg
* If EMC Solutions Enabler is running, disable with stordaemon shutdown all -immediate
* Remove paths from the PowerPath Configuration –> powermt remove hba=all
* Delete all Symmetrix Disks –> lsdev -CtSYMM* -Fname |xargs -n1 rmdev -dl
* Delete all hdiskpower devices –> rmdev -dl powerpath0
* Confirm they’re gone with –> lsdev -Cc disk (no Symmetrix or hdiskpower devices should exist)
* Remove all fibre device instances –> rmdev -Rdl fscsi0 (repeat for others like fscsi1 etc)
* Verify the fibre devices are gone –> lsdev -Cc adapter (no fscsi should exist)
* Put the hba devices into a defined state –> rmdev -l fcsX (replace X with 0, 1 etc)
* Scan the bus –> emc_cfgmgr or cfgmgr -vl fcsX NOTE: emc_cfgmgr is a script downloadable from EMC’s website
* Configure all of the EMC devices into PowerPath –> powermt config
* Some final checks –> powermt display, then powermt display dev=all, then lsdev -Cc disk
* Finally save your changes with –> powermt save
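
Since the first step depends on every volume group except rootvg being varied off, it can be worth checking that before any device is removed. A minimal sketch, assuming lsvg -o prints one active VG name per line (the function name is hypothetical):

```shell
# Print every varied-on volume group other than rootvg.
# Reads `lsvg -o` output (one VG name per line) on stdin.
active_non_root_vgs() {
    awk 'NF && $1 != "rootvg" { print $1 }'
}

# Usage on the AIX host:
#   left=$(lsvg -o | active_non_root_vgs)
#   if [ -n "$left" ]; then
#       echo "Still varied on: $left"
#   fi
```

Empty output means only rootvg is active and the removal steps can proceed.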

MPIO settings (if applicable) may have to be put in again. If so, they can be changed like so:
chdev -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail (repeat for other adapters)
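
With more than a couple of adapters, the repeated chdev calls can be generated instead of typed. This is just a sketch with a made-up function name; it prints the commands rather than running them, so they can be reviewed first.

```shell
# Print one chdev command per fscsi adapter name given as an argument.
# Nothing is executed; pipe the output to sh once it looks right.
gen_fscsi_chdev_cmds() {
    for adapter in "$@"; do
        printf 'chdev -l %s -a dyntrk=yes -a fc_err_recov=fast_fail\n' "$adapter"
    done
}

# Usage on the AIX host:
#   gen_fscsi_chdev_cmds fscsi0 fscsi1
#   gen_fscsi_chdev_cmds fscsi0 fscsi1 | sh
```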

A reboot should NOT be necessary for this procedure. I’ll confirm and update within a week.

= Varying degrees of success =

No issues up until “rmdev -dl powerpath0”. Got this response instead:
rmdev -dl powerpath0
Method error (/etc/methods/ucfgpower):
0514-062 Cannot perform the requested function because the
specified device is busy.

So I ran lsdev -Cc disk. It listed the two local SAS drives and the 3000+ hdiskpower devices (all of the hdiskpower devices were in a Defined state). I then attempted a manual removal of those with the following one-liner:
lsdev -Cc disk | grep hdiskpower | awk '{print "rmdev -dl " $1}' | sh
This slowly deleted them one at a time. Time for a coffee break apparently!
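
A more cautious variant of that one-liner, not what was actually run here, would only target devices in the Defined state and write the commands out for review before executing them. The sketch below assumes the usual lsdev column layout (name, then state); the function name and the /tmp path are placeholders.

```shell
# Emit an rmdev command for each hdiskpower device in the Defined state.
# Reads `lsdev -Cc disk` output on stdin; Available devices are left alone.
gen_rmdev_cmds() {
    awk '$1 ~ /^hdiskpower[0-9]+$/ && $2 == "Defined" { print "rmdev -dl " $1 }'
}

# Usage on the AIX host (dry run first, then execute):
#   lsdev -Cc disk | gen_rmdev_cmds > /tmp/rmdev_hdiskpower.sh
#   wc -l /tmp/rmdev_hdiskpower.sh
#   sh /tmp/rmdev_hdiskpower.sh
```

Matching on the first column instead of grepping the whole line also avoids picking up a device whose description happens to contain "hdiskpower".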

Once the 3135 hdiskpower devices were deleted, the rmdev -dl powerpath0 command worked as expected. The rest of the procedure worked as planned. Lastly, I set the MPIO settings with these commands:
chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P
chdev -l fscsi1 -a dyntrk=yes -a fc_err_recov=fast_fail -P

Because the -P flag only records the change in the ODM and applies it at the next boot, the MPIO settings took effect after a reboot.

NOTE: If you wish to make the settings active without a reboot, you have to remove the hdisk (and hdiskpower) devices (and the hba child device fscsiX), then make the MPIO settings change, followed by running emc_cfgmgr. This will re-discover all of the hdisk and hdiskpower devices. Personally, as everything is already down, it’s probably a lot easier to just reboot the system.