Recently, work required the procurement of a new Power 10 server so that an aging Power 8 server could be taken out of the mix and decommissioned. This meant researching the requirements for CPU, memory, and the types of networking and fibre adapters. Once that was determined, the Power 10 server was ordered. When the server arrived, the real fun began, and the work had to be done in stages.
Table of Contents:
- Racking the server
- Cabling the server
- HMC Connectivity
- HMC Service Partition
- VIOS image prepping
- Virtual I/O Server Installation
- Network Etherchannel
- Disk definitions
- Updating the firmware
- Configure a monitoring solution
- Burn-in testing
- Add system to CAA
Racking the server
The Power 10 server (9043-MRX) was on-site and had to be put into a rack with the proper power. Power 9 servers and up are a bit longer than standard 19″ racks, so often a new rack has to be ordered, the rear door left off, or an extension installed on the existing rack to accommodate them. In this instance, a new rack was ordered. Once the rack arrived and a location for it was decided, the server was racked on its rails. Next, the power was connected, and it has to be connected properly for power distribution and fail-over; if connected improperly, it can cause unexpected outages. In this case, the 4 PDU connections are labelled C0 – C3. The first two (C0 and C1) should be connected to one side of the rack, while the other two are connected to the other side. The next phase required determining how far the rack was from the relevant switches and how many adapters were required, then ordering cables of the appropriate length. Some folks build their own cables for this endeavor, while others buy pre-made ones. There are pros and cons to either, which aren’t important for this build. However, if you’re curious, pre-made cables were used here.
Cabling the Server
The appropriate network and fibre cables were purchased and connected to the switches. I can’t stress enough how important it is to label the cables; it will save a lot of headaches down the road. With the cables connected on the switch side (and the ports activated), the cables were connected to the back of the Power 10 server. The configuration used was:
- 2 x RJ45 1G connections for dual HMC connectivity
- 2 x RJ45 1G network links for the VIOS IP management network, configured in a network bond (per VIOS)
- 2 x 10G network links for the data network (per VIOS)
- 1 dual-port fibre adapter (per VIOS)
The above configuration requires 6 RJ45 1G network cables and 4 10G network cables, along with 4 16/32Gb fibre cables.
Hardware Management Console Connectivity
At the back of the Power 10 server, there are PCI adapters in locations C0 – C11. C0 typically has 2 x USB ports and 2 x network ports (for console/HMC), labelled T2 and T3. The T2 network port is a dual-purpose port on the Power 10 server, allowing it to be used for console access and/or HMC connectivity. The T3 network port is for HMC connectivity only. If you have dual HMCs, you’ll be connecting both. The T2/T3 network connections should be configured on the same VLAN as the HMCs’ private network segment(s).
Once connected, and with the network ports configured for the VLAN segments, one would scan for the managed system. In the case of the Power 10 server, the managed system was found; however, it wouldn’t accept the password credentials. Part of the problem is that the Power 10 server requires the vHMC / HMC to be at least at version V10R1M1020. It was below this, so the HMC(s) had to be flashed to a newer version (at least 1020). Once this was done, the HMC still wouldn’t accept the password, so one had to launch the Advanced System Management Interface (ASMI) available in the HMC and connect using the default ASM credentials of admin/admin. Once connected, you’re forced to update the ASM password for the managed system. After this, the system could be ‘partially’ managed by the HMC, but it complained that it wasn’t connected. Referring to the Redbooks revealed the issue: the newer Power servers use another network interface called the VMI configuration. By default, it is set to static. If using DHCP, it will share HMC connectivity and VMI; if using static, you’ll require separate network connections. In this instance, the decision was made to use DHCP. Once this was done, the managed system was fully controllable by the HMC.
On the rare occasion that one removes a managed system and attempts to re-add it, issues may arise. Recently this was the case: the eBMC connections were discovered in the HMC CLI, but the HMC GUI didn’t see the connections. If one attempted to add the system by IP address, it would fail, indicating it was already present. The trick is to use the HMC CLI to determine which IPs already exist, using the command lssysconn -r all. One should first try a rediscover, then a reset if that doesn’t work. If neither works, one should remove the system from the HMC, especially if you are seeing errors of “connecting” (and it never connects), handshake failures, or an unusable IP address.
rmsysconn -o rediscover --ip IPADDRESS
rmsysconn -o reset --ip IPADDRESS
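If neither command clears it, the stale connection can be removed entirely and the system re-added by IP. A minimal sketch (the IP address and user ID are placeholders, and the exact mksysconn options may vary by HMC release):
lssysconn -r all # list the known connections and their states
rmsysconn -o remove --ip IPADDRESS # drop the stale connection entirely
mksysconn --ip IPADDRESS -r sys -u USERID # re-add the managed system; prompts for the password if --passwd is omitted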
Once the managed system for a Power 10 (or later) server is added, it may complain about having no VMI connection. This may require a reboot of the managed system.
In the rare case that a managed system is added and the status goes into a “recovery” state, further work is involved. First, log in to the ASMI interface and set the power-on option to User-initiated. Reset the managed system. Once the managed system is back up, it will still state that it is in recovery mode. However, since it was set not to automatically start the service partition, you can recover it by selecting the Power server in question; under the backup/restore menu, you will have a recover partition entry. In the case of a new Power server, you won’t have a recent backup of the HMC data, so you should ‘re-initialize’ it instead. This should leave the Power server in a stand-by state.
Hardware Management Console Service Partition
When the new Power 10 server was hooked up to the HMC, there was a pre-existing service partition. This partition is supposedly usable to install firmware updates and the like; however, it proved more trouble than it was worth. It doesn’t show as an OS, and it doesn’t allow for remote console sessions from the HMC after booting via the SMS menus. The service partition also has all of the resources allocated to it, such as the CPU and RAM. In this organization, the plan was to use 2 x VIOS servers, but it’s very difficult to add VIOS partitions if all of the resources are already consumed by the service partition. By default, stopping the service partition will also power off the managed Power 10 server. So the managed server was selected and configured by going to the system properties and changing the power-on parameter to ‘User-Initiated‘. After that, the service partition LPAR can be shut down (while the managed server stays up), and the service partition can then be deleted. Afterwards, the VIOS partitions can be created manually under the system’s Partitions / Virtual I/O / Create VIOS menus, or one can use the templates menu(s) to create them.
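For reference, roughly the same shutdown-and-delete sequence can be done from the HMC CLI instead of the GUI. A hedged sketch, where MANAGEDSYS and SERVICEPART are placeholder names:
lssyscfg -r lpar -m MANAGEDSYS -F name,state # find the service partition and its state
chsysstate -r lpar -m MANAGEDSYS -n SERVICEPART -o shutdown --immed # stop the service partition (after the power-on change above)
rmsyscfg -r lpar -m MANAGEDSYS -n SERVICEPART # delete the partition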
Virtual I/O Server Image Preparation
The Power 10 server doesn’t have an optical DVD drive (or it wasn’t ordered with one), so the VIO Server has to be installed over the network or via a USB flash drive. First, however, the Virtual I/O Server installation image has to be available. It can be downloaded from IBM’s Entitled Systems Support (ESS) site, typically reachable after logging into IBM’s website, under the support links. Once downloaded, the Virtual I/O Server image can be extracted and ‘prepped’ on a USB drive, or the image can be uploaded to the HMC (images section).
The Virtual I/O Server can be installed via the HMC; however, this usually requires having the other networking adapters pre-configured with the proper VLANs etc. That wasn’t the case here, so the installation method was going to be via a USB flash drive. As a Linux system was handy, the image was put on the flash drive with something like: sudo dd if=/ISOs/VIOS_3.1.0.0...iso of=/dev/sdc bs=4M status=progress
If you had Windows or some other system, you could use a CD/DVD burning application (like Nero, for example) to ‘write an image’ to a USB drive, or a utility like Rufus. Whichever method you use will ultimately end up with the Virtual I/O Server’s installation files placed on the USB flash drive.
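Since dd will happily overwrite whichever device it is pointed at, it’s worth confirming the flash drive’s device node first. A small sketch on Linux, assuming /dev/sdc as above:
lsblk -o NAME,SIZE,MODEL,TRAN # the flash drive should appear with TRAN=usb
sudo dd if=/ISOs/VIOS_3.1.0.0...iso of=/dev/sdc bs=4M status=progress && sync # sync flushes buffers before removal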
Virtual I/O Server Installation
As the managed Power 10 server wasn’t already configured for networking, installing VIOS via the HMC or via a NIM server is out, so the installation will be done via a USB flash drive (with the Virtual I/O installation media already prepared in the previous step). Thus, the first VIOS LPAR/partition is created and assigned physical adapters. One of the things to add here is the Universal Serial Bus device. NOTE: the front USB ports will be useless here; you’ll have to use the ones in the back of the Power 10 server.
Insert the USB flash drive into one of the USB ports in location C0 at the back of the Power 10 server. Activate the Virtual I/O partition and open a remote console. When the system starts to boot, it will display the IBM logo, with an option to enter the SMS menus (usually option 1). If you have an existing hard drive available on the Power 10 server, you can change the boot order, boot off the USB device, and perform a Virtual I/O installation. If, however, you have no hard drives installed on the Power 10 and you plan on doing a SAN boot instead, then you’ll want to head into the SMS menus; under the boot menu you will be able to temporarily do SAN zoning. This allows the HBA port to attempt to log in to its switch port, so the WWN is captured by the flogi database. Repeat for the other fibre adapter ports as necessary. Have the SAN administrator build you a SAN LUN (for the VIOS OS SAN boot). Once completed, the SMS menu / IO menu should allow one to see the added device.
NOTE: If you will be doing a VIOS installation to a SAN boot EMC device, it will require the AIX EMC_ODM_Definitions. To do this properly, have the SAN administrator provide you with ONLY one path to the SAN LUN. Other paths can be added AFTER the EMC_ODM_Definitions are installed.
If installing to a SAN boot device, and the SAN boot LUN has been zoned and presented (with only one path to the LUN), you’ll boot the Virtual I/O installation media via the boot / USB drive option. When installing the Virtual I/O Server, you may wish to change the VIOS edition, as it defaults to standard.
Virtual I/O Etherchannel Creation
Once the Virtual I/O Server has been built, the next step is to configure the two network ports into an etherchannel. The network adapters should have already been added to the Virtual I/O Server’s profile PRIOR to activation. Assuming this has been done, one can view the available ethernet adapters with a command like: $ lsdev |grep ent
ent0 U78A0.001.DNWHZS4-P1-C3-T1 2-Port 10/100/1000 Base-TX PCI-Express Adapter
ent1 U78A0.001.DNWHZS4-P1-C3-T2 2-Port 10/100/1000 Base-TX PCI-Express Adapter
ent2 U78A0.001.DNWHZS4-P1-C7-T1 2-Port 10/100/1000 Base-TX PCI-Express Adapter
ent3 U78A0.001.DNWHZS4-P1-C7-T2 2-Port 10/100/1000 Base-TX PCI-Express Adapter
Using the example above, we’ve determined that the two ethernet adapters to use for the etherchannel are ent0 and ent2, based on their locations and ports (one port from each physical adapter, for redundancy).
$ mkvdev -lnagg ent0,ent2 -attr mode=8023ad hash_mode=src_dst_port
If other “attributes” are required, they can be added above, or they can be added after the device has been created.
In this use case, we are going to assign an IP address to the newly created etherchannel. We will assume the newly created device is called “ent4”. To configure an IP on this interface within the restricted environment, we will run:
mktcpip -hostname vios1 -interface en4 -inetaddr 10.0.0.55 -netmask 255.255.255.0 -gateway 10.0.0.254
Another method to accomplish creating the etherchannel and then configuring the IP address is to break out of the restricted shell, and then run the smitty fast_path commands:
oem_setup_env
smitty etherchannel # then select the devices and configure the attributes as required
smitty tcpip # select the en4 device, and enter the required information
Additional etherchannels / Shared Ethernet Adapters can be created as required, as the data network mentioned earlier is usually configured as a Shared Ethernet Adapter (with all of the appropriate VLANs tied to it); a sketch follows.
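A hedged sketch of creating such a Shared Ethernet Adapter, assuming ent5 is the etherchannel backing the data network and ent6 is the virtual trunk adapter from the partition profile (both names are assumptions):
mkvdev -sea ent5 -vadapter ent6 -default ent6 -defaultid 1 # bridge the trunked VLANs onto the physical network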
Disk definition installation
If you are using EMC disks, for example, you are required to install the AIX EMC ODM definitions so the AIX/VIOS system can communicate properly with the underlying EMC devices. As the VIOS has been installed and an IP address configured (in the steps above), one can now download the EMC AIX ODM definitions from Dell’s web site and upload them to the VIO server. The correct EMC disk driver definitions are then installed, and the VIOS should then be rebooted. Once this has been completed, the SAN administrator is free to add the “additional paths” to the boot LUN.
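A minimal sketch of the install-and-reboot step from the root shell, assuming the definitions were uploaded to /tmp/emc_odm (the directory and the choice to install all filesets are assumptions; select the filesets matching your array):
oem_setup_env # leave the restricted padmin shell
installp -acgXd /tmp/emc_odm all # apply the EMC ODM filesets
shutdown -Fr # reboot the VIOS so the definitions take effect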
Note: It is possible this operation will fail, indicating a problem writing to the boot device. This is typically because one of the important device links is missing: alongside the /dev/hdiskXXX device and the /dev/hd5 device, you should have the ipldevice and ipl_blv devices, which link to the raw device of the bootable hdiskXXX and to the raw hd5 boot LV. They can typically be recreated with:
ln /dev/rhdiskXXX /dev/ipldevice
ln /dev/rhd5 /dev/ipl_blv
This should alleviate the issue, and allow one to perform bosinst operations.
Firmware Update
To determine which adapter firmware is required, one can use Inventory Scout, which examines the system firmware as well as the adapter firmware levels. Then, using IBM’s Fix Central website, the appropriate firmware can be downloaded and installed. A previous write-up about firmware updates is available here.
Update: System and adapter firmware updates can be performed from the HMC. The HMC can connect to a resource, or use IBM’s website, to determine, download, and install the latest fixes. However, when it comes to fibre adapters, you can select multiple ports and adapters, and it will typically indicate they were all updated. That is NOT true: it can take multiple passes to update all of the firmware. To verify the adapters’ firmware levels, connect to the system via ssh / remote console and run lsmcode -A
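A quick, hedged way to verify the levels after each pass (adapter names vary per system):
lsmcode -A | grep fcs # microcode levels of the fibre adapters
lsmcode -d fcs0 # details for a single adapter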
Add VIOS to monitoring solution
Now that the VIOS(es) have been set up, you’ll want to add them into your monitoring solution for trending, server health, preemptive alerts, etc. Note that it isn’t always possible to add a VIO system into the current CAA cluster if its VIOS version is too old or too new; it depends on what is in use in the CAA. For example, if the CAA has a bunch of VIOS nodes at the 2.2.1.5 level, you can’t add a VIOS 3.1.1.10 node into the environment. Your best option is to migrate the existing VIO CAA nodes to VIOS 3.x, then add the newer VIOS into the mix. When using CAA, you can do rolling updates, which basically require one to stop cluster services on the node to be updated, perform the update and reboot, then restart cluster services on that node. Typically, the DBN (database node) is done last. The DBN can be determined by running this command on one of the VIO node members: cluster -status |grep -p DBN
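A hedged sketch of one rolling-update pass using the VIOS clstartstop command (the cluster and node names are placeholders):
clstartstop -stop -n MYCLUSTER -m vios1 # stop cluster services on the node being updated
# ...apply the VIOS update and reboot the node...
clstartstop -start -n MYCLUSTER -m vios1 # rejoin the node to the cluster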
Burn-in testing
Typically, before you release a system into production, you will want to perform some burn-in testing. This exercises the hardware, tests its performance, and finds its peaks. By setting up the monitoring solution first, you can get some nice graphs from nmon, Grafana, etc. Sample burn-in workloads could be “coin mining”, or perhaps Nigel’s nstress tools package, xdisk, etc.; a minimal sketch follows.
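As a minimal, tool-free sketch of a crude CPU soak (assuming only a POSIX shell on a test LPAR; the worker count and data volume are arbitrary):
# start four CPU-bound workers, each checksumming ~100 GB of zeros, then wait for them all
for i in 1 2 3 4; do
  dd if=/dev/zero bs=1M count=102400 2>/dev/null | cksum &
done
wait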
Add VIOS to Cluster Aware AIX
Now that you have completed setting up the hardware, configured it as a managed system via the HMC, and installed the VIOS, you’re ready to have it added into the CAA environment. As this is a pre-existing environment, it has a bunch of pre-existing VIOS node members and a slew of shared SAN LUNs. Some may be on different-speed disks and configured for different disk tiers as well. You typically also have one 10G+ LUN as the repository. It is also good practice to have a second, unused repository disk already carved out and presented to the VIO members; this way, if the repository disk gets corrupted, a new one can be built on the fly without having to wait on the SAN team to carve out a LUN for you.
Have the SAN team carve out the shared LUNs and present them to you. If you need the WWNs of your fibre adapters, they can be garnered with something like: lsdev -dev fcs0 -vpd
within the restricted environment. Once they are presented, you will scan for them with: cfgdev
within the restricted environment. You may need to rename some of the discovered LUNs (e.g. rendev -l hdisk32 -n prddisk001), as well as set disk attributes for MPIO, as sketched below. Once that is completed, you can add the node into the CAA cluster with the standard commands.
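A hedged example of the MPIO attribute change from the restricted shell (the disk name and attribute values are assumptions; follow your storage vendor’s recommendations):
chdev -dev hdisk32 -attr reserve_policy=no_reserve algorithm=shortest_queue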
ex:
$ cluster -addnode -clustername NAME -hostname MYHOSTNAME.EXAMPLE.com
Once the node has been added, you can look at the status of the cluster with the command cluster -status
If the cluster consists of various IOS level versions, you can use a verbose view to see whether the nodes are on-level or up-level with the command: cluster -status -verbose