DLPAR error: *humpf* … what do I do?

You may sometimes encounter this type of error when you attempt a DLPAR operation:

Sometimes this problem is caused by a cloned system, which inherits stale information from the original system; that information needs to be wiped clean.


Is your connection to the HMC established or broken?

First, check on your LPAR that the RSCT services are running (the key point is that IBM.DRM must be in the active state):

root@lpar # lssrc -a | grep rsct
 ctrmc rsct 13172908 active
 IBM.DRM rsct_rm 10420380 active
 IBM.CSMAgentRM rsct_rm 12583156 active
 IBM.ServiceRM rsct_rm 13369366 active
 IBM.MgmtDomainRM rsct_rm 5111906 active
 ctcas rsct inoperative
 IBM.AuditRM rsct_rm inoperative
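
If you want to script this check, here is a minimal sketch. The helper name subsystem_active is hypothetical, and it assumes the status is always the last column of the lssrc output, as in the listing above. It reads the listing on stdin, so it can be tried on saved output too.

```shell
# Hypothetical helper: reads lssrc output on stdin and succeeds only if
# the named subsystem (default: IBM.DRM) is reported as active.
# Assumption: the status is the last whitespace-separated column.
subsystem_active() {
    awk -v name="${1:-IBM.DRM}" '
        $1 == name && $NF == "active" { found = 1 }
        END { exit !found }
    '
}

# Usage on the LPAR (not executed here):
#   lssrc -s IBM.DRM | subsystem_active IBM.DRM && echo "IBM.DRM is active"
```

An inoperative subsystem has no PID column, which is why the sketch looks at the last field rather than at a fixed column number.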

Using the lsrsrc command, you can check the hostname of the HMC your LPAR is connected to:

root@lpar # lsrsrc "IBM.ManagementServer"
Resource Persistent Attributes for IBM.ManagementServer
resource 1:
 ResourceHandle = "0x6015 0xffff 0x687c66ac 0xa150cb54 0x12ac5ab4 0x3ececd4c"
 Name = ""
 Variety = 1
 Hostname = ""
 ManagerType = "HMC"
 NodeIDs = {7529005568693461460}
 LocalHostname = ""
 MgmtSvrNodeID = 426489130722495177
 ConnectivityNames = {""}
 MgmtSvrPrimaryNames = {"HMC1"}
 ClusterTM = "9078-160"
 ClusterSNum = ""
 HAState = 1
 HMCName = "7042CR6*999BC"
 HMCIPAddr = ""
 HMCAddIPs = ""
 MgmtSvrHostNamesList = {}
 LocalHostNamesList = {}
 HMCAddIPv6s = ""
 ActivePeerDomain = ""
 NodeNameList = {"LPAR1"}
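
To pull a single attribute out of that listing, something like the following sketch may help. The helper name rsrc_attr is hypothetical, and it assumes every attribute is printed as "Name = value" on one line, as in the output above. It matches the attribute name exactly, so Hostname does not also catch LocalHostname.

```shell
# Hypothetical helper: reads lsrsrc output on stdin and prints the value
# of one attribute. Assumption: attributes appear as "Name = value".
rsrc_attr() {
    awk -v key="$1" '
        $1 == key && $2 == "=" {
            sub(/^[^=]*=[ \t]*/, "")   # drop everything up to and including the "="
            print
        }
    '
}

# Usage on the LPAR (not executed here):
#   lsrsrc "IBM.ManagementServer" | rsrc_attr MgmtSvrPrimaryNames
```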

If you get a similar result, then you don’t have a problem with your HMC connectivity, and you should go to the RSCT-layer checks below.
If the previous command doesn’t return anything, you need to check your network settings first:

Are your firewall ports open?

To verify this, and to spot any other partitions that may also have issues with DLPAR operations, you can use the lspartition command on the HMC.

lspartition command

Note:

  • An Active value >= 1 means that the partition is talking to the HMC
  • An Active value = 0 means that the partition is not communicating with the HMC
  • The DCaps value has to be higher than <0x0>; otherwise we have an RMC problem.
  • If your LPAR doesn’t even show up in your lspartition -dlpar output, it might be a network/firewall-related problem (check the procedure detailed below)

Here’s an example:

hscroot@HMC:~> lspartition -dlpar | grep -B1 "Active:"
<#162> Partition:<60*9119-FHB*999BC, ,>
 Active:<1>, OS:<AIX, 6.1, 6100-01-09-1015>, DCaps:<0x2c5f>, CmdCaps:<0x1b, 0x1b>, PinnedMem:<1196>
<#163> Partition:<17*9119-FHB*999BC, ,>
 Active:<0>, OS:<, , >, DCaps:<0x0>, CmdCaps:<0x0, 0x0>, PinnedMem:<0>

We can see here that we have one faulty partition, the one with lpar_id=17. (Sadly, this command doesn’t give us the lpar_name directly; you’ll have to cross-check with an lssyscfg command, for example.)
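
The rules from the note above (Active is 0, or DCaps is 0x0) can be sketched as a small filter. The function name faulty_partitions is hypothetical, and it assumes the two-line layout shown in the example: a Partition header line followed by its Active line.

```shell
# Hypothetical helper: reads "lspartition -dlpar" output on stdin and
# prints one line per partition whose Active value is 0 or whose DCaps
# value is 0x0. Assumes each Partition line precedes its Active line.
faulty_partitions() {
    awk '
        /Partition:</ {
            # Keep only the "lpar_id*type-model*serial" part of the header.
            part = $0
            sub(/^.*Partition:</, "", part)
            sub(/,.*$/, "", part)
        }
        /Active:</ {
            if ($0 ~ /Active:<0>/ || $0 ~ /DCaps:<0x0>/) {
                split(part, f, "[*]")
                print "lpar_id=" f[1] " frame=" f[2] "*" f[3]
            }
        }
    '
}

# Usage on a workstation with SSH access to the HMC (not executed here):
#   ssh hscroot@HMC lspartition -dlpar | faulty_partitions
```

Printing the frame next to the lpar_id makes the later cross-check easier when the HMC manages several frames.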

Hint: you can determine the lpar_id by reading the output like this:

« <#163> Partition:<17*9119-FHB*999BC, ,> »   <====>   « <#ID> Partition:<lpar_id*Type-Model*Serial_number, ,> »

TIP: If you want to check only one partition, for example the LPAR with ID=17, you can try the command below. Careful: if your HMC manages multiple frames like mine, you may get multiple matches, especially for small IDs like 1, 2, 3…, so be sure to check the right frame’s serial number first. Personally, I always add a grep -w $serial_number in my scripts, just to avoid confusion between multiple frames:

hscroot@HMC:~> lspartition -dlpar | grep -A1 "Partition:<17\*"
<#163> Partition:<17*9119-FHB*999BC, ,>
       Active:<0>, OS:<, , >, DCaps:<0x0>, CmdCaps:<0x0, 0x0>, PinnedMem:<0>
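
The multi-frame caveat can also be handled by matching the lpar_id and the frame serial number together instead of chaining greps. This is only a sketch: the function name partition_status is hypothetical, and it assumes the same two-line layout as the output above.

```shell
# Hypothetical helper: reads "lspartition -dlpar" output on stdin and
# prints the two lines of the partition matching both arguments.
#   $1 = lpar_id, $2 = frame serial number
partition_status() {
    awk -v id="$1" -v serial="$2" '
        /Partition:</ {
            part = $0
            sub(/^.*Partition:</, "", part)
            sub(/,.*$/, "", part)
            split(part, f, "[*]")
            # Appending "" forces a string comparison, not a numeric one.
            show = ((f[1] "") == (id "") && (f[3] "") == (serial ""))
            if (show) print
            next
        }
        show { print; show = 0 }
    '
}

# Usage (not executed here):
#   ssh hscroot@HMC lspartition -dlpar | partition_status 17 999BC
```

Because the ID is compared as a whole field, looking for 17 does not also return partitions 117 or 217.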

–> If this command does not return anything, then the system might not be set up for RMC, so you need to check several things:

  • First check the network connectivity on your LPAR and on the VIO Server which serves it. It goes without saying 🙂
  • Check the firewall rules with your network guys (you need incoming and outgoing udp/tcp packets allowed on port 657, between your LPAR’s VLAN and your HMC’s VLAN) –> Actually this was my problem a few days ago: packets were dropped by the firewall because they forgot to allow incoming packets on port 657 on my VLAN…
  • Check your HMC’s firewall configuration, and make sure the line « RMC » on port 657 (tcp/udp) is present in the « Allowed hosts » array.

Now we’ve got to fix the problem. What are the options?

Reboot the HMC

Just joking 😀. I’m not a big fan of the reboot option: it may fix the connectivity problem (the one with DCaps=<0x0>), but you’ll never know why, which is quite disturbing from my point of view. But well, if you’re in a rush, you might want to give it a try anyway.

hscroot@HMC # hmcshutdown -r -t now

Reset your RMC connection

Here you will need to reset and re-establish the RMC connection to your HMC by issuing a series of RSCT-related commands on your LPAR:

/usr/sbin/rsct/install/bin/uncfgct -n
/usr/sbin/rsct/bin/rmcctrl -z
/usr/sbin/rsct/bin/rmcctrl -A
/usr/sbin/rsct/bin/rmcctrl -p
echo "checking connection..."
lsrsrc -p 0 IBM.ManagementServer|grep Hostname|grep -v Local
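
If you run this sequence often, it can be wrapped in a small function that stops at the first failing step. This is only a sketch: the function name reset_rmc and the DRY_RUN switch are hypothetical, the paths are copied from the commands above, and it assumes each command returns a non-zero status on failure, which may not hold on every RSCT level.

```shell
# Hypothetical wrapper around the reset sequence above.
# Set DRY_RUN=1 to print the commands instead of running them.
# Assumption: each command exits non-zero when it fails.
reset_rmc() {
    for cmd in \
        "/usr/sbin/rsct/install/bin/uncfgct -n" \
        "/usr/sbin/rsct/bin/rmcctrl -z" \
        "/usr/sbin/rsct/bin/rmcctrl -A" \
        "/usr/sbin/rsct/bin/rmcctrl -p"
    do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would run: $cmd"
        else
            echo "running: $cmd"
            # $cmd is left unquoted on purpose so the flags are split off.
            $cmd || { echo "failed: $cmd" >&2; return 1; }
        fi
    done
}

# Usage as root on the LPAR (not executed here):
#   DRY_RUN=1 reset_rmc    # preview the steps
#   reset_rmc              # actually run them
```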

Extra problem: it still doesn’t work!!!

Sometimes it won’t work even after executing those commands; then maybe you’re sitting on a bigger problem, coming from the HMC itself.

Call 911

In that case, you will need to call IBM support to get a pesh password, valid for one day, and then, as root, reset the RMC layer on the HMC by issuing the same commands as I mentioned before:
/usr/sbin/rsct/bin/rmcctrl -p

Enjoy DLPAR 🙂


