Update: PowerHA: clstat and cldump don't work? Check your SNMP configuration!

*** UPDATED February 2014: more tips on how to fix this if modifying the snmpd configuration file isn't enough ***

On my PowerHA nodes, I often get this kind of error:


# clstat -o
Failed retrieving cluster information.

There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.

# cldump

cldump: Waiting for the Cluster SMUX peer (clstrmgrES)
to stabilize

.............
Failed retrieving cluster information.

There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.

Refer to the HACMP Administration Guide for more information.

It is pretty annoying because I can't get a "live" view of the cluster (and I like the clstat output very much), so I just run the clfindres command instead. But there is actually a fix. I can't guarantee it will work in every case, but this method has worked everywhere I have tried it so far:

First, read the f… README :)

# more /usr/es/sbin/cluster/README6.1.0.UPDATE

=====================================================================
o APAR IZ41204: Enabling Internet MIB tree for clstat or cldump to work
=====================================================================

clstat or cldump will not start if the internet MIB tree is not
enabled in snmpdv3.conf file. This behavior is usually seen in
AIX 6.1 onwards where this internet MIB entry was intentionally
disabled as a security issue. This internet MIB entry is required to
view/resolve risc6000clsmuxpd (1.3.6.1.4.1.2.3.1.2.1.5) MIB sub tree
which is used by clstat or cldump functionality.

There are two ways to enable this MIB sub tree(risc6000clsmuxpd) they
are:
1) Enable the main internet MIB entry by adding this line in
/etc/snmpdv3.conf file
VACM_VIEW defaultView internet - included -
But doing so is not advisable as it unlocks the entire MIB tree

2) Enable only the MIB sub tree for risc6000clsmuxpd without enabling
the main MIB tree by adding this line in /etc/snmpdv3.conf file
VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -

Note: After enabling the MIB entry above snmp daemon must be restarted
with the following commands as shown below:
1) stopsrc -s snmpd
2) startsrc -s snmpd
After snmp is restarted leave the daemon running for about two minutes
before attempting to start clstat or cldump.

If this line is not present in snmpdv3.conf, you need to add it:

Edit SNMP V3 configuration

Add this line to /etc/snmpdv3.conf:

VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -

-> This is the MIB subtree for risc6000clsmuxpd; the full explanation is in the README extract above.
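
To avoid adding the line twice, the check-then-append step can be scripted. This is only a minimal sketch: the function name `ensure_vacm_view` and the `.bak` backup suffix are my own choices for illustration, not anything shipped with PowerHA.

```shell
#!/bin/sh
# Sketch: make sure the risc6000clsmuxpd view line exists in an SNMPv3 config file.
# ensure_vacm_view and the .bak suffix are hypothetical names chosen for this example.

VIEW_LINE='VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -'

ensure_vacm_view() {
    conf=$1
    # Look for an uncommented VACM_VIEW entry for the clsmuxpd subtree,
    # tolerating any amount of whitespace between the fields.
    if grep -Eq '^[[:space:]]*VACM_VIEW[[:space:]]+defaultView[[:space:]]+1\.3\.6\.1\.4\.1\.2\.3\.1\.2\.1\.5[[:space:]]' "$conf"; then
        echo "present"
    else
        cp "$conf" "$conf.bak" || return 1          # keep a copy before editing
        printf '%s\n' "$VIEW_LINE" >> "$conf" || return 1
        echo "added"
    fi
}

# Usage on a cluster node (as root):
#   ensure_vacm_view /etc/snmpdv3.conf
```

The function prints "added" or "present", so it is safe to run more than once. snmpd still has to be restarted afterwards for the change to take effect.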

Make sure the following lines are uncommented in snmpdv3.conf:

smux     1.3.6.1.4.1.2.3.1.2.1.2         gated_password # gated   

smux     1.3.6.1.4.1.2.3.1.2.1.5      clsmuxpd_password # PowerHA  SystemMirror clsmuxpd     
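
A quick way to spot smux entries that are still commented out is a grep on the configuration file. A small sketch, where `commented_smux_lines` is just a helper name I made up:

```shell
#!/bin/sh
# Sketch: list smux entries that are still commented out in an SNMP config file.
# commented_smux_lines is a hypothetical helper name.

commented_smux_lines() {
    # Print, with line numbers, every line whose first non-blank characters
    # are one or more '#' followed by the keyword "smux".
    grep -En '^[[:space:]]*#+[[:space:]]*smux[[:space:]]' "$1"
}

# Usage on a cluster node:
#   commented_smux_lines /etc/snmpdv3.conf
```

If the function prints nothing, no smux line is commented out.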

Restart the snmp and mibd services

In my experience (I might be wrong), you don't need to restart the aixmibd and hostmibd services, which is why the sequence below stops them without starting them again.

#  stopsrc -s clinfoES
#  stopsrc -s aixmibd
#  stopsrc -s snmpmibd
#  stopsrc -s hostmibd
#  stopsrc -s snmpd
#  startsrc -s snmpd
#  startsrc -s snmpmibd
#  startsrc -s clinfoES
#  refresh -s clstrmgrES
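
If you have to do this on several nodes, the same sequence can be wrapped in a function. This is a sketch only: the `DRY_RUN` variable and the `run` and `restart_snmp_stack` names are assumptions of mine; the stopsrc, startsrc and refresh calls are exactly the ones listed above, in the same order.

```shell
#!/bin/sh
# Sketch of the restart sequence above as one function.
# DRY_RUN, run and restart_snmp_stack are illustrative names, not PowerHA tools.

run() {
    # Print the command; execute it only when DRY_RUN is not set to 1.
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

restart_snmp_stack() {
    # Stop clinfoES and the MIB agents first, snmpd last.
    for s in clinfoES aixmibd snmpmibd hostmibd snmpd; do
        run stopsrc -s "$s"
    done
    # Start snmpd first, then the agents that are needed again.
    for s in snmpd snmpmibd clinfoES; do
        run startsrc -s "$s"
    done
    run refresh -s clstrmgrES
}

# Preview the commands without touching anything:
#   DRY_RUN=1 restart_snmp_stack
```

With DRY_RUN=1 the function only prints the nine commands, which is a cheap way to check the order before running it for real as root.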

Wait 2 minutes, then try the command again:


# clstat

clstat - HACMP Cluster Status Monitor

 -------------------------------------

Cluster: cluster_1 (1587708273)
Thu Oct 31 09:34:13 2013
 State: UP Nodes: 2
 SubState: STABLE

 Node: node1 State: UP
 Interface: node1_1 (0) Address: 10.245.37.46
 State: UP
 Resource Group: heartbeatrg State: On line

Node: node2 State: UP
 Interface: node2_2 (0) Address: 10.245.37.47
 State: UP
 Resource Group: heartbeatrg State: On line 

Now it works!

UPDATE: *** If it still doesn't work ***

Let's dive a little deeper into the abyss: the snmpd logs.

The log files of snmpd and the mibd agents are located in /usr/tmp; you may find more precise errors there:

# cd /usr/tmp/

# ls -ltr
total 968
-rw-r--r--    1 root     system         1244 Feb 14 14:09 snmpmibd.log
-rw-r--r--    1 root     system          238 Feb 14 14:14 aixmibd.log
-rw-r--r--    1 root     system        37506 Feb 14 14:14 snmpdv3.log
-rw-r--r--    1 root     system          371 Feb 14 14:14 hostmibd.log

There is also an interesting entry in clinfo.log.

After restarting all the snmp services, check /var/hacmp/log/clinfo.log for the following line:

Fri Feb 14 16:17:13 find_new_clusters: 1 clusters found.

…or, a few lines later:

Fri Feb 14 16:17:13 find_new_clusters: 0 clusters found.

 # tail /var/hacmp/log/clinfo.log

Mon Feb 17 10:24:28 init_cl_sites: there are 0 sites in cluster 1590729639

Mon Feb 17 10:24:28 Returning from: init_cl_sites

Mon Feb 17 10:24:28 get_cl_map: init_cl_sites: RC: 0

Mon Feb 17 10:24:28 Returning from: get_cl_map

Mon Feb 17 10:24:29 find_new_clusters: send_recv_SNMP_packet successful.

Mon Feb 17 10:24:29 find_new_clusters: response from host:  127.0.0.1

Mon Feb 17 10:24:29 find_new_clusters: response from host:  127.0.0.1

Mon Feb 17 10:24:29 find_new_clusters: response from host:  127.0.0.1

Mon Feb 17 10:24:29 find_new_clusters: good address but non-unique cluster, discarding.

Mon Feb 17 10:24:29 find_new_clusters: 1 clusters found.

-> It should display these lines if it worked.
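
Instead of reading the whole log, you can extract the most recent cluster count. A minimal sketch, where `last_cluster_count` is a name I invented. It relies only on the "find_new_clusters: N clusters found." wording shown in the extracts above, and my reading is that 0 means clinfo has not found the cluster yet.

```shell
#!/bin/sh
# Sketch: print N from the most recent "find_new_clusters: N clusters found." line.
# last_cluster_count is a hypothetical helper name.

last_cluster_count() {
    # sed keeps only the number from matching lines; tail keeps the last one.
    sed -n 's/.*find_new_clusters: \([0-9][0-9]*\) clusters found\..*/\1/p' "$1" | tail -n 1
}

# Usage on a cluster node:
#   last_cluster_count /var/hacmp/log/clinfo.log
```

An empty result means the log contains no such line at all.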

Check the smuxd daemon (port 199): it should be LISTENing.

# netstat -Aan |grep -w 199
f1000e00054d43b8 tcp        0      0  *.199                 *.*                   LISTEN
f1000e000549e3b8 tcp4       0      0  127.0.0.1.199         127.0.0.1.38484       ESTABLISHED
f1000e00055c83b8 tcp4       0      0  127.0.0.1.38484       127.0.0.1.199         ESTABLISHED
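
The same check can be turned into a yes/no test for use in a script. This is a sketch under the assumption that the netstat output has the AIX layout shown above (local address ending in ".199", state in the last column); `smux_listening` is a made-up name.

```shell
#!/bin/sh
# Sketch: succeed if netstat output (read from stdin) shows a socket
# listening on port 199. smux_listening is a hypothetical helper name.

smux_listening() {
    # The local address is the third field from the end, both with
    # `netstat -an` and with `netstat -Aan` (which adds a leading PCB column).
    awk 'NF >= 6 && $NF == "LISTEN" && $(NF-2) ~ /\.199$/ { found = 1 }
         END { exit !found }'
}

# Usage on a cluster node:
#   netstat -an | smux_listening && echo "port 199 is listening"
```

Anchoring the match on ".199" at the end of the local address avoids false positives on ports such as 1199, or on port 199 appearing only as a remote address.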

If smux is not listening, then maybe clinfoES is stuck in the "stopping" state? Kill it!

Sometimes, even after restarting all those services, clstat still won't work.

Check the clinfoES state:

#  lssrc -s clinfoES
Subsystem         Group            PID          Status
clinfoES         cluster          18415676     stopping

That's not OK: it should be either "inoperative" or "active", not "stopping".

Now kill it, then restart it (it should be quick):

 # kill 18415676

# startsrc -s clinfoES
0513-059 The clinfoES Subsystem has been started. Subsystem PID is 54591496.
# clstat

clstat - HACMP Cluster Status Monitor
-------------------------------------
[..] 
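
To detect the stuck "stopping" state from a script, the lssrc output above can be parsed with awk. A small sketch: `clinfo_status` and `clinfo_pid` are names I made up, and I am assuming that a subsystem without a running process shows no PID column, so its row has only three fields.

```shell
#!/bin/sh
# Sketch: extract the status and PID of clinfoES from `lssrc -s clinfoES` output
# read on stdin. clinfo_status and clinfo_pid are hypothetical helper names.

clinfo_status() {
    # The status is the last field of the clinfoES row.
    awk '$1 == "clinfoES" { print $NF }'
}

clinfo_pid() {
    # The PID column is only present when the row has four fields.
    awk '$1 == "clinfoES" && NF == 4 { print $3 }'
}

# Usage on a cluster node:
#   [ "$(lssrc -s clinfoES | clinfo_status)" = "stopping" ] && echo "clinfoES is stuck"
```

I would still run the kill by hand after looking at the PID rather than automate it.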

If it still doesn't work, recheck port 199 with netstat -an. If the port isn't in the LISTEN state, you might have to restart the services all over again. It should **finally** work after that.

Links

Many thanks to the awesome blog at AixHealthCheck.com for pointing this issue out

How to fix clstat and cldump related problems

HACMP and SNMP utilities

How the SMUX protocol works

SNMPv3 architecture

The primary parts of the SNMPv3 architecture


2 thoughts on "Update: PowerHA: clstat and cldump don't work? Check your SNMP configuration!"

  1. Starting with PowerHA 7.1.2, SMUX communication has changed: IPv6 is now used by default.

    APAR IV42375 covers changing the COMMUNITY entry in snmpdv3.conf, and the smux line must be changed as well.

    Working snmpdv3.conf lines:

    smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password ::1 128
    COMMUNITY public public noAuthNoPriv 0::0 0
