*** UPDATED February 2014: more tips on how to fix this if modifying the snmpd file isn't enough ***
On my PowerHA nodes, I often get this kind of error:
# clstat -o
Failed retrieving cluster information.
There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.
# cldump
cldump: Waiting for the Cluster SMUX peer (clstrmgrES) to stabilize .............
Failed retrieving cluster information.
There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.
Refer to the HACMP Administration Guide for more information.
It is pretty annoying because I can't get a "live" view of the cluster (and I like the output of clstat very much), so I just run the clfindres command instead. But there is actually a fix. I can't guarantee it will work in every case, but it has worked everywhere I have tried it so far:
First, read the f… README 🙂
# more /usr/es/sbin/cluster/README6.1.0.UPDATE
o APAR IZ41204: Enabling Internet MIB tree for clstat or cldump to work
clstat or cldump will not start if the internet MIB tree is not
enabled in snmpdv3.conf file. This behavior is usually seen in
AIX 6.1 onwards where this internet MIB entry was intentionally
disabled as a security issue. This internet MIB entry is required to
view/resolve risc6000clsmuxpd (1.3.6.1.4.1.2.3.1.2.1.5) MIB sub tree
which is used by clstat or cldump functionality.
There are two ways to enable this MIB sub tree(risc6000clsmuxpd) they are:
1) Enable the main internet MIB entry by adding this line in
/etc/snmpdv3.conf file
VACM_VIEW defaultView internet - included -
But doing so is not advisable as it unlocks the entire MIB tree
2) Enable only the MIB sub tree for risc6000clsmuxpd without enabling
the main MIB tree by adding this line in /etc/snmpdv3.conf file
VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -
Note: After enabling the MIB entry above snmp daemon must be restarted
with the following commands as shown below:
1) stopsrc -s snmpd
2) startsrc -s snmpd
After snmp is restarted leave the daemon running for about two minutes
before attempting to start clstat or cldump.
If this line is not present in snmpdv3.conf, then you need to add it :
Edit SNMP V3 configuration
Add this line to /etc/snmpdv3.conf:
VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -
--> this is the MIB sub tree for risc6000clsmuxpd; the full explanation is in the README extract above.
Make sure the following lines are uncommented in snmpdv3.conf:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password # gated
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password # PowerHA SystemMirror clsmuxpd
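To double-check the edit before restarting anything, here is a minimal sketch of a checker. The function name check_snmpdv3_conf is my own invention (it is not a PowerHA tool), and it assumes the risc6000clsmuxpd sub tree is 1.3.6.1.4.1.2.3.1.2.1.5 and that either the "internet" entry or that sub tree is enough, as the README says.

```shell
# Hypothetical helper (not part of AIX or PowerHA): checks that the config
# file has an active VACM_VIEW entry and an uncommented clsmuxpd smux line.
# Assumption: risc6000clsmuxpd is OID 1.3.6.1.4.1.2.3.1.2.1.5.
check_snmpdv3_conf() {
    conf="${1:-/etc/snmpdv3.conf}"
    oid='1\.3\.6\.1\.4\.1\.2\.3\.1\.2\.1\.5'
    rc=0

    # Either the whole internet tree or just the clsmuxpd sub tree will do.
    if grep -Eq "^[[:space:]]*VACM_VIEW[[:space:]]+defaultView[[:space:]]+(internet|${oid})[[:space:]]" "$conf"; then
        echo "VACM_VIEW: ok"
    else
        echo "VACM_VIEW: missing"
        rc=1
    fi

    # A line starting with '#' is still commented out and does not count.
    if grep -Eq "^[[:space:]]*smux[[:space:]]+${oid}[[:space:]]" "$conf"; then
        echo "smux clsmuxpd: ok"
    else
        echo "smux clsmuxpd: missing or commented out"
        rc=1
    fi

    return $rc
}
```

Calling check_snmpdv3_conf with no argument reads /etc/snmpdv3.conf; pass another path to test a copy first. It only reads the file and never modifies it.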
Restart snmp and mibd services
In my experience (I might be wrong), you don't need to restart the aixmibd and hostmibd services.
# stopsrc -s clinfoES
# stopsrc -s aixmibd
# stopsrc -s snmpmibd
# stopsrc -s hostmibd
# stopsrc -s snmpd
# startsrc -s snmpd
# startsrc -s snmpmibd
# startsrc -s clinfoES
# refresh -s clstrmgrES
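If you repeat this on several nodes, the sequence can be wrapped in a small function. This is only a sketch: the names run, restart_cluster_snmp and DRY_RUN are made up for illustration, and it defaults to printing the commands instead of running them. It mirrors the order used above exactly, including the fact that aixmibd and hostmibd are stopped but not started again.

```shell
# Hypothetical wrapper around the restart sequence from this post.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0
# on a real AIX node to actually run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restart_cluster_snmp() {
    # Stop clinfoES and the MIB agents first, snmpd last.
    for s in clinfoES aixmibd snmpmibd hostmibd snmpd; do
        run stopsrc -s "$s"
    done
    # Start snmpd first, then the agents that depend on it.
    for s in snmpd snmpmibd clinfoES; do
        run startsrc -s "$s"
    done
    # Ask the cluster manager to refresh, as in the manual sequence.
    run refresh -s clstrmgrES
}
```

Run restart_cluster_snmp once with the default DRY_RUN=1 to review the order, then again with DRY_RUN=0 once you are happy with it.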
Wait two minutes, then try the command again:
# clstat
clstat - HACMP Cluster Status Monitor
-------------------------------------
Cluster: cluster_1 (1587708273)
Thu Oct 31 09:34:13 2013
State: UP Nodes: 2
SubState: STABLE
Node: node1 State: UP
Interface: node1_1 (0) Address: 10.245.37.46
State: UP
Resource Group: heartbeatrg State: On line
Node: node2 State: UP
Interface: node2_2 (0) Address: 10.245.37.47
State: UP
Resource Group: heartbeatrg State: On line
Now it works !
UPDATE : ***If it still doesn’t work ***
Let’s dive a little deeper into the abyss… the snmpd logs.
The log files of snmpd and the mibd agents are located here; you may find more precise errors in them:
# cd /usr/tmp/
# ls -ltr
total 968
-rw-r--r-- 1 root system 1244 Feb 14 14:09 snmpmibd.log
-rw-r--r-- 1 root system 238 Feb 14 14:14 aixmibd.log
-rw-r--r-- 1 root system 37506 Feb 14 14:14 snmpdv3.log
-rw-r--r-- 1 root system 371 Feb 14 14:14 hostmibd.log
There is also an interesting entry in clinfo.log.
After restarting all the snmp services, check /var/hacmp/log/clinfo.log for a line like this:
Fri Feb 14 16:17:13 find_new_clusters: 1 clusters found.
...or, a few lines later:
Fri Feb 14 16:17:13 find_new_clusters: 0 clusters found.
# tail /var/hacmp/log/clinfo.log
Mon Feb 17 10:24:28 init_cl_sites: there are 0 sites in cluster 1590729639
Mon Feb 17 10:24:28 Returning from: init_cl_sites
Mon Feb 17 10:24:28 get_cl_map: init_cl_sites: RC: 0
Mon Feb 17 10:24:28 Returning from: get_cl_map
Mon Feb 17 10:24:29 find_new_clusters: send_recv_SNMP_packet successful.
Mon Feb 17 10:24:29 find_new_clusters: response from host: 127.0.0.1
Mon Feb 17 10:24:29 find_new_clusters: response from host: 127.0.0.1
Mon Feb 17 10:24:29 find_new_clusters: response from host: 127.0.0.1
Mon Feb 17 10:24:29 find_new_clusters: good address but non-unique cluster, discarding.
Mon Feb 17 10:24:29 find_new_clusters: 1 clusters found.
--> It should display lines like these if it worked.
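Rather than eyeballing the log, you can pull out the most recent cluster count. A rough sketch follows; last_cluster_count is a hypothetical name, and it assumes the log lines keep the exact "find_new_clusters: N clusters found." wording shown in the extract.

```shell
# Hypothetical helper: reads clinfo.log content on stdin and prints the
# count from the most recent "find_new_clusters: N clusters found." line.
# Prints nothing if no such line exists.
last_cluster_count() {
    awk '/find_new_clusters: [0-9]+ clusters found/ { n = $(NF-2) }
         END { if (n != "") print n }'
}
```

On a node you would feed it the log, for example: last_cluster_count < /var/hacmp/log/clinfo.log. Going by the sample from a working node, a non-zero count is what you are hoping to see.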
Check the smuxd daemon (port 199) : should be LISTENing…
# netstat -Aan |grep -w 199
f1000e00054d43b8 tcp 0 0 *.199 *.* LISTEN
f1000e000549e3b8 tcp4 0 0 127.0.0.1.199 127.0.0.1.38484 ESTABLISHED
f1000e00055c83b8 tcp4 0 0 127.0.0.1.38484 127.0.0.1.199 ESTABLISHED
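The same check can be scripted. This is a small sketch with a made-up function name, smux_listening; it assumes the AIX netstat layout shown above, where the state is the last column and the local address sits two columns before it.

```shell
# Hypothetical helper: reads netstat output on stdin and succeeds only if
# some socket is in LISTEN state with a local address ending in ".199".
smux_listening() {
    awk 'NF >= 3 && $NF == "LISTEN" && $(NF-2) ~ /\.199$/ { found = 1 }
         END { exit !found }'
}
```

Example use on a node: netstat -an | smux_listening && echo "smux port is listening". An ESTABLISHED connection on port 199 alone does not count, only a LISTEN socket.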
If smux is not listening, then maybe clinfoES is stuck in the "stopping" state? Kill it!
Sometimes, even after restarting all those services, clstat still won’t work…
Check the clinfoES state :
# lssrc -s clinfoES
Subsystem Group PID Status
clinfoES cluster 18415676 stopping
That's not OK: it should be either "inoperative" or "active", not "stopping".
Now, kill it, then restart it (it should be quick) :
# kill 18415676
# startsrc -s clinfoES
0513-059 The clinfoES Subsystem has been started. Subsystem PID is 54591496.
# clstat
clstat - HACMP Cluster Status Monitor
-------------------------------------
[..]
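To avoid copying the PID by hand, the lssrc output can be parsed. A minimal sketch, assuming the four-column lssrc layout shown above; clinfo_stuck_pid is a hypothetical name, not an existing command.

```shell
# Hypothetical helper: reads "lssrc -s clinfoES" output on stdin and prints
# the PID only when the subsystem is reported as "stopping".
# Prints nothing for "active" or "inoperative".
clinfo_stuck_pid() {
    awk '$1 == "clinfoES" && $NF == "stopping" && $(NF-1) ~ /^[0-9]+$/ { print $(NF-1) }'
}

# Possible use on a node (left commented out so nothing is killed by accident):
# pid=$(lssrc -s clinfoES | clinfo_stuck_pid)
# [ -n "$pid" ] && kill "$pid" && startsrc -s clinfoES
```

The function only reports the PID; the kill and the restart stay a deliberate manual step.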
If it still doesn't work, recheck port 199 with netstat -an. If the port isn't in the LISTEN state, you might have to restart the services all over again. It should **finally** work after that.
Many thanks to the awesome blog at AixHealthCheck.com for pointing this issue out.