I’m discovering GPFS and I will certainly keep posting tips about it here, so don’t be too harsh with me: these will be some n00b posts, but they help me and may help some of you too! 🙂

High CPU usage… Suspect: mmfsd64

As you can see in the topas output below, mmfsd64 is consuming a lot of CPU for no obvious reason:

Name            PID  CPU%  PgSp Owner
mmfsd64     2551992  68.9  59.4 root
dsmc        2220040   0.9  31.7 root
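By the way, if you prefer a one-shot view over topas, sorting ps output on the %CPU column does the trick (a quick sketch; the column number assumes the usual aux output format):

# ps aux | sort -rn -k3 | head -5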

This server is otherwise in good shape and shows no sign of abnormal activity, except for this mmfsd64 process, so let’s dig in.

Looking for a clue

With lpar2rrd I found that the high CPU consumption started two days ago, at around 4 PM. Searching root’s history, I found out that some GPFS commands were executed at exactly that time:


984     2014/04/14 16:35:20 :: mmadddisk /dev/lv_gpfs "gpfs_disk2:::dataAndMetadata:77::system;gpfs_disk3:::dataAndMetadata:52::system"
988     2014/04/14 16:35:59 :: mmlsdisk /dev/lv_gpfs
989     2014/04/14 16:36:15 :: mmdf /dev/lv_gpfs
992     2014/04/14 16:36:46 :: mmrestripefs /dev/lv_gpfs -b
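(The timestamps come from ksh’s extended history. If root’s history on your box has no dates, recent AIX levels let you enable it, something like this in root’s .profile; treat it as a hedged pointer:)

# export EXTENDED_HISTORY=ON
# history | tail -5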

What happened here is that somebody added two disks to a GPFS file system, then restriped it to spread the data across all the disks. Normally that would be fine, but in this case there is a serious problem, and the answer lies in the mmdf output (look at the two disks marked with a *):


# mmdf /dev/lv_gpfs
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 780 GB)
gpfs_disk0           69772800       52 yes      yes       20269056 ( 29%)        478608 ( 1%)
gpfs_disk1           69772800       52 yes      yes       20636160 ( 30%)        252096 ( 0%)
gpfs_disk2           34886400       52 yes      yes       34883072 (100%)          1200 ( 0%) *
gpfs_disk3           34886400       77 yes      yes       34881536 (100%)          2608 ( 0%) *
                -------------                         -------------------- -------------------
(pool total)        419686080                             122793984 ( 29%)       2234688 ( 1%)
                =============                         ==================== ===================
(total)             419686080                             122793984 ( 29%)       2234688 ( 1%)

Inode Information
-----------------
Number of used inodes:           32117
Number of free inodes:          468619
Number of allocated inodes:     500736
Maximum number of inodes:      8491008

See? GPFS has trouble balancing disks of different sizes within the same storage pool. Here, somebody added two disks that are half the size of the existing ones: 34 GB instead of 69 GB.

That’s why the GPFS daemon is working so fiercely: it’s trying to stripe the data equally across all the disks, but with mismatched sizes it will NEVER be evenly spread.
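If you want to double-check that the daemon really is busy restriping and not something else, you can have a look at its active waiters (mmdiag exists since GPFS 3.3; the exact output varies, so take this as a hedged pointer):

# mmdiag --waiters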

Off with their heads!!

So what we have to do is suspend the smaller disks:

# mmchdisk /dev/lv_gpfs suspend -d "gpfs_disk2;gpfs_disk3"
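A suspended disk stays readable, but GPFS stops allocating new blocks on it. To make sure the change took, check the status column (the grep pattern is just my assumption of what to look for):

# mmlsdisk /dev/lv_gpfs | grep suspended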

Then restripe the data over the remaining, same-sized disks:

# mmrestripefs /dev/lv_gpfs -b
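Restriping is exactly the heavy work we caught mmfsd64 doing, so on a busy cluster you may want to limit its impact; mmrestripefs takes a -N flag to restrict which nodes participate (a sketch only, node2 being one of our NSD servers):

# mmrestripefs /dev/lv_gpfs -b -N node2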

After a while you can see that the suspended disks are being emptied while the others fill up progressively:

# mmdf /dev/lv_gpfs
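There is no watch command on AIX, but a little ksh loop gives the same effect (adjust the sleep to taste):

# while true; do mmdf /dev/lv_gpfs | grep gpfs_disk; sleep 60; done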

Once the restripe is done, remove the small disks. mmdeldisk itself drains whatever data is left on them before pulling them out, so while it runs, mmlsdisk (from another session) shows them as being emptied:

# mmdeldisk /dev/lv_gpfs "gpfs_disk2;gpfs_disk3"

# mmlsdisk /dev/lv_gpfs |grep emptied
gpfs_disk2 nsd         512      52 yes      yes   being emptied up           system
gpfs_disk3 nsd         512      77 yes      yes   being emptied up           system

# mmlsnsd

File system   Disk name    NSD servers
---------------------------------------------------------------------------
/dev/lv_gpfs   gpfs_disk0 node1,node2
/dev/lv_gpfs   gpfs_disk1 node1,node2
(free disk)   gpfs_disk2 node1,node2
(free disk)   gpfs_disk3 node1,node2

# mmdelnsd gpfs_disk2
# mmdelnsd gpfs_disk3
# mmlsnsd

File system   Disk name    NSD servers
---------------------------------------------------------------------------
/dev/lv_gpfs   gpfs_disk0 node1,node2
/dev/lv_gpfs   gpfs_disk1 node1,node2
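At this point a last mmdf (same command as before) should show only the two equal-sized disks, filling up at matching percentages:

# mmdf /dev/lv_gpfs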


You can see now that mmfsd64 is back to normal:


Name            PID  CPU%  PgSp Owner
kdb_64      2920512  10.1   2.7 root
mmfsd64     7630868   3.9  22.9 root

…then ask your SAN team for new disks, the same size as the others this time 🙂

Hope it helps.
