I once had a problem that crashed a migrating LPAR because of a SCSI reserve on my rootvg (rootvg hdisks should ALWAYS be set with the no_reserve attribute for LPM, whether they are managed by MPIO, HDLM or PowerPath; check my post about this).
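A quick pre-flight check on the LPAR is cheap insurance before any migration. The sketch below is only an illustration: the function name `check_rootvg_reserve` is mine, and it assumes your rootvg hdisks expose the standard AIX `reserve_policy` attribute (MPIO disks do; depending on the multipath driver and its version, the attribute may be named differently, so adapt it to what `lsattr -El hdiskX` shows on your system).

```shell
#!/bin/sh
# Sketch: report every rootvg disk whose reserve_policy is not no_reserve.
# Assumption: the disks expose the "reserve_policy" attribute via lsattr.
check_rootvg_reserve() {
    rc=0
    # lspv prints: <hdisk> <pvid> <vg name> <state>; keep the rootvg disks only
    for d in $(lspv | awk '$3 == "rootvg" { print $1 }'); do
        policy=$(lsattr -El "$d" -a reserve_policy -F value 2>/dev/null)
        if [ "$policy" != "no_reserve" ]; then
            echo "$d: reserve_policy=${policy:-unknown}"
            rc=1
        fi
    done
    # Non-zero return code means at least one disk needs attention
    return $rc
}
```

Run `check_rootvg_reserve || echo "fix the reserve before migrating"` as root on the LPAR. A disk that is reported can usually be fixed with `chdev -l hdiskX -a reserve_policy=no_reserve -P`, keeping in mind that `-P` only updates the ODM and the change is applied at the next reboot.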
LPM validation loophole
The trick is that LPM validation doesn’t detect the HDLM reserve… and says that you’re good to go with the migration.
Everything works fine until the end of the migration. If the reference code on both sides stays stuck at "2005" for several minutes, you might as well force a shutdown of the LPAR, because you just lost your rootvg disk…
In that case, you have several options to unblock the situation:
- You can migrate back using inactive partition migration (LPAR stopped), clean up the mess by restarting on the original frame, then try migrating again… boring!!!
- Ask the SAN guys to zone your LUNs to a server with HDLM, which will be able to remove the reserve on the disks, then ask them to re-zone the LUNs to your actual server. Not so exciting, huh?
- Or you can have fun by doing it yourself, without asking anybody for anything: we'll create a new virtual FC adapter with forced WWNs (those of my crashed LPAR), in order to see and clear the LUNs from my working HDLM server:
So, we need to create a virtual FC adapter on LPAR #25 (my "working" LPAR), in slot 3, paired with the virtual FC adapter in slot 4 hosted by LPAR #15, using the WWNs c0507604f3d10008 (npiv) and c0507604f3d10009 (lpm). These WWNs belong to my crashed LPAR:
- Add a client FC adapter on the other LPAR (the running one), specifying the WWNs we want to use:
hscroot@HMC# chhwres -r virtualio -m MyP795 -o a --id 25 \
    --rsubtype fc -s 3 \
    -a 'adapter_type=client,remote_lpar_id=15,remote_slot_num=4,wwpns="c0507604f3d10008,c0507604f3d10009"'
- Let's add, via DLPAR, the matching FC adapter on the virtual I/O server
- Let's discover the newly created adapter and run vfcmap on the VIOS, to "link it" to a physical port:
root@vio# cfgdev
root@vio# lsmap -all -npiv
root@vio# vfcmap -vadapter vfchost5 -fcp fcs0
- On the LPAR, let's check that we see the FC adapter and its attached disks, then clear the reserve on the disks with the HDLM command dlmpr:
root@lpar# cfgmgr
(fcs3 discovered)
root@lpar# lspv
root@lpar# dlmpr -k
Self Reservation Key : [0x00000005f699234c]
hdisk0 Reservation Key : [0x0000000000000000]
hdisk1 Reservation Key : [0x0000000000000000]
hdisk2 Reservation Key : [0x00000005f699144c]* Regist Key : [0x00000005f699144c] , Key Count : 4
hdisk3 Reservation Key : [0x0000000000000000]
KAPL10665-I The dlmpr utility completed.
root@lpar# /usr/DynamicLinkManager/bin/dlmpr -c hdisk2
self Reservation Key : [0x00000005f699234c]
hdisk2 Reservation Key : [0x00000005f699144c]* Regist Key : [0x00000005f699144c] , Key Count : 4
KAPL10665-I The dlmpr utility completed.
root@lpar# rmdev -Rdl fcs3
- OK, now we're clean on the LPAR.
- On the VIOS, delete the vfchost, then remove the adapter via DLPAR
root@vio# rmdev -Rdl vfchost5
Tadaaa! Now you can successfully restart your crashed LPAR, and it will boot just fine 🙂
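If the crashed LPAR owns a lot of LUNs, reading the `dlmpr -k` listing by eye gets tedious. Here is a small filter that keeps only the disks still holding a non-zero reservation key. It is a sketch, not an HDLM feature: the function name `reserved_disks` is mine, and it assumes each disk entry starts a line with `hdiskN Reservation Key : [0x…]`, as in the output shown in the steps above.

```shell
#!/bin/sh
# Sketch: read "dlmpr -k" output on stdin, print "<hdisk> <key>" for every
# disk whose reservation key is not all zeroes.
# Assumption: each disk entry begins a line with its hdisk name.
reserved_disks() {
    awk '$1 ~ /^hdisk[0-9]+$/ && match($0, /0x[0-9a-fA-F]+/) {
        # The first hex value on the line is the reservation key
        key = substr($0, RSTART, RLENGTH)
        if (key !~ /^0x0+$/) print $1, key
    }'
}
```

Typical use would be `dlmpr -k | reserved_disks`, which gives you the list of hdisks to clear one by one with `dlmpr -c`. I deliberately left the clearing itself out of the loop: removing a reserve on the wrong disk is not something you want automated.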