As it looked like a communication problem between the two nodes, the network team checked the switches involved - without any outcome.
Crashing instances were already bad enough, but it got worse one day when one node rebooted (according to the cluster's alert.log and cssd.log due to network heartbeat issues) and afterwards the cluster stack did not start anymore.
2016-12-12 03:35:34.203 [CLSECHO(54825)]CRS-10001: 12-Dec-16 03:35 AFD-9204: AFD device driver installed or loaded status: 'false'
2016-12-12 09:17:25.698 [OSYSMOND(1247)]CRS-8500: Oracle Clusterware OSYSMOND process is starting with operating system process ID 1247
2016-12-12 09:17:25.699 [CSSDAGENT(1248)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 1248
2016-12-12 09:17:25.854 [OCSSD(1264)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 1264
2016-12-12 09:17:26.899 [OCSSD(1264)]CRS-1713: CSSD daemon is started in hub mode
2016-12-12 09:17:32.220 [OCSSD(1264)]CRS-1707: Lease acquisition for node yyy2 number 2 completed
2016-12-12 09:17:33.280 [OCSSD(1264)]CRS-1605: CSSD voting file is online: ORCL:ASM_OCR_VOTE_1; details in /xxx/app/grid/diag/crs/yyy1/crs/trace/ocssd.trc.
2016-12-12 09:17:33.289 [OCSSD(1264)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.
2016-12-12 09:27:25.925 [CSSDAGENT(1248)]CRS-5818: Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:22951} in /xxx/app/grid/diag/crs/yyy2/crs/trace/ohasd_cssdagent_root.trc.
2016-12-12 09:27:25.925 [OCSSD(1264)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /xxx/app/grid/diag/crs/yyy2/crs/trace/ocssd.trc
2016-12-12 09:27:25.926 [OCSSD(1264)]CRS-1603: CSSD on node yyy2 shutdown by user.
Mon Dec 12 09:27:30 2016
Errors in file /xxx/app/grid/diag/crs/yyy2/crs/trace/ocssd.trc (incident=857):
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /xxx/app/grid/diag/crs/yyy/crs/incident/incdir_857/ocssd_i857.trc
The CSS trace is filled with messages reporting no connectivity with node 1:
2016-12-12 09:27:20.375584 : CSSD:3154114304: clssscWaitOnEventValue: after CmInfo State val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-12-12 09:27:20.585624 :GIPCHALO:3141216000: gipchaLowerSendEstablish: sending establish message for node '0x7f7f900a37e0 { host 'yyy1', haName '480e-0dfa-bf94-bbda', srcLuid c33a92f9-675f2c44, dstLuid 00000000-00000000 numInf 1, sentRegister 0, localMonitor 0, baseStream 0x7f7f9009b110 type gipchaNodeType12001 (20), nodeIncarnation 9ec9e8e8-682809fa incarnation 2 flags 0x102804}'
2016-12-12 09:27:20.633907 : CSSD:3635484416: clsssc_CLSFAInit_CB: System not ready for CLSFA initialization
2016-12-12 09:27:20.633912 : CSSD:3635484416: clsssc_CLSFAInit_CB: clsfa fencing not ready yet
2016-12-12 09:27:20.656587 : CSSD:3124418304: clssnmvDHBValidateNCopy: node 1, yyy1, has a disk HB, but no network HB, DHB has rcfg 371663236, wrtcnt, 11120596, LATS 232008644, lastSeqNo 11120595, uniqueness 1476197219, timestamp 1481534839/2789302712
2016-12-12 09:27:20.868210 : CSSD:3119687424: clssnmSendingThread: Connection pending for node yyy1, number 1, flags 0x00000002
2016-12-12 09:27:21.375702 : CSSD:3154114304: clssscWaitOnEventValue: after CmInfo State val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-12-12 09:27:21.585813 :GIPCHALO:3141216000: gipchaLowerSendEstablish: sending establish message for node '0x7f7f900a37e0 { host 'yyy1', haName '480e-0dfa-bf94-bbda', srcLuid c33a92f9-675f2c44, dstLuid 00000000-00000000 numInf 1, sentRegister 0, localMonitor 0, baseStream 0x7f7f9009b110 type gipchaNodeType12001 (20), nodeIncarnation 9ec9e8e8-682809fa incarnation 2 flags 0x102804}'
2016-12-12 09:27:21.634038 : CSSD:3635484416: clsssc_CLSFAInit_CB: System not ready for CLSFA initialization
2016-12-12 09:27:21.634046 : CSSD:3635484416: clsssc_CLSFAInit_CB: clsfa fencing not ready yet
2016-12-12 09:27:21.657538 : CSSD:3124418304: clssnmvDHBValidateNCopy: node 1, yyy1, has a disk HB, but no network HB, DHB has rcfg 371663236, wrtcnt, 11120597, LATS 232009644, lastSeqNo 11120596, uniqueness 1476197219, timestamp 1481534840/2789303712
2016-12-12 09:27:21.868336 : CSSD:3119687424: clssnmSendingThread: Connection pending for node yyy1, number 1, flags 0x00000002
2016-12-12 09:27:22.375830 : CSSD:3154114304: clssscWaitOnEventValue: after CmInfo State val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-12-12 09:27:22.586063 :GIPCHALO:3141216000: gipchaLowerSendEstablish: sending establish message for node '0x7f7f900a37e0 { host 'yyy1', haName '480e-0dfa-bf94-bbda', srcLuid c33a92f9-675f2c44, dstLuid 00000000-00000000 numInf 1, sentRegister 0, localMonitor 0, baseStream 0x7f7f9009b110 type gipchaNodeType12001 (20), nodeIncarnation 9ec9e8e8-682809fa incarnation 2 flags 0x102804}'
2016-12-12 09:27:22.634195 : CSSD:3635484416: clsssc_CLSFAInit_CB: System not ready for CLSFA initialization
2016-12-12 09:27:22.634203 : CSSD:3635484416: clsssc_CLSFAInit_CB: clsfa fencing not ready yet
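The pattern to watch for is the combination of a working disk heartbeat with a missing network heartbeat. A quick way to check a node for it (the trace path is taken from the log excerpt above):

# count how often CSSD saw a disk heartbeat but no network heartbeat from the peer
grep -c 'has a disk HB, but no network HB' /xxx/app/grid/diag/crs/yyy2/crs/trace/ocssd.trc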
After even more investigation on the network, another SR was filed.
Due to previous SRs, oswatcher was already installed, and there we found the important information in the netstat section:
zzz ***Fri Dec 9 14:54:54 GMT 2016
Ip:
    13943376329 total packets received
    129843 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    11934989273 incoming packets delivered
    11631767391 requests sent out
    2 outgoing packets dropped
    148375 fragments dropped after timeout
    2498052793 reassemblies required
    494739589 packets reassembled ok
    353229 packet reassembles failed
    411073325 fragments received ok
    2109526776 fragments created
and after 2 minutes:
zzz ***Fri Dec 9 14:56:55 GMT 2016
Ip:
    13943469180 total packets received
    129849 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    11935067348 incoming packets delivered
    11631828206 requests sent out
    2 outgoing packets dropped
    148375 fragments dropped after timeout
    2498069258 reassemblies required
    494741345 packets reassembled ok
    359542 packet reassembles failed
    411073565 fragments received ok
    2109528513 fragments created
The important part is the delta between the two snapshots: "packet reassembles failed" grew by 6313 (from 353229 to 359542), while only 16465 additional reassemblies were required - more than a third of all reassemblies failed within these two minutes.
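OSWatcher stores all netstat snapshots of a day in one file, separated by the zzz timestamp lines. A minimal awk sketch (the file name is a placeholder for your oswnetstat archive file) prints the two counters per snapshot, so deltas like the one above are easy to spot:

# print the timestamp plus the two relevant counters for every snapshot in the archive file
awk '/^zzz/                      {ts=$0}
     /reassemblies required/     {req=$1}
     /packet reassembles failed/ {print ts, "required=" req, "failed=" $1}' oswnetstat_yyy2.dat

The counters are running totals, so consecutive lines still have to be subtracted to get the failure rate per interval.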
This led to some MOS notes which describe both of our symptoms (instance crashes and cluster stack failure):
RHEL 6.6: IPC Send timeout/node eviction etc with high packet reassembles failure (Doc ID 2008933.1)
and
The CRSD is Intermediate State and Not Joining to the Cluster (Doc ID 2168576.1)
Fragmentation happens when the sender wants to send more data than fits into a single packet; the receiver then has to reassemble these fragments into the original datagram. In this cluster the MTU size is 1500 - and in our example we had 16465 datagrams which needed to be reassembled within two minutes, but 6313 of them failed. The Linux kernel has some tunables which control the buffer it uses to reassemble fragmented datagrams.
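To get a feeling for the numbers: with MTU 1500 each IP fragment carries at most 1480 bytes of payload (20 bytes go to the IP header), so a single 8 KB datagram - a size picked here only for illustration - already needs 6 fragments, and losing any one of them makes the whole datagram fail reassembly:

# fragments needed for one 8 KB UDP datagram on an MTU 1500 link
# payload per fragment: 1500 bytes MTU - 20 bytes IP header = 1480 bytes
echo $(( (8192 + 1479) / 1480 ))   # -> 6 fragments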
The solution for our system was to increase 2 parameters:
net.ipv4.ipfrag_high_thresh = 16777216
net.ipv4.ipfrag_low_thresh = 15728640
These can be changed in the running system via
/proc/sys/net/ipv4/ipfrag_low_thresh
/proc/sys/net/ipv4/ipfrag_high_thresh
and for persistent changes in /etc/sysctl.conf.
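A minimal sketch of both steps, to be run as root on every cluster node (values as determined above):

# change the reassembly buffer limits on the running system
sysctl -w net.ipv4.ipfrag_high_thresh=16777216
sysctl -w net.ipv4.ipfrag_low_thresh=15728640

# make the change persistent across reboots
echo 'net.ipv4.ipfrag_high_thresh = 16777216' >> /etc/sysctl.conf
echo 'net.ipv4.ipfrag_low_thresh = 15728640'  >> /etc/sysctl.conf
sysctl -p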
Unfortunately these parameters were not mentioned in any of the prerequisite scripts I found.
With all this knowledge, we identified an important difference to our other clusters: this one is the only one with MTU 1500 - so many more fragmented packets needed care here.
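Checking this is a one-liner; bond1 is the private interconnect in the oclumon output further down, so the interface name here is specific to this cluster:

# MTU of the (assumed) private interconnect interface
ip link show bond1 | grep -o 'mtu [0-9]*'
# cross-check which interfaces Grid Infrastructure actually uses
oifcfg getif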
After the issue itself was solved, I wondered if it could have been spotted on a vanilla 12.1 CRS installation
(vanilla in comparison to our setup, where oswatcher was installed due to the first SRs).
Yes, our beloved -MGMTDB already holds the information! It's in the documentation as well (Troubleshooting Oracle Clusterware), and in the output of oclumon dumpnodeview I can see:
IPReasFail - Number of failures detected by the IPv4 reassembly algorithm
Node: yyy1 Clock: '16-11-26 06.55.27 Etc/GMT' SerialNo:443
NICS:
bond0 netrr: 159.021 netwr: 181.688 neteff: 340.709 nicerrors: 0 pktsin: 412 pktsout: 358 errsin: 0 errsout: 0 indiscarded: 0 outdiscarded: 0 inunicast: 412 innonunicast: 0 type: PUBLIC
lo netrr: 37.722 netwr: 37.722 neteff: 75.443 nicerrors: 0 pktsin: 95 pktsout: 95 errsin: 0 errsout: 0 indiscarded: 0 outdiscarded: 0 inunicast: 95 innonunicast: 0 type: PUBLIC
bond1 netrr: 2350.313 netwr: 42989.510 neteff: 45339.823 nicerrors: 0 pktsin: 1927 pktsout: 31345 errsin: 0 errsout: 0 indiscarded: 0 outdiscarded: 0 inunicast: 1927 innonunicast: 0 type: PRIVATE
PROTOCOL ERRORS:
IPHdrErr: 0 IPAddrErr: 102203 IPUnkProto: 0 IPReasFail: 59886 IPFragFail: 0 TCPFailedConn: 12598 TCPEstRst: 335559 TCPRetraSeg: 67276584 UDPUnkPort: 40134 UDPRcvErr: 0
Unfortunately the format is kind of clumsy - I will need to dig into its tables for a better output, especially for quick but powerful reports during problems.
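Until then, a simple filter over dumpnodeview already gives a usable time series of the counter; the one-hour window is just an example:

# IPReasFail over the last hour, one line per nodeview sample
oclumon dumpnodeview -n yyy1 -last "01:00:00" | egrep 'Node:|IPReasFail'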
During my research I discovered it's not an Oracle-only problem; others are affected as well (and provide a great description).
1 comment:
Hi, thank you - this post helped me solve an issue with node eviction when other nodes were rebooted/stopped.