Oracle RAC Database – CRS-4535/CRS-4530/CRS-4534 Error
One of the RAC nodes crashed due to a device-full error. When the node was started up again, it hit the following errors.
node1# /u01/app/11.2.0/grid/bin/crsctl check cluster
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
From octssd.log (note the gipcretConnectionRefused and gipcretDaemonLost errors)
2016-03-06 14:39:17.943: [ CRSCCL][4036015872]USING GIPC ============
2016-03-06 14:39:17.943: [ CRSCCL][4036015872]clsCclGipcListen: Attempting to listen on gipcha://node1:CTSSGROUP_1.
2016-03-06 14:39:17.943: [GIPCHGEN][4036015872] gipchaInternalRegister: Initializing HA GIPC
2016-03-06 14:39:17.944: [GIPCHGEN][4036015872] gipchaNodeCreate: adding new node 0x7f6fe4031900 { host '', haName 'b8b8-843d-c412-444e', srcLuid 1d17e809-00000000, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 917676, sentRegister 0, localMonitor 0, flags 0x1 }
2016-03-06 14:39:17.944: [GIPCHTHR][3935278848] gipchaWorkerThread: starting worker thread hctx 0x70b900 [0000000000000010] { gipchaContext : host 'node1', name 'b8b8-843d-c412-444e', luid '1d17e809-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0xc000 }
2016-03-06 14:39:17.944: [GIPCHDEM][3933177600] gipchaDaemonThread: starting daemon thread hctx 0x70b900 [0000000000000010] { gipchaContext : host 'node1', name 'b8b8-843d-c412-444e', luid '1d17e809-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0xc000 }
2016-03-06 14:39:17.945: [GIPCXCPT][3933177600] gipchaDaemonProcessConnect: connection to daemon failed for endp 0x731c20 [0000000000000118] { gipcEndpoint : localAddr 'ipc', remoteAddr 'ipc://gipcd_node1', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x71a660, sendp 0x71a4a0flags 0x8861a, usrFlags 0x24020 }, hctx 0x70b900 [0000000000000010] { gipchaContext : host 'node1', name 'b8b8-843d-c412-444e', luid '1d17e809-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x5 }, ret gipcretConnectionRefused (29)
2016-03-06 14:39:17.945: [GIPCHDEM][3933177600] gipchaDaemonThreadEntry: EXCEPTION[ ret gipcretConnectionRefused (29) ] terminating daemon thread due to exception
2016-03-06 14:39:17.945: [GIPCHDEM][3933177600] gipchaDaemonThreadEntry: daemon thread exiting state gipchaThreadStateFailed (5)
2016-03-06 14:39:17.945: [GIPCXCPT][4036015872] gipchaInternalResolve: failed to resolve ret gipcretDaemonLost (34), host 'node1', port 'CTSSGROUP_1', hctx 0x70b900 [0000000000000010] { gipchaContext : host 'node1', name 'b8b8-843d-c412-444e', luid '1d17e809-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0xd }, ret gipcretDaemonLost (34)
2016-03-06 14:39:17.945: [GIPCHGEN][4036015872] gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 806]: EXCEPTION[ ret gipcretDaemonLost (34) ] failed to resolve ctx 0x70b900 [0000000000000010] { gipchaContext : host 'node1', name 'b8b8-843d-c412-444e', luid '1d17e809-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0xd }, host 'node1', port 'CTSSGROUP_1', flags 0x0
2016-03-06 14:39:17.946: [GIPCXCPT][4036015872] gipchaProcessClientRequest: request failed due to failure in ha threads req 0x7ffc6a994200,req type 1, hctx 0x70b900 [0000000000000010] { gipchaContext : host 'node1', name 'b8b8-843d-c412-444e', luid '1d17e809-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0xd }, ret gipcretDaemonLost (34)
2016-03-06 14:39:17.946: [GIPCHGEN][4036015872] gipchaPublishF [gipcmodGipcBind : gipcmodGipc.c : 884]: EXCEPTION[ ret gipcretDaemonLost (34) ] failed to publish ctx 0x70b900 [0000000000000010] { gipchaContext : host 'node1', name 'b8b8-843d-c412-444e', luid '1d17e809-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0xd }, endp 0x7f6fe40375b0 [0000000000000126] { gipchaEndpoint : port 'CTSSGROUP_1', peer ':', srcCid 00000000-00000000, dstCid 00000000-00000000, numSend 0, maxSend 100, groupListType 1, hagroup 0x7f6fe4032310, usrFlags 0x4000, flags 0x0 }, port 'CTSSGROUP_1', flags 0x4000
2016-03-06 14:39:17.946: [GIPCXCPT][4036015872] gipcBindF [clsCclGipcListen : clsCclCommHandler.c : 3553]: EXCEPTION[ ret gipcretDaemonLost (34) ] failed to bind endp 0x7f6fe402fcc0 [0000000000000101] { gipcEndpoint : localAddr 'gipcha://node1:CTSSGROUP_1', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x7f6fe4024fa0, sendp (nil)flags 0x30400, usrFlags 0x20 }, addr 0x7f6fe402f230 [00000000000000ff] { gipcAddress : name 'gipcha://node1:CTSSGROUP_1', objFlags 0x0, addrFlags 0x0 }, flags 0x0
2016-03-06 14:39:17.946: [ CRSCCL][4036015872]gipcBind() failed. rc= 34.
2016-03-06 14:39:17.946: [ CRSCCL][4036015872]cclLibShutdown called
2016-03-06 14:39:17.946: [ CRSCCL][4036015872]ccllibShutdown done.
2016-03-06 14:39:17.946: [ CTSS][4036015872](:ctss_ccl_init1:): Fails to initialize CCL [2]. Returns [16]
2016-03-06 14:39:17.946: [ CTSS][4036015872]ctss_main: CCL init failed [16]
2016-03-06 14:39:17.946: [ CTSS][4036015872]ctss_main: CTSS daemon aborting [16].
2016-03-06 14:39:17.946: [ CTSS][4036015872]CTSS daemon aborting
From gipcd.log (note the gipcListen() failure and the threads going offline)
2016-03-06 14:39:16.979: [GIPCDCLT][3731310336] gipcdClientThread: gipcListen() failed (1) on endp 000000000000010c
2016-03-06 14:39:16.979: [ GIPCD][3731310336] gipcdSetThreadState: changing the status of clientThread. current status gipcdThreadStatusInit desired status gipcdThreadStatusOffline
2016-03-06 14:39:16.979: [ GIPCLIB][3731310336] gipclibMapSearch: gipcMapSearch() -> gipcMapGetNodeAddr() failed: ret:gipcretKeyNotFound (36), ht:0x5a72e0, idxPtr:0x7f61df29a8c0, key:0x7f61de6725b0, flags:0x0
2016-03-06 14:39:16.979: [GIPCXCPT][3731310336] gipcObjectLookupF [gipcPostF : gipc.c : 2022]: search found no matching oid 0000000000000000, ret gipcretKeyNotFound (36), ret gipcretInvalidObject (3)
2016-03-06 14:39:16.979: [GIPCXCPT][3731310336] gipcPostF [gipcdClientThread : gipcdClientThread.c : 3485]: EXCEPTION[ ret gipcretInvalidObject (3) ] failed to post obj 0000000000000000, flags 0x0
2016-03-06 14:39:16.980: [GIPCDCLT][3731310336] gipcdClientThread: Client thread has exited
2016-03-06 14:39:16.980: [ GIPCD][3727107840] gipcdThreadWait: GIPCD received a shutdown msg from agent framework or client/node/monitor thread died
2016-03-06 14:39:16.980: [GIPCDMON][3727107840] gipcdMonitorThread: clientThread/nodeThread did not came into 'READY' state
2016-03-06 14:39:16.980: [ GIPCD][3727107840] gipcdSetThreadState: changing the status of monitorThread. current status gipcdThreadStatusReady desired status gipcdThreadStatusOffline
2016-03-06 14:39:16.980: [GIPCDMON][3727107840] gipcdMonitorThread: Monitor thread is exiting..
2016-03-06 14:39:16.980: [ GIPCD][3831981856] gipcdMain: invalid status of gipcd Threads. Status of clientThread gipcdThreadStatusOffline Status of nodeThread gipcdThreadStatusInit Status of monitorThread gipcdThreadStatusOffline
2016-03-06 14:39:16.986: [ GIPC][3722905344] gipcCheckInitialization: possible incompatible non-threaded init from [clsgpnp0.c : 769], original from [gipcd.c : 177]
2016-03-06 14:39:16.986: [ GIPCLIB][3722905344] gipclibSetTraceLevel: to set level to 0
[ CLWAL][3722905344]clsw_Initialize: OLR initlevel [30000]
2016-03-06 14:39:16.987: [ GIPC][3722905344] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [gipcd.c : 177]
2016-03-06 14:39:16.987: [ GIPCLIB][3722905344] gipclibSetTraceLevel: to set level to 0
2016-03-06 14:39:17.007: [ GPNP][3722905344]clsgpnp_getCachedProfileEx: [at clsgpnp.c:613] Result: (26) CLSGPNP_NO_PROFILE. Can't get offline GPnP service profile: local gpnpd is up and running. Use getProfile instead.
2016-03-06 14:39:17.007: [ GPNP][3722905344]clsgpnp_getCachedProfileEx: [at clsgpnp.c:623] Result: (26) CLSGPNP_NO_PROFILE. Failed to get offline GPnP service profile.
2016-03-06 14:39:17.015: [ GIPCLIB][3722905344] gipclibGetClusterGuid: retrieved cluster guid 770e66ca03467f5bbfbba3432ee2aa7b
2016-03-06 14:39:17.025: [ GIPCLIB][3722905344] gipclibSetTraceLevel: to set level to 0
[ CLWAL][3722905344]clsw_Initialize: OLR initlevel [70000]
2016-03-06 14:39:17.025: [ GIPCLIB][3722905344] gipclibSetTraceLevel: to set level to 0
2016-03-06 14:39:17.048: [ CLSINET][3722905344] Returning NETDATA: 1 interfaces
2016-03-06 14:39:17.048: [ CLSINET][3722905344] # 0 Interface 'bond0',ip='192.168.0.5',mac='40-f2-e9-25-b6-93',mask='255.255.255.0',net='192.168.0.0',use='cluster_interconnect'
2016-03-06 14:39:17.048: [GIPCHGEN][3722905344] gipchaNodeAddInterface: adding interface information for inf 0x9530d0 { host '', haName 'gipcd_ha_name', local (nil), ip '192.168.0.5', subnet '192.168.0.0', mask '255.255.255.0', mac '40-f2-e9-25-b6-93', ifname 'bond0', numRef 0, numFail 0, idxBoot 0, flags 0x1841 }
2016-03-06 14:39:17.048: [GIPCHTHR][3725006592] gipchaWorkerCreateInterface: created local interface for node 'node1', haName 'gipcd_ha_name', inf 'udp://192.168.0.5:42229'
2016-03-06 14:39:17.049: [GIPCHTHR][3725006592] gipchaWorkerCreateInterface: created local bootstrap multicast interface for node 'node1', haName 'gipcd_ha_name', inf 'mcast://224.0.0.251:42424/192.168.0.5'
2016-03-06 14:39:17.049: [GIPCHTHR][3725006592] gipchaWorkerCreateInterface: created local bootstrap multicast interface for node 'node1', haName 'gipcd_ha_name', inf 'mcast://230.0.1.0:42424/192.168.0.5'
2016-03-06 14:39:17.049: [GIPCHTHR][3725006592] gipchaWorkerCreateInterface: created local bootstrap broadcast interface for node 'node1', haName 'gipcd_ha_name', inf 'udp://192.168.0.255:42424'
2016-03-06 14:39:17.049: [GIPCDNDE][3729209088] gipcdNodeThread: gipcdNodeThread started
2016-03-06 14:39:17.049: [ GIPCD][3729209088] gipcdSetThreadState: changing the status of nodeThread. current status gipcdThreadStatusInit desired status gipcdThreadStatusReady
2016-03-06 14:39:17.050: [ GIPCD][3729209088] gipcdThreadWait: GIPCD received a shutdown msg from agent framework or client/node/monitor thread died
2016-03-06 14:39:17.050: [GIPCDNDE][3729209088] gipcdNodeThread: clientThread/monitorThread did not came into 'READY' state
2016-03-06 14:39:17.050: [GIPCDNDE][3729209088] gipcdNodeThreadShutdown: deleting all the peer connections
2016-03-06 14:39:17.050: [ GIPCD][3729209088] gipcdSetThreadState: changing the status of nodeThread. current status gipcdThreadStatusReady desired status gipcdThreadStatusOffline
2016-03-06 14:39:17.050: [GIPCDNDE][3729209088] gipcdNodeThread: Node thread has exited
2016-03-06 14:39:17.050: [ GIPCD][3831981856] gipcdMain: All threads terminated
2016-03-06 14:39:17.150: [ GIPCD][3831981856] gipcdMain: GIPCD terminated
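Traces like the two above are long, and the failing calls are easy to miss. As a rough sketch (not an Oracle-supplied tool), the GIPC return codes in a log can be tallied with standard grep, sort and uniq. The function name summarize_gipc_errors is made up for illustration, and the log path is passed in by the caller because its location depends on the Grid home.

```shell
#!/bin/sh
# Hypothetical helper: tally the "gipcretXxx (NN)" return codes found in a
# Grid Infrastructure log, most frequent first.
# $1 = path to the log file (for example an octssd.log or gipcd.log copy).
summarize_gipc_errors() {
    # -o prints only the matched text, one match per output line, so several
    # codes on the same log line are each counted.
    grep -o 'gipcret[A-Za-z]* ([0-9]*)' "$1" | sort | uniq -c | sort -rn
}
```

Called as summarize_gipc_errors octssd.log, it prints one line per distinct return code with its count in front.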
Solution:
The issue was caused by stale network socket files. When the node crashed, it did not clean up its socket files. As a result, when the node started again, it could not create the socket files it needed to establish connectivity to the peer node.
root# crsctl stop crs -f
root# rm -rf /usr/tmp/.oracle/* /var/tmp/.oracle/* /tmp/.oracle/*
root# crsctl start crs
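Before running the rm step above, it may be worth checking how many leftover entries the socket directories actually hold. The following is a minimal sketch, assuming a shell with a find that supports -mindepth and -maxdepth (GNU and BSD find both do). The function name count_oracle_socket_files is hypothetical, and it only counts entries without deleting anything.

```shell
#!/bin/sh
# Hypothetical helper: count the entries directly under each directory
# given as an argument. Directories that do not exist are skipped.
count_oracle_socket_files() {
    total=0
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # Count files, sockets and subdirectories one level down only.
        n=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l)
        total=$((total + n))
    done
    echo "$total"
}

# The three directories cleaned in the fix above.
count_oracle_socket_files /usr/tmp/.oracle /var/tmp/.oracle /tmp/.oracle
```

A non-zero count while the clusterware stack is stopped would be consistent with stale socket files left behind by the crash.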
Regards,
Wei Shan