CRS-4123 CRS-4124 CRS-4000


Issue:   The cluster start command "crsctl start crs" fails to start the CRS services

CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-4124: Oracle High Availability Services startup failed
CRS-4000: Command Start failed, or completed with errors


Cause:

Recent changes to the cluster hosts, such as:



  1. OS patching
  2. Network subnet changes
  3. OS reboot
  4. File system permission changes
  5. ASM disk unavailability
  6. Private network unreliability


Possible Solution:


1. Verify that the cluster interconnect is reachable: obtain the private interconnect IPs from /etc/hosts and check the ping response from all cluster nodes.
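
A minimal sketch of that check, assuming the private interconnect entries in /etc/hosts are named node1-priv and node2-priv (placeholder names, adjust to your environment):

grep -i priv /etc/hosts        # confirm the private interconnect entries exist
ping -c 3 node1-priv           # repeat from every cluster node, against every private IP
ping -c 3 node2-priv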

2. Compare ifconfig -a output from before and after the OS changes to confirm that the Ethernet interface naming (e.g. eth0/eth1) has not changed; if the earlier output is not available, the output from a second, running node can be used as a reference.
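
For example, assuming a pre-change snapshot was saved to /tmp/ifconfig_before.txt and node2 stands for the surviving node (both are only illustrative), the comparison could look like:

ifconfig -a > /tmp/ifconfig_after.txt
diff /tmp/ifconfig_before.txt /tmp/ifconfig_after.txt   # look for renamed interfaces (eth0 vs eth1 etc.)
ssh node2 ifconfig -a                                   # or compare against the surviving node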


3. Verify ASM disk availability using kfod disks=all 
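
A hedged example, run as the Grid Infrastructure owner with GRID_HOME set (the ASM disk path below is only illustrative and depends on how the disks are presented):

$GRID_HOME/bin/kfod disks=all
ls -l /dev/oracleasm/disks/    # also check OS-level ownership and permissions of the ASM disks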


4. If the OS kernel has been upgraded as a recent change, take strace output as below:

sudo strace $GRID_HOME/bin/crsctl start crs
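
Because crsctl forks the OHASD daemons, a variant that follows child processes and writes the trace to a file can be easier to review (the output file name is arbitrary):

sudo strace -f -o /tmp/crsctl_start.trc $GRID_HOME/bin/crsctl start crs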

If the errors below are observed in the trace output, a possible cause is that the OHASD process is unable to spawn the CRS start daemons and is stuck in "ohasd run". In this scenario the socket directories /var/tmp/.oracle, /tmp/.oracle and /usr/tmp/.oracle need to be backed up and cleaned out, followed by a clean OS reboot and CRS restart (see the cleanup sketch after the trace excerpt).

sendto(23, "\4", 1, MSG_NOSIGNAL, NULL, 0) = 1

connect(27, {sa_family=AF_LOCAL, sun_path="/var/tmp/.oracle/sprocr_local_conn_0_PROL"}, 110) = -1 E
ioctl(20, FIONBIO, [1])                 = 0

connect(20, {sa_family=AF_LOCAL, sun_path="/var/tmp/.oracle/sOHASD_UI_SOCKET"}, 110) = -1 ENOENT (No such file or directory)
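
A hedged outline of that cleanup, assuming the stack is down on the node and using /root/oracle_socket_backup as a placeholder backup location (skip any of the three directories that do not exist on your platform):

sudo $GRID_HOME/bin/crsctl stop crs -f                  # make sure the stack is fully stopped
sudo mkdir -p /root/oracle_socket_backup
sudo cp -rp /var/tmp/.oracle /tmp/.oracle /usr/tmp/.oracle /root/oracle_socket_backup/
sudo rm -rf /var/tmp/.oracle/* /tmp/.oracle/* /usr/tmp/.oracle/*
sudo reboot                                             # clean OS restart
sudo $GRID_HOME/bin/crsctl start crs                    # after the reboot, start CRS again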

5. If none of the above steps work, the recently upgraded kernel or OS patch needs to be rolled back.
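
Before a rollback, it can help to confirm which kernel the node is actually running versus what is installed; on an RPM-based distribution this might be:

uname -r           # kernel the node booted with
rpm -q kernel      # kernel packages currently installed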


6. You may use cluvfy stage -pre crsinst -n node1,node2 as part of troubleshooting.
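
For example, running it as the Grid Infrastructure owner with the standard -verbose option for per-check detail (node1,node2 are placeholders for your host names):

$GRID_HOME/bin/cluvfy stage -pre crsinst -n node1,node2 -verbose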

Refer:

Top 5 Grid Infrastructure Startup Issues (Doc ID 1368382.1)
