Thursday, November 28, 2013

ORA-01078 ORA-29701

Error:

SQL> startup
ORA-01078: failure in processing system parameters
ORA-29701: unable to connect to Cluster Synchronization Service


$/rdbms/11gR2/grid/bin/ocssd.bin start
Segmentation Fault - core dumped


Background:

These errors were observed while starting up a single-instance ASM instance after a crash. The crash was caused either by killing the pmon background process or by restarting the server without stopping the services cleanly first.

Verifications & Resolution:

An ASM instance requires the Oracle Cluster Synchronization Services (CSS) daemon and its agent to be running.

1. Verify the status of the CSS processes

$ps -ef | grep -i css
  oracle  2385     1   0 01:23:53 ?           4:44 /rdbms/11gR2/grid/bin/ocssd.bin
  oracle  2373     1   0 01:23:53 ?           0:42 /rdbms/11gR2/grid/bin/cssdagent
  oracle   655   644   0 18:38:17 pts/1       0:00 grep -i css
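
The same check can be scripted. The following is only a sketch: check_css_procs is a hypothetical helper name, and it assumes the two process names appear at the end of a path, as in the listing above.

```shell
# Hypothetical helper: reads "ps -ef" style output on stdin and reports
# whether the two CSS processes shown above are present.
check_css_procs() {
    ps_output=$(cat)
    status=0
    for proc in ocssd.bin cssdagent; do
        # Match the name as the last path component, followed by a space or
        # end of line; this also ignores the "grep -i css" line itself.
        if printf '%s\n' "$ps_output" | grep -Eq "/${proc}( |\$)"; then
            echo "OK: $proc is running"
        else
            echo "MISSING: $proc is not running"
            status=1
        fi
    done
    return $status
}
```

Usage: ps -ef | check_css_procs
The function returns non-zero when either process is missing, so it can be used as a guard in a larger script.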


2. Check the status of the cluster resources such as the disk groups and the ASM instance

$crs_stat -t
Name           Type           Target    State     Host
-------------- -------------- --------- --------- -----------
ora.DG_ARCH.dg ora....up.type ONLINE    ONLINE    ace-...tuat
ora.DG_DATA.dg ora....up.type ONLINE    ONLINE    ace-...tuat
ora.DG_INDX.dg ora....up.type ONLINE    ONLINE    ace-...tuat
ora....ED01.dg ora....up.type ONLINE    ONLINE    ace-...tuat
ora....ED02.dg ora....up.type ONLINE    ONLINE    ace-...tuat
ora....ED03.dg ora....up.type ONLINE    ONLINE    ace-...tuat
ora.asm        ora.asm.type   ONLINE    ONLINE    ace-...tuat
ora.cssd       ora.cssd.type  ONLINE    ONLINE    ace-...tuat
ora.diskmon    ora....on.type ONLINE    ONLINE    ace-...tuat
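
With many resources it is easy to miss one that is down. As a rough sketch, the listing can be filtered with awk. find_offline_resources is a hypothetical name, and the filter assumes the five-column crs_stat -t layout shown above (Name, Type, Target, State, Host).

```shell
# Hypothetical helper: reads "crs_stat -t" style output on stdin and prints
# every resource whose Target or State column is not ONLINE.
find_offline_resources() {
    awk '
        $1 == "Name" || /^-/ { next }    # skip the header and the dashed separator
        NF >= 4 && ($3 != "ONLINE" || $4 != "ONLINE") {
            print $1 ": target=" $3 ", state=" $4
            bad = 1
        }
        END { exit (bad ? 1 : 0) }       # non-zero exit if anything was flagged
    '
}
```

Usage: crs_stat -t | find_offline_resources
No output and a zero exit status mean every listed resource is ONLINE. A resource that was taken offline on purpose is flagged as well, so read the result with that in mind.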

3. Check the ASM alert log for any errors

$cat $ORACLE_BASE/log/diag/asm/+asm/<ASM_instance_name>/trace/alert_<ASM_instance_name>.log

Thu Nov 28 01:02:08 2013
Errors in file /rdbms/11gR2/grid/log/diag/asm/+asm/+ASM/trace/+ASM_gmon_17716.trc:
ORA-29702: error occurred in Cluster Group Service operation
ORA-29702: error occurred in Cluster Group Service operation
GMON (ospid: 17716): terminating the instance due to error 29702 
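
The alert log can be long, so instead of reading the whole file it may be quicker to pull out only the latest ORA- lines. This is a minimal sketch: recent_ora_errors is a hypothetical helper, and the default of 20 lines is an arbitrary choice.

```shell
# Hypothetical helper: prints the most recent ORA- error lines from an alert log.
# $1 = path to the alert log, $2 = number of lines to show (default 20, arbitrary).
recent_ora_errors() {
    log_file=$1
    count=${2:-20}
    if [ ! -r "$log_file" ]; then
        echo "Cannot read alert log: $log_file" >&2
        return 2
    fi
    # -n prefixes each match with its line number so it can be located in the log.
    grep -n 'ORA-[0-9][0-9]*' "$log_file" | tail -n "$count"
}
```

Usage: recent_ora_errors /path/to/alert_<ASM_instance_name>.log 10
The line numbers in the output point back to the surrounding context in the log, such as the trace file name and the process that terminated the instance.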


4. Verify that the ASM parameter file exists, and check that the disk paths and any other mounted disks are accessible

$cat $ORACLE_HOME/dbs/init<ASM_instance_name>.ora

asm_diskgroups ='DG_DATA','DG_INDX','DG_ARCH','DG_RED01','DG_RED02','DG_RED03'
asm_diskstring ='/dev/zvol/rdsk/db-dataP*'
instance_type='asm'
large_pool_size=12M
 

$ls -lrt /dev/zvol/rdsk/db-dataP*
total 6
lrwxrwxrwx 1 root root  39 Sep  5  2012 DG_ARCH -> ../../../../devices/pseudo/zfs@0:2c,raw
lrwxrwxrwx 1 root root  39 Sep  5  2012 DG_DATA -> ../../../../devices/pseudo/zfs@0:3c,raw
lrwxrwxrwx 1 root root  39 Sep  5  2012 DG_INDX -> ../../../../devices/pseudo/zfs@0:5c,raw
lrwxrwxrwx 1 root root  39 Sep  5  2012 DG_RED01 -> ../../../../devices/pseudo/zfs@0:6c,raw
lrwxrwxrwx 1 root root  39 Sep  5  2012 DG_RED02 -> ../../../../devices/pseudo/zfs@0:7c,raw
lrwxrwxrwx 1 root root  39 Sep  5  2012 DG_RED03 -> ../../../../devices/pseudo/zfs@0:8c,raw
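
The ls listing shows that the links exist, but not whether their targets resolve or are accessible. A sketch of that extra check follows. check_disk_paths is a hypothetical helper, the directory argument is an assumption to be replaced with the location matching your asm_diskstring, and it should be run as the ASM software owner so the permission tests are meaningful.

```shell
# Hypothetical helper: checks every entry in a directory of ASM disk links.
# $1 = directory that holds the disk links (an assumption; adjust to your setup).
check_disk_paths() {
    disk_dir=$1
    status=0
    # If the directory is empty the pattern stays unexpanded and is reported as BROKEN.
    for entry in "$disk_dir"/*; do
        # -e follows symbolic links, so a dangling link fails this test.
        if [ ! -e "$entry" ]; then
            echo "BROKEN: $entry"
            status=1
        elif [ ! -r "$entry" ] || [ ! -w "$entry" ]; then
            echo "NO READ/WRITE ACCESS: $entry"
            status=1
        else
            echo "OK: $entry"
        fi
    done
    return $status
}
```

Usage: check_disk_paths <disk_directory>
Any BROKEN or NO READ/WRITE ACCESS line is worth passing to the storage or Unix administrator before retrying the startup.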
 


5. If all of the above checks are fine, try starting the ASM instance again using srvctl. Make sure ORACLE_SID and the other environment variables are set correctly.

$srvctl start asm

$ps -ef | grep -i pmon
  oracle  2435     1   0 01:24:10 ?           0:00 asm_pmon_+ASM
  oracle  2518  1498   0 01:24:35 pts/2       0:00 grep -i pmon
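
Since a wrong environment is a common reason for a failed start, the variables can be verified before calling srvctl. The sketch below uses a hypothetical helper, check_oracle_env, and only checks ORACLE_HOME and ORACLE_SID; which other variables matter depends on the installation.

```shell
# Hypothetical pre-flight check: confirms that ORACLE_HOME and ORACLE_SID
# are set before attempting the start.
check_oracle_env() {
    status=0
    for var in ORACLE_HOME ORACLE_SID; do
        # eval reads the variable whose name is stored in $var (POSIX-compatible indirection).
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "NOT SET: $var"
            status=1
        else
            echo "$var=$value"
        fi
    done
    return $status
}
```

Usage: check_oracle_env && srvctl start asm
The start is attempted only when both variables are set. The printed values still need a quick visual check that they point to the grid home and the ASM SID.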


6. If step 5 fails, repeat it after a server restart. Because a server restart may not be feasible in most cases, the alternative is to ask the storage or Unix administrator to check for any issues.