Saturday, February 16, 2013

Oracle 10g RAC Upgrade Lessons



This post outlines the basic steps to upgrade an Oracle 10g RAC database from 10.2.0.4 to 10.2.0.5, along with the obstacles faced and their solutions.

Systems:
Racdb1/Racdb2 ~ SunOS 5.10 ~ Oracle 10.2.0.4 ~ ASM ~ Primary
Racdbdr1/Racdbdr2 ~ SunOS 5.10 ~ Oracle 10.2.0.4 ~ ASM ~ Standby

Activity: Apply the 10.2.0.5 patch set (8202632), CRS PSU Jan 2011 (9952245) and RDBMS PSU Jul 2011 (12419392) to the existing 10.2.0.4 Oracle and Clusterware binaries.

 Activity Sequence:
Upgrading 10g RAC from 10.2.0.4 to 10.2.0.5 involves the basic steps below, which need to be performed sequentially on both the primary and DR setups.

  1.       Pre-activity checks
ORACLE_HOME/ORA_CRS_HOME/oraInventory/OCR backup
Invalid objects/indexes, backup status, datafile status
OPatch version verification, current patch set

  2.       Deferring archive sync
Archive shipping from primary Racdb1/Racdb2 to standby Racdbdr1/Racdbdr2 deferred

  3.       Shut down all services
Shut down all Oracle/CRS services on all nodes and verify

  4.       Apply 10.2.0.5 Patch Set on ORA_CRS_HOME
Apply 10.2.0.5 patch set on ORA_CRS_HOME

  5.       Perform post patch steps for CRS
Execute $ORA_CRS_HOME/install/root102.sh on both primary nodes
This script patches and relinks the library files and starts all cluster services

  6.       Apply 10.2.0.5 Patch Set on ORACLE_HOME
Stop all services started by root102.sh
Apply 10.2.0.5 patch set on ORACLE_HOME

  7.       Perform post-patch steps for ORACLE_HOME
Execute $ORACLE_HOME/root.sh on both primary nodes
Set cluster_database to FALSE at the NOMOUNT stage, start only one instance, then run:
@catupgrd.sql
@utlrp.sql

  8.       Verify 10.2.0.5 patch
Verify component status from dba_registry or registry$
Verify invalid objects

  9.       Back up ORACLE_HOME/ORA_CRS_HOME/oraInventory/OCR

10.   Apply CRS PSU to ORA_CRS_HOME

11.   Apply CRS PSU to ORACLE_HOME

12.   Perform post-PSU steps
@catbundle.sql psu apply
Verify registry$history
Verify invalid objects
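The SQL*Plus work behind steps 2, 7, 8 and 12 can be kept in one place as shell functions. This is a minimal sketch, not the exact commands used here. Assumptions not taken from this post: the standby is served by log_archive_dest_2, the session connects as "/ as sysdba" with ORACLE_HOME and ORACLE_SID already set, and SQL_DRY_RUN is a hypothetical switch that only prints each script. The sequence follows the usual 10.2 patch set and PSU readme flow; check it against the readmes shipped with 8202632 and 12419392 before use.

```shell
#!/bin/sh
# Sketch of the SQL*Plus parts of steps 2, 7, 8 and 12.
# Assumes ORACLE_HOME and ORACLE_SID are already set for the local instance.

# Send a SQL*Plus script (stdin) to the local instance as SYSDBA.
# With SQL_DRY_RUN set (hypothetical switch), only print the script.
run_sql() {
  if [ -n "${SQL_DRY_RUN:-}" ]; then
    cat
  else
    sqlplus -s "/ as sysdba"
  fi
}

# Step 2, on the primary: DEFER before the activity, ENABLE afterwards.
# Assumption: the standby destination is log_archive_dest_2.
set_standby_dest_state() {
  run_sql <<EOF
ALTER SYSTEM SET log_archive_dest_state_2=$1 SCOPE=BOTH SID='*';
EXIT
EOF
}

# Step 7: dictionary upgrade on one instance only, all other instances down.
upgrade_dictionary() {
  run_sql <<'EOF'
STARTUP NOMOUNT
ALTER SYSTEM SET cluster_database=FALSE SCOPE=SPFILE SID='*';
SHUTDOWN IMMEDIATE
STARTUP UPGRADE
SPOOL catupgrd.log
@?/rdbms/admin/catupgrd.sql
SPOOL OFF
SHUTDOWN IMMEDIATE
STARTUP
@?/rdbms/admin/utlrp.sql
ALTER SYSTEM SET cluster_database=TRUE SCOPE=SPFILE SID='*';
SHUTDOWN IMMEDIATE
EXIT
EOF
}

# Step 8: component versions/status and remaining invalid objects.
verify_upgrade() {
  run_sql <<'EOF'
SET LINESIZE 150 PAGESIZE 100
COLUMN comp_name FORMAT A40
SELECT comp_name, version, status FROM dba_registry;
SELECT owner, object_type, COUNT(*) FROM dba_objects
 WHERE status = 'INVALID' GROUP BY owner, object_type;
EXIT
EOF
}

# Step 12: post-PSU SQL with the database open, then check the history.
post_psu() {
  run_sql <<'EOF'
@?/rdbms/admin/catbundle.sql psu apply
SELECT * FROM sys.registry$history;
SELECT COUNT(*) FROM dba_objects WHERE status = 'INVALID';
EXIT
EOF
}
```

Setting SQL_DRY_RUN=1 before calling a function prints the script for review. After upgrade_dictionary, the database would be started on both nodes again through CRS, e.g. srvctl start database -d <db_name>.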

All the above steps completed successfully on Racdb1/Racdb2, but multiple issues were encountered on Racdbdr1/Racdbdr2, as described below.

My Oracle Support SRs (raised during the incident)

3-4716340091: Applying PSU in RAC 10g
3-4697480261: Applying 10.2.0.5 patch to RAC

Notes:

How to Downgrade/Remove Oracle Clusterware (CRS) Patchset Software (Doc ID 754095.1)
INIT.CSSD REMAINS IN STARTCHECK DURING STARTUP (Doc ID 757383.1)

Problems Encountered/Solution:

Problem 1:

The following errors were observed while patching the CRS binaries with the 10.2.0.5 patch set:

    tar: can't set time on ./Queries21/generalQueries/10.1.0.2.0: Not owner
    tar: can't set time on ./Queries21/generalQueries: Not owner
    tar: can't set time on ./Queries21/generalQueries/10.1.0.3.0: Not owner
    tar: can't set time on ./Queries21/rgsQueries/10.1.0.2.0: Not owner
    tar: can't set time on ./Queries21/rgsQueries: Not owner
    tar: can't set time on ./Queries21/rgsQueries/10.1.0.3.0: Not owner
    tar: can't set time on ./Queries21/ClusterQueries/10.2.0.1.0: Not owner
    tar: can't set time on ./Queries21/ClusterQueries: Not owner
    tar: can't set time on ./Queries21/globalVarQueries/2.1.0.4.1: Not owner
   .
   .
On activity sequence step 4 (Apply 10.2.0.5 Patch Set on ORA_CRS_HOME)
Problem description:
Errors occurred while applying 10.2.0.5 to ORA_CRS_HOME at the remote binary copy stage.
Action Performed:
Re-executed ./runInstaller to apply the patch to ORA_CRS_HOME.
Problem Faced:
Binaries were overwritten, permissions on various files and folders were changed, and a few utility scripts (e.g. rootconfig, crsctl) were modified.
Oracle Support (MOS) Solution:
The error could have been ignored without re-applying the patch.
Permissions of all impacted files should have been verified against the primary node binaries
and corrected wherever they did not match.
Lessons:
1. Never re-run runInstaller after an installation failure.
2. Never select Ignore and move ahead in runInstaller.

On any error, if runInstaller prompts with Ignore/Retry/Cancel and Retry does not resolve it, select Cancel.

Cancelling the installation rolls back the changes without leaving the new binaries in the existing ORACLE_HOME/ORA_CRS_HOME.
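The MOS advice above, comparing permissions of the impacted files with the primary node, can be done by taking a permission snapshot on each node and diffing them. A minimal sketch, assuming a POSIX shell (on Solaris 10, /usr/xpg4/bin/sh or ksh) and the same home path on both nodes; perm_snapshot is a hypothetical helper name, not an Oracle tool.

```shell
#!/bin/sh
# Sketch: print "mode owner group relative-path" for every entry under a
# directory, sorted by path, so snapshots taken on two nodes can be diffed.
perm_snapshot() {
  ( cd "$1" || exit 1
    find . -print | sort | while IFS= read -r entry; do
      # Fields 1, 3 and 4 of "ls -ld" are the mode, owner and group.
      attrs=$(ls -ld "$entry" | awk '{print $1, $3, $4}')
      printf '%s %s\n' "$attrs" "$entry"
    done )
}
```

For example, run perm_snapshot "$ORA_CRS_HOME" > /tmp/crs_perms_$(hostname).txt as root on Racdb1 and on Racdbdr1, copy one file across and compare them with diff. Every differing line is a candidate for chmod/chown; node-specific files such as logs will also show up, so review the list rather than applying it blindly.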

Problem 2:

Errors while executing root102.sh

root@Racdbdr2 # /oracle/clusterware/crs/oracle/product/10.2.0/install/root102.sh
Creating pre-patch directory for saving pre-patch clusterware files
Completed patching clusterware files to /oracle/clusterware/crs/oracle/product/10.2.0
Relinking some shared libraries.
ar: writing /oracle/clusterware/crs/oracle/product/10.2.0/lib/libn10.a
ar: writing /oracle/clusterware/crs/oracle/product/10.2.0/lib32/libn10.a
ar: writing /oracle/clusterware/crs/oracle/product/10.2.0/lib/libn10.a
Relinking of patched files is complete.
WARNING: directory '/oracle' is not owned by root
Preparing to recopy patched init and RC scripts.
Recopying init and RC scripts.
Startup will be queued to init within 30 seconds.
Starting up the CRS daemons.
Waiting for the patched CRS daemons to start.
This may take a while on some systems..
.
Timed out waiting for the CRS daemons to start. Look at the
system message file and the CRS log files for diagnostics.

On activity sequence step 5 (Perform post patch steps for CRS)
System: Racdbdr1
Problem description:
Unable to complete the root102.sh post-patch activity on CRS.
Action Performed:
The 10.2.0.4 ORA_CRS_HOME was restored.
Problem Faced:
Unable to start CRS/DB after the restoration.
Solution:
Network socket file deletion, permission changes, OCR restore and CRS restart
(detailed in the subsequent section).
Lessons:
 1.       Oracle CRS binaries should not be restored as the root OS user
 2.       Permissions should be verified after restoration
 3.       The OCR should be restored along with the ORA_CRS_HOME binaries
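Lessons 2 and 3 come down to backing up ORA_CRS_HOME and the OCR as a matching pair and restoring the binaries with their recorded permissions. A minimal sketch, assuming absolute paths, enough space in the backup directory, and that ocrconfig is the 10.2 binary from ORA_CRS_HOME/bin; backup_home, export_ocr and restore_home are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: back up a home with tar (the archive records each file's mode and
# owner) and restore it with modes preserved. Pass absolute paths: the
# functions change directory before calling tar.

# backup_home HOME BACKUP_DIR -> prints the path of the archive it wrote
backup_home() {
  home=$1
  dest=$2
  archive="$dest/$(basename "$home")_$(date +%Y%m%d_%H%M%S).tar"
  ( cd "$(dirname "$home")" && tar -cf "$archive" "$(basename "$home")" ) || return 1
  echo "$archive"
}

# export_ocr BACKUP_DIR -> logical OCR export taken alongside the home backup.
# Needs root; the matching restore is "ocrconfig -import <file>" with CRS down.
export_ocr() {
  ocrconfig -export "$1/ocr_$(date +%Y%m%d_%H%M%S).exp"
}

# restore_home ARCHIVE PARENT_DIR -> extract keeping the recorded modes (-p).
# Recorded owners are re-applied only if the extracting user is allowed to,
# so compare the result with a permission snapshot afterwards.
restore_home() {
  ( cd "$2" && tar -xpf "$1" )
}
```

Taking export_ocr in the same window as backup_home for ORA_CRS_HOME keeps the OCR contents consistent with the binaries they will be restored next to.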

Problem 3:

Unable to start the cluster/database after restoring the 10.2.0.4 ORA_CRS_HOME.
Running services were not terminated after crs stop:

root@Racdbdr1 # ps -ef | grep init
root 1 0 0 10:09:55 ? 0:01 /sbin/init
root 27379 2607 0 10:53:41 ? 0:00 /bin/sh /etc/init.d/init.cssd oclsomon
root 20430 1 0 10:43:13 ? 0:00 /bin/sh /etc/init.d/init.crsd run
root 2900 2607 0 10:13:07 ? 0:00 /bin/sh /etc/init.d/init.cssd oprocd
root 2607 1 0 10:13:05 ? 0:11 /bin/sh /etc/init.d/init.cssd fatal
root 3034 1 0 11:03:28 ? 0:00 /bin/sh /etc/init.d/init.evmd run
root 2955 2607 0 10:13:08 ? 0:00 /bin/sh /etc/init.d/init.cssd daemon
root 5119 2036 0 11:06:53 pts/1 0:00 grep init
root@Racdbdr1 # ps -ef | grep d.bin
root 20526 20430 0 10:43:14 ? 0:05 /oracle/clusterware/crs/oracle/product/10.2.0/bin/crsd.bin restart
oracle 3154 3153 0 11:03:30 ? 0:02 /oracle/clusterware/crs/oracle/product/10.2.0/bin/evmd.bin
oracle 3118 2955 0 10:13:08 ? 0:08 /oracle/clusterware/crs/oracle/product/10.2.0/bin/ocssd.bin
root 3051 2900 0 10:13:08 ? 0:00 /oracle/clusterware/crs/oracle/product/10.2.0/bin/oprocd.bin run -t 1000 -m 500

ocssd.log:
---------
[ CSSD]2011-10-16 11:34:13.634 [14] >TRACE: clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2011-10-16 11:34:13.634 [14] >ERROR: clssgmclientlsnr: listening failed for (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1)) (3)
[ CSSD]2011-10-16 11:34:13.634 [14] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2011-10-16 11:34:13.634 [14] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_Racdbdr1_crs))

System: Racdbdr1
Problem description:
After Problem 1, the old (10.2.0.4) ORA_CRS_HOME was restored to restart the installation.
After the CRS home restoration, the CRS/CSS/EVM services would not start.
Action Performed:
How to Downgrade/Remove Oracle Clusterware (CRS) Patchset Software (Doc ID 754095.1)

 1.       CRS stopped with “crsctl stop crs”
 2.       Server rebooted
 3.       Cluster de-installed and rebuilt:

Removed the socket files in /tmp/.oracle or /var/tmp/.oracle and rebooted the OS
Stopped Clusterware on both nodes
Ran $ORA_CRS_HOME/install/rootdelete.sh on both nodes
Ran $ORA_CRS_HOME/install/rootdeinstall.sh on the first node
Ran $ORA_CRS_HOME/root.sh on node 1 and then on node 2

 4.       OCR restored from the pre-activity backup

ocrconfig -showbackup
ocrconfig -restore <ocr_file_name>
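Step 3 removes the socket files, and the ocssd.log above shows CSS failing to listen on its IPC socket with "Permission denied". A minimal sketch that clears those directories only when no CRS daemon is left, assuming the daemon names from the ps output above; crs_leftovers and clear_sockets are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: refuse to clear CRS IPC socket files while any CRS daemon is alive.
# Daemon names are the ones seen in the "ps -ef" output above. On Solaris 10
# use /usr/xpg4/bin/grep, since /usr/bin/grep does not support -E.

# crs_leftovers: filter "ps -ef" output (stdin) down to CRS daemon lines
crs_leftovers() {
  grep -E '/(crsd|ocssd|evmd|oprocd)\.bin'
}

# clear_sockets DIR...: delete the files in each socket directory
clear_sockets() {
  if ps -ef | crs_leftovers >/dev/null; then
    echo "CRS daemons still running; stop CRS before clearing sockets" >&2
    return 1
  fi
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    echo "clearing $dir"
    rm -f "$dir"/*
  done
}
```

Typical use, as root after crsctl stop crs: clear_sockets /var/tmp/.oracle /tmp/.oracle. The init.cssd/init.crsd/init.evmd shell wrappers stay in the process list because init respawns them; only the *.bin daemons are checked.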

Tried to start the cluster on node 1:

$ORA_CRS_HOME/root.sh
WARNING: directory '/oracle' is not owned by root
"/dev/rdsk/c2t40d2s5" does not exist. Create it before proceeding.
Make sure that this file is shared across cluster nodes.
"/dev/rdsk/c2t40d3s5" does not exist. Create it before proceeding.
Make sure that this file is shared across cluster nodes.
"/dev/rdsk/c2t40d4s5" does not exist. Create it before proceeding.
Make sure that this file is shared across cluster nodes.

Action Performed:
Verified $CRS_HOME/install/paramfile.crs and updated $CRS_HOME/install/rootconfig accordingly. Original values:

ORA_CRS_HOME=/oracle/clusterware/crs/oracle/product/10.2.0
CRS_ORACLE_OWNER=oracle
CRS_DBA_GROUP=oinstall
CRS_VNDR_CLUSTER=false
CRS_OCR_LOCATIONS=/dev/rdsk/c2t40d0s5,/dev/rdsk/c2t40d1s5
CRS_CLUSTER_NAME=crs
CRS_HOST_NAME_LIST=Racdbdr1,1,Racdbdr2,2
CRS_NODE_NAME_LIST=Racdbdr1,1,Racdbdr2,2
CRS_PRIVATE_NAME_LIST=Racdbdr1-priv,1,Racdbdr2-priv,2
CRS_LANGUAGE_ID='AMERICAN_AMERICA.WE8ISO8859P1'
CRS_VOTING_DISKS=/dev/rdsk/c2t40d2s5,/dev/rdsk/c2t40d3s5,/dev/rdsk/c2t40d4s5
CRS_NODELIST=Racdbdr1,Racdbdr2
CRS_NODEVIPS='Racdbdr1/Racdbdr1-vip/255.255.255.192/ce0,Racdbdr2/Racdbdr2-vip/255.255.255.192/ce0'


Corrected to:

root@Racdbdr1 # cat /oracle/clusterware/crs/oracle/product/10.2.0/bin/install/paramfile.crs
ORA_CRS_HOME=/oracle/clusterware/crs/oracle/product/10.2.0
CRS_ORACLE_OWNER=root
CRS_DBA_GROUP=oinstall
CRS_VNDR_CLUSTER=false
CRS_OCR_LOCATIONS=/dev/rdsk/ocr15,/dev/rdsk/ocr25
CRS_CLUSTER_NAME=crs
CRS_HOST_NAME_LIST=Racdbdr1,1,Racdbdr2,2
CRS_NODE_NAME_LIST=Racdbdr1,1,Racdbdr2,2
CRS_PRIVATE_NAME_LIST=Racdbdr1-priv,1,Racdbdr2-priv,2
CRS_LANGUAGE_ID='AMERICAN_AMERICA.WE8ISO8859P1'
CRS_VOTING_DISKS=/dev/rdsk/ocr35,/dev/rdsk/ocr45,/dev/rdsk/ocr55
CRS_NODELIST=Racdbdr1,Racdbdr2
CRS_NODEVIPS='Racdbdr1/Racdbdr1-vip,Racdbdr2/Racdbdr2-vip'
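Before rerunning root.sh it is worth confirming that every OCR and voting-disk path in paramfile.crs exists on the node, since that is exactly what root.sh complained about earlier. A minimal sketch; check_crs_devices is a hypothetical helper, and the parsing assumes the KEY=value,value layout shown in the file above.

```shell
#!/bin/sh
# Sketch: check that every OCR and voting-disk path in paramfile.crs exists.
# check_crs_devices PARAMFILE -> one OK/MISSING line per path;
# returns non-zero if any path is missing.
check_crs_devices() {
  missing=0
  for key in CRS_OCR_LOCATIONS CRS_VOTING_DISKS; do
    # Take the value after "KEY=" and drop any single quotes around it.
    list=$(sed -n "s/^$key=//p" "$1" | tr -d "'")
    old_ifs=$IFS
    IFS=,
    for dev in $list; do
      # -e follows symlinks, so a dangling /dev/rdsk link counts as missing.
      if [ -e "$dev" ]; then
        echo "OK $key $dev"
      else
        echo "MISSING $key $dev"
        missing=1
      fi
    done
    IFS=$old_ifs
  done
  [ "$missing" -eq 0 ]
}
```

For example: check_crs_devices $ORA_CRS_HOME/install/paramfile.crs on each node before root.sh; any MISSING line points to a device name that differs between the file and the node.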
Output of the rebuild commands:
----------------rootdelete.sh--------------------
root@Racdbdr1 # $ORA_CRS_HOME/install/rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources. This could take several minutes.
Error while stopping resources. Possible cause: CRSD is down.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/var/opt/oracle/scls_scr'
root@Racdbdr1 #
-----------------------------rootdeinstall.sh----------------------------------
root@Racdbdr1 # rootdeinstall.sh
Removing contents from OCR mirror device
2560+0 records in
2560+0 records out
Removing contents from OCR device
2560+0 records in
2560+0 records out
-------------------root.sh----------------------------------
root@Racdbdr1 # $ORA_CRS_HOME/root.sh
WARNING: directory '/oracle' is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/oracle' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: Racdbdr1 Racdbdr1-priv Racdbdr1
node 2: Racdbdr2 Racdbdr2-priv Racdbdr2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/rdsk/ocr35
Now formatting voting device: /dev/rdsk/ocr45
Now formatting voting device: /dev/rdsk/ocr55
Format of 3 voting devices complete.
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        Racdbdr1
CSS is inactive on these nodes.
        Racdbdr2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
root@Racdbdr1 #

Problem 4:

Unable to start Oracle RDBMS services as the oracle user

oracle@Racdbdr1$crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....dr1.gsd application    ONLINE    ONLINE    Racdbdr1 
ora....dr1.ons application    ONLINE    ONLINE    Racdbdr1 
ora....dr1.vip application    ONLINE    ONLINE    Racdbdr1 
ora....SM2.asm application    ONLINE    UNKNOWN   Racdbdr2 
ora....dr2.gsd application    ONLINE    ONLINE    Racdbdr2 
ora....dr2.ons application    ONLINE    ONLINE    Racdbdr2 
ora....dr2.vip application    ONLINE    ONLINE    Racdbdr2 

oracle@Racdbdr1$srvctl start  asm -n Racdbdr1 -i +ASM1 -o mount
PRKS-1009 : Failed to start ASM instance "+ASM1" on node "Racdbdr1", [CRS-1028: Dependency analysis failed because of:
CRS-0223: Resource 'ora.Racdbdr1.ASM1.asm' has placement error.]

System: Racdbdr1
Problem description:
After rebuilding the cluster, we were unable to add resources such as the database, TNS listener and instances to the cluster services as the oracle OS user.
Action Performed:
As root, ran srvctl remove and srvctl add to remove the already-added resources and register them with the cluster again.

root@Racdbdr1 # crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.DRNMS.db   application    OFFLINE   OFFLINE              
ora....s1.inst application    ONLINE    UNKNOWN   Racdbdr1 
ora....s2.inst application    ONLINE    OFFLINE              
ora....SM1.asm application    ONLINE    ONLINE    Racdbdr1 
ora....R1.lsnr application    ONLINE    ONLINE    Racdbdr1 
ora....dr1.gsd application    ONLINE    ONLINE    Racdbdr1 
ora....dr1.ons application    ONLINE    ONLINE    Racdbdr1 
ora....dr1.vip application    ONLINE    ONLINE    Racdbdr1 
ora....SM2.asm application    ONLINE    ONLINE    Racdbdr2 
ora....R2.lsnr application    ONLINE    ONLINE    Racdbdr2 
ora....dr2.gsd application    ONLINE    ONLINE    Racdbdr2 
ora....dr2.ons application    ONLINE    ONLINE    Racdbdr2 
ora....dr2.vip application    ONLINE    ONLINE    Racdbdr2
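Resources stuck in UNKNOWN or OFFLINE while their target is ONLINE, as in both listings above, are the ones to chase. A minimal sketch that filters crs_stat -t output down to those rows, assuming the five-column layout shown here (Host is blank for offline resources); crs_mismatches is a hypothetical helper name. On Solaris 10, nawk may be needed instead of /usr/bin/awk.

```shell
#!/bin/sh
# Sketch: print "name target state host" for every resource whose State
# differs from its Target in "crs_stat -t" output read from stdin.
crs_mismatches() {
  # Skip the header and dashed separator; "-" stands in for a blank Host.
  awk 'NR > 2 && NF >= 4 && $3 != $4 { print $1, $3, $4, (NF >= 5 ? $5 : "-") }'
}
```

Typical use: crs_stat -t | crs_mismatches. Note that -t truncates resource names (ora....SM2.asm); plain crs_stat prints the full names needed for srvctl or crs_start.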

Problem 5:

Unable to execute root102.sh after 10.2.0.5 CRS patching

root@Racdbdr1 # /oracle/clusterware/crs/oracle/product/10.2.0/install/root102.sh
WARNING: directory '/oracle' is not owned by root
Preparing to recopy patched init and RC scripts.
Recopying init and RC scripts.
Startup will be queued to init within 30 seconds.
Starting up the CRS daemons.
Waiting for the patched CRS daemons to start.
  This may take a while on some systems.
.
.
.
.
Timed out waiting for the CRS daemons to start. Look at the
system message file and the CRS log files for diagnostics.

System: Racdbdr1
Problem description:
While running the post-patch script after applying the 10.2.0.5 CRS patch, we were unable to start the cluster.
Action Performed:

1. Stopped the cluster (crsctl stop crs -f), removed /var/tmp/.oracle and started the cluster on both nodes.
2. Changed ownership from root:oinstall to oracle:oinstall for the directories below:
chown -R oracle:oinstall oui install ccr inventory odbc log diagnostics OPatch jre JRE

root@Racdbdr1 # root102.sh
WARNING: directory '/oracle' is not owned by root
Preparing to recopy patched init and RC scripts.
Recopying init and RC scripts.
Startup will be queued to init within 30 seconds.
Starting up the CRS daemons.
Waiting for the patched CRS daemons to start.
  This may take a while on some systems.
.
10205 patch successfully applied.
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully deleted 1 values from OCR.
Successfully deleted 1 keys from OCR.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: Racdbdr1 Racdbdr1-priv Racdbdr1
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
clscfg -upgrade completed successfully
Creating '/oracle/clusterware/crs/oracle/product/10.2.0/install/paramfile.crs' with data used for CRS configuration
Setting CRS configuration values in /oracle/clusterware/crs/oracle/product/10.2.0/install/paramfile.crs
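Step 2 of the fix above changed ownership by hand. A minimal sketch for re-checking it afterwards, listing anything under the named directories not owned by the expected user and group; check_owner is a hypothetical helper, and it assumes the directories from the chown command live directly under ORA_CRS_HOME.

```shell
#!/bin/sh
# Sketch: list entries under selected home subdirectories whose owner or
# group differs from the expected pair (oracle:oinstall in this post).
# check_owner HOME OWNER GROUP DIR...
check_owner() {
  home=$1
  owner=$2
  group=$3
  shift 3
  for d in "$@"; do
    # Skip names that are not present in this particular home.
    [ -e "$home/$d" ] || continue
    find "$home/$d" \( ! -user "$owner" -o ! -group "$group" \) -print
  done
}
```

Example: check_owner "$ORA_CRS_HOME" oracle oinstall oui install ccr inventory odbc log diagnostics OPatch jre JRE. No output means everything matched; anything root.sh deliberately leaves owned by root will also be listed, so treat the output as a list to review.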
