Wednesday, April 24, 2013

ORA-07286: sksagdi: cannot obtain device information



Error:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 04/23/2013 18:27:34
ORA-19870: error while restoring backup piece archive_3840_1_813515387
ORA-07286: sksagdi: cannot obtain device information.
Linux-x86_64 Error: 2: No such file or directory

When:

While restoring archive logs from Veritas NetBackup using RMAN on the database server

Background:

1. Two-node Oracle 11g RAC on Linux
2. Veritas NetBackup configured on both nodes
3. Full backup & archive backup policies scheduled in Veritas
4. Veritas uses a storage media server instead of tape to store the RMAN backup sets
5. The Veritas backup policies call RMAN scripts supplied with the Veritas software binaries & modified as per the DB setup
6. The RMAN scripts use a multinode channel method so that backup/restore stays highly available
7. RMAN creates channels to each node during full DB & archive backups

Archive Backup command:

CMD_STR="
ORACLE_HOME=$ORACLE_HOME
export ORACLE_HOME
ORACLE_SID=$ORACLE_SID
export ORACLE_SID
$RMAN target $TARGET_CONNECT_STR nocatalog msglog $RMAN_LOG_FILE append << EOF
RUN {
sql 'alter system switch logfile';
# backup all archive logs
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE'  CONNECT 'sys/Oracle_123@REMCORP1' ;
ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE'  CONNECT 'sys/Oracle_123@REMCORP2' ;
BACKUP
   filesperset 20
   FORMAT 'archive_%s_%p_%t'
   ARCHIVELOG ALL DELETE INPUT ;
RELEASE CHANNEL ch00;
RELEASE CHANNEL ch01;
}
EOF
"


Archives restore command:

CMD_STR="
ORACLE_HOME=$ORACLE_HOME
export ORACLE_HOME
ORACLE_SID=$ORACLE_SID
export ORACLE_SID
$RMAN target $TARGET_CONNECT_STR nocatalog msglog $RMAN_LOG_FILE append << EOF
RUN {
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE'  CONNECT 'sys/Oracle_123@REMCORP1' ;
ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE'  CONNECT 'sys/Oracle_123@REMCORP2' ;
set archivelog destination to '/dump/backup/logmnr';
restore archivelog from logseq 5948 until logseq 5949 thread 1;
RELEASE CHANNEL ch00;
RELEASE CHANNEL ch01;
}
EOF
"

Cause:

During restoration, RMAN was looking for the backup handle “archive_3840_1_813515387”, which was part of the backup piece /STU1/remedy-ebu-db2_1366714776_C1_F1 stored on the media server. (Per the FORMAT 'archive_%s_%p_%t' mask above, the handle encodes backup set 3840, piece 1, and timestamp 813515387.)

Below is the truncated output of LIST BACKUP, used to identify the backup set required to restore the particular archive log from the backup media server to the local file system.

RMAN> List backup;

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
3759    1.08G      SBT_TAPE    00:00:29     23-APR-13
        BP Key: 3759   Status: AVAILABLE  Compressed: NO  Tag: TAG20130423T162946
        Handle: archive_3840_1_813515387   Media: /STU1/remedy-ebu-db2_1366714776_C1_F1

  List of Archived Logs in backup set 3759
  Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
  ---- ------- ---------- --------- ---------- ---------
  1    5948    7845989896085 23-APR-13 7845991024579 23-APR-13
  1    5949    7845991024579 23-APR-13 7845991026707 23-APR-13
  2    3375    7845988715315 23-APR-13 7845991027317 23-APR-13

Channels to both nodes are allocated during backup, which helps the restore succeed if one of the nodes is down during recovery.

Because the backup ran with multinode channels, the restore command was likewise formed with channels connecting to the two instances.

If no archivelog destination is supplied in the restore command, RMAN restores the archives to the current database archive destination (here, in ASM) by default.
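To see where the archives would land by default, check the instance's current archive destination from SQL*Plus:

SQL> archive log list

The restore command used in this case was: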

RUN {
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE'  CONNECT 'sys/Oracle_123@REMCORP1' ;
ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE'  CONNECT 'sys/Oracle_123@REMCORP2' ;
set archivelog destination to '/dump/backup/logmnr';
restore archivelog from logseq 5948 until logseq 5949 thread 1;
RELEASE CHANNEL ch00;
RELEASE CHANNEL ch01;
}

Since we specified set archivelog destination to '/dump/backup/logmnr'; along with channels restoring the backup from the media device, RMAN assumes the location /dump/backup/logmnr is shared across the two nodes.

Because /dump/backup/logmnr is in fact not on shared storage, the channel on the other node cannot find the archive location to restore into, resulting in ORA-19870: error while restoring backup piece archive_3840_1_813515387.

With multinode channeling, whichever instance's channel originally backed up the piece /STU1/remedy-ebu-db2_1366714776_C1_F1 tries to retrieve & restore the archives to that instance's local path; when the supplied archive destination is not shared, the retrieval fails with ORA-07286: sksagdi: cannot obtain device information.

Workaround/Solution:

The solution is to use a shared archive destination across the RAC instances, such as an ASM disk group or an ACFS shared file system (11gR2), so that multinode channels keep working during restoration.
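
For example, with a destination in a shared ASM disk group, both channels can write to it (a sketch; '+DATA1' here is a placeholder for a disk group mounted on both instances):

RUN {
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE'  CONNECT 'sys/Oracle_123@REMCORP1' ;
ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE'  CONNECT 'sys/Oracle_123@REMCORP2' ;
set archivelog destination to '+DATA1';
restore archivelog from logseq 5948 until logseq 5949 thread 1;
RELEASE CHANNEL ch00;
RELEASE CHANNEL ch01;
}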

The workaround is to remove the multinode channels & run the restore with a single channel from only the node that took the backup of the particular set of archives to be restored.

Make sure the archive destination specified is accessible.

Also make sure the media, such as the storage disk or tape, is available.

e.g.

RUN {
ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE'  CONNECT 'sys/Oracle_123@REMCORP2' ;
set archivelog destination to '/dump/backup/logmnr';
restore archivelog from logseq 5948 until logseq 5949 thread 1;
RELEASE CHANNEL ch01;
}
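
Before re-running the restore, a quick sanity check on the node that owns the channel confirms the destination exists, is writable, and has space (paths from this example):

$ ls -ld /dump/backup/logmnr
$ df -h /dump/backup/logmnr
$ touch /dump/backup/logmnr/t.tmp && rm /dump/backup/logmnr/t.tmp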

Wednesday, April 17, 2013

ORA-30012: undo tablespace 'UNDOTBS2' does not exist or of wrong type



Error:

SQL> startup
ORACLE instance started.
Total System Global Area 2221395968 bytes
Fixed Size                  2230272 bytes
Variable Size            1325402112 bytes
Database Buffers          889192448 bytes
Redo Buffers                4571136 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-30012: undo tablespace 'UNDOTBS2' does not exist or of wrong type
Process ID: 23812
Session ID: 3 Serial number: 3

Background:

This error appeared after adding an undo tablespace for the second RAC node.
The DB was getting mounted, but OPEN DATABASE was throwing ORA-30012.

Cause/Possible Solution:

1. Did not use DBCA?

A database created without DBCA may face this issue, so if feasible, create the database instance using DBCA rather than with the CREATE DATABASE command or through restoration.

2. Undo tablespace missing

The undo tablespace itself is missing from the database.
In RAC, every instance has its own undo tablespace.
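
Each instance's undo_tablespace parameter must point to an undo tablespace that actually exists; a quick cross-instance check from any open instance:

SQL> select inst_id, value from gv$parameter where name = 'undo_tablespace';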

In Open mode from other RAC instance,

SQL>select TABLESPACE_NAME ,BLOCK_SIZE,INITIAL_EXTENT,MAX_EXTENTS,STATUS,CONTENTS,EXTENT_MANAGEMENT,SEGMENT_SPACE_MANAGEMENT from dba_tablespaces where TABLESPACE_NAME like 'UNDO%';

TABLESPACE_NAME BLOCK_SIZE INITIAL_EXTENT MAX_EXTENTS STATUS    CONTENTS  EXTENT_MAN SEGMEN
--------------- ---------- -------------- ----------- --------- --------- ---------- ------
UNDOTBS1              8192          65536  2147483645 ONLINE    UNDO      LOCAL      MANUAL
UNDOTBS2              8192          65536  2147483645 ONLINE    UNDO      LOCAL      MANUAL

From Mount mode,

SQL>select * from v$tablespace where NAME like 'UNDO%';

       TS# NAME                           INC BIG FLA ENC
---------- ------------------------------ --- --- --- ---
         2 UNDOTBS1                       YES NO  YES
         7 UNDOTBS2                       YES NO  YES

3. Undo tablespace created is not of UNDO type

You created the undo tablespace without the "undo" keyword, as below; such a tablespace behaves like any other user tablespace, with CONTENTS of PERMANENT rather than UNDO.

Wrong Type:

SQL>create tablespace UNDOTBS2 datafile  '+DATA1_RPT' size 200M;

SQL> select * from dba_tablespaces  where TABLESPACE_NAME like 'UNDO%'  ;

TABLESPACE_NAME                BLOCK_SIZE INITIAL_EXTENT NEXT_EXTENT MIN_EXTENTS MAX_EXTENTS   MAX_SIZE PCT_INCREASE MIN_EXTLEN STATUS    CONTENTS  LOGGING   FOR EXTENT_MAN ALLOCATIO PLU SEGMEN DEF_TAB_ RETENTION   BIG PREDICA ENC COMPRESS_FOR
------------------------------ ---------- -------------- ----------- ----------- ----------- ---------- ------------ ---------- --------- --------- --------- --- ---------- --------- --- ------ -------- ----------- --- ------- --- ------------
UNDOTBS1                             8192          65536                       1  2147483645 2147483645                   65536 ONLINE    UNDO      LOGGING   NO  LOCAL      SYSTEM    NO  MANUAL DISABLED NOGUARANTEE NO  HOST    NO
UNDOTBS2                             8192          65536                       1  2147483645 2147483645                   65536 ONLINE    PERMANENT LOGGING   NO  LOCAL      SYSTEM    NO  AUTO   DISABLED NOT APPLY   NO  HOST    NO
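
If the tablespace was already created the wrong way, drop it before recreating it correctly (be sure nothing else is using it):

SQL> drop tablespace UNDOTBS2 including contents and datafiles;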


Correct Type:

SQL> create undo tablespace UNDOTBS2 datafile '+DATA1_RPT' SIZE 200M;

SQL> select * from dba_tablespaces  where TABLESPACE_NAME like 'UNDO%';

TABLESPACE_NAME                BLOCK_SIZE INITIAL_EXTENT NEXT_EXTENT MIN_EXTENTS MAX_EXTENTS   MAX_SIZE PCT_INCREASE MIN_EXTLEN STATUS    CONTENTS  LOGGING   FOR EXTENT_MAN ALLOCATIO PLU SEGMEN DEF_TAB_ RETENTION   BIG PREDICA ENC COMPRESS_FOR
------------------------------ ---------- -------------- ----------- ----------- ----------- ---------- ------------ ---------- --------- --------- --------- --- ---------- --------- --- ------ -------- ----------- --- ------- --- ------------
UNDOTBS1                             8192          65536                       1  2147483645 2147483645                   65536 ONLINE    UNDO      LOGGING   NO  LOCAL      SYSTEM    NO  MANUAL DISABLED NOGUARANTEE NO  HOST    NO
UNDOTBS2                             8192          65536                       1  2147483645 2147483645                   65536 ONLINE    UNDO      LOGGING   NO  LOCAL      SYSTEM    NO  MANUAL DISABLED NOGUARANTEE NO  HOST    NO

4. Set undo_management to MANUAL

This is just a workaround if none of the above solutions works; it will let you open the database, but plan to set it back to AUTO later.

SQL> alter system set undo_management='MANUAL' scope=spfile sid='*';

(undo_management is a static parameter, so the change goes into the spfile and takes effect after an instance restart.)
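
Once the underlying problem is resolved, the same mechanism switches it back:

SQL> alter system set undo_management='AUTO' scope=spfile sid='*';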


Friday, April 12, 2013

ORA-15021: parameter "remote_dependencies_mode" is not valid in asm instance

Error:

RMAN-00554: initialization of internal recovery manager package failed
RMAN-12001: could not open channel default
RMAN-10008: could not create channel context
RMAN-10002: ORACLE error: ORA-15021: parameter "remote_dependencies_mode" is not valid in asm instance
RMAN-10006: error running SQL statement: alter session set remote_dependencies_mode = signature


When: While trying to connect to RMAN using rman target /

Cause: 

The connection was attempted against the wrong instance type.
It should be an RDBMS instance; here an attempt was made to connect to a non-RDBMS instance (ASM).

Solution:

Verify the environment variables; they should point to the RDBMS instance.
Run the .profile of the database instance & retry.
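
A quick way to see what the current environment points at (if ORACLE_SID shows something like +ASM1, you are in the ASM environment):

$ echo $ORACLE_SID
$ echo $ORACLE_HOME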


$ . ./.bash_profile_ora
$ rman target /
Recovery Manager: Release 11.2.0.3.0 - Production on Fri Apr 12 19:13:34 2013
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
connected to target database: CRAMPRD (DBID=3112183348, not open)
RMAN> exit

Recovery Manager complete.

ORA-01156: recovery or flashback in progress may need access to files

Error:

ORA-01156: recovery or flashback in progress may need access to files

When: While adding standby redo log files to the standby database

Solution:

1. Cancel the recovery

alter database recover managed standby database cancel ;

2. Add standby redo log file 

alter database add standby logfile group 11 ('+DATA1','+DATA1') size 1G;

3. Restart the recovery

alter database recover managed standby database disconnect from session;
 





DEMO:


SQL> alter database add standby logfile group 9 ('+DATA1','+DATA1') size 1G;
alter database add standby logfile group 9 ('+DATA1','+DATA1') size 1G
*
ERROR at line 1:
ORA-01156: recovery or flashback in progress may need access to files



SQL> alter database recover managed standby database cancel ;

Database altered.

SQL> alter database add standby logfile group 9 ('+DATA1','+DATA1') size 1G;

Database altered.

SQL> alter database add standby logfile group 10 ('+DATA1','+DATA1') size 1G;

Database altered.

SQL> alter database add standby logfile group 11 ('+DATA1','+DATA1') size 1G;

Database altered.


SQL> select * from v$standby_log;
    GROUP# DBID                                        THREAD#  SEQUENCE#      BYTES  BLOCKSIZE       USED ARC STATUS     FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME LAST_CHANGE# LAST_TIME
---------- ---------------------------------------- ---------- ---------- ---------- ---------- ---------- --- ---------- ------------- --------- ------------ --------- ------------ ---------
         9 UNASSIGNED                                        0          0 1073741824        512          0 YES UNASSIGNED
        10 UNASSIGNED                                        0          0 1073741824        512          0 YES UNASSIGNED
        11 UNASSIGNED                                        0          0 1073741824        512          0 YES UNASSIGNED


SQL>  alter database recover managed standby database disconnect from session;

Database altered.




Note:
+DATA1 is the ASM disk group specified here; if, in your case, the destination is a local or storage file system, specify the absolute path for the redo log file & ensure free space is available in the disk group or file system.

Use the query below to check free space across the ASM disks:

set line 900
set pages 900
col NAME for a20
col PATH for a50
select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,STATE,OS_MB,TOTAL_MB,FREE_MB,HOT_USED_MB,COLD_USED_MB,NAME,PATH,CREATE_DATE
from v$asm_disk;
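
For a per-diskgroup summary of free space, v$asm_diskgroup is more direct:

set line 200
col NAME for a20
select NAME, STATE, TOTAL_MB, FREE_MB, USABLE_FILE_MB from v$asm_diskgroup;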

 

Sunday, April 7, 2013

PRVF-9652 : Cluster Time Synchronization Services check failed

Error:

NTP daemon slewing option check failed on some nodes
Check failed on nodes:
PRVF-5436 : The NTP daemon running on one or more nodes lacks the slewing option "-x"
Clock synchronization check using Network Time Protocol(NTP) failed
PRVF-9652 : Cluster Time Synchronization Services check failed


When :

Error during the cluster verification check while installing 11g Grid Infrastructure (11.2.0.3) on Linux

Analysis:

During GI installation, cluster verification failed without showing much detail in the GUI, so $GRID_HOME/cfgtoollogs/oui/installActions*.log was mined, where the cluster verification check failure errors were found.

The installation itself was not impacted by this error. After closing the GUI, which showed partially successful completion of the installation, cluster verification was invoked manually as below to verify the root cause.


$cluvfy stage -post crsinst -n inpudicrmrdbprtrpt1,inpudicrmrdbprtrpt2
 
Performing post-checks for cluster services setup
Checking node reachability...
Node reachability check passed from node "inpudicrmrdbprtrpt1"
Checking user equivalence...
User equivalence check passed for user "ora11g"
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
Check: Node connectivity for interface "bond0"
...
 Checking CSS daemon...
Oracle Cluster Synchronization Services appear to be online.
Cluster manager integrity check passed
Default user file creation mask check passed
Checking cluster integrity...
Cluster integrity check passed
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ASM Running check passed. ASM is running on all specified nodes
Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful
Disk group for ocr location "+OCR1" available on all the nodes
NOTE:
 

...
ONS node application check passed
Checking Single Client Access Name (SCAN)...
Checking TCP connectivity to SCAN Listeners...
...
CTSS resource check passed
Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed
Check CTSS state started...

CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...
 

NTP Configuration file check started...
NTP Configuration file check passed

Checking daemon liveness...
Liveness check passed for "ntpd"
Check for NTP daemon or service alive passed on all nodes

NTP daemon slewing option check failed on some nodes
Check failed on nodes:
    inpudicrmrdbprtrpt2,inpudicrmrdbprtrpt1
PRVF-5436 : The NTP daemon running on one or more nodes lacks the slewing option "-x"
Clock synchronization check using Network Time Protocol(NTP) failed

PRVF-9652 : Cluster Time Synchronization Services check failed
Checking VIP configuration.
Checking VIP Subnet configuration.
Check for VIP Subnet configuration passed.
Checking VIP reachability
Check for VIP reachability passed.

Post-check for cluster services setup was unsuccessful on all the nodes.


The clock synchronization check between the two nodes using NTP failed.

The clock sync component was then verified on its own using cluvfy comp, as below:

$cluvfy comp clocksync -n all

Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
CTSS resource check passed
Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed
Check CTSS state started...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
NTP Configuration file check passed

Checking daemon liveness...
Liveness check passed for "ntpd"
Check for NTP daemon or service alive passed on all nodes

NTP daemon slewing option check failed on some nodes
Check failed on nodes:
    inpudicrmrdbprtrpt2,inpudicrmrdbprtrpt1
PRVF-5436 : The NTP daemon running on one or more nodes lacks the slewing option "-x"
Clock synchronization check using Network Time Protocol(NTP) failed
PRVF-9652 : Cluster Time Synchronization Services check failed
Verification of Clock Synchronization across the cluster nodes was unsuccessful on all the specified nodes. 


The ntpd service status was verified:

$  service ntpd status
ntpd (pid  5513) is running...

$ ps -ef|grep ntp
ntp       5513     1  0 Mar13 ?        00:00:02 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ora11g   13846 10941  0 12:35 pts/0    00:00:00 grep ntp


$ grep OPTIONS /etc/sysconfig/ntpd
OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid -g"

The NTP daemon was found to be running with the -g option rather than -x, while the error itself suggests it should be running in -x (slewing) mode. With -x, the system clock is never stepped backward; it is slewed gradually instead, which is what the clusterware relies on to keep the cluster node times in sync.

PRVF-5436 : The NTP daemon running on one or more nodes lacks the slewing option "-x"


$ more /etc/sysconfig/ntpd
# Drop root to id 'ntp:ntp' by default.
OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid -g"


The /etc/sysconfig/ntpd file was modified on all cluster nodes to use the -x option, as below:

$ more /etc/sysconfig/ntpd
# Drop root to id 'ntp:ntp' by default.
OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid -x"


The ntpd service was then restarted:

$ sudo service ntpd stop
Shutting down ntpd:                                        [  OK  ]

$ sudo service ntpd start
Starting ntpd:                                             [  OK  ]

$ ps -ef| grep -i ntpd
ntp    14886     1  0 12:38 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -x
ora11g   14889 10941  0 12:38 pts/0    00:00:00 grep -i ntpd


After the NTP configuration file change, the cluster verification utility was re-executed:

$cluvfy comp clocksync -n all

Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
CTSS resource check passed
Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed
Check CTSS state started...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
NTP Configuration file check passed


Checking daemon liveness...
Liveness check passed for "ntpd"
Check for NTP daemon or service alive passed on all nodes
NTP daemon slewing option check passed
NTP daemon's boot time configuration check for slewing option passed

NTP common Time Server Check started...
Check of common NTP Time Server passed
Clock time offset check from NTP Time Server started...
Clock time offset check passed

Clock synchronization check using Network Time Protocol(NTP) passed

Oracle Cluster Time Synchronization Services check passed
Verification of Clock Synchronization across the cluster nodes was successful. 



$cluvfy stage -post crsinst -n inpudicrmrdbprtrpt1,inpudicrmrdbprtrpt2

Performing post-checks for cluster services setup
Checking node reachability...
Node reachability check passed from node "inpudicrmrdbprtrpt1"
Checking user equivalence...
User equivalence check passed for user "ora11g"
...
Check of multicast communication passed.
Time zone consistency check passed
Checking Oracle Cluster Voting Disk configuration...
ASM Running check passed. ASM is running on all specified nodes
Oracle Cluster Voting Disk configuration check passed
Checking Cluster manager integrity...
Checking CSS daemon...
Oracle Cluster Synchronization Services appear to be online.
Cluster manager integrity check passed
...
Checking node application existence...
Checking existence of VIP node application (required)
VIP node application check passed
Checking existence of NETWORK node application (required)
NETWORK node application check passed
Checking existence of GSD node application (optional)
GSD node application is offline on nodes "inpudicrmrdbprtrpt2,inpudicrmrdbprtrpt1"
Check CTSS state started...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
NTP Configuration file check passed
Checking daemon liveness...
Liveness check passed for "ntpd"
Check for NTP daemon or service alive passed on all nodes
NTP daemon slewing option check passed
NTP daemon's boot time configuration check for slewing option passed
NTP common Time Server Check started...
Check of common NTP Time Server passed
Clock time offset check from NTP Time Server started...
Clock time offset check passed
Clock synchronization check using Network Time Protocol(NTP) passed
Oracle Cluster Time Synchronization Services check passed
Checking VIP configuration.
Checking VIP Subnet configuration.
Check for VIP Subnet configuration passed.
Checking VIP reachability
Check for VIP reachability passed.
Post-check for cluster services setup was successful.



Solution:

Modify the /etc/sysconfig/ntpd file on all cluster nodes, changing

OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid -g"

to

OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid -x"

And restart the ntpd service:


service ntpd stop
service ntpd start
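
If there are several nodes, the edit can be scripted; a minimal sketch, assuming the OPTIONS line matches the one shown above (sed keeps a .bak backup of the file):

sudo sed -i.bak 's/-g"/-x"/' /etc/sysconfig/ntpd
sudo service ntpd restart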