BR0233E Backup utility has reported an error while saving file

Issue: SAP oracle database Archive log backup failing using BR Tool


Error:

BR0233E Backup utility has reported an error while saving file
ERROR: incorrect format or option in .utl file (-p)
BR0231E Backup utility call failed
ERROR: Can't change to working directory Errno = 13 : Permission denied
BR0005I BRARCHIVE terminated with errors

Background/Analysis:

SAP archive backup policy started failing which is using brarchive ( SAP supplied backup utility BR Tool ) command executed via backup policy

As a first action logs & errors identified 

Secondly , brarchive command executed manually from database server which is client to backup media server as below

brarchive -u / -c -d util_file -r /oracle/PRD/102_64/dbs/initPRD-logs.utl -s

Be careful , "-s" flag is for saving archive log files to backup device & "-sd" to save & delete from source post backup to backup device

First error observed was "ERROR: Can't change to working directory:  Errno = 13 : Permission denied " on specific folder mentioned in the error

BR0280I BRARCHIVE time stamp: 2013-03-25 19.29.54
BR0229I Calling backup utility with function 'backup'...
BR0278I Command output of 'backint -u PRD -f backup -i /oracle/PRD/saparch/.aekviixr.lst -t file -p /oracle/PRD/102_64/dbs/initPRD-logs.utl -c':
ERROR: Can't change to working directory: /usr/openv/netbackup/logs/user_ops/sap Errno = 13 : Permission denied
WARNING: sort_backup_type not specified in .utl file
WARNING: default sort_backup_type to <size> for backup
ERROR: incorrect format or option in .utl file (-p)


Action taken was to make such folder available with proper permission using & re-execute the manual backup command


mkdir -p /usr/openv/netbackup/logs/user_ops/sap





Previous error disappeared , but got new error "BR0233E Backup utility has reported an error while saving file"

BR0280I BRARCHIVE time stamp: 2013-03-25 19.57.04
BR0229I Calling backup utility with function 'backup'...
BR0278I Command output of 'backint -u PRD -f backup -i /oracle/PRD/saparch/.aekvilii.lst -t file -p /oracle/PRD/102_64/dbs/initPRD-logs.utl -c':
WARNING: sort_backup_type not specified in .utl file
WARNING: default sort_backup_type to <size> for backup
********************************************************************************
Program:                backint 7.1
Input File:             /oracle/PRD/saparch/.aekvilii.lst
Profile:                /oracle/PRD/102_64/dbs/initPRD-logs.utl
Function:               BACKUP
Backup Type:            BACKUP_FILE
********************************************************************************
BR0233E Backup utility has reported an error while saving file /oracle/PRD/oraarch/PRDarch1_242217_759153586.arcBR0233E Backup utility has reported an error while saving file /oracle/PRD/oraarch/PRDarch1_242218_759153586.arc
BR0233E Backup utility has reported an error while saving file /oracle/PRD/oraarch/PRDarch1_242219_759153586.arc
BR0233E Backup utility has reported an error while saving file /oracle/PRD/oraarch/PRDarch1_242220_759153586.arc
BR0233E Backup utility has reported an error while saving file /oracle/PRD/oraarch/PRDarch1_242221_759153586.arc
BR0233E Backup utility has reported an error while saving file /oracle/PRD/oraarch/PRDarch1_242222_759153586.arc
 


Action taken was  ..
=> Restarted backup client processes (bpcd -standalone & bpcompatd) on db server
=> Ensured backup master to client connectivity 
=> Verified drive availability
=> Verified if any media issue

But no luck !

Finally , Backup got successfully completed after restoring below folders in logs from Backup

cd /usr/openv/netbackup/logs/
ls -lrt
drwxrwxrwx 2 root root  4096 2011-01-13 15:28 bpbackupdrwxrwxrwx 6 root bin   4096 2011-03-06 13:59 user_opsdrwxrwxrwx 2 root root  4096 2011-10-05 12:42 brbackupdrwxrwxrwx 2 root bin   4096 2012-03-06 18:32 nbliveupdrwxr-xr-x 2 root root  4096 2013-03-26 01:45 bpbkardrwxrwxrwx 2 root root  4096 2013-03-26 04:43 backintdrwxrwxrwx 2 root root 12288 2013-03-26 04:44 bphdb 

BR0280I BRARCHIVE time stamp: 2013-03-25 20.35.39
BR0229I Calling backup utility with function 'backup'...
BR0278I Command output of 'backint -u PRD -f backup -i /oracle/PRD/saparch/.aekviotl.lst -t file -p /oracle/PRD/102_64/dbs/initPRD-logs.utl -c':
WARNING: sort_backup_type not specified in .utl file
WARNING: default sort_backup_type to <size> for backup
********************************************************************************
Program:                backint 7.1
Input File:             /oracle/PRD/saparch/.aekviotl.lst
Profile:                /oracle/PRD/102_64/dbs/initPRD-logs.utl
Function:               BACKUP
Backup Type:            BACKUP_FILE
********************************************************************************
[5715.01] 20:35:50 INF - Processing /oracle/PRD/oraarch/PRDarch1_242217_759153586.arc
20:35:51.263 [5715] <1> :  Backup of </oracle/PRD/oraarch/PRDarch1_242217_759153586.arc> is in progress.
[5715.01] 20:35:51 INF - Data buffer size = 262144
[5715.01] 20:35:53 INF - Using Media ID @AAAAD for Backup ID SAPDB_1364223945
[5715.01] 20:35:55 INF - Processing /oracle/PRD/oraarch/PRDarch1_242218_759153586.arc
20:35:55.265 [5715] <1> :  Backup of </oracle/PRD/oraarch/PRDarch1_242218_759153586.arc> is in progress.
[5715.01] 20:35:58 INF - Processing /oracle/PRD/oraarch/PRDarch1_242219_759153586.arc
20:35:59.268 [5715] <1> :  Backup of </oracle/PRD/oraarch/PRDarch1_242219_759153586.arc> is in progress.
[5715.01] 20:36:02 INF - Processing /oracle/PRD/oraarch/PRDarch1_242220_759153586.arc
20:36:02.270 [5715] <1> :  Backup of </oracle/PRD/oraarch/PRDarch1_242220_759153586.arc> is in progress.



Solution:

create below folders in /usr/openv/netbackup/logs/

bpbackup
user_ops
brbackup
nbliveup
bpbkar
backint
bphdb


Lesson Learned:

As above folders got deleted during cleanup operation of logs , cleanup activity should not delete folders & rm should be used ( not rm -rf ) using extension of file

ORA-03106: fatal two-task communication protocol error

Error:

ORA-24330: internal OCI error
ORA-03106: fatal two-task communication protocol error
ORA-03114: not connected to ORACLE


Where: Error logged on oracle 11.2.0.2 client application server. Application based on c/c++ coding. Solaris 10 SPARC

Database 11.2.0.2 compatible, Solaris 10 SPARC

Background:
ORA-24330: internal OCI error & ORA-03114: not connected to ORACLE errors caused due to session disconnection by ORA-03106: fatal two-task communication protocol error

So, if you receive any of these three errors there could be other error strings relevant to it.

First time we observed ORA-24330: internal OCI error & started hunting for resolution, but did not find anything than contact oracle support , However debugging further in logs found other two errors i.e. ORA-03114 & ORA-03106

Showing complete error logs below, where you will find the sql statement & errors

Being DBA It may take many efforts to get all these errors out of thousands of application logs.

Initially we found just OCI Error which took us in wrong direction completely until we (application team) sighted ORA-03106: fatal two-task communication protocol error

[RSP] 139100 SIVTMasterProProfList:1001 _getList_  SAVES=0 < "CCS"  "PRS"  F 1 "18512"  "":0 17000 "Database error: SQL> FETCH SelectCursor USING SQL DESCRIPTOR electBindDesc_."  <  "CCS"  "PRS"  F 1 "18512"  ""  :0 17000 "Database error: SELECT siv_t_mp_p0.sy_createddate, siv_t_mp_p0.rfu9, siv_t_mp_p0.mp_profname, siv_t_mp_p0.rfu8, siv_t_mp_p0.rfu7, siv_t_mp_p0.rfu6, siv_t_mp_p0.rfu5, siv_t_mp_p0.rfu4, siv_t_mp_p0.av_values, siv_t_mp_p0.rfu3, siv_t_mp_p0.rfu2, siv_t_mp_p0.rfu1, siv_t_mp_p0.sy_pendoperation, siv_t_mp_p0.rfu10, siv_t_mp_p0.vpn, siv_t_mp_p0.sy_modifieddate, siv_t_mp_p0.sy_committeddate, siv_t_mp_p0.domain, siv_t_mp_p0.sy_objectid, siv_t_mp_p0.metadata, siv_t_mp_p0.sy_profname, siv_t_mp_p0.sy_changeorderid, siv_t_mp_p0.sy_version, siv_t_mp_p0.sy_objectid FROM siv_t_mp_p siv_t_mp_p0 WHERE siv_t_mp_p0.sy_objectid = 'SIVU0000Fe' AND siv_t_mp_p0.sy_pendoperation != 2 AND (siv_t_mp_p0.sy_changeorderid = 'SYxf004PZK' AND siv_t_mp_p0.sy_version = 1 OR siv_t_mp_p0.sy_version = 0 AND NOT (siv_t_mp_p0.sy_objectid IN (SELECT siv_t_mp_p0.sy_objectid FROM siv_t_mp_p siv_t_mp_p0 WHERE siv_t_mp_p0.sy_changeorderid = 'SYxf004PZK' AND siv_t_mp_p0.sy_version = 1)))."  <  "CCS"  "PRS"  F 1 "18512"  ""  :0 17000 "Database error: ORA-24330: internal OCI error. (Code=-24330) (Position=0)."  <  "CCS"  "ATR"  P 1 "18512"  ""  SIVTMasterProProfList:1001 85010 "Getlist failed. [VPN Access Path Master Profile P List #1001 

[SIVTMasterProProfList:1001]]"  =TOP [->REQ] 139100 SIVTMasterProProfList:1001 _clearList_SAVES=0 <  "CCS"  "PRS"  F 1 "18512"  ""  :0 17000 "Database error: SQL> OPEN GetInfoCursor."  <  "CCS"  "PRS"  F 1 "18512"  ""  :0 17000 "Database error: ORA-03114: not connected to ORACLE. (Code=-3114) (Position=0)."  <  "CCS"  "PRS"  I 1 "18512"  ""  :0 17097 "Database server is no longer accessible, forcing NPA shutdown and restart."  =TOP [EVT->] 138999 LogMessage:1 systemadmin2 CCS PRS 18512 ":0"  0 17000 F 1 "!nouser!"  "Database error: ORA-03114: not connected to ORACLE. (Code=-3114) (Position=0)." [EVT->] 138999 LogMessage:1 systemadmin2 CCS PRS 18512 ":0"  0 17000 F 1 "!nouser!"  "Database error: SQL> OPEN GetInfoCursor."

SIVTMasterProProfList:1002 _getList_  SAVES=0 <  "CCS"  "PRS"  F 1 "29644"  ""  :0 17000 "Database error: SQL> FETCH SelectCursor USING SQL DESCRIPTOR SelectBindDesc_."  < "CCS"  "PRS"  F 1 "29644"  ""  :0 17000 "Database error: SELECT siv_t_mp_p0.sy_createddate, siv_t_mp_p0.rfu9, siv_t_mp_p0.mp_profname, siv_t_mp_p0.rfu8, siv_t_mp_p0.rfu7, siv_t_mp_p0.rfu6, siv_t_mp_p0.rfu5, siv_t_mp_p0.rfu4, siv_t_mp_p0.av_values, siv_t_mp_p0.rfu3, siv_t_mp_p0.rfu2, siv_t_mp_p0.rfu1, siv_t_mp_p0.sy_pendoperation, siv_t_mp_p0.rfu10, siv_t_mp_p0.vpn, siv_t_mp_p0.sy_modifieddate, siv_t_mp_p0.sy_committeddate, siv_t_mp_p0.domain, siv_t_mp_p0.sy_objectid, siv_t_mp_p0.metadata, siv_t_mp_p0.sy_profname, siv_t_mp_p0.sy_changeorderid, siv_t_mp_p0.sy_version, siv_t_mp_p0.sy_objectid FROM siv_t_mp_p siv_t_mp_p0 WHERE siv_t_mp_p0.sy_objectid = 'SIVU0000Fe' AND siv_t_mp_p0.sy_pendoperation != 2 AND (siv_t_mp_p0.sy_changeorderid = 'SYxf004M5G' AND siv_t_mp_p0.sy_version = 1 OR siv_t_mp_p0.sy_version = 0 AND NOT (siv_t_mp_p0.sy_objectid IN (SELECT siv_t_mp_p0.sy_objectid FROM siv_t_mp_p siv_t_mp_p0 WHERE siv_t_mp_p0.sy_changeorderid = 'SYxf004M5G' AND siv_t_mp_p0.sy_version = 1)))."  <  "CCS"  "PRS"  F 1 "29644"  ""  :0 17000 "Database error: ORA-03106: fatal two-task communication protocol error. (Code=-3106) (Position=0)."  <  "CCS"  "ATR"  P 1 "29644"  "" SIVTMasterProProfList:1002 85010 "Getlist failed. [VPN Access Path Master Profile P List #1002 [SIVTMasterProProfList:1002]]"  =TOP
Cause & Analysis:

As I explained earlier getting exact bunch of errors & the actual error out of it was the trick to get into correct direction to solve this problem 

ORA-03106: fatal two-task communication protocol was the actual error & rest are side effects of it causing disconnection of DB session 

Here, I would like to show step by step analysis that we did rather than jumping to solution directly which is at the end. In case the solution is not relevant to your problem you may get some hint or direction to resolve your problem based on analysis shown here below …

Day 1:  Application Team reported error ORA-03114 first time appearing within specific time interval 2-4 PM IST

What we did?  Of course pointed finger to network 

[EVT->] 138999 LogMessage:1 systemadmin2 CCS PRS 18512 ":0"  0 17000 F 1 "!nouser!"  "Database error: ORA-03114: not connected to ORACLE. (Code=-3114) (Position=0)."[EVT->] 138999 LogMessage:1 systemadmin2 CCS PRS 18512 ":0"  0 17000 F 1 "!nouser!"  "Databaseerror: SQL> OPEN GetInfoCursor." 

Issue went in Network and system admin’s bucket as no error was reported in alert log of DB

Day 2: Network team were ready to capture the log to identify what’s going wrong between client & server communications, but Nothing Found suspicious at Network level!

System admin also hold the line of control, they enabled something like tcpdump, packet snooping and other traces, No luck!

Day 3: Yes, it was DBA’s turn on day 3

We did below to capture what’s going wrong between 2-4 PM IST

I. Ensure AWR snaps being taken
II. Enabled 10046 trace for every incoming session from application server host
III. Continuously Monitored TOP cpu/memory/locking session, load on DB & APPS server

What we observed?
    - Load found to be going high on both DB & Apps server between 2-4 PM IST when error occurs
    - Huge COMMIT operation from application as if some batch job being executed
    - RMAN backup was running causing huge IO
    - 5 Queries with poor execution plan including a DML executing almost 50 K time , two of the SELECT queries using /*+ FIRST_ROWS_1*/ which suggests a COMMIT after every row causing high COMMIT rate
    - Log file sync , db sequential read , DB CPU were amongst top 5 wait events

What we did?
   - Highlighted these DB observation to application team for their analysis
   - As log file sync observed , when checked the redo log file it was just 50 MB * 3 redo groups, so we replaced it by 1G * 6 Redo Log group with 2 member each
   - RMAN backup scheduled changed to Night

What we got from application in reply to DB observation is New Error i.e. ORA-24330: internal OCI error along with previous ORA-03114: not connected

Day 4: ORA-24330: internal OCI error made this issue interesting for others (application, network team) as it confirms to be complete oracle issue either at oracle client or database

What we did / found?
    - Logged Sev 1 SR in My Oracle Support (MOS)
    - Relevant to OCI error ORA-24330 found below oracle notes

Effect of client_result_cache_size On Client Applications Or OCI Based Applications (Doc ID 1300727.1)
Bug 10427522 - ORA-24330 "internal OCI error" during DML (Doc ID 10427522.8)

In above oracle notes, it was suggested to disable 11g feature of client result cache by setting client_result_cache_size=0 in parameter file

If value set for client_result_cache_size is greater than 0, then during heavy DML buffer size limited to this value causing OCI error

But client_result_cache_size was already set to zero on db, still we enforced it by adding parameter client_result_cache_size=0 in pfile as db was running with pfile than spfile with a db reboot

At the end of day 4 we were completely dependent & waiting for SR updates for recommendation to get breakthrough

At the end of the day 4, one major challenge was to re-produce the scenario which application team was unable to do it using a sample code and till now exact part of the application code was not identified

MOS expecting us to create a testcase by re-producing the problem using below, but we were unable to do so
make -f demo_rdbms.mk build EXE=testcase OBJS=testcase.o

Day 5: Fortunately last day of Hunt, followed by Happy Weekend

Again error appeared at 3:46 PM IST & this time application team introduced us to new ORA error i.e.
ORA-03106: fatal two-task communication protocol error along with previous ORA-24330 & ORA-03114

It is not like the new errors appearing every day but as I mentioned in beginning, huge application logs to be mined (log created for almost every server process id at application level) took time to identify the ORA- errors which was there right from the beginning of the issues but we got it on day 5, unfortunately (or say fortunately on day 5 at least)

SIVTMasterProProfList:1002 _getList_  SAVES=0 <  "CCS"  "PRS"  F 1 "29644"  ""  :0 17000 "Database error: SQL> FETCH SelectCursor USING SQL DESCRIPTOR SelectBindDesc_."  < "CCS" "PRS"  F 1 "29644"  ""  :0 17000 "Database error: SELECT siv_t_mp_p0.sy_createddate, siv_t_mp_p0.rfu9, siv_t_mp_p0.mp_profname, siv_t_mp_p0.rfu8, siv_t_mp_p0.rfu7, siv_t_mp_p0.rfu6, siv_t_mp_p0.rfu5, siv_t_mp_p0.rfu4, siv_t_mp_p0.av_values, siv_t_mp_p0.rfu3, siv_t_mp_p0.rfu2, siv_t_mp_p0.rfu1, siv_t_mp_p0.sy_pendoperation, siv_t_mp_p0.rfu10, siv_t_mp_p0.vpn, siv_t_mp_p0.sy_modifieddate, siv_t_mp_p0.sy_committeddate, siv_t_mp_p0.domain, siv_t_mp_p0.sy_objectid, siv_t_mp_p0.metadata, siv_t_mp_p0.sy_profname, siv_t_mp_p0.sy_changeorderid, siv_t_mp_p0.sy_version, siv_t_mp_p0.sy_objectid FROM siv_t_mp_p siv_t_mp_p0 WHERE siv_t_mp_p0.sy_objectid = 'SIVU0000Fe' AND siv_t_mp_p0.sy_pendoperation != 2 AND (siv_t_mp_p0.sy_changeorderid = 'SYxf004M5G' AND siv_t_mp_p0.sy_version = 1 OR siv_t_mp_p0.sy_version = 0 AND NOT (siv_t_mp_p0.sy_objectid IN (SELECT siv_t_mp_p0.sy_objectid FROM siv_t_mp_p siv_t_mp_p0 WHERE siv_t_mp_p0.sy_changeorderid = 'SYxf004M5G' AND siv_t_mp_p0.sy_version = 1)))."  <  "CCS"  "PRS"  F 1 "29644"  ""  :0 17000 "Database error: ORA-03106: fatal two-task communication protocol error. (Code=-3106) (Position=0)."  <  "CCS"  "ATR"  P 1 "29644"  "" SIVTMasterProProfList:1002 85010 "Getlist failed. [VPN Access Path Master Profile P List #1002 [SIVTMasterProProfList:1002]]"  =TOP 

Also a hint that error is appearing while selecting rows from blank LOB datatype column value of the table, which application team were able to replicate using a sample pro*C code found on oracle site


Ansidyn1.pc is an Oracle Pro*C program that implements a dynamic SQL interpreter. Using the ANSI embedded SQL syntax, Ansidyn1.pc accepts dynamic SQL at the input command prompt (SQL>

We tried to compile this proc*C code which expects DB Username, Password & db service name as input

proc mode=ansi ansidyn1
Pro*C/C++: Release 11.2.0.2.0 - Production on Sat Mar 23 15:34:31 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
System default option values taken from: /rdbms/11gR2/db_2/precomp/admin/pcscfg.cfg
Error at line 41, column 10 in file ansidyn1.pc 
#include "tt_version.h" 
.........1
PCC-S-02015, unable to open include file

But got so many errors, so we need to rely on application team who handed over complied code file a.out
There were solutions to this compilation error, but i did not tried as got readily available compiled file

./a.out CE ***1234 CEPRD
Connecting as CE@CEPRD
Connected
You are now in a SQL intrepreter shell using ANSI Dynamic SQL.
Please enter SQL DDL and DML statements, eg create table, select, commit etc
SQL> select sraddcmds from sr_cfglet_p;
SRADDCMDS
ANSI sqlstate: 63000:
ORA-03106: fatal two-task communication protocol error
SQL> exit

In the select query used above to re-produce the problem sraddcmds was the CLOB column of table sr_cfglet_p with few empty rows

So the root cause of the problem was error being thrown for LOB empty column values
  

Now, Next question was why is it so ?
With the help of Oracle support we did get the answer i.e. db_securefile=ALWAYS parameter

ALWAYS
All LOBs created in the system are created as SecureFile LOBs. If the LOB is not created in an Automatic Segment Space Managed tablespace, then an error will occur. Any BasicFile LOB storage options are ignored. The SecureFile defaults will be used for all storage options not specified.

PERMITTED (default)
LOBs are allowed to be created as SecureFiles.

If the COMPATIBLE initialization parameter is not set to 11.1 or higher, then LOBs are not treated as SecureFiles.

So action plan was to make set  db_securefile=PERMITTED ,  but huge setback was just by setting a db parameter it was not going to resolve the problem completely as LOB tables we created with ALWAYS  options will throw same error if only db parameter is changed . So post changing db_securefile to PERMITTED it was mandatory to re-create those tables
Changing DB parameter along with tables re-creation was very much effort consuming task, as it involves major structure changes at schema as well at code level

To save these efforts a workaround was suggested to put some data or string "NULL" in all blank columns along with modifying column to NOT NULL constraint & set the DB parameter.
So that new tables will be created under default option of PERMITTED wherein LOBS are allowed to be created as secure files without restriction of ASSM

Due to some issues oracle analyst was unable to re-produce the problem in his setup & he was suspecting one parameter sql92_security=TRUE which is when set to FALSE could resolve the issue, but that did not happen

Bug has been created by MOS as below for this issue which is in progress while I am writing this article and you may get benefited with the solution/patch once made available

Bug 16536665 - ORA-3106 WHEN CLOB NULL AND SQL92_SECURITY=TRUE AND DB_SECUREFILE=ALWAYS

Just before enlisting solution, here are few snaps showing how we came to conclusion that setting parameter PERMITTED will resolve this issue

Test_1.  Tests based on setting db_securefile PERMITTED/ALWAYS at session level to re-produce the problem

When sample table created with session level parameter PERMITTED no error observed, but it does when set to ALWAYS

 At DB:


At Apps: test_lob_2  is created when session level  ALWAYS value was set for db_securefile parameter
  

Test_2.  To test the workaround that blank LOB column row removal or replacing with some value could resolve the issue 

 At DB:


At App:  Yes, it worked. But need to move such columns to NOT NULL for future
  

Solution:

1. Set db_securefile to PERMITTED & re-create all LOB data type tables (Workaround)
2. Replace blank row column values of the table by some value or NULL flag, and enable NOT NULL constraint for those columns

3. Apply Patch 

<Patch 14727315> DATABASE PATCH SET UPDATE 11.2.0.2.9 (INCLUDES CPUJAN2013)
since it includes accoding to
11.2.0.2 Patch Set Updates - List of Fixes in each PSU (Doc ID 1340011.1)
the fix for
Bug 11723722ORA-3106 READNG SECUREFILES BY OCI SQLTYPE SQLT_LBI AND SQLT_CHR


Lesson Learned:

As the parameter db_securefile default value is PERMITTED , which got changed to ALWAYS as a part if database hardening as per CSI benchmark 11g


Need to have proper testing & review from application after DB hardening

My Popular Posts