Tuesday, January 29, 2013

ORA-02049: timeout: distributed transaction waiting for lock



Issue:  ORA-02049: timeout: distributed transaction waiting for lock

Background:
Application has reported the error ORA-02049 during transaction workflow involving DB link to Oracle9i two nodes RAC. Just before this error starts appearing the new code was deployed on production database at source which is 11g two node RAC.

Analysis:
In first sight it looked like an application code issue without proper commit; within the transactions/DML’s.

ORA-02049 appears generally in RAC environments where more the one node is available to process the transaction. Multiple nodes executing distributed transaction are unable to acquire or acknowledge the lock on same resource across the instance till time limit specified by init parameter distributed_lock_timeout (default 60 seconds) has reached.

After 60 seconds this situation is treated similar as deadlock where distributed transaction involving sessions get killed with acknowledged message as “ORA-02049: timeout: distributed transaction waiting for lock” back to client

Solution:
We have made below attempts to resolve the issue

    1.   Identifying in-doubt  transaction
Below queries can be executed on remote database to which database link is pointing to identify if any in doubt transactions waiting for manual rollback or commit;
Unfortunately, in our case no in doubt transactions were found.

column local_tran_id new_value TRANS_ID
select local_tran_id from dba_2pc_pending;

select 'rollback force '''||local_tran_id||''';' from dba_2pc_pending;
select 'exec dbms_transaction.purge_lost_db_entry('''||local_tran_id||''');' from dba_2pc_pending;
select 'commit force '''||local_tran_id||''';' from dba_2pc_pending;

e.g.
rollback force '13.87.427441';
exec dbms_transaction.purge_lost_db_entry('13.87.427441');

    2.  OPEN_LINKS parameter

open_links init parameter which restricts the number of concurrent db link executions per session to 4 (default) which could be the possible cause of this error and same has been increased to 20
 
 alter system set open_links=20 scope=spfile;
 alter system set open_links_per_instance=20 scope=spfile;



    3.  Wait event of the running queries

Wait event of the running queries identified using below but it was not much useful as error being observed intermittently & it was nowhere logged than application error logs 

col sql_text for a80
set pages 200
set line 900
col PROGRAM for a20
col MACHINE for a20
col CPU 9999999999
alter session set nls_date_format='dd-mon-yyyy hh24:mi:ss';
col event for a40
select *
  from (select /*+ rule */          sa.sql_id,         sa.CPU_TIME "CPU",         s.sid "SID",         s.serial# "SERIAL",         s.program "PROGRAM",         s.machine "MACHINE",         sa.SQL_TEXT "SQL_TEXT",
         vp.spid,         sw.event,         s.logon_time,         s.last_call_et / 60,         s.username,         s.status
          from v$sqlarea sa, v$session s, v$process vp, v$session_wait sw
         where sa.address = s.sql_address
           and sw.sid = s.sid
           and s.paddr = vp.addr
         order by CPU_TIME desc)
 where rownum < 10;

    4. TOP 5 event in AWR
enq: TX row lock contention in TOP event for the problematic time span has given hint of locking issues.


Analyzing AWR further , UPDATE sql statement in TOP DB time consuming section getting locked due to slow responding 3rd select query  (as per  below snap) 


Query:
  

Execution Plan before Indexing



create index I_ T2114_009 on T2114(C800000002);


Execution Plan after Indexing
 


Indexing has reduced the cost/response of the SELECT query which in turn resolved the locking/timeout during UPDATED statement

enq: TX has disappeared from TOP 5 event of the next AWR report & no further “ORA-02049: timeout: distributed transaction waiting for lock” error being reported by application And Issue marked to be solved



Other way to handle the enq: TX – row lock contention is to identify the foreign keys whose reference columns does not hold any index.

col column_name for a40
SELECT * FROM (
SELECT c.owner,c.table_name, cc.column_name, cc.position column_position
FROM DBA_constraints c, DBA_cons_columns cc
WHERE c.constraint_name = cc.constraint_name
AND c.constraint_type = 'R'
and c.owner not in ('SYS','SYSMAN','SYSTEM') and c.owner='&Schema_name'
MINUS
SELECT i.owner,i.table_name, ic.column_name, ic.column_position
FROM DBA_indexes i, DBA_ind_columns ic
WHERE i.index_name = ic.index_name
)
ORDER BY table_name, column_position

Any Column found in above query should be analyzed for possible contention/locking & appropriate index should be created.