AIX paging issue



IssueHigh paging on AIX oracle DB server

Background/Observation:

                    i.  AIX server with version 6.1 TL 7 (6100-07-06-1241) running standalone Oracle DB 10.2.0.5 

                   ii.  Huge paging memory utilization found to be caused by db processes mainly

#lsps -s

Total Paging Space   Percent Used

      30720MB               40%



#ps -ealf | head -1 ; ps -ealf | sort -rn +9 | head

       F S      UID      PID     PPID   C PRI NI ADDR    SZ    WCHAN    STIME    TTY  TIME CMD

  240001 A ora10g  3736258    1   0  60 20 f60b76590 152524      Jul 31      -  4:40 ora_dbw0_ORADB

  240001 A ora10g  2883690    1   0  60 20 5c0bdc590 143288     Jul 31      -  4:29 ora_dbw1_ORADB

  240001 A ora10g  1835488    1   0  60 20 ad0c2d590 125636     Jul 31      -  8:33 ora_lgwr_ORADB

  240001 A ora10g 13041936   1   0  60 20 d95259590 118304      Jul 31      - 203:28 oracleORADB (LOCAL=NO)

  240001 A ora10g  5046804    1   0  60 20 4321c3590 118292      Jul 31      - 176:28 oracleORADB (LOCAL=NO)

  240001 A ora10g  9044626    1   5  62 20 1e5c9e590 118220      Jul 31      - 160:27 oracleORADB (LOCAL=NO)

  240001 A ora10g  2228980    1   0  60 20 5513d5590 117520      Jul 31      - 176:38 oracleORADB (LOCAL=NO)

  240001 A ora10g 11534512   1   0  60 20 5e7cde590 109812      Jul 31      - 185:14 oracleORADB (LOCAL=NO)

  240001 A ora10g  6619604   1   0  60 20 6e0bee590 109796      Jul 31      -  2:12 ora_arc1_ORADB

  240001 A ora10g  3801228   1   0  60 20 af0a2f590 109796        Jul 31      -  2:17 ora_arc0_ORADB



iii. Paging found to be getting reduced by few % on killing of old foreground oracle connection processes (LOCAL=NO)

iv. On clean DB restart paging memory utilization getting wiped out till 4%

v. Post DB restart with time lag of 1 week max. paging builds up again gradually

vi. Post 50% of paging utilization application starts facing slowness

Analysis:

This phase will help to understand aix paging & virtual memory manager concepts which are backbone for fixing the paging issue. Later in this section optimal aix memory relevant kernal parameter setting are explained

Understanding AIX paging & caching

Swap/paging allocation is controlled by the OS. Oracle is only requesting virtual memory to the OS which gives whatever it can either from real memory or from virtual/swaps pages.

AIX uses caching method for OS memory allocation wherein once real free memory is allocated it is being cached for re-use by other processes than releasing back as free real memory. Release of memory pages to real memory happens only on clean start/stop of database or server. Hence any graph or command would show real free memory as 1-2 % but actual available memory for processes could be more and calculated as below,

Actual Physical RAM in server:

#prtconf  | egrep -i "Good Memory"

Good Memory Size: 57344 MB


Vmstat command to get current memory utilization snap:

#vmstat 1 1

System configuration: lcpu=8 mem=57344MB

kthr    memory            page              faults        cpu

-----    -----------      ------------------------ ------------ -----------

 r  b   avm          fre         re  pi  po  fr   sr  cy  in   sy  cs us sy id wa

 4  2 12629663 57902   0   1   0 12181 12992   0 2597 127129 6356 81  9  7  3


Here,
 fre shows real free memory 57902   , 57902/256 (page size 64*4) = 226.1 MB (As said earlier it is just 1% free which will be reflected in graphs giving misinterpretation unless used NMON)
avm shows average memory used is  12629663 , 12629663/256 = 49334 MB = 48 GB

So, here actual free memory including cache is 57344 (RAM) – 49334 (avm) = 8010 MB = 7.8 GB

Most of the monitoring tools are unable to capture exact free memory on AIX os due to caching method which end up showing 100% memory utilized adding confusion

AIX Virtual Memory Manager (VMM)

The default values for the AIX VMM are generally not appropriate for use with databases. The default values for the AIX VMM will gradually allow up to 80% of physical memory to be used to buffer file I/O. Since Oracle is already buffering file I/O in the SGA, the same data is unnecessarily being buffered twice, and leaves only 20% of physical memory to run the Oracle database(s) and all other programs. This causes the majority of the Oracle database to be pushed out of physical memory to paging space; thus greatly impacting database performance.

To check whether your system is using the untuned default values for the AIX VMM, run the command:
/usr/sbin/vmo –a

/usr/sbin/vmo –a -F

If you do not have the /usr/sbin/vmo file, you will need to have your AIX systems administrator load the AIX fileset "bos.perf.tune".

The vmo command will list out all of the VMM parameters and their current values. The parameters you want to examine are the following:
MINPERM%, MAXPERM%, and MAXCLIENT%

Here is an example of the vmo report:

#sudo vmo -a -F | egrep -i "PERM|MAXCLIENT"

                   maxperm = 12805458

                       minperm = 711413

                    minperm% = 5

                   maxclient% = 90

                   maxperm% = 90

            numperm_global = 0

             strict_maxclient = 1

              strict_maxperm = 0



The untuned default settings are MINPERM%=20%, MAXPERM%=80%, and MAXCLIENT%=80%. There is no "correct" value for these parameters and only extensive testing will reveal the optimal values. The optimal value may be different for different databases on the same system, so keep this in mind when tuning the VMM and choose values which work best for all of the databases you will be running on the system & do take your AIX sys admin help.

Use the following values as a starting point:
MINPERM% = 10-15%, MAXPERM% = 20-30%, MAXCLIENT% = MAXPERM%

To get a snapshot of how much physical memory is being used by AIX to buffer file I/O, run the command:
/usr/bin/svmon -G

The svmon command is part of the same AIX fileset "bos.perf.tune" that vmo belongs to. The last line of the svmon output should be "in use". Add the values for "in use / pers" and "in use / clnt". Now divide the sum by the value for "memory / size". For best database performance, this value should generally not be higher than 30% (0.30).
Here is an example of the svmon output:

#sudo svmon -G

                    size         inuse        free         pin     virtual

memory     131072      129432     1640 11704 50091
pg space    7864320    278498


           work        pers        clnt       other

pin          11704            0             0            0
in use     47062      76126       6244            0

In this example, (in use / pers) 76126 plus (in use / clnt) 6244 equals 82370.
82370 divided by (memory / size) 131072 equals 0.628 or approximately 63% of physical memory being used by AIX to buffer file I/O. This indicates the AIX VMM needs to be tuned to allow more physical memory to be used by Oracle and other processes and less physical memory to be used to buffer file I/O.

Remember that although AIX associates this memory with the Oracle processes (because Oracle requested the file I/O), all of the memory used to buffer file I/O is completely allocated and controlled by AIX, not Oracle. If you need help checking, setting, or tuning the AIX VMM, contact your AIX systems administrator and/or IBM Support.

IBM has introduced a new VMM parameter which is also very helpful with this issue. The parameter is...
lru_file_repage

#sudo vmo -a -F  | grep -i lru_file
         lru_file_repage = 1

The default value is "1", but it is recommended to set this to "0".This setting hints to the VMM to only steal file pages (from the AIX file buffer cache) and leave the computational pages (from the SGA) alone.

This new lru_file_repage parameter is only available on AIX 5.2 ML04 or higher and AIX 5.3 ML01 or higher.

NOTE: If you are using an older AIX system which does not support the lru_file_repage parameter, then you must use the "legacy" settings above. If you are using a newer AIX system which does support the lru_file_repage parameter, then you should use the revised settings below instead.

As documented in the IBM "VMM Tuning Tip" referenced below, if you are running on AIX 5.2 ML04 or higher, or AIX 5.3 ML01 or higher, IBM is now recommending the following VMM settings for use with programs which need to protect computational memory (like Oracle):

strict_maxperm=0 (default)
strict_maxclient=1 (default)
lru_file_repage=0
maxperm%=90
minperm%=5 (physical RAM <32 GB)
minperm%=10 (physical RAM >32 GB but <64 GB)
minperm%=20 (physical RAM >64 GB)
v_pinshm=1
maxpin%=percent_of_real_memory
(IBM does not recommend decreasing the maxpin% value)

Where "percent_of_real_memory" = ( (size of SGA / size of physical memory) *100) + 3
and
Set Oracle database parameter LOCK_SGA to TRUE in the pfile/spfile
and
Sum of all SGAs on the system do not exceed approximately 60% of physical memory


References:

Excessive Paging on IBM AIX POWER Systems (64-bit) After OS Upgrade (Doc ID 1503072.1)
AIX: Database performance gets slower the longer the database is running (Doc ID 316533.1)

Solution:

1.       APAR IV26272 and IV27014

Apply below APAR to AIX OS, IV26272 for AIX 6 & IV27014 for AIX 7


    This seems to be a known issue to AIX (IV26272 and IV27014):

  

Kernal parameter Settings:

2.       numperm_global should be set to 0

                  #sudo vmo -a -F | grep numperm_global 
                         numperm_global = 0

3.       lru_file_repage should be set to 0

                  #sudo vmo -a -F  | grep -i lru 
                       lru_file_repage = 0

4.       minperm% setting as per physical memory capacity as below

minperm%=5  (physical RAM <32 GB)
minperm%=10 (physical RAM >32 GB but <64 GB)
minperm%=20 (physical RAM >64 GB)

                  #sudo vmo -a | grep minperm
          minperm% = 5

5.       maxperm% & maxclient% should be 90

                  #sudo vmo -a -F | grep maxperm
maxperm% = 90
                        
                  #sudo vmo -a -F | grep maxclient
maxclient% = 90

Solution 1, 2 & 3 are most effective.

No comments:

Post a Comment