DBWR and the Shared Pool – Part II

The buffer cache in the SGA holds data blocks that server processes read from disk. This is well known. Access to these data blocks in memory goes through buffer headers, which store data block metadata such as the touch count, the dirty flag and so on. For example, when a server process needs to access a particular data block in memory, the block's DBA is hashed to find the bucket where that block's buffer header would live. Then the cache buffers chains latch that protects this bucket from simultaneous access is acquired. Note that this latch covers not the buffers themselves but their headers, or, more precisely, the array of headers, and each header contains a pointer to the actual data buffer. So, once you land on a header in the chain, you follow the pointer to jump to the actual buffer frame.

DBWR also accesses the buffer header array, for example during a checkpoint: it needs to know which buffers are dirty and which are not. The array of buffer headers is externalized in x$bh, which comes in handy when troubleshooting latch contention, for example. It contains lots of other useful information and also reveals that the buffer headers on a hash chain are linked as a doubly linked list (the NXT_HASH and PRV_HASH columns).
According to some sources this structure is stored in the buffer cache itself. Other sources state that it lies in the shared pool.
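
As a quick, hedged illustration (x$ views require connecting as SYS, and the file/block numbers below are made up): x$bh can be joined to v$latch_children on the hash latch address to see which cache buffers chains child latch protects a given block's chain, which is handy for the latch contention troubleshooting mentioned above.

-- Which child "cache buffers chains" latch protects the chain of a given block?
-- (x$bh.hladdr is the address of the child latch, tch is the touch count)
SELECT bh.dbarfil, bh.dbablk, bh.tch,
       lc.addr, lc.gets, lc.misses, lc.sleeps
FROM   x$bh bh, v$latch_children lc
WHERE  bh.hladdr  = lc.addr
AND    lc.name    = 'cache buffers chains'
AND    bh.dbarfil = 4        -- file# of the block of interest (made-up value)
AND    bh.dbablk  = 1234;    -- block# of the block of interest (made-up value)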

Now, back to the prior post, DBWR and the Shared Pool, where I wondered why DBWR would need to access the shared pool. If the buffer header array is indeed stored in the shared pool and not in the buffer cache, then that would explain it. It remains to prove whether that is the case.

In Part III, I’ll come up with quick demos to show whether the buffer header chain is indeed in the shared pool or not. Stay tuned…

On ORDER BY clause removal by the CBO

The CBO is smart. It kicks in whenever you issue a query and can apply lots of tricks to simplify query processing. One of those tricks is eliminating the ORDER BY clause from a query. This is useful to know when you expect a sort in the plan, it does not show up, and you need to know why. For example, consider the following query:

SELECT * FROM TESTTABLE1 S WHERE ID = 6 ORDER BY ID;

It looks like the plan should contain a SORT ORDER BY operation, but it does not:

Plan hash value: 3427422027

-----------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time       |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |  2501 |  1292K|   122   (0)| 00:00:01   |
|*  1 |  TABLE ACCESS FULL | TESTTABLE1 |  2501 |  1292K|   122   (0)| 00:00:01   |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

Let's see what happens behind the scenes via the CBO trace:

OBYE: Order by element marked for removal from query block SEL$1 (#0) equality filter pred
"ID"
OBYE: Removed ORDER BY from query block SEL$1 (#0)
Registered qb: SEL$9FB2EC53 0xbf94df8 (ORDER BY REMOVED FROM QUERY BLOCK SEL$1; SEL$1)

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT [not showing columns here] FROM "TESTTABLE1" WHERE "TESTTABLE1"."ID"=6
kkoqbc: optimizing query block SEL$9FB2EC53 (#0)
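
(For reference, an optimizer trace like the one above can be captured with the classic 10053 event. This is a generic sketch, not necessarily how the trace above was produced.)

ALTER SESSION SET tracefile_identifier = 'obye_demo';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
SELECT * FROM testtable1 WHERE id = 6 ORDER BY id;   -- must be hard parsed to be traced
ALTER SESSION SET EVENTS '10053 trace name context off';
-- then locate the trace file:
SELECT value FROM v$diag_info WHERE name = 'Default Trace File';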

So, the CBO sees an equality predicate on the ORDER BY column and decides to remove the ORDER BY clause from the query. This makes sense: whatever the cardinality is, all returned values of ID will be the same, so there is nothing to sort. It works in 12.1 and later; I haven't checked earlier versions. No surprise, then, that we expect a sort but don't see one in the final plan.

This behavior can be controlled via the undocumented parameter "_optimizer_order_by_elimination_enabled". In addition, two hints switch it on and off at the statement level: /*+ ELIMINATE_OBY */ and /*+ NO_ELIMINATE_OBY */.
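
Here is a small, hedged sketch of both knobs in action (same TESTTABLE1 as above; use the undocumented parameter in test systems only):

-- Keep the ORDER BY for this statement only:
SELECT /*+ NO_ELIMINATE_OBY */ * FROM testtable1 WHERE id = 6 ORDER BY id;

-- Or switch the transformation off for the whole session:
ALTER SESSION SET "_optimizer_order_by_elimination_enabled" = FALSE;
SELECT * FROM testtable1 WHERE id = 6 ORDER BY id;   -- SORT ORDER BY should now show up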

INDEX FULL SCAN and Sorting Avoidance

Cases in which sorting can be avoided are fairly well known. Let's look at yet another one, in which the optimizer ends up with a Sort Merge Join to execute a query. The query is simple and serves demonstration purposes only:

select * from student s, takes t where s.id = t.id

with the following final plan:

The plan tells us that a Sort Merge Join is used to join the two tables.
The listing at the bottom of the page is an excerpt from the optimizer trace file, included to highlight some points (line numbers have been added).

Line 193: only 2 permutations are possible: STUDENT as the outer table and TAKES as the inner, and vice versa.
As a reminder, the number of permutations can be controlled by the undocumented parameter _optimizer_max_permutations, which defaults to 2000.
Line 1: the optimizer starts estimating plan costs for the case where STUDENT is the outer table and TAKES the inner.
Line 6: a Nested Loops join is estimated; its cost is 45 (line 20) when the inner table TAKES is accessed by a full scan (line 18), and 30 (line 32) when TAKES is accessed via a range scan of the index SYS_C0014106. Of these two, the latter is better, having the lower cost (line 35).
Line 63: the optimizer estimates the cost of a Sort Merge Join, which is 12.002240. In this case both tables get sorted: STUDENT at line 45 and TAKES at line 56. For STUDENT, just one block's worth of data needs to be sorted (line 49). There is a single initial run and zero merge passes (line 50), i.e. the sort completes entirely in memory.
The optimizer ultimately picks the SMJ plan with cost 8.001695 (lines 98 and 99), and this is where things get interesting (at line 104 the tables are swapped and the other join order is considered, and so on).
Cost estimation for this plan kicks in at line 65: "SM Join (with index on outer)".
The outer table STUDENT is accessed via the index SYS_C0015206 using the INDEX FULL SCAN access path (line 68). The cost of this access is 2.000510 (line 72; resc_cpu is the CPU cost of the access, and ix_sel: 1.000000 is the index selectivity, i.e. the fraction of rows that qualify). What we observe is that there is no sort associated with this table here. Only the inner table TAKES gets sorted, at line 81.
With an INDEX FULL SCAN the database first lands on the leftmost leaf block of the index tree and then walks the leaf blocks in key order, i.e. it reads the data in the order in which it is held in the index, and that data is already sorted. SYS_C0015206 is the primary key index on the ID column of the STUDENT table. So it works as follows: read each index entry, take its rowid and fetch the table row by that rowid; the resulting rows come back sorted by ID because the index delivers them in sorted order. Hence there is no need to sort the table itself to prepare it for the Sort Merge Join, which is exactly what we observe in the plan. Here are the steps the database takes to execute the query according to that plan (the numbers are the plan step Ids):
5 and 4:    SORT JOIN
                    TABLE ACCESS FULL
Here, the table TAKES is scanned fully and then sorted.

3 and 2:    TABLE ACCESS BY INDEX ROWID (note that no sorting is here)
                    INDEX FULL SCAN

                Here, the table STUDENT is accessed via index.

1:             MERGE JOIN
                Finally, invoking Sort Merge Join to join the two tables.

So, we have seen yet another case in which the optimizer may skip a sort thanks to an index on the join column.
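
If you want to play with this, here is a rough sketch of a reproduction. The table and column names follow the post, but the DDL, data and hints are made up, so treat it as an illustration rather than the original test case:

-- Minimal setup: STUDENT with a PK on ID, TAKES referencing the same key values.
CREATE TABLE student (id NUMBER PRIMARY KEY, name VARCHAR2(30));
CREATE TABLE takes   (id NUMBER, course_id NUMBER);

INSERT INTO student SELECT level, 'S' || level FROM dual CONNECT BY level <= 13;
INSERT INTO takes   SELECT MOD(level, 13) + 1, level FROM dual CONNECT BY level <= 22;
COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'STUDENT')
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'TAKES')

-- Nudge the optimizer towards the join order discussed above and check the plan:
-- STUDENT should come back via INDEX FULL SCAN of its PK index with no SORT JOIN
-- on top of it; only TAKES gets a SORT JOIN.
SELECT /*+ LEADING(s) USE_MERGE(t) */ *
FROM   student s, takes t
WHERE  s.id = t.id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'BASIC'));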

1Join order[1]: STUDENT[S]#0 TAKES[T]#1
2
3***************
4Now joining: TAKES[T]#1
5***************
6NL Join
7Outer table: Card: 13.000000 Cost: 5.000998 Resp: 5.000998 Degree: 1 Bytes:
8Access path analysis for TAKES
9Scan IO Cost (Disk) = 3.076923
10Scan CPU Cost (Disk) = 41107.200000
11Total Scan IO Cost = 3.076923 (scan (Disk))
12+ 0.000000 (io filter eval) (= 0.000000 (per row) * 22.000000 (#rows))
133.076923
14Total Scan CPU Cost = 41107.200000 (scan (Disk))
15+ 1100.000000 (cpu filter eval) (= 50.000000 (per row) * 22.000000 (#rows))
1642207.2
17Inner table: TAKES Alias: T
18Access Path: TableScan
19NL Join: Cost: 45.015282 Resp: 45.015282 Degree: 1
20Cost_io: 45.000000 Cost_cpu: 587031
21Resp_io: 45.000000 Resp_cpu: 587031
22****** Costing Index SYS_C0014106
23SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
24
25SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
26
27Access Path: index (RangeScan)
28Index: SYS_C0014106
29resc_io: 2.000000 resc_cpu: 15143
30ix_sel: 0.083333 ix_sel_with_filters: 0.083333
31NL Join : Cost: 30.005924 Resp: 30.005924 Degree: 1
32Cost_io: 30.000000 Cost_cpu: 227573
33Resp_io: 30.000000 Resp_cpu: 227573
34
35Best NL cost: 30.005924
36resc: 30.005924 resc_io: 30.000000 resc_cpu: 227573
37resp: 30.005924 resp_io: 30.000000 resc_cpu: 227573
38SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
39Join Card: 22.000000 = outer (13.000000) * inner (22.000000) * sel (0.076923)
40Join Card – Rounded: 22 Computed: 22.000000
41Outer table: STUDENT Alias: S
42resc: 5.000998 card 13.000000 bytes: deg: 1 resp: 5.000998
43Inner table: TAKES Alias: T
44resc: 5.001070 card: 22.000000 bytes: deg: 1 resp: 5.001070
45using dmeth: 2 #groups: 1
46SORT ressource Sort statistics
47Sort width: 118 Area size: 131072 Max Area size: 20971520
48Degree: 1
49Blocks to Sort: 1 Row size: 38 Total Rows: 13
50Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
51Total IO sort cost: 0.000000 Total CPU sort cost: 38415391
52Total Temp space used: 0
53SORT resource Sort statistics
54Sort width: 118 Area size: 131072 Max Area size: 20971520
55Degree: 1
56Blocks to Sort: 1 Row size: 40 Total Rows: 22
57Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
58Total IO sort cost: 0.000000 Total CPU sort cost: 38417643
59Total Temp space used: 0
60SM join: Resc: 12.002240 Resp: 12.002240 [multiMatchCost=0.000000]
61SM Join
62SM cost: 12.002240
63resc: 12.002240 resc_io: 10.000000 resc_cpu: 76912478
64resp: 12.002240 resp_io: 10.000000 resp_cpu: 76912478
65SM Join (with index on outer)
66****** Costing Index SYS_C0015206
67SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
68Access Path: index (FullScan)
69Index: SYS_C0015206
70resc_io: 2.000000 resc_cpu: 19573
71ix_sel: 1.000000 ix_sel_with_filters: 1.000000
72Cost: 2.000510 Resp: 2.000510 Degree: 1
73Outer table: STUDENT Alias: S
74resc: 2.000510 card 13.000000 bytes: deg: 1 resp: 2.000510
75Inner table: TAKES Alias: T
76resc: 5.001070 card: 22.000000 bytes: deg: 1 resp: 5.001070
77using dmeth: 2 #groups: 1
78SORT ressource Sort statistics
79Sort width: 118 Area size: 131072 Max Area size: 20971520
80Degree: 1
81Blocks to Sort: 1 Row size: 40 Total Rows: 22
82Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
83Total IO sort cost: 0.000000 Total CPU sort cost: 38417643
84Total Temp space used: 0
85SM join: Resc: 8.001695 Resp: 8.001695 [multiMatchCost=0.000000]
86Outer table: STUDENT Alias: S
87resc: 5.000998 card 13.000000 bytes: deg: 1 resp: 5.000998
88Inner table: TAKES Alias: T
89resc: 5.001070 card: 22.000000 bytes: deg: 1 resp: 5.001070
90using dmeth: 2 #groups: 1
91Cost per ptn: 0.015728 #ptns: 1
92hash_area: 124 (max=5120) buildfrag: 1 probefrag: 1 ppasses: 1
93Hash join: Resc: 10.017796 Resp: 10.017796 [multiMatchCost=0.000000]
94HA Join
95HA cost: 10.017796
96resc: 10.017796 resc_io: 10.000000 resc_cpu: 683594
97resp: 10.017796 resp_io: 10.000000 resp_cpu: 683594
98Best:: JoinMethod: SortMerge
99Cost: 8.001695 Degree: 1 Resp: 8.001695 Card: 22.000000 Bytes:
100***********************
101Best so far: Table#: 0 cost: 2.000510 card: 13.000000 bytes: 325.000000
102Table#: 1 cost: 8.001695 card: 22.000000 bytes: 1144.000000
103***********************
104Join order[2]: TAKES[T]#1 STUDENT[S]#0
105
106***************
107Now joining: STUDENT[S]#0
108***************
109NL Join
110Outer table: Card: 22.000000 Cost: 5.001070 Resp: 5.001070 Degree: 1 Bytes:
111Access path analysis for STUDENT
112Scan IO Cost (Disk) = 3.000000
113Scan CPU Cost (Disk) = 38337.200000
114Total Scan IO Cost = 3.000000 (scan (Disk))
115+ 0.000000 (io filter eval) (= 0.000000 (per row) * 13.000000 (#rows))
1163
117Total Scan CPU Cost = 38337.200000 (scan (Disk))
118+ 650.000000 (cpu filter eval) (= 50.000000 (per row) * 13.000000 (#rows))
11938987.2
120Inner table: STUDENT Alias: S
121Access Path: TableScan
122NL Join: Cost: 71.023399 Resp: 71.023399 Degree: 1
123Cost_io: 71.000000 Cost_cpu: 898826
124Resp_io: 71.000000 Resp_cpu: 898826
125****** Costing Index SYS_C0015206
126SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
127SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
128Access Path: index (UniqueScan)
129Index: SYS_C0015206
130resc_io: 1.000000 resc_cpu: 8381
131ix_sel: 0.076923 ix_sel_with_filters: 0.076923
132NL Join : Cost: 27.005870 Resp: 27.005870 Degree: 1
133Cost_io: 27.000000 Cost_cpu: 225499
134Resp_io: 27.000000 Resp_cpu: 225499
135****** Costing Index SYS_C0015206
136SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
137SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
138Access Path: index (AllEqUnique)
139Index: SYS_C0015206
140resc_io: 1.000000 resc_cpu: 8381
141ix_sel: 0.076923 ix_sel_with_filters: 0.076923
142NL Join : Cost: 27.005870 Resp: 27.005870 Degree: 1
143Cost_io: 27.000000 Cost_cpu: 225499
144Resp_io: 27.000000 Resp_cpu: 225499
145
146Best NL cost: 27.005870
147resc: 27.005870 resc_io: 27.000000 resc_cpu: 225499
148resp: 27.005870 resp_io: 27.000000 resc_cpu: 225499
149SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
150Join Card: 22.000000 = outer (22.000000) * inner (13.000000) * sel (0.076923)
151Join Card – Rounded: 22 Computed: 22.000000
152Outer table: TAKES Alias: T
153resc: 5.001070 card 22.000000 bytes: deg: 1 resp: 5.001070
154Inner table: STUDENT Alias: S
155resc: 5.000998 card: 13.000000 bytes: deg: 1 resp: 5.000998
156using dmeth: 2 #groups: 1
157SORT ressource Sort statistics
158Sort width: 118 Area size: 131072 Max Area size: 20971520
159Degree: 1
160Blocks to Sort: 1 Row size: 40 Total Rows: 22
161Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
162Total IO sort cost: 0.000000 Total CPU sort cost: 38417643
163Total Temp space used: 0
164SORT ressource Sort statistics
165Sort width: 118 Area size: 131072 Max Area size: 20971520
166Degree: 1
167Blocks to Sort: 1 Row size: 38 Total Rows: 13
168Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
169Total IO sort cost: 0.000000 Total CPU sort cost: 38415391
170Total Temp space used: 0
171SM join: Resc: 12.002240 Resp: 12.002240 [multiMatchCost=0.000000]
172SM Join
173SM cost: 12.002240
174resc: 12.002240 resc_io: 10.000000 resc_cpu: 76912478
175resp: 12.002240 resp_io: 10.000000 resp_cpu: 76912478
176Outer table: TAKES Alias: T
177resc: 5.001070 card 22.000000 bytes: deg: 1 resp: 5.001070
178Inner table: STUDENT Alias: S
179resc: 5.000998 card: 13.000000 bytes: deg: 1 resp: 5.000998
180using dmeth: 2 #groups: 1
181Cost per ptn: 0.015739 #ptns: 1
182hash_area: 124 (max=5120) buildfrag: 1 probefrag: 1 ppasses: 1
183Hash join: Resc: 10.017831 Resp: 10.017831 [multiMatchCost=0.000023]
184HA Join
185HA cost: 10.017831
186resc: 10.017831 resc_io: 10.000000 resc_cpu: 684944
187resp: 10.017831 resp_io: 10.000000 resp_cpu: 684944
188Join order aborted: cost > best plan cost
189***********************
190(newjo-stop-1) k:0, spcnt:0, perm:2, maxperm:2000
191
192*********************************
193Number of join permutations tried: 2
194*********************************

DBWR and the Shared Pool

Recently, during another discussion on performance tuning in Oracle databases, a colleague of mine mentioned that he had come across an interesting note in Oracle's Data Concurrency and Consistency concepts guide. Here is how it reads:
"Typically, a single latch protects multiple objects in the SGA. For example, background processes such as DBW and LGWR allocate memory from the shared pool to create data structures. To allocate this memory, these processes use a shared pool latch that serializes access to prevent two processes from trying to inspect or modify the shared pool simultaneously." Interesting, isn't it?

The sentence about DBW and LGWR allocating memory from the shared pool is what made me curious. I have some ideas (which for now may be wrong) about why DBW or LGWR would want to access the shared pool. Perhaps some wait events are associated with it too; I need to check that as well. In the next post, I'll try to reproduce it and come up with symptoms and solutions. I believe it is going to be a valuable troubleshooting scenario. To be continued…

Preempted Mutex Holders

While browsing through the Oracle docs, I was reminded of an interesting troubleshooting case from some time ago, so I'll briefly describe the scenario here. The case was related to mutexes and the OS's interference with their holders.

A 24×7 11g R1 database was running on HP-UX. At some point it started showing high spikes in the Concurrency wait class, and the top event was library cache: mutex X.

Recall that this event is posted when a session is trying to acquire, in exclusive mode, the mutex that protects a library cache bucket, and cannot get it because the mutex is already held in either shared or exclusive mode. Drilling down into the typical causes of this wait event, such as high hard-parse rates, didn't reveal anything. The spikes were driven by other sessions repeatedly trying to acquire the mutex, and the spinning burnt CPU. At the database level there didn't seem to be any reason for holding a mutex for a long time.
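
(A hard-parse check like the one mentioned above can be as simple as sampling the parse counters twice and comparing the deltas, for example:)

-- Cumulative since instance startup; sample twice over an interval and diff.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('parse count (total)', 'parse count (hard)');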

Oracle Support provides a script that can be used to find out which session is holding a mutex. The script identified the holder, but that session had no reason to keep the mutex for a long time.
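
(I don't have that script to hand; as a rough, hedged alternative, v$mutex_sleep_history records which session was blocking which on a given mutex, for example:)

-- Recent mutex sleeps with the blocking and requesting sessions
-- (a circular buffer, so only recent history is available):
SELECT sleep_timestamp, mutex_type, location,
       blocking_session, requesting_session, sleeps
FROM   v$mutex_sleep_history
ORDER  BY sleep_timestamp DESC;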

The next step was to suspect the OS. On HP-UX one can use the glance performance monitoring tool to check the resource utilization of processes. A typical output looks as follows (this sample is not related to the problem):

The Pri column shows the process priority. Whenever a process is created it is assigned a priority, which determines how much CPU time the process can get. When a process eats up too much CPU, the OS may lower its priority to be fair to other processes in the system so that they don't starve. This is up to the OS scheduler to decide.

What caught the eye in the output back then was that the priority of the process holding the mutex was very low compared to the other processes. Because of a resource shortage on the machine, the scheduler had lowered the priority of some processes, and one of them happened to be the mutex holder. This is known as priority inversion and has been covered in academia by quite a number of papers. In other words, the session would have released the mutex, but it couldn't get back onto the CPU to do so.

Oracle provides the HPUX_SCHED_NOAGE parameter for setting the priority of Oracle processes; its default value is 178. This parameter was adjusted and the problem was gone. Further observation confirmed that Oracle processes were assigned higher priorities, which let the mutex holder release the mutex as soon as it was done with it.
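
(For illustration only: the parameter is HP-UX specific and static, so a change along the lines of the sketch below needs an instance restart; the value actually used in this case is not shown, and 178 here is just the documented default as a placeholder.)

-- Pin Oracle processes at a fixed (non-aging) HP-UX priority.
ALTER SYSTEM SET hpux_sched_noage = 178 SCOPE = SPFILE;
-- restart the instance for the change to take effect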

Latching and Result Cache – A Quick Troubleshooting Case

A routine process in an Oracle 11.2 database at one of the banks used to run just fine (~20 minutes) until a restart of the machine. From then on, the same process took around 4 hours. The symptom was a typical one, related to latches, and the workaround was straightforward, though a little tricky. One more note: I didn't save the exact figures from that problem, so the numbers below are reproduced from memory.

So, the first step was to see which wait events prevailed during that period. A large number of latch free waits sat at the top:

Component                       Waits   Time(s)   Avg wait (ms)   % DB time   Wait Class
latch free                    235,598    24,906             106      309.52   Other
DB CPU                                    1,039                       12.91
db file sequential read        55,744       418               7        5.19   User I/O
control file sequential read  160,881       166               1        2.07   System I/O
log file sync                     114         1              13        0.02   Commit

Recall when Oracle posts the latch free wait event. Once a latch acquisition request is made and the latch is not free, the process spins for a number of iterations (governed by _spin_count for exclusive latches) trying to get it. If that is still unsuccessful, the process sleeps until the holder notifies it upon latch release. This applies, by default, to all latches of class 0. The behavior can be changed via the undocumented parameter _enable_reliable_latch_waits: when it is set to false, the process wakes up periodically and spins again to check whether the latch has become free.
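
(Both of these are hidden parameters; as SYS, their current values can be checked with the usual x$ksppi/x$ksppcv query, for example:)

SELECT p.ksppinm  AS parameter,
       v.ksppstvl AS value
FROM   x$ksppi p, x$ksppcv v
WHERE  p.indx = v.indx
AND    p.ksppinm IN ('_spin_count', '_enable_reliable_latch_waits');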

So, when spinning is exhausted, the latch free wait event is posted, meaning the latch couldn't be acquired within the intended number of spin attempts. The next step was to find out which latch prevailed. Unfortunately, I don't have the AWR data from the actual problem, but the numbers were much worse than the following:

Latch Name                    Get Requests    Misses    Sleeps   Spin Gets
Result Cache: RC Latch           1,862,968   288,200   273,530      61,310
space background task latch          7,252     2,098        72       2,035
qmn task queue latch                 1,010       136         8         128
cache buffers chains             4,119,113        42         6          33
managed standby latch                   96         6         1           5
call allocation                      3,344         2         1           1
resmgr:active threads                3,572         1         1           0
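
(Outside of an AWR report, a similar breakdown can be pulled live with something like the following against v$latch; note that those figures are cumulative since instance startup:)

SELECT *
FROM  (SELECT name, gets, misses, sleeps, spin_gets
       FROM   v$latch
       WHERE  sleeps > 0
       ORDER  BY sleeps DESC)
WHERE rownum <= 10;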

From here it became apparent that the cause was related to result caching. When this feature is on, the result of a SQL query is cached so that a similar subsequent query can avoid retrieving the data again, and the memory region allocated for a query's cached result is protected by a latch. Further examination revealed that none of the queries in the system used this feature via hints, nor was it enabled at the database level (RESULT_CACHE_MODE = MANUAL). Yet lots of queries had RESULT_CACHE in their execution plans, which, per the Oracle docs, shouldn't have been there. So the feature was not supposed to be in use for any query in the database, and I had to turn off something that was already off, at least to make the latching go away. On the other hand, the RESULT_CACHE_MAX_SIZE parameter value was non-zero, around 40 MB, so I decided to zero it out to disable the result cache. This worked: the latch waits were gone, the process returned to its past behavior and it now runs in 20 minutes again.
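
(The workaround itself is a one-liner; the sketch below also includes a couple of optional checks. As always, verify the settings in your own environment before changing anything.)

-- What is currently configured?
SHOW PARAMETER result_cache_mode
SHOW PARAMETER result_cache_max_size

-- Optional: report current result cache memory usage.
SET SERVEROUTPUT ON
EXEC DBMS_RESULT_CACHE.MEMORY_REPORT

-- The workaround: zero out the result cache memory allocation to disable it.
ALTER SYSTEM SET result_cache_max_size = 0 SCOPE = BOTH;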

This was not a root-cause fix, though, but rather a workaround. The question that remains is why the process ran fine before the restart. Also, assuming the result cache really is required by the application, it would be interesting to research why it caused that much latching; it could be related to exclusive latch gets when the underlying queries are run with different bind values, for example. As for the first question, according to the Oracle docs RESULT_CACHE_MAX_SIZE is derived from SHARED_POOL_SIZE, SGA_TARGET and MEMORY_TARGET, depending on whether ASMM or MSMM is used. It could be, for example, that a different pfile was picked up on restart, forcing a recalculation of the cache size. That still needs to be analyzed…