Explain It To Me One More Time
By Scott Nelson
I do the very best I know how – the very best I can; and I mean to keep doing so until the end.
So begins the Optimizer chapter in the Oracle Server Concepts manual1 and it’s a fitting way to begin this paper. I take this quote as meaning that no matter how a query is written Oracle will not give up on it until either there is an answer or Oracle has consumed every resource available. Most of us have seen the later at one time or another. It’s that query from hell that brings out the best in a user community
This paper is intended to increase your knowledge of the Explain Plan and TKPROF utilities and gain an understanding how to affect the Oracle Optimizer with initialization parameters, query structure and hints. The Explain Plan and TKPROF utilities tend to be underutilized because their outputs seem so complicated and convoluted. This papers intentions is to help you learn to interpret the output of these utilities, understand why the optimizer chooses different execution plans and how to override the default optimization of a query. Nested loop, merge join, filter, projection and range scans represent some of the many options and methods of getting at the data. Understanding how Oracle manipulates and handles your query is the key to maximizing your applications performance.
“Attention: In general, you should use the cost-based optimization approach. The rule-based approach is available for the benefit of existing applications, but all new optimizer functionality uses the cost-based approach.”
Oracle 7 Server Tuning manual
Why use the cost based optimizer? Oracle says that the cost-based approach typically chooses an execution plan that is as good as or better than the plan chosen by the rule-based approach, that the cost-based approach improves productivity by eliminating the need to tune your SQL statements. And if you’re using star queries, hash joins, histograms, parallel query or partition views cost based optimization must be used to achieve efficient performance. In the case of parallel query and partition views Cost-based optimization is always used.
It is important to remember that the plans generated by the cost-based optimizer depend on the sizes of the tables. This is of particular relevance to developers when using the cost-based optimizer with a subset of the data for development or testing. It is important to not assume that the plan chosen by the optimizer for a fully populated database will be the same as that chosen for a subset of the database.
To enable cost-based optimization for an instance set the OPTIMIZER_MODE initialization parameter to CHOOSE. Alternatively to enable cost-based optimization for your session only, issue an ALTER SESSION statement with an OPTIMIZER_GOAL option value of ALL_ROWS or FIRST_ROWS. To limit the scope of the optimizer even more you may enable cost-based optimization for an individual SQL statement by using any hint other than RULE. Another important quote from the Oracle Documentation is “The RULE hint, along with the rule-based approach, will not be available in future versions of Oracle.” I would consider that as the definitive word on the subject (even though they have been saying it for years and it’s still around)
Even if rule-based optimization has been selected as the system default the use of hints may causes the cost-based optimization to be invoked. Therefore, because cost-based optimization is dependent on statistics, it is important to analyze all tables referenced in a query which has hints.
Because the cost-based approach relies on statistics they will need to be generated for all tables, clusters, and indexes accessed by SQL statements before using the cost-based approach. If the size and data distribution of these tables changes frequently, you should generate these statistics regularly to ensure that they accurately represent the data in the tables.
Statistics can be generated using the ANALYZE command by either estimating or computing the values. The time necessary to compute statistics for a table is approximately equal to the time required to perform a full table scan and a sort of the rows of the table. If there is not enough space in memory to perform the sort, temporary space may be required. To save time and resources use estimations as Oracle requires only enough space to perform a scan and sort of the rows in the requested sample. When estimating statistics for tables and clusters a recommended sample size of 20% provides, in almost all cases, accurate and statistically valid results. For indexes, computation does not take up as much time or space, so it is best to perform a computation whenever possible.
There are two goals that can be specified when using the cost based optimizer; best throughput or best response Time. The default cost-based approach optimizes for best throughput. This is good for batch jobs but not so good for OLTP systems where response time is important.
The OPTIMIZER_GOAL is an initialization parameter that can be overridden at the session level by issuing an ALTER SESSION statement with the OPTIMIZER_GOAL option of ALL_ROWS. This will cause the optimizer to select execution plans for best throughput. Conversely specifying FIRST_ROWS will cause the optimizer to select execution plans that will provide best response time
These initialization parameters have a direct im[act on the cost based optimizer:
- OPTIMIZER_SEARCH_LIMIT: the maximum number of tables in the FROM clause for which all possible join permutations will be considered
- OPTIMIZER_GOAL: dynamically changeable parameter you can use to modify the goal of the cost-based optimization approach for your session. Used only in the session; not used in initialization file.
- OPTIMIZER_MODE: sets the mode of the optimizer at instance startup: rule-based, cost based optimized for throughput or response time, or a choice based on presence of statistics. Used in initialization file only; use OPTIMIZER_MODE to change the value during a session.
- OPTIMIZER_PERCENT_PARALLEL: defines the amount of parallelism that the optimizer uses in its cost functions
- HASH_JOIN_ENABLED: enables or disables the hash join feature
- HASH_AREA_SIZE: larger value causes hash join costs to be cheaper, giving more hash joins
- HASH_MULTIBLOCK_IO_COUNT: larger value causes hash join costs to be cheaper, giving more hash joins
- SORT_AREA_SIZE: large value causes sort costs to be cheaper, giving more sort merge joins
- SORT_DIRECT_WRITES: gives lower sort costs and more sort merge joins
- SORT_WRITE_BUFFER_SIZE: large value causes sort costs to be cheaper, giving more sort merge joins
- DB_FILE_MULTIBLOCK_READ_COUNT: large value gives cheaper table scan cost and favors table scans over indexes
- ALWAYS_ANTI_JOIN: sets the type of antijoin that Oracle uses: NESTED_LOOPS/MERGE/HASH
- PARTITION_VIEW_ENABLED: enables partition views
- V733_PLANS_ENABLED: enables a set of special optimizations: GROUP BY No Sort, Index Fast Full Scans, and using B-tree indexes in bitmap access paths provided bitmap indexes exist on the table
- BITMAP_MERGE_AREA_SIZE: is the size of the area used to merge the different bitmaps that match a range predicate. Larger size will favor use of bitmap indexes for range predicates.
If you neither collect statistics nor add hints to your SQL statements, your statements will use rule-based optimization. With the rule-based approach, the optimizer chooses whether to use an access path based on these factors:
- the available access paths for the statement
- the ranks of these access paths
The optimizer looks at the conditions in the statements WHERE clause to determine which access paths are available. The optimizer then chooses the most highly ranked available access path. It is during this operation when the rule-based approach always chooses an access path that uses an index if one is available, even if a full table scan might execute faster.
Rank Access Path
1 Single row by ROWID
2 Single row by cluster join
3 Single row by hash cluster key with unique or primary key
4 Single row by unique or primary key
5 Cluster join
6 Hash cluster key
7 Indexed cluster key
8 Composite key
9 Single-column indexes
10 Bounded range search on indexed columns
11 Unbounded range search on indexed columns
12 Sort-merge join
13 MAX or MIN of indexed column
14 ORDER BY on indexed columns
15 Full table scan
Unranked Access Paths
- Fast full index scan (not available with the rule-based optimizer)
- Bitmap index scan (not available with the rule-based optimizer)
Full Table Scans: a full table scan retrieves all rows in the table, examining each row to determine whether it satisfies the statement’s WHERE clause.
Table Access by ROWID : a table access by ROWID is the fastest way for Oracle to find a single row.
Cluster Scans: after obtaining the ROWID of one of the selected rows by scanning the cluster index the rows are returned.
Hash Scans: determine the hash value by applying a hash function to a cluster key value specified by the statement then scans the data blocks containing rows with that hash value.
Index Scans: searche the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index the values are returned directly from the index without having to access the table. An Index Scan can be one of these types:
- unique scan: a unique scan of an index returns only a single ROWID
- range scan: a range scan of an index can return zero or more ROWIDs.
- full scan: full scan can be used to eliminate a sort operation. It reads the blocks singly.
- fast full scan: fast full scan accesses the data in the index itself, without accessing the table. It cannot be used to eliminate a sort operation. It reads the entire index using multiblock reads (unlike a full index scan) and can be parallelized.
- bitmap: bitmap indexes use a bitmap for key values and a mapping function that converts each bit position to a ROWID. Bitmaps efficiently merge indexes that correspond to several conditions in a WHERE clause, using Boolean operations to resolve AND and OR conditions.
Choosing Execution Plans
The optimizer first determines whether joining two or more of the tables definitely results in a row source containing at most one row. The optimizer recognizes such situations based on UNIQUE and PRIMARY KEY constraints on the tables. If such a situation exists, the optimizer places these tables first in the join order. The optimizer then optimizes the join of the remaining set of tables.
For join statements with outer join conditions, the table with the outer join operator must come after the other table in the condition in the join order. The optimizer does not consider join orders that violate this rule.
“With the cost-based approach, the optimizer generates a set of execution plans based on the possible join orders, join operations, and available access paths. The optimizer then estimates the cost of each plan and chooses the one with the lowest cost. The optimizer estimates costs in these ways:
- The cost of a nested loops operation is based on the cost of reading each selected row of the outer table and each of its matching rows of the inner table into memory. The optimizer estimates these costs using the statistics in the data dictionary.
- The cost of a sort-merge join is based largely on the cost of reading all the sources into memory and sorting them.
- The optimizer also considers other factors when determining the cost of each operation. For example:
- A smaller sort area size is likely to increase the cost for a sort-merge join because sorting takes more CPU time and I/O in a smaller sort area. Sort area size is specified by the initialization parameter SORT_AREA_SIZE.
- A larger multiblock read count is likely to decrease the cost for a sort-merge join in relation to a nested loops join. If a large number of sequential blocks can be read from disk in a single I/O, an index on the inner table for the nested loops join is less likely to improve performance over a full table scan. The multiblock read count is specified by the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT.
- For join statements with outer join conditions, the table with the outer join operator must come after the other table in the condition in the join order. The optimizer does not consider join orders that violate this rule.
With the cost-based approach, the optimizer’s choice of join orders can be overridden with the ORDERED hint. If the ORDERED hint specifies a join order that violates the rule for outer join, the optimizer ignores the hint and chooses the order. You can also override the optimizer’s choice of join operations with hints.”
“With the rule-based approach, the optimizer follows these steps to choose an execution plan for a statement that joins R tables:
The optimizer generates a set of R join orders, each with a different table as the first table. The optimizer generates each potential join order using this algorithm:
- To fill each position in the join order, the optimizer chooses the table with the most highly ranked available access path according to the ranks for access paths. The optimizer repeats this step to fill each subsequent position in the join order.
- For each table in the join order, the optimizer also chooses the operation with which to join the table to the previous table or row source in the order. The optimizer does this by “ranking” the sort-merge operation as access path 12 and applying these rules:
- If the access path for the chosen table is ranked 11 or better, the optimizer chooses a nested loops operation using the previous table or row source in the join order as the outer table.
- If the access path for the table is ranked lower than 12, and there is an equijoin condition between the chosen table and the previous table or row source in join order, the optimizer chooses a sort-merge operation.
- If the access path for the chosen table is ranked lower than 12, and there is not an equijoin condition, the optimizer chooses a nested loops operation with the previous table or row source in the join order as the outer table.
The optimizer then chooses among the resulting set of execution plans. The goal of the optimizer’s choice is to maximize the number of nested loops join operations in which the inner table is accessed using an index scan. Since a nested loops join involves accessing the inner table many times, an index on the inner table can greatly improve the performance of a nested loops join.
Usually, the optimizer does not consider the order in which tables appear in the FROM clause when choosing an execution plan. The optimizer makes this choice by applying the following rules in order:
- The optimizer chooses the execution plan with the fewest nested-loops operations in which the inner table is accessed with a full table scan.
- If there is a tie, the optimizer chooses the execution plan with the fewest sort-merge operations.
- If there is still a tie, the optimizer chooses the execution plan for which the first table in the join order has the most highly ranked access path:
- If there is a tie among multiple plans whose first tables are accessed by the single-column indexes access path, the optimizer chooses the plan whose first table is accessed with the most merged indexes.
- If there is a tie among multiple plans whose first tables are accessed by bounded range scans, the optimizer chooses the plan whose first table is accessed with the greatest number of leading columns of the composite index.
- If there is still a tie, the optimizer chooses the execution plan for which the first table appears later in the query’s FROM clause.”
The EXPLAIN PLAN command displays the execution plan chosen by the Oracle optimizer for DML statements. The statement’s execution plan is the sequence of steps that Oracle performs to execute the statement and by examining the execution plan, you can see how Oracle will execute the SQL statement.
The explain plan produced is based on the data and statistics that is available when the explain plan is run. This can differ significantly between a test of development environment where a developer is likely to be working versus a production environment. According to Oracle version 8i of the database will allow for the export and import of statistics from one database to another.
Create the Output Table
Before you can issue an EXPLAIN PLAN or use the AUTOTRACE command, the PLAN_TABLE must be created to hold the commands output. The plan table is version specific so be sure to create the plan table appropriate for your version. In the account you intend to execute the EXPLAIN PLAN command run the SQL script UTLXPLAN.SQL (typically found in the rdbms directory) to create the PLAN_TABLE or issue the following SQL statement (version 8.0.4):
CREATE TABLE plan_table
The descriptions for the columns in this table can be found in the Oracle Tuning Manual.
If you’re using SQL*PLUS 3.3 or later you can use the autotrace command to automatically explain any DML statements executed during the session. Whether you are using autotrace or manually explaining the execution plan the PLAN_TABLE table must already have been created. Additionally, when using autotrace you need to be granted the PLUSTRACE role.
|SET AUTOTRACE OFF||No AUTOTRACE report is generated. This is the default.|
|SET AUTOTRACE ON EXPLAIN||The AUTOTRACE report shows only the optimizer execution path.|
|SET AUTOTRACE ON STATISTICS||Shows only the SQL statement execution statistics.|
|SET AUTOTRACE ON||Both optimizer execution path and SQL statement execution stats.|
|SET AUTOTRACE TRACEONLY||Suppresses the printing of the user’s query output, if any.|
SET AUTOTRACE ON
SQL> set autotrace traceonly
from scott.emp e,scott.dept d
order by 1;
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (ORDER BY)
2 1 NESTED LOOPS
3 2 TABLE ACCESS (FULL) OF ‘EMP’
4 2 TABLE ACCESS (BY INDEX ROWID) OF ‘DEPT’
5 4 INDEX (UNIQUE SCAN) OF ‘PK_DEPT’ (UNIQUE)
0 recursive calls
3 db block gets
17 consistent gets
0 physical reads
0 redo size
1093 bytes sent via SQL*Net to client
723 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
14 rows processed
Running EXPLAIN PLAN
To execute an EXPLAIN PLAN place the EXPLAIN PLAN and SET STATEMENT ID commands just prior to the statement to be explained. Then using SQL*PLUS execute the concatenated statement.
SQL> Truncate table plan_table;
SET STATEMENT_ID = ‘Explain Test’ FOR
from scott.emp e,scott.dept d
Upon execution the EXPLAIN PLAN statement will generate an execution plan and place the output in the PLAN_TABLE:
There are many ways to extract information from the PLAN_TABLE. To see the output of the explain plan in a tabular format the follwing select statement can be used:
SELECT operation, options, object_name, id, parent_id, position
WHERE statement_id = ‘Explain Test’
ORDER BY id;
OPERATION OPTIONS OBJECT_NAME ID PARENT_ID POSITION
——————– ————— ————— —- ——— ——–
SELECT STATEMENT 0
NESTED LOOPS 1 0 1
TABLE ACCESS FULL EMP 2 1 1
TABLE ACCESS BY INDEX ROWID DEPT 3 1 2
INDEX UNIQUE SCAN PK_DEPT 4 3 1
The ORDER BY clause returns the steps of the execution plan sequentially by ID value. However, Oracle does not perform the steps in this order. PARENT_ID receives information from ID, yet more than one ID step can feed into PARENT_ID.
Selecting from the Plan Table with a nested format of the output that more closely depicts the processing order used for the SQL statement.
If the dept and emp tables have been analyzed the plan looks like this:
select decode(id,0,”,lpad(‘ ‘,2*(level-1))||level||’.'||position)||’ ‘||
operation||’ ‘||options||’ ‘||object_name||’ ‘||object_type||”||
decode(id,0,’Cost = ‘||position) Query_plan
connect by prior id = parent_id and statement_id = ‘Explain Test’
start with id = 0 and statement_id = ‘Explain Test’;
SELECT STATEMENT Cost = 3
2.1 HASH JOIN
3.1 TABLE ACCESS FULL DEPT
3.2 TABLE ACCESS FULL EMP
And if the dept and emp tables have not been analyzed the plan looks like this:
select decode(id,0,”, lpad(‘ ‘,2*(level-1))||level||’.'||position)||’ ‘||
operation||’ ‘||options||’ ‘||object_name||’ ‘|| object_type||’ ‘||
decode(id,0,’Cost = ‘||position) Query_plan from plan_table
connect by prior id = parent_id and statement_id = ‘Explain Test’
start with id = 0 and statement_id = ‘Explain Test’;
SELECT STATEMENT Cost =
2.1 NESTED LOOPS
3.1 TABLE ACCESS FULL EMP
3.2 TABLE ACCESS BY INDEX ROWID DEPT
4.1 INDEX UNIQUE SCAN PK_DEPT UNIQUE
EXPLAIN PLAN Operations2
The operations and options used by the explain plan are summarized below:
|AND-EQUAL||An operation that accepts multiple sets of ROWIDs and returns the intersection of the sets, eliminating duplicates. This operation is used for the single-column indexes access path.|
|BITMAP||CONVERSION||TO ROWIDS converts the bitmap representation to actual ROWIDs that can be used to access the table.FROM ROWIDS converts the ROWIDs to a bitmap representation.COUNT returns the number of ROWIDs if the actual values are not needed.|
|INDEX||SINGLE VALUE looks up the bitmap for a single key value in the index.RANGE SCAN retrieves bitmaps for a key value range.FULL SCAN: A bitmap index full scan is performed if there is no start or stop key.|
|MERGE||Merges several bitmaps resulting from a range scan into one bitmap.|
|MINUS||Subtracts the bits of one bitmap from another. This row source is used for negated predicates and can be used only if there are some nonnegated predicates yielding a bitmap from which the subtraction can take place.|
|OR||Computes the bitwise OR of two bitmaps.|
|CONNECT BY||A retrieval of rows in a hierarchical order for a query containing a CONNECT BY clause.|
|CONCATENATION||An operation that accepts multiple sets of rows and returns the union-all of the sets.|
|COUNT||An operation that counts the number of rows selected from a table.|
|STOPKEY||A count operation where the number of rows returned is limited by the ROWNUM expression in the WHERE clause.|
|FILTER||An operation that accepts a set of rows, eliminates some of them, and returns the rest.|
|FIRST ROW||A retrieval on only the first row selected by a query.|
|FOR UPDATE||An operation that retrieves and locks the rows selected by a query containing a FOR UPDATE clause.|
|HASH JOIN||An operation that joins two sets of rows, and returns the result.|
|ANTI||A hash anti-join.|
|SEMI||A hash semi-join.|
|INDEX||UNIQUE SCAN||A retrieval of a single ROWID from an index.|
|RANGE SCAN||A retrieval of one or more ROWIDs from an index. Indexed values are scanned in ascending order.|
|RANGE SCAN DESCENDING||A retrieval of one or more ROWIDs from an index. Indexed values are scanned in descending order.|
|INLIST ITERATOR||CONCATENATED||Iterates over the operation below it, for each value in the IN list predicate.|
|INTERSECTION||An operation that accepts two sets of rows and returns the intersection of the sets, eliminating duplicates.|
|MERGE JOIN||An operation that accepts two sets of rows, each sorted by a specific value, combines each row from one set with the matching rows from the other, and returns the result.|
|OUTER||A merge join operation to perform an outer join statement.|
|ANTI||A merge anti-join.|
|SEMI||A merge semi-join.|
|CONNECT BY||A retrieval of rows in hierarchical order for a query containing a CONNECT BY clause.|
|MINUS||An operation that accepts two sets of rows and returns rows that appear in the first set but not in the second, eliminating duplicates.|
|NESTED LOOPS||An operation that accepts two sets of rows, an outer set and an inner set. Oracle compares each row of the outer set with each row of the inner set and returns those rows that satisfy a condition.|
|OUTER||A nested loops operation to perform an outer join statement.|
|PARTITION||CONCATENATED||Iterates over the operation below it, for each partition in the range given by the PARTITION_START and PARTITION_STOP columns.|
|PROJECTION||An internal operation.|
|REMOTE||A retrieval of data from a remote database.|
|SEQUENCE||An operation involving accessing values of a sequence.|
|SORT||AGGREGATE||A retrieval of a single row that is the result of applying a group function to a group of selected rows.|
|UNIQUE||An operation that sorts a set of rows to eliminate duplicates.|
|GROUP BY||An operation that sorts a set of rows into groups for a query with a GROUP BY clause.|
|JOIN||An operation that sorts a set of rows before a merge-join.|
|ORDER BY||An operation that sorts a set of rows for a query with an ORDER BY clause.|
|TABLE ACCESS||FULL||A retrieval of all rows from a table.|
|CLUSTER||A retrieval of rows from a table based on a value of an indexed cluster key.|
|HASH||Retrieval of rows from table based on hash cluster key value.|
|BY ROWID||A retrieval of a row from a table based on its ROWID.|
|UNION||An operation that accepts two sets of rows and returns the union of the sets, eliminating duplicates.|
|VIEW||An operation that performs a view’s query and then returns the resulting rows to another operation.|
SQL Trace and TKPROF
To find out additional information beyond what Explain can provide the SQL trace facility provides more in-depth information about the statement, session and instance. It generates the following statistics for each statement:
- Parse, execute, and fetch counts
- CPU and elapsed times
- Physical reads and logical reads
- Number of rows processed
- Misses on the library cache
- Username under which each parse occurred
- Each commit and rollback
The SQL trace facility can be enabled for single session or for an entire instance. When the SQL trace facility is invoked performance statistics for all SQL statements executed during a user session or for the entire instance are placed into a trace file.
Running the TKPROF program to formats the trace file and into a readable output file. TKPROF can also
- determine the execution plans of SQL statements
- create a SQL script that stores the statistics in the database
TKPROF reports how each statement was executed, the resources it consumed, the number of times it was called, and the number of rows processed.
References and footnotes
1 – Oracle Concepts Manual 8.0.4
2 – Oracle8 Tuning Manual Release 8.0
About the Author
Scott Nelson is the founder of Results Through Technology, Inc. an Indianapolis based consulting firm that specializes in the Oracle Database, Oracle Tools and Internet technologies. For the past 11 years he has specialized in Oracle based analysis, design, migrations, CASE, DBA, methodology, and project management areas. He is a certified DBA through both Chauncey and Oracle OCP programs and is a frequent presenter on Designer and DBA topics at local, ODTUG, IOUG-A, and EOUG user group conferences. He was appointed to the ODTUG board in 1998, elected Vice President of the group in 1999 and served as the ODTUG 99 and 2000 conference chair. He is a past president of the Indiana Oracle User Group and is currently serving as the group’s treasurer. For the past four years he has worked with the IOUG-A and Oracle OpenWorld conferences as a Focus Area Manager for the Methodology, Designer and Modeling streams.
Filed under: Technical Tips
Like this post? Subscribe to my RSS feed and get loads more!