Problems with using cost-based optimizer

Cost optimizer has significant improvement as compared to its original version. The author suggests that, in any medium, that recently used Oracle system is benefited from cost-based optimizer. Moreover, it should also think about it, to the centers, which are used in the optimizer re-gułowego, prepared an appropriate plan for the migration to cost-based optimizer. However, there are issues associated with this kind of optimizer to keep in mind. In Table 3, the most frequent problems (and the frequency of occurrence), the author was able to observe.

Often recurring problems in the case of cost-based optimizer


  1. The problem of asymmetry
  2. Analyzing data inadequate
  3. The common use of the joins Optimizers
  4. Choosing the wrong index
  5. Joining too many tables
  6. Inappropriate parameter settings in the file init.ora


The problem of asymmetry

Let us assume that the problem is a system in which there is a table with a cross bearing the name of the column status. Acceptable values ​​are two columns: About to sign a transaction open (open transactions), which are not yet posted, and C to denote the transaction closed (closed transaclions) that have already been posted and do not require any maintenance. There are over a million records that have the status of C and always only 100 lines, which have the status of O.

Created the following SQL statement, which is performed every few hundred times, but the response time is not satisfactory:

SELECT acct_no, customer, product, trans_date, amt
FROM trans
WHERE status = 'O';

Response time: 16.308 seconds

In this example - taken from life - cost optimizer decided that Oracle should conduct a review of the entire table (full table scan). This is because the optimizer to have information on the number of different values ​​that can take the STATUS field, but do not have information on the number of records with each of these values. As a result, the optimizer established uniform distribution of data (50/50) for each of the two values ​​A and C. Under this assumption, the Oracle system shall review the entire table to retrieve data for open transactions.

Oracle will have information about the asymmetry of distribution of the data, the number of rows having a specific value in the indexed columns if during the ANALYZE command or DBMS_STATS package when you execute given an option FOR ALL INDEXED COLUMNS. Suppose now that the Status column has an index. In order to analyze the table, use the following command:

ANALYZE COMPUTE STATISTICS TRANS TABLE FOR ALL INDEXED COLUMNS

After analyzing the table and calculate statistics for all indexed columns, the optimizer will have cost information that only about 100 lines is the value of O, which makes the use of this column for the index. As a result, you'll get a much faster response times:
Response time: 0.259 seconds

Usually, the cost optimizer conduct a review of the entire table, if the column value is present in over 12% of the rows in the table, and use the index when the value is less than 12% of the rows. The choice made by the optimizer cost is not based on such a simple rule, but the practice shows that this is typical behavior.

Before the introduction of Oracle9i - if you used the variables assigned to - the problem of asymmetry could still occur even when the option is used INDEXED COLUMNS FOR ALL. Take a close look at the following command:

local_status: = 'O';
SELECT acct_no, customer, product, trans_date, amt
FROM trans WHERE status = local_status;

Response time: 16.608 seconds

The response time is similar to that which occurred in the case of non-use option FOR ALL INDEXED COLUMNS. The problem occurs because the cost optimizer does not know the value of a variable when determining trailing execution plan. In general - in order to avoid the problem of asymmetry is:

• literal values ​​stored in the code directly (for example, you can use the WHERE STATUS = 'O' instead of WHERE
STATUS = local_status);
• Always perform the analysis with the option FOR ALL INDEXED COLUMNS.
If you are still experiencing performance problems associated with not using the optimizer cost index because dowiązanych variables, and you can not change the source code, is an attempt to remove the index statistics with the command:

ANALYZE INDEX
TRANS_STATUS_NDX
DELETE STATISTICS;

Deleting index statistics to improve the situation, because the behavior of forces applied by the rule-based optimizer, which always uses the existing index (revision instead of the entire table).

NOTE

In Oracle9i dowiązanych variable value is determined before deciding on implementation plan, which eliminates the need for direct recording in the code literal values.

Analyze the data unsuitable

By come into contact with a number of systems in which performance problems stemmed from the fact that the tables and indexes were analyzed at the time when the amount of data contained typical. Cost optimizer must have accurate information (including information on the volume of data) in order to determine an effective plan.

Situations in which statistics can be lost or become obsolete, can be re-creating the table or move, add or create a new index of the environment. For example, you forget to re-create the statistics after the transfer of the database schema to the production environment. Problems also occur when an administrator does not have enough information about the database, which manages and analyzes the table when it is empty, not when, after a short period of time, it has hundreds or thousands of rows.

How to confirm the date of the last analysis

In order to verify this, which tables, indexes and partitions were analyzed and when it was done the last time, you can query that retrieves the value of a column LAST_ANALYZED USER_XXX from different perspectives. For example, in order to determine the date of the last analysis all tables follow:

SELECT table_name, num_rows, last_analyzed FROM user_tables;

USER_TABLES addition there are many other perspectives, so you can check the date of the analysis of different objects. In order to obtain a complete list of prospects containing column LAST_ANALYZED perform the following query:

SELECT table_name
FROM all_tab_columns
WHERE column_name = 'LAST_ANALYZED';

Of course, the point is not to analyze the option COMPUTE perform as often as possible. Doing so may cause the SQL statement will be detuned.

During the analysis, the decision to re-analyze tables and indexes can be as dangerous as the adjustment indexes and ideally should be performed on a copy of the production database before the introduction of changes in the actual production database.

Peoplesoft software is an example of an application that uses temporary tables to store the data, with names ending in _TMP expression. When the start-up is not performed batch process, each of these tables is usually empty. During the execution of each stage batch process for the tables are made to insert and update operations data.

The last phase of the process is to insert the data into the main table Poplesoft transactional applications by extracting data from temporary tables. At the end of the batch process typically all rows are deleted from the temporary tables. Transactions related to these tables are not approved until the end of the process when there is no longer any data in them.

When it seems to ANALYZE the temporary tables, they are usually empty. When the cost optimizer receives information about the zero number of rows, it automatically takes a decision on the review of the entire table and use Cartesian join. To work around this problem by suggesting to fill the temporary table data for analysis. Then you can empty tables with the data and begin normal processing. Empty the table (TRUNCATE command) does not clear the statistics.

INSERT and UPDATE SQL statements used by the application to insert data into temporary tables can be checked using the procedure of tracing (tracing) the batch process that inserts and updates data. The same SQL commands can be used to fill the personal data table.

Using this approach the problem in one of the major centers in Australia, which benefited from Peoplesoft software, process, batch execution time dropped from 36 hours to less than 30 minutes.

If you analyze the tables that store temporary data including production volumes of data does not solve performance problems, consider removing the relevant statistics for these tables. This forces the use of the SQL statements that refer to those tables, the principles of rule-based optimizer. Statistics can be removed by using the DELETE command ANALYZE STATISTICS TableName. After their removal it is important to ensure that the tables were not used in joins with tables that have statistics. You should also ensure that the terms of niezanalizowanych tables were not used indexes with statistics. If the temporary tables are used separately and join are only among themselves, the preferred approach is often to use the principles of rule-based optimizer.

Common use of the joins Optimizers

As mentioned earlier, when the tables are junction and one of them will be analyzed, and the remaining tables do not, the optimizer works cost the least favorably.

Analyzing tables and indexes using DBMS_STATS procedures. GATHER_SCHEMA_STATS and procedures GATHER_TABLE_STATS be sure to include the option CASCADE => TRUE. By default DBMS_STATS package collects statistics only for tables. Having statistics for the tables, but not for the index, may also cause peeling by the optimizer cost inefficient execution plans.

One of the cases of this problem, with which he met author, took place in the table having niezanalizowaną trans and analyzed acct table. Administrator to remove the data again created a trans table, but forgot to do the analysis. The following example illustrates the performance of the implementation of the join of the two tables:

SELECT a.account_name, SUM (b.amount)
FROM trans b, and acct
WHERE b.trans_date> sysdate - 7
AND a.act_id = b.acct_id
AND a.acct_status = 'A'
GROUP BY account_name;
SORT GROUP BY
NESTED LOOPS
TABLE ACCESS BY ROWID ACCT
INDEX UNIQUE SCAN TABLE ACCESS FULL ACCT_PK TRANS

Response time: 410 seconds

The response time was significantly shorter after analyzing the trans table with the following command:

ANALYZE TABLE ESTIMATE STATISTICS trans
3 PERCENT SAMPLE
FOR ALL INDEXED COLUMNS

The new plan execution and response times were as follows:

SORT GROUP BY
NESTED LOOPS
TABLE ACCESS BY ROWID ACCT
INDEX UNIQUE SCAN ACCT_PK
TABLE ACCESS BY ROWID TRANS
INDEX RANGE SCAN TRANS_NDXl

Response time: 3.1 seconds

Choosing the wrong index

Cost optimizer selects the secondary index over time, even if it seems obvious that it should have used a different index. Take a close look at the following WHERE clause occurring in the software Peoplesoft:

where business_unit =: 5
and ledger = 6
and fiscal_year = 7
and accounting_period = 8
and affiliate = 9
and statisctics_code = 10
and project_id = 11
and account = 12
and currency_cd = 13
and deptid = 14
and product = 15

Peoplesoft system, from which the above example, had an index containing all the columns specified in the WHERE clause. It would seem that Oracle uses to execute the query is this index. However, the cost optimizer decided to use an index on columns (business_unit, ledger, fiscal_year, account). After playing the SQL command and execution time compared with the case using guidelines requiring the use of a larger index, it turned out that it is more than four times less than the execution time using the index selected by the optimizer.

Further studies have shown that the index should be created as a unique (UNIQUE), but in the process of deleting data and play table mistakenly created it as a non-unique. Of course, a four-time gain very happy user of the system.

But there are other problems. The same index was an ideal candidate for use in below the command that was one of the most frequently performed in the processing of data at the end of the month or the end of the year:

where business_unit = 5
and ledger = 6
and fiscal_year = 7
and accounting_period Between l and 12
and affiliate = 9
and statisctics_code = 10
and project_id = 11
and account = 12
and currency_cd = 13
and deptid = 14
and product = 15

Despite the establishment of the correct index as a unique, cost optimizer again it did not take into account. The only difference between the current and the previous command based on the fact that it covered the field more accounting periods (accounting period) for fiscal year (fiscal year), and not just one accounting period.

For the above WHERE clause used was the same as previously, with columns inadequate index (business_unit, ledger, fiscal_year, account). And again - after measuring the execution time of a command using the index chosen by the optimizer cost, and the index contains all the columns - it turned out that the latter provide at least three times faster performance.

Problem solved by column Reset items accounting_ period to the last position in the index (originally was on the third). The new index has the following form:

business_unit
ledger
fiscal_year
Affiliate
statisctics_code
project_id
account
currency_cd
deptid
product
accounting_period

Another way to force cost-based optimizer to use the index is to use one of the tips that will allow you to be identified. This is a good solution, but many sites use the packages provided by the developers, which can not be modified (and therefore can not be used tips). However, it is possible to create a perspective that contains the hint and give users permission to access this perspective. It will be useful if the SQL statement, the execution performance leaves much to be desired, is part of a report or a direct question, which can be read by the prospect.

In the end, sometimes it turns out that you can force the use of the index, if you remove the statistics. Sometimes you can also use the command ANALYZE ESTIMATE with only a core value of the analyzed 1,064 lines. It often happens that the plan will be changed to the desired, but this type of action is something of a,, casting ". Extremely important point is that using the" magic "of your actions carefully documented. Another method is to try to reduce the parameter OPTIMIZER_INDEX_COST_ADJ * (The value of this parameter set in the init.ora file - editor. Crowd) to a value in the range of 10 to 50

In summary you need to answer the question of why the optimizer cost take such inappropriate decisions. First of all - it should be noted that a bad decision on the execution plan is the exception rather than the rule. Examples of this subsection show that the columns are considered individually rather than collectively. If so, the first of the presented examples of cost optimizer would find - without the need to replace the index by the administrator as a unique - that each row has a unique value. The second example shows that if the number of columns in an index has a small number of different limit values, and the SQL command requires access to most of them, the cost optimizer often ignores the index. This is despite the fact that taken together the columns are well defined and the query returns few rows.

Little effect justifying optimizer should be noted that the use of a smaller number of index columns often produces a significant increase in exercise performance as compared to the indices of the plurality of columns.

Joining too many tables

The first versions of cost-based optimizer often used the method, divide and conquer "when złączaniu subject to more than five tables.

Query selects all the data associated with the undertaking of the account identifier (column acct_id) equal to 777,818. The company has several branches, and the query relates to a branch in the state of Washington (WA). Table A is a table acct, Table F is acct_address, and G is the address table.

Acct WHERE id = 77818

The user expects that the query returns a relatively small number of rows from different tables, and the response time will not exceed l seconds. It is best if Oracle receives acct_address rows from the table that corresponds to the calculus, and her connections with the address table to determine whether the address was correct at Washington.

However, due to the fact that such a junction is subject to multiple tables, cost optimizer will often decide that the tables will be processed F and G independently of each other and only at the end of the data to be merged.

The result of the join tables F and G will be that will have to be selected all the addresses that concern the state of Washington. This process can take a few minutes, which probably will result in the overall execution time will be much longer than that which would occur if Oracle control the access to all the tables from Table A.

Assuming that the table acct_address (F) has index acct_id column, you can solve the problem by using the appropriate clue instructing the optimizer cost that should be used for this particular index. This will greatly improve performance.

What's interesting - rule-based optimizer is often much more difficult for the proper determination of the execution plan for multiple tables złączania than the cost optimizer. Rule-based optimizer often do not use the table as a table control acct. To enforce the name in the FROM clause of table A place last.

If you use a ready-made software, the best way may be to create a perspective that contains a hint (if it is permissible and possible for your package).

Inappropriate parameter settings in the file init.ora

In many centers are using the pre-production database to test the performance of SQL execution before moving indexes and the code to the production database. Ideally, the database contains a number of pre-industrial production base close to the data, and the table is analyzed in exactly the same way as in the case of the production base. This pre-production base will often provide a copy of the actual data files, the production phase.

When administrators are testing the changes in a pre-production, usually everything works fine, but when you move to a production base sometimes it turns out that there are other plans peeled performance. The reason for these problems may be a different set of parameters in the init.ora file pre-production and production base.

I met one day below to update the data, the execution time amounted to four minutes, even though the condition in the WHERE clause contains reference the primary key table. Oddly - when the acct was taken from the table data (SELECT) and not updated (UPDATE) using exactly the same WHERE clause proved that the index has been used.

UPDATE SET proc_flag acct = 'Y'
WHERE pkey = 100;

# The response time was 4 minutes
# And the master key has not been used

Tried all means re-examining table, and finally removed her statistics. It turned out that when the rule-based optimizer was used, the command was executed efficiently.

After much analysis, it was decided to check the init.ora file parameters. It turned out that the COMPATIBLE parameter was set to system version 8.0.0, but work actually took place in Oracle 8.1.7. When you change the parameter to 8.1.7 and re-execution of the UPDATE, it turned out that the index is used properly this time and execution time of approximately O, l seconds.

COMPATIBLE parameter is the only one that requires the same settings in a pre-production and production to ensure identical operation cost-based optimizer.

These are:

SORT_AREA_SIZE

The number of bytes allocated to the session user to sort the data in memory. If the parameter is set to its default value of 64K, NESTED LOOPS (nested loops) will be preferred to SORT merges (join by sorting and merging) or HASH JOINS (hash join).

HASH_AREA_SIZE

The number of bytes allocated to the session user to perform joins in memory hash (hash joins). The default value is twice the SORT_AREA_SIZE. Hash join often do not work properly if this parameter is set to a value lower than l megabyte.

HASH_JOIN_ENABLED

Enables or disables use of hash joins. The default value is TRUE and usually does not require parameter settings.

OPTIMIZER_MODE

The possible values ​​are CHOOSE, FIRST_ROWS or ALL_ROWS. CHOOSE value makes the optimizer cost is used if statistics are available. Value FIRST_ROWS works the same way, but to increase the preference for NESTED LOOPS instead of SORT MERGE or HASH JOINS. The value of ALL_ROWS causes preference SORT MERGE and HASH JOINS NESTED LOOPS before.

DB_FILE_MULTIBLOCK_READ_COUNT

Number of blocks that Oracle charges for each read from the table. If you specify a large value (eg 16 or 32), Oracle will in many cases chose FULL TABLE SCANS (review of the entire table) instead of NESTED LOOPS.

OPTIMIZER_MODE_ENABLE

Optimizer provides new features. For example, setting the parameter to 8.1.7 allows you to use all mechanisms available in Oracle version 8.1.7 (and earlier). This parameter can also automatically adjust other parameters, such as FAST_FULL_SCAN_ENABLED.

Major improvements have been introduced in different versions of the system are: 8.0.4 - nested loops arranged (ordered nested loops), fast full inspection (fast fuli session) 8.0.5 - many bug fixes optimizer; 8.1.6 - Improved support for histograms, partitions and nested loops; 8.1.7
- Improved support for partition and optimization of subexpressions; 9.0.1 - much improved index join, merge complex perspectives, improving to use bitmap indexes, and join subexpressions.

OPTIMIZER_INDEX_CACHING

This parameter, "he says," the percentage of Oracle system data indexed, which can be expected in the memory. Default value for this parameter is 0, and the range of values ​​- from O to 100 The higher the value, the more likely it is to use NESTED LOOPS instead of SORT MERGE and HASH JOIN. observed in some systems to improve performance by setting it to the value of the 90th

OPTIMIZER_INDEX_ COST_ADJ

This parameter setting can increase the frequency of use of indexes. The default value is 100 Reducing it to 10 if the optimizer cost is lower the cost of using the index to 10% of the default. You can also specify the value as much higher than the 100 in order to force a SORT MERGE or HASH JOIN. In some places there was an increase performance by setting this parameter to a value derived from a range of 10 to 50 for OLTP systems * and 50 in the case of decision support systems. Reducing the value can speed up some queries OLTP, but at the same time makes the task execution time of a few hours can be significantly increased. Increasing this value can result in the opposite.

OLTP (On Line Transaction Processing) - Transaction Processing Systems

OLAP (On Line Analytical Processing) systems - Analytical Processing

STAR_TRANSFORMATION_ENABLED

Makes is used for the transformation of a star (star transformation) to merge bitmap indexes in the tables of facts *. This is a method different from the Cartesian join, which usually is the case for the optional use of data in the tables of the logical structure of a star (star schema).

Table facts (fact table) is the principal, normalized table for a star schema - the most frequently used logical structure of the database OLAP systems (data warehousing)

QUERY_REWRITE_ENABLED

Allows you to use index function (function-based in-dexes), as well as rewriting (rewrite) requests for prospects materialized. The default value is FALSE, which may explain why functional indexes are not used. Simply - to give parameter to TRUE.

PARTITION_VIEW_ENABLED

This parameter allows you to use the partition perspective. The default value is FALSE, so using outlook partition to give it a value of TRUE. Generally speaking - the perspective of the prospect of partition, which is a join table UNION ALL. They were introduced before the Oracle partitions and have been successfully used in many systems for archiving and increase productivity.

PARALLEL_BROADCAST_ENABLED

This parameter is used by the query in parallel (pa-rallel ąuery) when you are using a small table maps (lookup tables). Its default value is FALSE. When set to TRUE, the rows of small tables are sent to each child process to accelerate the implementation of MERGE JOIN and HASH JOIN when złączania a small table with a lot.

OPTIMIZER_MAX_PERMUTATIONS

It can be used to reduce the parsing time. However, reducing the number of permutations can cause inefficient execution plans specify, so in case of a default value for this parameter should not be changed.

CURSOR_SHAR1NG

When set to FORCE or SIMILAR can speed up parsing, reduce memory usage in the shared memory and reduce the amount of low-level lock (latch contention). This is achieved by modifying the command similar to the WHERE clause contains literals, variables using the command dowiązanych.

The default value is EXACT. Author suggests you set this parameter to SIMILAR Oracle9i system only if you are sure that the system there are many similar commands, which differ only in the values ​​of letters. Much better to save your application code so that the variables used were assigned.

Setting the parameter to FORCE causes similar commands use the same memory space SQL, which can degrade performance. This value should not be used.

It is worth noting that the STAR TRANSFORMATION option will not work if this value will give the value of SIMILAR or FORCE.

ALWAYS_SEMI_JOIN

This parameter can greatly improve performance for applications that frequently use WHERE EXISTS option. Setting it to a value MERGE or HASH could cause the previously executed command for a few hours will end after several minutes. The default value is STANDARD, which means that your main (but not your child) controls the execution plan. If you select this parameter, the query will be asking the child control.

ALWAYS_ANTI_JOIN

This parameter changes the behavior of the command type NOT IN and can greatly speed up the process if you give it a value of HASH or MERGE. This setting will cause the merge will be executed or hash join instead of time-consuming Cartesian join, which is the case for standard execution commands such as NOT IN.

Please note that if any of these parameters has a different value in a pre-production and production, it is possible that the plans for the implementation of the SQL commands will vary. Identical values ​​provide the same effect.

Problems in the use of rule-based optimizer

Rule-based optimizer offers many opportunities for tuning. Due to the fact that his behavior is predictable, conditioned by 20 ranks the conditions set out in the table l already, you can easily benefit from these opportunities.

I tried to determine the types of problems that arise when working with both optimizers and identify the best ways to solve them.

The main causes of poor work performance rule-based optimizer (often recurring problems in the case of rule-based optimizer):
  1. Inadequate control table
  2. Inadequate index
  3. Inadequate control index
  4. Using the index instead of the index ORDER BY WHERE

Inadequate control table

If the control table in a join is not selected optimally, it may be that significantly increases the time needed to execute the query. Earlier (in the section: 'RBO rules do not tell # 6 ") was already discussed what determines the choice of control table. Presented following example illustrates the potential differences in the time for completion:

SELECT COUNT (*)
Acct FROM a, b WHERE b.cost_center trans = 'MASS'
AND a.acct_name = 'MGA'
AND a.acct_name = b.acct_name;

In this example - if we assume that acct_name represents a unique key index, and the index cost_center represents a single column of values ​​nieunikatowych - make a unique key that the table becomes a table control acct.

If both cost_center and acct_name indexes were nieunikatowych individual columns of values, the rule-based optimizer for a table would take control of trans table because it was listed as the last in the FROM clause. Such a situation with a high degree of probability could mean waiting a long time to execute, because usually there is only one row in the table acct for a certain account name (accouni) and the number of transactions for each cost center (cost center).

In the case of rule-based optimizer - if the status of the two indices are the same - the system does Oracle simply click on the order in which during the parsing tables are defined. Due to the fact that the syntax analyzer converts table names from right to left, the table whose name was given as the last (in the example above - the table trans) is actually processed first (control table).

SELECT COUNT {*)
FROM acct a, Trans b
WHERE b.cost_center = 'MASS'
AND a.acct_name = 'MGA'
AND a.acct_name = b.acct_name;

Answer = 19.722 seconds

The response time when switching the order of the tables in the FROM clause is
as follows:

SELECT COUNT (*)
FROM trans b, and acct
WHERE b.cost_center = 'MASS'
AND a.acct_name = 'MGA'
AND a.acct_name = b.acct_name;

Answer = 1.904 seconds

It is important that the table in the FROM clause that are listed as the last, contained the smallest number of rows returned. You can also customize the indexing so that it forced the use of a specific table as a control table. For example, you can make cost_center index index skonkatenowanym, joined with the column, which is often used in SQL queries along with the current column. Thus avoid giving it such a high rank, when there is a join of the tables.

Inadequate index

WHERE clause often contains multiple indexes, which can use rule-based optimizer. The optimizer does not have, however, no information on the number of rows that will need to review the index or the potential impact of your choice on the bad response time. Inappropriate selection of the index significantly increases the time.

Rule-based optimizer is guided by the simple principles of selection index. These rules (as well as some scenarios) described earlier (in the following sections under the heading "RBO rules do not tell").

It is at this point consider an example situation.

ERP package developed in general form to allow the use of columns for reporting purposes in any, as selected by the user. Suppose you have a column called business_unit, which has a single-column index. In most centers there are hundreds of business units (business unit), but in the other - only one.

Journal table has one index in column (business_ unit) and the second in columns (business_unit, account, journal_data). WHERE expression used in the query is as follows:

WHERE business_unit - 'A203'
 AND account = 801919
BETWEEN AND journal_data
'01-DEC-2001 'AND '3 l-DEC-2001'

Single-column index takes precedence over trzykolumnowym index, although the latter returns the result in a fraction of the response time using one-column index. This happens because the request is using all one-column the column index. In this case, the only possibility is to remove the index, or use the cost-based optimizer. If no customized software is used, it can also use clues.

Inadequate control index

The way in which conditions are given in the expression (expressions) WHERE SELECT statement has the greatest impact on the performance of SQL statements, because the order in which conditions are given, the index influences the choice made by the optimizer.

If the rank two indices are equal (when, for example, in the WHERE clause are columns two indices jednokolumnowych) merges Oracle indexes. The order of consolidation can have a huge impact on the command. If the index that controls the query returns more rows than the other indexes, the query performance will not be optimal. The effect is very similar to the one that played in the wrong order of tables in the FROM clause.

It should consider the following example:

SELECT COUNT (*)
FROM trans
WHERE cost_center - 'MASS'
AND bmark_id = 9;

Response time: 4.255 seconds

The index, which has a column in the WHERE clause referred to as the first, it will control index. If the above command cost_center indexed value = 'MASS' pay far more rows than bmark_id = 9, in which case it will return one or at most two rows.

The following query order conditions in the WHERE clause was converted, which results in much shorter execution time.

COUNTf SELECT *)
FROM trans
WHERE bmark_id = 9;
AND cost_center = 'MASS'

Response time: 1.044 seconds

In the case of rule-based optimizer is the first place the WHERE clause conditions that will return the minimum number of lines.

Use the index instead of the index ORDER BY WHERE

Less frequently occurring problems associated with the choice of an index that observed in systems using rule-based optimizer:

SELECT fod_flag, account_no ...
FROM account_master
WHERE (account_code LIKE '1% ')
ORDER BY account_no;
Index_l UNIQUE (ACCOUNT_NO)
Index_2 (ACCOUNT_CODE)

At the specified index of this query execution time was 20 minutes. Request was used to create the report and performed daily by many brokers.

In this example, the optimizer tries to avoid sorting and chooses the index that contains a column with ORDER BY expression, instead of having an index column in the WHERE clause.

Centre, which described the problems occurred, the brokerage firm. SQL statement was executed frequently in order to create financial account summaries.

Managed to solve the problem of creating a concatenated index on both columns:

# Additional Index (ACCOUNT_CODE, ACCOUNT_NO)

It was decided to remove the index Index_2 (ACCOUNT_CODE), which was no longer needed, because the column account_code was leading a column of the new index. Account_no column has been added to the new index in order to take advantage of the growing data storage in an index. This allows avoiding the need to sort and, consequently, execution time decreased to 10 seconds.

Choice of Optimizer

If the reader will now use rule-based optimizer, by all means, it is the transition to cost-based optimizer. Reasons for this are many.


  • The time spent on coding is reduced.
  • The programmer does not need to know the rules.
  • Optimizer has more features, and many more tools for tuning.
  • Significantly increased the chance of finding packages from different suppliers, but which operate efficiently. Many of these packages developed for platforms running on DB2, Informix, SQL * Server, and Oracle. The code is written so as to meet the needs of the rule-based optimizer, but is a general.
  • End users can develop tuned code without having to learn a sizeable resource optimizer rules.
  • Optimizer cost is gradually improved in future versions of Oracle. The development of rule-based optimizer has been abandoned.
  • There is less risk associated with adding new indexes.
  • There are many features that are available only in the case of cost-based optimizer. These include the identification of prospects materialized (materialized views), the transformation of a star (star transformation), the use of functional indexes, etc. The number of similar mechanisms is huge and over time the differences between optymalizatorami will grow.
  • Oracle has made a number of mechanisms (such as DBMS_STATS package and scenarios) to resolve known problems with the inconsistencies of cost-based optimizer in different environments.

Common misconceptions of the optimizers

It is worth a try at this point to clarify some misunderstandings about optimizers.

  • Oracle8i and Oracle9i systems do not use the rule-based optimizer
Absolutely not true. Some publishers gave that information some time ago, but Oracle provides, it certainly is not.


  • Guidelines can not be used in the case of rule-based optimizer
Most of the tips can be successfully used in SQL statements using rule-based optimizer.


  • SQL command will be adjusted according to the rules made by the cost efficient
With a large bit of luck could happen, but the cost-based optimizer going to have to be prepared for the need to optimize the number of SQL commands. The author is not met, however, a system in which it would be impossible to change after such adjustment.


  • Tuned SQL command will be run at cost efficiently according to the rules
This is unlikely, unless it has been written taking into account the characteristics of rule-based optimizer.


  • You can not simultaneously use the rule-based optimizer and a cost
You can use both optimizers, if init.ora parameter file OPTIMIZER_MODE give the value CHOOSE, and some tables in the system is analyzed, and some do not. Be careful not to connect the tables, which were analyzed with tables that have not been analyzed.

Histograms

Before the entering of the system, Oracle 7.3 histograms, cost optimizer was not able to distinguish strongly uneven distribution of key data.

Use of system resources

By default, the cost optimizer assumes that the user is the only one that works with the database. In Oracle9i can store information on the use of system resources, allowing you to make much better decisions on the basis of load (DBMS_STATS procedure. GATHER_SYSTEM_STATS).

Relevant statistics are current

Cost optimizer may incorrectly choose a plan, if the table was analyzed, but the indexes were analyzed, as well as when the indexes are analyzed, and the table does not.

One should not impose cost-based optimizer to use the system via included instructions when for any table with SQL commands are not available statistics. Using the old (outdated) statistics could pose a greater danger than the estimation of statistics at runtime. But we must remember that the statistics change frequently and may not be desirable (especially for mission critical systems used by many users). Always back up statistics with DBMS_STATS procedures. EXPORT_SCHEMA_STATS before re-analysis.

Analysis of large tables and indexes associated with the COMPUTE option takes time, requires a lot of CPU, input-output system, and temporary tablespaces. It is often unnecessary. Analysis of a consistent value (for example ESTIMATE 3%) usually allows the optimizer to undertake optimal decisions.
The combination of the information received by the principles of selectivity with other information about the system input-output database allows the optimizer to calculate the cost of an execution plan.

EXPLAIN PLAN option cost-based optimizer

Oracle can provide information on the cost of a query when you use EXPLAIN PLAN feature. EXPLAIN PLAN feature can be used to display the calculated cost of execution after certain extensions. In particular, the COST column of the table contains the value of the plan, which reflects a decrease or increase the relative cost of a query.

For example:

EXPLAIN PLAN FOR SELECT COUNT (*)
FROM winners, horses WHERE winners.owner = horses.owner
Winners.horse_name AND LIKE 'Mr.%';
COLUMN 'SQL' FORMAT A56
SELECT lpad ('', 2 * level) | operation | |''
| [Options '' jobject_namej | decode (OBJECT "TYPE, • '•'
1 ('| | v ect_type | |') ') "SQL" cost "Cost", cardinality "Num Rows" FROM plan_table
CONNECT BY PRIOR id = parent_id START WITH id = 0;

By manipulating the optimizer cost effective (for example, through the included instructions, by creating or deleting indexes, by adjusting how are analyzed indexes or tables)

You can see the difference in the cost of performance calculated by the optimizer.

It used EXPLAIN PLAN to compare different versions of the query and select the version with the lowest relative cost.

In order to achieve the absolute best performance in many centers analyzed most of the tables and indexes, and a small number of tables, which are used separately, are not subject to analysis. This is usually to provide rules-based behavior of the tables which are not analyzed. However, the important issue is to ensure that the tables were Connectors Not analyzed with tables analyzed.

Tuning the database before entering the production phase

The main dilemma associated with the use of cost-based optimizer (CBO) is a SQL tuning before entering the database in production phase. Most of the development and test database contains much less data than the production database. It is therefore very likely that the CBO will take different decisions on the execution plan. Many sites can not afford the cost and inconvenience of copying the production database to the pre-production.

Oracle8i and later systems offer a variety of mechanisms to solve this problem, and the DBMS_STATS package and the mechanism of scenarios (outlines).

Internal cost-performance optimizer

In contrast to the rule-based optimizer cost optimizer does not have rigid rules that allow quick identification of execution path. CBO is a tool flexible and can adapt to the working environment. Such adaptations are possible if they are refreshed (re-examined) statistics concerning the required objects. Constant is the only method by which the cost optimizer calculates every possible plan and evaluate its cost (efficiency).

Operation can be cost-based optimizer (informal) break down into the following steps:


  1. Parsing SQL statements (syntax checking, powers possessed by objects, etc.).
  2. Create a list of all possible execution plans.
  3. Calculate (estimate) the cost of each execution plans using available statistics objects.
  4. Selecting the execution plan with the lowest cost.


Cost optimizer is used when even one table with SQL command has statistics (statistics for tables niezanalizowanych are estimated). If there are no statistics available for any of the tables that appear in the command, the database system uses rule-based optimizer, unless forced to use a cost-based optimizer earlier guidance levels of command (hints) or option ALL_ROWS or FIRST_ROWS optimizer.
Understanding of the cost-based optimizer - and consequently its ability to use - requires an understanding of how it is "thinking."

Equality primary key and (or) a unique index

The selectivity of a unique index (including in its definition of the UNIQUE keyword) is 100%. No other method of access to data using an index could not be better. Therefore, a unique index is used whenever possible.

Equality non-unique Index

If indexes are calculated  non-unique their selectivity. Cost optimizer assumes that the table (and the corresponding indices) has data distributed evenly, unless the ANALYZE command option is used INDEXED COLUMNS FOR ALL. This option causes the optimizer receives information about possible asymmetry data in indexed columns.

Calculation of the range

In the case of execution plans for ranges of index values ​​calculated their selectivity. These calculations are based on the values ​​of the upper and lower recently created statistics. Cost optimizer again assumes a uniform distribution of the data in the table (and the relevant index), except that during the process of analyzing the table option is specified INDEXED COLUMNS FOR ALL.

Calculation based on the scope of bind variables

In this case, the execution plans for a selectivity index ranges guessed. Before the introduction of Oracle9i - due to the fact that the values ​​of bind variables were not available at the time of parsing (values ​​are passed to the cursor after determining the execution plan) - optimizer can not make decisions based on these values. According to a rule of thumb therefore assumed 25% selectivity for any range of bind variables (for example, WHERE dept_no =: b1) and 50% selectivity to the limited range (for example, WHERE dept_no>: b1 AND dept_no

Cost-based optimizer

Cost-based optimizer is a more sophisticated tool than rule-based optimizer. In order to determine the best path execute a command instead of rigid rules it uses information about the database, such as table size, number of rows, the width of the key, etc.

Information required by the optimizer cost will only be available after for the table ANALYZE command or DBMS_STATS package using. If the table has not been analyzed, in order to select the best path for data access cost optimizer can only use logic rules. You can make the scenario combining cost and rule-based behavior to analyze only by the tables

NOTE

ANALYZE command and DBMS_STATS package features collect statistics on tables, clusters and indexes and keep them in the data dictionary (data dictionary).

By default, the SQL command is subject to cost optimization, if at least one table in it used was analyzed. Cost optimizer then takes the information from the data dictionary appropriate decisions in order to determine the best path for the other tables.

The kernel of a database management system uses a cost-based optimizer when:
  • init.ora file contains the following entry OPTIMIZER_MODE = CHOOSE, and there are statistics for at least one of the tables that appear in the command;
  • issued the command ALTER SESSION SET OPTIMIZER_MODE = CHOOSE and created statistics for at least one table linked to the executed command;
  • issued the command ALTER SESSION SET OPTIMIZER_ MODS = FIRST_ROWS (or ALL_ROWS) and set up statistics for at least one table linked to the executed command:
  • the executed command uses the FIRST_ROWS or ALL_ROWS guidelines (for example, SELECT /*+ FIRST_ROWS */ ...).

ANALYZE command

The way the tables are analyzed, it can have a huge impact on the performance of SQL statements. If the database administrator forget about analyzing tables or indexes after restoring the table, the effect of such negligence on the performance can be very negative. If the administrator performs the analysis at the weekend, can be achieved by the new threshold and Oracle will change the plan. The new plan will usually be accounted for improvement, but sometimes it can be worse than the last.

It's hard to emphasize strongly enough that if all SQL commands have been fine-tuned, it should not be carried out only for the sake of the analysis of the analysis. In one of the centers, in which the author conducted tuning SQL statements drew critical data in less than a second. However, an administrator at the weekend carried out the analysis of the belief that the path performance will be further improved. No wonder that one day, it turned out that the response time is lengthened to 310 seconds.

If we want to carry out the analysis often is before the re-analysis of the use of DBMS_STATS procedures.

EXPORT_SCHEMA_STATS to backup the existing statistics. This gives you the ability to restore previous statistics when something goes wrong.

During the analysis, the system can tell Oracle to look through all the rows in the table (ANALYZE_COMPUTE) or only some of the rows (ANALYZE_ESTIMATE). In a typical situation, by using the option ANALYZE_ESTIMATE for very large tables (l 000 000 or more rows) and ANALYZE_COMPUTE for small and medium-sized tables.

Highly recommended to use the analysis options INDEXED COLUMNS FOR ALL for each table, which can be characterized by a strong asymmetry in the data. For example, if a large percentage of rows in the table has the same value in a column, this is just an example of asymmetry. If you use the FOR ALL INDEXED COLUMNS makes the optimizer cost beyond the cardinality (number of distinct values) will also have information about the asymmetry of the data in the column.

When analyzing the table using the ANALYZE command are analyzed and all related indexes. If the index is then removed and restored, must be re-analyzed. Keep in mind that DBMS_STATS procedures. DBMS_TABLE_STATS GATHER_SCHEMA_STATS and analyze only the default tables (not the index). Using this procedure, for indexes use the CASCADE => TRUE, so that they were also analyzed.

The following are some examples of using the ANALYZE command:

ANALYZE TABLE ESTIMATE STATISTICS SAMPLE EMP 5 PERCENT FOR ALL INDEXED COLUMNS;

ANALYZE INDEX ESTIMATE STATISTICS SAMPLE EMP_NDX1 5
PERCENT FOR ALL INDEXED_COLUMNS;

ANALYZE COMPUTE STATISTICS EMP TABLE FOR ALL INDEXED
COLUMNS;

If you analyze a table by mistake, you can remove these statistics.

For example:

ANALYZE TABLE EMP DELETE STATISTICS;

The analysis process can take a very long time if you use COMPUTE option for large objects. It turns out that almost on every occasion the command ANALYZE ESTIMATE 5 PERCENT for a large table makes the optimizer to take the same decision as using ANALYZE COMPUTE.