Often recurring problems in the case of cost-based optimizer
- The problem of asymmetry
- Analyzing data inadequate
- The common use of the joins Optimizers
- Choosing the wrong index
- Joining too many tables
- Inappropriate parameter settings in the file init.ora
The problem of asymmetry
Let us assume that the problem is a system in which there is a table with a cross bearing the name of the column status. Acceptable values are two columns: About to sign a transaction open (open transactions), which are not yet posted, and C to denote the transaction closed (closed transaclions) that have already been posted and do not require any maintenance. There are over a million records that have the status of C and always only 100 lines, which have the status of O.
Created the following SQL statement, which is performed every few hundred times, but the response time is not satisfactory:
SELECT acct_no, customer, product, trans_date, amt
FROM trans
WHERE status = 'O';
Response time: 16.308 seconds
In this example - taken from life - cost optimizer decided that Oracle should conduct a review of the entire table (full table scan). This is because the optimizer to have information on the number of different values that can take the STATUS field, but do not have information on the number of records with each of these values. As a result, the optimizer established uniform distribution of data (50/50) for each of the two values A and C. Under this assumption, the Oracle system shall review the entire table to retrieve data for open transactions.
Oracle will have information about the asymmetry of distribution of the data, the number of rows having a specific value in the indexed columns if during the ANALYZE command or DBMS_STATS package when you execute given an option FOR ALL INDEXED COLUMNS. Suppose now that the Status column has an index. In order to analyze the table, use the following command:
ANALYZE COMPUTE STATISTICS TRANS TABLE FOR ALL INDEXED COLUMNS
After analyzing the table and calculate statistics for all indexed columns, the optimizer will have cost information that only about 100 lines is the value of O, which makes the use of this column for the index. As a result, you'll get a much faster response times:
Response time: 0.259 seconds
Usually, the cost optimizer conduct a review of the entire table, if the column value is present in over 12% of the rows in the table, and use the index when the value is less than 12% of the rows. The choice made by the optimizer cost is not based on such a simple rule, but the practice shows that this is typical behavior.
Before the introduction of Oracle9i - if you used the variables assigned to - the problem of asymmetry could still occur even when the option is used INDEXED COLUMNS FOR ALL. Take a close look at the following command:
local_status: = 'O';
SELECT acct_no, customer, product, trans_date, amt
FROM trans WHERE status = local_status;
Response time: 16.608 seconds
The response time is similar to that which occurred in the case of non-use option FOR ALL INDEXED COLUMNS. The problem occurs because the cost optimizer does not know the value of a variable when determining trailing execution plan. In general - in order to avoid the problem of asymmetry is:
• literal values stored in the code directly (for example, you can use the WHERE STATUS = 'O' instead of WHERE
STATUS = local_status);
• Always perform the analysis with the option FOR ALL INDEXED COLUMNS.
If you are still experiencing performance problems associated with not using the optimizer cost index because dowiązanych variables, and you can not change the source code, is an attempt to remove the index statistics with the command:
ANALYZE INDEX
TRANS_STATUS_NDX
DELETE STATISTICS;
Deleting index statistics to improve the situation, because the behavior of forces applied by the rule-based optimizer, which always uses the existing index (revision instead of the entire table).
NOTE
In Oracle9i dowiązanych variable value is determined before deciding on implementation plan, which eliminates the need for direct recording in the code literal values.
Analyze the data unsuitable
By come into contact with a number of systems in which performance problems stemmed from the fact that the tables and indexes were analyzed at the time when the amount of data contained typical. Cost optimizer must have accurate information (including information on the volume of data) in order to determine an effective plan.
Situations in which statistics can be lost or become obsolete, can be re-creating the table or move, add or create a new index of the environment. For example, you forget to re-create the statistics after the transfer of the database schema to the production environment. Problems also occur when an administrator does not have enough information about the database, which manages and analyzes the table when it is empty, not when, after a short period of time, it has hundreds or thousands of rows.
How to confirm the date of the last analysis
In order to verify this, which tables, indexes and partitions were analyzed and when it was done the last time, you can query that retrieves the value of a column LAST_ANALYZED USER_XXX from different perspectives. For example, in order to determine the date of the last analysis all tables follow:
SELECT table_name, num_rows, last_analyzed FROM user_tables;
USER_TABLES addition there are many other perspectives, so you can check the date of the analysis of different objects. In order to obtain a complete list of prospects containing column LAST_ANALYZED perform the following query:
SELECT table_name
FROM all_tab_columns
WHERE column_name = 'LAST_ANALYZED';
Of course, the point is not to analyze the option COMPUTE perform as often as possible. Doing so may cause the SQL statement will be detuned.
During the analysis, the decision to re-analyze tables and indexes can be as dangerous as the adjustment indexes and ideally should be performed on a copy of the production database before the introduction of changes in the actual production database.
Peoplesoft software is an example of an application that uses temporary tables to store the data, with names ending in _TMP expression. When the start-up is not performed batch process, each of these tables is usually empty. During the execution of each stage batch process for the tables are made to insert and update operations data.
The last phase of the process is to insert the data into the main table Poplesoft transactional applications by extracting data from temporary tables. At the end of the batch process typically all rows are deleted from the temporary tables. Transactions related to these tables are not approved until the end of the process when there is no longer any data in them.
When it seems to ANALYZE the temporary tables, they are usually empty. When the cost optimizer receives information about the zero number of rows, it automatically takes a decision on the review of the entire table and use Cartesian join. To work around this problem by suggesting to fill the temporary table data for analysis. Then you can empty tables with the data and begin normal processing. Empty the table (TRUNCATE command) does not clear the statistics.
INSERT and UPDATE SQL statements used by the application to insert data into temporary tables can be checked using the procedure of tracing (tracing) the batch process that inserts and updates data. The same SQL commands can be used to fill the personal data table.
Using this approach the problem in one of the major centers in Australia, which benefited from Peoplesoft software, process, batch execution time dropped from 36 hours to less than 30 minutes.
If you analyze the tables that store temporary data including production volumes of data does not solve performance problems, consider removing the relevant statistics for these tables. This forces the use of the SQL statements that refer to those tables, the principles of rule-based optimizer. Statistics can be removed by using the DELETE command ANALYZE STATISTICS TableName. After their removal it is important to ensure that the tables were not used in joins with tables that have statistics. You should also ensure that the terms of niezanalizowanych tables were not used indexes with statistics. If the temporary tables are used separately and join are only among themselves, the preferred approach is often to use the principles of rule-based optimizer.
Common use of the joins Optimizers
As mentioned earlier, when the tables are junction and one of them will be analyzed, and the remaining tables do not, the optimizer works cost the least favorably.
Analyzing tables and indexes using DBMS_STATS procedures. GATHER_SCHEMA_STATS and procedures GATHER_TABLE_STATS be sure to include the option CASCADE => TRUE. By default DBMS_STATS package collects statistics only for tables. Having statistics for the tables, but not for the index, may also cause peeling by the optimizer cost inefficient execution plans.
One of the cases of this problem, with which he met author, took place in the table having niezanalizowaną trans and analyzed acct table. Administrator to remove the data again created a trans table, but forgot to do the analysis. The following example illustrates the performance of the implementation of the join of the two tables:
SELECT a.account_name, SUM (b.amount)
FROM trans b, and acct
WHERE b.trans_date> sysdate - 7
AND a.act_id = b.acct_id
AND a.acct_status = 'A'
GROUP BY account_name;
SORT GROUP BY
NESTED LOOPS
TABLE ACCESS BY ROWID ACCT
INDEX UNIQUE SCAN TABLE ACCESS FULL ACCT_PK TRANS
Response time: 410 seconds
The response time was significantly shorter after analyzing the trans table with the following command:
ANALYZE TABLE ESTIMATE STATISTICS trans
3 PERCENT SAMPLE
FOR ALL INDEXED COLUMNS
The new plan execution and response times were as follows:
SORT GROUP BY
NESTED LOOPS
TABLE ACCESS BY ROWID ACCT
INDEX UNIQUE SCAN ACCT_PK
TABLE ACCESS BY ROWID TRANS
INDEX RANGE SCAN TRANS_NDXl
Response time: 3.1 seconds
Choosing the wrong index
Cost optimizer selects the secondary index over time, even if it seems obvious that it should have used a different index. Take a close look at the following WHERE clause occurring in the software Peoplesoft:
where business_unit =: 5
and ledger = 6
and fiscal_year = 7
and accounting_period = 8
and affiliate = 9
and statisctics_code = 10
and project_id = 11
and account = 12
and currency_cd = 13
and deptid = 14
and product = 15
Peoplesoft system, from which the above example, had an index containing all the columns specified in the WHERE clause. It would seem that Oracle uses to execute the query is this index. However, the cost optimizer decided to use an index on columns (business_unit, ledger, fiscal_year, account). After playing the SQL command and execution time compared with the case using guidelines requiring the use of a larger index, it turned out that it is more than four times less than the execution time using the index selected by the optimizer.
Further studies have shown that the index should be created as a unique (UNIQUE), but in the process of deleting data and play table mistakenly created it as a non-unique. Of course, a four-time gain very happy user of the system.
But there are other problems. The same index was an ideal candidate for use in below the command that was one of the most frequently performed in the processing of data at the end of the month or the end of the year:
where business_unit = 5
and ledger = 6
and fiscal_year = 7
and accounting_period Between l and 12
and affiliate = 9
and statisctics_code = 10
and project_id = 11
and account = 12
and currency_cd = 13
and deptid = 14
and product = 15
Despite the establishment of the correct index as a unique, cost optimizer again it did not take into account. The only difference between the current and the previous command based on the fact that it covered the field more accounting periods (accounting period) for fiscal year (fiscal year), and not just one accounting period.
For the above WHERE clause used was the same as previously, with columns inadequate index (business_unit, ledger, fiscal_year, account). And again - after measuring the execution time of a command using the index chosen by the optimizer cost, and the index contains all the columns - it turned out that the latter provide at least three times faster performance.
Problem solved by column Reset items accounting_ period to the last position in the index (originally was on the third). The new index has the following form:
business_unit
ledger
fiscal_year
Affiliate
statisctics_code
project_id
account
currency_cd
deptid
product
accounting_period
Another way to force cost-based optimizer to use the index is to use one of the tips that will allow you to be identified. This is a good solution, but many sites use the packages provided by the developers, which can not be modified (and therefore can not be used tips). However, it is possible to create a perspective that contains the hint and give users permission to access this perspective. It will be useful if the SQL statement, the execution performance leaves much to be desired, is part of a report or a direct question, which can be read by the prospect.
In the end, sometimes it turns out that you can force the use of the index, if you remove the statistics. Sometimes you can also use the command ANALYZE ESTIMATE with only a core value of the analyzed 1,064 lines. It often happens that the plan will be changed to the desired, but this type of action is something of a,, casting ". Extremely important point is that using the" magic "of your actions carefully documented. Another method is to try to reduce the parameter OPTIMIZER_INDEX_COST_ADJ * (The value of this parameter set in the init.ora file - editor. Crowd) to a value in the range of 10 to 50
In summary you need to answer the question of why the optimizer cost take such inappropriate decisions. First of all - it should be noted that a bad decision on the execution plan is the exception rather than the rule. Examples of this subsection show that the columns are considered individually rather than collectively. If so, the first of the presented examples of cost optimizer would find - without the need to replace the index by the administrator as a unique - that each row has a unique value. The second example shows that if the number of columns in an index has a small number of different limit values, and the SQL command requires access to most of them, the cost optimizer often ignores the index. This is despite the fact that taken together the columns are well defined and the query returns few rows.
Little effect justifying optimizer should be noted that the use of a smaller number of index columns often produces a significant increase in exercise performance as compared to the indices of the plurality of columns.
Joining too many tables
The first versions of cost-based optimizer often used the method, divide and conquer "when złączaniu subject to more than five tables.
Query selects all the data associated with the undertaking of the account identifier (column acct_id) equal to 777,818. The company has several branches, and the query relates to a branch in the state of Washington (WA). Table A is a table acct, Table F is acct_address, and G is the address table.
Acct WHERE id = 77818
The user expects that the query returns a relatively small number of rows from different tables, and the response time will not exceed l seconds. It is best if Oracle receives acct_address rows from the table that corresponds to the calculus, and her connections with the address table to determine whether the address was correct at Washington.
However, due to the fact that such a junction is subject to multiple tables, cost optimizer will often decide that the tables will be processed F and G independently of each other and only at the end of the data to be merged.
The result of the join tables F and G will be that will have to be selected all the addresses that concern the state of Washington. This process can take a few minutes, which probably will result in the overall execution time will be much longer than that which would occur if Oracle control the access to all the tables from Table A.
Assuming that the table acct_address (F) has index acct_id column, you can solve the problem by using the appropriate clue instructing the optimizer cost that should be used for this particular index. This will greatly improve performance.
What's interesting - rule-based optimizer is often much more difficult for the proper determination of the execution plan for multiple tables złączania than the cost optimizer. Rule-based optimizer often do not use the table as a table control acct. To enforce the name in the FROM clause of table A place last.
If you use a ready-made software, the best way may be to create a perspective that contains a hint (if it is permissible and possible for your package).
Inappropriate parameter settings in the file init.ora
In many centers are using the pre-production database to test the performance of SQL execution before moving indexes and the code to the production database. Ideally, the database contains a number of pre-industrial production base close to the data, and the table is analyzed in exactly the same way as in the case of the production base. This pre-production base will often provide a copy of the actual data files, the production phase.
When administrators are testing the changes in a pre-production, usually everything works fine, but when you move to a production base sometimes it turns out that there are other plans peeled performance. The reason for these problems may be a different set of parameters in the init.ora file pre-production and production base.
I met one day below to update the data, the execution time amounted to four minutes, even though the condition in the WHERE clause contains reference the primary key table. Oddly - when the acct was taken from the table data (SELECT) and not updated (UPDATE) using exactly the same WHERE clause proved that the index has been used.
UPDATE SET proc_flag acct = 'Y'
WHERE pkey = 100;
# The response time was 4 minutes
# And the master key has not been used
Tried all means re-examining table, and finally removed her statistics. It turned out that when the rule-based optimizer was used, the command was executed efficiently.
After much analysis, it was decided to check the init.ora file parameters. It turned out that the COMPATIBLE parameter was set to system version 8.0.0, but work actually took place in Oracle 8.1.7. When you change the parameter to 8.1.7 and re-execution of the UPDATE, it turned out that the index is used properly this time and execution time of approximately O, l seconds.
COMPATIBLE parameter is the only one that requires the same settings in a pre-production and production to ensure identical operation cost-based optimizer.
These are:
SORT_AREA_SIZE
The number of bytes allocated to the session user to sort the data in memory. If the parameter is set to its default value of 64K, NESTED LOOPS (nested loops) will be preferred to SORT merges (join by sorting and merging) or HASH JOINS (hash join).
HASH_AREA_SIZE
The number of bytes allocated to the session user to perform joins in memory hash (hash joins). The default value is twice the SORT_AREA_SIZE. Hash join often do not work properly if this parameter is set to a value lower than l megabyte.
HASH_JOIN_ENABLED
Enables or disables use of hash joins. The default value is TRUE and usually does not require parameter settings.
OPTIMIZER_MODE
The possible values are CHOOSE, FIRST_ROWS or ALL_ROWS. CHOOSE value makes the optimizer cost is used if statistics are available. Value FIRST_ROWS works the same way, but to increase the preference for NESTED LOOPS instead of SORT MERGE or HASH JOINS. The value of ALL_ROWS causes preference SORT MERGE and HASH JOINS NESTED LOOPS before.
DB_FILE_MULTIBLOCK_READ_COUNT
Number of blocks that Oracle charges for each read from the table. If you specify a large value (eg 16 or 32), Oracle will in many cases chose FULL TABLE SCANS (review of the entire table) instead of NESTED LOOPS.
OPTIMIZER_MODE_ENABLE
Optimizer provides new features. For example, setting the parameter to 8.1.7 allows you to use all mechanisms available in Oracle version 8.1.7 (and earlier). This parameter can also automatically adjust other parameters, such as FAST_FULL_SCAN_ENABLED.
Major improvements have been introduced in different versions of the system are: 8.0.4 - nested loops arranged (ordered nested loops), fast full inspection (fast fuli session) 8.0.5 - many bug fixes optimizer; 8.1.6 - Improved support for histograms, partitions and nested loops; 8.1.7
- Improved support for partition and optimization of subexpressions; 9.0.1 - much improved index join, merge complex perspectives, improving to use bitmap indexes, and join subexpressions.
OPTIMIZER_INDEX_CACHING
This parameter, "he says," the percentage of Oracle system data indexed, which can be expected in the memory. Default value for this parameter is 0, and the range of values - from O to 100 The higher the value, the more likely it is to use NESTED LOOPS instead of SORT MERGE and HASH JOIN. observed in some systems to improve performance by setting it to the value of the 90th
OPTIMIZER_INDEX_ COST_ADJ
This parameter setting can increase the frequency of use of indexes. The default value is 100 Reducing it to 10 if the optimizer cost is lower the cost of using the index to 10% of the default. You can also specify the value as much higher than the 100 in order to force a SORT MERGE or HASH JOIN. In some places there was an increase performance by setting this parameter to a value derived from a range of 10 to 50 for OLTP systems * and 50 in the case of decision support systems. Reducing the value can speed up some queries OLTP, but at the same time makes the task execution time of a few hours can be significantly increased. Increasing this value can result in the opposite.
OLTP (On Line Transaction Processing) - Transaction Processing Systems
OLAP (On Line Analytical Processing) systems - Analytical Processing
STAR_TRANSFORMATION_ENABLED
Makes is used for the transformation of a star (star transformation) to merge bitmap indexes in the tables of facts *. This is a method different from the Cartesian join, which usually is the case for the optional use of data in the tables of the logical structure of a star (star schema).
Table facts (fact table) is the principal, normalized table for a star schema - the most frequently used logical structure of the database OLAP systems (data warehousing)
QUERY_REWRITE_ENABLED
Allows you to use index function (function-based in-dexes), as well as rewriting (rewrite) requests for prospects materialized. The default value is FALSE, which may explain why functional indexes are not used. Simply - to give parameter to TRUE.
PARTITION_VIEW_ENABLED
This parameter allows you to use the partition perspective. The default value is FALSE, so using outlook partition to give it a value of TRUE. Generally speaking - the perspective of the prospect of partition, which is a join table UNION ALL. They were introduced before the Oracle partitions and have been successfully used in many systems for archiving and increase productivity.
PARALLEL_BROADCAST_ENABLED
This parameter is used by the query in parallel (pa-rallel ąuery) when you are using a small table maps (lookup tables). Its default value is FALSE. When set to TRUE, the rows of small tables are sent to each child process to accelerate the implementation of MERGE JOIN and HASH JOIN when złączania a small table with a lot.
OPTIMIZER_MAX_PERMUTATIONS
It can be used to reduce the parsing time. However, reducing the number of permutations can cause inefficient execution plans specify, so in case of a default value for this parameter should not be changed.
CURSOR_SHAR1NG
When set to FORCE or SIMILAR can speed up parsing, reduce memory usage in the shared memory and reduce the amount of low-level lock (latch contention). This is achieved by modifying the command similar to the WHERE clause contains literals, variables using the command dowiązanych.
The default value is EXACT. Author suggests you set this parameter to SIMILAR Oracle9i system only if you are sure that the system there are many similar commands, which differ only in the values of letters. Much better to save your application code so that the variables used were assigned.
Setting the parameter to FORCE causes similar commands use the same memory space SQL, which can degrade performance. This value should not be used.
It is worth noting that the STAR TRANSFORMATION option will not work if this value will give the value of SIMILAR or FORCE.
ALWAYS_SEMI_JOIN
This parameter can greatly improve performance for applications that frequently use WHERE EXISTS option. Setting it to a value MERGE or HASH could cause the previously executed command for a few hours will end after several minutes. The default value is STANDARD, which means that your main (but not your child) controls the execution plan. If you select this parameter, the query will be asking the child control.
ALWAYS_ANTI_JOIN
This parameter changes the behavior of the command type NOT IN and can greatly speed up the process if you give it a value of HASH or MERGE. This setting will cause the merge will be executed or hash join instead of time-consuming Cartesian join, which is the case for standard execution commands such as NOT IN.
Please note that if any of these parameters has a different value in a pre-production and production, it is possible that the plans for the implementation of the SQL commands will vary. Identical values provide the same effect.