Here are some
very simple yet powerful SQL tips to remember.
Avoid using the
following:
Boolean operators >, <, >=, <=, is
null,is not null.
Not in, !=
Like '%pattern', not exists
Calculations on unindexed columns or (use
union instead)
Having (use a WHERE clause instead)
Do use the following:
• Enable aliases
to prefix all columns
• Place indexed
columns higher in the WHERE clause
• Use SQL Joins
instead of using sub-queries
• Make the table
with the least number of rows the driving table by making it first in the FROM
clause
Other important
points for SQL Tuning:
• Establish a
tuning environment that reflects your production database.
• Establish
performance expectations before you begin
• Always Design
and develop with performance in mind
• Create Indexes
to support selective WHERE clauses and join conditions.
• Use
concatenated indexes where appropriate
• Consider
indexing more than you think you should, to avoid table lookups.
• Pick the best
join method.
• Nested loops
joins are best for indexed joins of subsets.
• Hash joins are
usually the best choice for "big" joins.
• Pick the best
join order.
• Pick the best
"Driving" table.
• Eliminate rows
as early as possible in the join order
• Use bind
variables. Bind variables are key to application scalability.
• Use Oracle
hints where appropriate.
• Compare
performance between alternative syntax for your SQL statement.
• Consider
utilizing PL/ SQL to overcome difficult SQL tuning issues.
• Consider using
third party tools to make the job of SQL tuning easier.
• Never do a
calculation on an Indexed column (e.g., WHERE salary*5 > :myvalue).
• Whenever
possible, use the UNION statement instead of OR conditions.
• Avoid the use
of NOT IN or HAVING in the WHERE clause. Instead, use the NOT EXISTS clause.
• Always specify
numeric values in numeric form and character values in character form (e.g.,
WHERE
emp_number =
565, WHERE emp_name = ‘Jones’).
• Avoid
specifying NULL in an indexed column.
• Avoid the LIKE
parameter if = will suffice. Using any Oracle function will invalidate the
index, causing a full-table scan.
• Never mix data
types in Oracle queries, as it will invalidate the index. If the column is
numeric,
remember not to
use quotes (e.g., salary = 50000). For char index columns, always use single
quotes (e.g., name = ‘BURLESON’).
• Avoid using
subqueries when a JOIN will do the job.
• Use the Oracle
“decode” function to minimize the number of times a table has to be selected.
• If your query
will return more than 20 percent of the rows in the table, a full-table scan
may be better than an index scan.
• Always use
table aliases when referencing columns.
• To turn off an
index you do not want to use (only with a cost-based optimizer), concatenate a
null string to the index column name (e.g., name||') or add zero to a numeric
column name (e.g., salary+0).
With the
rule-based optimizer, this allows you to manually choose the most selective
index to service your query. These rules may seem simplistic but following them
in a diligent manner will generally relieve more than half of the SQL tuning
problems that are experienced:
Remember that
Oracle’s rule-based optimizer looks at the order of table names in the FROM
clause to determine the driving table. Always make sure that the last table
specified in the FROM clause is the table that will return the smallest number
of rows. In other words, specify multiple tables with the largest result set
table specified first in the FROM clause.
SQL Query Tuning
Tips
Efficient SQL
statements play a very important role in maintaining the database. They not
only improve the performance of the database but also help in reducing the
network traffic. The following can be stated as some of the tips for tuning SQL
statements used to retrieve data from the database:
1. Know your
application and business data well: Before writing our SQL statements, we need
to familiarize ourselves with different business data sources. We must be aware
of the data volume and its distribution in the database along with thorough
understanding of the business data model – such as relationships among business
entities.
2. Test your
queries with realistic data: After writing an SQL statement, we must test it in
an environment which reflects the production database. The behavior of the SQL
statement depends a lot on the environment wherein it is tested.
3. Write
identical SQL statements in your applications: Use bind variables, stored
procedures and packages whenever possible. The benefits of using identical SQL
statements include reduced memory use on the database server and faster
execution, as parsing is not done.
4. Use indexes
carefully on the table: Be sure of creating the necessary indexes on the table
used in the SQL statement. But creating too many of them may result in
performance degradation of the server. Try to follow the below rules in
creating indexes on a table:
1. Create
indexes on the columns which are frequently used in the WHERE clause of the
application SQL or the queries given by the end users.
2. Index those
columns which are frequently used in joining tables in SQL statements.
3. Use only
index columns that select small percent of rows with the same value.
4. Do not index
the columns which are used with functions and operators in the WHERE clause of
the SQL statement.
5. Do not index
those columns which get modified frequently as the index maintenance will
become an issue during execution of DML statements on the table.
6. Unique
indexes are better than non-unique indexes due to better selectivity. Create
unique indexes on PK columns and non-unique indexes on FK columns and the
columns frequently used in the WHERE clause.
7. Create index
so that the column used in the WHERE clause make up a leading portion of the
index.
5. Make an
indexed path available to SQL: Sometimes even after creating the indexes, they
won’t get used in the execution of the query. One of the different ways of
making the indexed path available to the optimizer is to use SQL hints.
6. Use Explain
Plan and TKPROF wherever possible: These are the tools that come along with
Oracle server and help in understanding the execution of our SQL statements.
Explain Plan – helps us to know the access path the optimizer opted. TKPROF –
shows the actual performance statistics of the SQL statement.
7. Understand
the Optimizer: SQL statement can be executed using rule-based optimizer or
cost-based optimizer. In older applications, Oracle has used rule-based
approach whereas in newer applications, oracle is promoting more of cost-based
approach. If you opt to use cost-based approach, then it is necessary to run
ANALYZE schema regularly as doing that stores the recent database statistics in
the data dictionary views which are used by the cost-based optimizer. The SQL
query can tuned only if cost based approach is opted for.
8. Think
globally when acting locally: Keep in mind that the modifications you do to
improve the
performance of
one SQL statement affects other SQL statements used by applications or other
users.
9. The WHERE
clause is crucial: The following can be said to be some of the cases where the
WHERE
clause will not
use index even if it is created on a column. The index is created on column
COL1 but it
wont be used in
the below illustrated examples.
1. COL1 < COL2
2. COL1 >
COL2
3. COL1 <=
COL2
4. COL1 >=
COL2
5. COL1 is null
– The index does not store null values hence when queries for null values index
won’t get
used
6. COL1 is not
null – The index does not store null values hence when queries for null values
index won’t
get used.
7. COL1 not in
(value1, value2...)
8. COL1 !=
expression1
9. COL1 like
‘%pattern1’ – index is not used in this case as the leading edge of the index
is suppressed;
but whereas if
queried as COL1 like ‘pattern1%’, then index is used as they would result in a
bounded
range index scan
10. Not exists
(subquery)
11. Expression1
= expression2 – All expressions, functions or calculations involving indexes
column
would prohibit
using the index on that column.
10. Use WHERE
clause instead of HAVING clause to filter the records : Using an indexed column
in
HAVING clause of
the query to filter records will result in full table scan and avoid the usage
on the index
on that column.
Instead first filter the records using WHERE clause to filter the records which
makes use
of the index on
the column. Below queries illustrate the usage of index, if an index is created
on DEPTID
column of EMP
table.
1. Select
dept_id, sum(salary) from emp group by dept_id having dept_id = 100;
2. Select
dept_id, sum(salary) from emp where
dept_id = 100 group by dept_id;
In case a) Query
will not use the index created on dept_id column whereas case b) query makes
use of
it.
11. Specify the
leading index columns: In case of a compound index, it is used in an SQL query
only if the
leading column
of it is used. Below queries are used to illustrate the same with an assumption
of a
compound index
(PART_NUM, PRODUCT_ID) existing on the table PARTS.
1. Select * from
parts where part_num = 2;
2. Select * from
parts where product_id = 1002;
Case a) would
use the index as the column used in the query WHERE clause is a leading column
of the
compound index;
whereas case b) query would not use the index. If incase of case b), you want
to make
use of index we
can re-write the query in the following way:
Select * from
party where part_num > 0 and product_id = 1002;
12. Evaluate
index scan v/s full table scans: If selecting more than 15 percent of the
records from a table, it is better to use/force a full table scan on the table
for such query. When using an index does more harm than good to the query, we
can use techniques the following to suppress the usage of
index.
1. Select /*+
full(a)*/ * from emp where salary = 5000;
A full hint
forces the full table scan on the table and avoids the usage of index on salary
column.
2. Select * from
emp where salary + 0 = 5000;
The salary
column in WHERE clause is modified to an expression which results in avoiding
the usage of index on the column.
3. Select * from
emp where salary = ‘5000’;
An implicit
conversion on the indexed column results in avoiding the index usage.
4. Select * from
emp where ss# ||’ ‘|| = ‘111-222-333’;
When the
percentage of table rows accessed is 15 percent or less, an index scan will
work better because it results in multiple logical reads per row accessed,
whereas a full table scan can read all the rows in a block in one logical read.
Thus, the performance of full table scan is better when accessing a large
percentage of rows from a table.
To illustrate this point, say the ANALYZE
command is issued against the EMP table and all its indexes. Oracle generates
the following statistics in the data dictionary table USER_TABLES and
USER_INDEXES:
Table Statistics:
NUM_ROWS = 1000
BLOCKS = 100
Index
Statistics:
BLEVEL = 2
AVG_LEAF_BLOCKS_PER_KEY
= 1
AVG_DATA_BLOCKS_PER_KEY
= 1
Based on these statistics, the
following would be the logical reads (block accessed) for different types of
scans:
Use of index to return one row = 3
(BLEVEL+(AVG_LEAF_BLOCKS_PER_KEY
- 1) + AVG_DATA_PER_KEY)
Use of index to
return all rows = 3000
(NUM_ROWS * Blocks accessed to return one row using index)
Full table scan = 100(BLOCKS)
13. Use an ORDER
BY clause for index scan: Oracle optimizer uses an index scan if the ORDER BY
clause is used on an indexed column. The query would retrieve ROWID from index
and access the table using the ROWID.
14. Know thy
data: You need to know the data and its distribution in your database server in
order to understand the usage of indexes on the columns and their execution.
You need to keep in mind that the SQL query performance varies as the database
grows and data distributions changes.
15. Know when to
use large-table scans: A full table scan provides better performance when
compared to an index scan on a very small table or a very large table. An index
scan on a very large table may require scanning of many indexes and table
blocks. When these blocks are brought to database buffer cache, they are kept
as long as possible. But as these are not needed for other queries, the hit
ratio of the database buffer declines resulting in performance degradation. The
blocks read by a full table scan are removed from database buffer cache much
earlier does not hamper the performance as an index scan on a very large table.
16. Minimize the table access (passes): Usually
by reducing the number of table passes in a SQL statement improves the
performance.
Here’s an example to illustrate the same:
The STUDENT table has four columns
named NAME, STATUS, PARENT_INCOME, and SELF_INCOME. The name is the primary
key. The values of the STATUS column are 0 for independent students and 1 for
dependent students.
The following query returns the
name and income for dependent as well as independent
students. It
results in two passes through the STUDENT table, creates a temporary table for
processing, and initiates a sort to eliminate duplicates:
SELECT NAME, PARENT_INCOME
FROM STUDENT
WHERE STATUS = 1
UNION
SELECT NAME,
SELF_INCOME
FROM STUDENT
WHERE STATUS =
0;
The same request
can be fulfilled by this query, which results in only one pass through the
table:
SELECT NAME, PARENT_INCOME*STATUS +
SELF_INCOME (1-STATUS)
FROM STUDENT;
17. Join tables
in the proper order: The order in which the tables are joined in a multiple
table join is crucial. Perform the most restrictive search first in order to
filter out the maximum number of rows and only a few records are processed in
further levels. You need to make sure that the driving table (master table in a
master-detail join) returns the least number of rows. For cost-based optimizer,
the order in which the tables appear in WHERE clause is irrelevant as it picks
the best execution plan on its own. We can use the ORDERED hint to control the
order in which
the tables are
accessed in the execution plan.
18. Use
index-only searches whenever possible: If possible, write the queries that use
index-only searches. The optimizer will have to search only index and not the
full table to satisfy the SQL resulting in improving the performance. For
example, a compound index exists on L_NAME and F_NAME columns
in EMP table.
1. Select f_name
from emp where l_name = ‘Smith’;
2. Select
f_name, salary from emp where l_name = ‘Smith’;
In Case a)
query, the optimizer performs an index-only search whereas in case b) query, the
optimizer performs a table search in order to get the data for a column –
salary which is not a part of the compound index. Hence we need to observe the
column list considered in the SELECT clause as well while writing a query.
19. Redundancy
is good: Provide as much information as possible in the WHERE clause of the
query. If in
a WHERE clause
of a query, there is COL1 = Col2 and Col1 = 10 then the optimizer infers that
COL2 = 10.
But at if there
is a condition like COL1 = COL2 and COL2 = COL3, then the optimizer does not
infer it as
COL1 = COL3.
20. Keep it
simple, stupid: Make your SQL statements as simple as possible as simple
statements yield better performance than a single complex SQL statement. The
lower the Explain Plan Cost of an SQL, the better the performance of the query.
Hence we need to try to tune our queries to reduce the cost as much as
possible.
21. You can
reach the same destination in different ways: In many cases, more than one SQL
can provide with the same desired results. Each query uses a different plan and
performs differently. For example,
the use of MINUS
operator may perform faster than the use of NOT IN or NOT EXISTS predicates. If
an
SQL involves an
OR in the WHERE clause, we can re-write the same by using a UNION. Hence we need
to evaluate all the possible queries for a requirement and choose the best
access path query.
22. Use the
special columns: Make use of the special columns like ROWID and ROWNUM in your
queries. ROWID searches are faster than searches performed on any columns. A
ROWID is not a constant in the database; hence do not hard-coded the value in
your SQLs and applications.
23. Use explicit
cursors over implicit cursors: Implicit cursors are opened by Oracle for
DELETE, UPDATE,
SELECT and
INSERT and use an extra fetch. Explicit cursors are opened by developers using
DECLARE,
PEN, FETCH and
CLOSE cursor statements.
24. Explore and
take advantage of the oracle parallel query option: Using the parallel query
option, you can execute SQL in parallel for faster performance. This parallel
query option can be used only in SMP and MPP systems containing multiple disk
drivers. The Oracle database has to be configured for this feature to be used
and SQLs specific for that feature should be written.
25. Reduce the
network traffic and increase throughput: Using array processing and PL/SQL
blocks can achieve better performance and reduce the network traffic. Array
processing allows a single SQL to process multiple rows. Using arrays in an
insert, we can insert 1000 rows in a table. Significance performance gain can
be achieved in client/servers and batch systems using this technique. Multiple
SQL statements
can cause heavy network traffic. However, if the SQL statements are in a single PL/SQL block, the entire block can be
sent to Oracle server, processed there and results returned to the application
running on the client.