B树索引
对于非唯一性索引,rowid和索引key组成唯一性。唯一性索引,oracle不将rowid加入索引的key。
索引压缩
可以对索引进行压缩,例如all_objects中有大量重复的owner,object_type的值。增加CPU的时间,减少I/O的时间。
create index t_idx on all_objects(owner,object_type,object_name);
create index t_idx on all_objects(owner,object_type,object_name) compress 1;
create index t_idx on all_objects(owner,object_type,object_name) compress 2;
Reserve Key indexes
对于那些邻近得值,如果对它们创建索引,索引也是序列递增的,非常可能存放在同一个块上,这样增加冲突的可能性。通过反转,可以让索引更好的分布。但是对于条件where X>5,X列上的反转索引就会失效。
SQL> select 90101, dump(90101,16), dump(reverse(90101),16) from dual
2 union all
3 select 90102, dump(90102,16),dump(reverse(90102),16) from dual
4 union all
5 select 90103, dump(90103,16),dump(reverse(90103),16) from dual
6 /
90101 DUMP(90101,16) DUMP(REVERSE(90101),1
---------- --------------------- ---------------------
90101 Typ=2 Len=4: c3,a,2,2 Typ=2 Len=4: 2,2,a,c3
90102 Typ=2 Len=4: c3,a,2,3 Typ=2 Len=4: 3,2,a,c3
90103 Typ=2 Len=4: c3,a,2,4 Typ=2 Len=4: 4,2,a,c3
降序索引
索引创建的时候,是按照索引字段的值升序排列,查询的时候排序因子的字段使用不同的排序方式(例如一个字段升序,一个字段降序),那么通过执行计划可以看到数据库会多执行一个Sort的步骤。这种情况下,可以建立降序索引。感觉类似于基于函数的索引。
SQL> CREATE TABLE T AS select * from all_objects;
Table created.
--建立多个字段的索引
SQL> create index t_idx on t(owner,object_type,object_name);
Index created.
--结果升序排列,使用索引,执行计划中无排序步骤
SQL> select owner, object_type
2 from t
3 where owner between 'T' and 'Z'
4 and object_type is not null
5 order by owner ASC,object_type ASC;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE)
--结果降序排列,使用索引,有排序步骤
SQL> select owner, object_type
2 from t
3 where owner between 'T' and 'Z'
4 and object_type is not null
5 order by owner DESC, object_type DESC;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (ORDER BY)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE)
--分析表
SQL> exec dbms_stats.gather_TABLE_stats( user, 'T' );
PL/SQL procedure successfully completed.
--分析表之后,结果降序排列,使用索引,无排序步骤
SQL> select owner, object_type
2 from t
3 where owner between 'T' and 'Z'
4 and object_type is not null
5 order by owner DESC, object_type DESC;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=7018 Bytes=10
5270)
1 0 INDEX (RANGE SCAN DESCENDING) OF 'T_IDX' (NON-UNIQUE) (Cos
t=7 Card=7018 Bytes=105270)
--两个字段排序方式不一致,有排序步骤
SQL> select owner, object_type
2 from t
3 where owner between 'T' and 'Z'
4 and object_type is not null
5 order by owner ASC,object_type DESC;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=34 Card=7018 Bytes=1
05270)
1 0 SORT (ORDER BY) (Cost=34 Card=7018 Bytes=105270)
2 1 INDEX (FAST FULL SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=4 C
ard=7018 Bytes=105270)
--建立降序索引
SQL> create index desc_t_idx on t(owner ASC,object_type DESC);
Index created.
--根据排序语句建立对应索引后,无排序步骤,使用降序索引
SQL> select owner, object_type
2 from t
3 where owner between 'T' and 'Z'
4 and object_type is not null
5 order by owner ASC,object_type DESC;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=7018 Bytes=10
5270)
1 0 INDEX (RANGE SCAN) OF 'DESC_T_IDX' (NON-UNIQUE) (Cost=7 Ca
rd=7018 Bytes=105270)
使用B树索引的两个原则
a. 如果访问的记录数占整个表纪录数百分比较少。
b. 如果索引包含足够的信息,而查询的时候需要再去查询标的数据块。
继续上面的例子:
--使用T-IDX索引,因为这个查询中多了object_name字段,通过T-IDX索引就不需要访问表的数据块,通过FAST FULL SCAN可以完成
SQL> select owner, object_type,object_name
2 from t
3 where owner between 'T' and 'Z'
4 and object_type is not null
5 order by owner ASC,object_type DESC;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=62 Card=7018 Bytes=2
73702)
1 0 SORT (ORDER BY) (Cost=62 Card=7018 Bytes=273702)
2 1 INDEX (FAST FULL SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=4 C
ard=7018 Bytes=273702)
--再增加一个查询字段created,这两个索引中都没有这个字段,所以都需要访问表的数据块。而,查询返回的结果占表纪录行的大部分(owner between 'T' and 'Z'),通过先访问索引再访问数据块,效率更低。Oracle选择全表扫描。
SQL> select owner, object_type,object_name,created
2 from t
3 where owner between 'T' and 'Z'
4 and object_type is not null
5 order by owner ASC,object_type DESC;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=108 Card=7018 Bytes=
329846)
1 0 SORT (ORDER BY) (Cost=108 Card=7018 Bytes=329846)
2 1 TABLE ACCESS (FULL) OF 'T' (Cost=40 Card=7018 Bytes=3298
46)
--限定只返回少量的纪录,发现又重新开始使用索引。
SQL> select owner, object_type,object_name,created
2 from t
3 where owner = 'T'
4 and object_type is not null
5 order by owner ASC,object_type DESC;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=31 Card=1046 Bytes=4
9162)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=31 Card=1046 By
tes=49162)
2 1 INDEX (RANGE SCAN) OF 'DESC_T_IDX' (NON-UNIQUE) (Cost=2
Card=1046)
当然上述两条原则并不是适用于任何情况,有许多因素影响到执行计划。下面这个例子说明数据的存储对索引的影响。通过建立两张表,一张无序存储,一张有序存储,来比较两种情况下使用索引所消耗的资源和时间。
--创建有序存储的表,相邻记录存储在同一个数据块
SQL> create table colocated ( x int, y varchar2(2000) ) pctfree 0;
Table created.
SQL> begin
2 for i in 1 .. 100000
3 loop
4 insert into colocated values ( i, rpad(dbms_random.random,75,'*'
) );
5 end loop;
6 end;
7 /
PL/SQL procedure successfully completed.
--同一个数据块储存数据不相邻
SQL> create table disorganized nologging pctfree 0
2 as
3 select x, y from colocated ORDER BY y
4 /
Table created.
--创建主健,同时也会创建索引
SQL> alter table colocated add constraint colocated_pk primary key(x);
Table altered.
SQL> alter table disorganized add constraint disorganized_pk primary key(x);
Table altered.
SQL> commit;
Commit complete.
SQL> set timing on
SQL> set autotrace traceonly
--对于有序存储的数据,查询的时候只有3000的逻辑I/O,时间为0.03秒
SQL> select * from COLOCATED where x between 20000 and 40000;
20001 rows selected.
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'COLOCATED'
2 1 INDEX (RANGE SCAN) OF 'COLOCATED_PK' (UNIQUE)
Statistics
----------------------------------------------------------
156 recursive calls
0 db block gets
2908 consistent gets
43 physical reads
0 redo size
1805701 bytes sent via SQL*Net to client
15162 bytes received via SQL*Net from client
1335 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
20001 rows processed
--对于无序存储的数据,查询的时候只有将近20000的逻辑I/O,时间为8秒。同样的数据和索引,相对而言,无序存储的数据使用索引查询消耗更多的时间。
SQL> select * from DISORGANIZED where x between 20000 and 40000;
20001 rows selected.
Elapsed: 00:00:08.00
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'DISORGANIZED'
2 1 INDEX (RANGE SCAN) OF 'DISORGANIZED_PK' (UNIQUE)
Statistics
----------------------------------------------------------
156 recursive calls
0 db block gets
21388 consistent gets
1119 physical reads
0 redo size
1805701 bytes sent via SQL*Net to client
15162 bytes received via SQL*Net from client
1335 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
20001 rows processed
--对于无序存储的数据,强制用全表扫描只需要0.04秒,说明全表扫描比使用索引更有效。那为什么Oracle执行计划中为什么不使用全表扫描来查询呢?在CBO的优化模式下,因为没有对表进行分析,Oracle并没有足够的信息来选择最优的执行路径。下面对表进行分析。
SQL> select /*+ FULL(DISORGANIZED) */ * from DISORGANIZED where x between 20000
and 40000;
20001 rows selected.
Elapsed: 00:00:00.04
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=105 Card=220 Bytes=2
23300)
1 0 TABLE ACCESS (FULL) OF 'DISORGANIZED' (Cost=105 Card=220 B
ytes=223300)
Statistics
----------------------------------------------------------
60 recursive calls
0 db block gets
2407 consistent gets
1 physical reads
0 redo size
1805701 bytes sent via SQL*Net to client
15162 bytes received via SQL*Net from client
1335 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
20001 rows processed
--分析表
SQL> set autotrace off;
SQL> set timing off;
SQL> analyze table colocated
2 compute statistics
3 for table
4 for all indexes
5 for all indexed columns
6 /
Table analyzed.
--分析表
SQL> analyze table disorganized
2 compute statistics
3 for table
4 for all indexes
5 for all indexed columns
6 /
Table analyzed.
--一旦分析完成,可以查询user_indexes表中的CLUSTERING_FACTOR字段的值。如果这个值接近数据块的数量,证明这张表是存储相当好。单个索引叶子节点的索引项通常指向同一个数据块。如果这个值接近数据行的数量,说明这张表是随机存储的。单个索引叶子节点的索引项通常不指向同一个数据块。如上的两张表中,表'COLOCATED_PK'的主健的CLUSTERING_FACTOR字段的值为1073,接近块的数量,存储有序。表'DISORGANIZED_PK'的主健CLUSTERING_FACTOR字段的值为99907,接近行的数量,存储无序。
SQL> select a.index_name,
2 b.num_rows,
3 b.blocks,
4 a.clustering_factor
5 from user_indexes a, user_tables b
6 where index_name in ('COLOCATED_PK', 'DISORGANIZED_PK' )
7 and a.table_name = b.table_name
8 /
INDEX_NAME NUM_ROWS BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ---------- -----------------
COLOCATED_PK 100000 1073 1073
DISORGANIZED_PK 100000 1076 99907
SQL> set timing on
SQL> set autotrace traceonly
--经过表分析之后,Oracle拥有足够的信息知道通过全表访问更有效,这个时候不用通过增加hint,Oracle会自动选择全表扫描的方式查询。
SQL> select * from DISORGANIZED where x between 20000 and 30000;
10001 rows selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=105 Card=9995 Bytes=
839580)
1 0 TABLE ACCESS (FULL) OF 'DISORGANIZED' (Cost=105 Card=9995
Bytes=839580)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1743 consistent gets
0 physical reads
0 redo size
903094 bytes sent via SQL*Net to client
7825 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10001 rows processed
2007年8月3日星期五
订阅:
博文评论 (Atom)
没有评论:
发表评论