DB2 : Cost Calculation 2 : index only

Index

Why indexes
Searching through millions of rows can be the right thing to do (TBSCAN). It might be the only option available to get the data you need.But in a lot of cases we need smarter ways to get to our data. If you need only one row,it seems silly to process thousands of pages to look for that one row. Then indexes came along: a bit of the data in a table but then already ordered for fast lookup.

What an index looks like, and how they work, is not the scope of this article, but I think that an article will come.

Suffice it to say that we are talking about BTree indexes.

Setup
Following table is created for testing:

create table ind1 (n1 int not null primary key, 
                   n2 int, 
                   n3 int, 
                   t varchar(100));

insert into ind1 
  WITH engine (rownum) AS (
                            SELECT 1 AS rownum 
                            from   sysibm.sysdummy1
                            UNION ALL
                            SELECT rownum + 1 AS rownum
                            FROM   engine
                            where  rownum<100000
                          )
  SELECT rownum, 
         case when rownum<50000
              then 1
              else 2
         end,
         round(random()*100000),
         rpad('t', 100, 'x')
  FROM   engine;

commit;

runstats on table ind1 and detailed indexes all;

You can see that the column n1 is a primary key. This means:
- not null
- unique

There are no NULL values for n1, and every value in column n1 is unique. This uniqueness is supported by an index. Since I did not create this index myself (DB2 created it for me) it has a DB2 generated name.

db2 "select cast(indschema as varchar(20)) schema, 
            cast(indname as varchar(20)) name,
            decode(uniquerule,
                   'D', 'Duplicates',
                   'U', 'Unique',
                   'P', 'Primary') uniquerule,
            nleaf,
            nlevels
     from   syscat.indexes 
     where tabname='IND1'"

SCHEMA          NAME                 UNIQUERULE NLEAF        NLEVELS
--------------- -------------------- ---------- ------------ -------
SYSIBM          SQL220220161847690   Primary             417       3

  1 record(s) selected.

OKay....we're all set up. Let's do some explaining.

Unique Scan (index acces only)
Let's select one n1 : say for n=10000.
This means that all information can be retrieved from the index.

For good measure....I first look at the cost whitout the index. I quickly created the same table whithout the index as follows:

create table ind2 (n1 int not null, 
                   n2 int, 
                   n3 int, 
                   t varchar(100));

insert into ind2 select * from ind1;

commit;

(No primary key, so no DB2 generated index)

The query:

[db2inst1@broccoli indscn]$ db2 "explain plan for select n1 from ind2 where n1=10000" 
DB20000I  The SQL command completed successfully.

The explain:

[db2inst1@broccoli indscn]$ db2exfmt -1 -d tinus -o expl1.txt
DB2 Universal Database Version 11.5, 5622-044 (c) Copyright IBM Corp. 1991, 2017
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool

Connecting to the Database.
Connect to Database Successful.
Output is in expl1.txt.
Executing Connect Reset -- Connect Reset was Successful.
[db2inst1@broccoli indscn]$

In the outputfile we find the graph of the execution path:

Access Plan:
-----------
        Total Cost:             2857.2
        Query Degree:           1


      Rows
     RETURN
     (   1)
      Cost
       I/O
       |
        1
     TBSCAN
     (   2)
     2857.2
      3228
       |
     100000
 TABLE: DB2INST1
      IND2
       Q1

In DB 2 : Cost calculations 1 : Table Scan we saw that the cost of retrieving one page is about 6,7706 when DB2 will not use prefetch and around 0,8914 if DB2 is using prefetch. The cut is at 32 pages. Since we need to read a lot more then 32 pages we can safely assume that prefetch is used. So I would say the cost for this operation is:3228 * 0,8914=2877 (well not exactly what DB2 suggests (2857.2), but quite close!

So when the cost of using an index to get this one value is lower then 2877 I quite well understand why we are using the index!

Let's see (query and explain):

[db2inst1@broccoli indscn]$ db2 "explain plan for select n1 from ind1 where n1=10000" 
DB20000I  The SQL command completed successfully.

[db2inst1@broccoli indscn]$ db2exfmt -1 -d tinus -o expl2.txt
DB2 Universal Database Version 11.5, 5622-044 (c) Copyright IBM Corp. 1991, 2017
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool

Connecting to the Database.
Connect to Database Successful.
Output is in expl2.txt.
Executing Connect Reset -- Connect Reset was Successful.
[db2inst1@broccoli indscn]$

What does the graph tell us:

Access Plan:
-----------
        Total Cost:             13.5334
        Query Degree:           1


        Rows
       RETURN
       (   1)
        Cost
         I/O
         |
          1
       IXSCAN
       (   2)
       13.5334
          2
         |
       100000
   INDEX: SYSIBM
 SQL220220161847690
         Q1

We see that the execution only hits the index. The table itself is not involved.
And...we only do 2 IO's. Two IO's is to less to prefetch so I imagine the cost for 1 IO is about 6,7706 (see : DB2 : Cost calculations 1 : Table Scan). So 2 * 6,7706 = 13,5412....quite close : the DB2 estimates the cost to be 13.5334. That's close enough 🙂

Cost of getting 1 value out of the table (which actually is in the index) is 13,5334, compared to the cost of getting that same value whithout an index being 2857,2. I say that I do understand (for this query) why DB2 chooses a path using the index.

Tags: , ,

Leave a Reply

Your email address will not be published. Required fields are marked *