DB 2 : Cost calculations 1 : Table Scan

Index

Why?
When you supply DB2 with a query DB2 has to make some serious decisions:
– Should indexes be utilized?
– Which indexes are most suitable?
– Which join operation should be used? (hash join/nested loop)
– Is all data needed at once or are the first 10 rows as quick as possible (the remainder comes later) okay?

The optimizer is the part of the DB2 engine which gives answers to all these questions. All possibilities are taken into account to determine an excution-plan for the specific query. To do this the optimizer considers a lot of possibilitiesand give them a weigth. The excution plan with the lowest weight (wich we will call cost) will be chosen as _the_ execution-plan for the query.

What is cost?
The optimizer needs a way to measure how long an operation takes. That can be I/O, that can be a calculation, it all costs time. The longer we have to wait, the more costly an operation is. We want to keep the cost as low as possible. The optimizer will try to find out an execution plan with lowest possible cost.

The total Cost is composed of I/O Cost and CPU Cost.
In the following examples I focus on the IO component of the cost. This is not to bad since I _only_ do Full Table Scan (TBScan)

As soon as we start to hit joins (hash join / nested loop / …) or sorts, the CPU Cost will increase, and I will also look at the CPU component of the cost.

Pages
DB2 is fetching pages from the datafiles!!! It is not fetching single rows, it fetches pages. If DB2 needs one single row it still fetches a whole page from the datafile. In-memory the exact row will be retrieved from the page.
So we should have a focus on pages (not rows) when discussing cost of I/O in a DB2 system.
I can easily imagine tables which are very narrow (a few columns of char or integer), or very width (a few varchar(255) columns).
The number of narrow rows in a page is much bigger compared to the number of width rows on a page. If rou need 1000 rows of a narrow table they may come from only 1 or 2 pages. For a very width table on the other hand 1000 rows may well be stored in 10000 pages. For DB2 the number of pages in respect to I/O is important, not the number of rows.

Demo Table
To play around we need to create some demonstration tables. I’ll just show one here.

drop table cost1;

create table cost1 (n1 int, n2 int, n3 int, t varchar(100));

insert into cost1 
  WITH engine (rownum) AS (
                            SELECT 1 AS rownum 
                            from   sysibm.sysdummy1
                            UNION ALL
                            SELECT rownum + 1 AS rownum
                            FROM   engine
                            where  rownum<10
                          )
  SELECT rownum, 
         round(rownum/1000,0), rownum%1000, 't'
  FROM   engine;

commit;

runstats on table cost1 and indexes all;

Thanks to recursive CTE (Common Table Expressions) I can create a table with an arbitrary number of rows. In the above example I´ll create a table with 10 rows.

For the purpose of this demonstration I actually created a few more tables (see below). After creating and filling the tables I collected statistics on them:

These table do not allready have indexes but that will come in future articles, and it’s my habit to collect basic statistics like this.

Look at some statistics

select cast(tabname as varchar(10)) tabname,
       npages,
       mpages,
       fpages,
       card,
       card/npages rows_per_page
from   syscat.tables
where  lower(tabschema)=lower('db2inst1')
       and lower(tabname) like lower('cost%');


TABNAME    NPAGES    MPAGES    FPAGES    CARD      ROWS_PER_PAGE       
---------- --------- --------- --------- --------- -------------
COST0              0         0         1         0            -1
COST1              1         0         1        10            10
COST2              2         0         2       100            50
COST3              9         0         9      1000           111
COST4             78         0        78     10000           128
COST5            776         0       776    100000           128
COST6           7756         0      7756   1000000           128

Now move on. Let’s see how the optimizer thinks we can get _all_ data from a table. There is more then 1 way to look at explain plans, but for the (quite simple) explain we perform now, the following will suffice:

db2expln -t -g -d tinus -q "select * from cost1"

Optimizer Plan:

    Rows   
  Operator 
    (ID)   
    Cost   
          
    10    
  RETURN  
   ( 1)   
  6.76828 
    |     
    10    
  TBSCAN  
   ( 2)   
  6.76828 
    |     
    10    
 Table:   
 DB2INST1 
 COST1

The above is just the graphical representation of the explain plan. db2expln actually gives you must more information.

-t : The statement is on one line (no separator)
-g : Also give a graphical representation of the execution plan
-d : The database on wich the explain operates
-q : The actual statement we want explained

– As you can see at the very bottom of the graph, the source of the rows is table cost1 which has 10 rows.
– The first operation is a TBSCAN and has a cost of 6.76828
– The first operation (the TBSCAN) has ID 2. In the other parts of the explain plan (which are not show here) this operation is labeled 2.
– The query is expected to return 10 rows and has a cost of 6,76828. The total cost consist of only the cost of the TBSCAN (which is almost completly I/O).

Table Cost0 Cost1 Cost2 Cost3 Cost4 Cost5 Cost6
Rows 0 10 100 1000 10000 100000 1000000
Pages 1 1 2 9 78 776 7756
Cost Table Scan 6.76747 6.76828 13.5412 60.9737 71.2083 691.727 6917.09
Cost/Page 6,76747 6,76828 6,7706 6,7749 0,9130 0,8914 0,8918

What is shown here that the cost increases if the number of pages increases. However if we look at the cost per page we see a ceratin drop in cost/page.

The cost is almost the same for the table with 1, 2 and 9 pages (around 6,77/page).  Then there is a decrease in the cost to 0,89/page.

At a certain point I need to start to tell about prefetching of pages, but that point is not yet. I’ ll suffice by telling that the prefetch page count is 32. Let’s close in to that 32 pages mark. Create a table with 3900 rows (31 pages) and 4000 rows (32 pages)

Table Cost7 Cost8
Rows 3900 4000
Pages 31 32
Cost Table Scan 210,053 28,5167
Cost/Page 6,7759 0,8911

The cut-off is 32 pages (the prefetch page count)

Just for good measure, also look at a table with a different width. We’ll make the varchar column a 100 characters width.

insert into cost2_1 
  WITH engine (rownum) AS (
                            SELECT 1 AS rownum 
                            from   sysibm.sysdummy1
                            UNION ALL
                            SELECT rownum + 1 AS rownum
                            FROM   engine
                            where  rownum<10
                          )
  SELECT rownum, 
         round(rownum/1000,0), rownum%1000, rpad('t',100, 'x') 
  FROM   engine;

The text column isn’t 1 character width anymore but 100. The row is much longer, so less rows fit on a page, and thus we need more pages for the same number of rows.

See what the tables look like now:

TABNAME     NPAGES       MPAGES       FPAGES       CARD         ROWS_PER_PAGE       
----------- ------------ ------------ ------------ ------------ -------------
COST2_0                0            0            1            0            -1
COST2_1                1            0            1           10            10
COST2_2                4            0            4          100            25
COST2_3               34            0           34         1000            29
COST2_7              131            0          131         3900            29
COST2_8              134            0          134         4000            29
COST2_4              334            0          334        10000            29
COST2_5             3335            0         3335       100000            29
COST2_6            33349            0        33349      1000000            29
Table Cost2_0 Cost2_1 Cost2_2 Cost2_3 Cost2_4 Cost2_5 Cost2_6
Rows 0 10 100 1000 10000 100000 1000000
Pages 1 1 4 34 334 3335 33349
Cost Table Scan 6,76747 6,76828 27,0724 35,0786 296,723 2946,83 29461,6
Cost/Page 6,76747 6,76828 6,7681 1,032 0,8884 0,8836 0,8834

Again, around the 32 pages mark, the cost/page drops dramaticly
And again for the small tablesthe cost is around 6,76 and for the tables with more then 32 pages the cost is around 0,88.

In a next article I would like to see how indices behave with respect to cost.

Tags: , ,

Leave a Reply

Your email address will not be published. Required fields are marked *