Why
In the DB2 : Cost Calculation 2 : index only post I noticed something upsetting, well, at least it was upsetting to me.
We looked at the explain plan for looking up 1 value by an index. The index had nlevel=3.
So there is a root-node, some intermediate-nodes and finally finally some leaf-nodes.
That suggests to me that an index scan at least have to do 3 IO’s to get at the leaf-nodes. However…the plan suggests 2:
Access Plan: ----------- Total Cost: 13.5334 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 1 IXSCAN ( 2) 13.5334 2 | 100000 INDEX: SYSIBM SQL220220161847690 Q1
How on earth can there be only 2 IO’s.
Could it be that the optimizer is a bit off?
Can we maybe look at what is actually happening?
Yes we can…with db2caem (DB2 Capture Activity Event Monitor)
DB2caem captures what DB2 is actually doing when for instance executing a query. In the background db2caem is doing a lot of stuff you should otherwise be doing manually yourself. No worries for that now 🙂
Let’s go for it:
db2caem -d tinus -o /home/db2inst1/scripts/cost/indscn/db2caem/ -st "select n1 from ind1 where n1=100" ____________________________________________________________________ _____ D B 2 C A E M _____ DB2 Capture Activity Event Monitor data tool I B M The DB2CAEM Tool is a utility for capturing the activity event monitor data with details, section and values, as well as actuals. ____________________________________________________________________ ________________________________________________________________________________ DBT7041I "db2caem": The db2caem utility is connecting to the following database: "tinus". DBT7038I "db2caem": The db2caem utility successfully connected to the following database: "tinus". DBT7042I "db2caem": The SQL statement "select n1 from ind1 where n1=100" is being issued. DBT7043I "db2caem": The db2caem utility is disconnecting from the following database: "tinus". DBT7039I "db2caem": The db2caem utility successfully disconnected from the following database: "tinus". DBT7000I db2caem completed. Output path: "/home/db2inst1/scripts/cost/indscn/db2caem/DB2CAEM_2022-02-22-20.24.37.593503".
In the given directory db2caem places multiple output-files. At a later moment I might write a post about them. For now I just look at the file db2caem.exfmt.
This file looks quite similar to the output you get from formatting explain plans with db2exfmt. (also the name allready suggests so :-))
I take a look at the graph part:
Access Plan: ----------- Total Cost: 13.5334 Query Degree: 1 Rows Rows Actual RETURN ( 1) Cost I/O | 1 1 IXSCAN ( 2) 13.5334 NA | -1 NA INDEX: SYSIBM SQL220220161847690 Q1
Uhmmmm I can see that I now also have actual rows versus estimated rows. But…no I/O information. This did not help at all.
Now, when I scroll down to bottom of the file, I see:
Runtime statistics for objects Used in Access Plan: ----------------------------------------------------- Schema: SYSIBM Name: SQL220220161847690 Type: Index Member 0 --------- Metrics -------------------- object_index_l_reads:3 object_index_lbp_pages_found:3 object_index_gbp_indep_pages_found_in_lbp:3
Ah…information about the actual I/O’s performed.
I now see 3 (logical) I/O’s for the index, and that is exactly what I was aiming for!
My initial problem is solved, but I now have a small trust-issue with the optimizer 🙁
Maybe we can get rid of this trust issue when this series evolve (I really do not know right now)