|
|||||||||||
|
excessive time spent in "statistics" status
From: Lucio Chiappetti <lucio(at)lambrate.inaf.it>
Date: Fri Aug 31 2007 - 10:58:45 EDT
When I first had the problem I googled around and found some reference (which I've lost) saying that the "statistics" status is actually what one does with an EXPLAIN SELECT, and that this is done preliminarily to the actual query. It also said it might occur with a large number of joins because this analysis, for n joins MIGHT try up to n! combinations, unless one somehow specified the priorities (but the author did not remember how). I thought to have overcome the problem using a feature of the CREATE VIEW command (see below), but apparently I simply moved it to an higher n. Now I tried to see how it scales with the number of joins, and the curious things is that e.g. for n=9 it works fast, for n=15 it works slowly, for n=18 works fast again and for n=20 takes an infinite time. I'll first explain my background :
This table has columns named t0 t1 ... tn (containing the pointers t0.seq t1.seq ... for counterparts associated and validated according to some criteria) plus other service columns The simultaneous access was achieved in our interface by a mechanism we called virtual tables, which essentially was
SELECT
some subset of columns in some of the t0...tn
or some expression thereof
FROM
G left join t0 on G.t0=t0.seq
left join t1 on G.t1=t1.seq
...
left join tn on G.tn=tn.seq
We refer to the t0...tn as "the member tables" of G. We have different versions of G corresponding to different sets of member tables and different association criteria. The largest of our cases has 26 different members. Our mechanism was such that we defined a subset of columns in each of the ti (or expressions thereof like distances etc.) as "interesting", with an associated alias. Our interface usually showed only such "virtual columns", but had a possibility to add (naming them manually as ti.colname) to the SELECT also all other "member columns" normally hidden. We also allow to correlate a virtual table with a single physical table tk (be it member or not) using the t0tk correlation table (t0 is the "First Member").
create algoritm=temptable view V as
SELECT
some subset of columns in some of the t0...tn or in G
or some expression thereof
FROM
G left join t0 on G.t0=t0.seq
left join t1 on G.t1=t1.seq
...
left join tn on G.tn=tn.seq
de facto this doubles the joins, because the statement built is SELECT
left join t1 on G.t1=t1.seq
...
left join tn on G.tn=tn.seq
) left join V on G.seq=V.seq This statement NOW works (it did not work with e.g. 11 member tables before we switched to ALGORITHM=TEMPTABLE in the CREATE VIEW). An "explain select" for a query on such views gives that a view with n members "with members also enabled" involves 29+n queries (our [working] maximum is n=26 with 55 queries) I have also noticed that such "explain select" are relatively fast (0.3 sec) and are somehow cached (if I repeat one, it takes zero time)
This involves a query like this SELECT
(G left join t0 on G.t0=t0.seq
left join t1 on G.t1=t1.seq
...
left join tn on G.tn=tn.seq
) left join V on G.seq=V.seq
) on VTk.V = V.seq ) left join Tk on VTk.Tk=Tk/seq where the correlation table VTk is also a view
CREATE ALGORITHM=TEMPTABLE VIEW VTk as
select G.seq as V, G.Tk as Tk
which exploits the precomputed correlation among the "important" table t0 (the "First Member") and Tk. Note that t0Tk is indexed on (t0,Tk) but the view apparently is not indexed.
The query for "view with members also correlated with another table" in an "explain select" requires 33+n queries for n members. Now the funny thing is that up to e.g. 9 members "explain select" is rather fast (0.3 sec). For 15 members (48 queries) slows down significantly (6.3 sec) and is not cached (even if repeated soon still takes 6.3 sec). For 20 members it takes forever. But for 18 members (51 queries) instead takes only 0.5 sec. When I say 9,15,18,20 members I mean the first 9,15,18,20 of the full member list. Such a list is in an arbitrary order (sort of historical the various tables entered the database). Also I note that the output of "explain select" presents the various tables in a varying order. For instance the slow 15-member case but also the fast 9-member case have "<derived2>" (which is the 13000-element un-indexed VTk correlation view) interspersed with the other table, the fast 18-member case has it first. Questions :
Thanks in advance to whoever is able to give hints. -- ----------------------------------------------------------------------- Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy) For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html ----------------------------------------------------------------------- -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=lists@pantek.comReceived on Fri Aug 31 11:07:28 2007 This archive was generated by hypermail 2.1.8 : Sun Oct 07 2007 - 10:08:20 EDT |
||||||||||
|
|||||||||||