Pantek Library
Hosting Provided By
CybrHost
High Speed Hosting

excessive time spent in "statistics" status

From: Lucio Chiappetti <lucio(at)lambrate.inaf.it>
Date: Fri Aug 31 2007 - 10:58:45 EDT


I have some queries, involving a largish number of JOIN, which are apparently very slow or even take forever (a mysqladmin processlist shows them remain in the "statistics" status for a long time, in most cases I have to kill them after several minutes).

When I first had the problem I googled around and found some reference (which I've lost) saying that the "statistics" status is actually what one does with an EXPLAIN SELECT, and that this is done preliminarily to the actual query. It also said it might occur with a large number of joins because this analysis, for n joins MIGHT try up to n! combinations, unless one somehow specified the priorities (but the author did not remember how).

I thought to have overcome the problem using a feature of the CREATE VIEW command (see below), but apparently I simply moved it to an higher n.

Now I tried to see how it scales with the number of joins, and the curious things is that e.g. for n=9 it works fast, for n=15 it works slowly, for n=18 works fast again and for n=20 takes an infinite time.

I'll first explain my background :

  • I have a number of tables (let's call them t0, t1, t2 ...)
  • all of them have an auto_increment column called seq which is also an index
  • one table (t0) is "more important" (actually it is list of celestial X-ray sources while the other are celestial objects in other wavebands but this is irrelevant to you).
  • I have precomputed correlation tables among t0 and each of the other. These tables are called eg. t0ti, have two columns t0 and ti corresponding to the t0.seq and ti.seq of objects which are "associated". They are indexed on (t0,ti). Note that an object in t0 can be associated with 1 or more or zero (t0ti.ti null) objects in ti.
  • I originally (already under mysql 3) devised a way to identify counterparts in MORE tables (all these associations are based on spherical distance + other criteria). This involved creating a working table G

   This table has columns named t0 t1 ... tn (containing the pointers    t0.seq t1.seq ... for counterparts associated and validated according    to some criteria) plus other service columns

   The simultaneous access was achieved in our interface by a mechanism    we called virtual tables, which essentially was

     SELECT
      some subset of columns in some of the t0...tn
      or some expression thereof
     FROM
      G left join t0 on G.t0=t0.seq
        left join t1 on G.t1=t1.seq
        ...
        left join tn on G.tn=tn.seq    

   We refer to the t0...tn as "the member tables" of G.

Do you need help?X

   We have different versions of G corresponding to different sets    of member tables and different association criteria.

   The largest of our cases has 26 different members.

   Our mechanism was such that we defined a subset of columns in    each of the ti (or expressions thereof like distances etc.) as    "interesting", with an associated alias. Our interface usually    showed only such "virtual columns", but had a possibility to add    (naming them manually as ti.colname) to the SELECT also all    other "member columns" normally hidden.

   We also allow to correlate a virtual table with a single physical    table tk (be it member or not) using the t0tk correlation table    (t0 is the "First Member").

  • the above worked and still works, but has some clumsiness. When we upgraded to mysql 5 and discovered the CREATE VIEW command we decided to replace our virtual tables with views.
  • for each G we define a view as
     create algoritm=temptable view V as
     SELECT
      some subset of columns in some of the t0...tn or in G
      or some expression thereof
     FROM
      G left join t0 on G.t0=t0.seq
        left join t1 on G.t1=t1.seq
        ...
        left join tn on G.tn=tn.seq    
  • the "algorithm=temptable" was required because without it some of our queries (see below) entered in the "statistics" status forever already with 11 members
  • for the rest the VIEWs work nicely when used standalone and are easier for the user ...
  • ... but on the other hand they HIDE the member columns which are not explicitly named in CREATE VIEW (where one wants to keep a manageable number of columns). Hide means here that their names ti.colname cannot be used in SELECT !
  • so we devised an option by which on ticking on "show members also" one can also include these ti.colname in the query

   de facto this doubles the joins, because the statement built is

   SELECT
    list of (V.colname and ti.colname with i chosen among 0 and n)    FROM
    ( G left join t0 on G.t0=t0.seq

        left join t1 on G.t1=t1.seq
        ...
        left join tn on G.tn=tn.seq    

    ) left join V on G.seq=V.seq

   This statement NOW works (it did not work with e.g. 11 member tables    before we switched to ALGORITHM=TEMPTABLE in the CREATE VIEW).

   An "explain select" for a query on such views gives that a view with    n members "with members also enabled" involves 29+n queries

Do you need more help?X

   (our [working] maximum is n=26 with 55 queries)

   I have also noticed that such "explain select" are relatively fast    (0.3 sec) and are somehow cached (if I repeat one, it takes zero time)

  • the last step would be to make a query on a VIEW with "show members also" ticked, correlated with another table Tk (be Tk either a member, one of t1...tn, or a non-member)

   This involves a query like this

   SELECT
    list of (V.colname and ti.colname and Tk.colname)    FROM
   ( VTk left join
    (

     (G left join t0 on G.t0=t0.seq
        left join t1 on G.t1=t1.seq
        ...
        left join tn on G.tn=tn.seq    
     )  left join V  on G.seq=V.seq 

    ) on VTk.V = V.seq
   ) left join Tk on VTk.Tk=Tk/seq

   where the correlation table VTk is also a view

   CREATE ALGORITHM=TEMPTABLE VIEW VTk as     select G.seq as V, G.Tk as Tk
    from G left join t0Tk on G.t0=t0Tk.t0

   which exploits the precomputed correlation among the "important"    table t0 (the "First Member") and Tk. Note that t0Tk is indexed    on (t0,Tk) but the view apparently is not indexed.

  • now such a query on our largest table (26 members) hangs in statistics status forever. I experimented therefore trying to use a variable names of left joins (this to tell when it stops working ... in real life I *might* need all, or any of the member).

   The query for "view with members also correlated with another table"    in an "explain select" requires 33+n queries for n members.

Can we help you?X

   Now the funny thing is that up to e.g. 9 members "explain select"    is rather fast (0.3 sec).

   For 15 members (48 queries) slows down significantly (6.3 sec) and    is not cached (even if repeated soon still takes 6.3 sec).

   For 20 members it takes forever.

   But for 18 members (51 queries) instead takes only 0.5 sec.    

When I say 9,15,18,20 members I mean the first 9,15,18,20 of the full member list. Such a list is in an arbitrary order (sort of historical the various tables entered the database).

Also I note that the output of "explain select" presents the various tables in a varying order. For instance the slow 15-member case but also the fast 9-member case have "<derived2>" (which is the 13000-element un-indexed VTk correlation view) interspersed with the other table, the fast 18-member case has it first.

Questions :

  • what does "explain select" actually do and why sometimes hangs ?
  • can this be overcome rearranging the order of the joins (note that the bulk of the members are all joined with G), or introducing parentheses or with other syntax changes ?
  • or has it to do with some configuration parameter, maybe related to what is cached, cache size or other ?

Thanks in advance to whoever is able to give hints.

-- 
-----------------------------------------------------------------------
Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy)
For more info : 
http://www.iasf-milano.inaf.it/~lucio/personal.html
-----------------------------------------------------------------------

-- 
MySQL General Mailing List
For list archives: 
http://lists.mysql.com/mysql
To unsubscribe:    
http://lists.mysql.com/mysql?unsub=lists@pantek.com
Received on Fri Aug 31 11:07:28 2007
Can't find what you're looking for?X

This archive was generated by hypermail 2.1.8 : Sun Oct 07 2007 - 10:08:20 EDT


Contact Us  Legal Notices  Order Services Online 
Pantek Home  Privacy Policy  IT news  Site Map  Pantek Library