Pantek Library
Hosting Provided By
CybrHost
High Speed Hosting

Re: Assistance avoiding a full table scan

From: Dan Buettner <drbuettner(at)gmail.com>
Date: Fri Sep 21 2007 - 13:07:29 EDT


Erik, I think the main reason your query is running slowly is the use of a subselect. MySQL does not generally perform well with subselects, though work continues in that area.

There is also a problem/situation in MySQL in that you can't use MAX/GROUP BY functions quite the way you can in other databases; you'll get an accurate MAX value for one column, but the value in another won't necessarily be from the same row. Someone posted on the list about this recently, calling it a bug, and I tend to agree.

To solve your problem:

I would take one of two approaches.

First approach: split it into two queries in PHP, and use the results of the first in the second, like so:

query1 = select max(id) from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid

in PHP: id_string = join the results with commas. implode function?

query2 = select comment, gid, date_posted from tbl where id in (id_string)

Do you need help?X

Generally speaking, fewer queries = higher performance, and databases are optimized to join tables, they do it well - but in your case I think you'll find one of these works better.

Second approach:

Insert values from first query into a temporary table, then join on that temp table in your second query.

I don't think either approach will have a speed advantage, and the first is probably easier to code.

HTH,
Dan

On 9/21/07, Erik Giberti <erik@af-design.com> wrote:
>
> Hello everyone,
>
> The app server in this case is PHP, and the database is MySQL 5.0.22
> on RedHat linux
>
> I've got a database with about 7.5K records in it that I expect to
> start growing very quickly ~10-12K records per day. The storage
> engine is InnoDB. This table is growing quickly and will continue to
> grow for a long time. This table stores comments (as you can see from
> the structure) and is being used to display a list of comments based
> on a users affiliations.
>
> The structure is approximately this - I'm leaving out unrelated columns:
>
> id int - primary key - auto increment
> gid bigint - indexed
> comment varchar
> date_posted timestamp
>
> I run a query with the following form
>
> select comment, gid, date_posted from tbl where id in (select max(id)
> from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid);
>
> I have an index on gid and id is the primary key
>
> When I describe the query with about 50 gid values inserted (where
> indicated above) I get the following:
>
> +----+--------------------+-------+-------+---------------+---------
> +---------+------+------+--------------------------+
> | id | select_type | table | type | possible_keys | key |
> key_len | ref | rows | Extra |
> +----+--------------------+-------+-------+---------------+---------
> +---------+------+------+--------------------------+
> | 1 | PRIMARY | tbl | ALL | NULL | NULL |
> NULL | NULL | 7533 | Using where |
> | 2 | DEPENDENT SUBQUERY | tbl | range | idx_gid | idx_gid |
> 9 | NULL | 58 | Using where; Using index |
> +----+--------------------+-------+-------+---------------+---------
> +---------+------+------+--------------------------+
>
> Running the query on a production machine with sufficient memory and
> horsepower (box is only 20% utilized) it still takes 3 seconds to run
> - obviously not quick enough for web use.
>
> What I really need is the most recent comment from each group based
> on a variable set of gid's that change from user to user.
>
> Any thoughts on how to tweak this to avoid the full table scan? Thank
> you in advance for your assistance.
>
> Erik Giberti
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=drbuettner@gmail.com
>
>
Received on Fri Sep 21 13:08:29 2007

This archive was generated by hypermail 2.1.8 : Sun Oct 07 2007 - 10:11:31 EDT


Contact Us  Legal Notices  Order Services Online 
Pantek Home  Privacy Policy  IT news  Site Map  Pantek Library