Re: Assistance avoiding a full table scan
Brent,
I tried this and it definitely boosted performance. On a test query
that would take 2+ seconds to run with 20 id's - it ran in 0.002
seconds.
Thanks everyone for your help and comments.
Erik
On Sep 21, 2007, at 2:01 PM, Brent Baisley wrote:
> As others have mentioned, mysql doesn't handle IN queries efficiently. > You can try changing it to using derived tables/subqueries. I did some > quick tests and the explain shows a different analysis. > > select comment, gid, date_posted from tbl > JOIN
> (select max(id) as mid > from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid) as > maxids > ON tpl.id=maxids.mid; > > You're really just changing one of the IN statements to a join. My > quick tests showed that a full table is still being done, but it's now > on the derived table (maxids), which would only be as large as how > many gids you are searching on (50?). > I tested this on two related tables, 170K in one and 90K in the other. > Your tables are currently much smaller, so mysql may come up with a > different execution path. For me, the join syntax was far faster. > > > On 9/21/07, Erik Giberti <erik@af-design.com> wrote: >> Hello everyone, >> >> The app server in this case is PHP, and the database is MySQL 5.0.22 >> on RedHat linux >> >> I've got a database with about 7.5K records in it that I expect to >> start growing very quickly ~10-12K records per day. The storage >> engine is InnoDB. This table is growing quickly and will continue to >> grow for a long time. This table stores comments (as you can see from >> the structure) and is being used to display a list of comments based >> on a users affiliations. >> >> The structure is approximately this - I'm leaving out unrelated >> columns: >> >> id int - primary key - auto increment >> gid bigint - indexed >> comment varchar >> date_posted timestamp >> >> I run a query with the following form >> >> select comment, gid, date_posted from tbl where id in (select max(id) >> from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid); >> >> I have an index on gid and id is the primary key >> >> When I describe the query with about 50 gid values inserted (where >> indicated above) I get the following: >> >> +----+--------------------+-------+-------+---------------+--------- >> +---------+------+------+--------------------------+ >> | id | select_type | table | type | possible_keys | key | >> key_len | ref | rows | Extra | >> +----+--------------------+-------+-------+---------------+--------- >> +---------+------+------+--------------------------+ >> | 1 | PRIMARY | tbl | ALL | NULL | NULL | >> NULL | NULL | 7533 | Using where | >> | 2 | DEPENDENT SUBQUERY | tbl | range | idx_gid | idx_gid | >> 9 | NULL | 58 | Using where; Using index | >> +----+--------------------+-------+-------+---------------+--------- >> +---------+------+------+--------------------------+ >> >> Running the query on a production machine with sufficient memory and >> horsepower (box is only 20% utilized) it still takes 3 seconds to run >> - obviously not quick enough for web use. >> >> What I really need is the most recent comment from each group based >> on a variable set of gid's that change from user to user. >> >> Any thoughts on how to tweak this to avoid the full table scan? Thank >> you in advance for your assistance. >> >> Erik Giberti >> >> >> -- >> MySQL General Mailing List >> For list archives: http://lists.mysql.com/mysql >> To unsubscribe: http://lists.mysql.com/mysql? >> unsub=brenttech@gmail.com >> >> > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql?unsub=giberti@mac.com >
--
MySQL General Mailing List
For list archives:
http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/mysql?unsub=lists@pantek.com
Received on Wed Sep 26 18:41:20 2007
This archive was generated by hypermail 2.1.8
: Sun Oct 07 2007 - 10:12:22 EDT
|