Re: Cluster performance with data node down
Stewart Smith wrote:
> On Tue, 2007-08-21 at 11:58 +0200, Giulio Ferro wrote: > >> The query is a simple select on a single table. The tables contains >> about 330.000 entries. The queries executed are like this: >> select * from <tablename> where <varchar_field> = '....'; >> > > can use index > > >> select * from <tablename> where <varchar_field> like '%....%'; >> > > cannot use index. > > > >> <varchar_field> is a varchar(255) and is also an index. >> >> >> >> explain output: >> +----+-------------+----------------+------+---------------+------+---------+------+--------+-----------------------------------+ >> | id | select_type | table | type | possible_keys | key | >> key_len | ref | rows | Extra | >> +----+-------------+----------------+------+---------------+------+---------+------+--------+-----------------------------------+ >> | 1 | SIMPLE | <tablename> | ALL | NULL | NULL | NULL >> | NULL | 330122 | Using where with pushed condition | >> +----+-------------+----------------+------+---------------+------+---------+------+--------+-----------------------------------+ >> 1 row in set (0.01 sec) >> > > i.e. no index is being used. > > Add index to speed up first query. > > Not going to happen for 2nd though. > > You may want to try with condition pushdown disabled, this may be > quicker for the second query (or may not). either way, it's a table > scan. >
Thank you for your answer.
I understand the reason behind the poor performance of 'like' queries
and I guessed
that it was related to indices.
select ... where <varchar> = '...' => uses index
select ... where <varchar> like '%..%' => doesn't use index
If you build a proper index for a varchar field you don't have any problem
with performance, both with 2 data nodes active and with 1 dn down.
So all this question boils down to is 'like' select statements / '='
statements without indices.
The thing I'm still wondering at is the huge gap of performance for this
kind of queries
between a full working configuration (2/2 dn) and degraded cluster (1/2 dn)
(btw condition pushdown seems to be on by default in 5.1.20/21)
As I stated in my first message, after the first query (which takes
about 12 secs)
the following queries, even though they are full scan, take about 0.4
seconds,
which is very acceptable in my opinion.
The true problem is when a data node fails or is brought down. In this
event the
time for a query skyrockets to 12 secs, which of course is unacceptable
in many
production environment.
I still fail to see how parallelization in a query like this could
account for such
a gap in performance (0.4 s. <-> 12 s.)
Since the degraded time is the same time it takes for the first
execution of queries
I guess that condition pushdown doesn't work for a degraded cluster.
Please don't let this issue fall. Textual 'like' searches are very
important in many
contexts (eg. search engines in web sites, name searches in organizer
applications, etc.)
If performance stays like, this the usefulness of a cluster
architecture is considerably
lessened.
I hope to get feedback on this.
--
MySQL Cluster Mailing List
For list archives:
http://lists.mysql.com/cluster
To unsubscribe:
http://lists.mysql.com/cluster?unsub=lists@pantek.com
Received on Fri Aug 31 11:11:44 2007
This archive was generated by hypermail 2.1.8
: Sun Oct 07 2007 - 10:15:06 EDT
|