Pantek Library
Hosting Provided By
CybrHost
High Speed Hosting

Re: Cluster performance with data node down

From: Giulio Ferro <auryn(at)zirakzigil.org>
Date: Fri Aug 31 2007 - 10:57:29 EDT


Stewart Smith wrote:
> On Tue, 2007-08-21 at 11:58 +0200, Giulio Ferro wrote:
>
>> The query is a simple select on a single table. The tables contains
>> about 330.000 entries. The queries executed are like this:
>> select * from <tablename> where <varchar_field> = '....';
>>
>
> can use index
>
>
>> select * from <tablename> where <varchar_field> like '%....%';
>>
>
> cannot use index.
>
>
>
>> <varchar_field> is a varchar(255) and is also an index.
>>
>>
>>
>> explain output:
>> +----+-------------+----------------+------+---------------+------+---------+------+--------+-----------------------------------+
>> | id | select_type | table | type | possible_keys | key |
>> key_len | ref | rows | Extra |
>> +----+-------------+----------------+------+---------------+------+---------+------+--------+-----------------------------------+
>> | 1 | SIMPLE | <tablename> | ALL | NULL | NULL | NULL
>> | NULL | 330122 | Using where with pushed condition |
>> +----+-------------+----------------+------+---------------+------+---------+------+--------+-----------------------------------+
>> 1 row in set (0.01 sec)
>>
>
> i.e. no index is being used.
>
> Add index to speed up first query.
>
> Not going to happen for 2nd though.
>
> You may want to try with condition pushdown disabled, this may be
> quicker for the second query (or may not). either way, it's a table
> scan.
>

Thank you for your answer.

I understand the reason behind the poor performance of 'like' queries and I guessed
that it was related to indices.

select ... where <varchar> = '...' => uses index select ... where <varchar> like '%..%' => doesn't use index

If you build a proper index for a varchar field you don't have any problem with performance, both with 2 data nodes active and with 1 dn down.

So all this question boils down to is 'like' select statements / '=' statements without indices.

The thing I'm still wondering at is the huge gap of performance for this kind of queries
between a full working configuration (2/2 dn) and degraded cluster (1/2 dn)

(btw condition pushdown seems to be on by default in 5.1.20/21)

Do you need help?X

As I stated in my first message, after the first query (which takes about 12 secs)
the following queries, even though they are full scan, take about 0.4 seconds,
which is very acceptable in my opinion.

The true problem is when a data node fails or is brought down. In this event the
time for a query skyrockets to 12 secs, which of course is unacceptable in many
production environment.

I still fail to see how parallelization in a query like this could account for such
a gap in performance (0.4 s. <-> 12 s.)
Since the degraded time is the same time it takes for the first execution of queries
I guess that condition pushdown doesn't work for a degraded cluster.

Please don't let this issue fall. Textual 'like' searches are very important in many
contexts (eg. search engines in web sites, name searches in organizer applications, etc.)
If performance stays like, this the usefulness of a cluster architecture is considerably
lessened.

I hope to get feedback on this.

-- 
MySQL Cluster Mailing List
For list archives: 
http://lists.mysql.com/cluster
To unsubscribe:    
http://lists.mysql.com/cluster?unsub=lists@pantek.com
Received on Fri Aug 31 11:11:44 2007

This archive was generated by hypermail 2.1.8 : Sun Oct 07 2007 - 10:15:06 EDT


Contact Us  Legal Notices  Order Services Online 
Pantek Home  Privacy Policy  IT news  Site Map  Pantek Library