Pantek Library
Hosting Provided By
CybrHost
High Speed Hosting

Re: [PERFORM] rtree/gist index taking enormous amount of space in 8.2.3

From: Dolafi, Tom <dolafit(at)janelia.hhmi.org>
Date: Fri Jun 29 2007 - 14:13:23 EDT


Thanks for looking into this and reproducing a similar result. The index took 6 hours to complete on a 1.5GB table resulting in 35GB of storage, and it took 36 hours to vacuum... I'm patient :-)

In the mean time I've dropped the index which has resulted in overall performance gain on queries against the table, but we have not tested the part of the application which would utilize this index.

  • Tom

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Friday, June 29, 2007 1:58 PM
To: Dolafi, Tom
Cc: pgsql-performance@postgresql.org; Oleg Bartunov; Teodor Sigaev Subject: Re: [PERFORM] rtree/gist index taking enormous amount of space in 8.2.3

"Dolafi, Tom" <dolafit@janelia.hhmi.org> writes:
> min(fmin) | max(fmin) | avg(fmin)
> 1 | 55296469 | 11423945
> min(fmax) | max(fmax) | avg(fmax)
> 18 | 55553288 | 11424491

OK, I was able to reproduce a problem after making the further guess that fmax is usually a little bit greater than fmin. The attached test script generates an rtree index of around 800 pages on 8.1.9, and the index build time is about 6 seconds on my machine. On CVS HEAD, the script generates a gist index of over 30000 pages and the build time is over 60 seconds. Since I'm using random() the numbers move around a bit, but they're consistently awful. I experimented with a few other distributions, such as fmin and fmax chosen independently in the same range, and saw gist build time usually better than rtree and index size only somewhat larger, so this particular distribution apparently fools gist_box_picksplit rather badly. The problem seems nonlinear too --- I had originally tried it with 1 million test rows instead of 100000, and gave up waiting for the index build after more than an hour.

Oleg, Teodor, can this be improved?

                        regards, tom lane

drop table featureloc;

Do you need help?X

CREATE TABLE featureloc
(
  fmin integer,
  fmax integer
);

insert into featureloc
  select r1, r1 + 1 + random() * 1000 from   (select 1 + random() * 55000000 as r1, 1 + random() * 55000000 as r2    from generate_series(1,100000) offset 0) as ss;

CREATE OR REPLACE FUNCTION boxrange(integer, integer)   RETURNS box AS
    'SELECT box (point(0, $1), point($2, 500000000))'   LANGUAGE 'sql' STRICT IMMUTABLE;

CREATE INDEX binloc_boxrange
  ON featureloc
  USING rtree
  (boxrange(fmin, fmax));

vacuum verbose featureloc;

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings Received on Fri Jun 29 14:15:39 2007

This archive was generated by hypermail 2.1.8 : Fri Jun 29 2007 - 14:20:06 EDT


Contact Us  Legal Notices  Order Services Online 
Pantek Home  Privacy Policy  IT news  Site Map  Pantek Library