Quantcast
Channel: Helion-Prime Solutions blog » rdbms
Viewing all articles
Browse latest Browse all 2

Full Text Search with several tables in PostgreSQL

$
0
0

Preamble

Quite often to overcome performance degradation in Full Text Search with several tables in PostgreSQL people use external full text search engines. And surely you should use them if you have really big amount of date.

It seems that currently there are 2 most popular engines:

Sphinx: http://sphinxsearch.com/
license: GPL version 2. Commercial licensing (eg. for embedded use) is available upon request.
native API implementations: PHP, Perl, Ruby, and Java.
written in: C++

Lucene: http://lucene.apache.org/
license: Apache License 2.0.
native API implementations: Delphi, Perl, C#, C++, Python, Ruby and PHP.
written in: Java

Basics

As all active users of PostgreSQL know support of full text search is integrated into the core database system since version 8.3.
If you don’t have basic understanding of full text search in PostgreSQL, please check:
http://www.postgresql.org/docs/8.4/interactive/textsearch-intro.html

Simple full text search query:

1
2
SELECT * FROM blog_post
WHERE to_tsvector('english', text || ' ') @@ plainto_tsquery('test')

As for any query to achieve acceptable performance using big amount of data we need to add DB index.

1
2
CREATE INDEX blog_post_ts_idx
ON blog_post USING gin(to_tsvector('english', text || ' '));

The problem arises when we need to perform search with several linked tables. The experiments show that even with necessary indexes on tables during full text search PostgreSQL uses only 1 index and performance degrades.

Solving of the issue with PostgreSQL

For example we have 2 tables:
table: Blog_post, field: text,
table: User, field: name

Queries with UNION
We need indexes on both tables:

1
2
CREATE INDEX blog_post_ts_idx ON blog_post USING gin(to_tsvector('english', text || ' '));
CREATE INDEX user_ts_idx ON USER USING gin(to_tsvector('english', name || ' '));

We execute full text search with composite query (UNION )

1
2
3
4
5
SELECT id FROM blog_post WHERE to_tsvector('english', text || ' ') @@ plainto_tsquery('test')
UNION
SELECT blog_post.id FROM blog_post
   LEFT JOIN USER ON USER.id = blog_post.user_id
   WHERE to_tsvector('english', name || ' ' ) @@ plainto_tsquery('test')

In that case PostgreSQL perform 2 separate query that accomplish search by different fields, and that solve issue of index using.

Using of materialized View

Materialized View – is a database object that contains the results of a query. And as it’s a real object we can add DB index to it. Unfortunately at this time Postgresql doesn’t support that type of DB view.
Therefore we need to create such object by itself. In the capacity of materialized view we will use regular table that consists of 2 fields:

1
2
3
CREATE TABLE blog_post_ts_keywords (
   blog_post_id INTEGER NOT NULL PRIMARY KEY,
   keywords TSVECTOR);

Then we can add DB index to it:

1
2
CREATE INDEX index_blog_post_ts_keywords ON blog_post_ts_keywords
   USING GIN(keywords);

Now we can execute regular full text search:

1
SELECT blog_post_id FROM blog_post_ts_keywords WHERE keywords @@ plainto_tsquery('test')

Main complication here is maintenance of that view in actual state.
We can use DB triggers: http://www.postgresql.org/docs/8.4/interactive/trigger-definition.html or simple update queries by some schedule

1
2
3
4
5
INSERT INTO blog_post_ts_keywords
SELECT blog_post.id AS blog_post_id,  
      (to_tsvector('english', blog_post.text ||  usr.name || ' ')) AS keywords
   FROM blog_post
   LEFT JOIN USER ON USER.id = blog_post.user_id

Here you can see simple query for materialized view filling. The query for data refreshing looking quite the same considering checking of changes. And your query for outdated data deleting should take into account deleted or not active records.


Viewing all articles
Browse latest Browse all 2

Latest Images

Trending Articles





Latest Images