Database

PGroonga: Multilingual Full Text Search

PGroonga is a PostgreSQL extension adding a full text search indexing method based on Groonga. While native PostgreSQL supports full text indexing, it is limited to alphabet and digit based languages. PGroonga offers a wider range of character support making it viable for a superset of languages supported by PostgreSQL including Japanese, Chinese, etc.

Enable the extension

Creating a full text search index

Given a table with a text column:


_10
create table memos (
_10
id serial primary key,
_10
content text
_10
);

We can index the column for full text search with a pgroonga index:


_10
create index ix_memos_content ON memos USING pgroonga(content);

To test the full text index, we'll add some data.


_10
insert into memos(content)
_10
values
_10
('PostgreSQL is a relational database management system.'),
_10
('Groonga is a fast full text search engine that supports all languages.'),
_10
('PGroonga is a PostgreSQL extension that uses Groonga as index.'),
_10
('There is groonga command.');

The PostgreSQL query planner is smart enough to know that, for extremely small tables, it's faster to scan the whole table rather than loading an index. To force the index to be used, we can disable sequential scans:


_10
-- For testing only. Don't do this in production
_10
set enable_seqscan = off;

Now if we run an explain plan on a query filtering on memos.content:


_10
explain select * from memos where content like '%engine%';
_10
_10
QUERY PLAN
_10
-----------------------------------------------------------------------------
_10
Index Scan using ix_memos_content on memos (cost=0.00..1.11 rows=1 width=36)
_10
Index Cond: (content ~~ '%engine%'::text)
_10
(2 rows)

The pgroonga index is used to retrieve the result set:


_10
| id | content |
_10
| --- | ------------------------------------------------------------------------ |
_10
| 2 | 'Groonga is a fast full text search engine that supports all languages.' |

The &@~ operator performs full text search. It returns any matching results. Unlike LIKE operator, pgroonga can search any text that contains the keyword case insensitive.

Take the following example:


_10
select * from memos where content &@~ 'groonga';

And the result:


_10
id | content
_10
----+------------------------------------------------------------------------
_10
2 | Groonga is a fast full text search engine that supports all languages.
_10
3 | PGroonga is a PostgreSQL extension that uses Groonga as index.
_10
4 | There is groonga command.
_10
(3 rows)

Match all search words

To find all memos where content contains BOTH of the words postgres and pgroonga, we can just use space to separate each words:


_10
select * from memos where content &@~ 'postgres pgroonga';

And the result:


_10
id | content
_10
----+----------------------------------------------------------------
_10
3 | PGroonga is a PostgreSQL extension that uses Groonga as index.
_10
(1 row)

Match any search words

To find all memos where content contain ANY of the words postgres or pgroonga, use the upper case OR:


_10
select * from memos where content &@~ 'postgres OR pgroonga';

And the result:


_10
id | content
_10
----+----------------------------------------------------------------
_10
1 | PostgreSQL is a relational database management system.
_10
3 | PGroonga is a PostgreSQL extension that uses Groonga as index.
_10
(2 rows)

Search that matches words with negation

To find all memos where content contain the word postgres but not pgroonga, use - symbol:


_10
select * from memos where content &@~ 'postgres -pgroonga';

And the result:


_10
id | content
_10
----+--------------------------------------------------------
_10
1 | PostgreSQL is a relational database management system.
_10
(1 row)

Resources