Table partitioning

Table partitioning is the technique of dividing table data into distinct chunks known as partitions. Partitions allow the database engine to operate on smaller portions of the data, significantly increasing the overall performance and making querying, inserting, and deleting data more efficient.

Raijin supports three types of table partitioning:

Manual partitions, where you define the partition of each record in the INSERT command.
Multidimensional partitions, where you define the partition field(s) when you create the table.
Finally, since partitioning is optional, you can have tables without partitions.

Manual partitioning

Tables with manual partitions give you complete control over partition names and their content. On the other hand, manual partitioning requires explicitly deciding the partition, if any, incoming data belongs to on insertion.

To add data to a partition, you must define the partition using the PARTITION BY option of the INSERT command. If a partition with that name already exists, the data is added to it. A new partition is created if there is no partition with that name.

Data inserted without a partition will be stored in the default partition. You can see this partition with an empty name field when you list table partitions with the SHOW PARTITIONS command.

Example 1. Example of manual partitions

Insert data into two partitions, "foo" and "bar"

CREATE TABLE logs (
  "Source" STRING,
  "Hostname" STRING
  );

INSERT INTO logs VALUES ('a','host1') WITH (partition = 'foo');
INSERT INTO logs VALUES ('b','host2') WITH (partition = 'bar');
INSERT INTO logs VALUES ('c','host1') WITH (partition = 'foo');
INSERT INTO logs VALUES ('d','host2');

SELECT * FROM logs;
SHOW PARTITIONS IN logs;

This example results in a table with four records. Two of the records will be in a partition named foo, one record in a partition named bar, and one record that is not in any partition.

Result

{"_id":1,"Source":"a","Hostname":"host1"}
{"_id":3,"Source":"c","Hostname":"host1"}
{"_id":2,"Source":"b","Hostname":"host2"}
{"_id":4,"Source":"d","Hostname":"host2"}
{}
{"name":"foo","state":"attached","record_count":2,"path":"testdb/logs/foo","size":83}
{"name":"bar","state":"attached","record_count":1,"path":"testdb/logs/bar","size":83}
{"name":"","state":"attached","record_count":1,"path":"testdb/logs/_default","size":78}

Multidimensional partitioning

Multidimensional partitions are based on one or more table fields (partition dimensions) you define when creating the table. Partitions are created when records are inserted in the table if none of the existing partitions match the new data.

Creating a table with multidimensional partitions

CREATE TABLE logs (
  "EventTime" TIMESTAMP,
  "Source" STRING,
  "Message" STRING,
  "Hostname" STRING
)
PARTITION BY Source, Hostname;

The example above creates a table partitioned by combinations of the Source and Hostname fields. The partitions enable Raijin to operate on a subset of the data segmented by the intersection of those values whenever possible.

This partition setup optimizes processing SELECT queries referencing a specific source, group of sources, or any combination of source and hostname since Raijin only needs to operate on partitions containing a matching source. For example, these partitions make purging data for a phased-out Hostname easier.

Results do not have a default order when retrieving data from partitioned tables. If you require ordered results, use the ORDER BY clause of the SELECT command.

Selecting your partitions

Database performance depends on partitioning tables by fields that suit your needs. Your partitions should not be too broad or too granular. Splitting a table into very big or too many partitions will not yield any performance benefits. In such cases, Raijin may have to scan large chunks of irrelevant data, hindering performance.

Additionally, ensure that you partition tables by fields you frequently use to filter data so that Raijin works on smaller datasets and can process queries faster, e.g., during a SELECT operation.

Timestamp fields are often good candidates for defining table partitions. They are commonly used for filtering when querying or purging data. Yet, timestamps are very granular and can be up to a microsecond. Partitioning data by such a field would defeat the purpose of partitions because most - if not all - records would be in different partitions.

When partitioning by timestamp fields, we recommend using the DATE_TRUNC function. This function allows you to reduce the partition by a suitable time fraction, such as the week or month. For example, partitioning by DATE_TRUNC('month', <TIMESTAMP field>) will create a partition per month.

Example of table partitioning by day

CREATE TABLE logs (
  "EventTime" TIMESTAMP,
  "Source" STRING,
  "Message" STRING,
  "Hostname" STRING
) PARTITION BY "Source", DATE_TRUNC('day', EventTime)

In conclusion, consider the following when choosing your partitioning fields:

Look for fields you commonly use for filtering queries to reduce the dataset Raijin works on.
Consider relevant fields for pruning data, such as truncated timestamps, to simplify administrative tasks, including deleting older data from your database.
Avoid fields with low variance, resulting in a small number of partitions, unless you use them in combination with other fields.
Avoid fields with very high variance, such as granular timestamps, because it will result in too many partitions (potentially a partition for each record).

Example 2. Example of dynamic partitions

This example uses the DATE_TRUNC function to extract the date from the EventTime field, comprising TIMESTAMP values.

The PARTITION BY clause specifies the DATE_TRUNC('day', EventTime) expression on table creation. As a result, the date portion of the EventTime field determines the partition where each record will be stored.

Create a table with date-based partitioning

CREATE TABLE logs (
  "EventTime" TIMESTAMP,
  "event" STRING
) PARTITION BY DATE_TRUNC('day', EventTime);

INSERT INTO logs(event, EventTime) VALUES('a', '2023-02-04 17:14:04');
INSERT INTO logs(event, EventTime) VALUES('b', '2023-02-04 17:14:10');
INSERT INTO logs(event, EventTime) VALUES('c', '2023-02-05 17:14:04');

SELECT * FROM logs;
SHOW PARTITIONS IN logs;

This example results in a table with three records in two partitions

Result

{"_id":1,"event":"a","eventtime":"2023-02-04 17:14:04"}
{"_id":2,"event":"b","eventtime":"2023-02-04 17:14:10"}
{"_id":3,"event":"c","eventtime":"2023-02-05 17:14:04"}
{}
{"name":"2023-02-04T01:00:00.000000+01:00","state":"attached","record_count":2,"path":"testdb/logs/2023-02-04T01%3A00%3A00.000000+01%3A00 (DATE_TRUNC('day', EventTime))","size":208,"eventtime":"2023-02-04T01:00:00.000000+01:00"}
{"name":"2023-02-05T01:00:00.000000+01:00","state":"attached","record_count":1,"path":"testdb/logs/2023-02-05T01%3A00%3A00.000000+01%3A00 (DATE_TRUNC('day', EventTime))","size":208,"eventtime":"2023-02-05T01:00:00.000000+01:00"}

Managing partitions

You can list, delete, and take partitions offline. See the SHOW PARTITIONS and ALTER TABLE commands.

Raijin stores partition data in its parent table’s data directory. Other than folders for each partition, it might create two additional folders:

The _detached folder if there are any detached partitions.
The _default folder for unpartitioned data, e.g., if data is inserted without a partition for manually partitioned tables or the table was originally created without partitions.

Detach and attach partitions

The ATTACH PARTITION, DETACH PARTITION, and DROP PARTITION clauses of the ALTER TABLE command support the DIRECTORY keyword, which allows you to specify the folder name instead of the folder path.

Once a partition is detached, you can remove its folder from the _detached folder and copy it back if you need to attach it again, such as for archiving and unarchiving data.

Detaching a partition by folder name

ALTER TABLE tbl DETACH PARTITION DIRECTORY "partition_data";

Do not move or rename partition folders outside the _detached folder while Raijin is running. Doing so will render the database inconsistent and cause irreparable damage. We strongly advise always using the ALTER TABLE command to attach, detach, and delete partitions to avoid data loss.

Using the DETACH PARTITION clause of the ALTER TABLE command takes the partition data offline, making it unavailable for querying.

Example 3. Detaching and attaching partitions

Create a table and insert data

CREATE TABLE logs (
  "event" STRING,
  "hostname" STRING
) PARTITION BY "hostname";

INSERT INTO logs VALUES ('a','foo');
INSERT INTO logs VALUES ('b','bar');
INSERT INTO logs VALUES ('c','x');

Detach the "bar" partition

ALTER TABLE logs DETACH PARTITION "bar";
SELECT * FROM logs;

Result when "bar" is detached

{"_id":1,"event":"a","hostname":"foo"}
{"_id":3,"event":"c","hostname":"x"}

Use ATTACH PARTITION to restore a detached partition. It is important to remember that a partition belongs to the table it was created for and cannot be attached to a different table.

Attach the "bar" partition

ALTER TABLE logs ATTACH PARTITION "bar";
SELECT * FROM logs;

Result when "bar" is reattached

{"_id":1,"event":"a","hostname":"foo"}
{"_id":2,"event":"b","hostname":"bar"}
{"_id":3,"event":"c","hostname":"x"}

Tables without partitions

Tables created without partitions will have all records in a single partition, stored in the _default partition directory. Raijin creates the partition the first time you insert data into the table. You can see this partition when you issue the SHOW PARTITIONS command for an unpartitioned table that contains data.

Result showing the default partition

{
  "name": "",
  "state": "attached",
  "record_count": 3,
  "path": "testdatabase/testtable/_default",
  "size": 80
}