Backup and archiving

Raijin does not currently include a specialized tool for backing up databases. However, you can use OS-level tools to back up the Raijin Database Engine server. In addition, you can implement a data archiving solution by partitioning tables according to your requirements.

File permissions

Regardless of how you back up your databases, Raijin must be able to read and modify restored databases and partitions. Therefore, you must preserve user permissions when copying or moving files.

In Unix-like operating systems, you can use the cp (copy) command with the -a option to retain all permissions when copying files.

Alternatively, use the following commands to re-assign the raijin user ownership and set the required permissions for restored folders.

$ sudo chown -R raijin:raijin <FOLDER_PATH>
$ sudo chmod -R +rx <FOLDER_PATH>

File system snapshots

We recommend regular, full file system snapshots for large, active databases. The easiest way to implement this is by using OS-level tools to take a snapshot of the volume containing the Raijin data directory. This way, the full state of the database is saved, and in case of a catastrophic event, you can quickly restore service using the most recent snapshot.

Prerequisites:

  • During deployment, you must place the database on a volume that supports snapshots, for example, Btrfs or LVM. You can change the default Raijin data directory with the DataDir configuration parameter. See Storage settings for more information.

  • It is important that you stop the Raijin Database Engine service before taking the snapshot to ensure data consistency.

Example 1. Taking a snapshot

The first step is to lock the database to prevent further write transactions. The FLUSH TABLES WITH READ LOCK command locks all tables in all databases.

Lock the database from being modified
$ curl -X POST http://localhost:2500 -H 'Content-type: application/json' -d '{"query": "FLUSH TABLES WITH READ LOCK;"}'

Take a snapshot of the volume where the Raijin data directory resides and release the lock.

Release the database lock
$ curl -X POST http://localhost:2500 -H 'Content-type: application/json' -d '{"query": "UNLOCK TABLES;"}'

Finally, back up the volume snapshot to a safe location, ideally to an external storage device or service.

Partition-based archiving

Table partitioning makes it easy to implement a data archiving strategy.

Archiving table partitions

Using Raijin Database Engine’s table partitioning, you can partition a table dynamically based on the event date or any combination of columns. Dynamic partitioning enables you to archive data based on your requirements, for example, archiving events older than 60 days.

Example 2. Dynamic partitioning by event date and archiving a partition

In this example, the DATE_TRUNC function extracts the date portion of EventTime.

DATE_TRUNC(EventTime) is supplied as the argument to PARTITION BY when creating the table. Therefore, the date part of the EventTime field will determine the partition of each record.

Create a table with a date partition and insert data
CREATE DATABASE nxlog;
CREATE TABLE tbl() PARTITION BY DATE_TRUNC('day',EventTime);
INSERT INTO tbl(event, EventTime)
    VALUES('a', '2023-04-01 17:14:04'); (1)
INSERT INTO tbl(event, EventTime)
    VALUES('b', '2023-03-04 12:03:10'); (2)
INSERT INTO tbl(event, EventTime)
    VALUES('c', '2023-02-16 16:51:27'); (3)
1 Dynamically assigned to partition 2023-04-01T01:00:00.000000+01:00
2 Dynamically assigned to partition 2023-03-04T01:00:00.000000+01:00
3 Dynamically assigned to partition 2023-02-16T01:00:00.000000+01:00
List all partitions
SHOW PARTITIONS IN tbl;
Result
{"name":"2023-04-01T01:00:00.000000+01:00","state":"attached","rows":"1","path":"nxlog/tbl/db3731e2c6e70b6cfa20ffa449e49fcf"}
{"name":"2023-03-04T01:00:00.000000+01:00","state":"attached","rows":"1","path":"nxlog/tbl/58f2f0116b18375577bedaf0940d3f8f"}
{"name":"2023-02-16T01:00:00.000000+01:00","state":"attached","rows":"1","path":"nxlog/tbl/7e1c88563ba94aaeafd2633c82ff4908"}
Detach partition for 2023-02-16
ALTER TABLE tbl DETACH PARTITION "2023-02-16T01:00:00.000000+01:00";
SHOW PARTITIONS IN tbl;
Result
{"name":"2023-04-01T01:00:00.000000+01:00","state":"attached","rows":"1","path":"nxlog/tbl/db3731e2c6e70b6cfa20ffa449e49fcf"}
{"name":"2023-03-04T01:00:00.000000+01:00","state":"attached","rows":"1","path":"nxlog/tbl/58f2f0116b18375577bedaf0940d3f8f"}
{"name":"2023-02-16T01:00:00.000000+01:00","state":"detached","rows":"1","path":"nxlog/tbl/_detached/7e1c88563ba94aaeafd2633c82ff4908"}

After detaching the partition, you can back up and delete its data folder.

Back up and delete the 2023-02-16 partition data folder
$ sudo cp -Ra /opt/raijin/data/db/nxlog/tbl/_detached/7e1c88563ba94aaeafd2633c82ff4908 /tmp/raijin
$ sudo rm -R /opt/raijin/data/db/nxlog/tbl/_detached/7e1c88563ba94aaeafd2633c82ff4908

You can automate the above sequence by a script executed daily, for example, via a cron job.

The following is an example bash script that detaches partitions created based on the event date. The script connects to a local Raijin instance and uses database nxlog and table tbl. It detaches partitions older than two months, copies the partition folder to a backup location, and deletes the source folder.

#!/bin/bash

command -v curl >/dev/null || echo "curl not installed"
command -v jq >/dev/null || echo "jq not installed"

#CUTOFF=$(date --date "30 days ago" "+%s")
CUTOFF=$(date --date "2 months ago" "+%s")

RAIJIN=0.0.0.0:2500
DB=nxlog
TBL=tbl
DATA="/opt/raijin/data/db/$DB/$TBL/_detached"
BACKUP="/tmp/raijin"

RQ=$(mktemp)
LOG=$(mktemp)
PT=$(mktemp)

cmdgen(){
echo -n '{"query":"use '"$DB"';'
curl -sS $RAIJIN --data \
   '{"query":"use '"$DB"'; show partitions from '"$TBL"'"}' |\
   jq -r '(.name) + "$" + (.state) + "$" + (.rows) + "$" + (.path)' |\
   while IFS="$" read DATE STATE ROWS FPATH; do
     export PATH=/usr/bin:/bin:$PATH
     if [ $(date --date "$DATE" "+%s") -lt $CUTOFF ] && [ $STATE = 'attached' ]; then
       IFS="/" read -ra arr <<< "$FPATH"
       echo "${arr[-1]}" >> $PT
       echo -n "alter table $TBL detach partition \\\"$DATE\\\";"
       echo "Detaching partition $DATE" | tee -a $LOG | systemd-cat -t raijin.cron
     fi
   done
echo -n \"}
if [ ! -s $LOG ]; then
   echo "Nothing to do today" | tee -a $LOG | systemd-cat -t raijin.cron
fi
}

cmdgen > $RQ
curl -sS $RAIJIN -d "@$RQ"
while read PARTITION; do
  SRC="$DATA/$PARTITION"
  DST="$BACKUP/$PARTITION"
  echo "Copying $SRC to $DST" | tee -a $LOG | systemd-cat -t raijin.cron
  mkdir -p $BACKUP &&  cp -Ra $SRC $DST
  echo "Deleting $SRC" | tee -a $LOG | systemd-cat -t raijin.cron
  rm -R $SRC
done < $PT
cat $LOG
rm $RQ $LOG $PT
This script is provided as is without warranty of any kind, either expressed or implied. Use at your own risk.

Restoring table partitions

Suppose you need to execute queries on archived data, for example, for forensic analysis. In that case, you can quickly restore the relevant partition(s) by copying the data back to the table’s _detached directory and re-attaching it.

Example 3. Restoring a partition
Restore the 2023-02-16 partition folder
$ sudo cp -Ra /tmp/raijin/7e1c88563ba94aaeafd2633c82ff4908 /opt/raijin/data/db/nxlog/tbl/_detached
Attach partition 2023-02-16
ALTER TABLE tbl ATTACH PARTITION "2023-02-16T01:00:00.000000+01:00";
SHOW PARTITIONS IN tbl;
Result
{"name":"2023-04-01T01:00:00.000000+01:00","state":"attached","rows":"1","path":"nxlog/tbl/db3731e2c6e70b6cfa20ffa449e49fcf"}
{"name":"2023-03-04T01:00:00.000000+01:00","state":"attached","rows":"1","path":"nxlog/tbl/58f2f0116b18375577bedaf0940d3f8f"}
{"name":"2023-02-16T01:00:00.000000+01:00","state":"attached","rows":"1","path":"nxlog/tbl/7e1c88563ba94aaeafd2633c82ff4908"}