Purging data

Table partitioning makes it easy to implement a data deletion strategy, for example, to comply with regulations that define a data retention period.

With Raijin Database Engine’s Table partitioning, partitions can be dynamically created based on a combination of date and other columns. This enables the precise selection of records to be purged based on your specific data retention requirements. For example, you may need to retain system events for 90 days and authentication events for 180 days.

Example 1. Dynamic partitioning by two columns and deleting a partition

In this example, the DATE_TRUNC function extracts the date portion of EventTime. That date value and the EventType field are supplied as arguments to PARTITION BY when the table is created. This means that for each new record, the date part of the EventTime field and the EventType field will determine which partition it will be assigned to.

Create a table with a date:eventtype partition and insert data
CREATE TABLE tbl() PARTITION BY DATE_TRUNC('day',EventTime), EventType;
INSERT INTO tbl(event, EventTime, EventType)
    VALUES('a', '2022-03-04 17:14:04', 'Auth'); (1)
INSERT INTO tbl(event, EventTime, EventType)
    VALUES('b', '2022-05-04 17:14:10', 'Sys'); (2)
INSERT INTO tbl(event, EventTime, EventType)
    VALUES('c', '2022-05-05 17:14:04', 'Auth'); (3)
1 Dynamically assigned to partition 2022-03-04T01:00:00.000000+01:00-Auth
2 Dynamically assigned to partition 2022-05-04T01:00:00.000000+01:00-Sys
3 Dynamically assigned to partition 2022-05-05T01:00:00.000000+01:00-Auth
Query for all data from all attached partitions
SELECT event, EventTime, EventType FROM tbl;
Result when all partitions are attached
{"event":"a","eventtime":"2022-03-04 17:14:04","eventtype":"Auth"}
{"event":"b","eventtime":"2022-05-04 17:14:10","eventtype":"Sys"}
{"event":"c","eventtime":"2022-05-05 17:14:04","eventtype":"Auth"}
Detach partition 2022-03-04:Auth and retrieve all accessible data
ALTER TABLE tbl DETACH PARTITION "2022-03-04T01:00:00.000000+01:00-Auth";
SELECT event, EventTime, EventType FROM tbl;
Result when partition 2022-03-04:Auth is detached
{"event":"b","eventtime":"2022-05-04 17:14:10","eventtype":"Sys"}
{"event":"c","eventtime":"2022-05-05 17:14:04","eventtype":"Auth"}

After detaching the partition, it can be deleted.

List all partitions
SHOW PARTITIONS IN tbl;
Result
{"name":"2022-05-04T01:00:00.000000+01:00-Sys","state":"attached","rows":1,"path":"testdb/tbl/30e29b0a9a8e1ce9bb96eae203dcfee8"}
{"name":"2022-05-05T01:00:00.000000+01:00-Auth","state":"attached","rows":1,"path":"testdb/tbl/7ff11f524aa9f67d7097814b1e7aef59"}
{"name":"2022-03-04T01:00:00.000000+01:00-Auth","state":"detached","rows":1,"path":"testdb/tbl/_detached/ddb7f43eacae2905d035f9065b62c1d3"}
Delete partition 2022-03-04:Auth
ALTER TABLE tbl DROP PARTITION "2022-03-04T01:00:00.000000+01:00-Auth";
SHOW PARTITIONS IN tbl;
Result
{"name":"2022-05-04T01:00:00.000000+01:00-Sys","state":"attached","rows":1,"path":"testdb/tbl/30e29b0a9a8e1ce9bb96eae203dcfee8"}
{"name":"2022-05-05T01:00:00.000000+01:00-Auth","state":"attached","rows":1,"path":"testdb/tbl/7ff11f524aa9f67d7097814b1e7aef59"}
Query for all data from all attached partitions
SELECT event, EventTime, EventType FROM tbl;
Result after partition 2022-03-04:Auth is deleted
{"event":"b","eventtime":"2022-05-04 17:14:10","eventtype":"Sys"}
{"event":"c","eventtime":"2022-05-05 17:14:04","eventtype":"Auth"}

The above sequence can be automated using a script executed daily, for example, via a cron job.

The following bash script is provided as an example of how to delete dynamic partitions created based on event date and type. The script connects to a local Raijin instance and uses database nxlog and table tbl. It expects partition names to be in the format date:eventtype and deletes events older than 30 days and of type Auth.

#!/bin/bash

command -v curl >/dev/null || echo "curl not installed"
command -v jq >/dev/null || echo "jq not installed"

#CUTOFF=$(date --date "3 months ago" "+%s")
CUTOFF=$(date --date "30 days ago" "+%s")
#ETYPE='Sys'
ETYPE='Auth'
RAIJIN=0.0.0.0:2500

RQ=$(mktemp)
LOG=$(mktemp)

cmdgen(){
echo -n '{"query":"use nxlog;'
curl -sS $RAIJIN --data \
   '{"query":"use nxlog; show partitions from tbl"}'  |\
   jq -r '.name' |\
   while IFS=":" read DATE TYPE; do
      if [ $(date --date $DATE '+%s') -lt $CUTOFF ] && [ $TYPE = $ETYPE ]; then
         echo -n "alter table tbl detach partition \\\"$DATE:$TYPE\\\"; alter table tbl drop partition \\\"$DATE:$TYPE\\\";"
         echo "Dropping partition $DATE:$TYPE" | tee -a $LOG | systemd-cat -t raijin.cron
      fi
   done
echo -n \"}
if [ ! -s $LOG ]; then
    echo "Nothing to do today" | tee -a $LOG | systemd-cat -t raijin.cron
fi
}

cmdgen > $RQ
cat $LOG
curl -sS $RAIJIN -d "@$RQ"
rm $RQ $LOG
This script is provided as is without warranty of any kind, either expressed or implied. Use at your own risk.