Purging data
Table partitioning makes it easy to implement a data deletion strategy, for example, to comply with regulations that define a data retention period.
With Raijin Database Engine’s Table partitioning, partitions can be dynamically created based on a combination of date and other columns. This enables the precise selection of records to be purged based on your specific data retention requirements. For example, you may need to retain system events for 90 days and authentication events for 180 days.
In this example, the DATE_TRUNC function extracts the date portion of EventTime
.
That date value and the EventType
field are supplied as arguments to PARTITION BY
when the table is created.
This means that for each new record, the date part of the EventTime
field and the EventType
field will determine which partition it will be assigned to.
CREATE TABLE tbl() PARTITION BY DATE_TRUNC('day',EventTime), EventType;
INSERT INTO tbl(event, EventTime, EventType)
VALUES('a', '2022-03-04 17:14:04', 'Auth'); (1)
INSERT INTO tbl(event, EventTime, EventType)
VALUES('b', '2022-05-04 17:14:10', 'Sys'); (2)
INSERT INTO tbl(event, EventTime, EventType)
VALUES('c', '2022-05-05 17:14:04', 'Auth'); (3)
1 | Dynamically assigned to partition 2022-03-04T01:00:00.000000+01:00-Auth |
2 | Dynamically assigned to partition 2022-05-04T01:00:00.000000+01:00-Sys |
3 | Dynamically assigned to partition 2022-05-05T01:00:00.000000+01:00-Auth |
SELECT event, EventTime, EventType FROM tbl;
{"event":"a","eventtime":"2022-03-04 17:14:04","eventtype":"Auth"}
{"event":"b","eventtime":"2022-05-04 17:14:10","eventtype":"Sys"}
{"event":"c","eventtime":"2022-05-05 17:14:04","eventtype":"Auth"}
ALTER TABLE tbl DETACH PARTITION "2022-03-04T01:00:00.000000+01:00-Auth";
SELECT event, EventTime, EventType FROM tbl;
{"event":"b","eventtime":"2022-05-04 17:14:10","eventtype":"Sys"}
{"event":"c","eventtime":"2022-05-05 17:14:04","eventtype":"Auth"}
After detaching the partition, it can be deleted.
SHOW PARTITIONS IN tbl;
{"name":"2022-05-04T01:00:00.000000+01:00-Sys","state":"attached","rows":1,"path":"testdb/tbl/30e29b0a9a8e1ce9bb96eae203dcfee8"}
{"name":"2022-05-05T01:00:00.000000+01:00-Auth","state":"attached","rows":1,"path":"testdb/tbl/7ff11f524aa9f67d7097814b1e7aef59"}
{"name":"2022-03-04T01:00:00.000000+01:00-Auth","state":"detached","rows":1,"path":"testdb/tbl/_detached/ddb7f43eacae2905d035f9065b62c1d3"}
ALTER TABLE tbl DROP PARTITION "2022-03-04T01:00:00.000000+01:00-Auth";
SHOW PARTITIONS IN tbl;
{"name":"2022-05-04T01:00:00.000000+01:00-Sys","state":"attached","rows":1,"path":"testdb/tbl/30e29b0a9a8e1ce9bb96eae203dcfee8"}
{"name":"2022-05-05T01:00:00.000000+01:00-Auth","state":"attached","rows":1,"path":"testdb/tbl/7ff11f524aa9f67d7097814b1e7aef59"}
SELECT event, EventTime, EventType FROM tbl;
{"event":"b","eventtime":"2022-05-04 17:14:10","eventtype":"Sys"}
{"event":"c","eventtime":"2022-05-05 17:14:04","eventtype":"Auth"}
The above sequence can be automated using a script executed daily, for example, via a cron job.
The following bash script is provided as an example of how to delete dynamic partitions created based on event date and type.
The script connects to a local Raijin instance and uses database nxlog
and table tbl
.
It expects partition names to be in the format date:eventtype
and deletes events older than 30 days and of type Auth.
#!/bin/bash
command -v curl >/dev/null || echo "curl not installed"
command -v jq >/dev/null || echo "jq not installed"
#CUTOFF=$(date --date "3 months ago" "+%s")
CUTOFF=$(date --date "30 days ago" "+%s")
#ETYPE='Sys'
ETYPE='Auth'
RAIJIN=0.0.0.0:2500
RQ=$(mktemp)
LOG=$(mktemp)
cmdgen(){
echo -n '{"query":"use nxlog;'
curl -sS $RAIJIN --data \
'{"query":"use nxlog; show partitions from tbl"}' |\
jq -r '.name' |\
while IFS=":" read DATE TYPE; do
if [ $(date --date $DATE '+%s') -lt $CUTOFF ] && [ $TYPE = $ETYPE ]; then
echo -n "alter table tbl detach partition \\\"$DATE:$TYPE\\\"; alter table tbl drop partition \\\"$DATE:$TYPE\\\";"
echo "Dropping partition $DATE:$TYPE" | tee -a $LOG | systemd-cat -t raijin.cron
fi
done
echo -n \"}
if [ ! -s $LOG ]; then
echo "Nothing to do today" | tee -a $LOG | systemd-cat -t raijin.cron
fi
}
cmdgen > $RQ
cat $LOG
curl -sS $RAIJIN -d "@$RQ"
rm $RQ $LOG
This script is provided as is without warranty of any kind, either expressed or implied. Use at your own risk. |