Database design for point in time "snapshot" of data?

Accepted answer
Score: 18

This is NOT easy.

You're essentially asking for a Temporal Database (what Christopher Date calls Sixth Normal Form, or 6NF).

To be 6NF, a schema must also be 5NF, and, basically, for each datum, you need to attach a time range for which the datum at that value is applicable. Then in joins, the join must include only the rows that are within the time range being considered.

Temporal modeling is hard -- it's what 6th Normal Form addresses -- and not well supported in current RDBMSes.

The problem is the granularity. 6th Normal Form (as I understand it) supports temporal modeling by making every non-key attribute (i.e., anything "on" the entity that can change without the entity losing its identity) a separate relation. To this, you add a timestamp or time range or version number. Making everything a join solves the granularity problem, but it also means your queries are going to be more complicated and slower. It also requires figuring out all keys and non-key attributes; this tends to be a large effort.

Basically, everywhere you have a relation ("ted owns the GM stock certificate with id 789") you add a time: "ted owns the GM stock certificate with id 789 now" so that you can simultaneously say, "fred owned the GM stock certificate with id 789 from 3 Feb 2000 to yesterday". Obviously these relations are many-to-many (ted can own more than one certificate now, and more than one over his lifetime, too, and fred can have previously owned the certificate jack owns now).

So we have a table of owners, and a table of stock certificates, and a many-to-many table that relates owners and certificates by id. To the many-to-many table, we add a start_date and an end_date.
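
A minimal sketch of that shape in SQL (all names here are illustrative, not taken from the answer; a NULL end_date is read as "owned now"):

CREATE TABLE person (
    id   INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE stock_certificate (
    id     INT PRIMARY KEY,
    ticker VARCHAR(10) NOT NULL
);

-- The many-to-many ownership relation, with a time range attached.
CREATE TABLE person_stockcertificate (
    person_id      INT NOT NULL REFERENCES person(id),
    certificate_id INT NOT NULL REFERENCES stock_certificate(id),
    start_date     DATE NOT NULL,
    end_date       DATE NULL,
    PRIMARY KEY (person_id, certificate_id, start_date)
);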

Now, imagine that each state/province/land taxes the dividends on stock certificates, so for tax purposes we need to record the stock certificate's owner's state of residency.

Where the owner resides can obviously change independently of stock ownership; ted can live in Nebraska, buy 10 shares, get a dividend that Nebraska taxes, move to Nevada, sell 5 shares to fred, buy 10 more shares.

But for us, it's: ted can move to Nebraska at some time, buy 10 shares at some time, get a dividend at some time, which Nebraska taxes, move to Nevada at some time, sell 5 shares to fred at some time, buy 10 more shares at some time.

We need all of that if we want to calculate what taxes ted owes in Nebraska and in Nevada, joining up on the matching/overlapping date ranges in person_stockcertificate and person_address. A person's address is no longer one-to-one; it's one-to-many, because it's address during a time range.
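
As a sketch, a point-in-time query over such a schema might look like the following; person_address and its state column are assumed to exist with the same start_date/end_date pair, and @as_of stands for the moment of interest:

-- Who owned certificate 789 at @as_of, and in which state did they live then?
SELECT p.name, a.state
FROM person_stockcertificate psc
INNER JOIN person p         ON p.id = psc.person_id
INNER JOIN person_address a ON a.person_id = p.id
WHERE psc.certificate_id = 789
  AND @as_of >= psc.start_date
  AND (psc.end_date IS NULL OR @as_of < psc.end_date)
  AND @as_of >= a.start_date
  AND (a.end_date IS NULL OR @as_of < a.end_date);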

If ted buys ten shares, do we model a buy event with a single purchase date, or do we add a date_bought to each share? Depends on the question we need the model to answer.

Score: 11

We did this once by creating separate database tables that contained the data we wanted to snapshot, but denormalized, i.e. every record contained all the data required to make sense, not references to ids that might no longer exist. It also added a date to each row.

Then we produced triggers for specific inserts or updates that did a join on all affected tables, and inserted the result into the snapshot tables.

This way it would be trivial to write something that restored the users' data to a point in time.

If you have tables:

user:

id, firstname, lastname, department_id

department:

id, name, departmenthead_id

your snapshot of the user table could look like this:

user_id, user_firstname, user_lastname, department_id, department_name, departmenthead_id, departmenthead_firstname, departmenthead_lastname, snapshot_date

and a query something like

INSERT INTO usersnapshot
SELECT [user].id AS user_id, [user].firstname AS user_firstname, [user].lastname AS user_lastname,
department.id AS department_id, department.name AS department_name,
departmenthead.id AS departmenthead_id, departmenthead.firstname AS departmenthead_firstname, departmenthead.lastname AS departmenthead_lastname,
GETDATE() AS snapshot_date
FROM [user]
INNER JOIN department ON [user].department_id = department.id
INNER JOIN [user] departmenthead ON department.departmenthead_id = departmenthead.id

This ensures each row in the snapshot is true for that moment in time, even if the department or department head has changed in the meantime.
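
A hedged sketch of the trigger wiring described above, in SQL Server syntax; the trigger name is made up, and inserted is the pseudo-table a T-SQL trigger sees (user is bracketed because it is a reserved word):

CREATE TRIGGER trg_user_snapshot ON [user]
AFTER INSERT, UPDATE
AS
BEGIN
    -- Snapshot only the rows touched by this statement.
    INSERT INTO usersnapshot
    SELECT u.id, u.firstname, u.lastname,
           d.id, d.name,
           dh.id, dh.firstname, dh.lastname,
           GETDATE()
    FROM inserted u
    INNER JOIN department d ON u.department_id = d.id
    INNER JOIN [user] dh    ON d.departmenthead_id = dh.id;
END;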

Score: 2

Having snapshots and/or an audit trail is a common database requirement. For many applications, creating 'shadow' or audit tables is an easy and straightforward task. While database-level backups and transaction logs are good to have, they are not a version control system.

Basically, you create a shadow table with all the same columns as the base table, and then set up triggers on the base table to place a copy of the row in the shadow table whenever it is updated or deleted.
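
A sketch of such a trigger, assuming a base table employee with a matching shadow table employee_shadow (T-SQL style, which Sybase and SQL Server broadly share; the deleted pseudo-table holds the pre-change rows):

CREATE TRIGGER trg_employee_shadow ON employee
AFTER UPDATE, DELETE
AS
BEGIN
    -- Copy the old version of every affected row into the shadow table.
    INSERT INTO employee_shadow (id, firstname, lastname, department_id, changed_at)
    SELECT id, firstname, lastname, department_id, GETDATE()
    FROM deleted;
END;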

Through some logic you can recreate what the data looked like at a given point in time. For an easy way to set this up in Sybase, see: http://www.theeggeadventure.com/wikimedia/index.php/Sybase_Tips#create_.27audit.27_columns

If you need to do lots of historical snapshots, then you can keep the data in the same table. Basically, create two columns - an added and a deleted column. The downside is that for every query you must add a where clause. Of course you can create a view which shows just the active records. This gets a bit more complicated if you have a normalized database with multiple tables, all with history.

However, it does work. You simply have the 'added' and 'deleted' columns on each table, and then your query has the point in time of interest. Whenever data is modified you must copy the current row and mark it as deleted.
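
A sketch of the predicate this pattern implies (table and column names assumed; @as_of is the point in time of interest):

-- Rows visible at @as_of: added at or before that moment,
-- and not deleted, or deleted only afterwards.
SELECT *
FROM employee
WHERE added <= @as_of
  AND (deleted IS NULL OR deleted > @as_of);

-- The view mentioned above, showing just the active records:
CREATE VIEW employee_active AS
SELECT * FROM employee WHERE deleted IS NULL;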

Score: 1

Use Log Triggers

All data changes are captured, giving the ability to query as if at any point in time.

Score: 0

SQL Server 2005 (onwards) Enterprise Edition has the ability to create database snapshots.
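
The syntax looks roughly like this (database name, logical file name, and path are placeholders):

-- A read-only, point-in-time snapshot of MyDatabase.
CREATE DATABASE MyDatabase_Snapshot
ON (NAME = MyDatabase_Data,
    FILENAME = 'C:\Snapshots\MyDatabase_Data.ss')
AS SNAPSHOT OF MyDatabase;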

Score: 0

Oracle has had Flashback technology since version 9i; it is much improved in Oracle 10g and 11g, and you can see the state of your database at any given point in history, provided you enable flashback.
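
For example, a Flashback Query can read a table as it stood at an earlier time (the table name is a placeholder):

-- The orders table as of one hour ago.
SELECT *
FROM orders AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' HOUR);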

Check this document: Flashback Overview

Score: 0

You can use the logs produced by your RDBMS to obtain snapshots of your data. Normally the logs are used to provide database recovery. They can, however, also be used to replicate the data across several RDBMS instances or to get snapshots of the data.

To get a snapshot of your data, simply take into account all the logs produced before the desired moment in time. You then "play back" those logs to obtain the actual database with your data restored.

How to access and "play back" the logs depends on the concrete RDBMS product you use.

Another possibility is to use temporal databases. They have time aspects built in and allow "going back in time". Look for "Oracle Flashback Technology" for an example. http://www.stanford.edu/dept/itss/docs/oracle/10g/appdev.101/b10795/adfns_fl.htm#ADFNS1008

Score: 0

With SQL Server at least, you can use full logging and keep the transaction logs between each backup set.

Then you can do a point-in-time restore.
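
In T-SQL that looks roughly like this (backup paths and the timestamp are placeholders):

RESTORE DATABASE MyDatabase
    FROM DISK = 'C:\Backups\MyDatabase_Full.bak'
    WITH NORECOVERY;

RESTORE LOG MyDatabase
    FROM DISK = 'C:\Backups\MyDatabase_Log.trn'
    WITH STOPAT = '2009-06-01T12:00:00', RECOVERY;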

That's a poor solution.

What exactly does your client want? Is it for analytical purposes (i.e. questions like: how many orders did we have two weeks ago)? Because that's exactly the problem that a data warehouse solves.

Score: 0

Maybe consider using a NoSQL solution like MongoDB to aggregate all of your relational data into a single document, then store that document with a timestamp or version number. Solutions like Kafka Connect or Oracle GoldenGate simplify piping relational data into NoSQL stores.

Score: 0

I would use an additional timestamp field on each table, whether it is a master table or a fact/transaction table. And every change needs to be a newly inserted row, even when the goal is to update.
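
A sketch of reading such an insert-only table at a point in time (table and column names are illustrative):

-- Latest version of each row as of @as_of in an insert-only table.
SELECT *
FROM customer c
WHERE c.inserted_at = (
    SELECT MAX(c2.inserted_at)
    FROM customer c2
    WHERE c2.id = c.id
      AND c2.inserted_at <= @as_of
);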
