How to compare database changes based on transaction journals

Tuesday, 23 October 2012 21:34
Available on this site is an i5/OS CDC connector for Informatica PowerCenter, which reads transaction journal entries from an i-System (formerly AS/400) in real time. These entries can be used to replicate changes to another database, to audit database changes made by users, or to compare two processes.

Use Case Description

There are two identical databases on two different i-Systems (AS/400 systems) and two sets of applications that calculate actuarial reserves. One application set is the old one; the second set must give the same results but is optimised to run faster. The challenge is how to determine what the two sets do identically and where they differ.

Solution

It is sufficient to run two workflows that calculate check sums. The workflows use the real-time i5/OS (AS/400) CDC connector to read journal entries from the AS/400 machine into the Informatica machine, and they use transformations to calculate a hash for each Job, Program, Profile, Library, Table, Field, or any subset of these, for instance one hash per Job, Table and Field. After each calculation step finishes, the check sums can be compared to answer whether the outcomes are identical and, if not, what differs. Processing does not need to be stopped in order to compare check sums and to figure out which processes differ.
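
To make the idea concrete, here is a minimal sketch in Python, not the connector's actual transformation logic; the entry layout and the XOR-based hashing scheme are assumptions made for the example.

import hashlib
from collections import defaultdict

def entry_hash(entry):
    # Hash the part of a journal entry that should be compared.
    # "job", "table", "field" and "value" are hypothetical keys for this sketch.
    data = "%s|%s|%s|%s" % (entry["job"], entry["table"], entry["field"], entry["value"])
    return int.from_bytes(hashlib.md5(data.encode()).digest(), "big")

def checksum_stream(entries):
    # Keep one order-independent check sum per (job, table, field) group.
    checksums = defaultdict(int)
    for entry in entries:
        key = (entry["job"], entry["table"], entry["field"])
        # XOR makes the result independent of the order of journal entries.
        checksums[key] ^= entry_hash(entry)
    return checksums

Because XOR is commutative, two streams that performed the same changes end up with equal check sums per group, regardless of the order in which the journal entries were read.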

There is no need for time-consuming configuration of each table used in the processing. Jobs, programs, user profiles, libraries, tables and fields are discovered automatically when the applications use them, and check sums are calculated for them. Once detailed check sums have been calculated, they can be aggregated: having a check sum for each Job, Entry type, Program, Table and Field, they can be aggregated into a check sum per Job, Table and Field. It is usually necessary to keep check sums at the field level, because some fields, such as "Processing Time" or "Job number", can always be different.
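
Continuing the sketch above (still an assumed XOR-based scheme, not the actual workflow implementation), detailed check sums keyed by Job, Entry type, Program, Table and Field can be folded into a check sum per Job, Table and Field like this:

from collections import defaultdict

def aggregate(detailed):
    # detailed: {(job, entry_type, program, table, field): checksum}
    # Fold the detailed check sums down to a coarser (job, table, field) key.
    coarse = defaultdict(int)
    for (job, entry_type, program, table, field), checksum in detailed.items():
        coarse[(job, table, field)] ^= checksum
    return coarse

Groups whose aggregated check sum differs between the two databases then point directly at the job, table and field where the processes diverge, and fields that always differ can simply be ignored in the comparison.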

What do the sample workflows do?

Prerequisite: the "i5/OS CDC connector for PowerCenter". See the zip file available for download below.

In Repository Manager, import CDC_Replication_from_as400.xml. The file is located in the Samples subfolder of the downloaded zip file.

The sample uses two workflows that read CDC data for the two processes and calculate partial hashes. A third workflow aggregates the hashes; the aggregated hashes show the differences between the two processes.

 

Workflows wf_CheckSum_CDC_FirstDatabase and wf_CheckSum_CDC_SecondDatabase both read journal entries from the AS/400, but from different journals. Run CDCCFG.exe to see which journals the entries are collected from. The "CDC Library" field defines the library where the "CDC services" are installed. The "Infa user" field defines the user that PowerCenter uses to connect to the AS/400. The picture below shows no "CDC services" configured, as there is no row in the "Data queues status" grid. To set up test "CDC services", press the "Setup" button. It does more than set up the test services: it also performs the initial setup of the CDC environment, such as granting authority to the "Infa user".

  

 

After the setup, press the "Get Data" button to retrieve the configuration. The configured test CDC services are presented below. The first service reads entries from journal QSQJRN in library COPACK1. The journal is associated with data queue FIFODTAQ, and workflow wf_CheckSum_CDC_FirstDatabase is configured to read from that data queue, which means it reads the entries of the QSQJRN journal. The second service reads data from the JRN journal and populates the FIFODTAQ1 data queue; workflow wf_CheckSum_CDC_SecondDatabase reads entries from that data queue.

 

To start the "CDC services" on the AS/400, press the "Start" button. Press "Get Data" to refresh the "Data queues status". The "LASTSEQN" field shows the last sequence number read from the journal receiver whose name is taken from the "RCVNAME" field.

 

Having "CDC services" started, run both workflows: wf_CheckSum_CDC_FirstDatabase and wf_CheckSum_CDC_SecondDatabase. (Define data queue connection to your as/400 system and configure sessions in both workflows!!! ). Check sums are written to $PMTargetFileDir\Checksums\ folder. To generate check sums, change files TEST and TEST1. Changes for file TEST are written to journal QSQJRN, changes for file TEST1 are written to journal JRN. Both files are in configured "CDC Library". Check sums are written at every commit in the case every 200000 rows. It can be changes on session level. To force flush of check sums and stop the workflows properly press "Stop Workflow". Workflow defined by current row of "Data queues status" is stopped. 

 

When the required check sums have been generated, run the workflow wf_CheckSums_Agregation. The workflow requires the "Checksums" folder to be copied to $PMSourceFileDir. The file pliki.bat is prepared for Windows. For UNIX, change the file name to pliki.sh, change the command inside it, and change the session configuration for the COMMAND attribute of the source. So

$PMSourceFileDir\Checksums\pliki.bat $PMTargetFileDir/Checksums/Hash*.*

must be changed to

$PMSourceFileDir\Checksums\pliki.sh $PMTargetFileDir/Checksums/Hash

 

Aggregated check sums are written to the "Agregated..." files in the folder $PMTargetFileDir\Checksums\. The files contain:

AgregatedHash.out - check sums grouped by job, entry type, table and field name.

AgregatedHashJob.out - check sums grouped by job, entry type and field name.

AgregatedHashTable.out - check sums grouped by entry type, table and field name.

Only check sums not equal to 0 are written to the files, which means only differences appear in the aggregation files. The check sums do not depend on the order of journal entries: adding row "1" and then row "2" gives the same check sum as adding row "2" and then row "1".
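
A toy example of this order independence under the XOR-based scheme sketched earlier (again just an illustration, not the connector's internals):

import hashlib

def h(row):
    return int.from_bytes(hashlib.md5(row.encode()).digest(), "big")

# Adding row "1" and then row "2" ...
a = h('add row "1"') ^ h('add row "2"')
# ... gives the same check sum as adding row "2" and then row "1".
b = h('add row "2"') ^ h('add row "1"')
assert a == b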

 

In the test case, operations on the tables TEST and TEST1 are compared, so the best file to check for differences is AgregatedHashJob.out.

  

 
