ETL : Testing Basics
SQL types:
- DML (Data Manuipulation) - Insert , update , Delete , Merge
- DDL (Data Definition)-Create , alter,Drop,Truncate
- DRL (Data Retrival )- Select
- TCL (Transaction Lan)- Commit Rollback
- DCL (Data Control)- grant ,revoke
- It is a DataBase
- Contains data for Analysis
OLTP (Online Transaction Processing System)
- Used in Day to Day Transaction
- Ex : ATM , Net Banking
- Data is Captured into either DB , Flat File (xml) etc.,
- It is a smaller part of DataWarehouse
- It is specific to a department eg: Finance , HR etc.,
- It is used by middle level management to retrieve data wrt to their deptartment.
- Process of read data from OLTP and inserting into Data Warehouse
- The Data from Data Warehouse is used in Analytics report
Assume Sales departments has records of customer transaction in DB and Services has transaction in an csv file or flat file ( service number , service mechnaic name ,amount paid , spares replaced ,Date etc.,)
If management wants to find out the total transaction of a customer both sales and services how will they do ?
So they need to extract only relavent information from both data sources and load it into a common Destination .This is called ETL.
Hetrogenous Data : Data coming from different Data Sources like Flat file , Orcale DB, Mongo, My Sql ,txt,json etc.,
Mapping Document ( http://dwbitips.blogspot.com/2012/12/what-is-etl-mapping-document-real-time.html ) :
- Requirement Document
- Contains src , dest table and Column Details
- Conatins transformation query or forumula applied for each column
ETL testing = Data Integration Testing + BI testing (Report)
Generic Data Integration Testing :
- Scheme match wrt requirement
- Count match with src and destination
- Duplicate records (loaded twice),
- Null Validation
- Untransformed Data should match source and destination data (1:1).
- Mapping is correct ie., data is going to correct cols as per requirment
- Old data already residing is not affected
Late Arriving Dimension or Early Arriving Fact
- Happens when you get fact data before the dimension data arrives
- example: New employee just onboarded and had an accident before insurance forms could be completed and processed by the insurance company. The hospital will create a medical claim record to be paid by the insurance company, but the insurance company does not yet have a person to associate the claim.
- -1 = UNKNOWN, -2 = N/A, -3 = Not Provided etc.,
https://www.softwaretestinghelp.com/etl-testing-data-warehouse-testing/
https://www.guru99.com/utlimate-guide-etl-datawarehouse-testing.html
https://www.tutorialspoint.com/etl_testing/index.htm