Unit Testing Template for ETL Process
SQL Server - Unit and Integration Testing of SSIS Packages
August 2013

I worked on a project where we built extract, transform and load (ETL) processes with more than 150 packages. Many of them contained complex transformations and business logic, so they were not simple “move data from point A to point B” packages. Making minor changes was not straightforward, and the results were often unpredictable. To test packages, we used to fill input tables or files with test data, execute the package or task in Microsoft Business Intelligence Development Studio (BIDS), write a SQL query and compare the output produced by the package with what we thought was the correct output. More often we just ran the whole ETL process on a sample database and sampled the output data at the end of the process, a time-consuming and unreliable procedure.
Unfortunately, this is a common practice among SQL Server Integration Services (SSIS) developers. Even more challenging is to determine what effects the execution of one package has on subsequent packages. As you build your ETL process, you create a network of connected packages and different resources. It’s difficult to maintain a complete overview of the numerous dependencies among all of these at all times. This article explains how to perform unit and integration testing of SSIS packages by introducing a library called SSISTester, which is built on top of the managed SSIS API. After reading this article you should be able to use the described techniques and tools to automate unit and integration testing of your existing and new SSIS projects.
To understand the article, you should have previous experience with SSIS and C#.

SSISTester

When I started thinking about a testing framework for SSIS packages, I found three aspects to be important. First, I wanted to have a similar UX to writing tests using the Visual Studio testing framework, so the typical methodology involving setup, verification and cleanup (aka teardown) steps had to be applied. Second, I wanted to use existing and proven tools to write, execute and manage tests. Once again, Visual Studio was the obvious choice.
And third, I wanted to be able to code tests in C#. With that in mind I wrote SSISTester, a .NET library that sits on top of the SSIS runtime and exposes an API that allows you to write and execute tests for SSIS packages. The main logical components of the library are depicted in Figure 1.

Figure 1 Logical Components of the SSISTester Library

The Package Repository is used to store raw XML representations of target packages. Each time a test is executed, a new instance of the Microsoft.SqlServer.Dts.Runtime.Package class is deserialized from XML with all fields and properties set to their default values. This is important because you don’t want different tests that target the same package to accidentally reuse any of the values set by previous tests. Instances of test classes are stored within the Test Repository.
These classes contain methods that implement your test cases. When a test is executed, these methods are called by the Test Engine.
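The fresh-instance behavior of the Package Repository described above can be illustrated with a small sketch. This is not SSISTester code: the class XmlPackageRepository, the simplified XML and the file path are hypothetical, and XDocument stands in for Microsoft.SqlServer.Dts.Runtime.Package so the example runs without SSIS installed.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical stand-in for the Package Repository idea: keep only the raw XML
// and build a brand-new object graph every time a test asks for a package.
public sealed class XmlPackageRepository
{
    private readonly Dictionary<string, string> rawXmlByName =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public void Add(string packageName, string rawXml)
    {
        rawXmlByName[packageName] = rawXml;
    }

    // Parsing on every call means no two callers ever share an instance.
    public XDocument CreateFreshInstance(string packageName)
    {
        return XDocument.Parse(rawXmlByName[packageName]);
    }
}

public static class RepositoryDemo
{
    public static void Main()
    {
        var repository = new XmlPackageRepository();
        // Simplified XML for illustration only; a real .dtsx file looks different.
        repository.Add("CopyCustomers.dtsx",
            "<Package><Variable Name=\"SourcePath\"></Variable></Package>");

        // The first "test" changes a value on its own copy.
        XDocument first = repository.CreateFreshInstance("CopyCustomers.dtsx");
        first.Root.Element("Variable").Value = "C:\\Temp\\Customers.txt";

        // The second "test" still sees the default (empty) value.
        XDocument second = repository.CreateFreshInstance("CopyCustomers.dtsx");
        bool untouched = second.Root.Element("Variable").Value.Length == 0;
        Console.WriteLine(untouched ? "second instance is clean" : "state leaked");
    }
}
```

Storing the serialized form rather than a live object is what rules out state leaking between tests; the cost is one deserialization per test run.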
The specific rules that must be followed when creating test classes will be described in detail later. Metadata contains the attributes needed to decorate a test class so it can be recognized as a test implementation.
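Attribute-based recognition of this kind can be sketched with plain .NET reflection. The attribute SketchTestAttribute, its PackageName property and the decorated classes below are hypothetical names chosen for illustration; they are not the attributes that SSISTester defines.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical marker attribute; the name and its property are illustrative only.
[AttributeUsage(AttributeTargets.Class, AllowMultiple = false)]
public sealed class SketchTestAttribute : Attribute
{
    public SketchTestAttribute(string packageName) { PackageName = packageName; }
    public string PackageName { get; }
}

[SketchTest("CopyCustomers.dtsx")]
public class CopyCustomersChecks { }

[SketchTest("LoadCustomers.dtsx")]
public class LoadCustomersChecks { }

// Not decorated, so the scan below ignores it.
public class HelperClass { }

public static class TestDiscovery
{
    // Returns (test class, target package) pairs for every decorated class,
    // ordered by class name so the result is deterministic.
    public static List<KeyValuePair<Type, string>> FindTests(Assembly assembly)
    {
        return assembly.GetTypes()
            .Select(type => new { type, marker = type.GetCustomAttribute<SketchTestAttribute>() })
            .Where(x => x.marker != null)
            .OrderBy(x => x.type.Name, StringComparer.Ordinal)
            .Select(x => new KeyValuePair<Type, string>(x.type, x.marker.PackageName))
            .ToList();
    }

    public static void Main()
    {
        foreach (var test in FindTests(typeof(TestDiscovery).Assembly))
        {
            Console.WriteLine(test.Key.Name + " -> " + test.Value);
        }
    }
}
```

Keeping the target package name in the attribute lets an engine pair each test class with its package without instantiating the class first.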
The Test Engine looks for these attributes when loading tests into the Test Repository. The Test Context represents a set of classes that provide access to the runtime information during different phases of the test execution.
For example, you can use these classes to access different aspects of a package being tested, such as variables, properties, precedence constraints, connection managers, the currently executing task, package errors and so forth. The Test Engine refers to the core classes and interfaces of the SSISTester API that directly utilize the managed SSIS runtime. They are used to load packages and test classes into their respective repositories, as well as to execute tests and to create test results.

Mini ETL

To create packages and test classes, I’ll use Visual Studio 2012 and SQL Server 2012, and I’ll use three packages to illustrate a simple ETL scenario in which customer data, delivered as a text file, is transformed and stored within a database. The packages are CopyCustomers.dtsx, LoadCustomers.dtsx and Main.dtsx.
CopyCustomers.dtsx copies the Customers.txt file from one location to another and, on the way, converts all customer names to uppercase text. Customers.txt is a simple CSV file that contains the IDs and names of customers.
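To make the behavior of CopyCustomers.dtsx concrete, here is a minimal C# sketch of the same copy-and-uppercase step, wrapped in a setup, verify and cleanup cycle. It is neither the SSIS package nor SSISTester code: the method names, the sample rows and the assumption that every line has the form id,name (with no commas inside names) are illustrative only.

```csharp
using System;
using System.IO;
using System.Linq;

public static class CopyCustomersSketch
{
    // Copies a CSV file line by line, uppercasing the second column (the name).
    // Assumes each line looks like "id,name"; lines without a comma are copied as-is.
    public static void CopyWithUppercaseNames(string sourcePath, string targetPath)
    {
        var converted = File.ReadLines(sourcePath)
            .Where(line => line.Trim().Length > 0)
            .Select(line =>
            {
                string[] parts = line.Split(new[] { ',' }, 2);
                return parts.Length == 2
                    ? parts[0] + "," + parts[1].ToUpperInvariant()
                    : line;
            });
        File.WriteAllLines(targetPath, converted);
    }

    public static void Main()
    {
        string folder = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(folder);
        string source = Path.Combine(folder, "Customers.txt");
        string target = Path.Combine(folder, "CustomersCopy.txt");
        try
        {
            // Setup: write hypothetical sample rows.
            File.WriteAllLines(source, new[] { "1,Contoso", "2,Fabrikam" });

            // Execute the step under test.
            CopyWithUppercaseNames(source, target);

            // Verify: the ids are unchanged and the names are uppercase.
            string[] actual = File.ReadAllLines(target);
            bool ok = actual.SequenceEqual(new[] { "1,CONTOSO", "2,FABRIKAM" });
            Console.WriteLine(ok ? "passed" : "failed");
        }
        finally
        {
            // Teardown: remove everything the test created.
            Directory.Delete(folder, true);
        }
    }
}
```

A unit test for the real package would follow the same shape, with the package execution taking the place of the direct method call.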
The following script creates the Demo database and the CustomersStaging table:

CREATE DATABASE [Demo]
GO
USE [Demo]
GO
CREATE TABLE [dbo].[CustomersStaging](
  [Id] [int] NULL,
  [Name] [nvarchar](255) NULL
) ON [PRIMARY]
GO

The package Main.dtsx contains two Execute Package tasks that execute the sub-packages CopyCustomers.dtsx and LoadCustomers.dtsx, respectively. Connection managers in both CopyCustomers.dtsx and LoadCustomers.dtsx are configured using expressions and package variables. When either package is executed from within another package, the same variables are retrieved through the parent package configuration.

Creating Unit Tests

To begin, create a console project and add assembly references to SSIS.Test.dll and SSIS.Test.Report.dll.
I’m going to create a unit test for the CopyCustomers.dtsx package first. Figure 2 shows the control flow (left) and data flow (right) for CopyCustomers.dtsx.

Figure 3 Control Flow (Left) and Data Flow (Right) of the LoadCustomers.dtsx Package

When a test targets a specific task, only that task is executed by the Test Engine. If the successful execution of the target task depends on the execution of preceding tasks, the results of executing those tasks need to be manually generated. The DFT Load Customers data flow expects the target table to be truncated by the SQL Truncate CustomersStaging task.
Further, the data flow expects the transformed Customers.txt file at a specific location. Because this file is created by the CopyCustomers.dtsx package, I need to copy it manually. The test’s Setup method does all this.

Creating Integration Tests

The basic idea of unit tests is to isolate all of the possible effects other packages or tasks may have on the one being tested.
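Isolation of this kind usually means producing by hand whatever the skipped tasks would have produced, as with the truncated staging table and the copied input file for the DFT Load Customers data flow. The sketch below shows that idea in plain C#. It is an assumption-laden illustration, not the article's Setup method: the class and method names are hypothetical, and the caller is expected to supply an already opened connection to the Demo database.

```csharp
using System;
using System.Data.Common;
using System.IO;

public static class LoadCustomersPreconditions
{
    // Stands in for the skipped "SQL Truncate CustomersStaging" task.
    // The caller supplies an open connection to the Demo database.
    public static void TruncateStaging(DbConnection openConnection)
    {
        using (DbCommand command = openConnection.CreateCommand())
        {
            command.CommandText = "TRUNCATE TABLE [dbo].[CustomersStaging]";
            command.ExecuteNonQuery();
        }
    }

    // Stands in for the output of CopyCustomers.dtsx: puts a prepared file
    // where the data flow expects to find it, overwriting any older copy.
    public static void PlaceInputFile(string preparedFile, string expectedPath)
    {
        string directory = Path.GetDirectoryName(Path.GetFullPath(expectedPath));
        Directory.CreateDirectory(directory);
        File.Copy(preparedFile, expectedPath, true);
    }

    public static void Main()
    {
        // Demonstrates only the file precondition; the database part needs a server.
        string folder = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(folder);
        try
        {
            string prepared = Path.Combine(folder, "prepared.txt");
            File.WriteAllLines(prepared, new[] { "1,CONTOSO" }); // hypothetical row
            string expected = Path.Combine(folder, "input", "Customers.txt");
            PlaceInputFile(prepared, expected);
            Console.WriteLine(File.Exists(expected) ? "input file in place" : "copy failed");
        }
        finally
        {
            Directory.Delete(folder, true);
        }
    }
}
```

Accepting a DbConnection rather than a concrete provider class keeps the sketch independent of any particular SQL client library.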
Sometimes it can be challenging to create a realistic test setup and the initial conditions needed for a unit test to ensure the package or task being tested behaves like a part of a complete ETL process. Because you usually implement ETL processes with a number of packages, you need to perform integration tests to be sure that each package works well when run as part of that process. The idea is to define probing points in your ETL process where you want to perform tests, without having to stop the whole process. As the process progresses and reaches the probing point, your tests are executed and you can verify a “live” work-in-progress ETL process; hence the name, “live test.” A live test is basically a post-condition—defined for a package, task or event handler—that needs to be satisfied after the package, task or event handler has executed. This post-condition corresponds to the verification step of a unit test.
Live tests are different from the unit tests because it’s not possible to prepare the test prior to package execution or to perform a clean-up step afterward. This is because unlike a unit test, a live test doesn’t execute the package; it’s the other way round: A package executes a test when it comes to the probing point for which a post-condition is defined. Figure 7 illustrates this difference. Note the position of the package in both figures. When running unit tests, the Test Engine explicitly executes a unit test by calling its Setup, Verify and Teardown methods. A package is executed as a part of this Setup-Verify-Teardown sequence.
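The difference in control flow between the two kinds of test can be sketched in a few lines of C#. Everything here is hypothetical and independent of the SSISTester API: FakePackage is a stand-in for a package, its StepCompleted event plays the role of a probing point, and RunUnitTest and AttachLiveTest are illustrative helper names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a package: a list of named steps that announces
// when each step has finished.
public sealed class FakePackage
{
    private readonly List<KeyValuePair<string, Action>> steps =
        new List<KeyValuePair<string, Action>>();

    public event Action<string> StepCompleted;

    public void AddStep(string name, Action work)
    {
        steps.Add(new KeyValuePair<string, Action>(name, work));
    }

    public void Execute()
    {
        foreach (var step in steps)
        {
            step.Value();
            StepCompleted?.Invoke(step.Key); // probing point
        }
    }
}

public static class TestStyles
{
    // Unit-test style: the runner drives everything, and the package
    // executes inside the Setup-Verify-Teardown sequence.
    public static void RunUnitTest(Action setup, FakePackage package, Action verify, Action teardown)
    {
        setup();
        try
        {
            package.Execute();
            verify();
        }
        finally
        {
            teardown();
        }
    }

    // Live-test style: only a post-condition is registered; the running
    // package triggers it when it reaches the named step.
    public static void AttachLiveTest(FakePackage package, string stepName, Action postCondition)
    {
        package.StepCompleted += finished =>
        {
            if (finished == stepName) postCondition();
        };
    }

    private static FakePackage BuildPackage(List<string> log)
    {
        var package = new FakePackage();
        package.AddStep("Copy", () => log.Add("copy ran"));
        package.AddStep("Load", () => log.Add("load ran"));
        return package;
    }

    public static void Main()
    {
        var log = new List<string>();

        RunUnitTest(() => log.Add("setup"), BuildPackage(log),
                    () => log.Add("verify"), () => log.Add("teardown"));

        FakePackage live = BuildPackage(log);
        AttachLiveTest(live, "Copy", () => log.Add("live check after Copy"));
        live.Execute();

        foreach (string entry in log) Console.WriteLine(entry);
    }
}
```

In the live variant there is no place for setup or cleanup code, which mirrors the restriction described above: the test only observes a process that something else started.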
Thanks to the following technical experts for reviewing this article: Christian Landgrebe (LPA) and Andrew Oakley (Microsoft)

Christian Landgrebe leads the database team at LPA, focused on delivering BI solutions to clients in the financial and banking industry.

Andrew Oakley is a Senior Program Manager on the patterns & practices team. Prior to becoming a Program Manager, Andrew spent two years as a Technical Evangelist for Visual Studio and the .NET platform. His current project focuses on data access guidance around building polyglot persistent systems using relational and NoSQL data stores.