SQL Server Integration Services (SSIS)
SQL Server Integration Services (SSIS)
Introduction
Hello, welcome to the SSIS course. This is an advanced level database course and it requires the knowledge of databases.Before starting this course, I would recommend to complete Introduction to Database and
Introduction to Database Administration courses. These two courses will help you understand SSIS
course as well as SSAS and SSRS courses.
If you have already completed these two courses, I believe this SSIS course will be more accessible for you.
1.What is SSIS?
SSIS stands for “SQL Server Integration Services“. It is a platform to build business-level data integration and data transformations solutions. It solves complicated business problems by copying or downloading data, loading into data warehouses, mining data and managing data.
The SSIS extracts and transforms data from various sources such as xml data files, flat files and load
the data into the destinations.
You can use SSIS graphical tools to develop solutions without writing a single line of code.
We will also cover in this course:
-Installation Microsoft SQL Server 2017
-Restore Sample Database
-Create DW in MS SQL Server
-Create Stage Area Table
-Implementation of data flow
2.Software Installation
In this video, we will cover how to install softwares that are needed for this course.
First, we need SQL Server 2017 to be downloaded.
Go to the below link to download SQL SERVER 2017
Link: Try SQL Server 2017 RTM on Microsoft Evaluation Center
Go to the below link to download SQL Server Data Tools (SSDT) for Visual Studio
Link: Download SQL Server Data Tools (SSDT) – SQL Server Data Tools (SSDT) | Microsoft Docs
Go to the below link to download SQL Server Management Studio (SSMS)
Link: Download SQL Server Management Studio (SSMS) – SQL Server Management Studio (SSMS) | Microsoft Docs
3.Setting Up / Sample Database Upload
After we’ve finished downloading and installing the required software, we’ll need to download sample databases from Microsoft.
In this video, we will demonstrate how to download AdvantureWorks2017 sample databases and then upload them to MS SQL Server so that we can play with them later.
Resources:
AdventureWorks sample databases – SQL Server | Microsoft Docs
4.Creating A Data Warehouse
Creating Data Warehouse Repository and Stage table using MS SQL SERVER 2022 & Data Tools
A staging area is primarily required in a Data Warehousing Architecture for timing reasons. In short, all the necessary data must be available before data can be integrated into the Data Warehouse.
Create a Stage Area Table.
This video will teach us how to create the Stage table using Visual Studio Data tools (Integration).
Upload XLS data file in the database.
Select newly created database > Click the right button > Select tasks > Select Import Data > Select data source “Microsoft Excel” > browse to select the file > Click open > Select Excel version 97-2003 > Click Next button > Select Destination “SQL Server Native Client 11.0” > Select database > then continue to finish the process.
Upload CSV data file in the database.
Select the newly created database > Click the right button > Select tasks > Select Import Flat File > > Browse to select the file > Click next button > In Modify Columns, change the Data Type option as shown in the video > Click next to finish the process.
This video will also show how to create an OnlineRetailDW and then upload raw data (dataset) to the data warehouse. Download RawData dataset files here:
5.Extract, Transform and Load RawData to the Stage table using the Data Tools Visual Studio Integration
How to perform the complete ETL (Extract, Transform, Load) process using SQL Server to move raw data into a staging table. We will walk through the essential steps of extracting raw data from various sources, transforming it to meet business needs, and loading it into the staging table for further analysis.
This video covers:
- How to extract raw data from files and databases.
- Data transformation techniques, such as data type conversion and cleansing.
- Loading the transformed data into the staging table.
- Best practices for handling large datasets during ETL.
- Practical SQL Server tasks and tools for efficient data management.
6.Creating Tables using Data Tools Integration Project (Execute SQL Task)
Download the queries below
Set up a New SSIS Project:
- Open Visual Studio 2022 and create a new Integration Services Project.
- Name the project and specify the location for the files.
Configure Connection Manager:
- In the Solution Explorer, right-click on the Connection Managers area and choose New OLE DB Connection.
- Configure the connection by specifying the SQL Server instance and database where you want to create the dimension and fact tables.
Create Dimension Tables with Execute SQL Task:
- Drag and drop an Execute SQL Task from the SSIS toolbox onto the Control Flow canvas.
- Double-click on the Execute SQL Task and configure the Connection (use the Connection Manager created earlier).
- In the SQLStatement section, write the SQL script for creating your dimension tables (e.g.,
CREATE TABLE DimCustomer (...)
). - Click OK to save the task.
Create Fact Tables with Execute SQL Task:
- Repeat the process by dragging another Execute SQL Task for creating the fact tables.
- In the SQL Statement section, write the SQL script for creating your fact tables (e.g.,
CREATE TABLE FactSales (...)
).
Link Tasks for Sequential Execution:
- Link the tasks by dragging the arrow from the first Execute SQL Task (dimension creation) to the second task (fact table creation). This ensures that the dimension tables are created before the fact tables.
Execute the Package:
- Once the tasks are configured, right-click the project and choose Execute Package to run the process and create the dimension and fact tables in your SQL Server database.
Verify Tables in SQL Server:
- After execution, open SQL Server Management Studio (SSMS) and verify that the dimension and fact tables have been created successfully.
7.ETL Data from Stage Table to Dimensions (SSIS)
the process of ETL (Extract, Transform, Load) from a staging table to dimension tables using SQL Server Integration Services (SSIS). This tutorial is essential for anyone involved in data warehousing or managing data transformations in Microsoft SQL Server.
8.Updating Stage table keys
Before we can ETL data from the stage table to the fact tables (business process), you need to update the stage table with keys generated using the system auto-increment feature in MS SQL Server 2019 or 2022. After running an update query for each dimension and updating each stage table’s foreign key, you can proceed with the ETL process to transfer data from the stage tables to the fact tables.
9.Altering Fact tables & adding Foreign Keys Constraints
In Kimball’s dimensional modeling techniques, a Degenerate Dimension refers to a dimension that does not have its dedicated dimension table but is instead stored directly in the fact table. This occurs when a dimension is represented by an attribute, such as an identifier (e.g., invoice number or transaction number), which doesn’t need to have descriptive attributes stored separately in a dimension table.
Key points about degenerate dimensions:
- Stored in the fact table: The degenerate dimension exists as a column in the fact table, typically a unique identifier or code, like an order number.
- No descriptive attributes: Unlike typical dimension tables (e.g., Customer or Product), degenerate dimensions don’t have additional descriptive fields or attributes.
- Useful for reporting: They are mainly used for tracking transactions or for reporting purposes, often when you need to slice the data by a specific transaction or identifier.
In SQL, when you define a foreign key constraint using the ALTER TABLE
statement, the name of the constraint must be unique within the database. This means that you cannot reuse the same constraint name for different tables. If you attempt to use the same constraint name (like FK_CustomerKey
) across multiple tables, the database will return an error because constraint names are unique identifiers in the schema.
The constraint name FK_CustomerKey
is being used to create a foreign key relationship between the CustomerKey
in the factSales
table and the CustomerKey
in the DimCustomer
table. However, if you try to use the same constraint name in other tables (e.g., another fact table or another table that also references DimCustomer
), you will encounter an error because constraint names must be unique within the database.
10.Creating a Star Diagram
The constraint name FK_CustomerKey
is being used to create a foreign key relationship between the CustomerKey
in the factSales
table and the CustomerKey
in the DimCustomer
table. However, if you try to use the same constraint name in other tables (e.g., another fact table or another table that also references DimCustomer
), you will encounter an error because constraint names must be unique within the database.
Although we didn’t use subcategories in our dimensions and our fact tables don’t have relationships yet, a more robust design would include dimension subcategories and connections between fact tables. In future videos, we’ll introduce factless fact tables and transition to a snowflake schema, which requires linking dimensions to other dimensions for a more detailed structure.
Conclusion
Thank you for taking the Introduction to SSIS course. In the future, we will offer a more in-depth course that will cover a lot more. To learn more about SSIS, please see the book listed below.
Created By Miraz Nabi & Asif Nabi
Lecturus is a platform that offers training to individuals interested in developing or enhancing their computer skills, as well as a career change or advancement.
Get In Touch
147 Prince St, Brooklyn, NY 11201
- Email: lecturus@outlook.com
- Phone: 929-280-7710
- Hours: Mon-Fri 9 AM - 5 PM
Lecturus is a platform that offers training to individuals interested in developing or enhancing their computer skills, as well as a career change or advancement.
Get In Touch
147 Prince St, Brooklyn, NY 11201
- Email: lecturus@outlook.com
- Phone: 929-280-7710
- Hours: Mon-Fri 9 AM - 5 PM