How to -- Underdtand Integration Services Engine #000
Data Flow and Control Flow
Data Flow: the transformation work together to process and manage data. All tansformation are doing work at the same time.(coordinated streaming)
Control Flow: require completion before the susequent tasks are handled. execute both serially and in parallel
Note: Integration services allows the maximum number of parallel tasks that execute to be set on the package-by-package basis. the MaxConcurrentExecutables, the default setting is -1, indicating to Integartion Services to add 2 to the number of processors and use that value for the number of tasks to execute in parallel.
Control Flow |
Data Flow |
Task(one component only) is the smallest unit of work |
Transformation(one component or a group of some components) is the smallest unit of work |
Require completion before subsequent tasks are handled |
Transformations work together to process and manage data |
Workflow orchestration |
Data correlation and transformation |
Process-oriented |
Information-oriented |
Serial or parallel tasks execution |
Streaming in nature |
Synchronous processing |
Coordinated processing |
|
Source and destination |
Memory Buffer Architecture
Types of Transfermation
- Blocking nature -- streaming, blocking and semi-blocking
Non-blocking : streaming/row based,
Semi-blocking : Buffer
Blocking - Communication mechanism -- synchronous and asynchronous. How transformations that are connected to one another by path communicate with one another
Asynchronous: A transformation output is asynchronous if the buffers used in the input are different from the buffer used in the output(the output column and input column in the advance tab have different lineageid)
Synchronous: A synchronous transformation is one where the buffers are immediately handed off to the next downstream transformation at the transformation logic
A definitive way to identity synchronous versus asynchronous is to look at the SynchronousInputID,property of the column output properties
If the value is 0 or none, asynchronous. Else, synchronous(tied to the input).
Note: Streaming: It is rarely the case that the transformation will slip behind the rate of the data being fed to it.
Row Based: one by one, in most cases there transformations may not be able to keep up with the rate at which the data is fed to them, and the buffers are held up until each row is processed.
Advanced data flow excuteion concepts
Source and Destination Adapters
-
The source will have the same list of input columns and output columns.
Asynchronous -
Destination, their buffers are de-allocated and data is loaded into the destinations.
SynchronousExecution tree
Execution tree
An execution tree is a section of data flow starting from an asynchronous output and terminating at inputs on transforms that have no synchronous outputs. Some examples might help to clarify the concept.
Source -> Derived Column -> Data Conversion -> Sort -> Destination
Ignoring error outputs, the preceding data flow has two execution trees. Sources are always asynchronous because there is no input for the output. The Derived Column has synchronous outputs as well as the Data Conversion transform. However the Sort has asynchronous outputs and, so, the first execution tree starts at the Source output and ends at the Sort input. The second execution tree starts at the Sort output and ends at the Destination.
Monitoring Data Flow Exectuion
Pipeline logging events are available in the logging features of IS. To capture the event, create a new log entry through the logging designer window under SSIS menu logging optioin.
References: