How to -- Underdtand Integration Services Engine #000

Data Flow and Control Flow

Data Flow: the transformation work together to process and manage data. All tansformation are doing work at the same time.(coordinated streaming)

Control Flow:  require completion before the susequent tasks are handled. execute both serially and in parallel

Note: Integration services allows the maximum number of parallel tasks that execute to be set on the package-by-package basis. the MaxConcurrentExecutables, the default setting is -1, indicating to Integartion Services to add 2 to the number of processors and use that value for the number of tasks to execute in parallel.

 

Control Flow

Data Flow

Task(one component only) is the smallest unit of work

Transformation(one component or a group of some components) is the smallest unit of work

Require completion before subsequent tasks are handled

Transformations work together to process and manage data

Workflow orchestration

Data correlation and transformation

Process-oriented

Information-oriented

Serial or parallel tasks execution

Streaming in nature

Synchronous processing

Coordinated processing

Source and destination

Memory Buffer Architecture

Types of Transfermation

  • Blocking nature -- streaming, blocking and semi-blocking
          Non-blocking : streaming/row based,
          Semi-blocking : Buffer
          Blocking   
  • Communication mechanism -- synchronous and asynchronous. How transformations that are connected to one another by path communicate with one another
          Asynchronous: A transformation output is asynchronous if the buffers used in the input are different from the buffer used  in the output(the output column and input column in the advance tab have different lineageid)
          Synchronous: A synchronous transformation is one where the buffers are immediately handed off to the next downstream transformation at the transformation logic
          A definitive way to identity  synchronous versus asynchronous is to look at the SynchronousInputID,property of the column output properties
    If the value is 0 or none, asynchronous. Else, synchronous(tied to the input).

Note: Streaming: It is rarely the case that the transformation will slip behind the rate of the data being fed to it.
Row Based:  one by one, in most cases there transformations may not be able to keep up with the rate at which the data is fed to them, and the buffers are held up until each row is processed.

Advanced data flow excuteion concepts

Source and Destination Adapters

  • The source will have the same list of input columns and output columns.
    Asynchronous

  • Destination, their buffers are de-allocated and data is loaded into the destinations.
    Synchronous
    Execution tree

Execution tree  

An execution tree is a section of data flow starting from an asynchronous output and terminating at inputs on transforms that have no synchronous outputs. Some examples might help to clarify the concept.

Source -> Derived Column -> Data Conversion -> Sort -> Destination

Ignoring error outputs, the preceding data flow has two execution trees. Sources are always asynchronous because there is no input for the output. The Derived Column has synchronous outputs as well as the Data Conversion transform. However the Sort has asynchronous outputs and, so, the first execution tree starts at the Source output and ends at the Sort input. The second execution tree starts at the Sort output and ends at the Destination.   

Monitoring Data Flow Exectuion

Pipeline logging events are available in the logging features of IS. To capture the event, create a new log entry through the logging designer window under SSIS menu logging optioin.

References:

http://technet.microsoft.com/en-us/library/cc966529.aspx

posted @ 2009-06-22 17:55  refeiner  阅读(268)  评论(0编辑  收藏  举报