代码改变世界

SAP BI:SSIS实现对SAP数据的抽取

2012-11-22 13:39  VincentMK  阅读(3329)  评论(0编辑  收藏  举报

Microsoft Connector 1.0 for SAP BI is delivered in the Microsoft SQL Server 2008 Feature Pack. It enables data extraction from and to SAP NetWeaver BI in both Full and Delta modes via standard interfaces, within the Microsoft SQL Server Integration Services environment. The SAP datasets supported by the connector include SAP BI InfoProviders like InfoCubes, Data Store Objects (DSO), and InfoObjects.

The Microsoft Connector 1.0 for SAP BI has three main components:

  • SAP BI Source, to extract data from SAP BI
  • SAP BI Destination, to load data into SAP BI
  • SAP BI Connection Manager, to manage the RFC connection between the Integration Services package and SAP BI

Microsoft Connector 1.0 for SAP BI is an add-in for SQL Server Integration Services. It provides an efficient and streamlined solution for integrating non-SAP data sources with SAP BI. It also enables the construction of data warehouse solutions for SAP data in SQL Server 2008, where SAP BI is exposed as a data source of SQL Server.

 

Microsoft Connector 1.0 for SAP BI has the following requirements:

  • Windows Server 2003 and later, Windows Vista, or Microsoft Windows XP Professional with Service Pack 2.
  • SQL Server 2008 Integration Services. Microsoft Connector 1.0 for SAP BI needs to be installed on the same computer where Integration Services is installed.
  • Windows Installer 4.5 and later.
  • Extracting data using Microsoft Connector 1.0 for SAP BI from SAP BI system requires the SAP Open Hub license. For more information about SAP licensing, consult your SAP representative.
  • On the SAP BI system, SAP_BW component support package level 16 (as part of SAP NetWeaver Support Pack Stack 14) is required. SAP_BW component support package level 17 or higher is strongly recommended.
  • To use Microsoft Connector 1.0 for SAP BI in 32-bit (64-bit) mode on any 32-bit (64-bit) operating system, The 32-bit (64-bit) version of librfc32.dll needs to copied to the following location: %windir%\system32.
  • To use Microsoft Connector 1.0 for SAP BI in 32-bit mode on a 64-bit operating system, the 32-bit librfc32.dll needs to be copied to the following location: %windir%\SysWow64.

Notes

  • Microsoft Connector 1.0 for SAP BI can only be used with SQL Server 2008 Integration Services. However, you can load data from or extract data to SQL Server 2008, SQL Server 2005, or SQL Server 2000 databases.
  • Librfc32.dll is a component owned by SAP. Microsoft does not support this SAP component and assumes no liability for its use.
  • Microsoft Connector 1.0 for SAP BI does not support SAP BW 3.5 and earlier versions.
  • Extracting data from an SAP BI system by using Microsoft Connector 1.0 for SAP BI only supports Open Hub Destinations. It does not support InfoSpokes, because InfoSpokes are obsolete in SAP NetWeaver BI.

With Microsoft Connector 1.0 for SAP BI, it is now possible use components of the SQL Server platform to move data in and out of SAP BI.

 

Figure 1: Overview of the solution architecture

 

This scenario uses an Integration Services package that leverages the “SAP BI Source” component. It treats SAP BI as a data source for a SQL Server database. Behind the scenes, SAP’s Open Hub Services interface is used to fetch data from SAP BI InfoProviders.

 

To configure SAP BI to extract data into a non-SAP destination such as SQL Server, you need to follow these steps:

  1. Set up the RFC Destination.
  2. Configure and create the Open Hub Destination.
  3. Create the Data Transfer Process (DTP) and transformation.
  4. Define parallel processing.
  5. Define the size of the data package.
  6. Configure the process chain.
 

In transaction code SM59 on SAP BI, create a new HTTP connection with type T (TCP/IP Connection), as shown in Figure 2. Under Activation Type, select “Registered Server Program”. Then, fill in an appropriate Program ID, which can be any descriptive short text. The RFC Destination and Program ID will be used later to set up the connection manager in Integration Services.

Figure 2: Configuring the RFC Destination in SAP BI

 

There are two Open Hub implementation options in SAP BI: the legacy InfoSpoke, and the new Open Hub Destination via Data Transfer Process (DTP). The InfoSpoke is marked as obsolete in SAP NetWeaver BI. Therefore the Microsoft Connector 1.0 for SAP BI officially supports only the Open Hub Destination.

 

In Admin Workbench on SAP BI (transaction code RSA1), create a new Open Hub Destination with Destination Type “Third-Party Tool”, and specify the previously created RFC Destination name (Figure 3). Save and activate the new destination.

Figure 3: Creating the Open Hub Destination in SAP BI

 

Create a Data Transfer Process under the Open Hub Destination. Specify Full or Delta for Extraction Mode. Activate the DTP. Check and activate the Transformation.

Figure 4: Creating the Data Transfer Process in SAP BI

 

By default, SAP BI sets the number of parallel DTP processes as a value greater than 1 for performance reasons. This is configurable through SAP transaction code RSBATCH (SAP BI Background Management).

Figure 5: Configuring Parallel Processing in SAP BI

We want to keep the number of parallel processes to a reasonable value for the overall DTP process type DTP_LOAD, but this parallelism can lead to a timeout error during the Open Hub DTP extraction through Microsoft Connector for SAP BI. To get around this issue, the number of processes for the Open Hub DTP should be set to 1 by following the steps below:

1. In the Open Hub DTP screen, select “Goto” from the menu, then “Setting for Batch Manager”:

Figure 6: Opening Batch Manager in SAP BI to configure the number of parallel processes

2. Change the Number of Processes to “1”.

Figure 7: Configuring the number of parallel processes in SAP BI

3. Save the changed settings.

 

The default setting for the DTP data package is 50,000. Depending on the actual hardware infrastructure, adjusting the package size may improve the extract, transform, and load (ETL) performance. Note that the Microsoft SAP BI source will read the DTP package size to determine the actual data packet size in the Integration Services package. It is highly recommended to reach an agreement on the size that balances the concerns of the SAP Basis team and the SQL Server DBA. In reality, a value between 50,000 and 200,000 should satisfy most needs.

Figure 8: Defining the data package size in SAP BI

 

A process chain is required to work with the Microsoft Connector (Figure 9).

Figure 9: The two nodes that are the minimum requirement for a process chain in SAP BI

The process chain must contain at least these two nodes:

  • Start node with the scheduling option “Start Using Meta Chain or API” (Figure 10)
  • Data Transfer Process node

Figure 10: Configuring scheduling options for a process chain in SAP BI

After you activate the process chain, it is ready to be called from the Integration Services package.

 

Configuring the Integration Services package in Business Intelligence Development Studio involves three main steps:

  1. Add the “SAP BI Source” as a source in the data flow.
  2. Set up the connection manager for SAP BI.
  3. Define the workflow of the package.
 

After you install the Microsoft Connector from the SQL Server 2008 Feature Pack, go to Business Intelligence Development Studio and create a new Integration Services project. The Microsoft Connector components are not available in the Toolbox until you add them manually. To add them, right click Data Flow Sources in the Toolbox, click ChooseItems, and then on the SSIS Data Flow Items tab, select the SAP BI Source check box, as shown in Figure 11.

Figure 11: Adding the SAP BI source to the Toolbox in Business Intelligence Development Studio

Now SAP BI Source is available in Data Flow Sources (Figure 12).

Figure 12: The list of Data Flow Sources in the Toolbox in Business Intelligence Development Studio after adding the SAP BI source

Setting Up the Connection Manager for SAP BI

In the Integration Services package, add a new connection and choose SAPBI (Figure 13).

Figure 13: Adding a new SAP BI connection to an Integration Services package

After the connection is created, in the SAP BI connection manager dialog box, edit the connection and fill out the system and logon information. Click Test Connection to verify successful configuration (Figure 14).

Figure 14: Configuring the SAP BI connection manager

 

Adding and Configuring the SAP BI Source

In Business Intelligence Development Studio, drag the SAP BI source to the data flow of the package (Figure 15).

Figure 15: The representation of the SAP BI source in the data flow of a package

Edit the source by choosing the appropriate SAP BI connection manager, specifying the RFC destination, and choosing the previously-created process chain (Figure 16).

Figure 16: Configuring the SAP BI source on theConnection Manager page of the SAP BI Source Editor

Note the different execution modes that are available:

  • P– Trigger Process Chain: The specified process chain is started, the extraction is made, and after ending the extraction, data is extracted in packets.
  • W– Wait for Notify: No process chain is started; instead the tool only waits until it is notified of that the extraction is complete. Someone else is responsible for starting up the extraction (for example, SAP’s own scheduler).
  • E– Extract Only: A process chain is not started, and the source does not wait for notification. Instead, the Request ID entered in the field “Request ID” is used to retrieve data that is hidden behind the respective request.

If the Integration Services package will initiate the ETL process from SAP BI, then the mode “P” should be chosen to trigger the SAP BI process chain for data movement through Open Hub. This is the most suitable option for a “pull” pattern.

The mode “W” is the best for a “push” pattern. In this mode, SAP BI schedules its own internal ETL, and then it starts the Open Hub DTP to push data to SQL Server.

The mode “E” is used when there is an error during the ETL and a particular request needs to be reprocessed. This is mostly useful during testing, or in production during a data recovery process.

Note that the Extract-Only mode will fail if there are multiple packages within one request. This failure occurs because the SAP BI system does not provide the number of packets correctly when the Read function of the Open Hub API is called. To work around this limitation and support Extract-Only mode, increase the package size in the DTP of the Open Hub Destination to a value greater than the number of rows that will be extracted. As a result, only one package is created.

Configuring the Advanced Settings

There are three main options available on the Advancedpage of the SAP BI Source Editor:

  • String conversion options
  • Timeout setting
  • Request ID reset

Figure 17: Configuring advanced options for the SAP BI source on the Advanced page of the SAP BI Source Editor

Timeout and Request ID are very important.

Timeout specifies the valid period that the Integration Services destination should wait for the SAP BI source, before the package fails due to a timeout error. If an Open Hub DTP is expected to run for a long time, as in a full initial extraction, increase the timeout to a large enough number to avoid the timeout error. However, for routine delta loads, where the duration is not so long, enter a realistic timeout value. Any value between 300 and 3600 should be acceptable under normal delta circumstances.

Request ID can be used to reset a DTP that encountered a problem. If a DTP load is stuck in Yellow status in SAP BI, the request can be reset to Green. After a request is successfully reset, it can be deleted in SAP BI in Admin Workbench Monitor. For more information about DTP request status, check the SAP system table RSBKREQUEST table on SAP BI, and look under the columns USTATE (User-Defined Processing Status for a DTP Request) and TSTATE (Technical Processing Status for a DTP Request). The overall DTP status will be successful when both USTATE and TSTATE of a DTP request indicate success (value “2”). Figure 18 shows all available values of USTATE and TSTATE.

Figure 18: The available values for the status of a DTP request in SAP BI

Adding and Configuring the Destination

After you set up the SAP BI source, define the destination in the package. An OLE DB destination is commonly used for this purpose. Based upon the metadata from the SAP BI source, the system may propose a table creation script if the target table is not available in the database. After the column mapping is done, the Integration Services package is ready to run (Figure 19).

Figure 19: A data flow for extracting from an SAP BI source to a non-SAP destination

 

Figure 20: Overview of the solution architecture

 

Sometimes non-SAP data needs to be moved into SAP BI, but it can be challenging to load some data sources into SAP BI. This challenge can be solved by using the SAP BI Destination component in Integration Services. Because Integration Services is versatile in supporting various types of data sources, like XML and flat files, it is now possible to have a unified ETL platform to move data into SAP BI. This versatility can be particularly useful in a heterogeneous environment for ad-hoc reporting or for data analysis and processing purposes. The SAP BI destination component greatly expands SAP BI’s capability in extracting data from non-SAP environments.

 

To configure SAP BI to load non-SAP data, you set up the data source and the ETL.

 

A new “External System” source system needs to be set up in SAP BI to be able to communicate with the SAP BI Destination component in Integration Services. This can be achieved in Admin Workbench (transaction code RSA1), by selecting “Source Systems” from the left panel. This selection leads to the RFC Destination setup screen.

Figure 21: Configuring a source system on the RFC Destination screen in SAP BI

 

The InfoSource and InfoPackage can either be set up within SAP BI’s Admin Workbench, or in Integration Services from within the SAP BI Destination Editor dialog box.

Figure 22: Creating SAP BI objects directly from the SAP BI Destination Editordialog box

Note that the objects created from the SAP BI Destination Editor dialog box are put under the “Unassigned node”application area in SAP workbench. If you prefer a dedicated application area, consider creating the objects in SAP BI Admin Workbench.

 

Configuring the Integration Services package in Business Intelligence Development Studio involves three main steps:

  1. Add the “SAP BI Destination” as a destination in the data flow.
  2. Set up the connection manager for SAP BI.
  3. Define the workflow of the package.
 

Figure 23: Adding the SAP BI destination to the Toolbox in Business Intelligence Development Studio

 

Create a new connection manager for SAP BI first. The details can be found in the setup steps for Application Scenario 1. For more information, see “Setting Up the Connection Manager for SAP BI” earlier in this white paper.

 

After the InfoPackage and InfoSource are available, add the SAP BI destination to the data flow of the package. Then configure the destination in the SAP BI Destination Editor dialog box.

Figure 24: Configuring the SAP BI destination on the Connection Manager page of the SAP BI Destination Editordialog box

The data flow of the package now looks like this.

Figure 25: A data flow for loading from a non-SAP source to an SAP BI destination

 

A compelling use case is to leverage Microsoft Connector 1.0 for SAP BI to move the multidimensional data in SAP BI’s InfoCubes to SQL Server Analysis Services cubes, with all the dimensional structures and content intact. The main objective is to migrate SAP BI InfoCubes to SQL Server cubes efficiently, in order to construct an Analysis Services based enterprise data warehouse. This use case demonstrates that this objective can be achieved with stability, quality, and performance, and with a relatively small amount of effort.

 

When SAP BI Open Hub processes InfoCube data, it flattens the multidimensional structure into a relational structure. So the design idea is to mirror the same flat structure first in a staging table, then reconstruct the dimensions in the Analysis Services cube.

Figure 26: Overview of the solution architecture

 

The standard SAP InfoCube 0FIAP_C03 is used. Its dimensions and fact table metadata are shown in Figure 27:

sapbi_metadata.jpg

Figure 27: Metadata for the dimensions and fact tables in standard SAP BI InfoCube 0FIAP_C03

The flattened Open Hub structure is shown in Figure 28.

sapbi_openhub.jpg

Figure 28: The flattened Open Hub structure in SAP BI

The SAP BI process chain and Integration Services package are shown in Figure 29.

sapbi_3.jpg

Figure 29: The configuration of the process chain in SAP BI, and of the data flow in the SQL Server Integration Services package

The column mappings in the Integration Services package are shown in Figure 30.

Figure 30: The column mappings between the SAP BI source and the destination on the Mappings page of the OLE DB Destination Editor dialog box

The matching structure of the data in SQL Server Analysis Services is shown in Figure 31.

sapbi_4.jpg

Figure 31: The structure of the SQL Server Analysis Services cube based on the data extracted from SAP BI to SQL Server

After the Analysis Services cube is set up, it needs be deployed. Then, each dimension and the cube itself can be processed to dispatch data from the staging table to each dimension respectively.

 

An easy way to validate the data quality after the cube migration is to run and compare reports on SAP BI and Analysis Services.

Here is the result of an SAP BI BEx query against the SAP BI InfoCube.

Figure 32: Viewing the data in the InfoCube in SAP BI

Here is a Microsoft Excel® PivotTable® report against the Analysis Services cube.

Figure 33: Viewing the data from the SQL Server Analysis Services cube in an Excel PivotTable report

Here is a SQL Server Reporting Services report against the Analysis Services cube.

Figure 34: Viewing the data from the Analysis Services cube in a Reporting Services report

The query results on SAP BI and in the Analysis Services cube match precisely.

 

This paper has described the functionality of the Microsoft Connector 1.0 for SAP BI, and provided detailed step-by-step instructions on how to use the connector in SQL Server Integration Services. A realistic use case is presented with the design highlights and rationale. Overall, the connector bridges the gap to support building an enterprise data warehouse solution centered on Microsoft SQL Server 2008 in a heterogeneous environment with heavy presence of SAP BI. It offers great flexibility and efficiency for extracting non-SAP data into SAP BI, and for extracting SAP BI data into a SQL Server data warehouse.

By utilizing the Microsoft Connector 1.0 for SAP BI effectively, it is now possible to construct a streamlined end-to-end data warehouse and business intelligence solution based upon Microsoft technologies for enterprises running SAP, with lower TCO, better design, and more flexibility.