Wednesday, October 21, 2009

SAP BW Open Hub - Introduction

In general, the SAP BW Open Hub Service is a mechanism that reads data from an SAP BW object, namely an InfoCube, DataStore Object or InfoObject, and places the contents into a staging area for consumption. It provides APIs for all the required steps: starting the Open Hub Service job, reading the data, and telling it when the data has been completely read.
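Concretely, these APIs are exposed as RFC-enabled function modules (RSB_API_OHS_DEST_GETLIST, RSB_API_OHS_DEST_READ_DATA, RSB_API_OHS_REQUEST_SETSTATUS and friends). As a minimal sketch, here is how one of them could be called through the open-source pyrfc connector; the host, credentials and result handling are placeholders, and the exact interface should be verified in transaction SE37.

# Sketch only: listing Open Hub Destinations through the RFC-enabled
# third-party API. Host and credentials are placeholders; check the
# function module's interface in SE37 before relying on it.
from pyrfc import Connection

conn = Connection(ashost="bwhost", sysnr="00", client="100",
                  user="RFC_USER", passwd="secret")

# RSB_API_OHS_DEST_GETLIST returns the Open Hub Destinations
# visible to this user.
destinations = conn.call("RSB_API_OHS_DEST_GETLIST")
print(destinations)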
So the interaction between DataServices and the Open Hub Service consists of the following steps:
1. A DataServices Job is started that includes many processing steps. One of these steps is a DataFlow with an Open Hub Destination as source.
2. Before the DataFlow can start reading the data, a DataTransferProcess (DTP) has to run to fill the Destination with the data. But a DTP cannot be started directly from DataServices, only a ProcessChain containing one. So a SAP BW ProcessChain is associated with each Open Hub Destination reader.
3. The ProcessChain can potentially include many steps; at one point it will call a DTP that reads the source data and writes it into the Open Hub Destination.
4. Once all the data has been copied into the Destination, the DTP pauses and sends a notification to the DataServices RFCServer - which therefore has to be configured first.
5. The RFCServer writes the notification information into a repository table (AL_SAP_JOB), and with that the DataServices DataFlow can finally start reading the data from the Open Hub Destination.
6. On completion of the DataFlow, we have to tell the DTP that this Open Hub Destination was successfully read.
7. So a BAPI call is made to set the status of the DTP to green (a sketch of this call follows the list), and with that...
8. ...the ProcessChain can continue until its logical end.
9. In parallel, the next DataServices task is started.
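To make steps 2 and 7 a bit more tangible, here is a hedged sketch of the same control flow at the RFC level: the ProcessChain is triggered via the RFC-enabled module RSPC_API_CHAIN_START, and after the read the Open Hub request is confirmed via RSB_API_OHS_REQUEST_SETSTATUS. The chain name, the request id and the SETSTATUS parameter names are assumptions for illustration, not a verified interface.

# Sketch of steps 2 and 7 at the RFC level. ZOH_CHAIN and the request
# id are placeholders; the parameter names of
# RSB_API_OHS_REQUEST_SETSTATUS are assumptions - verify in SE37.
from pyrfc import Connection

conn = Connection(ashost="bwhost", sysnr="00", client="100",
                  user="RFC_USER", passwd="secret")

# Step 2: trigger the ProcessChain that wraps the DTP.
result = conn.call("RSPC_API_CHAIN_START", I_CHAIN="ZOH_CHAIN")
print(result)  # contains the log id of the started chain run

# ... the DTP fills the Destination, the RFCServer gets notified,
# and the DataFlow reads the Destination table ...

# Step 7: confirm the read so the paused DTP turns green and the
# ProcessChain can continue (steps 7 and 8).
conn.call("RSB_API_OHS_REQUEST_SETSTATUS",
          I_REQUESTID="12345",  # request id taken from the notification
          I_STATUS="G")         # 'G' = green; value is an assumption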



The above architecture has a couple of consequences one needs to be aware of:
Can a ProcessChain contain multiple steps? Yes, but just one DTP meant for DataServices. Otherwise each Open Hub Destination reader would trigger the same ProcessChain, and it would hence execute twice.
Can two different Open Hub Destinations be used in parallel? Yes, no problem.
Can the same Open Hub Destination be used in different DataFlows? Yes, but not in parallel. Each DTP truncates the Destination table and loads its own data into it. So it is neither good to truncate the data while another DTP is still in the process of copying it, nor should the truncation happen while the Destination still holds data that DataServices has not yet finished reading. In other words, a DTP is done only once its status has been set to green - by DataServices.
What if multiple developers are working on the same DataFlow? The same argument as with other tables applies: should a target table be loaded by two developers at the same time? No. However, the actual Open Hub Destination table is editable, so the same DataFlow can at least be tested in parallel - just to mention that.
Can a DTP be started manually via BW, with DataServices just reading that data, e.g. for testing?
In step 5 the RFCServer writes into the AL_SAP_JOB table of the repository (see the sketch below for a way to look at it). Which repository, in case there are multiple?
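For debugging step 5, it can help to look directly at what the RFCServer wrote into the repository. A minimal sketch, assuming the repository sits in a database reachable via ODBC; the DSN and credentials are placeholders, and since the column layout of AL_SAP_JOB differs between versions the query stays unselective.

# Sketch: inspect the notifications the RFCServer wrote into the
# repository table AL_SAP_JOB (step 5). DSN and credentials are
# placeholders; the table's column layout varies by version, hence
# the unselective query.
import pyodbc

repo = pyodbc.connect("DSN=DS_REPO;UID=ds_user;PWD=secret")
cursor = repo.cursor()
cursor.execute("SELECT * FROM AL_SAP_JOB")
for row in cursor.fetchall():
    print(row)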
