Extraction Methods and Tools

The following diagram illustrates how the various extraction/replication methods might fall and interact within a DSP solution:

SAP RFC & BOA RFC

RFC allows for the extraction of SAP pooled and clustered tables. The RFC extraction method calls an RFC on the SAP application server to run a SQL command. Data is returned in a Binary Large Object (BLOB) in blocks (e.g., 10,000 records) that are then parsed by using SAP RFC, BOA RFC or BODS RFC. The RFC is called again to extract the next block of records (in this example, the next 10,000 records). The process repeats until the entire table has been extracted.

Assemble

Assemble is a tool within the DSP that creates and executes packages to transfer data between systems. The tool uses an ODBC connection with a non individual-specific account with read-only access. Once a connection is established to a source, packages to refresh data are relatively easy to create and process in the DSP.  Assemble runs multiple threads and is the BackOffice Associates® preferred extraction method for all but the largest tables.  Additional configuration and tuning can be required for the largest tables that are over several million records (refer to the DSP Collect Delta Configuration image).

SSIS (SQL Server Integration Services)

A component of MS SQL Server, SSIS replaces Data Transformation Services (DTS). SSIS is typically faster than Assemble because a single program is both reading the source and loading the target. Depending on variables like hardware, connections and table width, SISS can achieve extraction speeds of several million records per minute. However, using SSIS to extract too many tables simultaneously can require additional CPUs and memory on the application server. Consequently, SSIS should be used judiciously to extract tables that cannot be efficiently extracted with other methods. The same delta configuration that can be implemented for Assemble can also be for SSIS when needed (refer to the DSP Collect Delta Configuration diagram).

DBMoto®

DBMoto® functionality (which is housed in Collect) that uses an ODBC connection to download data in any of three different ways: Refresh, Change Data Capture (or Mirroring) and Synchronization.

In Refresh mode, an entire table is extracted. A one-time full-table refresh is required for any tables that will be set up for change data capture or synchronization. Change data capture mirrors data changes made in a source system to a target system. Synchronization pushes data changes both directions between two systems. 

When performing change data capture or synchronization, DBMoto® leverages native change logs within the source to identify the subset of records that have been changed and need to be updated in the target system. If the source system does not contain native or accessible change logs, then DBMoto uses triggers on the source system to push changes.

Choose the Right Extraction Method

To aid in selecting the right extraction methods for the project, refer to the following table:

Collect Delta Configuration

For large tables with requisite date fields, a delta extraction process can be built in within DSP, as conceptualized in the following diagram: