Friday, August 14, 2009

SSIS != ETL

I've started to use Microsoft Sql Server Integration Services @ work. On first impression, like many ms products: its purty.

The second impression is less favorable. An ETL tool has two by-definition requirements: The E-L, pulling data from some source(s) and pushing it to some target(s), and the T, meaningful transformations of the data.

SSIS does a service-able job at the E-L. Which is, by itself, no win. DTS, once upon a time also did, and SSIS leverages already existing Microsoft Technologies to do most of the work.

As far as the T goes, SSIS is, at best adequate. Relatively basic and common tasks are difficult even, if, more or less do-able.

The over-arching issue for SSIS is there is little reason to use it. As there are plenty of tools out there providing data transfer between heterogeneous sources, an ETL tool has to provide a reason to embed transformation logic in its platform instead of natively in database procedures, which, for me, SSIS, so far, fails to do:
  • A firm seeking database vendor independence wouldn't choose SQL Server Integration Services.
  • An ETL developer requiring drag and drop GUIs (and so, unable to code procedures) likely lacks the ability to handle reasonable complicated transformation logic.
  • SSIS does not, yet, provide useful few-click implementations of common ETL tasks. For example, while it has a SCD transformation, I've found it does not always behave as one, or at least I, would like. Or, more egregiously: the nightmare that is the pivot transformation.
  • Finally, the most frustrating to me is that, at the very least, I expect an ETL tool to give me robust logging and auditing reasonably free. If I write a stored procedure, I have to manually intermingle code capturing and loggic basic metadata (e.g.: How many rows were retrieved from a source, how many were inserted into the various destinations) with my transformation logic. An ETL tool can capture and log that behind the scenes. What makes SSIS' failure in this regard completely fustrating is that MicroSoft understands, exactly the sort of logging a real world project would desire and how clunky and intrusive it is to implement in SSIS.
I have not played yet with huge data sets, it is possible that SSIS offers some advantages over t-sql for those. The general defense of SSIS seems to go: It is a relatively new platform and all its limitations will be addressed as it matures.

As it exists, I suspect the best way to make it useful is to build some template packages, an internal (xml) package definition configuration file and dotnet code which programmatically builds packages, which entails an awfully high start up cost. In my current implementation I am limiting my use of SSIS, mostly, to its E-L ability, and, for the moment, as a stored procedure runner.

In the end, of course, it is unfair to hold to basically free SSIS to the bar set by far more expensive alternative tools.

No comments:

Post a Comment