Wednesday, July 3, 2013

Build a local environment for Salesforce data analysis with free tools

Salesforce.com is actually great for data analysis.  Its reports and dashboards are fantastic tools for quick analysis over the relational data in objects.  In fact in terms of integrating relational datastore and common BI needs, there isn't really anything better out there.  However the level of integration and ease of use has its downside: sometimes you just can't optimize for two different goals.  Compromises inevitably get introduced, as many long time SFDC users have realized.  There's no arbitrary join on the relational side.  There's volume and performance limits on the analysis side.  And there are other things related to the multi-tenant model different people love to complain about.  You just can't help but wonder, wouldn't it be nice if I can have an environment to manipulate and view my data with more freedom?  Well, you already can, if you have a nice ETL tool with a capable data warehouse.

What if I can't affort Cast Iron and Netezza some big blue guy really loves to sell to me?  What if I don't even have enough money to play with the nice SQL Server/SSIS guys?  

Well, where money falls short is a perfect point to start with some creativity.  I recently wired together Jitterbit Data Loader with MS SQL Server Express 2012, which provided a decent relational store for playing with Salesforce data.  Both tools are free.  SQL Express provides a 10 GB per database storage that should be plenty for your big objects - if you have even bigger ones, I just don't know why you can't afford expensive enterprise tools.  :-)  Jitterbit Data Loader provides direct transport from Salesforce to SQL Express.  The process of putting them together is fairly straightforward.  I only stumbled on a few points (info here assuming you're familiar with both tools):
  • To make SQL Express remotely available (if you need to run the two tools on different computers), use a fixed port (1433) for it's TCP/IP service instead of the default dynamic port.  Also the config screen is a little confusing (in 2012 version at least) - make sure to change the port setting for "All IP", otherwise the dynamic setting in All IP will overwrite whatever the change you have in for specific IP addresses (v4 or v6).
  • Use ODBC for the database connect in Jitterbit.  JDBC (specifically jTds) just plain doesn't work.  I've tried everything I know - use SQL Server authentication instead of Windows, correctly separate the instance name, no db name - eventually I just hand built the whole connection string but nothing worked.  ODBC, on the other hand, only needs one small tweak: use the instance=sqlexpress paramenter (\sqlexpress notation doesn't work).
  • Make sure the target tables in SQL Express have enough length for the fields (equal or bigger than defined in SFDC).  Otherwise the operation in Jitterbit ends with nativeError=8152, basically Msg 8152 in T-SQL, meaning "string or binary data would be truncated".
I think we can safely assume you can swap in another SQL database, for instance, MySQL, as long as you have the proper ODBC driver in place.  Who knows, Oracle XE can even work (it only has 4 GB per db though).

No comments:

Post a Comment