gcinsight_ssis1_thumbWhile Geocaching is a great set of data to use for analytics, sourcing it can be problematic.  The only source of Geocaching data is the geocaching.com website, which places a number of hard limits on the amount of data which can be extracted.  To overcome these limitations and get the data into a format in which I could perform analysis, I devised a procedure for acquiring the data, de-duplicating it, and loading it into SQL Server.  While the former portion of the process used geocaching-specific tools to access the API and download the raw data, the latter half of the process utilizes SSIS to perform an ETL process which exports the data from an application-specific SQLite database to Excel then imports the new and changed items in the spreadsheet into the SQL database.

 

 

Screenshot of Geocaching Import SSIS package

Screenshot of Geocaching Import SSIS package

 

Screenshot of Geocaching Import SSIS package (Read Input Data Flow)

Screenshot of Geocaching Import SSIS package (Read Input Data Flow)

 

Screenshot of Geocaching Import SSIS package (Load Data Data Flow)

Screenshot of Geocaching Import SSIS package (Load Data Data Flow)