Geocaching Analytics – Part 3: GCInsight Data Preparation

At this point, we’ve downloaded our data from the website and we’re sitting on a local treasure trove…but it’s locked inside a proprietary program (GSAK), which limits our ability to make use of it.  So, it’s time to work on extracting the data and getting it into a usable format.

The GSAK application will be our primary source for acquiring Geocaching data as well as future updates to it.  That said, the data is not particularly easy to work with in its GSAK format.  Fortunately, GSAK stores its geocache data in SQLite, a compact database engine which runs on demand as a command-line application rather than as a service.

In order to fully utilize the data, we’ll be building a process to export the GSAK data from SQLite into a SQL Server database.  Once the data is exported to SQL Server, we can perform any number of transformations and summarizations.

Viewing and Exporting the GSAK Geocache Data
The first step in exporting the data is understanding how it is stored.  SQLite is a free download available from the SQLite download page.  SQLite is a command-line program, which makes exploring a database cumbersome.  Fortunately, another free program named SQLiteMan, a graphical SQLite interface, is available from its download page.

Using SQLiteMan, the GSAK database can be opened directly from its default location, C:\Users\<username>\AppData\Roaming\gsak\data\<database name>\sqlite.db3

Upon accessing the GSAK geocache database, you will notice a number of tables.  Fortunately, there also exist a couple of views which will make accessing the geocache data easier.
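Before designing the export query, it can help to list exactly which tables and views the database exposes. As a quick sketch (not part of the GSAK tooling), Python’s built-in sqlite3 module can enumerate them:

```python
# Sketch: enumerate the tables and views in the GSAK SQLite database
# using Python's built-in sqlite3 module.
import sqlite3

def list_objects(db_path):
    """Return (type, name) pairs for every table and view in the database."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT type, name FROM sqlite_master "
            "WHERE type IN ('table', 'view') ORDER BY type, name"
        ).fetchall()
    finally:
        conn.close()

# Example (fill in the placeholders for your installation):
# for obj_type, name in list_objects(
#         r"C:\Users\<username>\AppData\Roaming\gsak\data\<database name>\sqlite.db3"):
#     print(f"{obj_type}: {name}")
```

The cachesall view should appear in the output; that is the view we will be exporting from.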

The most important task here is to design a query which will grab all of the desired columns from the cachesall view.  Once we’ve created the SQL statement, we’re going to save it into a text file along with a couple of other commands.  This text file contains the instructions SQLite uses to export the data from its database.  We’ll call this file “command.txt”.

.mode csv
.headers on
.output 'C:\Program Files (x86)\gsak\export_script\geocaches.csv'
SELECT rowid, CacheID, Code, Name, PlacedBy, Archived, CacheType, Container,
  County, Country, Difficulty, Found, FoundByMeDate, HasTravelBug, LastFoundDate,
  LastGPXDate, LastLog, Latitude, LongHtm, Longitude, NumberOfLogs, OwnerID,
  OwnerName, PlacedDate, ShortHtm, State, TempDisabled, Terrain, Created, Status,
  IsPremium, Guid, FavPoints, LongDescription, ShortDescription, Url, Hints, TravelBugs
FROM cachesall;

The command.txt file, which exports the above SQL query to a flat CSV file in the specified directory.

Now, we’ll create a simple batch file named runexport.bat which calls SQLite with the above command file.  This will allow us to create our CSV export of geocache data by running a single file — either manually or on a schedule:

@echo off
echo Performing export…
"C:\Program Files (x86)\gsak\export_script\sqlite3.exe" "C:\Users\jpries.HQ\AppData\Roaming\gsak\data\GA Geocaches 2013+\sqlite.db3" < "C:\Program Files (x86)\gsak\export_script\command.txt"
echo Done!

The batch file, runexport.bat, used to call the command.txt file and perform the SQLite data export.

Running the above batch file as is will create a file named “C:\Program Files (x86)\gsak\export_script\geocaches.csv”.  We could perform a one-time import of this data into SQL Server using the SQL Server Import and Export Wizard, or, if we planned to perform this import on a regular basis, we could create an SSIS package to automate the process.
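As an aside, if a Python interpreter is available, the same export can be scripted without sqlite3.exe and command.txt at all. This is a minimal sketch of an alternative approach, not the method used in this article; the paths in the example comment are the same defaults used above:

```python
# Sketch: export a SQLite view to CSV with Python's built-in sqlite3
# and csv modules, as an alternative to sqlite3.exe plus command.txt.
import csv
import sqlite3

def export_view_to_csv(db_path, view_name, csv_path):
    """Dump every row of a table or view to a CSV file with a header row."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(f"SELECT * FROM {view_name}")
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            # First row: column names taken from the cursor metadata.
            writer.writerow([col[0] for col in cursor.description])
            writer.writerows(cursor)
    finally:
        conn.close()

# Example (using the default paths from this article):
# export_view_to_csv(
#     r"C:\Users\<username>\AppData\Roaming\gsak\data\<database name>\sqlite.db3",
#     "cachesall",
#     r"C:\Program Files (x86)\gsak\export_script\geocaches.csv",
# )
```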

Taking a peek inside the exported geocache data CSV file with Excel

For the first run and proof-of-concept, we’ll use the SQL Server Import and Export Wizard for a one-time import.  In a later part, we’ll use SSIS to automate the import of cache data to keep the database up to date over time.

Importing the GSAK Geocache Data to SQL Server
We’ll be using the SQL Server Import and Export Wizard to perform a manual one-time import of our geocache data into a pre-existing SQL database.

We first open SQL Server Management Studio and connect to our SQL Server instance.  Next, we right click on our database (or first create a new empty one if needed) and select Tasks -> Import Data to launch the SQL Server Import and Export Wizard.

Once we’ve launched the wizard, we want to choose a Flat File Source and then select the CSV export file our batch file created earlier (which by default is saved in the “C:\Program Files (x86)\gsak\export_script” directory).

With our file selected, we need to make a couple of changes to the defaults on the General subpage in the Format section:

  • Format:  Delimited
  • Text Qualifier:  " (a single double-quote character)
  • Header row delimiter:  {LF}
  • Header rows to skip:  0
  • Column names in first data row:  checked

On the Columns subpage, we need to set the following settings:

  • Row delimiter:  {LF}
  • Column delimiter:  Comma {,}

After setting the above options, you should start to see coherent data in the preview section of the subpage (press the Refresh button if not, then check your settings and refresh again).  Once you have the correct data showing, you can press Next to proceed to the next page of the wizard.

Press Suggest Column Types and choose a number of rows to scan that is large enough to include the largest values.  It will take a few minutes, but you can specify a value larger than the number of rows you’ll be importing to guarantee every field gets scanned.  Even though we’ll create the table later with our own parameters, this step is necessary to ensure the text data source can import large fields without errors.

For any column whose maximum value is 8,000 characters or greater (which the Suggest Column Types function can tell you near the end of the wizard), it is necessary to change the Data Type of the source column on the Advanced subpage to Text Stream.  If you do not do this, the import will fail with an error stating that source columns are limited to 8,000 characters.
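Rather than waiting for the wizard to flag them, a quick scan of the exported CSV can report each column’s longest value up front; anything at 8,000 characters or more is a candidate for Text Stream. A minimal sketch (the path in the example comment is this article’s default export location):

```python
# Sketch: report the length of the longest value in each column of the
# exported CSV, to identify columns that need the Text Stream data type.
import csv

def max_column_lengths(csv_path):
    """Return {column name: length of its longest value} for a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        headers = next(reader)  # first row holds the column names
        longest = {name: 0 for name in headers}
        for row in reader:
            for name, value in zip(headers, row):
                longest[name] = max(longest[name], len(value))
    return longest

# Example (using the default export path from this article):
# for name, length in max_column_lengths(
#         r"C:\Program Files (x86)\gsak\export_script\geocaches.csv").items():
#     if length >= 8000:
#         print(f"{name}: {length} -> set to Text Stream")
```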

On this next page, we’re selecting the destination, which in this case is a SQL Server Native Client 11.0 database on the localhost server.

On the next page of the wizard, we’re going to choose to Edit Mappings and on the Column Mappings page, choose Edit SQL.  This will allow us to insert our own T-SQL code to create the destination table in our database, so we can ensure all of the fields are the correct data types.

The below T-SQL code can be used in the Edit Mappings -> Edit SQL section to create our new Geocache table.

CREATE TABLE [dbo].[Geocaches] (
[rowid] int,
[CacheID] int,
[Code] varchar(10),
[Name] nvarchar(250),
[PlacedBy] varchar(100),
[Archived] int,
[CacheType] varchar(1),
[Container] varchar(15),
[County] varchar(50),
[Country] varchar(50),
[Difficulty] float,
[Found] int,
[FoundByMeDate] date,
[HasTravelBug] int,
[LastFoundDate] date,
[LastGPXDate] date,
[LastLog] date,
[Latitude] varchar(20),
[LongHtm] int,
[Longitude] varchar(20),
[NumberOfLogs] int,
[OwnerID] int,
[OwnerName] varchar(100),
[PlacedDate] date,
[ShortHtm] int,
[State] varchar(50),
[TempDisabled] int,
[Terrain] float,
[Created] date,
[Status] varchar(1),
[IsPremium] int,
[Guid] varchar(50),
[FavPoints] int,
[LongDescription] nvarchar(max),
[ShortDescription] nvarchar(max),
[Url] varchar(200),
[Hints] nvarchar(max),
[TravelBugs] nvarchar(max)
);

With our custom T-SQL entered to create our table, the review screen of the wizard will give us a summary of the import and one last chance to go back and make any necessary changes.

Finally, the wizard should complete and import the entire contents of the CSV export file we created from the SQLite GSAK database into a new table in our SQL Server database.  In this case, we imported about 22,000 rows (Geocaches) — which is all of the caches in the state of Georgia presently.

As a test, we can run a basic select statement against our new SQL Server table and voila!  Results!  We are now in a position to perform all manner of analysis on our geocaching data via SQL Server Reporting Services as well as other tools.
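To illustrate the kind of summary that is now possible, here is a hypothetical example: a standard GROUP BY that would run unchanged against the SQL Server table, demonstrated here against a small in-memory SQLite table with made-up sample rows:

```python
# Sketch: the kind of summary query the new Geocaches table enables.
# The GROUP BY below is standard SQL and works the same way on SQL Server;
# the table and rows here are made-up sample data for demonstration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Geocaches (Code TEXT, CacheType TEXT, Difficulty REAL)")
conn.executemany(
    "INSERT INTO Geocaches VALUES (?, ?, ?)",
    [("GC1001", "T", 1.5), ("GC1002", "T", 3.0), ("GC1003", "U", 2.0)],
)

# Count caches and average difficulty per cache type.
summary = conn.execute(
    """
    SELECT CacheType, COUNT(*) AS NumCaches, AVG(Difficulty) AS AvgDifficulty
    FROM Geocaches
    GROUP BY CacheType
    ORDER BY NumCaches DESC
    """
).fetchall()

for cache_type, num_caches, avg_difficulty in summary:
    print(f"{cache_type}: {num_caches} caches, avg difficulty {avg_difficulty:.2f}")
```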

Next, we’ll be diving into some of the analysis with our new SQL Server based geocaching data table.