Jeff Pries

Business Intelligence, SQL Server, and other assorted IT miscellany

Category: GSAK

A Modern Day Balloon Race (GCInsight Geocaching Analytics – Part 6)

GCInsightA Modern Day Balloon Race (Geocaching GCInsight Analytics – Part 6)

They’re not as popular as they once were, mostly for environmental reasons, but when I was in elementary school, balloon races were annual events which were tons of fun.  For anyone that hasn’t had the experience in participating in a balloon race, each participant gets a regular helium balloon with an attached string and postcard.  They’re all released at once and the balloons spread sometimes hundreds of miles.  If a postcard is found, it’s reported or mailed back and whichever one gets found the furthest away wins.  While balloon races are great fun, there are some environmental concerns with releasing hundreds of balloons (which essentially become trash) regularly into the wild.

Balloons released in a balloon race

Balloons released in a balloon race

So, what is a more modern and environmental way to have the same kind of fun?  Believe it or not, Geocaching has the answer.  Geocaching is a game in which millions of physical containers are hidden throughout the world.  These containers are tracked on a website which, with each listing, contains GPS coordinates of the geocache and an inventory of what is inside it. Trackables, or travel bugs, are physical items, each of which has a unique tracking code, which can be placed inside of a geocache.  Each trackable item maintains a history of the geocaches (and their locations) that it has been placed within — and that becomes a virtual balloon for our balloon race.

Continue reading

Geocaching Analytics – Part 5: GCInsight Visualizations with SSRS

Geocaching Analytics – Part 5: GCInsight Visualizations with SSRS

Continuing our journey of analyzing Geocache data with various visualization tools, we arrive at SQL Server Reporting Services (SSRS).  SSRS is a server-based SQL Server component used for generating static (and generally printer friendly) reports.

Aside from its cost (free), one of the greatest benefits of SSRS is the ease of use in getting started with report creation.

This post won’t cover the installation and configuration of the server-based components of SSRS (there are plenty of resources out there on the Internet and the process varies based on your version and edition of SQL Server).  At its core though, it is made up of the following components:

  • A SQL Server Reporting Database which stores information used by SSRS, including connection information and reports.  This database is initially created by the installation and is managed by the product.
  • The Report Server and Report Manager services, which are the core service behind Reporting Services as well as a web server (running either natively or inside of SharePoint) which allows for the access and administration of reports and report components.
  • Design Tools such as Report Builder and SQL Server Data Tools (Visual Studio)
SSRS conceptual diagram.  Source:  http://www.accelebrate.com/library/tutorials/ssrs-2008

Going into the inner workings of SSRS and report development can go down the rabbit hole of complexity very quickly, so we’ll stop there.  A quick summary is that report design tools create report files which are stored inside of reporting services and can then be accessed via a web browser.  As we’re interested in geocaching visualizations, we’ll be focusing on the content created within the report design tools.

Continue reading

Geocaching Analytics – Part 4: GCInsight Visualizations with Power Map

Geocaching Analytics – Part 4: GCInsight Visualizations with Power Map

Previously, we used various tools to download a copy of our Geocache data from geocaching.com and use a one-time SSIS package via the data import wizard to import that data into a SQL Server database.

Now the fun begins and we can play with some visualizations!  First up, we’ll play with Microsoft’s brand new Power Map functionality for Microsoft Excel.

Power Map is part of the suite of tools which Microsoft has named the Power BI suite.  The Power BI tools, such as Power Map are available in two flavors — on premise and cloud-based as part of Office 365, both of which utilize Excel 2013 Professional Plus.

One interesting note is that Microsoft is attempting to drive traffic to the Office 365 subscription flavor of Power Map over the standalone version.  As such, the standalone version is technically called “Power Map Preview for Excel 2013” but the functionality is the same between both.  The standalone version was originally to be time limited and stop functioning after 5/30/2014, but that limitation has since been removed.

First thing’s first — ensure that you have Power Map (or Power Map Preview) downloaded and installed.  After installing, we’ll need to go into the Excel addins and enable it (as well as PowerPivot which is another tool we’ll use for managing our data)

Continue reading

Geocaching Analytics – Part 3: GCInsight Data Preparation

Geocaching Analytics – Part 3: GCInsight Data Preparation

At this point, we’ve gotten our data downloaded from the Geocaching.com website and we’re sitting on a local treasure trove of data…but the data is contained within a proprietary program (GSAK) which limits our ability to make use of the data.  So, it’s time to work on extracting the data and getting it into a usable format.

The GSAK application will be our primary data source for acquiring Geocaching data as well as future updates to the data.  That being said, the data is not particularly easy to work with in its GSAK format.  Fortunately, GSAK stores its geocache data in a SQLite database, which is a compact database which runs as an application on command rather than a service.

In order to fully utilize the data, we’ll be building a process to export the GSAK data from SQLite into a SQL Server database.  Once the data is exported to SQL Server, we can perform any number of transformations and summarizations.

Continue reading

Geocaching Analytics – Part 2: GCInsight Data Gathering

Geocaching Analytics – Part 2: GCInsight Data Gathering

This post is going to dive into the details of actually getting the Geocaching data used for later analysis into a convenient and usable source.

Unlike many sample data sources out there, the data is not made (easily) publicly available, though anybody can create a free account and view most of the data on the website.  Getting the data out of the website and into a SQL database is another story altogether…

Some things we’ll need to get started:

  1. Geocaching.com Premium Account – The first step is to upgrade to a Geocaching.com Preimium Account, which as of this writing is currently $20/year.  While geocaching.com does offer free accounts, those accounts cannot see all geocaches (many geocaches are flagged as “premium only”) and they also suffer from data export limitations.
  2. GSAK (The Geocaching Swiss Army Knife) – GSAK is a Windows-based application which is used for downloading and managing Geocache data.  It’s one of the few tools which has the ability to download Geocache data from geocaching.com utilizing the Geocaching.com API (more on that later) and handles de-duplicating and merging cache data.  This handy application is a one-time payment of $30.

There are three primary methods of retrieving data from the geocaching.com website as of this writing:

  1. Built-in Functions (Pocket Queries) – Pocket Queries are the functionality built in to the geocaching.com website for premium members.  They’re intended as a method of gathering a number of geocaches, based on search parameters, and exporting them to a single XML file (the GPX format).  There is a limit of a maximum of 1000 caches that can be returned per query and a maximum of 5 queries that can be run per day.  So, using this method, you can export a maximum of 5,000 caches per day.  One additional limitation is this method will not retrieve caches which have been retired (archived) but remain in the official geocaching.com database for historical purposes.  One strange quirk/weakness to this method is that it will not retrieve the number of favorite points that have been awarded to a cache.
  2. API Queries (GSAK) – Geocaching.com does not make their API publicly available, instead, you must use a program which has access to the API via their partner program.  In this case, that application is GSAK.  GSAK has the ability to download geocaches directly from the geocaching.com website via the API.  The API allows for a maximum of 6,000 caches to be downloaded (with full details) and 10,000 with light details (typically the light details are fine).  This allows for a combined 16,000 caches to be downloaded per day.  Additionally, this will retrieve the number of favorite points that have been awarded to a cache.
  3. Web page scraping – Web page scraping is writing a utility which will retrieve a cache detail web page and then scrape the details off of it.  Since this functions in the same manner as a client would, there is no limit to the number of geocaches which can be downloaded in this manner, however, it does require writing a custom application and updating it every time the geocaching.com site redesigns their cache detail pages.  The c:geo Android app would be an example of this.  We won’t be using this method in this application at this time.

Continue reading

Geocaching Analytics – Part 1: GCInsight, A New Project

Geocaching Analytics – Part 1: GCInsight, A New Project

As you may or may not be aware, I participate in a hobby called Geocaching.  Geocaching has greatly increased in popularity in recent years, so odds are that at this point you’ve at least heard about it.  If you haven’t, a very brief description of it would be a treasure-hunt for technology enthusiasts.  In short, somebody hides a container somewhere (be it in the woods or in an urban setting) and then they post a description on the geocaching.com website which includes attributes about how its hidden as well as coordinates.  You then read the listing on the website, load the coordinates into your GPS, and head out to find it.  If you do find it, you sign the log, enter your find into the website, and possible trade small “treasures.”

I’ll admit, that description sounds a bit weak.  Really, its a fun way to get outdoors and see cool, hidden, or amazing places that you wouldn’t otherwise see (two Georgia highlights include an old abandoned gold mine and a forgotten 1800’s railroad tunnel)

Okay, so why mention this on a business intelligence / sql server topic-heavy blog?

Simple.  By the very nature of this activity, there is tons and tons of really cool information, some of it geospatial, just screaming to be analyzed and visualized.  Geocaching also can get very stats heavy tracking number of finds, on which days or months, and other commonalities between them.

Okay.  So there is a ton of interesting data, just sitting out there in a database.  It seems obvious when you put it that way.  So, why aren’t other people doing anything with this?  The answer is that they are.  There are a lot of people out there doing some pretty cool things with that data.  One example being here and another being here.  But the goal here isn’t necessarily to be unique, but to have fun exploring data.  That said, I believe I do have some ideas to put a unique spin on things.

I’ll be calling this project GCInsight.  This is an entertainment and learning experience rather than a service that I’m setting up.  I’ll go through the steps I performed from start to finish to acquire the data, prepare and export the data, and finally analyze and present the data.

With our project introduced, it’s time to acquire some data…

© 2017 Jeff Pries

Theme by Anders NorenUp ↑