They’re not as popular as they once were, mostly for environmental reasons, but when I was in elementary school, balloon races were annual events which were tons of fun. For anyone that hasn’t had the experience in participating in a balloon race, each participant gets a regular helium balloon with an attached string and postcard. They’re all released at once and the balloons spread sometimes hundreds of miles. If a postcard is found, it’s reported or mailed back and whichever one gets found the furthest away wins. While balloon races are great fun, there are some environmental concerns with releasing hundreds of balloons (which essentially become trash) regularly into the wild.
So, what is a more modern and environmental way to have the same kind of fun? Believe it or not, Geocaching has the answer. Geocaching is a game in which millions of physical containers are hidden throughout the world. These containers are tracked on a website which, with each listing, contains GPS coordinates of the geocache and an inventory of what is inside it. Trackables, or travel bugs, are physical items, each of which has a unique tracking code, which can be placed inside of a geocache. Each trackable item maintains a history of the geocaches (and their locations) that it has been placed within — and that becomes a virtual balloon for our balloon race.
Thanks to the history that is maintained by a Geocaching Travel Bug, the “virtual balloon race” or “trackable race” is possible. All that is needed is to set some arbitrary rules to turn an ordinary trackable into a racer. Each person brings a travel bug to a set location at a set time. They are then left within the geocache and after an agreed upon amount of time (say a year), the race is over. Geocachers tend to know whenever they find a trackable inside of a geocache to pick it up and move it somewhere else, so that is our race! Travel bugs are the balloons and postcards and geocachers are the wind.
So what, you ask, does this have to do with business intelligence, SQL server, data visualization, or anything else? Well, I’m glad you asked, we’re finally getting to that. While geocaching.com provides the core functionality for how a Travel Bug functions, it provides absolutely no functionality for tracking a race — no way to group which Travel Bugs are in the race, where they each are in relation to each other, or how far each one has traveled compared to the others. And that’s where our custom development comes in!
Gathering the Race Data
The first challenge to overcome is gathering the race data. Data for each Geocaching Travel Bug is maintained solely on geocaching.com web pages. While an API does exist, it isn’t publicly available for use, so that means data must be scraped from web pages. Each trackable has two web pages of information — one which contains general information, including the total distance traveled to date, and one which contains a list of all stops the trackable has made. To complicate things even further, one of these two pages requires a valid login to access — so the scraping mechanism must support forms-based authentication.
I considered a number of potential solutions to these challenges, but ultimately I decided that a custom .NET (C#) Windows Service was the way to go. I designed the data gathering Windows service to do the following upon being started:
- Start a refresh timer.
- On firing of the timer, perform a query to a local SQL Server database for the list of trackables which are to have data fetched.
- Create an internal web browser session and authenticate to the geocaching website.
- After authenticating, download the general info and stop data web page for each trackable in the refresh list.
- Parse the general info page and stop data page for each trackable for relevant information and insert it into the appropriate tables in the local SQL database.
Visualizing the Data via a Scoreboard
Having developed a method to store all the trackable race data locally as well as a method to update that data periodically, the fun could begin — summarizing and visualizing the data. The first few visualizations were pretty simple — like any race, there needs to be a scoreboard which ranks the racers from 1 to whatever and shows how much of a gap there is in between positions. While simple, the best type of visualization for this is a simple table. To create this table, I wrote a simple T-SQL stored procedure in the local SQL database which would be responsible for querying and ranking the data and then I wrote a simple web page to execute that query and then format and display the results.
A scoreboard is the bread and butter for any race, but what other fun things could be done with this data? Well, I have a lot of data points, each of which has a latitude and longitude associated with it. How about a map?
Visualizing the Data with Google Maps
The first map I created was a Google Map which showed the current position of each trackable in the race in relation to each other. This fun and interactive visualization allows for seeing where each trackable is in relation to the starting point and how close they are to one another at a quick glance. It won’t tell you which one is in the lead, but the scoreboard in the previous section will always be a better visualization for that sort of information.
While the Google Maps API provides the functionality to create a map and place a marker on it, it doesn’t provide the functionality to easily add a label to that marker. Fortunately, that functionality is easily added by using the “MarkerWithLabel.js” library which is freely available on the Internet. Using this library, it is possible to create a marker as easily as it is a native one, but to include a text label next to that marker to describe what the marker is (in this case, the labels are used to list the name of each racer)
Taking the Google Maps Visualization to the Next Level
The scoreboard visualization was highly useful for determining the status of the race. And the Google Maps visualization was a fun way to see where each racer was right now. But what about the next level? What would happen if you visualized every single movement by ever single racer in a time sequence? I wondered the same thing too, and was determined to figure out how to do it.
After thinking about the problem, I realized it wasn’t terribly complicated, it just had a few challenges. The Google Maps API allows for making changes to the map object after it’s been displayed on the web page (animation) and all of my race data is time stamped and has latitude and longitude. I just needed to assemble the pieces.
I started by creating a T-SQL Stored Procedure in my local SQL database to collect all of the relevant data. This included the list of all of the trackables in the race (and their names) as well as a list of the date and location of each stop that they made.
After retrieving the data from the database, the first job of the program was to convert the list of stops into lines. A line has four components — a staring point, an ending point, the date it was completed, and the trackable which the line applies to. With a list of lines, the next task was to sort these in chronological order and then insert gaps for days in which no activity occurred. For instance, if there was movement on April 12th, 13th, and 14th, but no movement on the 14th, 16th, or 17th, I wanted the animation to pause appropriately for the gaps instead of just jumping to the 18th. To do this, I number every day in the race (from 1 to 365) and I associated all of the movement for each day to its corresponding number. So, some days may have many lines associated with them and some days may have none. The reason for organizing the data like this was for the final key component of the animation.
And with that, I had my animation! I later went back and added a little additional polish to it. I drew a custom marker icon (instead of the default pushpin) and on each clock tick I had the marker move to the last point of the line for each racer. Additionally, I created a color palette and assigned each racer a number corresponding to a color in the palette so each would have a uniquely colored path. Finally, I created a legend to display which racer is which color by utilizing only the labels of MarkersWithLabels.
One Last Visualization with SSRS
The animated map of the race was truly interesting — I’d never seen anything quite like that before and everyone that has seen it has enjoyed it. To wrap things up, I wanted to take one additional step and that was to look at a summary of the race as a whole. Since I’d already gathered all of the data I needed, this was purely a presentation challenge. To accomplish this task, I looked to my trusty friend, SQL Server Reporting Services.
Using my local SQL Server database of the race data, I created a number of T-SQL stored procedures to feed my SSRS report. The goal of the report was to provide an overview and interesting facts about the race as a whole — rather than single out winners (which the earlier scoreboard already does). Again, the summary of the information was truly interesting and I’ve shared it below.
I haven’t shared the full source for any of the above visualizations, however, if you are particularly interested in how to accomplish any specific task above, contact me directly and I’m happy to share the necessary code or information.
And so concludes my GCInsight series of posts — at least for now. I’m always playing with and visualizing data and Geocaching data happens to be a very interesting set of data to visualize. So, stay tuned for a Part 7 of the GCInsight series? Maybe?
[ Back to Part 5 ]