Jeff Pries

Business Intelligence, SQL Server, and other assorted IT miscellany

Month: January 2014

Presenting Data and Information with Edward Tufte

This past Friday, I had the opportunity to attend a one-day course, Presenting Data and Information, presented by Edward Tufte. Tufte is considered by many to be a pioneer in the field of modern data visualization, so this was an opportunity I could not pass up.  After all, how often do you get to meet one of the founders of a discipline in person?

The course was presented in a single-day format, taking place in a large ballroom in the Buckhead Westin Hotel.  For a price of $380, the course included a full day of lecture as well as paperback copies of Tufte’s four books:

  • The Visual Display of Quantitative Information
  • Envisioning Information
  • Visual Explanations
  • Beautiful Evidence

If you didn’t mind waiting in line, you could have one or two of the books autographed as well.

Getting Started with U.S. Census Data

U.S. Census QuickFacts data is an amazing resource which is available to the public free of charge.  This treasure trove of information is incredibly useful for performing many different types of analysis involving where people live in the United States, as well as other demographic information, all of it broken down by county and state.  While U.S. Census QuickFacts data is available and summarized from multiple sources throughout the Internet, the U.S. Census Bureau makes the information directly available only via flat text files on their web site.  Fear not!  Without too much effort, we can take those flat files and import them into a relational database.

In this post, I’ll be covering the process of downloading the U.S. Census QuickFacts data from their website (currently data from the 2010 census) and importing it as new tables into an existing SQL Server database using SQL Server Management Studio and a little bit of Microsoft Excel.  I’m using SQL Server 2012 and Microsoft Excel 2013, but any recent version of either should work.

The first step in using U.S. Census QuickFacts data is to download it.  This can be done via the U.S. Census Bureau website, where there are three files of interest:

  • DataSet.txt — the actual census data in a CSV file
  • FIPS_CountyName.txt — the FIPS codes and names for each county and state used within the DataSet file.
  • DataDict.txt — descriptions of what each column name represents.  This file is a useful reference, but we won’t be importing it into our database.

Start things off by downloading the three text files.

U.S. Census data available for download from the Census Bureau website.
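
To give a rough preview of where these files are headed, below is a minimal T-SQL sketch of staging DataSet.txt with BULK INSERT.  The file path, table name, and columns are illustrative assumptions; the real DataSet.txt contains one column per QuickFacts measure, so the staging table needs a column declared for every field in the file (DataDict.txt describes them) before the load will line up.

    -- Minimal sketch: stage the QuickFacts CSV in SQL Server.
    -- Path, table name, and columns are illustrative assumptions.
    CREATE TABLE dbo.CensusQuickFacts (
        fips      CHAR(5) NOT NULL,  -- combined state + county FIPS code
        PST045212 BIGINT  NULL       -- example measure column; see DataDict.txt
        -- ...the remaining measure columns would be declared here...
    );

    BULK INSERT dbo.CensusQuickFacts
    FROM 'C:\CensusData\DataSet.txt'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        FIRSTROW        = 2          -- skip the header row
    );

FIPS_CountyName.txt can be staged in a similar fashion, giving a lookup from each fips code to a readable county and state name.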

Geocaching Analytics – Part 2: GCInsight Data Gathering

This post is going to dive into the details of actually getting the Geocaching data used for later analysis into a convenient and usable source.

Unlike many sample data sources out there, the data is not made (easily) publicly available, though anybody can create a free account and view most of the data on the website.  Getting the data out of the website and into a SQL database is another story altogether…

Some things we’ll need to get started:

  1. Geocaching.com Premium Account – The first step is to upgrade to a Geocaching.com Premium Account, which as of this writing is $20/year.  While geocaching.com does offer free accounts, those accounts cannot see all geocaches (many geocaches are flagged as “premium only”) and they also suffer from data export limitations.
  2. GSAK (The Geocaching Swiss Army Knife) – GSAK is a Windows-based application used for downloading and managing Geocache data.  It’s one of the few tools that have the ability to download Geocache data from geocaching.com using the Geocaching.com API (more on that later), and it handles de-duplicating and merging cache data.  This handy application is a one-time purchase of $30.

There are three primary methods of retrieving data from the geocaching.com website as of this writing:

  1. Built-in Functions (Pocket Queries) – Pocket Queries are the functionality built in to the geocaching.com website for premium members.  They’re intended as a method of gathering a number of geocaches, based on search parameters, and exporting them to a single XML file (the GPX format; a sketch of loading one of these files into SQL Server follows this list).  A maximum of 1,000 caches can be returned per query and a maximum of 5 queries can be run per day, so using this method you can export at most 5,000 caches per day.  One additional limitation is that this method will not retrieve caches which have been retired (archived) but remain in the official geocaching.com database for historical purposes.  One strange quirk/weakness of this method is that it will not retrieve the number of favorite points that have been awarded to a cache.
  2. API Queries (GSAK) – Geocaching.com does not make their API publicly available; instead, you must use a program which has access to the API via their partner program.  In this case, that application is GSAK.  GSAK has the ability to download geocaches directly from the geocaching.com website via the API.  The API allows a maximum of 6,000 caches per day to be downloaded with full details and 10,000 with light details (typically the light details are fine), for a combined 16,000 caches per day.  Additionally, this method will retrieve the number of favorite points that have been awarded to a cache.
  3. Web page scraping – Web page scraping means writing a utility which retrieves a cache detail web page and then scrapes the details off of it.  Since this functions in the same manner as a regular web client, there is no limit to the number of geocaches which can be downloaded this way; however, it requires writing a custom application and updating it every time geocaching.com redesigns its cache detail pages.  The c:geo Android app would be an example of this approach.  We won’t be using this method for this project at this time.
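
To give a feel for what a Pocket Query export looks like once it reaches the database, here is a minimal T-SQL sketch that shreds a GPX file into a staging table.  The file path, the StageGeocache table, and the handful of fields pulled out are assumptions for illustration only; the real Groundspeak GPX export carries far more detail (cache type, difficulty, terrain, logs, and so on), and GSAK handles all of this for us anyway.

    -- Minimal sketch: shred a Pocket Query GPX (XML) export into a staging table.
    -- The path, table, and columns below are illustrative assumptions.
    CREATE TABLE dbo.StageGeocache (
        GCCode    VARCHAR(20),    -- waypoint name, e.g. GC1ABCD
        CacheName NVARCHAR(200),
        Latitude  DECIMAL(9, 6),
        Longitude DECIMAL(9, 6)
    );

    DECLARE @gpx XML;

    -- Read the whole .gpx file as a blob and cast it to XML.
    SELECT @gpx = CAST(BulkColumn AS XML)
    FROM OPENROWSET(BULK 'C:\GCInsight\PocketQuery.gpx', SINGLE_BLOB) AS src;

    -- One row per <wpt> element (standard GPX 1.0 namespace assumed).
    ;WITH XMLNAMESPACES (DEFAULT 'http://www.topografix.com/GPX/1/0')
    INSERT INTO dbo.StageGeocache (GCCode, CacheName, Latitude, Longitude)
    SELECT
        wpt.value('(name/text())[1]',    'VARCHAR(20)'),
        wpt.value('(urlname/text())[1]', 'NVARCHAR(200)'),
        wpt.value('@lat', 'DECIMAL(9,6)'),
        wpt.value('@lon', 'DECIMAL(9,6)')
    FROM @gpx.nodes('/gpx/wpt') AS caches(wpt);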

Geocaching Analytics – Part 1: GCInsight, A New Project

As you may or may not be aware, I participate in a hobby called Geocaching.  Geocaching has greatly increased in popularity in recent years, so odds are that at this point you’ve at least heard about it.  If you haven’t, a very brief description of it would be a treasure hunt for technology enthusiasts.  In short, somebody hides a container somewhere (be it in the woods or in an urban setting) and then posts a description on the geocaching.com website which includes the coordinates as well as attributes about how it’s hidden.  You then read the listing on the website, load the coordinates into your GPS, and head out to find it.  If you do find it, you sign the log, enter your find on the website, and possibly trade small “treasures.”

I’ll admit, that description sounds a bit weak.  Really, it’s a fun way to get outdoors and see cool, hidden, or amazing places that you wouldn’t otherwise see (two Georgia highlights include an old abandoned gold mine and a forgotten 1800s railroad tunnel).

Okay, so why mention this on a business intelligence / SQL Server topic-heavy blog?

Simple.  By the very nature of this activity, there is tons and tons of really cool information, some of it geospatial, just screaming to be analyzed and visualized.  Geocaching can also get very stats-heavy, tracking numbers of finds, the days and months on which they were made, and other commonalities between them.

Okay.  So there is a ton of interesting data, just sitting out there in a database.  It seems obvious when you put it that way.  So, why aren’t other people doing anything with this?  The answer is that they are.  There are a lot of people out there doing some pretty cool things with that data.  One example being here and another being here.  But the goal here isn’t necessarily to be unique, but to have fun exploring data.  That said, I believe I do have some ideas to put a unique spin on things.

I’ll be calling this project GCInsight.  This is an entertainment and learning experience rather than a service that I’m setting up.  I’ll go through the steps I performed from start to finish to acquire the data, prepare and export the data, and finally analyze and present the data.

With our project introduced, it’s time to acquire some data…

The Office Hijinks Series – Part 7: Full of Hot Air

I’ve heard of people doing things like this before (the cubicle full of balloons or peanuts), but had yet to actually see it in real life.

So, it was time to give it a try!  Eric was out of the office for a few days, so we decided to take advantage of the opportunity and the chaos of our ongoing office renovation.

It turns out that it takes a LOT of balloons to fill an office…and a lot of air.  The team spent the evening with a number of balloon pumps, blowing up and tying balloons.  It took us about 500 regular-sized balloons to fill a 12×12 office to about waist height.  At that point, we were quite literally out of steam and capped things off with the Monkey King to lord over his balloon kingdom.

[ Go back to Part 6 | Go forward to Part 8]

Happy New Year 2014! Start Your PDP Today!

Happy New Year everyone!  With 2013 behind us and today the first day of 2014, it’s time to review how last year went and make some plans for the new year.  And what better way to make some plans and set some goals than to start creating your Professional Development Plan for 2014 today!

A Professional Development Plan is a great way to set a series of professional goals for yourself (such as attaining new skills and certifications) as well as tracking your progress over the year (such as attending user group meetings and training events).  Without tracking as you go, it’s pretty easy to forget what progress you’ve made, lose focus, or just let things slide in general.  So, get started today!

2013 was a great year — I attended a ton of user group meetings, SQL Saturdays, and other training events.  I attended my first PASS Summit and attained my first Microsoft SQL Server certification.  Not to mention, I learned a ton of new tips and tricks and developed some great reports and dashboards along the way.  Here’s to hoping 2014 is an even better and more productive year!
