In October 2013, I attended a SQL Saturday event in Charlotte, NC, where Julie Smith and Rob Volk presented a session titled “Harvesting Web Data Using Power Query & Other Tools” (the link to the session is now broken).

Julie Smith and Rob Volk presenting “Harvesting Web Data Using Power Query & Other Tools”

In the session, Julie and Rob downloaded data about SQL Saturday events from the sqlsaturday.com website using two different methods.

This was the first time I’d seen this data used in a demo, and I found it intriguing enough to make a note to explore it myself someday.

Well, six years later, we’re finally there. In this two-post series, I’m going to explore the SQL Saturday data with Power BI and SQL Server. This post covers the Power BI file, which includes the data model, cached data, visualizations, and analysis. The second post will cover getting the data, working with and organizing it, and the challenges with the data and the process as a whole.

It’s also worth noting that while I was putting this demo together, Reza Rad posted something very similar, Analyzing PASS SQL Saturday Data Using Power BI, though our methodologies for handling the data are quite different. That made it my second time seeing this data used in a demo, and it was interesting to see another take on it.

What is SQL Saturday?

In short, SQL Saturday is a free, one-day training event hosted by PASS and organized and run by the community. Since the first event in 2007, SQL Saturdays have spread to most major cities, both in the US and worldwide, which typically host an event once or twice per year. Each event is run by volunteer organizers and has multiple tracks of sessions presented by volunteer speakers. Events also typically have sponsors to help cover the costs of putting on a free event.

Data

Based on that description, we can start to see which aspects of the data might be interesting. Events take place at locations over the years. Each event has sponsors (at varying levels), sessions (in varying tracks), and speakers (presenting varying numbers of sessions).

Gathering all of this data will be covered in part two of this series, but all* of the data is publicly available at sqlsaturday.com without logging in. (*All gets an asterisk because data for events prior to April 2015 has availability issues due to a website upgrade around that time.)

For my spin on the SQL Saturday data, I wanted to spend a lot of time on the ETL process and on making the data as useful as possible. In doing so, I reshaped the data somewhat (due to two many-to-many relationships) and enriched it with location geocoding via the Bing Maps API, as well as data on the sessions and events I’ve attended.
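The post doesn’t say which two many-to-many relationships were reshaped. As a minimal sketch, assuming one of them is sessions-to-speakers (a session can have co-presenters, and a speaker presents many sessions), the usual fix is to split the relationship into a speaker dimension plus a bridge table with one row per session/speaker pairing. The `RawSession` record, the `build_speaker_bridge` helper, and the sample names are hypothetical, not taken from the actual ETL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawSession:
    """One scraped session row (hypothetical shape); may list several speakers."""
    session_id: int
    event_id: int
    title: str
    speakers: tuple[str, ...]


def build_speaker_bridge(sessions):
    """Split the session/speaker many-to-many into a dimension and a bridge.

    Returns (speaker_dim, bridge): speaker_dim maps speaker name -> surrogate key,
    and bridge holds one (session_id, speaker_key) pair per session/speaker pairing.
    """
    speaker_dim: dict[str, int] = {}
    bridge: list[tuple[int, int]] = []
    for s in sessions:
        for name in s.speakers:
            # Assign the next surrogate key the first time a speaker is seen
            key = speaker_dim.setdefault(name, len(speaker_dim) + 1)
            bridge.append((s.session_id, key))
    return speaker_dim, bridge


if __name__ == "__main__":
    raw = [
        RawSession(1, 100, "Intro to DAX", ("Ann Lee",)),
        RawSession(2, 100, "Power Query Deep Dive", ("Ann Lee", "Bo Chen")),
        RawSession(3, 101, "Intro to DAX", ("Ann Lee",)),
    ]
    dim, bridge = build_speaker_bridge(raw)
    print(dim)     # {'Ann Lee': 1, 'Bo Chen': 2}
    print(bridge)  # [(1, 1), (2, 1), (2, 2), (3, 1)]
```

In a Power BI model, a bridge table like this sits between the session and speaker tables, so each side has a one-to-many relationship to the bridge instead of a direct many-to-many relationship.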

Power BI Analysis

Now for the fun part: exploring the data. I thought the data would be most interesting broken up into different summaries, with one page per summary. The data supports current and future events, though my summaries are filtered to exclude future events.

Summary by Event Location (All / US Only / Non-US Only)

Of the 994 events I have data for, 965 have occurred so far around the world. They have taken place in 305 cities across 71 countries. We’re averaging 69 events per year (or 2.2 per week), with the most events occurring in September and the fewest in January. Interestingly, the total number of events per year appears to have leveled out over the past few years.
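The post doesn’t define how the per-year and per-week averages are calculated. Note that 69 events per year would be only about 1.3 per week (69 / 52), so the 2.2 figure is likely averaged over a different base. The numbers are consistent with dividing by the number of periods that had at least one event (for example, 965 / 69 is about 14 distinct years). The sketch below assumes exactly that and, like the report, excludes future events; the function name and sample dates are hypothetical.

```python
from datetime import date


def average_per_active_period(event_dates, today):
    """Return (events per year, events per week) for events on or before today.

    Assumption: each average divides by the number of calendar years, or
    ISO weeks, that contain at least one past event; empty periods are ignored.
    """
    past = [d for d in event_dates if d <= today]  # exclude future events
    if not past:
        return 0.0, 0.0
    years = {d.year for d in past}
    # isocalendar()[:2] is (ISO year, ISO week number), a unique key per week
    weeks = {d.isocalendar()[:2] for d in past}
    return len(past) / len(years), len(past) / len(weeks)


if __name__ == "__main__":
    sample = [date(2019, 9, 7), date(2019, 9, 7), date(2019, 10, 5),
              date(2020, 1, 11), date(2021, 3, 6)]  # the 2021 date is "future"
    per_year, per_week = average_per_active_period(sample, today=date(2020, 3, 1))
    print(per_year, round(per_week, 2))  # 2.0 1.33
```

Averaging over active periods only keeps quiet stretches (such as the early years, or weeks with no events) from dragging the average down; averaging over all 52 weeks would give a noticeably smaller per-week number.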

Summary of all SQL Saturday events to date, regardless of location.

Of the 965 events that have taken place, 516 (53%) were in the US, spread across 134 cities in 41 states. I was surprised by how many states haven’t had a SQL Saturday event yet. In the US, we average 37 events per year and 1.5 per week. October is the most popular month and December the least.

Summary of all SQL Saturday events to date, for the US only, using the new Shape Map visual.

The remaining 449 events (47%) have taken place outside the US, with Brazil hosting the most. Outside the US, we average 45 events per year and 1.7 per week. September is the clear winner for the most events, with January and July having the fewest.

Summary of all SQL Saturday events to date, for non-US events only.

As a free event for attendees, SQL Saturday couldn’t happen without sponsors. Unfortunately, sponsors are frequently listed multiple times in the sponsor data (under varying names), and sponsorship levels are not fully standardized. In both cases, I’ve attempted to clean up the data and regroup the sponsorship levels. Each event typically has multiple sponsors at each level (Gold, Silver, and Bronze are among the most common), and each sponsor can sponsor many events.
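The exact cleanup rules aren’t described here, so this is only a minimal sketch of one common approach, under stated assumptions: sponsor names are reduced to a comparison key (lowercase, punctuation removed, trailing corporate suffixes such as “Inc” dropped), and free-text levels are mapped onto a short list of standard levels by keyword, with anything unmatched falling into Other. The suffix list, level list, function names, and sample sponsors are hypothetical.

```python
import re

# Hypothetical suffixes to ignore when comparing sponsor names
CORPORATE_SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation"}
# Hypothetical standard levels, checked in this order
STANDARD_LEVELS = ("platinum", "gold", "silver", "bronze")


def sponsor_key(name: str) -> str:
    """Reduce a sponsor name to a key so spelling variants group together."""
    words = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while words and words[-1] in CORPORATE_SUFFIXES:
        words.pop()  # drop trailing suffixes like "Inc." or "LLC"
    return " ".join(words)


def standard_level(raw_level: str) -> str:
    """Map a free-text sponsorship level to a standard level, else 'Other'."""
    text = raw_level.lower()
    for level in STANDARD_LEVELS:
        if level in text:
            return level.title()
    return "Other"


if __name__ == "__main__":
    names = ["Acme, Inc.", "ACME Inc", "Acme Corporation", "Contoso LLC"]
    print(sorted({sponsor_key(n) for n in names}))  # ['acme', 'contoso']
    print([standard_level(lvl) for lvl in ["Gold Sponsor", "Silver (Virtual)", "Raffle"]])
    # ['Gold', 'Silver', 'Other']
```

Keyword matching on levels is deliberately crude, so any mapping like this should be reviewed against the actual level names before being trusted.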

There have been over 11,000 sponsorships so far (instances of a sponsor sponsoring an event), from at least 2,201 unique sponsors. Gold is the most common sponsorship level, both within an individual event and overall. On average, an event has about 11.1 sponsors across all levels. The high number of repeat sponsorships is excellent!

Summary of all sponsorships for all SQL Saturday events to date.

Right up there with sponsors, an event cannot happen without volunteer speakers. Unfortunately, speakers are another area of the data where the same person may show up multiple times under varying names. I’ve done some basic de-duplication here to remove the more obvious duplicates.
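Simple normalization (lowercasing, trimming spaces) misses variants such as a typo or a dropped letter. As a hedged sketch (the post only says “basic de-duplication”, so this is not necessarily the method used), Python’s standard-library `difflib.SequenceMatcher` can score every pair of names and flag likely duplicates for manual review rather than merging them automatically. The `likely_duplicates` function, the 0.85 threshold, and the sample names are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def likely_duplicates(names, threshold=0.85):
    """Return (name_a, name_b, score) for name pairs at or above threshold.

    Names are whitespace-collapsed and de-duplicated first; comparison is
    case-insensitive. Results are candidates for review, not automatic merges.
    """
    cleaned = sorted({" ".join(n.split()) for n in names})
    pairs = []
    for a, b in combinations(cleaned, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


if __name__ == "__main__":
    print(likely_duplicates(["Jon Smith", "John Smith", "Ann Lee", "John  Smith"]))
    # [('John Smith', 'Jon Smith', 0.95)]
```

Comparing every pair is quadratic, so a few thousand speakers means millions of comparisons; that is acceptable for a one-off cleanup, and grouping names by last name before comparing would cut the work considerably.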

On average, there are 28.4 presentations (sessions) per event. To date, at least 4,042 speakers have presented at these events. Comparing the counts of presentations and distinct sessions shows that a speaker may present the same session multiple times, or present more than once at the same event.

Summary of all speakers for all SQL Saturday events to date.

Every event is made up of sessions, which are organized into tracks (the number of tracks varies by event). The track data is very dirty, with many events using custom tracks. I’ve made an effort to bin similarly named tracks together, but Other still wins the race for the most sessions, followed by Enterprise Database topics and BI topics.
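The post doesn’t spell out the binning rules. A minimal sketch, assuming a keyword-based approach: each raw track name is tokenized, checked against an ordered list of bins by whole-word or whole-phrase match, and assigned to Other if nothing matches. The Cloud bin and every keyword list below are hypothetical; only the BI, Enterprise Database, and Other bins come from the post.

```python
import re

# Hypothetical keyword rules; bins are checked in order and the first match wins
TRACK_BINS = [
    ("BI", ("bi", "business intelligence", "analytics", "reporting")),
    ("Enterprise Database", ("enterprise database", "dba", "administration")),
    ("Cloud", ("azure", "cloud")),
]


def bin_track(raw_track: str) -> str:
    """Assign a free-text track name to a bin, falling back to 'Other'."""
    # Pad with spaces so " kw " only matches whole words or whole phrases
    text = " " + " ".join(re.findall(r"[a-z0-9]+", raw_track.lower())) + " "
    for bin_name, keywords in TRACK_BINS:
        if any(f" {kw} " in text for kw in keywords):
            return bin_name
    return "Other"


if __name__ == "__main__":
    for t in ["Power BI", "Enterprise Database Administration & Deployment",
              "BI Platform Architecture, Development & Administration",
              "Azure Data Platform", "Mobile Apps"]:
        print(t, "->", bin_track(t))
```

Ordering matters: “BI Platform Architecture, Development & Administration” contains both a BI keyword and an administration keyword, and listing BI first keeps it in the BI bin. Whole-word matching also stops short keywords like “bi” from matching inside words such as “mobile”.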

To date, there have been over 27,000 presentations of more than 16,000 distinct session titles. “Common TSQL Mistakes” has been presented at the most events so far: 43.
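The distinction between presentations and session titles can be made concrete with a small sketch. Assuming each presentation is a hypothetical `(event_id, title)` pair, the function below counts presentations, distinct titles (after collapsing whitespace and ignoring case, an assumed normalization), and the title presented at the most distinct events. Counting distinct events rather than rows means a title presented twice at the same event counts only once toward its event total.

```python
from collections import defaultdict


def session_summary(presentations):
    """Summarize (event_id, title) pairs, one pair per scheduled presentation.

    Returns (presentation count, distinct title count, top title, events for top title).
    """
    events_by_title = defaultdict(set)
    total = 0
    for event_id, title in presentations:
        key = " ".join(title.split()).lower()  # collapse whitespace, ignore case
        events_by_title[key].add(event_id)     # a set counts each event once
        total += 1
    if not events_by_title:
        return 0, 0, None, 0
    top_title, top_events = max(events_by_title.items(), key=lambda kv: len(kv[1]))
    return total, len(events_by_title), top_title, len(top_events)


if __name__ == "__main__":
    sample = [(1, "Common TSQL Mistakes"), (2, "Common TSQL  mistakes"),
              (2, "Intro to DAX"), (3, "Common TSQL Mistakes"), (3, "Intro to DAX")]
    print(session_summary(sample))  # (5, 2, 'common tsql mistakes', 3)
```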

Summary of all sessions for all SQL Saturday events to date.

I’m definitely biased, but to me, some of the most fun analysis is looking at which events and sessions I’ve attended over the years. My first event was in 2012, and I’ve attended 30 events so far in 14 different cities. I’ve yet to attend a SQL Saturday event outside the Southeast. I usually attend 3-4 events per year, focusing on BI events and BI tracks (for obvious reasons!).

Summary of all SQL Saturday events that I have attended to date.

At those events, I’ve attended 159 sessions (the actual number is a little higher, but a few sessions, for whatever reason, never made it into the official schedule data). Across those 159 sessions, I’ve spent over 170 hours learning! I’ve been fortunate enough to see 120 speakers, and I’m averaging 17.7 sessions per year!

Summary of all SQL Saturday sessions that I have attended to date.

I find much of this data fascinating, and I’ll probably continue to work with and enhance it over time, so stay tuned for possible future versions. Until then, my data, processes, and Power BI file are available via the links in the Resources section below.

Note that all of this work, including much of the report development, was completed prior to March 2020, when the COVID-19 pandemic started cancelling many events. Before then, the number of events per year was trending upward and cancellations were rare. Now, new events are rarely being scheduled, and existing events are either going “virtual” or cancelling. Because cancellations were so rare before the pandemic, the data doesn’t have a good way to denote whether an event actually took place, so many cancelled events may appear to have occurred.

Resources