Mapping the Truth: How One Group of Data Analysts Cleaned Up Dirty Data to Find Bigfoot


There came a point where the world just became too serious for Bill Troolin. Enrolled in the University of Minnesota Data Visualization and Analytics Boot Camp in the middle of a once-in-a-century pandemic, Bill and his fellow learners had, unsurprisingly, spent many months exploring COVID-19 data in all its depressing variety, from coronavirus death rates to the slump in happiness across different countries.

Eventually, though, Bill and his team wanted a break. “We decided to do something a little more lighthearted — but still treat it professionally,” he said. With that, Bill and three other trainee data analysts pivoted away from the coronavirus. In fact, they left the real world behind too. 

Their new project? Mapping every single one of America’s Bigfoot sightings on an interactive website. “I think it started out as a joke,” laughed Jasper Beachy, another member of the four-man team. Maybe so. But along the way, Bill, Jasper, and the others learned valuable lessons about code optimization, data analysis, and teamwork.

Dirty data

It started out as a bit of fun, but the team took their Bigfoot project seriously, beginning with gathering the relevant data. Bigfoot may not be as epoch-defining as the coronavirus, but that didn’t stop one enthusiast from compiling thousands of alleged sightings and posting them online. The team’s third member, Gonzalo Reusch, spent countless hours sorting through and cleaning up that data.

That wasn’t easy, Bill explained. “There were about 100,000 rows of dirty data that we had to manually sort and clean, getting rid of anything with missing data inputs or blank fields.” Then came putting the cleaned data onto a sleek, accessible map of the United States, which the team dubbed “Mapping the Truth.”
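The team describes sorting and cleaning those rows by hand. For readers curious how the same rule (drop anything with missing inputs or blank fields) might be automated, here is a minimal pandas sketch. The column names and sample rows are hypothetical and are not taken from the team’s dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical raw sighting records; column names are assumptions for illustration
raw = pd.DataFrame({
    "state": ["Washington", "Minnesota", "", "Ohio"],
    "date": ["1998-07-14", None, "2004-10-02", "2011-05-20"],
    "latitude": [47.6, 47.9, 44.0, np.nan],
    "longitude": [-121.9, -92.1, -120.5, -82.9],
})


def clean_sightings(df: pd.DataFrame) -> pd.DataFrame:
    """Drop any row with a missing value or a blank (whitespace-only) text field."""
    # Treat empty or whitespace-only strings as missing so dropna can catch them
    cleaned = df.replace(r"^\s*$", np.nan, regex=True)
    # Remove every row that still has at least one missing field
    return cleaned.dropna(how="any").reset_index(drop=True)


clean = clean_sightings(raw)
print(len(clean))  # 1
```

Only the first row survives: the others have a missing date, a blank state, or a missing latitude.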

Initially, the team had wanted to host their work on a SQL server — but as with so much else in this project, they wound up pivoting as plans changed. “Because there weren’t going to be any new inputs of sightings, we just went with JSON data,” explained Matthew Rud, the group’s fourth and final member. “We used Python, Pandas, and Jupyter Notebook to convert it all into JSONs.” 
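Because the data was already in pandas, the JSON export can be a single call. A minimal sketch, assuming a cleaned DataFrame with hypothetical columns and a placeholder filename; `orient="records"` writes one JSON object per sighting, a shape that front-end JavaScript can loop over directly.

```python
import json

import pandas as pd

# Hypothetical cleaned sightings table (columns and values are illustrative only)
sightings = pd.DataFrame({
    "state": ["Washington", "Oregon"],
    "date": ["1998-07-14", "2003-11-08"],
    "latitude": [47.6, 45.5],
    "longitude": [-121.9, -122.7],
})

# One JSON object per row; "sightings.json" is a placeholder filename
sightings.to_json("sightings.json", orient="records")

# Read it back to check the shape a web page would receive
with open("sightings.json", encoding="utf-8") as f:
    records = json.load(f)

print(records[0]["state"])  # Washington
```

The trade-off: records-oriented JSON repeats every column name in every object, so files grow quickly. `orient="split"` or `orient="values"` is more compact but needs a little more unpacking on the JavaScript side.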

There were other challenges, too. After initially trying to upload their work to GitHub, for instance, the team realized their files were too big. Git LFS (Large File Storage) didn’t work either, so the team finally split the files into parts small enough to upload comfortably.
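GitHub rejects individual files larger than 100 MB, which is the kind of limit that forces a split. The article doesn’t say exactly how the team divided their files; one simple approach, sketched here as an assumption, is to cap the number of records per JSON file. The function name, file-naming scheme, and chunk size are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def split_into_parts(records: list[dict], max_records: int, out_dir: Path,
                     stem: str = "sightings") -> list[Path]:
    """Write records into numbered JSON files, each holding at most max_records items."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    # Walk the list in fixed-size steps; the last part may be shorter
    for part, start in enumerate(range(0, len(records), max_records), start=1):
        path = out_dir / f"{stem}_part{part}.json"
        path.write_text(json.dumps(records[start:start + max_records]), encoding="utf-8")
        paths.append(path)
    return paths


# Usage: 10 toy records, at most 4 per file, gives parts of 4, 4 and 2
records = [{"id": i, "state": "Washington"} for i in range(10)]
out_dir = Path(tempfile.mkdtemp())
paths = split_into_parts(records, max_records=4, out_dir=out_dir)

# Reassemble to confirm nothing was lost or reordered
merged = [r for p in paths for r in json.loads(p.read_text(encoding="utf-8"))]
print(len(paths), merged == records)  # 3 True
```

Splitting by record count keeps each part valid JSON on its own, unlike cutting a file at arbitrary byte offsets, so a web page can fetch the parts separately and concatenate them.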

In the end, one thing was clear: Though Mapping the Truth started off as a bit of fun, it taught everyone genuinely valuable lessons about data and coding. “The project gave us technical material that we can show employers,” emphasized Matthew. “It highlights a lot of CSS formatting and JavaScript.”

Beyond Bigfoot

Beyond technical knowledge, the classmates learned another important lesson: teamwork. Throughout the project, each member focused on their own strengths, with Gonzalo cleaning the data, Bill handling the mapping, Jasper sharpening the UI, and Matthew adding datasets to their model.

And though they’ve already learned a lot, from their data and from each other, the team isn’t letting up. As Gonzalo put it, “I think that we just want to keep playing and having fun.” Look at the data, and his enthusiasm makes sense. Beyond the basics of where and when a Bigfoot aficionado spotted the creature, the map also shares many amusing anecdotes.

On one occasion, for instance, some campers in northern Minnesota reported seeing Bigfoot amble right past their window. Gonzalo joked that it would be fun to correlate implausible sightings like this with local substance consumption, but he touched on a more serious point, too.

After all, with so much information to play with, the team has dozens of potential investigations up its sleeve. That’s especially true since they added recent UFO sightings to the mix. Comparing the locations of Bigfoot sightings with those of UFO sightings, they hit upon an interesting pattern: Bigfoot sightings were typically reported in dark, rural areas, while UFOs were usually spotted in brightly lit towns and cities.
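The article doesn’t describe how the comparison was run. One way to express it, assuming each sighting has already been labeled urban or rural (for example by joining its coordinates to population-density data, a step not shown here), is a normalized cross-tabulation. The rows below are made-up toy data for illustration, not the team’s results.

```python
import pandas as pd

# Toy combined table; the "setting" label is a hypothetical precomputed column
sightings = pd.DataFrame({
    "kind": ["bigfoot", "bigfoot", "bigfoot", "ufo", "ufo", "ufo", "ufo"],
    "setting": ["rural", "rural", "urban", "urban", "urban", "urban", "rural"],
})

# Share of each sighting type by setting; normalize="index" makes each row sum to 1
shares = pd.crosstab(sightings["kind"], sightings["setting"], normalize="index")
print(shares)
```

Normalizing by row compares the two sighting types on equal footing even when one type has far more reports than the other.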

I want to believe…

Instinctively, this makes sense. You can easily imagine some overenthusiastic Trekkie mistaking a flickering street light for a visitor from a distant galaxy. More seriously, though, Gonzalo said that his experiences have given him a deep appreciation of data and the enormous insights it can offer. “Now I can turn on the news during election time and understand how all that information is gathered.”

Though Gonzalo and the others began their project to escape reality, it’s clear that they walked away with genuine insights about the real world in all its complexity.

As for Bigfoot himself? After plugging in the data, the team found that sightings are most common in Washington State. So next time you’re in the area, keep your wits about you — unless you’re in Seattle, in which case you’re probably looking at a street light.
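Finding the state with the most sightings is essentially a frequency count. A minimal pandas sketch, assuming a hypothetical `state` column; the rows are toy data, not the real dataset.

```python
import pandas as pd

# Toy records for illustration only
sightings = pd.DataFrame({
    "state": ["Washington", "Ohio", "Washington", "California", "Washington", "Ohio"],
})

# value_counts tallies rows per state; idxmax returns the label with the highest count
counts = sightings["state"].value_counts()
top_state = counts.idxmax()
print(top_state, counts[top_state])  # Washington 3
```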

Want to learn the truth about what’s really out there — or just sharpen your tech skills? Get started with University of Minnesota Boot Camps in coding, data visualization and analytics, UX/UI, and cybersecurity.
