February 2, 2021
I've been on a journey into big data for about 8 months now and am finally beginning to work with some really fun datasets, big datasets, exploring them visually, looking for trends and finding stories in the data. I found a pretty mysterious story and wanted to share it with you today. This is my very first post discussing data (all my posts before this were about photography), so bear with me, I'm new to this whole thing.
In today's post I'm going to explore a massive dataset through the visualization tool kibana. The massive dataset I will explore is New York City 311 Service Requests from 2010 - 2021. This dataset is updated daily as part of New York City Open Data. I downloaded the dataset on December 29th, 2020 as part of an assignment for my class at York University's Certificate in Advance Data Science and Predictive Analytics.
The purpose of my class assignment was to create a .config file to upload the .csv file into kibana using elasticsearch and logstash on a virtual machine in Google Cloud Platform. Don't worry if that sentence doesn't make any sense to you, it's just a thing I needed to say. The assignment then asked us to explore the data visually, specifically using heat maps and tag clouds (kind of like a word cloud), and finally, create a dashboard. The .config file wasn't easy, especially as this was my first time using geotags, but I was ultimately successful. With nearly 25 million rows, the upload took over 5 hours, but the wait was worth it and it's been a blast exploring the data.
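As a rough illustration of what the geotag step involves (this is not the config from the assignment): a map in kibana can only plot a field that elasticsearch knows is a geo point, so the separate latitude and longitude values from the .csv have to be combined into one field. The Python below is a minimal sketch of that idea. The column names `latitude` and `longitude`, the target field name `location`, and the coordinates in the sample row are all assumptions made up for the example.

```python
def to_geo_point(row):
    """Return a copy of row with a 'location' field shaped like a geo point.

    Rows with missing or unparseable coordinates come back without 'location'.
    """
    doc = dict(row)
    try:
        lat = float(row.get("latitude", ""))
        lon = float(row.get("longitude", ""))
    except (TypeError, ValueError):
        return doc  # empty or malformed coordinates: leave the row as it is
    # Only accept values that are valid latitude/longitude ranges.
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        doc["location"] = {"lat": lat, "lon": lon}
    return doc


# Made-up sample row, only to show the shape of the input and output.
sample = {"unique_key": "1", "latitude": "40.8296", "longitude": "-73.8660"}
print(to_geo_point(sample)["location"])
```

Skipping rows with empty coordinates matters here, since many requests in the dataset have blank fields.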
For the record, this post has nothing to do with my actual assignment. This is just a fun exercise in data story telling about the loudest talker in New York City.
Before we begin, a bit about the data. As mentioned, the dataset has nearly 25 million rows representing 25 million 311 service requests from 2010 to 2021 in New York City. In case you don't have 311 in your city, 311 is the number you would call (though you can also make the request online and through a mobile app) to report things like a streetlight being out, a loud neighbour or your driveway being blocked. The dataset has 41 columns, such as created date, complaint type, location type, street name, resolution description, borough... etc. There are a lot of empty cells, as many columns don't apply to particular requests, but each service request has a unique ID. I did not do any cleaning of the data.
Here is what the data looks like in the discover page in kibana (not all columns are shown as there is a fixed space for each row).
As fun as it was to have 25 million requests from all five boroughs of New York City at my disposal, I wanted to get a little more local and specific and decided to just look at the Bronx. Filtering for just requests from the Bronx still leaves you with 4,456,754 requests.
The first big question I was curious about is what are people actually using 311 for? There are two columns that tell this information most clearly, the "complaint_type" column and the "descriptor" column. The complaint column covers the big picture and has inputs like Water System, PLUMBING and ELECTRIC. The five most popular complaints in the Bronx specifically are Noise - Residential, HEAT/HOT WATER, HEATING, PLUMBING and Blocked Driveway (I'm leaving the capitalization as it appears in the data). You can see the top 15 complaint types below. (As these are screenshots from kibana, I didn't have a lot of control over the size of the font below so it's hard to read, but I repeat the results in the caption.)
Top 15 complaint types in the Bronx 2010 - 2021
Noise - Residential, HEAT/HOT WATER, HEATING, PLUMBING, Blocked Driveway, UNSANITARY CONDITION, GENERAL CONSTRUCTION, Noise - Street/Sidewalk, Illegal Parking, PAINT/PLASTER, Water System, PAINT - PLASTER, Street Condition, ELECTRIC, Street Light Condition
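For anyone who would rather see the idea behind that chart than the screenshot: a "top 15" bar chart in kibana boils down to a terms aggregation in elasticsearch, which counts how often each value of a field appears. Here is a hedged sketch of what such a request body could look like, built as a Python dictionary. The field name `complaint_type` comes from the dataset, but the `borough` field name, the value `BRONX`, and the assumption that both fields are indexed as exact-match keyword fields are mine.

```python
import json


def top_terms_query(field, borough, size=15):
    """Build a search body that counts the most frequent values of `field` in one borough."""
    return {
        "size": 0,  # return only the aggregation buckets, no individual documents
        "query": {"bool": {"filter": [{"term": {"borough": borough}}]}},
        "aggs": {"top_values": {"terms": {"field": field, "size": size}}},
    }


body = top_terms_query("complaint_type", "BRONX")
print(json.dumps(body, indent=2))
```

Depending on how the index was mapped, the field may need to be written as `complaint_type.keyword` instead.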
The other column that helps to understand the complaint a little better is the descriptor column. This column seems to expand on a more specific aspect of the complaint. For example, it breaks down the noise complaint into Loud Music/Party, Banging/Pounding, Loud Talking or Car/Truck Music. There are also more general terms like ENTIRE BUILDING and APARTMENT ONLY, which clarify how localized the issue is. The top five descriptors in terms of number of requests are Loud Music/Party, ENTIRE BUILDING, HEAT, Banging/Pounding, APARTMENT ONLY. You can see the top 15 descriptors for the Bronx from 2010-2021 below.
Top 15 descriptor types for the Bronx from 2010-2021
Loud Music/Party, ENTIRE BUILDING, HEAT, Banging/Pounding, APARTMENT ONLY, No Access, CEILING, MOLD, FLOOR, PESTS, Street Light Out, Pothole, WALL, Loud Talking, Car/Truck Music
So that's just a little introduction to the data and will help you understand what the most common complaints and requests are for 311 in New York City. The next exploration I wanted to do was where the requests were coming from using the geohash coordinate map feature in kibana. Having never used it before, I was pretty excited to explore the data visually on a map. What would be revealed!?! Let's take a look.
Below is a heat map of all the 311 requests in the Bronx from 2010-2021. My guess would be that this heat map represents pretty well the density of the city. That would be an interesting avenue to pursue someday, but not today. So there we have it, a pretty good look at where 311 requests are coming from in the Bronx.
With some basic filtering we can start to see where specific complaints are most prevalent. Below are very different heat maps, one filtered for potholes (left), the other for street light out (right). Here we can see that the west end of the borough has a lot more problems with their potholes, while the east seems to be struggling with their street lights. Is this information useful? Well, yes. A city could allocate resources for specific tasks in different parts of the city to make it more convenient or to save on transportation. Regardless, you can see these heat maps are spread out through the city, though in slightly different ways.
Filtered for Pothole
Filtered for Street Light Out
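Under the hood, a filtered heat map like these two is a count of requests per map cell. In elasticsearch that is a geohash grid aggregation, which groups geo points into cells of a chosen precision. The sketch below builds such a request body in Python. The geo point field name `location`, the `borough` filter, and the precision of 6 are assumptions for illustration; only the descriptor values come from the data.

```python
def heatmap_query(descriptor, precision=6):
    """Build a search body that counts Bronx requests per geohash cell for one descriptor."""
    if not 1 <= precision <= 12:
        raise ValueError("geohash precision must be between 1 and 12")
    return {
        "size": 0,  # only the grid cells are needed, not the documents
        "query": {
            "bool": {
                "filter": [
                    {"term": {"borough": "BRONX"}},
                    {"term": {"descriptor": descriptor}},
                ]
            }
        },
        "aggs": {
            "cells": {"geohash_grid": {"field": "location", "precision": precision}}
        },
    }


pothole_body = heatmap_query("Pothole")
street_light_body = heatmap_query("Street Light Out")
print(pothole_body["aggs"]["cells"])
```

A higher precision gives smaller cells, which makes very concentrated hot spots easier to tell apart from broad coverage.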
In complete contrast, take a look at the heat maps when filtering for loud talking (left) and loud music/party (right). Here we can see super concentrations in very specific locations. Perhaps some repeat offenders? Perhaps a very sensitive neighbour who just likes to complain a lot? It's hard to tell.
As for the loud music/party complaint being so concentrated, there is clearly someone who likes to party pretty hard in the northern part of the Bronx. Could be worth digging into at some point. But honestly, can we talk about loud talking? In a city of loud talkers, I couldn't get over the fact there would be that many complaints for loud talking, and on top of that, I couldn't get over the fact that it was so hyper focused on one part of the Bronx. I had to dig into this deeper to see what the heck was going on! Did we just find the loudest talker in the Bronx?
Filtered for Loud Talking
Filtered for Loud Music/Party
The major advantage of having almost 11 years of timestamped historical data is that you can look to see if there are any specific trends. Below is a histogram of all the 311 service requests in the Bronx specifically for loud talking. The combined total is 60,530 over the 11 year period. As expected, you can see the summer months have high peaks and many fewer calls in the winter. You can also see that covid-19 may have had something to do with the sharp spike in 2020, but I'm hoping to do another post on the pandemic, so we can look at that later. As informative as this histogram is, it still doesn't tell us anything about that specific hot zone in the heat map.
Histogram of complaint loud talking in the Bronx, 2010-2021
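The histogram is just a count of requests per time bucket. As a minimal sketch of the same bucketing outside of kibana, the Python below groups timestamps by calendar month. The timestamp format is an assumption about how the created date appears in the .csv export, and the three sample values are made up.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(created_dates, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Count requests per calendar month, returned as {'YYYY-MM': count} in date order."""
    counts = Counter(
        datetime.strptime(stamp, fmt).strftime("%Y-%m") for stamp in created_dates
    )
    return dict(sorted(counts.items()))


# Made-up timestamps, only to show the bucketing.
sample_dates = [
    "07/04/2019 11:15:00 PM",
    "07/20/2019 01:30:00 AM",
    "01/12/2020 09:45:00 PM",
]
print(monthly_counts(sample_dates))  # {'2019-07': 2, '2020-01': 1}
```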
To help with finding the specific locations of the complaints, it made sense to count the complaints for loud talking for specific street names and see what was the most popular. The dataset has a column entitled "street_name" and I used that to filter. When you count the 311 requests for loud talking and filter by street, Beach Avenue had 7,910 complaint requests. When you look closer at the heat zone, Beach Avenue is exactly where the bright red dot is, so we definitely found our spot. The street with the next highest number of complaints for loud talking was Cruger Avenue with 1,554. That's a pretty huge drop. That means one street had 5x more 311 requests for loud talking than any other street in the Bronx. This must be one loud New Yorker.
Count of Loud Talking by Street Name, Bronx, 2010-2021
BEACH AVENUE, CRUGER AVENUE, GRAND CONCOURSE, PROSPECT AVENUE, HOE AVENUE
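To make the counting step concrete, here is a small Python sketch of "filter for loud talking, then count by street". The four rows are made up to show the shape of the data; only the two totals at the end (7,910 and 1,554) are the real numbers from the chart, used to check the "5x" claim.

```python
from collections import Counter


def count_by_street(rows, descriptor="Loud Talking"):
    """Count rows matching `descriptor`, grouped by street name, most frequent first."""
    streets = (r["street_name"] for r in rows if r.get("descriptor") == descriptor)
    return Counter(streets).most_common()


# Made-up rows, only to show the shape of the data.
sample_rows = [
    {"street_name": "BEACH AVENUE", "descriptor": "Loud Talking"},
    {"street_name": "BEACH AVENUE", "descriptor": "Loud Talking"},
    {"street_name": "CRUGER AVENUE", "descriptor": "Loud Talking"},
    {"street_name": "BEACH AVENUE", "descriptor": "Pothole"},
]
print(count_by_street(sample_rows))  # [('BEACH AVENUE', 2), ('CRUGER AVENUE', 1)]

# The two real totals quoted above: Beach Avenue vs. Cruger Avenue.
ratio = 7910 / 1554
print(round(ratio, 1))  # 5.1
```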
Beach Avenue has so many more 311 complaints for loud talking than any other street that when you exclude it from the data, the heat map doesn't look anything like the original and resembles a lot more closely what the heat map looks like for all the service requests in the Bronx. So what is going on at Beach Avenue? Does New York's loudest talker live on Beach Avenue?
Filter for loud talking excluding Beach Avenue, Bronx, 2010-2021
Next, I decided to take a look at the histogram again, this time filtering for the three streets with the most complaints for loud talking each month. They say if you have to explain a chart, then it's not a good chart, and yes, this is a messy chart below, but it tells us a lot. Let me explain it to you. Think of it this way, the chart plots three dots each month showing the three streets that made the most requests complaining about a loud talker in the Bronx. If that street is a repeat offender (gets into the top three in another month) there is a line to its next dot. The 5 most popular streets (from the chart above) I have highlighted in brighter colours. Beach Avenue, our mysteriously popular loud talking epicentre, is highlighted in red.
Histogram of top 3 streets per month with complaint requests for loud talking - Bronx - 2010-2021
Histogram of Beach Avenue complaint requests for loud talking - Bronx - 2010-2021
I really wasn't expecting that. What we have now learned is that the vast majority of the complaints on Beach Avenue have been made since 2015, and hardly any before that. What changed in 2015? It would make sense to look at a heat map of complaints before 2015. Below is a heat map (right) and the top five streets (left) for complaints about loud talking before 2015. From 2010 to 2015, Cruger Avenue had the most complaints about loud talking with close to 75% more complaints than the next street, Prospect Avenue. Beach Avenue came in third.
Count by street for loud talking from 2010 - 2015, Bronx
CRUGER AVENUE, PROSPECT AVENUE, BEACH AVENUE, GATES PLACE, TOWNSEND AVENUE
Filtered for loud talking 2010 - 2015, Bronx
Not surprisingly, in contrast, when you look at the complaints for loud talking since 2015, Beach Avenue towers over all the other streets in the Bronx, accounting for nearly 7x the next highest street.
Count by street for loud talking from 2015 - 2021
BEACH AVENUE, GRAND CONCOURSE, HOE AVENUE, ANDERSON AVENUE, PROSPECT AVENUE
Filtered for loud talking 2015 - 2021, Bronx
Is it possible that the loudest talker moved from Cruger Avenue to Beach Avenue in late 2014? Is this data more about the talker or more about the complainer? Was it the complainer that moved instead? Is this data not actually about the loudest talker but the person with the most sensitive ears? So far, it's hard to tell. Let's explore some more.
Keep in mind, while I'm just exploring the data lightheartedly (no, I don't really think the loudest talker in NYC lives on Beach Avenue), each one of the requests to 311 has to have a response in some capacity. As for noise complaints, the department responsible for the resolution of the complaint is the NYPD. The dataset has a column called "resolution_description" and has a preset number of results for how the NYPD responded to the complaint. Here are all the resolutions to all the 311 requests complaining of loud talking in the Bronx for the 11 years of the data.
Top 10 resolutions for 311 services requests complaining about loud talking in the Bronx 2010-2021.
Here we can see that the majority of the time the police responded to a complaint for loud talking, there was no action required. No action was stated in various ways: it could be "no evidence of the violation" (top row), "police action was not necessary" (third row), or "those responsible for the condition were gone" (fourth row). Alternatively, police could take "action to fix the condition" (second row). This I would classify as action needed.
To determine if the loudest talker in the Bronx is actually on Beach Avenue, we should take a look at the police response to loud talking across all the loud talking complaints in the Bronx compared to just the responses on Beach Avenue. If the police had to take action on Beach Avenue in a higher percentage of the responses, it might suggest the loudest talker lives on Beach Avenue. As it's always good to visualize the data, I've compared the police response to loud talking across all of the Bronx to just Beach Avenue. I've divided the top five responses into two categories, action was needed or action was not needed. Where action was needed, "The Police Department responded to the complaint and took action to fix the condition" is highlighted in red for each chart. All the other responses required no action or a neutral reaction and are in green/blue colour (neutral reaction is the response "The Police Department reviewed your complaint and provided additional information below").
Police response to all loud talking complaints in Bronx, 2010-2021. Red bar is where police needed to take action.
Police response to loud talking complaints on Beach Avenue, 2010-2021. Red bar is where police needed to take action.
Here we can see that the police had to "take action to fix the condition" for loud talking a lot more often across all the calls in the Bronx than on Beach Avenue. When you run the numbers, police "took action to fix the condition" on 23% of all the calls in the Bronx for loud talking, compared to only 3.4% of the complaints about loud talking on Beach Avenue. This suggests we might not have the loudest talker in the Bronx on Beach Avenue, as you would expect the police to have to take more action on Beach Avenue if that was the case.
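"Running the numbers" here means dividing the responses where police took action by all the responses. The Python below is a minimal sketch of that calculation. The sample list uses shortened stand-ins for the resolution text and is made up; only the two percentages at the end (23% and 3.4%) are the real figures from the charts.

```python
ACTION_PHRASE = "took action to fix the condition"


def action_rate(resolutions):
    """Fraction of resolution descriptions in which police took action."""
    if not resolutions:
        return 0.0
    hits = sum(1 for text in resolutions if ACTION_PHRASE in text)
    return hits / len(resolutions)


# Shortened, made-up stand-ins for the resolution_description text.
sample_resolutions = [
    "responded to the complaint and took action to fix the condition",
    "police action was not necessary",
    "no evidence of the violation",
    "those responsible for the condition were gone",
]
print(action_rate(sample_resolutions))  # 0.25

# The two real percentages quoted above: all of the Bronx vs. Beach Avenue.
print(round(23 / 3.4, 1))  # 6.8
```

So by these figures, police took action roughly seven times as often across the Bronx as a whole as they did on Beach Avenue.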
Maybe the person with the most sensitive ears lives on Beach Avenue and it's more about the complainer than the loud talker? Let's explore some more to see what else we can find out about Beach Avenue.
Another major discrepancy between Beach Avenue and the rest of the Bronx complaints about loud talking was the location type. This column in the dataset, "location_type", describes the type of location where the complaint is taking place. For all the complaints about loud talking in the Bronx, the most common location types were residential building/house and street/sidewalk. A distant third was store/commercial. For Beach Avenue, the number one location type was store/commercial, followed by street/sidewalk, and a distant third was residential building/house. Below are tag clouds and counts for the top 5 location types for complaints for loud talking in the Bronx as a whole (left), and on Beach Avenue only (right).
Tag cloud of location type for loud talking, all Bronx.
Tag cloud of location type for loud talking, Beach Avenue in Bronx only.
Count of location type for loud talking, all of the Bronx.
Count of location type for loud talking, Beach Avenue in Bronx only.
Beach Avenue accounts for over 80% of all the complaints for loud talking at the location type store/commercial in the Bronx. Of course I did a Google Street View look at the neighbourhood and there was nothing that looked different than any other corner in that part of the Bronx. Is it possible there is a store owner with the loudest voice in New York? Maybe, but police responses don't suggest that the complaints are particularly warranted. Is it possible someone living above or near a business has a lot of young kids trying to nap? Perhaps. There is a story here, but New York is full of great stories.
In our last exploration of loud talking in the Bronx, let us look at the ways the complainer reached out to 311. The three ways to make a service request to 311 in New York City are through calling the number 311, online or through a mobile app. In our dataset, there is a column called "open_data_channel" and each row has one of the three different ways you can make a service request. This column turns out to be very insightful, and might be the most insightful for our mystery of the loud talker.
Let us begin by examining how people who want to complain about loud talking in the Bronx have connected with 311 in the past 11 years. The best way to do this is through a histogram. This is the same histogram we saw above, but it breaks down all the complaints by the way they reached 311. Here you can see that the ability to make a service request online was widely released sometime in 2011 and the mobile app sometime in 2012, and both have grown in use over the years. You can also see that actual phone calls to 311 have decreased over the years.
Histogram of 311 service request channels, Bronx, 2010-2021
It's clear the complaints about loud talking across the Bronx show a slow but steady increase of mobile app usage over the 11 years. In contrast, when we look at the ways people have connected to 311 complaining about loud talking on Beach Avenue, there is a sudden and immediate jump from online requests over the internet to mobile requests on an app.
Histogram of 311 service request channels, Beach Avenue, 2010-2021
For me, this chart solves the mystery of the loudest talker. That person exists, but they can't be found in the data. What the data shows instead, most likely, is a person or household with the most sensitive ears and that this person is most likely to reach out to 311 about a regular talking New Yorker that lives near them.
How do I come to this conclusion? Clearly the distribution of ways people connect with 311 on Beach Avenue in no way reflects the rest of the Bronx. The almost immediate change from online to mobile service requests for loud talking suggests that the bulk of the complaints are weighted by one person or household. The idea that everyone on this street chose the same month to download the app to complain instead of doing it online is pretty unlikely. This might be a case of really good targeted advertising for the mobile app in this neighbourhood, worth a look for marketers, but again, it's highly unlikely.
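The reasoning above can be sketched in a few lines of Python: find the most-used channel in each month, then look for months where that dominant channel flips. A slow, borough-wide shift would show a long overlap, while a single heavy user switching would show up as one clean flip. The month labels, the channel names, and every row below are made up for illustration and are not taken from the dataset.

```python
from collections import Counter, defaultdict


def dominant_channel_by_month(rows):
    """Map each month label to the channel with the most requests in that month."""
    per_month = defaultdict(Counter)
    for month, channel in rows:
        per_month[month][channel] += 1
    return {m: c.most_common(1)[0][0] for m, c in sorted(per_month.items())}


def switch_months(dominant):
    """Return the months where the dominant channel differs from the previous month."""
    months = list(dominant)
    return [m for prev, m in zip(months, months[1:]) if dominant[prev] != dominant[m]]


# Made-up (month, channel) pairs, only to show the pattern of an abrupt switch.
sample_requests = [
    ("month-01", "ONLINE"), ("month-01", "ONLINE"), ("month-01", "PHONE"),
    ("month-02", "ONLINE"), ("month-02", "ONLINE"),
    ("month-03", "MOBILE"), ("month-03", "MOBILE"), ("month-03", "ONLINE"),
    ("month-04", "MOBILE"),
]
dominant = dominant_channel_by_month(sample_requests)
print(dominant)
print(switch_months(dominant))  # ['month-03']
```

This only flags where the leading channel changes. It says nothing about who is making the requests, so it supports the one-household idea without proving it.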
And finally, there is a chance that this very same person or household that has complained so much on Beach Avenue since 2015 lived on Cruger Avenue prior to 2015.
Histogram of 311 service request channels, Cruger Avenue, 2010-2021
In conclusion, exploring data visually is super insightful. Using the heat map, we recognized a super concentration of complaints for loud talking at one particular part of the Bronx. The bar chart showed us which street had the most complaints, Beach Avenue, and the histogram showed us that most of these complaints were from 2015 onward. A dig into police responses to the complaints shows us that action is required way more often at calls to any other street in the Bronx, suggesting Beach Avenue doesn't have the loudest talker living or working there. We also revealed that the vast majority of the complaints on Beach Avenue are made about a store or business, unlike almost all other complaints about loud talking on other streets in the Bronx. And finally we can pretty much deduce that the complaints on Beach Avenue are being made by one person or household, as the means by which complaints were made to 311 shifted almost entirely from one mode to another in a matter of one month in 2017.
Is this information helpful? Probably not. Does it tell a story? Certainly, and the screenplay could be worth a lot! Pick a genre. Romantic comedy? It's a story about a store owner who is in love with a beat cop and keeps complaining about a loud talker to get them to visit their store. Horror film? Someone is being haunted by a former patron of a local deli who roams the store yelling their last order, which was made incorrectly and cost them their life for some reason. Drama? A deep feud between a store owner and the neighbours upstairs begins when their children fall deeply in love and both parents resent the other family. I could go on, but I will spare you. Regardless, a story is here and I welcome anyone to tell it.
As for the police, I think they could increase patrols on this particular street, as they are likely going to end up on Beach Avenue eventually. Ideally, they might try to reach out to the person complaining so much to see if there is a way to resolve the underlying issue. It's not likely just someone talking loudly.
And finally, I stepped back and looked at all the complaints for loud talking in all of the 311 service requests for all five boroughs of NYC, and to my surprise, Beach Avenue still accounts for the most complaints. So it's official, this was a search for the loudest talker in all of New York City.
Loud Talking complaints to 311, New York City, 2010-2021
Elasticsearch is fast. I had been told it was fast, but working with my largest dataset ever by many millions of rows, this blew me away.
I say 11 years of data, but it is actually three days short as I downloaded the file on December 29th, 2020.
Once I started to dig into this a little deeper, I started having some uncomfortable feelings about privacy. I didn't entirely feel like what I was doing was wrong, this is an open dataset, but it was weird to drill down into an actual street in the city, a very specific intersection. Once I realized it is likely one person or household, that got even weirder. Still, the thing I kept telling myself is that there is a story here, a super cranky neighbour, a family with lots of kids who nap all the time, a particularly loud patron of a store, and I would love to know what it is. New York is filled with great stories and I'm sure there is a pretty interesting one here.
From what I could tell, I couldn't find any other deep data exploration of this particular aspect of the NYC 311 service request data. Would love to see any posts on it if they do exist.
Writing these posts takes way longer than I expected. I started writing and had to return to the data multiple times. Mad respect to all the people who make posts like this all the time, I now have a greater appreciation for how challenging this is.
Google Cloud Platform is really expensive. I returned to this data multiple times using their virtual machine and my dedicated cluster and I got a big bill for all the data usage. It's a rookie mistake, but be warned.
As I need to move on to more posts and more assignments from class, I am not going to go back and fix a few things I would likely change if I were to do it again. The main thing would be being more consistent with colour throughout the whole post.
All the visualizations in this post are screenshots from kibana, sometimes aligned or adjusted in photoshop. Kibana is amazing in real-time, for example, if you roll your cursor over each bar in a chart or different sections of a heat map, the actual numbers and more detail are given. The same goes for creating a dashboard. The interactive part of the charts and maps is most helpful, but not something I was able to demonstrate here in this post.
This assignment was a part of York University School of Continuing Studies Certificate in Advance Data Science and Predictive Analytics class.