Denver Crime Data, Part 1

I joined DataforDemocracy.org in part out of a sense of civic duty, and in part to find some interesting data projects to work on. The project that caught my fancy was helping to build a dashboard for the USA. From the project page at GitHub:

We’re building a dashboard of key metrics for the USA because: If you can’t measure it, you can’t manage it. When complete, you’ll be able to see how well the country is doing along a number of metrics at a glance. We strive to paint an objective, centralized picture of what’s currently going on, with very timely updates. We hope that by making this information easily visible and available that we can come to a collective understanding about whether the country is thriving or not.

https://github.com/Data4Democracy/usa-dashboard

As I got started, I realized that it was a great opportunity to put to use the different Python tools that I have been adding to my toolbox: web scraping, database ETL, data visualization, and data analysis. With this in mind, I have set out to create a dashboard for crime in the Mile High City. I hope this can ultimately be adapted for other cities and possibly even help citizens understand what is happening with crime where they live. Lofty goals, I know.

Fortunately, Denver has a decent data infrastructure, so it is pretty easy to access crime data; the data is even updated throughout the week.

The Denver Police Department strives to make crime data as accurate as possible, but there is no avoiding the introduction of errors into this process, which relies on data furnished by many people and that cannot always be verified. Data on this site are updated Monday through Friday, adding new incidents and updating existing data with information gathered through the investigative process.

https://www.denvergov.org/opendata/dataset/city-and-county-of-denver-crime

I started my project in JupyterLab, parsing the data to see what I could do with it. It is a fairly manageable file (113 MB) that covers five years' worth of crimes in the Denver Metro.

import numpy as np
import pandas as pd

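crime_csv_url = 'https://www.denvergov.org/media/gis/DataCatalog/crime/csv/crime.csv'

# Load the full crime file into a DataFrame
df = pd.read_csv(crime_csv_url)

# Parse the occurrence timestamp so the .dt accessor works below
df['FIRST_OCCURRENCE_DATE'] = pd.to_datetime(df['FIRST_OCCURRENCE_DATE'])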

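# Keep only the calendar date so incidents can be counted per day
df['DATE'] = df['FIRST_OCCURRENCE_DATE'].dt.date

# Count incidents per day, neighborhood, offense category, and offense type
df_offense_type = df.groupby(['DATE', 'NEIGHBORHOOD_ID',
   'OFFENSE_CATEGORY_ID', 'OFFENSE_TYPE_ID'])\
   .agg({'INCIDENT_ID': 'count'}).reset_index()

# Pivot to one row per day and one column per neighborhood
df_neighborhood = df_offense_type.pivot_table(index='DATE',
   values='INCIDENT_ID', columns='NEIGHBORHOOD_ID',
   aggfunc=np.sum, fill_value=0)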
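To make the shape of that pivoted table concrete, here is a minimal sketch that runs the same group-and-pivot steps on four made-up rows instead of the real download. Only the column names come from the code above; the sample values and the helper name daily_counts_by_neighborhood are my own assumptions for illustration.

```python
import io
import pandas as pd

# Made-up sample rows; only the column names match the real file.
sample_csv = io.StringIO(
    "INCIDENT_ID,FIRST_OCCURRENCE_DATE,NEIGHBORHOOD_ID,"
    "OFFENSE_CATEGORY_ID,OFFENSE_TYPE_ID\n"
    "1,2018-01-01 08:15:00,capitol-hill,larceny,theft-other\n"
    "2,2018-01-01 21:40:00,capitol-hill,larceny,theft-other\n"
    "3,2018-01-01 23:05:00,five-points,auto-theft,theft-of-motor-vehicle\n"
    "4,2018-01-02 02:30:00,five-points,larceny,theft-other\n"
)


def daily_counts_by_neighborhood(csv_source):
    """Hypothetical helper: daily incident counts, one column per neighborhood."""
    df = pd.read_csv(csv_source, parse_dates=['FIRST_OCCURRENCE_DATE'])
    df['DATE'] = df['FIRST_OCCURRENCE_DATE'].dt.date

    # Count incidents per day, neighborhood, category, and offense type
    counts = (
        df.groupby(['DATE', 'NEIGHBORHOOD_ID',
                    'OFFENSE_CATEGORY_ID', 'OFFENSE_TYPE_ID'])
          .agg({'INCIDENT_ID': 'count'})
          .reset_index()
    )

    # Sum the counts into one row per day and one column per neighborhood
    return counts.pivot_table(index='DATE', values='INCIDENT_ID',
                              columns='NEIGHBORHOOD_ID',
                              aggfunc='sum', fill_value=0)


toy = daily_counts_by_neighborhood(sample_csv)
print(toy)
```

Each row is a day and each column is a neighborhood. The made-up capitol-hill column has no incidents on the second day, so fill_value puts a 0 there rather than leaving a gap.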

This project seems like a great opportunity to use plot.ly chart tools, something I have wanted to do for a while. After a little playing around, I was able to generate a time series chart with range sliders.

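# Note: in plotly 4 and later, this online plotting module moved to the
# separate chart_studio package (import chart_studio.plotly as py)
import plotly.plotly as py
import plotly.graph_objs as go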

# Build one stacked area trace per neighborhood column
trace_names = list(df_neighborhood.columns)
traces = []
for trace in trace_names:
    traces.append(go.Scatter(x=df_neighborhood.index,
                             y=df_neighborhood[trace].values,
                             name=trace, stackgroup='A'))

layout = dict(
    title='Time Series with Rangeslider',
    xaxis=dict(
        rangeselector=dict(
            buttons=list([
                dict(count=1,
                     label='1m',
                     step='month',
                     stepmode='backward'),
                dict(count=6,
                     label='6m',
                     step='month',
                     stepmode='backward'),
                dict(step='all')
            ])
        ),
        rangeslider=dict(
            visible = True
        ),
        type='date'
    )
)

fig = dict(data=traces, layout=layout)
py.iplot(fig, filename = "Time Series with Rangeslider")

With this much data, the mouse-over is completely unwieldy unless you switch the hover setting from the comparison option to ‘display the closest data’. That setting, along with other chart options, can be accessed in the upper right corner of the chart. If you double-click on a neighborhood in the legend, it will isolate the data for that neighborhood. The values are the number of crimes on a given day, by neighborhood. The bottom bar is a range slider: if you drag the handles on either side, it narrows the range of data displayed in the larger chart.
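Rather than changing the hover setting by hand each time, it should be possible to make it the default through the layout's hovermode attribute. The sketch below rebuilds the same layout as a plain dict with hovermode set to 'closest'. The function name build_layout is a hypothetical helper of mine, and I have not checked how this renders against the full dataset.

```python
def build_layout(title, hovermode='closest'):
    """Hypothetical helper: plain-dict Plotly layout with a range slider.

    hovermode='closest' shows only the nearest point on mouse-over
    instead of comparing every trace at that x position.
    """
    return dict(
        title=title,
        hovermode=hovermode,
        xaxis=dict(
            rangeselector=dict(
                buttons=[
                    dict(count=1, label='1m', step='month',
                         stepmode='backward'),
                    dict(count=6, label='6m', step='month',
                         stepmode='backward'),
                    dict(step='all'),
                ]
            ),
            rangeslider=dict(visible=True),
            type='date',
        ),
    )


layout = build_layout('Time Series with Rangeslider')
```

The returned dict can be dropped into fig = dict(data=traces, layout=layout) in place of the layout defined earlier.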

TBC…