I had a eureka moment today while I was rummaging about on various job sites. It occurred to me in part because I finally found a dataset that I could understand: data that I could “feel”. My sudden intuition was that I should tell a data story.
I need to tell a data story. I need to see the data come to life and illuminate human interaction.
Enter the Instacart data set:
The story I am telling needs an audience. Who is reading this story? What do they want to learn? Why are they reading the story? I did not have an immediate answer to these questions, so I just hopped in with some semi-random chart generation (more formally called exploratory data analysis, or “EDA”). Per Instacart’s data team, the original purpose of this data set was for machine learning experimentation:
Instacart is excited to announce our first public dataset release, “The Instacart Online Grocery Shopping Dataset 2017”. This anonymized dataset contains a sample of over 3 million grocery orders from more than 200,000 Instacart users.
3 Million Instacart Orders, Open Sourced
For each user, we provide between 4 and 100 of their orders, with the sequence of products purchased in each order. We also provide the week and hour of day the order was placed, and a relative measure of time between orders.
Machine learning is above the pay grade of this post, so I am going to focus on manually sussing out interesting details that might influence the behavior of a stakeholder. My list of stakeholders starts with the four-sided marketplace of Instacart: Customers, Retailers, Shoppers, and Products. The Product stakeholder can also be thought of as an advertiser. My background is in the retail industry, so the retailer seems like a great starting point. Retailers may want to know about customer habits to help drive staffing, inventory, and marketing decisions:
When to expect customers so they can staff for it.
Using the base data, I graphed the orders placed by time of day. The overall curve is interesting, but the Instacart team points out that different product categories are purchased at different times of the day. From a retailer’s perspective, what customers are buying at a given time of day has a bigger impact on marketing and cross-selling opportunities than the raw order count.
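For anyone who wants to rebuild the time-of-day chart without a charting tool, here is a minimal pandas sketch. It assumes the orders table has `order_id` and `order_hour_of_day` columns, which is how I understand the public files to be laid out; the function name `orders_by_hour` is my own and purely illustrative.

```python
import pandas as pd


def orders_by_hour(orders: pd.DataFrame) -> pd.Series:
    """Count distinct orders for each hour of the day (0-23).

    Assumes columns `order_id` and `order_hour_of_day` (an assumption
    about the dataset layout, not something stated in this post).
    """
    counts = orders.groupby("order_hour_of_day")["order_id"].nunique()
    # Fill in hours with no orders so the result always has 24 rows.
    return counts.reindex(range(24), fill_value=0)
```

Something like `orders_by_hour(pd.read_csv("orders.csv")).plot(kind="bar")` would then draw the chart, assuming the file is named `orders.csv` and matplotlib is installed.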
Next I looked at the shopping patterns by day of week. Weekly shopping habits can drive staffing and inventory decisions. If you know your slowest day is Monday, you may focus your inventory team on Sunday night when your customer load is down and you have just had your big sales push.
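The day-of-week view can be sketched the same way. This assumes an `order_dow` column coded 0 through 6; as far as I know the mapping from those codes to named weekdays is not spelled out, so the sketch reports the code of the slowest day rather than guessing its name. The function names are hypothetical.

```python
import pandas as pd


def orders_by_day(orders: pd.DataFrame) -> pd.Series:
    """Count distinct orders for each day-of-week code (0-6).

    Assumes columns `order_id` and `order_dow` (an assumption about
    the dataset layout).
    """
    counts = orders.groupby("order_dow")["order_id"].nunique()
    # Keep all seven days in the result, even ones with no orders.
    return counts.reindex(range(7), fill_value=0)


def slowest_day(orders: pd.DataFrame) -> int:
    """Return the day-of-week code with the fewest orders."""
    return int(orders_by_day(orders).idxmin())
```

A retailer could use the slowest day code as a starting point for scheduling inventory work, after confirming which weekday the code actually refers to.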
What products people are buying so they can maximize inventory.
Instacart customers’ buying habits may not be representative of the entire shopping population due to the increased cost, but from the following chart, you can see what departments make up a shopper’s cart. An interesting stand-out is that while baby products are not a major seller at 1 and 2 AM, they are the largest component of a shopper’s cart at those hours.
It turns out that the two spikes in early morning shopping come from one shopper each hour stocking up on baby food, which is not much of a trend.
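The baby food surprise is a good argument for checking how many people sit behind each bar before reading anything into it. Here is a rough pandas sketch that computes each department's share of items per hour alongside the number of distinct users contributing to it. The table and column names (`user_id`, `product_id`, `department_id`, `department`) are my assumptions about how the public files are laid out, and `department_share_by_hour` is a made-up name.

```python
import pandas as pd


def department_share_by_hour(
    orders: pd.DataFrame,
    order_products: pd.DataFrame,
    products: pd.DataFrame,
    departments: pd.DataFrame,
) -> pd.DataFrame:
    """Share of purchased items per department for each hour of day.

    Also reports how many distinct users are behind each share, so a
    spike driven by one or two people is easy to spot.
    Column names are assumptions about the dataset layout.
    """
    # Attach hour, user, and department name to every purchased item.
    items = (
        order_products[["order_id", "product_id"]]
        .merge(orders[["order_id", "user_id", "order_hour_of_day"]], on="order_id")
        .merge(products[["product_id", "department_id"]], on="product_id")
        .merge(departments[["department_id", "department"]], on="department_id")
    )
    grouped = (
        items.groupby(["order_hour_of_day", "department"])
        .agg(items=("product_id", "size"), shoppers=("user_id", "nunique"))
        .reset_index()
    )
    # Divide each department's item count by the hour's total item count.
    hour_totals = grouped.groupby("order_hour_of_day")["items"].transform("sum")
    grouped["share"] = grouped["items"] / hour_totals
    return grouped
```

Filtering the result to rows where `shoppers` is above some minimum (the threshold is a judgment call) would hide spikes like the 1 and 2 AM one.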
How to maximize a sale (upselling and cross-marketing).
If a retailer knows where the shoppers will be throughout the store, and what they may be looking for, it opens up contextual marketing opportunities. If the typical customer at 5pm on a Friday is shopping for snacks and alcohol, then a retailer can micro-target those shoppers and tailor upsell and cross-sell efforts accordingly.
There are a lot more complex processes we can run on this data, and it appears that Tableau is not loading properly in some browsers at the moment, so we will have to return to this effort shortly.