We discovered how recommendations are helping people get their daily dose of music, news and food
At Bibblio Events we recently organized our first-ever RecSys New York City Meetup, hosted by the team at eBay. The data science get-together took place at their offices in Chelsea, Manhattan, on 6th Avenue.
The presenters for the evening were Adam Hajari (Senior Scientist at Pandora) & Samir Rayani (Engineer at Pandora), Dhaval Shah (Head of Machine Learning and Big Data Engineering at Bloomberg Media) and Marc Jansen (Data Scientist at Gousto).
Read on to find out more about the work they've been doing to improve recs at their companies and grab the links to their presentation slides too.
Eightball: an Open-source Framework
"We sat around and thought: could we reverse engineer the Billboard charts? And you know, as college kids, we said 'probably'." – Samir Rayani
Samir Rayani co-founded Next Big Sound (NBS) in 2008 during his junior year at Northwestern University, and Adam Hajari joined the company in 2012 after completing a PhD in Physics at Washington University. In 2015, NBS was acquired by Pandora, a US internet radio service well known for its track recommendation feature.
In their joint presentation, Samir began with how NBS came about (a group of college kids asking themselves 'how does a band become famous?') and then explained the underlying principle and purpose of their RecSys problem space.
NBS focuses on finding out where fans are listening to artists, analyzing that data to determine which social signals lead to sales, and ultimately recommending to artists and their teams which marketing levers they can pull to increase their sales and grow their audience. Here's the premise of NBS set out in a diagram:
Before diving into eightball, Samir and Adam shared some entertaining and valuable insights that they picked up in the early years at NBS. They spoke about how important it was to learn lessons the hard way:
So how did they get to where they are now? A new hire introduced the twelve-factor app methodology to NBS's engineering and data science team, and its first factor is a 'version controlled code base'.
Another factor is 'Build, Release, Run', which led NBS to package each of their apps as a single file that included all of its library dependencies. This, Adam explained, is the connection to eightball, which attempts to do something similar for models:
Eightball uses pandas DataFrames as its data structure and, in a way, wraps the scikit-learn library while adding extra tools. In short, their open-source Python library can be used to package prediction classifiers and evaluation tools (and can be found on GitHub here).
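Eightball's actual API isn't covered in the talk, so the following is only a hypothetical sketch of the "one artifact per model" idea: a fitted model travels together with its expected features, its evaluation metrics and the Python version it was built with. `ThresholdClassifier` and `ModelBundle` are made-up names, and the toy classifier stands in for a scikit-learn estimator; a real estimator would more typically be persisted with pickle or joblib than with JSON.

```python
import json
import platform
from dataclasses import asdict, dataclass, field


@dataclass
class ThresholdClassifier:
    """Hypothetical toy stand-in for a fitted classifier."""
    feature: str
    threshold: float

    def predict(self, rows):
        # Predict 1 when the chosen feature exceeds the threshold, else 0.
        return [int(row[self.feature] > self.threshold) for row in rows]


@dataclass
class ModelBundle:
    """Hypothetical bundle: model, expected features, metrics and runtime info."""
    model: ThresholdClassifier
    features: list
    metrics: dict = field(default_factory=dict)
    python_version: str = field(default_factory=platform.python_version)

    def evaluate(self, rows, labels):
        # Compute precision and recall and store them alongside the model.
        pairs = list(zip(self.model.predict(rows), labels))
        tp = sum(1 for p, y in pairs if p == 1 and y == 1)
        fp = sum(1 for p, y in pairs if p == 1 and y == 0)
        fn = sum(1 for p, y in pairs if p == 0 and y == 1)
        self.metrics["precision"] = tp / (tp + fp) if tp + fp else 0.0
        self.metrics["recall"] = tp / (tp + fn) if tp + fn else 0.0
        return self.metrics

    def to_json(self):
        # One self-describing artifact; asdict also converts the nested model.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        # Rebuild the nested model object, then the bundle around it.
        data = json.loads(text)
        data["model"] = ThresholdClassifier(**data["model"])
        return cls(**data)


# Usage: evaluate once, serialize to a single string, and restore it.
rows = [{"streams": 10}, {"streams": 500}, {"streams": 900}, {"streams": 50}]
labels = [0, 1, 1, 1]
bundle = ModelBundle(ThresholdClassifier("streams", 100.0), features=["streams"])
metrics = bundle.evaluate(rows, labels)
restored = ModelBundle.from_json(bundle.to_json())
```

Keeping metrics and environment details next to the parameters means whoever loads the artifact can see how it was evaluated and what it expects, without rerunning the training code.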
News Recommendations at Scale: Challenges and Approaches
"People don't want to read about yesterday's market breakdown today. Most stories we can't recommend a couple of weeks later, not even a couple of hours later - we need to do it right now." – Dhaval Shah
Next up was Dhaval Shah, who has been Head of Machine Learning and Big Data Engineering at Bloomberg Media for close to four years now.
Dhaval came to talk about doing recommendations at scale, and Bloomberg Media's numbers back that up: every month they reach 60 million unique visitors across web and mobile applications, and they publish approximately 500 news stories and videos every day.
The image below shows how users see their recommendations. Article recommendations are presented in a never-ending scroll, which works similarly across desktop and mobile.
The team at Bloomberg Media, Dhaval explained, wants to continuously enhance the UX, increase the time users spend on their platforms, and grow the number of returning users. Both the data science team and the 'business' side believe that personalizing their news content will help them achieve these goals.
What are the challenges in enhancing their news content offering? Firstly, some challenges are domain-specific, as news sites have a high publishing frequency and many articles have a short shelf life:
There are exceptions to this shelf-life rule, Dhaval pointed out: articles such as Paul Ford's 'What Is Code?' (2015) and Tom Randall's 2016 piece on climate change are still popular today.
Dhaval also pointed out the danger of over-personalizing to users' consumption patterns; the key is to understand how those patterns change over time. Personalization of this kind can isolate people from diverse content, so take care to avoid monotony in recommendations if you want a better user experience.
Dhaval also mentioned another type of challenge inherent to recommender systems. As well as avoiding filter bubbles, Bloomberg Media needs a scalable solution that can handle the size of its user base and can tackle the cold-start problem for both items and users.
Providing timely recommendations is essential, as they lead to a better experience and increase trust, but there are issues with the available approaches:
So how does Bloomberg Media overcome these problems and generate timely recommendations? Dhaval explained how they adapted user-based and item-based collaborative filtering (CF) to build models quickly. They carefully select a subset of users and/or items for computing similarity in CF, which makes it feasible to train several times a day. This keeps the models tuned to current reading habits and market trends and lets them recommend newer content, so Bloomberg Media can adapt to changing user behaviour while mitigating the short shelf life of articles.
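The talk doesn't say how Bloomberg Media chooses its subset, so here is a minimal sketch of item-based CF under one assumed rule: keep only reads from a recent time window, then compute cosine similarity between items over that window. Retraining several times a day would just mean rerunning it on a fresh window. The function names, the 6-hour window and the sample data are all hypothetical.

```python
import math
from collections import defaultdict


def recent_interactions(events, now, window_hours):
    """Keep (user, item) reads from the last `window_hours` hours (assumed subset rule)."""
    cutoff = now - window_hours * 3600
    return [(user, item) for user, item, ts in events if ts >= cutoff]


def item_similarities(pairs):
    """Cosine similarity between items, treating each item as a binary vector of readers."""
    readers = defaultdict(set)
    for user, item in pairs:
        readers[item].add(user)
    sims = defaultdict(dict)
    items = sorted(readers)
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            overlap = len(readers[a] & readers[b])
            if overlap:
                score = overlap / math.sqrt(len(readers[a]) * len(readers[b]))
                sims[a][b] = sims[b][a] = score
    return sims


def recommend(user, pairs, sims, k=3):
    """Score unread items by summed similarity to the items this user read recently."""
    seen = {item for u, item in pairs if u == user}
    scores = defaultdict(float)
    for item in seen:
        for other, sim in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += sim
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


# Usage: timestamps are in seconds; "old-story" falls outside the 6-hour window.
events = [
    ("ann", "fed", 99_000), ("ann", "oil", 98_000),
    ("bob", "fed", 97_000), ("bob", "oil", 96_000), ("bob", "bonds", 95_000),
    ("cat", "fed", 94_000), ("cat", "bonds", 93_000),
    ("dan", "old-story", 10_000),
]
pairs = recent_interactions(events, now=100_000, window_hours=6)
sims = item_similarities(pairs)
print(recommend("ann", pairs, sims))  # ['bonds']
```

Restricting the input to a recent window keeps the pairwise similarity step small enough to rerun often, and stale articles drop out of the candidate pool automatically.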
Recommendations are computed in real time and reshuffled at display time using a technique called dithering, since a degree of randomness intrigues users. Recent A/B tests show positive outcomes for their adjusted CF plus dithering, with content similarity too.
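The talk doesn't describe Bloomberg Media's exact dithering variant. One common formulation, described by Ted Dunning, re-sorts a ranked list by log(rank) plus Gaussian noise, so the top few items rarely move while deeper items shuffle more freely. The sketch below assumes a noise standard deviation of sqrt(log(epsilon)); the `dither` name and the default epsilon are illustrative choices.

```python
import math
import random


def dither(ranked_items, epsilon=2.0, rng=None):
    """Reshuffle a ranked list by sorting on log(rank) plus Gaussian noise.

    epsilon >= 1 sets how much shuffling happens: epsilon = 1 adds no noise,
    larger values let lower-ranked items surface more often.
    """
    rng = rng or random.Random()
    sigma = math.sqrt(math.log(epsilon))  # assumed noise scale
    noisy = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    # Sort on the noisy score only, so items themselves never need comparing.
    return [item for _, item in sorted(noisy, key=lambda pair: pair[0])]


ranked = ["story-1", "story-2", "story-3", "story-4", "story-5", "story-6"]
print(dither(ranked, epsilon=2.0, rng=random.Random(7)))
```

Because the gaps between log(1), log(2), log(3) are large and the gaps further down are small, the same noise mostly leaves the head of the list intact while giving lower-ranked stories a chance to appear.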
Data Science for Dinner: Building a Personalization System
"We use recommendations to make our service super impactful for people - not something that just increases our revenue but really changes our customers' lives." – Marc Jansen
Our third and final presenter was Marc Jansen, Data Scientist at recipe kit box company Gousto. Their model is simple but effective: they deliver a box to your doorstep with fresh ingredients and recipe cards to help you out. Marc is based in London, UK, and kicked off his presentation by describing their service and the market they're serving. He observed that the UK market is very different from New York City's; in the UK, people eat at home.
You only need to grab a small share of that market to be a successful company, but in order to pull it off, you need a lot of manpower. One of the crucial teams is Data Science, which works fairly separately from the other parts of the company, under the direct responsibility of the CTO.
Gousto's data science team work on mid- to long-term projects. In Marc's team, they focus on using algorithms to improve warehouse processes and everything that has to do with the online user experience. Below you can see a slide Marc 'borrowed' from his CTO on how the data streams end up at his team. (A special shout out to London company Snowplow Analytics, who capture all the data coming in from Gousto's applications and configure it into the right format):
As the number of available recipes grew, users began asking for help finding ones they liked. Providing recommendations was a great step up from having to scroll through a list of 1500+ possibilities! There's actually a lot to recommending recipes (and food), Marc explained. The recommendation engine needs to take into account historical order data (demand), stock handling, warehouse optimization and each customer's personal choices.
When they started to work on their first recommender system last June, they decided, as many other e-commerce businesses with lots of active customers do, to go with one based on CF (using an implementation of LightFM). It launched in December '17 and led to some issues:
The first challenge was that Gousto doesn't sell a static set of products. The company decides the menus each week, and Marc's team then needs to optimize and recommend within that set. The second challenge is a physical constraint: there's a limit on what can fit into a box. My 'favorite' limitation of the actual box is that you currently can't have more than two broccoli-based menus, because they simply won't fit.
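To make the two constraints concrete, here is a hypothetical sketch (not Gousto's method) that fills one box from this week's menu, already ranked for a customer, while respecting per-ingredient caps such as "at most two broccoli-based recipes". The greedy rule, `fill_box` and all sample data are assumptions; a greedy pass is simple but not guaranteed to find the best-scoring valid box in general.

```python
from collections import Counter


def fill_box(ranked_recipes, ingredients, box_size, max_per_ingredient):
    """Greedily pick the highest-ranked recipes that still fit in one box.

    ranked_recipes: this week's menu, best match for the customer first.
    ingredients: recipe -> set of key ingredients it needs.
    max_per_ingredient: ingredient -> how many recipes in a box may use it;
    ingredients not listed are only limited by box_size.
    """
    chosen, used = [], Counter()
    for recipe in ranked_recipes:
        if len(chosen) == box_size:
            break
        needs = ingredients[recipe]
        # Skip a recipe if adding it would break any ingredient limit.
        if any(used[i] >= max_per_ingredient.get(i, box_size) for i in needs):
            continue
        chosen.append(recipe)
        used.update(needs)
    return chosen


weekly_menu = [
    "broccoli pesto pasta",
    "broccoli stir-fry",
    "broccoli cheddar soup",
    "fish tacos",
    "chicken curry",
]
ingredients = {
    "broccoli pesto pasta": {"broccoli", "pasta"},
    "broccoli stir-fry": {"broccoli", "noodles"},
    "broccoli cheddar soup": {"broccoli", "cheddar"},
    "fish tacos": {"fish", "tortillas"},
    "chicken curry": {"chicken", "rice"},
}
box = fill_box(weekly_menu, ingredients, box_size=4, max_per_ingredient={"broccoli": 2})
print(box)
# ['broccoli pesto pasta', 'broccoli stir-fry', 'fish tacos', 'chicken curry']
```

Because the menu changes weekly, the ranking and the ingredient table would be rebuilt each week, while the box rules stay fixed.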
So Marc and his team decided to take a different stab at the problem. What is the core of their knowledge? Recipes! So they recently started building a full recipe ontology (with lots of help from their chefs and support from graph database supplier Neo4j). This is the foundation of their new product: a content-based recommender engine that, in effect, calculates recipe similarity.
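Gousto's ontology lives in Neo4j and the talk doesn't specify their similarity measure, so this is only a minimal sketch of content-based recipe similarity: each recipe is flattened into a hypothetical set of ontology attributes (ingredients, cuisine, dish type), and Jaccard similarity is used as a stand-in measure.

```python
def jaccard(a, b):
    """Share of attributes two recipes have in common (0 = none, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_recipes(target, recipes, k=2):
    """Rank the other recipes by attribute overlap with the target recipe."""
    scores = {
        name: jaccard(recipes[target], attrs)
        for name, attrs in recipes.items()
        if name != target
    }
    # Highest similarity first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Hypothetical attribute sets, as if flattened out of a recipe ontology.
recipes = {
    "thai green curry": {"curry", "coconut milk", "chicken", "thai", "rice"},
    "chicken katsu curry": {"curry", "chicken", "japanese", "rice", "breadcrumbs"},
    "veggie thai red curry": {"curry", "coconut milk", "thai", "rice", "tofu"},
    "beef lasagne": {"pasta", "beef", "italian", "cheese"},
}
print(similar_recipes("thai green curry", recipes))
# ['veggie thai red curry', 'chicken katsu curry']
```

Unlike CF, this needs no order history, so a brand-new recipe on next week's menu can be recommended as soon as its attributes are in the ontology.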
While we wait for the live results of the new content-based recommender to come in, Marc shared the results he was allowed to share. Offline testing of the new recommender shows promising results (0.7 vs 0.55). Based on the live results of the CF product, Marc explained, you can conclude that people don't disagree with the recommendations, but they're not as persuasive as they could be:
Marc made an important side point while discussing this: think carefully about the design of your recommendations! His advice is to stay close to your UX/UI team.
Marc finished his presentation by discussing the future of recommendations at Gousto. Their websites and apps are just the beginning, he said. The obvious next step is to incorporate recs into emails to customers. Marc expects to deliver recommendations via voice too, as Gousto has launched an Alexa app that reads out recipes and makes it possible to order as well.
I'm grateful we were able to host our inaugural NYC event at such an impressive venue, and it was fun chatting to everyone over drinks and pizza. If you want to join us at the next NYC meetup on recommender systems, join the group here (212 members already)! Our second meetup will be on September 18, 2018. Also, we’re always on the lookout for speakers to wow our future meetups, so if you know (or are) somebody who’d like to share a story, big or small, then please contact me.