Recommendation engine + service
Abstract
The recommendation engine, combined with a dedicated microservice, is the cornerstone of the system. It is a machine learning model exposed through an API, capable of providing the most relevant topics of interest for a specific user. The true power of this engine lies in blending topics the user is already interested in with topics they might be interested in, based on the interests of similar users (collaborative filtering).
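A minimal sketch of what the API surface could look like, assuming FastAPI as the web framework; the endpoint path, the payload shape and the `recommend` placeholder are illustrative assumptions, not decisions made in this document.

```python
# Minimal API sketch, assuming FastAPI. The real engine would replace `recommend`.
from fastapi import FastAPI

app = FastAPI(title="recommendation-service")

def recommend(user_id: str, limit: int) -> list[str]:
    # Placeholder for the real model call (collaborative + content-based filtering).
    return ["topic-politics", "topic-economy"][:limit]

@app.get("/users/{user_id}/recommendations")
def get_recommendations(user_id: str, limit: int = 10) -> dict:
    # The backend BFF would call this endpoint and merge the result into its own payload.
    return {"user_id": user_id, "topics": recommend(user_id, limit)}
```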
TODOs
- Fake data and personas (philosophy, nothing much)
- Dry run (no event logs): fallback / randomness algorithm
- Models: content-based filtering + collaborative filtering
- API exposure
- Plan: first dummy model to validate the full chain
- Backend BFF
Data flow overview (draw.io diagram embedded)
Development constraints
Synthetic data generation
We are starting development of an ML engine without any existing data. To make that work, we should generate data that mimics at least most of the behaviors our users exhibit.
To do so, we should rely on two things:
- Personas based on the existing client segmentation (Kundensegmentierung)
- Splitting the data between training data and test data
We should first create a handful of users, based on the 18 criteria used to define these client segments. Each user will be unique, will fit more or less into one of the segments, and will generate ~100 event logs.
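A minimal sketch of persona-based data generation. The persona names, topics and event types below are invented placeholders, not the real Kundensegmentierung values; only the "~100 event logs per user" figure comes from this document.

```python
# Sketch of synthetic user and event-log generation from personas (placeholder values).
import random
import uuid

PERSONAS = {
    "commuter":      {"topics": ["traffic", "weather", "local-news"]},
    "finance-savvy": {"topics": ["markets", "economy", "real-estate"]},
}

def generate_users(n_per_persona: int = 5) -> list[dict]:
    users = []
    for persona, profile in PERSONAS.items():
        for _ in range(n_per_persona):
            users.append({"user_id": str(uuid.uuid4()),
                          "persona": persona,
                          "topics": profile["topics"]})
    return users

def generate_event_logs(user: dict, n_events: int = 100) -> list[dict]:
    # ~100 event logs per user, biased towards the persona's topics with some noise.
    all_topics = sorted({t for p in PERSONAS.values() for t in p["topics"]})
    events = []
    for _ in range(n_events):
        topic = random.choice(user["topics"]) if random.random() < 0.8 else random.choice(all_topics)
        events.append({"user_id": user["user_id"], "event": "article_read", "topic": topic})
    return events
```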
The data should then be split between what the engine will consume to train itself and what will be used as test data. Typically, 70% of a dataset is used to train the models and the remaining 30% is used to test them.
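The 70/30 split itself is straightforward; a sketch using scikit-learn's train_test_split (an assumption, any shuffling split would do):

```python
# Sketch of the 70/30 train/test split on the generated event logs.
from sklearn.model_selection import train_test_split

def split_dataset(event_logs: list[dict]) -> tuple[list[dict], list[dict]]:
    # 70% of events feed the models, the remaining 30% are held out for evaluation.
    train, test = train_test_split(event_logs, test_size=0.3, random_state=42)
    return train, test
```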
Dry run: providing recommendations for a user without event logs
The purpose of the habits questionnaire is precisely to prevent this kind of edge case. Still, users can decide not to opt in to personalisation. In that case, the recommendation service should fall back to providing trends computed across all users, instead of applying collaborative filtering. This approach is close to, or perhaps identical to, content-based filtering.
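A minimal sketch of that fallback path, under the assumption that "trends" means the most frequent topics across all users' event logs; the function names are illustrative.

```python
# Sketch of the dry-run fallback: global trends when no personal event logs exist.
from collections import Counter
from typing import Callable

def trending_topics(all_event_logs: list[dict], limit: int = 10) -> list[str]:
    # Global trends: the most frequent topics across every user's events.
    counts = Counter(event["topic"] for event in all_event_logs)
    return [topic for topic, _ in counts.most_common(limit)]

def recommend_with_fallback(user_events: list[dict],
                            all_event_logs: list[dict],
                            personalised: Callable[[list[dict]], list[str]]) -> list[str]:
    # `personalised` stands for the collaborative-filtering path (hypothetical here).
    if not user_events:  # dry run: no event logs, or the user opted out of personalisation
        return trending_topics(all_event_logs)
    return personalised(user_events)
```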
Models: content-based filtering + collaborative filtering
As seen with the dry run, we will face use cases where we need not only collaborative filtering but also trends that apply across all users, possibly per feature.
This is a sign that we need not one model but two, one for each type of filtering we want to apply.
Building a model is a genuinely difficult task: we do not fully understand its inner workings, it is self-learning, and its quality mostly depends on the quality of the data. We can, however, rely on extensive benchmarking and resources from Microsoft Azure to start building a first model (see the sketch after the list below):
- https://learn.microsoft.com/en-us/azure/architecture/ai-ml/architecture/real-time-recommendation
- https://github.com/recommenders-team/recommenders/blob/main/examples/05_operationalize/als_movie_o16n.ipynb
- https://learn.microsoft.com/en-us/azure/architecture/solution-ideas/articles/build-content-based-recommendation-system-using-recommender
- https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints?view=azureml-api-2#what-are-batch-endpoints
- https://azure.microsoft.com/en-us/blog/building-recommender-systems-with-azure-machine-learning-service/
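Before adopting one of the Spark/ALS setups linked above, a tiny in-memory collaborative-filtering sketch can validate the full chain; this uses item-item cosine similarity over a user/topic interaction matrix, and the input format and function names are assumptions.

```python
# Minimal item-based collaborative-filtering sketch to validate the chain end to end.
import numpy as np

def build_interaction_matrix(event_logs: list[dict], users: list[str], topics: list[str]) -> np.ndarray:
    # Rows = users, columns = topics, values = interaction counts.
    m = np.zeros((len(users), len(topics)))
    u_idx = {u: i for i, u in enumerate(users)}
    t_idx = {t: j for j, t in enumerate(topics)}
    for e in event_logs:
        m[u_idx[e["user_id"]], t_idx[e["topic"]]] += 1
    return m

def recommend(matrix: np.ndarray, user_row: int, topics: list[str], limit: int = 5) -> list[str]:
    # Topic-topic cosine similarity, then score unseen topics by similarity to seen ones.
    normalised = matrix / (np.linalg.norm(matrix, axis=0, keepdims=True) + 1e-9)
    sim = normalised.T @ normalised            # (topics x topics) similarity
    scores = sim @ matrix[user_row]            # affinity of this user for each topic
    scores[matrix[user_row] > 0] = -np.inf     # drop topics the user already interacted with
    best = np.argsort(scores)[::-1][:limit]
    return [topics[j] for j in best]
```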
In the long term, we would like to merge the recommendations coming from the two models. The influence between the two models is likely close to an exponential: the weight of trend-based filtering should decrease with the number of event logs x, roughly like b * exp(-a * x).
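A sketch of that blending idea, where the trend weight decays exponentially with the user's event-log count; the constants a and b are tuning parameters, not values fixed by this document.

```python
# Sketch of exponential blending between trend-based and collaborative scores.
import math

def blend_scores(trend_score: float, collaborative_score: float, n_events: int,
                 a: float = 0.05, b: float = 1.0) -> float:
    w = b * math.exp(-a * n_events)   # trend weight shrinks as the user's history grows
    w = min(max(w, 0.0), 1.0)         # keep the weight in [0, 1]
    return w * trend_score + (1.0 - w) * collaborative_score
```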