
Event logging

Abstract

The purpose of event logging is to keep an accurate, up-to-date view of the user's current interests. A first step was to classify data using Personalisation Hints. We will track which content the user spends time on and, together with the onboarding questionnaire, generate enough data for the recommendation engine.

This document presents how event logging will be structured, which events we will track in this first PoC, what the technical architecture of this system will be, and which privacy measures we must put in place.

TODOs

  • Data structure + Table schema
  • Weight of event log: unit is duration of interest, in seconds.
  • Hosting (reminder, put schema in overview)
  • Encryption + enable/disable
  • Consent (just to mention it, dedicated section)
  • Frontend: new OC
  • API endpoints

Data structure

An event is a generic concept: it represents something happening in the app that reflects a user's interest. It therefore embeds a personalisation hint, the source of this hint, a weight, a timestamp and the pseudonymized user ID.
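The table schema is still listed as a TODO, so the following is only a minimal Python sketch of an event record, assuming a frozen dataclass. The class name, the field names and the checks in `__post_init__` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One logged user-interest event (hypothetical names, schema not final)."""
    user_id: str            # pseudonymized user identifier, never the raw ID
    hint: str               # personalisation hint the event relates to
    source: Optional[str]   # "action:feature" in lowercase, or None if unidentified
    weight: float           # interest expressed as a duration, in seconds
    timestamp: datetime     # when the event happened, timezone-aware

    def __post_init__(self) -> None:
        # A weight is a duration, so a negative value is meaningless.
        if self.weight < 0:
            raise ValueError("weight is a duration in seconds and cannot be negative")
        # Require aware timestamps so events from different devices compare correctly.
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")


event = Event(
    user_id="3f9a...",  # placeholder pseudonym
    hint="sports",
    source="screen_view:motivation",
    weight=120.0,
    timestamp=datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
)
```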

Source

Source is a simple text column used to identify where the hint comes from. If the source should not be identified, this field can remain NULL. A source identifier should follow the format action:feature, and the full source should be written in lowercase.

For instance, if we log a page view from the motivation feature, the source should be screen_view:motivation.
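A small helper can enforce the action:feature convention at write time. This sketch assumes each side of the colon contains only lowercase letters, digits and underscores; that character set and the function names are assumptions, not part of the specification.

```python
import re
from typing import Optional

# Assumed character set: lowercase letters, digits and underscores on each side.
SOURCE_PATTERN = re.compile(r"[a-z0-9_]+:[a-z0-9_]+")


def validate_source(source: Optional[str]) -> Optional[str]:
    """Return the source unchanged if valid; None (NULL) is allowed for unidentified sources."""
    if source is None:
        return None
    if not SOURCE_PATTERN.fullmatch(source):
        raise ValueError(f"invalid event source: {source!r}")
    return source


def make_source(action: str, feature: str) -> str:
    """Build an 'action:feature' identifier, normalised to lowercase."""
    source = f"{action}:{feature}".lower()
    validate_source(source)
    return source


print(make_source("screen_view", "Motivation"))  # screen_view:motivation
```

Lowercasing in `make_source` while rejecting uppercase in `validate_source` means producers get normalisation for free, while anything written to the table directly is still checked.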

Weight

Weight is a scoring indicator of how much this hint raised the user's interest.

Its unit is seconds. For instance, the weight could be the time a user spent on a screen. For events that do not directly involve the user spending time, you can define a baseline weight and adapt the value to the answer or event the user provided. For instance, on a questionnaire, all answers could lead to the same hint, but with different strengths. If users are often interested in sports, we could assume they would spend 600 seconds on it per day. If they are heavily invested in sports, something closer to 3600 seconds may be appropriate.
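The baseline-plus-adjustment idea for questionnaires can be sketched as a lookup of multipliers applied to a baseline. The answer keys and multipliers below are hypothetical; they are only chosen so that the results reproduce the 600 s and 3600 s examples above.

```python
# Baseline taken from the "often interested" example (600 s per day).
BASELINE_WEIGHT_SECONDS = 600.0

# Hypothetical answer keys; 6.0 reproduces the "heavily invested" example (3600 s).
ANSWER_MULTIPLIERS = {
    "not_interested": 0.0,
    "often_interested": 1.0,
    "heavily_invested": 6.0,
}


def questionnaire_weight(answer: str, baseline: float = BASELINE_WEIGHT_SECONDS) -> float:
    """Turn a questionnaire answer into a weight in seconds for the same hint."""
    try:
        return baseline * ANSWER_MULTIPLIERS[answer]
    except KeyError:
        raise ValueError(f"unknown questionnaire answer: {answer!r}") from None
```

Keeping the baseline separate from the multipliers lets the team recalibrate every questionnaire-derived weight with one constant if the PoC shows the scale is off.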

Weight is not defined in the CMS. It must be calculated and provided by the event source. For the PoC, we will keep it simple and rely on content screen views.
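For the screen-view case, the event source would compute the weight from when the user entered and left a screen. This is a sketch only: the 30-minute cap is a hypothetical safeguard against a screen left open while the user is idle, not something the document specifies.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cap (30 minutes) so an idle, open screen does not dominate the profile.
MAX_SCREEN_VIEW_SECONDS = 30 * 60.0


def screen_view_weight(entered_at: datetime, left_at: datetime,
                       cap_seconds: float = MAX_SCREEN_VIEW_SECONDS) -> float:
    """Weight of a screen view: seconds between entering and leaving the screen, capped."""
    elapsed = (left_at - entered_at).total_seconds()
    if elapsed < 0:
        raise ValueError("left_at must not be earlier than entered_at")
    return min(elapsed, cap_seconds)


start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(screen_view_weight(start, start + timedelta(seconds=95)))  # 95.0
print(screen_view_weight(start, start + timedelta(hours=2)))     # 1800.0
```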

Schema overview

Hosting

This table is the fuel our recommendation engine will consume. Since the engine mostly relies on a machine learning model, we will need to monitor it closely. Therefore, the ML part will rely on the Azure ecosystem to meet the market's standards of quality and performance.

Also, to limit bandwidth usage, the database will be stored inside the Azure tenant rather than in a regular database hosted by ITSCare.

A global view of the data flow is available in Data flow.

Encryption

Event logs carry sensitive user data: they provide a view of the user's interests at a specific moment. This raises a data privacy issue, especially when using a third-party hosting provider like Azure. Moreover, we only pseudonymize the user data: recommendations would no longer make sense if we did not know to whom we should address them.
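Pseudonymization can be illustrated with a keyed hash. The document does not name a technique, so this sketch assumes HMAC-SHA256 with a secret key held only by ITSCare, plus a hypothetical ITSCare-side `PseudonymDirectory` that maps pseudonyms back to users so recommendations can still be delivered.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Deterministic keyed pseudonym: same user and key always give the same value."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


class PseudonymDirectory:
    """Hypothetical ITSCare-side lookup used to route recommendations back to real users."""

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key
        self._reverse: dict[str, str] = {}

    def pseudonym_for(self, user_id: str) -> str:
        pseudonym = pseudonymize(user_id, self._key)
        self._reverse[pseudonym] = user_id  # only ITSCare keeps this mapping
        return pseudonym

    def resolve(self, pseudonym: str) -> str:
        return self._reverse[pseudonym]
```

Because the pseudonym is deterministic, all events from one user still group together on the Azure side, while reversing it requires either the key or the ITSCare directory.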

Therefore, we introduce asymmetric encryption on both the hint and source columns. The private key would be stored only in the ITSCare environment, so the meaning behind the hints would only be known to the AOKs.

The algorithms should rely on up-to-date signing and encryption methods, including ECDH.
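One common way to combine ECDH with column encryption is an ECIES-style scheme: each value is encrypted under a fresh ephemeral key pair, an ECDH exchange with the ITSCare public key yields a shared secret, and HKDF derives the symmetric keys from it. The stdlib-only sketch below makes several assumptions: X25519 (RFC 7748) is implemented in pure Python and is not constant-time, and an HMAC-SHA256 counter-mode keystream with an HMAC tag stands in for an AEAD cipher such as AES-GCM, which the Python standard library lacks. A real deployment would use a vetted cryptography library. `INFO` and all function names are hypothetical.

```python
import hashlib
import hmac
import os

P = 2**255 - 19                      # Curve25519 field prime
A24 = 121665                         # (486662 - 2) / 4, per RFC 7748
BASE_U = (9).to_bytes(32, "little")  # X25519 base point
INFO = b"event-log-column-v1"        # hypothetical HKDF context label


def x25519(k: bytes, u: bytes) -> bytes:
    """RFC 7748 X25519 Montgomery ladder (pure Python, NOT constant-time)."""
    kb = bytearray(k)
    kb[0] &= 248
    kb[31] = (kb[31] & 127) | 64
    k_int = int.from_bytes(kb, "little")
    x1 = int.from_bytes(u, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (k_int >> t) & 1
        if swap ^ bit:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da, cb = (x3 - z3) * a % P, (x3 + z3) * b % P
        x3, z3 = (da + cb) ** 2 % P, x1 * (da - cb) ** 2 % P
        x2, z2 = aa * bb % P, e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def generate_keypair() -> tuple[bytes, bytes]:
    """Return (private_key, public_key); the private key would stay at ITSCare."""
    private = os.urandom(32)
    return private, x25519(private, BASE_U)


def _derive_keys(shared: bytes, salt: bytes) -> tuple[bytes, bytes]:
    """HKDF-SHA256 (RFC 5869): 64 bytes split into an encryption key and a MAC key."""
    prk = hmac.new(salt, shared, hashlib.sha256).digest()
    t1 = hmac.new(prk, INFO + b"\x01", hashlib.sha256).digest()
    t2 = hmac.new(prk, t1 + INFO + b"\x02", hashlib.sha256).digest()
    return t1, t2


def _xor_keystream(key: bytes, data: bytes) -> bytes:
    """XOR data with an HMAC-SHA256 counter-mode keystream (stand-in for AES-GCM)."""
    stream = b"".join(
        hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
        for i in range((len(data) + 31) // 32)
    )
    return bytes(d ^ s for d, s in zip(data, stream))


def encrypt_for(recipient_public: bytes, plaintext: bytes) -> bytes:
    """Encrypt a column value so only the holder of the matching private key can read it."""
    eph_private, eph_public = generate_keypair()  # fresh per value, so keys are unique
    shared = x25519(eph_private, recipient_public)
    enc_key, mac_key = _derive_keys(shared, eph_public + recipient_public)
    ciphertext = _xor_keystream(enc_key, plaintext)
    tag = hmac.new(mac_key, eph_public + ciphertext, hashlib.sha256).digest()
    return eph_public + ciphertext + tag


def decrypt(recipient_private: bytes, blob: bytes) -> bytes:
    """Reverse encrypt_for; raise ValueError on tampering or a wrong key."""
    if len(blob) < 64:
        raise ValueError("blob too short")
    eph_public, ciphertext, tag = blob[:32], blob[32:-32], blob[-32:]
    recipient_public = x25519(recipient_private, BASE_U)
    shared = x25519(recipient_private, eph_public)
    enc_key, mac_key = _derive_keys(shared, eph_public + recipient_public)
    expected = hmac.new(mac_key, eph_public + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return _xor_keystream(enc_key, ciphertext)
```

In this arrangement, event producers only need the public key, the Azure-hosted table only ever stores opaque blobs, and decryption requires the private key kept in the ITSCare environment. Because each value uses a fresh ephemeral key, identical hints produce different ciphertexts, so the Azure side cannot group events by hint from the ciphertext alone; this appears to be the tension the signature-based TODO that follows is meant to address.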

TODO: user-specific encryption + AOK-mastered signature: use the signature to perform collaborative filtering, and let the user be the only one with access to the hint's value.