An automated approach to mining and categorizing large-scale user reviews.
Analyzing user feedback is a central issue for any company seeking to continuously improve its product. However, when it comes to processing thousands of customer reviews, the exercise quickly becomes tedious and time-consuming. How can this mass of data be effectively exploited, without devoting inordinate resources to it?
This is the problem faced by Toilet Finder, the application that lists and rates public washrooms worldwide. Every day, its users share opinions covering a wide range of subjects, and while this diversity of feedback is invaluable, it also makes it tricky to identify trends, especially when they don't concern the application itself. To effectively process user feedback at scale, BeTomorrow has developed an innovative solution powered by AI and Machine Learning.
Our approach leverages several advanced technologies:
Algorithmic clustering: automatically groups similar reviews for a more structured analysis.
Embeddings: convert reviews into numerical representations to uncover hidden trends.
Generative AI: facilitates feedback interpretation and highlights actionable insights.
By combining these technologies, it becomes possible to:
Structure and organize user feedback without manual intervention.
Visualize emerging trends through tailored interfaces.
Extract relevant insights to drive product decisions and enhance user experience.
We use a Natural Language Processing (NLP) model to break down each review into thematic units. A single review can therefore be broken down into several chunks, each corresponding to a specific topic (for example, a review may mention both the cleanliness of a location and a feature of the application).
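To make the idea of thematic units concrete, here is a minimal sketch. It is not the NLP model described above: it assumes a naive split on sentence punctuation and on the word "but", and the `TOPIC_KEYWORDS` lists, function names, and sample review are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical keyword lists; a real pipeline would rely on an NLP model instead.
TOPIC_KEYWORDS = {
    "cleanliness": {"clean", "dirty", "smell", "hygiene"},
    "application": {"app", "map", "search", "crash", "crashes"},
}


def split_into_units(review):
    """Naively split a review on sentence punctuation and on the word 'but'."""
    parts = re.split(r"[.!?;]+|\bbut\b", review, flags=re.IGNORECASE)
    # Drop empty pieces and trim stray spaces or commas left by the split.
    return [part.strip(" ,") for part in parts if part.strip(" ,")]


def label_unit(unit):
    """Return the first topic whose keywords appear in the unit, else 'other'."""
    words = set(re.findall(r"[a-z]+", unit.lower()))
    for topic, keywords in TOPIC_KEYWORDS.items():
        if words & keywords:
            return topic
    return "other"


review = "The washroom was clean, but the app crashes when I search."
units = [(label_unit(unit), unit) for unit in split_into_units(review)]
for topic, unit in units:
    print(f"{topic}: {unit}")
```

In this toy example, a single review yields two units, one about the location and one about the application, which can then be processed independently.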
Each unit is then converted into a mathematical vector using an embedding technique. This allows the semantic meaning of the comments to be captured and effectively compared.
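The sketch below shows what "comparing vectors" means in practice. It assumes a toy bag-of-words embedding as a stand-in for a real embedding model, which the article does not name; the helper names and sample chunks are hypothetical. Cosine similarity is one common way to compare embeddings, used here as an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(texts):
    """Map every distinct word to a fixed position in the vector."""
    words = sorted({word for text in texts for word in tokenize(text)})
    return {word: index for index, word in enumerate(words)}


def embed(text, vocabulary):
    """Toy embedding: word counts. A real model would capture meaning, not just words."""
    vector = [0.0] * len(vocabulary)
    for word, count in Counter(tokenize(text)).items():
        if word in vocabulary:
            vector[vocabulary[word]] = float(count)
    return vector


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either one is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


chunks = [
    "the toilets were very clean",
    "clean toilets and clean floor",
    "the map crashes on startup",
]
vocabulary = build_vocabulary(chunks)
vectors = [embed(chunk, vocabulary) for chunk in chunks]

same_topic = cosine_similarity(vectors[0], vectors[1])
other_topic = cosine_similarity(vectors[0], vectors[2])
print(same_topic > other_topic)
```

The two cleanliness comments score higher against each other than against the comment about the map, which is the property the clustering step relies on.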
Once the vectors have been created, we use clustering algorithms to group similar units into coherent clusters. This enables us to identify major emerging trends in user feedback.
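The article does not say which clustering algorithm is used, so the following is only a sketch of the general idea. It assumes a simple greedy, threshold-based grouping on cosine similarity; the function names, the `threshold` value, and the sample vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_by_threshold(vectors, threshold=0.8):
    """Assign each vector to the most similar existing centroid,
    or open a new cluster when no centroid reaches the threshold."""
    centroids = []
    members = []  # members[i] holds the indices of the vectors in cluster i
    for index, vector in enumerate(vectors):
        best, best_similarity = None, threshold
        for cluster_id, centroid in enumerate(centroids):
            similarity = cosine_similarity(vector, centroid)
            if similarity >= best_similarity:
                best, best_similarity = cluster_id, similarity
        if best is None:
            centroids.append(list(vector))
            members.append([index])
        else:
            members[best].append(index)
            size = len(members[best])
            # Update the centroid as the running mean of its members.
            centroids[best] = [
                (c * (size - 1) + v) / size
                for c, v in zip(centroids[best], vector)
            ]
    return members


vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 1.0],
]
clusters = cluster_by_threshold(vectors)
print(clusters)
```

Raising or lowering `threshold` changes how many clusters come out, which illustrates why parameter choice matters so much at this stage.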
For each identified cluster, we use a large language model (LLM) to analyze its content, identify its main theme, and generate a short summary.
Rather than presenting cluster information as a flat list, we have developed a dynamic map that makes it easy to explore the key themes raised by users and to track the evolution of clusters over time in order to spot new trends.
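Tracking clusters over time can be sketched as counting units per cluster and per month, then flagging clusters that grow quickly. This is an illustrative assumption rather than the logic behind the interface described above; the function names, the `factor` rule, and the sample data are made up for the example.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_counts(labelled_units):
    """Count units per cluster and per month from (cluster, date) pairs."""
    counts = defaultdict(Counter)
    for cluster, day in labelled_units:
        counts[cluster][day.strftime("%Y-%m")] += 1
    return counts


def growing_clusters(counts, previous, current, factor=2.0):
    """Flag clusters whose volume in `current` is at least `factor` times
    their volume in `previous` (a previous count of 0 is treated as 1)."""
    return sorted(
        cluster
        for cluster, by_month in counts.items()
        if by_month[current] >= factor * max(by_month[previous], 1)
    )


# Made-up sample data: (cluster label, review date)
labelled_units = [
    ("cleanliness", date(2024, 1, 5)),
    ("cleanliness", date(2024, 2, 3)),
    ("payment", date(2024, 1, 15)),
    ("payment", date(2024, 2, 10)),
    ("payment", date(2024, 2, 11)),
    ("payment", date(2024, 2, 20)),
]
counts = monthly_counts(labelled_units)
trending = growing_clusters(counts, previous="2024-01", current="2024-02")
print(trending)
```

In this sample, only the "payment" cluster is flagged, because its volume triples from one month to the next while "cleanliness" stays flat.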
With this approach, feedback analysis becomes a fluid, interactive exercise. Rather than having to process hundreds of comments manually, product teams have an overview of user expectations and irritants in a matter of minutes.
The benefits are immediate: effective prioritization of improvements (rapid identification of the most frequent subjects), trend monitoring and anticipation of needs (dynamic analysis of changes in opinion), and optimized customer support (rapid detection of recurring issues, so that responses to users can be adjusted).
This solution is not specific to Toilet Finder. It can be deployed for any product or service that generates large volumes of customer feedback, such as e-commerce platforms, SaaS services, or mobile applications seeking to make better use of the voice of their users.
a rate in line with academic benchmarks for NLP (source: EMNLP 2022)
through automation (up to 70% - source: Gartner 2022)
for companies proactively exploiting these technologies (source: Forrester 2023)
How to optimize user feedback analysis with AI?
“Our solution on Toilet Finder combines mathematical clustering and generative AI to structure and analyze user feedback. The main challenge was optimizing the clustering, the results of which vary depending on the parameters but also on the data generated upstream by the LLM. LLMs generate summaries, but with a tendency to oversimplify key details. Clustering, which is more explicable and less costly, enables robust and adaptable analysis. Now modular and reusable, this solution paves the way for new applications to optimize feedback exploitation and decision-making.”
Johan Chataigner
Back-end Developer
Language models enable us to go beyond simply classifying reviews. They don't just group similar comments together, but are able to synthesize and interpret trends, offering a richer, more nuanced understanding of user feedback.