
Entity Matching Pipeline and Technique Exploration
In Fall 2024, we completed a project with Rystad Energy focused on building and exploring a high-performing pipeline for Entity Matching (EM).
Our exploration included techniques such as data preprocessing, fuzzy search, pairwise LLM matching, different prompting strategies, and embeddings. We identified categories of entities that were harder to match, and found a pipeline that performs well in terms of both accuracy and top-N accuracy. By quantitatively measuring the performance of different techniques, we derived guiding principles that help increase both speed and performance in EM pipelines, allowing Rystad Energy to reapply these insights to other datasets.
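To make the top-N metric concrete, here is a minimal sketch of fuzzy candidate retrieval followed by a top-N accuracy calculation. It is an illustration only, not the project's actual code: the function names, the use of Python's standard-library difflib as the similarity measure, and the company names in the usage note are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace (a stand-in for real preprocessing)."""
    return " ".join(name.lower().split())


def top_n_candidates(query: str, catalog: list[str], n: int = 3) -> list[str]:
    """Return the n catalog entries most similar to the query string."""
    q = normalize(query)
    ranked = sorted(
        catalog,
        key=lambda entry: SequenceMatcher(None, q, normalize(entry)).ratio(),
        reverse=True,
    )
    return ranked[:n]


def top_n_accuracy(pairs: list[tuple[str, str]], catalog: list[str], n: int = 3) -> float:
    """Fraction of (query, truth) pairs whose truth is among the top n candidates."""
    hits = sum(truth in top_n_candidates(query, catalog, n) for query, truth in pairs)
    return hits / len(pairs)
```

With a hypothetical catalog such as ["Acme Oil AS", "Borealis Gas Ltd", "Nordic Wind Power"], the query "acme oil" would count as a hit if "Acme Oil AS" appears among its top n candidates. Plain accuracy corresponds to n = 1, while a larger n measures how often the correct entity survives into a shortlist that a later stage, such as pairwise LLM matching, could then resolve.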
Our final deliverable consisted of the code used to produce and test new models, as well as a report on our findings, containing details on technique and model performance, insights from test results, and economic evaluations of model performance.
-
Rystad Energy is a leading global energy research and business intelligence firm, renowned for its vast databases and in-depth analysis across the oil, gas, and renewable energy sectors. Since its founding 20 years ago, Rystad has delivered consulting and analytics services to a wide array of entities and is present all over the world, with offices in Oslo, New York, London, Singapore, Rio de Janeiro, Beijing, and Sydney.