Data extraction from Russian oil and gas reports
Built an automated LLM pipeline to extract structured data from annual Russian oil and gas reports and map entries to Rystad Energy's database.
During the Fall 2025 semester, ReLU collaborated with Rystad Energy on a data-focused project within the energy sector.
Data extraction using LLMs
The project focused on extracting structured data from annual Russian oil and gas reports. These reports are published in Russian and their structure changes from year to year, which makes manual data collection time-consuming, error-prone, and difficult to standardize.
Throughout the semester, ReLU designed and refined an automated data extraction pipeline built on large language models. The work covered two tasks: extracting relevant data from unstructured reports, and mapping each extracted entry to the corresponding entity in Rystad Energy’s existing database. The mapping step compared several attributes of each entry and weighted them to select the most likely match despite variations in naming and structure.
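A weighted attribute comparison of this kind can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the `Entity` fields, the weights, the threshold, and the example records are all hypothetical, and plain string similarity from Python's standard library stands in for whatever comparison the pipeline really used.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Entity:
    """Hypothetical record shape; the real database schema is not described."""
    name: str
    operator: str
    region: str


# Illustrative weights only (assumption); they sum to 1 so scores stay in [0, 1].
WEIGHTS = {"name": 0.6, "operator": 0.25, "region": 0.15}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def score(extracted: Entity, candidate: Entity) -> float:
    """Weighted sum of per-attribute similarities."""
    return sum(
        weight * similarity(getattr(extracted, field), getattr(candidate, field))
        for field, weight in WEIGHTS.items()
    )


def suggest_mapping(extracted, candidates, threshold=0.7):
    """Return (best candidate, its score); the candidate is None below the threshold."""
    best = max(candidates, key=lambda c: score(extracted, c))
    best_score = score(extracted, best)
    return (best if best_score >= threshold else None), best_score


# Made-up records for illustration; none come from the reports or the database.
extracted = Entity("Severnoe field", "Alfa Oil", "Region A")
candidates = [
    Entity("Severnoye", "Alfa Oil", "Region A"),
    Entity("Yuzhnoye", "Beta Gas", "Region B"),
]
match, match_score = suggest_mapping(extracted, candidates)
```

Returning `None` below the threshold keeps weak matches out of the suggested mappings, so they can be flagged for manual review instead of being forced onto the closest entity.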
The final deliverable consisted of a structured dataset extracted from the reports, accompanied by suggested mappings to Rystad Energy’s database.