Rystad Energy · Fall 2025 · Completed

Data extraction from Russian oil and gas reports

Built an automated LLM pipeline to extract structured data from annual Russian oil and gas reports and map entries to Rystad Energy's database.

LLMs · Information extraction · Energy

During the Fall 2025 semester, ReLU collaborated with Rystad Energy on a data-focused project within the energy sector.

Data extraction using LLMs

The project focused on extracting structured data from annual Russian oil and gas reports. These reports are published in Russian and their structure varies from year to year, which makes manual data collection time-consuming, error-prone, and difficult to standardize.

Throughout the semester, ReLU designed and refined an automated data extraction pipeline built on large language models. Beyond extracting the relevant data from unstructured reports, a key focus was mapping each extracted entry to the corresponding entity in Rystad Energy’s existing database. Because names and structure vary between reports, each mapping was chosen by assessing several attributes and weighting them to find the most appropriate match.
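As a rough illustration of the weighted mapping idea described above, the sketch below scores each candidate database entity against an extracted entry and suggests the best match. It is not the project's actual implementation: the `Entity` fields, the weights, the threshold, and the use of `difflib.SequenceMatcher` for string similarity are all assumptions made for the example, as are the placeholder names in the usage lines.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Entity:
    """Hypothetical record; these fields are illustrative, not Rystad Energy's schema."""
    name: str
    operator: str
    region: str


# Assumed weights for the example; they sum to 1 so scores stay in [0, 1].
WEIGHTS = {"name": 0.6, "operator": 0.25, "region": 0.15}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def score(extracted: Entity, candidate: Entity) -> float:
    """Weighted sum of per-attribute similarities."""
    return sum(
        weight * similarity(getattr(extracted, field), getattr(candidate, field))
        for field, weight in WEIGHTS.items()
    )


def suggest_mapping(extracted, candidates, threshold=0.7):
    """Return (best_candidate, score); best_candidate is None below the threshold."""
    if not candidates:
        return None, 0.0
    best = max(candidates, key=lambda c: score(extracted, c))
    best_score = score(extracted, best)
    return (best if best_score >= threshold else None), best_score


# Usage with made-up placeholder data.
database = [
    Entity("Severnoye", "Example Oil", "Region A"),
    Entity("Yuzhnoye", "Other Gas", "Region B"),
]
extracted_entry = Entity("Severnoye field", "Example Oil", "Region A")
match, match_score = suggest_mapping(extracted_entry, database)
```

Returning `None` below the threshold keeps low-confidence matches out of the suggested mappings, so they can be reviewed manually instead of being mapped incorrectly.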

The final deliverable consisted of a structured dataset extracted from the reports, accompanied by suggested mappings to Rystad Energy’s database.

Data: Russian oil-and-gas industry reports
Methods: LLM-based extraction, prompt engineering, entity mapping
Handoff: structured dataset, suggested database mappings, extraction pipeline