
Image Captioning and Alt Text Generation using LLMs
During the academic year 2024/2025, ReLU has collaborated with Scibsted Media on a project regarding image captioning and alt text generation at VG. The goal was to fine-tune LLM’s to generate image captions in VG-style, based on the article text and article image. In this way, it would be possible to automate some of the journalists' repetitive tasks. We also experimented with alt text generation doing prompt engineering, to ensure good quality alt texts for vision impaired readers.
Our task was to explore and identify the best models and different methods for generating image captions and alt text. We fine-tuned models like Gemma, Llama and Qwen to produce VG-style captions. During the project period, we tested both multimodal fine-tuning and pure text finetuning, and benchmarked this with GPT-4 Turbo. In addition to this, we experimented with an AWS-based face recognition system for identifying both international and VG celebrities.
-
Schibsted is a leading Nordic media and online marketplaces group with roots dating back to 1839. Headquartered in Oslo and publicly listed on the Oslo Stock Exchange, the company operates a family of over 55 digital brands—including news outlets like VG, Aftenposten, Aftonbladet, Svenska Dagbladet, E24, and Bergens Tidende—alongside major marketplace platforms such as FINN, Blocket, DBA, and Oikotie. With nearly 6,000 employees across hubs in Oslo, Stockholm, Copenhagen, and Helsinki—and a broader presence in Poland, Austria, Portugal, and Spain—Schibsted draws over 1 billion monthly visits, reaching approximately 2.6 million Norwegians daily.