About FormationEval

Overview

FormationEval is an MMLU-style multiple-choice question (MCQ) benchmark designed for evaluating language models on oil and gas geoscience knowledge. The benchmark covers subsurface disciplines including petrophysics, petroleum geology, geophysics and reservoir engineering.

The dataset contains 505 questions derived from authoritative textbooks and open courseware, with each question including a rationale and source citation. All questions were generated using a controlled LLM pipeline with human verification to ensure accuracy and coverage.

72 language models have been evaluated on this benchmark, spanning proprietary and open-weight models from major AI providers.

Domain distribution

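Each question is tagged with one to three domains. Percentages are computed against the 505 questions in the dataset, so they sum to more than 100% because multi-domain questions are counted once per domain.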

Domain                   Count   Percentage
Petrophysics             272     53.9%
Petroleum Geology        151     29.9%
Sedimentology            98      19.4%
Geophysics               80      15.8%
Reservoir Engineering    43      8.5%
Drilling Engineering     24      4.8%
Production Engineering   14      2.8%
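
The percentages in the domain table can be reproduced from the counts. The short Python sketch below is illustrative only and is not part of the FormationEval tooling; the names TOTAL_QUESTIONS, DOMAIN_COUNTS and domain_percentages are hypothetical. It assumes each percentage is the domain count divided by the 505 questions in the dataset, which matches the published figures.

```python
# Illustrative sketch: names here are hypothetical, not part of FormationEval.
# Counts are copied from the domain table; 505 is the stated dataset size.
TOTAL_QUESTIONS = 505

DOMAIN_COUNTS = {
    "Petrophysics": 272,
    "Petroleum Geology": 151,
    "Sedimentology": 98,
    "Geophysics": 80,
    "Reservoir Engineering": 43,
    "Drilling Engineering": 24,
    "Production Engineering": 14,
}


def domain_percentages(counts, total):
    """Return each domain's share of all questions, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


percentages = domain_percentages(DOMAIN_COUNTS, TOTAL_QUESTIONS)

# Total number of domain tags; exceeds the question count because a
# question can carry up to three tags.
tag_total = sum(DOMAIN_COUNTS.values())

for name, pct in percentages.items():
    print(f"{name:<24}{DOMAIN_COUNTS[name]:>5}{pct:>8.1f}%")
print(f"Domain tags: {tag_total} across {TOTAL_QUESTIONS} questions")
```

The tag counts add up to 682, roughly 1.35 tags per question, which is why the percentage column exceeds 100% in total.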

Difficulty distribution

Difficulty   Count   Percentage
Easy         132     26.1%
Medium       274     54.3%
Hard         99      19.6%

Source materials

Questions are derived from the following sources. All questions are concept-based derivations, not direct copies.

Well Logging for Earth Scientists, 2nd Edition

Darwin V. Ellis and Julian M. Singer (2007)

Textbook | Proprietary (Springer)

Petroleum Geoscience: From Sedimentary Environments to Rock Physics

Knut Bjørlykke (Ed.) (2010)

Textbook | Proprietary (Springer)

TU Delft OpenCourseWare - Applied Earth Sciences

TU Delft (2024)

Open Course | CC BY-NC-SA 4.0

Citation

If you use FormationEval in your research, please cite our paper:

@misc{ermilov2026formationeval,
  title={FormationEval, an open multiple-choice benchmark for petroleum geoscience},
  author={Almaz Ermilov},
  year={2026},
  eprint={2601.02158},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.02158},
  doi={10.48550/arXiv.2601.02158}
}

About the author

Almaz Ermilov

Software engineer with a background in petrophysics. Worked as a petrophysicist in the oil and gas industry before moving into software development. FormationEval was developed to measure how well language models understand subsurface concepts and technical knowledge in petroleum geoscience.

Resources