Benchmarking LLM Sensitivity to Prompt Formats: A Contamination-Free Approach

Author: Fumio Miyata (ORCID: https://orcid.org/0009-0008-8797-5578)
Affiliation: Noetics Institute / noetics.institute
Published at: Zenodo Preprint
DOI: 10.5281/zenodo.20192002
Original record: https://doi.org/10.5281/zenodo.20192002


Abstract

The evaluation of Large Language Models (LLMs) is complicated by prompt sensitivity and data contamination, which obscure the distinction between genuine reasoning and rote memorization. This paper introduces a reproducible, contamination-free benchmark to measure how LLM responses vary with the prompt’s language, style, and syntactic format. Our methodology uses the constructed language Lojban, which is virtually absent from pre-training corpora, and a suite of novel symbolic prompting tasks to assess a model’s ability to interpret unfamiliar formal systems. The results indicate three key findings: (1) prompt strictness can elicit latent capabilities, but its effectiveness is limited to familiar languages; (2) models exhibit a significant ceiling in algorithmic complexity, failing to produce bug-free code for novel tasks; and (3) performance appears more indicative of sophisticated pattern matching than abstract reasoning. This work provides a comprehensive dataset and a rigorous framework for evaluating the generalization and true reasoning abilities of LLMs. The accompanying code and data are archived on Zenodo (DOI: https://doi.org/10.5281/zenodo.18043860).


Citation

English citation:

Fumio Miyata, “Benchmarking LLM Sensitivity to Prompt Formats: A Contamination-Free Approach”, Zenodo, 2026.
DOI: 10.5281/zenodo.20192002.


Also available on ResearchGate

https://www.researchgate.net/publication/404884143_Benchmarking_LLM_Sensitivity_to_Prompt_Formats_A_Contamination-Free_Approach
This paper is also listed on ResearchGate for academic visibility and researcher discovery.


BibTeX

@article{miyata_bls_2026,
  title   = {Benchmarking LLM Sensitivity to Prompt Formats: A Contamination-Free Approach},
  author  = {Fumio Miyata},
  year    = {2026},
  doi     = {10.5281/zenodo.20192002},
  url     = {https://doi.org/10.5281/zenodo.20192002},
  journal = {Zenodo Preprint}
}

Code & Data

All experimental resources and implementations are openly available on GitHub:

https://github.com/aikenkyu001/benchmarking_llm_against_prompt_formats/tree/main
