Author: Fumio Miyata https://orcid.org/0009-0008-8797-5578
Affiliation: Noetics Institute / noetics.institute
Published at: Zenodo Preprint
DOI: 10.5281/zenodo.20192002
Original record: https://doi.org/10.5281/zenodo.20192002
Abstract
The evaluation of Large Language Models (LLMs) is complicated by prompt sensitivity and data contamination, which obscure the distinction between genuine reasoning and rote memorization. This paper introduces a reproducible, contamination-free benchmark to measure how LLM responses vary with the prompt’s language, style, and syntactic format. Our methodology uses the constructed language Lojban, which is virtually absent from pre-training corpora, and a suite of novel symbolic prompting tasks to assess a model’s ability to interpret unfamiliar formal systems. The results indicate three key findings: (1) prompt strictness can elicit latent capabilities, but its effectiveness is limited to familiar languages; (2) models exhibit a significant ceiling in algorithmic complexity, failing to produce bug-free code for novel tasks; and (3) performance appears more indicative of sophisticated pattern matching than of abstract reasoning. This work provides a comprehensive dataset and a rigorous framework for evaluating the generalization and true reasoning abilities of LLMs. The accompanying code and data are archived on Zenodo (DOI: https://doi.org/10.5281/zenodo.18043860).
Download
- PDF (recommended for Google Scholar indexing)
Citation
English citation:
Fumio Miyata, “Benchmarking LLM Sensitivity to Prompt Formats: A Contamination-Free Approach”, Zenodo, 2026.
DOI: 10.5281/zenodo.20192002.
Also available on ResearchGate
https://www.researchgate.net/publication/404884143_Benchmarking_LLM_Sensitivity_to_Prompt_Formats_A_Contamination-Free_Approach
BibTeX
@article{miyata_bls_2026,
  title   = {Benchmarking LLM Sensitivity to Prompt Formats: A Contamination-Free Approach},
  author  = {Fumio Miyata},
  year    = {2026},
  doi     = {10.5281/zenodo.20192002},
  url     = {https://doi.org/10.5281/zenodo.20192002},
  journal = {Zenodo Preprint}
}
Code & Data
All experimental resources and implementations are openly available on GitHub:
https://github.com/aikenkyu001/benchmarking_llm_against_prompt_formats/tree/main