Journal of Advances in Artificial Intelligence and Machine Learning
Open Access
The Role of Error in the Computational Generation of Word Forms
Authors: Tibor Mezo.
Abstract
The study examines how scribal errors, stem truncations, and orthographic fluctuations in medieval manuscripts affect the computational identification and normalization of word forms. Our starting point is a classic case of internal error: the form scegegkel, recorded in the bilingual manuscript of the Old Hungarian Lament of Mary, which corresponds to the modern form szegekkel ("with nails"). Although it appears to be a graphic error (k → g) combined with a truncated stem, this seemingly incorrect form is more than a philological curiosity: by evoking the iron nails of the crucifixion, it also carries theological and poetic weight.
The paper seeks to answer three key questions: (1) What historical morphotactic rules underlie the distortion of the word form? (2) To what extent does the error alter the narrative meaning of the Latin-Hungarian text pair? (3) How can these phenomena be modeled and corrected within the framework of modern digital philology? To address these questions, we construct a hybrid processing pipeline with four components: neural HTR-based correction (Transformer + CTC); a rule-based finite-state transducer for stem and affix normalization; candidate ranking by edit distance combined with a fine-tuned trigram language model; and Latin-Hungarian alignment-based lacuna detection. Across the full corpus of the Old Hungarian Lament of Mary, the prototype improved lemma accuracy by 7% and reduced affix resolution errors by 12%.
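As an illustration of the edit-distance component only, the following minimal sketch ranks candidate normalizations with a weighted Levenshtein distance in which the k/g confusion named above is cheaper than other substitutions. The cost values, the function names, and the three-word candidate list are assumptions made for this example, not the paper's actual configuration; the trigram language model and the other pipeline stages are not modeled here.

```python
from typing import Dict, List, Tuple

# Hypothetical substitution costs: the k/g confusion is treated as a likely
# scribal error and therefore costs less than an arbitrary substitution.
CHEAP_SUBS: Dict[Tuple[str, str], float] = {("g", "k"): 0.2, ("k", "g"): 0.2}


def weighted_edit_distance(source: str, target: str) -> float:
    """Levenshtein distance with unit insert/delete cost and weighted substitutions."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = float(i)  # delete every character of the source prefix
    for j in range(1, cols):
        dist[0][j] = float(j)  # insert every character of the target prefix
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = source[i - 1], target[j - 1]
            sub_cost = 0.0 if a == b else CHEAP_SUBS.get((a, b), 1.0)
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,           # deletion
                dist[i][j - 1] + 1.0,           # insertion
                dist[i - 1][j - 1] + sub_cost,  # match or substitution
            )
    return dist[-1][-1]


def rank_candidates(attested: str, lexicon: List[str]) -> List[Tuple[float, str]]:
    """Return (distance, candidate) pairs sorted from closest to farthest."""
    return sorted((weighted_edit_distance(attested, c), c) for c in lexicon)


if __name__ == "__main__":
    # Illustrative candidate list, not taken from the paper's lexicon.
    for score, form in rank_candidates("scegegkel", ["szegek", "szemekkel", "szegekkel"]):
        print(f"{form}\t{score:.1f}")
```

In a fuller pipeline, the distance would be one feature among several, with a language-model score deciding between candidates that lie equally close to the attested form.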
The analysis highlights that errors are not merely flaws but carriers of historical linguistic information: the stem-final sound change ?k > g, the archaic allomorphs of the instrumental suffix -val/-vel, and the fluctuation of the plural marker -k all serve as valuable data sources.
The author proposes a TEI-compatible, multi-layer annotation system to clearly distinguish between the phonological, morphological, and graphical layers.
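To make the idea of layered annotation concrete, the sketch below builds one TEI-style word element with Python's standard xml.etree.ElementTree module. It pairs the attested and corrected forms with the TEI elements choice, sic, and corr, and attaches one note per layer. This is an assumed encoding for illustration only: the use of typed note elements, the lemma value, and the placeholder note texts are not taken from the paper, and the result is not validated against a TEI schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def annotate_word(attested: str, corrected: str, lemma: str,
                  layers: Dict[str, str]) -> ET.Element:
    """Build a <w> element holding the attested form, its correction, and layer notes."""
    word = ET.Element("w", {"lemma": lemma})
    choice = ET.SubElement(word, "choice")
    ET.SubElement(choice, "sic").text = attested    # form as written in the manuscript
    ET.SubElement(choice, "corr").text = corrected  # editorially corrected form
    for layer_name, comment in layers.items():
        # One note per annotation layer, distinguished by its type attribute.
        ET.SubElement(word, "note", {"type": layer_name}).text = comment
    return word


if __name__ == "__main__":
    # Placeholder layer descriptions; a real edition would supply its own analysis.
    element = annotate_word(
        attested="scegegkel",
        corrected="szegekkel",
        lemma="szeg",
        layers={
            "graphical": "g written where k is expected",
            "phonological": "stem-final consonant alternation",
            "morphological": "plural marker followed by instrumental suffix",
        },
    )
    print(ET.tostring(element, encoding="unicode"))
```

Keeping each layer in its own typed element means a later tool can query or revise the graphical reading without touching the morphological analysis.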