It feels a little weird commenting on Sections 1 & 2, because while I wholeheartedly agree with the proposition that applying the type of topic analysis discussed there to the Voynich Mss text is (probably/almost certainly) not useful (on reading the journal version of one such paper, I wrote in my notes, "It would be interesting to apply this technique to the Voynich Mss if I didn't think the results would be meaningless"), I do so for different reasons, which to some extent contradict Timm & Schinner's:
1) Such papers rely -- as does Timm & Schinner's use of their token network analysis to argue against the existence of a meaningful text in the mss(*) -- on the questionable assumption that space-delimited tokens in the mss text correspond to words in whatever underlying text may exist, and
2) Such papers assume that the Currier dialects are not qualitatively different samples, i.e., that they can blithely be thrown into the same analytic blender and still produce meaningful results. This is actually one of the big points of divergence between Timm & Schinner's critique and my views: they argue that there is no qualitative difference between the Currier dialects, while I view the qualitative difference as real and large enough to invalidate just chucking all the pages into one's topic analysis method of choice.
(*) Having said that, it's important to acknowledge that theories in which an underlying meaningful text is transformed in such a way that spaces in the mss. don't correspond to word breaks in the original text need to show how they explain the token network analysis results. The point I'm making is that if the "words" in the mss. aren't words in the underlying text, then the fact that Timm & Schinner's token network analysis shows that "words" in the mss. don't behave like words in a natural language is irrelevant, because those "words" aren't words in a natural language in the first place.
Notes on Section 2 "Linguistic structures":
"The existence versus nonexistence of structures in the VMS that are characteristic for linguistically meaningful text (and that cannot be explained as by-product of an algorithmic construction process) is perhaps the most important key question." -- I am in vehement agreement with this, with the caveat that I'd very much like to see work that focuses on metrics that characterize "meaningful" text which are agnostic with regard to the existence/position of spaces.
"Meanwhile it has been shown by Timm and Schinner (2020) that both of Zipf’s laws can emerge as necessary by-products of an intuitive pseudo-text algorithm." -- I am also in vehement agreement that there are ways of generating pseudo-vocabularies that show Zipf's Law-like behaviors, with the caveat that those ways don't necessarily involve algorithmically generated or otherwise non-meaningful text (e.g., "Indeed, the theoretical challenge raised by this model can be illustrated by taking a corpus of text and dividing it on a character other than the space (' ') character, treating, for instance, 'e' as a word boundary. Doing this robustly recovers a near-Zipfian distribution over these artificial 'words,' as shown in Fig. 9.").
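For anyone who wants to try the quoted "split on 'e'" experiment on a corpus of their own, here is a minimal sketch. It assumes that spaces simply become ordinary characters once 'e' is the boundary, and it uses a plain least-squares fit of log(frequency) against log(rank) as a rough Zipf check (a slope near -1 is the classic Zipfian value). The function names are mine, purely for illustration; this is not the quoted authors' code.

```python
import math
from collections import Counter


def pseudo_words(text: str, boundary: str = "e") -> list[str]:
    """Split text on `boundary` instead of on spaces (spaces become ordinary characters)."""
    return [piece for piece in text.lower().split(boundary) if piece]


def rank_frequency(tokens: list[str]) -> list[tuple[str, int]]:
    """Types sorted by descending frequency; position 0 is rank 1."""
    return Counter(tokens).most_common()


def zipf_slope(ranked: list[tuple[str, int]]) -> float:
    """Least-squares slope of log(frequency) vs. log(rank); roughly -1 for a Zipfian curve."""
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for _, count in ranked]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("need at least two distinct types to fit a slope")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov_xy / var_x


if __name__ == "__main__":
    # Toy input only; point this at a real plain-text corpus to reproduce the experiment.
    ranked = rank_frequency(pseudo_words("abeabeabecde"))
    print(ranked)                       # [('ab', 3), ('cd', 1)]
    print(round(zipf_slope(ranked), 3))  # -1.585
```

On a toy string the slope means nothing; the interesting number comes from running it over a few hundred kilobytes of ordinary prose.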
So far, so good, but at this point my views start to diverge and my problems with the paper begin...
"Indeed, the existence of two statistically strictly separated sub-texts, Currier A and B, would provide some evidence for an underlying meaningful text, either as two dialects, topics, or different encryption/encoding schemes." -- I vigorously disagree with this statement, ironically because I can envision the A/B split being the result of something like Rugg's grill method with different tables under the grill rather than the result of a meaningful text -- or at the very least, I'm fairly confident that would be Rugg's take on the matter.
In Section 2.1, "The Currier languages," Timm & Schinner present an analysis in which they measure the similarity between pairs of pages in the manuscript by computing the normalized dot product of the pages' "word" frequency vectors (where each word in the vocabulary has a corresponding frequency count in the vector) -- this corresponds to the cosine of the angle between the pages' vectors in word frequency space. Having shown the absence of a sharp break point in the values of this page similarity metric, the authors conclude "This behavior confirms the hypothesis of a continuous evolution from Currier A to B,...The answer to the question of whether a particular folio would belong to Currier A is not a definitive yes or no, but rather a percentage number." -- Sorry, but no. This claim fails on three fronts:
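To make the metric concrete, here is a minimal sketch of page-pair cosine similarity over token-frequency vectors as described above. It assumes tokens are space-delimited "words" (the very assumption questioned earlier), and the page names and contents in the usage line are made up. Note that cosine is scale-invariant, so raw counts and per-page relative frequencies give identical results.

```python
import math
from collections import Counter
from itertools import combinations


def page_vector(page_text: str) -> Counter:
    """Token-frequency vector for one page; tokens are space-delimited 'words'."""
    return Counter(page_text.split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Normalized dot product of two sparse frequency vectors (cosine of the angle between them)."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(pages: dict[str, str]) -> dict[tuple[str, str], float]:
    """Cosine similarity for every unordered pair of distinct pages."""
    vectors = {name: page_vector(text) for name, text in pages.items()}
    return {(p, q): cosine_similarity(vectors[p], vectors[q])
            for p, q in combinations(vectors, 2)}


if __name__ == "__main__":
    print(math.comb(102, 2))  # 5151 unordered pairs of distinct folios
    sims = pairwise_similarities({"f1r": "daiin chol daiin", "f2r": "daiin shol"})
    print(round(sims[("f1r", "f2r")], 3))  # 0.632
```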
1) It is fatally methodologically flawed to do any sort of cluster analysis on 102 data points ("The VMS consists of 102 folios, from which 5151 pairs can be selected (excluding redundancy by symmetry, and the case of two identical folios)") in a 7000-dimensional space ("The VMS vocabulary consists of about 7000 words. Let v be a vector with each component representing the token frequency of one of these words."). This is due to what is known as "the curse of dimensionality"; indeed, as one web page describing the issue points out, "Too many dimensions cause every observation in your dataset to appear equidistant from all the others" (my emphasis; while that quote is talking about Euclidean distance, if T&S want to claim that the same problem doesn't exist for vector cosines, I would genuinely like to see a reference, and will metaphorically eat my hat on this point if they can provide one). In fact, given the inappropriately high dimensionality of their feature space, the very fact that the Bio and Recipes sections stand out in Figure 2 the way they do (that there is any kind of signal at all in the noise, that there are any distinctive clusters at all in their data) strikes me as compelling evidence against Timm & Schinner's conclusion. That doesn't mean there aren't methodologically sound experiments one could design to test whether the Currier dialects are or aren't separable in vocabulary space; it just means this isn't one of them... Frankly, it's somewhat surprising to me that this got past the reviewers for Cryptologia. I'm always willing to entertain the possibility that maybe I'm the crazy one here: does anyone who's read the paper and knows their stuff when it comes to cluster analysis/pattern classification want to defend T&S's methodology here?
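A quick way to see the effect quoted above is to scatter 102 random points in spaces of increasing dimension and measure, from one point, how much the distances and cosines to the other 101 vary. This is a sketch under a loud assumption: the points are uniform random in the unit hypercube, not sparse Zipfian word-count vectors, so it illustrates the generic concentration effect rather than proving anything about T&S's actual page vectors.

```python
import math
import random


def relative_contrast(points: list[list[float]], query: list[float]) -> float:
    """(d_max - d_min) / d_min for Euclidean distances from query; shrinks as distances concentrate."""
    dists = [math.dist(query, p) for p in points]
    return (max(dists) - min(dists)) / min(dists)


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_spread(points: list[list[float]], query: list[float]) -> float:
    """Range (max - min) of cosine similarities from query to every other point."""
    sims = [cosine(query, p) for p in points]
    return max(sims) - min(sims)


def simulate(dim: int, n_points: int = 102, seed: int = 0) -> tuple[float, float]:
    """Uniform random points in [0, 1]^dim (an assumption, not real page vectors)."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query, others = pts[0], pts[1:]
    return relative_contrast(others, query), cosine_spread(others, query)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000, 7000):
        rc, cs = simulate(dim)
        print(f"dim={dim:5d}  relative contrast={rc:8.3f}  cosine spread={cs:.3f}")
```

Both numbers should fall sharply as the dimension grows: in 7000 dimensions every point ends up at roughly the same distance and roughly the same angle from every other point.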
2) The correct conclusion from what is shown in Section 2.1 is not that whether a particular folio belongs to Currier A is fuzzy; the correct conclusion is that token frequency vector cosine similarity is a poor way of making that classification decision. The Currier A dialect and the Currier B dialects (plural: the one used in some of the herbal folios, and the one used in what Currier referred to as the "Biological" folios) are absolutely separable on the basis of nothing more than letter pair frequency statistics. In fact, A pages can be separated from B pages with 100% accuracy using nothing more than the relative frequency of EVA <ed> on a page, and that frequency does not smoothly evolve between the A and B pages. The on-page frequency of <ed> (or, more precisely, "C8", since I work in the Currier transcription alphabet) ranges from 0.00% to 0.51% for Herbal A pages, from 1.34% to 9.05% for Herbal B pages, and from 2.58% to 8.90% for Bio B pages. That's a gap of about 0.8 percentage points between the Herbal A page with the most frequent use of <ed> and the Herbal/Bio B page with the least frequent use of <ed> (or, put differently, the Herbal B page that uses it least still uses it more than twice as often as the Herbal A page that uses it most). While the ranges for Herbal B and Bio B overlap, that isn't because those pages aren't separable using digram frequencies; it's because there isn't one single digram that is diagnostic of that split the way there is for the A/B split.
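The single-digram test above amounts to a one-feature threshold classifier. Here is a minimal sketch, with two assumptions flagged: (a) "relative frequency" is taken to mean the share of all adjacent character pairs within space-delimited tokens (the denominator isn't specified above, and a different one would shift the numbers), and (b) the cut point is simply the midpoint of the 0.51% and 1.34% range ends quoted above. The sample strings are made-up EVA-like toys, not real transcription lines.

```python
HERBAL_A_MAX = 0.0051  # 0.51%: highest on-page <ed> frequency quoted for Herbal A
B_MIN = 0.0134         # 1.34%: lowest on-page <ed> frequency quoted for Herbal B
THRESHOLD = (HERBAL_A_MAX + B_MIN) / 2  # hypothetical cut point, about 0.93%


def digram_frequency(page_text: str, digram: str = "ed") -> float:
    """Share of within-token adjacent character pairs equal to `digram` (assumed denominator)."""
    total = hits = 0
    for token in page_text.split():
        for i in range(len(token) - 1):
            total += 1
            if token[i:i + 2] == digram:
                hits += 1
    return hits / total if total else 0.0


def classify_page(page_text: str) -> str:
    """Label a page Currier 'A' or 'B' from its <ed> frequency alone."""
    return "B" if digram_frequency(page_text) > THRESHOLD else "A"


if __name__ == "__main__":
    print(classify_page("daiin chol shol"))     # A (no <ed> at all)
    print(classify_page("qokedy shedy daiin"))  # B (2 of 13 pairs are <ed>)
    print(f"gap: {B_MIN - HERBAL_A_MAX:.4f}, ratio: {B_MIN / HERBAL_A_MAX:.2f}")
```

The last line just restates the arithmetic in the paragraph above: a gap of about 0.83 percentage points and a ratio of about 2.6.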
3) In discussing more sophisticated methods of topic analysis in Section 2.2, they acknowledge the importance of removing common "function words" from texts for topic modeling to generate meaningful results: "All topic modeling approaches need a pre-processing step that removes function words from the input data base, because they (a) usually are the most frequent tokens, and (b) carry no contextual information. Too many such words would otherwise bury the rather sensitive clustering algorithm under an intolerable amount of noise, eventually rendering it useless." While their discussion in Section 2.2 of the difficulties of doing this with the Voynich text is spot-on, their failure to make any attempt to do so in their own analysis in Section 2.1 poisons their conclusion there. The fact that T&S don't believe any of the "words" are, in fact, function words is irrelevant to this objection to their conclusion.
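As a rough illustration of why skipping this step matters for the Section 2.1 metric, here is a sketch that drops the k most frequent token types corpus-wide before computing page cosines. Treating "the top k types" as function words is a crude stand-in and purely an assumption here (nobody has a trustworthy function-word list for Voynichese, which is exactly the difficulty T&S describe). The toy pages are invented.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Normalized dot product of two sparse frequency vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def drop_most_frequent(pages: dict[str, list[str]], k: int) -> tuple[dict[str, list[str]], set[str]]:
    """Remove the k most frequent types across all pages (a crude stop-word proxy)."""
    corpus = Counter(tok for toks in pages.values() for tok in toks)
    stop = {word for word, _ in corpus.most_common(k)}
    filtered = {name: [t for t in toks if t not in stop] for name, toks in pages.items()}
    return filtered, stop


if __name__ == "__main__":
    pages = {"p1": ["daiin", "daiin", "chol"], "p2": ["daiin", "shol"]}
    before = cosine_similarity(Counter(pages["p1"]), Counter(pages["p2"]))
    filtered, stop = drop_most_frequent(pages, k=1)
    after = cosine_similarity(Counter(filtered["p1"]), Counter(filtered["p2"]))
    print(stop, round(before, 3), after)  # {'daiin'} 0.632 0.0
```

In the toy case, all of the apparent similarity between the two pages comes from one shared high-frequency token; once it is removed, the pages have nothing in common.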
Having said all that, I think there very much is a valid critique of the body of work applying topic modeling techniques to the Voynich text that is based in the Currier dialects, it's just not the one T&S make. Consider not just the marked differences in digram frequencies between A and B pages, but also the differences in most-common words (in Currier, not EVA -- following my mother's advice, just because all my friends jumped off a particular bridge doesn't require me to follow suit...):
Herbal A: 8AM, SOE, SOR, 89, S9, 2, ZOE, Q9, 8AN, ZO
Herbal B: 8AM, SC89, OR, AR, AM, 8AR, 89, S89, 4OFC89, ZC89
Bio B: ZC89, SC89, OE, 4OFC89, 4OFCC89, 4OFAN, 4OFAE, 4OE, 4OFCC9, 4OFAM
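To put a number on how little these lists share, here is a small sketch that computes the shared words and the Jaccard overlap (shared / union) for each pair of the three top-10 lists exactly as given above. Set overlap of top-10 lists is a crude measure (it ignores rank and depends on list length), so treat it as a summary of the lists, nothing more.

```python
from itertools import combinations

# Top-10 words per section in the Currier transcription, copied from the lists above.
TOP10 = {
    "Herbal A": ["8AM", "SOE", "SOR", "89", "S9", "2", "ZOE", "Q9", "8AN", "ZO"],
    "Herbal B": ["8AM", "SC89", "OR", "AR", "AM", "8AR", "89", "S89", "4OFC89", "ZC89"],
    "Bio B": ["ZC89", "SC89", "OE", "4OFC89", "4OFCC89", "4OFAN", "4OFAE", "4OE", "4OFCC9", "4OFAM"],
}


def overlap_table(top_lists: dict[str, list[str]]) -> dict[tuple[str, str], tuple[list[str], float]]:
    """For each pair of sections: (sorted shared words, Jaccard index of the two word sets)."""
    table = {}
    for (name_a, words_a), (name_b, words_b) in combinations(top_lists.items(), 2):
        set_a, set_b = set(words_a), set(words_b)
        shared = set_a & set_b
        table[(name_a, name_b)] = (sorted(shared), len(shared) / len(set_a | set_b))
    return table


for (a, b), (shared, jaccard) in overlap_table(TOP10).items():
    print(f"{a} vs {b}: {len(shared)} shared {shared}, Jaccard {jaccard:.2f}")
# Herbal A vs Herbal B: 2 shared ['89', '8AM'], Jaccard 0.11
# Herbal A vs Bio B: 0 shared [], Jaccard 0.00
# Herbal B vs Bio B: 3 shared ['4OFC89', 'SC89', 'ZC89'], Jaccard 0.18
```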
*If* you're going to throw Herbal A, Herbal B, and Bio B pages into the same analytic blender on the assumption that the differences between them merely reflect differences in content topic, you have the burden of proof (if you think this is a natural language) to show some example (preferably more than one) of a language and pair of topics that shows
* the same quantitative level of difference in basic letter and letter pair frequency stats that is seen between A & B pages
* the same lack of overlap in common vocabulary words (Bowern & Lindemann point this out as well: "While there is some overlap, the most common vocabulary items of Voynich A and Voynich B are substantially different. While the words in both languages are built from the same three-field structure, they do not clearly correspond to each other. They might be the result of different encoding processes, or they might represent different underlying natural languages.")
On top of that, you also need to explain the marked differences between Herbal A and Herbal B despite the apparent commonality of topic based on illustration type (single large plant drawing).
Section 2.2, "Topic modeling" -- as should be clear from what I've said above, I mostly have an "a pox on both their houses" reaction to this.
Section 3 briefly critiques a number of papers from the Malta conference:
3.1, "Crux of the MATTR": Timm & Schinner say, "His conclusion 'The profile of Voynich A suggests that it is more morphologically complex than Voynich B, which may indicate that it encodes a separate language or dialect' (which later on is used as implicit argument against the gibberish interpretation) is based on two assumptions: 1) separable 'linguistic domains' Currier A and B, and 2) the position of a text sample in the MATTR/MCW plane characterizes its morphology. Assumption (1) definitely is wrong, see our analysis in Section 2.1, as well as in Timm and Schinner (2020)." See the critique of Section 2.1 above.
3.2 "Voynich paleography": Lisa Fagin Davis is more than capable of defending her own work, doesn't need me to do it for her, and will hopefully do so here once she has read T&S's paper.
Of the other papers T&S discuss in section 3, Zattera's (Section 3.6, "Evidence of word structure") is the only one I've read in sufficient detail to evaluate their critique. I'm inclined to agree that the appearance of a "word" grammar or grammars, while seductive, is a mirage (although, again, for different reasons than T&S). Having said that, I'm puzzled by the objection that "Furthermore, it is difficult for the word grammar approach to really explain the characteristic relationship between similarity, spatial vicinity, and token frequency" given that explaining those things is generally not what people looking for a grammar for Voynichese word morphology are trying to do.
In focusing as hard as they do on arguing that their model is better, T&S miss a significant problem with Zattera's paper. While I'm sympathetic to Zattera's concern about overgeneralization by proposed Voynichese word grammars, his approach to addressing the issue treats any word generated by a proposed grammar but not found in the actual text as a false positive. Looking at how the number of new word types grows as a function of the first N lines of Bio B (say) makes it very clear that, given the finite sample of the Voynich "language(s)" we have, this is an incredibly sketchy assumption. There is a body of literature on the induction of regular grammars that addresses how to deal with the overgeneralization problem given only positive training examples (i.e., we don't really have a list of words that aren't in the vocabulary of the text), and the limitations that imposes on what you can do.
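The vocabulary-growth check described above is easy to run. Below is a minimal sketch: a cumulative type count over the first N lines, plus one standard estimate (Good-Turing, swapped in here by me and not used in Zattera's paper) of the probability that the next token is a type not yet seen, namely the number of once-seen types divided by the total token count. If the curve is still climbing at the end of a section and that estimate is well above zero, "generated but not attested" cannot simply be scored as a false positive. The sample lines are invented EVA-like toys.

```python
from collections import Counter


def vocabulary_growth(lines: list[str]) -> list[int]:
    """Cumulative number of distinct space-delimited types after each of the first N lines."""
    seen: set[str] = set()
    curve = []
    for line in lines:
        seen.update(line.split())
        curve.append(len(seen))
    return curve


def unseen_mass_estimate(tokens: list[str]) -> float:
    """Good-Turing estimate of P(next token is a new type): hapax count / total tokens."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(tokens)


if __name__ == "__main__":
    lines = ["qokeedy shedy", "shedy qokain", "chedy qokeedy okal"]
    print(vocabulary_growth(lines))  # [2, 3, 5]
    tokens = [tok for line in lines for tok in line.split()]
    print(round(unseen_mass_estimate(tokens), 3))  # 0.429 (3 hapaxes / 7 tokens)
```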
Section 3.8, "Gibberish after all?" -- T&S state "...later on in their paper Gaskell and Bowern state: 'A more significant limitation of this work is that, because of the short length of our text samples, we are unable to test whether gibberish can replicate the larger structural features (such as “topic words”) which have been observed in the VMS (Montemurro and Zanette 2013; Reddy and Knight 2011; Sterneck, Polish, and Bowern 2021). At present, these features pose a serious challenge to proponents of the hoax hypothesis.' While they do not explicitly explain (or give examples) for the term 'topic words' in this context, we presume that it refers to 'topic modeling', or previous attempts to associate some Voynichese words with particular illustrations and/or Currier languages."
The concept of "topic words" appears to be a standard one in document analysis work (see, for instance, Shin & Zhang, "Extracting Topic Words and Clustering Documents by Probabilistic Graphical Models"). T&S are half-right, in that it is a term of art used in the topic modeling literature (see, for instance, Alokaili et al.). A conference paper is not a tutorial, and Gaskell & Bowern (or any other authors) are not responsible for T&S (or any other readers) being too lazy to spend 5 minutes with a search engine.