Apologies for overlap with prior comments that showed up after I composed this...
* There is a related paper (in English) on arXiv, with Orlov as a co-author, which should also be read by anyone looking at this paper who doesn't speak Russian (but does speak English) -- same techniques applied.
* Obvious but necessary caveat that my review is based on the Google translation from Russian to English, so apologies for any errors in my comments caused by flaws in the automated translation.
* Paper appears to have been presented at a conference that ran Feb 4-5, 2021, so it's unclear that there's any reasonable accessibility-based explanation for lack of awareness of/reference to key prior work.
* Very first sentence of intro (in translation): "The Voynich Manuscript(hereinafter MV) [1] is a manuscript dated by researchers of the 16th century." (sic) -- lack of awareness of C-14 dating.
* "Numerous studies to decipher this text have been carried out for more than a hundred years, but without success. The existing versions about the authorship, content, and language of the manuscript, a review of which can be found in [2–4], are not sufficiently convincingly supported by full-fledged statistical studies." -- references 2 - 4 are Nick Pelling's _The Curse of the Voynich_, J. G. Barabe's report for McCrone on the materials analysis, and Levitov's '87 book describing his "solution." Along with a reference to Yale's catalog entry and Landini & Zandbergen '98, those are the only references to anything to do with the manuscript. No reference to existing overviews of the statistical properties of the text such as Bowern & Lindemann or Reddy & Knight.
* "There is also no consensus on how many and what signs are in the MV. There is a so-called "European transcription" (EVA [6]) mapping characters of the manuscript into the Latin alphabet. In addition, there is a transcription of Takahashi [7] - also in Latin, but with different frequencies." Where to begin? EVA isn't a transcription, it's a transcription alphabet; the Takahashi transcription uses EVA. Raw EVA is used in the analysis without any recognition/discussion of the commitments that choice makes regarding the nature of the underlying script (are ligatured gallows single characters? what about "iin"/"iiin"?) -- the results are only going to be as valid as those implicit assumptions are...
* First statistical analysis performed: L1 (taxicab) distance between the rank-ordered character frequency vector of the manuscript text and those of various languages -- GIGO issue wrt use of raw EVA.
* Second statistical analysis performed: comparison of "the Hurst exponent for a series of the number of letters enclosed between the two most frequently occurring identical letters" -- unfortunately, the labels for the graph on p. 10 got stripped out during the translation process. Again, the issue of how EVA affects those distance counts is a potential problem. The arXiv paper referenced above presents the same or a similar analysis; part of the conclusion from that analysis given there is, "In case of the Manuscript observed distributions are shifted to the right and have much less acute maximum compared to all other curves on Fig.8. This means that statistics of the Manuscript does not agree with statistics of texts written in one particular language. Roughly speaking, symbols in the Manuscript are placed 'more randomly' compared to the latter. Further analysis of these issues will be presented in the following sections of the paper. There are two main options here: the Manuscript is written in a special constructed language or it is written in several languages."
* Here we get to one of the key failings of the paper(s): the authors show no awareness of (or at the very least do not engage in any way with) any of the prior observations regarding statistically distinct "languages" in the mss. going back to Currier's paper and confirmed/refined by multiple published cluster analysis studies over the intervening decades.
* They then compare the spectral properties of the digram frequency matrices for the two EVA-based transcriptions they use vs. languages in the Germanic and Romance families with and without vowels. Not clear what text corpora are being used (i.e., are they using 16th (sic) century or earlier texts in the various languages or more modern samples?).
* One of their conclusions is that the differences between the character statistics examined in the different sections of the mss. are more comparable in magnitude to the differences between languages than within languages (again, with the caveats that go with using raw EVA as input). Without prejudice to what that *means*, it points to an issue with other analyses that merrily assume that the differences between "languages" in the mss merely reflect changes in topic/author/etc. rather than difference in language/cipher key or system/etc. That should be demonstrated, not assumed. BTW, having fed Herbal A & Bio B (in Currier) into my monoalphabetic cipher solver in the past, that is consistent with what I have observed wrt within-/between-language differences in the matching metric used there (a chi^2 statistic on the digram frequency stats) -- the magnitude of the difference is more consistent with two different languages than variation within a language -- not that I think Voynichese is a monoalphabetic substitution at the glyph level...
* The bottom line of the work (as given in the Arxiv paper) is, "Concerning the Manuscript, it seems most plausible that it was written in two languages having the same alphabet without vowel letters: 30% of the text is written in one of the Germanic languages (Danish or German) and the rest 70% – in one of the Romance languages (Latin or Spanish)."
* The bottom line of my impression of the paper:
1) The statistical analyses per se seem fine, subject to all the appropriate caveats about using raw EVA.
2) With regard to their overall conclusion (as given in the Arxiv paper), there's an old joke about two economists who are walking down the street when they see what appears to be a $20 bill lying on the sidewalk. One of them starts to bend down to pick it up, and the other one says, "Don't bother -- if that really was a $20 bill, someone would have picked it up by now." I hate to be that economist, but...if the Voynich text were a monoalphabetic cipher with EVA characters mapping to consonants in a devoweled Germanic or Romance language I'd think it'd have been solved by now -- especially with the increasing availability of historical text corpora. On the other hand, _chacun à son goût_.
3) The lack of any apparent awareness of prior work on different "languages" within the text is troubling, although independent confirmation by different means still has value.
4) The lack of any apparent awareness of the broader array of prior work on statistical characteristics of the text is also troubling. They don't address the question of what the entropy statistics of devoweled European languages look like (I genuinely don't remember how that sorts out compared to the Voynich text). Referencing/replicating Reddy & Knight's observations on word length distribution ("However, Stolfi (2005) show that Pinyin Chinese, Tibetan, and Vietnamese word lengths follow a binomial distribution, and we found (Figure 3) that certain scripts that do not contain vowels, like Buckwalter Arabic and devoweled English, have a binomial distribution as well. The similarity with devoweled scripts, especially Arabic, reinforces the hypothesis that the VMS script may be an abjad.") as an independent line of support for Voynichese as a devoweled European language would have helped make their case.
5) As an aside regarding the results of applying Sukhotin's algorithm to the Voynich text, it's pretty clear that its identification of Currier O, A, and C as vowels is an artifact of the verbose glyph combinations that start with them (which also are the main contributors to pulling down the 1st and 2nd order entropy values). I don't consider that result strong evidence for the existence of vowels in "Voynichese".
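
To make the raw-EVA caveat on the first analysis concrete, here is a minimal Python sketch of an L1 (taxicab) distance between rank-ordered character frequency vectors. It is an illustration only, not the authors' code: the function names, the padding length `size`, and the choice to ignore whitespace are all assumptions.

```python
from collections import Counter


def rank_freq_vector(text, size):
    """Relative character frequencies, sorted most to least common.

    Whitespace is ignored (an assumption). The vector is zero-padded or
    truncated to `size` so texts with different alphabets are comparable.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    freqs = sorted((n / total for n in counts.values()), reverse=True)
    return (freqs + [0.0] * size)[:size]


def l1_rank_distance(text_a, text_b, size=30):
    """Taxicab distance between the two rank-ordered frequency vectors."""
    a = rank_freq_vector(text_a, size)
    b = rank_freq_vector(text_b, size)
    return sum(abs(x - y) for x, y in zip(a, b))
```

Because only the sorted frequencies matter, the result depends entirely on what counts as one character. Treating EVA `iin` as a single glyph (written `N` here, a made-up placeholder) changes the vector, so `l1_rank_distance("daiin daiin", "daN daN")` is nonzero even though both strings transcribe the same ink.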
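
On the entropy question in point 4, the comparison could be run roughly as follows. This is a hedged sketch: `h1_h2` and `devowel` are hypothetical helper names, whitespace is dropped before counting, the vowel set is a crude Latin-alphabet assumption, and the second-order value is the usual conditional entropy H(pair) - H(first character), which may not match the exact definition used in any given Voynich study.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a Counter of observations."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def h1_h2(text):
    """Return (first-order entropy, second-order conditional entropy)."""
    chars = [c for c in text if not c.isspace()]  # assumption: ignore spaces
    h1 = entropy(Counter(chars))
    pairs = Counter(zip(chars, chars[1:]))
    # H(next | current) = H(current, next) - H(current)
    h2 = entropy(pairs) - entropy(Counter(chars[:-1]))
    return h1, h2


def devowel(text, vowels="aeiouAEIOU"):
    """Strip vowels; the vowel set is an assumption and ignores diacritics."""
    return "".join(c for c in text if c not in vowels)
```

Running `h1_h2(devowel(sample))` on period texts in the candidate languages and comparing against `h1_h2` of a Voynich transcription would show whether devoweling moves the entropy values toward or away from the manuscript's. As with everything else here, the Voynich side of that comparison inherits the transcription alphabet's assumptions.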
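
For readers who haven't seen Sukhotin's algorithm (point 5), a compact sketch of the usual textbook formulation follows. It is not taken from the paper under review; the function name and the alphabetical tie-breaking are assumptions. Words are treated as sequences of tokens, so the same function works on raw EVA strings or on lists of pre-merged glyphs.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return the symbols Sukhotin's algorithm classifies as vowels, in order."""
    adj = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:  # the diagonal of the adjacency matrix stays zero
                adj[a][b] += 1
                adj[b][a] += 1

    # Row sums of the symmetric adjacency matrix; everything starts as a consonant.
    sums = {ch: sum(adj[ch].values()) for ch in sorted(letters)}
    vowels = []
    while True:
        candidates = {ch: s for ch, s in sums.items() if ch not in vowels}
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        if candidates[best] <= 0:
            break
        vowels.append(best)
        # Penalise remaining consonants for their adjacency to the new vowel.
        for ch in candidates:
            if ch != best:
                sums[ch] -= 2 * adj[ch][best]
    return vowels
```

The algorithm only looks at which symbols tend to alternate with which, so any symbol that habitually opens a fixed multi-glyph sequence will accumulate a high adjacency sum and can be labelled a vowel for that reason alone. Comparing the output on raw glyphs against the output with the verbose combinations merged into single tokens is one way to check how much of the "vowel" signal survives.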