-JKP- > 23-04-2020, 06:54 PM
RenegadeHealer > 23-04-2020, 09:30 PM
(23-04-2020, 06:54 PM)-JKP- Wrote: I have worked out rules that work for a great majority of the text, enough that I can tell if something is legal Voynichese, and enough to reproduce "legal" chunks of text, but I have not fully worked out an order for the tokens. I have struggled with this for years, but I haven't gotten discouraged. I think it might be possible. What I can do is classify many of them into family groups.
-JKP- Wrote: I can get some good results on subsets of tokens or particular sections, but don't yet have something that explains it all. There are places here and there where a character inexplicably shows up in a place where it shouldn't. It doesn't happen often, but it happens often enough that I wonder if there's a dynamic that I haven't sorted out correctly. I don't want to write it off as scribal error. I'd rather search for the reason until I've exhausted all possibilities.
-JKP- > 24-04-2020, 12:27 AM
ReneZ > 24-04-2020, 06:18 AM
MarcoP > 24-04-2020, 08:44 AM
(23-04-2020, 09:30 PM)RenegadeHealer Wrote: If I were an idle rich geek, I'd hold a contest, modeled on engineers' egg drop and load-bearing bridge building contests, called the Build-a-Vord Challenge. Each entrant would have a set amount of time (a couple of months, maybe) to design an algorithm that generates Voynichese vords. Each one would be modeled and run for 38,000 cycles with the same hardware and software. The entrant whose algorithm output had the highest ratio of types actually found in the VMS to types not found in the original would get a large donation made by me to a charity of their choice, or a scholarship, or something like that.
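To make the proposed scoring rule concrete, here is a minimal sketch of the "types found versus types not found" ratio. The function name `type_ratio` and the tiny vocabulary are illustrative assumptions, not part of any actual contest or transliteration file.

```python
def type_ratio(generated_words, manuscript_types):
    """Ratio of generated word types found in the manuscript to those not found."""
    generated_types = set(generated_words)
    hits = len(generated_types & manuscript_types)
    misses = len(generated_types - manuscript_types)
    # With no misses the ratio is unbounded; report infinity rather than divide by zero.
    return float("inf") if misses == 0 else hits / misses

# Toy vocabulary for illustration only; a real run would load every type in the VMS.
manuscript_types = {"daiin", "ol", "chedy", "aiin"}
output = ["daiin", "ol", "daiin", "qxz", "chedy"]
print(type_ratio(output, manuscript_types))  # 3 types found, 1 not found -> 3.0
```

Note that the score counts types, not tokens, so repeating a valid word any number of times never lowers it.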
-JKP- > 24-04-2020, 05:37 PM
MarcoP > 24-04-2020, 09:02 PM
-JKP- > 24-04-2020, 10:12 PM
RenegadeHealer > 24-04-2020, 11:55 PM
(24-04-2020, 08:44 AM)MarcoP Wrote: Hi RenegadeHealer,
as stated, the contest would find that this script is unbeatable (100% accurate):
    # Emit the single most common word once per contest cycle.
    for _ in range(38000):
        print("daiin")
MarcoP Wrote: Maybe you mean that we should compare word frequencies in the output with actual word frequencies in the manuscript (so that daiin should occur about 850 times, each of ol, chedy, aiin about 500 and so on).
MarcoP Wrote: Something very similar can be done with the grammar that Stolfi built 20 years ago (see [link]). Unlike most grammars (e.g. what Thomas posted at the start of this thread), Stolfi's includes numerical weights for each rule. So, while in Thomas' model 'k' and 'f' are totally equivalent, Stolfi also models the fact that 'k' is about 30 times more frequent than 'f' (second number in each row):
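A frequency-based score along these lines could look like the sketch below. It assumes the generated output has roughly the same number of tokens as the manuscript, so raw counts are comparable. The target counts are only the approximate figures quoted above, and `frequency_error` is a hypothetical name.

```python
from collections import Counter

# Approximate counts taken from the post above; not a full manuscript word list.
TARGET_COUNTS = {"daiin": 850, "ol": 500, "chedy": 500, "aiin": 500}

def frequency_error(generated_words, target_counts):
    """Sum of absolute count differences over every word seen on either side."""
    generated = Counter(generated_words)
    vocabulary = set(generated) | set(target_counts)
    return sum(abs(generated[w] - target_counts.get(w, 0)) for w in vocabulary)

# The all-"daiin" script is heavily penalised under this measure.
degenerate = ["daiin"] * 38000
print(frequency_error(degenerate, TARGET_COUNTS))  # 37150 + 3 * 500 = 38650
```

A lower value means a closer fit; an output that reproduced the four target counts exactly would score 0 against this toy target.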
G:
5858 0.34755 0.34755 t
1243 0.07375 0.42130 p
9423 0.55906 0.98036 k
331 0.01964 1.00000 f
There is no doubt that Stolfi's model (however good) can be improved, but is getting a better fit for word frequencies the most promising task on which we should spend our money (or time)?
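As a rough illustration of what the weights in rule G buy you, the sketch below samples the four gallows letters in proportion to the counts in the first column. It reads the columns as count, probability and cumulative probability, which the arithmetic supports (the counts sum to 16855). The names `GALLOWS_COUNTS` and `sample_gallows` are made up here and are not taken from Stolfi's files.

```python
import random

# Counts for rule G as quoted above: symbol -> number of occurrences.
GALLOWS_COUNTS = {"t": 5858, "p": 1243, "k": 9423, "f": 331}

def sample_gallows(n, seed=0):
    """Draw n gallows letters with probability proportional to their counts."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    symbols = list(GALLOWS_COUNTS)
    weights = list(GALLOWS_COUNTS.values())
    return rng.choices(symbols, weights=weights, k=n)

# Recompute the second column of the table from the first.
total = sum(GALLOWS_COUNTS.values())
for symbol, count in GALLOWS_COUNTS.items():
    print(symbol, round(count / total, 5))
```

An unweighted grammar corresponds to replacing `weights` with equal values, which would make 'f' as common as 'k' in the output.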
MarcoP Wrote: Another participant in your contest could be Timm and Schinner's algorithm (see [link]). Like Stolfi's model, their algorithm contains several numerical parameters and one could tweak them to get a better fit for word frequencies. But they have chosen to follow a different line, investigating other properties of the text, rather than focusing on word structure. For instance their algorithm reproduces these phenomena:
- the progressive drift in word frequencies through the text (what was initially seen as two different "languages", Currier A and B);
- reduplication and quasi-reduplication (words repeating consecutively, identically or with minimal changes);
- line-effects: words at the beginning or end of lines are different from other words.
Though I don't think that Timm and Schinner come closer to actual word frequencies than Stolfi, their work marks a significant step forward, building on Stolfi's grammar by integrating word structure with other parts of the larger picture.
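The reduplication item in the list above can be measured in a generated or transliterated word sequence with something like the following sketch. Treating "minimal changes" as a single-character insertion, deletion or substitution is an assumption made here for illustration; it is not Timm and Schinner's definition, and the word list is a toy example.

```python
def within_one_edit(a, b):
    """True if a and b differ by at most one insertion, deletion, or substitution."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # identical, or one substitution at i
    return a[i:] == b[i + 1:]  # b has one extra character at i

def count_repeats(words):
    """Count consecutive pairs that are identical or within one edit of each other."""
    pairs = list(zip(words, words[1:]))
    exact = sum(1 for a, b in pairs if a == b)
    near = sum(1 for a, b in pairs if a != b and within_one_edit(a, b))
    return exact, near

words = ["daiin", "daiin", "dain", "ol", "chedy", "shedy"]
print(count_repeats(words))  # (1, 2)
```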
MarcoP Wrote: Another recent "generative system" that adds to the field without addressing the area of word frequencies is [link].
Personally, I would not be terribly interested in a complex piece of software (say, a [link]) that produces a perfect word histogram and tells us nothing about dialects/language-drift, reduplication, first-last combinations (the influence of the last character of a word on the first character of the following word), the relationship between labelese and paragraph text, etc. Not only do I believe that all features should be explained together (and Timm and Schinner have done the most extensive work in this direction), but I am sure there are many more features and patterns that have not been discovered yet (see Lisa Fagin Davis' ongoing research).
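For readers unfamiliar with the first-last combinations mentioned here, a minimal way to tabulate them is sketched below. It treats each transliteration letter as one character, which is a simplification (some glyphs are written with more than one letter in common transliterations), and `first_last_pairs` is a hypothetical helper name.

```python
from collections import Counter

def first_last_pairs(words):
    """Count (last letter of a word, first letter of the next word) pairs."""
    return Counter(
        (a[-1], b[0]) for a, b in zip(words, words[1:]) if a and b
    )

# Toy line for illustration; a real analysis would run over whole lines of text.
line = "daiin ol chedy aiin".split()
for pair, count in first_last_pairs(line).items():
    print(pair, count)
```

Comparing these counts with what independent word choice would predict is one way to check whether a generator reproduces the effect.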
ReneZ > 25-04-2020, 06:04 AM
(24-04-2020, 05:37 PM)-JKP- Wrote: Should anyone else be mentioned? I'm trying to keep the Prior Art down to a couple of pages, just enough to explain the basic concepts of each system, and the key ways in which they differ from one another.