First, I want to say I completely support the utility of understanding the alpha state of the VM to whatever extent is possible. The obvious shuffling of the folio and quire order is a detriment to getting reliable results for many kinds of analysis that span these units, especially analysis of the text, and increased certainty of any kind would be useful.
You state that you have gotten some pushback on the suggested DNA-gathering analysis. As much as I support biocodicology experiments being done on the VM, it should be emphasized that the answers you are asking of the technology have not been successfully achieved to date.
These are the issues that remain. Note that I am assuming, based on the ZooMS data obtained in 2014, that all the folios are bovine parchment -- but given the small sample size, I admit this could be an incorrect assumption.
1. Keep in mind you are asking for individual differentiation of a large number of animal sources of the parchment from each other. There is a high likelihood these animals were highly related to each other because of the geographic constraints of animal husbandry and, secondarily, of trade at the time. This greatly complicates the analysis. Distinguishing protein samples from different species is a very different problem from distinguishing DNA samples of different, highly related individuals.
This requires either single nucleotide polymorphism (SNP) or short tandem repeat (STR) analysis to get to the individual level. An analysis of this issue with modern herds estimates that it takes at least 99 mapped SNPs that differ between individuals for a successful differentiation (see Fernandez et al.). This requires a decent amount of intact DNA to do successfully -- obviously not an issue when you get your DNA through blood draws, but very much an issue when you are dealing with extracted DNA that is over 600 years old. (A back-of-the-envelope illustration of why so many loci are needed follows at the end of this point.)
Campana et al. attempted STR analysis, unsuccessfully -- granted, this was 12 years ago, using standard DNA sequencing rather than next-gen, but no one has tried it since. Their results indicated that none of the parchment sources were related to each other.
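To make the scale of that requirement concrete, here is a minimal back-of-the-envelope sketch in Python. The numbers (the minor-allele frequency, the 0.9 per-locus match probability for a closely related herd, the one-in-a-million target) are illustrative assumptions of mine, not values from Fernandez et al.:

```python
import math

def p_match_per_locus(maf: float) -> float:
    """Probability two UNRELATED individuals share the same genotype at one
    biallelic SNP with minor-allele frequency `maf` (Hardy-Weinberg)."""
    p, q = 1 - maf, maf
    genotypes = [p * p, 2 * p * q, q * q]   # frequencies of AA, Aa, aa
    return sum(g * g for g in genotypes)    # both draw the same genotype

def loci_needed(per_locus_match: float, target: float = 1e-6) -> int:
    """Smallest number of loci at which an all-locus match (the two samples
    looking identical) becomes rarer than `target`."""
    return math.ceil(math.log(target) / math.log(per_locus_match))

unrelated = p_match_per_locus(0.3)   # ~0.42 per-locus match chance
related = 0.9                        # made-up inflation for close kin
print(f"unrelated herd: ~{loci_needed(unrelated)} loci")        # ~17
print(f"closely related herd: ~{loci_needed(related)} loci")    # ~132
```

The toy numbers only matter for the shape of the result: relatedness inflates the per-locus match probability, and the number of loci needed to separate individuals climbs from a couple of dozen into the same ballpark as the 99-SNP figure.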
2. The eraser-crumb sampling technique has proven decent for protein collection, but much less successful for DNA collection. The one publication I am aware of that did both (and I have tried to find all related publications in my review) was Teasdale et al. Although sufficient protein was obtained from all 65 sampled bifolios to identify species, only eight bifolios were attempted for DNA analysis (a "large volume of eraser waste" was required), and only three of those eight had sufficient DNA integrity for any kind of SNP analysis (i.e. about 4.6% of the 65 sampled). They saw "a trend" toward SNPs seen in modern North European breeds. There was no attempt to assign individual identity to these three samples. In fact, they saw a greater-than-expected differentiation between the samples, which they attributed to "the limited SNP recovery in these samples." This is precisely what would be expected with the VM samples, too. (The yield arithmetic is sketched just below.)
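For concreteness, the yield arithmetic from Teasdale, plus a hedged extrapolation -- the VM bifolio count below is a placeholder of mine, not an actual sampling plan:

```python
protein_sampled = 65   # bifolios yielding enough protein for species ID
dna_attempted = 8      # bifolios attempted for DNA
dna_usable = 3         # bifolios with DNA intact enough for SNP analysis

overall_rate = dna_usable / protein_sampled   # ~4.6% of everything sampled
attempt_rate = dna_usable / dna_attempted     # ~37.5% of the DNA attempts
print(f"usable DNA: {overall_rate:.1%} overall, {attempt_rate:.1%} of attempts")

vm_bifolios = 50       # PLACEHOLDER -- replace with the number actually sampled
low, high = vm_bifolios * overall_rate, vm_bifolios * attempt_rate
print(f"expected usable VM samples: roughly {low:.0f} to {high:.0f}")
```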
Note that the most recent and most extensive project (Ruffini-Ronzani et al.) didn't even attempt DNA analysis but stuck with ZooMS. Note, too, that the new software developed there handles protein only, not DNA analysis. The full data set was recently published, and it appears that nothing along the lines of a DNA analysis was even tried. This could well be related to the much greater difficulty of getting sufficient intact DNA from this sampling approach.
3. In my opinion, this pinpoints the most significant issue with the proposed analysis -- the likelihood of getting useful data is relatively low. If you get back only a small amount of DNA, there will be no way to associate any individual result with any other individual result (you need a minimum of 99 mismatches to know you are looking at a different individual). In Teasdale, they were comparing the limited samples to a modern cow SNP collection (they were looking for geographic placement through SNP matches). With the proposed study, the comparisons will be looking for a lack of matches between the samples. Given Teasdale's numbers, only a fraction of the samples will have enough DNA to analyze at all, and then you have to get lucky enough for those samples to have enough overlap to find 99 mismatches in order to distinguish them (see the sketch just below).
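A rough way to quantify "lucky enough": if each usable sample recovers each locus of a mapped SNP panel independently with some coverage fraction, two samples can only be compared at the loci both happened to recover. A minimal sketch, with a hypothetical panel size and made-up coverage fractions:

```python
import math

def p_shared_at_least(panel: int, cov_a: float, cov_b: float, k: int) -> float:
    """P(two samples both recover >= k common loci), treating each locus as
    independently recovered with probability cov_a and cov_b respectively."""
    p = cov_a * cov_b   # probability a given locus survives in BOTH samples
    return sum(math.comb(panel, i) * p**i * (1 - p)**(panel - i)
               for i in range(k, panel + 1))

panel = 700   # hypothetical size of a mapped SNP panel
for cov in (0.1, 0.3, 0.5):
    prob = p_shared_at_least(panel, cov, cov, 99)
    print(f"coverage {cov:.0%} per sample: P(>=99 shared loci) = {prob:.3f}")
```

With these toy numbers the transition is sharp: at 10% or 30% recovery per sample, the chance of even having 99 comparable loci is essentially zero, and only around 50% recovery does it become likely -- and that is before asking whether those shared loci actually differ between the two animals.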
4. Given this scenario -- a low number of samples with enough DNA, scattered locations of whatever intact DNA does get sequenced (which will be unique to each sample), and the likely relatedness of the cows to each other -- the most likely result is that either all the samples will appear related to each other (if you get decent overlap between the samples), or the coverage will be so sparse that none of the samples will appear related to each other, which is what Campana et al. found. Both of these results are equally uninformative. (A small simulation of this dichotomy follows below.)
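To make that dichotomy concrete, here is a small Monte Carlo sketch. Everything in it is a placeholder assumption (the panel size, the fraction of loci that differ between two related cows, the coverage fractions); the only point is how the verdict flips with recovery:

```python
import random

PANEL = 700    # hypothetical mapped SNP panel
DIFF = 0.25    # made-up fraction of panel loci differing between related cows
TRIALS = 2000

def observed_mismatches(coverage: float) -> int:
    """One simulated pair: count differing loci among those recovered by
    BOTH samples (each locus survives in both with prob coverage**2)."""
    return sum(1 for _ in range(PANEL)
               if random.random() < coverage**2 and random.random() < DIFF)

for cov in (0.3, 0.6, 0.9):
    hits = sum(observed_mismatches(cov) >= 99 for _ in range(TRIALS))
    print(f"coverage {cov:.0%}: pair distinguishable in {hits / TRIALS:.1%} of trials")
```

Under these made-up numbers, distinct animals only look distinct once recovery per sample approaches 90%; at anything like Teasdale-level recovery, every pair falls short of 99 observed mismatches, and the samples read, wrongly, as indistinguishable.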
Asking this kind of question really emphasizes (1) the handicap of having to use non-destructive collection; (2) the degradation of whatever DNA is available over time; and (3) the lack of comparative data for medieval-era cow genomes.
So, I hate to be negative -- but I tend to agree that several things are needed before the likelihood of getting useful data seems high enough to me to attempt the study: progress in DNA extraction from eraser crumbs; progress in the analysis of highly fragmented DNA results (e.g. software development for the DNA side -- maybe needing artificial intelligence to bridge gaps); and more information about medieval cow genomes in general (e.g. more DNA samples from medieval parchment, maybe using destructive sampling where there is "waste" parchment, to build a decent SNP library).
Happy to answer questions about these thoughts.