(16-02-2021, 05:48 PM)Koen G Wrote: I am not sure to what extent this impacts our supposed ability to detect function words. Maybe the consistency of function words in a high-TTR text should make them easier to spot, if anything?
Hi Koen,
I think the only contribution to detection, if any, is that it sets a quantitative target for token counts. It would be more helpful if there were a correlation with the number of function words among high-frequency word types, e.g. the counts at the bottom of the "top 20" table in the previous post. But this is not the case: the correlation is not significant (.21), with a flat regression line.
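For anyone who wants to check a correlation like this on their own samples, a minimal Pearson-r computation is sketched below. The numbers in the two lists are made-up placeholders, not the actual measurements from the previous post; only the computation itself is the point.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-text values standing in for the real measurements:
# TTR of each sample text vs. function words among its top-20 types.
ttr = [0.38, 0.42, 0.45, 0.51, 0.55, 0.60]
func_in_top20 = [9, 11, 8, 10, 9, 10]

print(round(pearson_r(ttr, func_in_top20), 2))
```

A flat regression line corresponds to r near zero, which is what the .21 figure above suggests.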
The linked paper puts forward some interesting ideas. High-frequency types are more likely to be function words, but of course a few content words will also be frequent in most texts. One can compare different texts in the same language and select as function words only those types that are very frequent in most of them: unlike function words, frequent content words depend on the subject of the text.
For instance, compare the function words common to the English Genesis and the Grete Herball: six function words appear among the top 10 types in both texts (the, and, of, it, in, that). The two sets of frequent content words, on the other hand, are entirely disjoint.
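The comparison above can be sketched in a few lines of Python: take the most frequent types of each text and intersect the two sets. The two token lists here are toy stand-ins for the real texts, just to show the mechanics.

```python
from collections import Counter

def top_types(tokens, n=10):
    """Return the set of the n most frequent word types in a token list."""
    return {word for word, _ in Counter(tokens).most_common(n)}

# Toy token lists standing in for the two real texts (hypothetical data).
genesis = ("in the beginning god created the heaven and the earth "
           "and the earth was without form").split()
herball = ("the vertue of this herbe and the rote of it is good "
           "and the leves helpe the stomacke and the heed").split()

# Types frequent in BOTH texts are function-word candidates;
# frequent content words (earth, herbe, ...) drop out of the intersection.
shared = top_types(genesis) & top_types(herball)
print(sorted(shared))  # → ['and', 'the']
```

On real texts of decent length, the intersection of the top-10 lists comes out dominated by function words for exactly the reason given above: content words frequent in one text are rarely frequent in a text on a different subject.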
This sounds promising for the VMS, since the illustrations suggest that, say, the subject of Quire 13 is not identical to that of the Herbal section. But of course the very basic problem is that, because of the differences between the "dialects", there is very little overlap between the lexicons of the different sections.