I hadn't considered or read about this problem before but it makes sense.
It reminds me of the cuneiform problem. Cuneiform is one of the earliest preserved writing systems, and between 500,000 and 1 million tablets have been collected. Even so, fewer than 10% of these tablets have been translated. I was surprised to learn this, but there are several reasons for it:
1. Scribes used a lot of shorthand;
2. Cuneiform itself changed over time;
3. Writers would use multiple languages (e.g. Sumerian, Akkadian), even on the same tablet. There are relatively few people fluent in these languages, and fewer still who are fluent in several of them at once;
4. To some extent the tablets are 3D, so a 2D photo might not be sufficient for translation: you might need to physically turn the tablet to see the marks accurately; and
5. In some cases the tablets are incomplete or broken, so you may need to figure out how the pieces fit together.
I wonder if AI can help make inroads into this 90%. I really wonder what is waiting to be unearthed.