Automated Analysis of Manuscripts: Entry-Level Possibilities for Individual Scholars
Cornelis van Lit (Utrecht University - www.digitalorientalist.com)
As libraries complete their digitisation projects, it is up to us scholars to find ways to use the results. So far, digitised manuscripts are mostly used for traditional philological purposes, merely as a stand-in for the actual artefact. However, their digital format and sheer quantity make them excellently suited to automated analysis. Attempts at this have relied almost exclusively on big, well-funded, collaborative projects. Instead, I argue for digital research done by individuals, using and producing free tools, in a sustainable, future-proof manner. In short, I see benefit in relying on evolution rather than revolution. Besides promoting this programmatic paradigm shift, I show how scholars can start such automated analysis themselves.
The technologies I use are Python and OpenCV (with NumPy for numerical calculations). Python is a programming language that is quite easy to learn and popular among Digital Humanists. OpenCV is open-source software geared towards 'computer vision': it provides commands that let the computer bring the millions of pixels an image consists of into a coherent whole from which relevant information can be deduced. I will discuss the setup and some crucial parts of the code I wrote to analyse a curious characteristic of Islamic manuscripts, namely a flap that falls onto the front cover of a codex to keep it closed. This talk comes out of my research for a book about Manuscript Studies in the digital era.
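To give a flavour of what deducing information from pixels can look like, here is a minimal sketch. It is not the flap-detection code discussed in the talk. The function names load_grayscale and dark_region_box, the threshold of 128, and the synthetic test image are all assumptions made for illustration. The sketch reduces an image to the bounding box of its dark region, for instance a codex photographed against a light background.

```python
import numpy as np


def load_grayscale(path):
    """Read an image file as a 2-D array of grey values (0 = black, 255 = white)."""
    import cv2  # imported here so the rest of the sketch runs without OpenCV

    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if image is None:  # cv2.imread returns None instead of raising on failure
        raise FileNotFoundError(path)
    return image


def dark_region_box(gray, threshold=128):
    """Return (top, left, height, width) of the box enclosing every pixel
    darker than `threshold`, or None if there is no such pixel.
    The default threshold is an arbitrary assumption."""
    mask = gray < threshold
    rows = np.flatnonzero(mask.any(axis=1))  # indices of rows containing dark pixels
    cols = np.flatnonzero(mask.any(axis=0))  # indices of columns containing dark pixels
    if rows.size == 0:
        return None
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return int(top), int(left), int(bottom - top + 1), int(right - left + 1)


# Synthetic stand-in for a photograph: a dark rectangle on a white background.
page = np.full((100, 150), 255, dtype=np.uint8)
page[20:80, 30:120] = 40
box = dark_region_box(page)
print(box)
```

With a real scan, the array would come from load_grayscale applied to an image file, and the threshold would likely need tuning to the lighting and background of the photographs.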