kidel / in-codice-ratio-ocr-with-cnn

Deep learning experiments and a library for the 'In Codice Ratio' OCR, part of a project involving an AI that can process documents from the Archivio Segreto Vaticano.

License: Apache License 2.0

Jupyter Notebook 99.69% Python 0.31%
neural-network convolutional-neural-networks ocr mnist vatican deep-learning carolingian-minuscule

In Codice Ratio - OCR with CNN

Deep learning experiments and a library for the OCR component of In Codice Ratio, part of a project involving an artificial intelligence that can process documents from the Vatican Secret Archives.

In Codice Ratio (ICR) is a project curated by Roma Tre University in collaboration with the Vatican Secret Archives. Its purpose is to digitize the contents of documents and ancient texts from the Archive.

The problem we faced in this repository is just one part of ICR, but it is essentially its core: classifying handwritten characters in Carolingian minuscule, starting from an image of the character. The input is a set of possible cuts of the word to be read, and our system has to decide whether each cut is correct and, if it is, which character it shows.

Example

  • Bad cut of the word "asseras", recognized as "----s"
    (image: bad cuts of the word "asseras")
  • Good cut of the word "asseras", recognized as "asseras"
    (image: good cuts of the word "asseras")
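
The per-cut decision described above can be illustrated with a minimal Python sketch. It is not the repository's code: the score dictionaries, the `threshold` parameter, and the function names `label_cut` and `read_word` are hypothetical, and a simple confidence threshold is only one possible way to reject a bad cut. It assumes a classifier has already produced a score per candidate character for each cut.

```python
from typing import Dict, List

REJECT = "-"  # placeholder printed for a cut judged not to be a valid character


def label_cut(scores: Dict[str, float], threshold: float = 0.5) -> str:
    """Return the best-scoring character, or REJECT if no score reaches the threshold."""
    best_char = max(scores, key=scores.get)
    return best_char if scores[best_char] >= threshold else REJECT


def read_word(cut_scores: List[Dict[str, float]], threshold: float = 0.5) -> str:
    """Label every cut of a word and join the labels into one string."""
    return "".join(label_cut(scores, threshold) for scores in cut_scores)


# Hypothetical scores for five cuts (illustrative values, not real model output).
# Only the last cut isolates a clean character.
bad_cuts = [
    {"a": 0.30, "s": 0.20},
    {"s": 0.40, "e": 0.10},
    {"e": 0.25, "r": 0.25},
    {"r": 0.10, "a": 0.45},
    {"s": 0.95, "a": 0.02},
]

print(read_word(bad_cuts))  # ----s
```

With these made-up scores, the first four cuts fall below the threshold and are rejected, which reproduces the shape of the bad-cut reading shown in the example.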

Other parts of ICR include segmentation software, which finds words in a document and provides possible letter cuts to the OCR, and a language model that discriminates false positives among the cuts classified by the OCR. The dataset is provided via a crowdsourcing platform. Those parts are not included in this repository.

The folder "Notebooks" contains our experiments and examples. The folder "Relazione" contains a detailed report on what we did. The folder "Libreria" has everything needed to use, load, or retrain our networks.
