saim-khalid / ai-id-scanner


ai-id-scanner's Introduction

Extracting Text from IDs and Passports using PaddleOCR

Scope

This repository includes files and instructions for using PaddleOCR to extract text from images of South African, Zimbabwean, and Kenyan IDs, passports, and driving licences.
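As a rough illustration of the extraction step, the sketch below runs PaddleOCR on one image and flattens its output into (text, score) pairs. It is not taken from this repository. The function names `flatten_ocr_result` and `extract_text` and the `min_score` threshold are hypothetical, and the result shape (one list per image, each entry `[box, (text, score)]`) is assumed from the PaddleOCR 2.x Python API; newer releases may differ.

```python
from typing import Any, List, Tuple


def flatten_ocr_result(result: Any, min_score: float = 0.5) -> List[Tuple[str, float]]:
    """Flatten a PaddleOCR 2.x-style result into (text, score) pairs.

    Assumed shape: one list per image, each entry being [box, (text, score)].
    Entries below min_score (a hypothetical threshold) are dropped.
    """
    lines = []
    for page in result or []:
        for entry in page or []:  # a page can be None when no text is detected
            _box, (text, score) = entry
            if score >= min_score:
                lines.append((text.strip(), float(score)))
    return lines


def extract_text(image_path: str, lang: str = "en") -> List[Tuple[str, float]]:
    """Run PaddleOCR on one image; requires the `paddleocr` package."""
    # Imported lazily so flatten_ocr_result can be used without PaddleOCR installed.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(use_angle_cls=True, lang=lang)
    return flatten_ocr_result(ocr.ocr(image_path, cls=True))
```

Calling `extract_text("id_front.jpg")` (a placeholder path) would return the recognized lines with their confidence scores, which can then be passed to later processing steps.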

ID Card Analysis

This repository contains several key components, which are as follows:

OCR Subrepository: The OCR subrepository extracts text from a variety of ID cards using optical character recognition (OCR). It converts the text regions in ID card images of diverse formats into machine-readable text, capturing the essential information on each card. The extracted text is the foundational data for downstream tasks, enabling further analysis and processing.
NER Subrepository: The NER (named entity recognition) subrepository extracts structured information from the OCR results that PaddleOCR produces on various ID cards. It identifies and categorizes entities such as names, dates, and addresses in the OCR text.
Text Classification Subrepository: The Text Classification subrepository classifies the pre-processed OCR text into specific ID card categories. This step determines which type of ID card the extracted information belongs to, which guides further analysis and processing.
Signature Extraction Subrepository: The Signature Extraction subrepository employs a customized YOLOv5 model to detect and extract signatures from various ID cards.
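To make the flow from OCR text to the classification and NER steps concrete, here is a deliberately simple sketch. It is not the repository's method: it swaps in a keyword lookup for the text classifier and a regular expression for entity extraction, purely to show the inputs and outputs of each stage. The keyword lists, labels, date pattern, and function names are all hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical keyword lists; a trained text classifier would replace this lookup.
DOC_KEYWORDS: Dict[str, List[str]] = {
    "passport": ["passport"],
    "driving_licence": ["driving licence", "driving license", "driver"],
    "id_card": ["identity", "national id", "id card"],
}

# Hypothetical date pattern: "12 JAN 1990", "12/01/1990", or "1990-01-12".
DATE_RE = re.compile(
    r"\b\d{1,2}[ /.-](?:\d{1,2}|[A-Za-z]{3})[ /.-]\d{4}\b|\b\d{4}-\d{2}-\d{2}\b"
)


def classify_document(lines: List[str]) -> str:
    """Return the first label whose keywords appear in the OCR text."""
    text = " ".join(lines).lower()
    for label, keywords in DOC_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return label
    return "unknown"


def extract_dates(lines: List[str]) -> List[str]:
    """Collect date-like strings, standing in for one NER entity type."""
    return [m.group(0) for line in lines for m in DATE_RE.finditer(line)]
```

Given OCR lines such as `["PASSPORT", "Date of Birth 12 JAN 1990"]`, `classify_document` picks a document category and `extract_dates` pulls out the date strings. A learned model would be expected to handle noisy OCR output and more entity types than these rules can.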