
ocr_for_contacts_list

examples of screenshots to extract info from

extracted list:

{
   "Jean Claude DUSS pro":"01234567891",
   "Jean Claude DUSS perso":"01234567892",
   "Frangois Pignon home":"01234567893",
   "Frangois Pignon pro":"01234567894",
   "Frangois Pignon perso":"01234567895",
   "Hubert Boniseur de La Bath":"01234567896",
   "Hubert perso":"01234567897",
   "oss":"01234567898",
   "Commisssire San-Antonio":"01234567899",
   "Patrick Chirac":"012345678910",
   "Béru":"012345678911"
}

disclaimer:

  • there are better solutions for exporting a contact list from a smartphone (e.g. here)
  • the present script is just for getting familiar with some off-the-shelf OCR tools

motivation:

  • export a simple phone contact list (in particular from iOS)
  • some pieces of software (e.g. dr.fone) are able to do it but you have to pay to export the final list

principle:

  • open any software that can display your contact list once your phone is connected (e.g. dr.fone)
  • take screenshots of every section (I managed to pack my 250 contacts in 25 groups of 11)
    • help: on Windows, press the Windows+PrtScn keys to save the screen
  • automate the cropping to keep only the relevant info (the crop parameters must be set manually)
  • save these images in the "/data" directory
cropped image = input of OCR
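
The cropping step above can be sketched as follows. This is a minimal illustration, not the script from this repository: the helper names `crop_box` and `crop_screenshots`, the fractional crop parameters, the `*.png` pattern and the relative `data` output folder are all assumptions, and Pillow is assumed to be installed.

```python
from pathlib import Path


def crop_box(width, height, left=0.0, top=0.0, right=1.0, bottom=1.0):
    """Convert fractional bounds into a (left, upper, right, lower) pixel box."""
    if not (0.0 <= left < right <= 1.0 and 0.0 <= top < bottom <= 1.0):
        raise ValueError("need 0 <= left < right <= 1 and 0 <= top < bottom <= 1")
    return (
        round(width * left),
        round(height * top),
        round(width * right),
        round(height * bottom),
    )


def crop_screenshots(src_dir, dst_dir="data", **fractions):
    """Crop every PNG in src_dir with the same box and save it into dst_dir."""
    from PIL import Image  # Pillow, imported lazily so crop_box works without it

    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.png")):
        with Image.open(path) as img:
            box = crop_box(img.width, img.height, **fractions)
            img.crop(box).save(dst / path.name)


# Hypothetical values: keep the middle half of the width, lower half of the height.
# crop_screenshots("screenshots", left=0.25, top=0.5, right=0.75, bottom=1.0)
```

Expressing the crop as fractions of the image size keeps the same parameters usable if the screenshots are taken at another resolution; they still have to be tuned by hand once.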

known issues:

  • the OCR engine used (Tesseract) makes errors (for instance, "0" read as "(")
  • worse, the dr.fone software sometimes omits a name or a number, leaving a blank cell
    • the Python script cannot detect this, nor correct the resulting odd number of entries when matching names to numbers
    • a solution could be to force the OCR to read line by line rather than column by column, as it does by default
  • I manually added patches for the errors I encountered; the corresponding lines are marked with a # hack! comment
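
One way to approach the two issues above is sketched below: patch look-alike characters in the numbers, and pair names with numbers row by row so that a blank cell shows up as an incomplete row instead of shifting every later match. This is not the repository's script. The function names, the `tolerance` and `split_x` parameters and the "O" substitution are assumptions; only the "(" to "0" fix comes from the errors listed above. The word boxes are assumed to be `(text, left, top)` tuples, for example built from the `text`, `left` and `top` fields returned by pytesseract's `image_to_data`.

```python
import re

# "(" -> "0" is the misread reported above; "O" -> "0" is an assumed extra patch.
DIGIT_FIXES = str.maketrans({"(": "0", "O": "0"})


def clean_number(raw):
    """Map look-alike characters to digits, then drop every non-digit."""
    return re.sub(r"\D", "", raw.translate(DIGIT_FIXES))


def group_into_rows(words, tolerance=10):
    """Group (text, left, top) boxes whose tops are within `tolerance` pixels."""
    rows = []
    for text, left, top in sorted(words, key=lambda w: w[2]):
        if rows and abs(top - rows[-1][0]) <= tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append((top, [(left, text)]))
    # Within each row, order the cells from left to right.
    return [sorted(cells) for _, cells in rows]


def rows_to_contacts(rows, split_x):
    """Treat words left of split_x as the name and the rest as the number."""
    contacts, incomplete = {}, []
    for cells in rows:
        name = " ".join(text for left, text in cells if left < split_x)
        number = clean_number("".join(text for left, text in cells if left >= split_x))
        if name and number:
            contacts[name] = number
        else:
            incomplete.append((name, number))  # blank cell: report, do not guess
    return contacts, incomplete
```

Rows that end up in `incomplete` still need a manual check, but they no longer misalign the entries that follow them.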

further work:

  • generate a .vcf file from the output .json (possibly going through Excel first)
  • example
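
A possible direct route from the .json output to a .vcf file, without Excel, is sketched below. It is only an illustration of this further-work item: the function names and file names are hypothetical, the whole contact label is stored as the family-name field of `N`, and `TYPE=CELL` is an assumption since the extracted list does not say which kind of number each entry is.

```python
import json


def escape_vcard_text(value):
    """Escape backslash, comma, semicolon and newline in a vCard text value."""
    return (
        value.replace("\\", "\\\\")
        .replace(",", "\\,")
        .replace(";", "\\;")
        .replace("\n", "\\n")
    )


def contacts_to_vcf(contacts):
    """Build one vCard 3.0 entry per (name, number) pair, with CRLF line ends."""
    lines = []
    for name, number in contacts.items():
        safe = escape_vcard_text(name)
        lines += [
            "BEGIN:VCARD",
            "VERSION:3.0",
            f"N:{safe};;;;",  # assumption: whole label kept in the family-name field
            f"FN:{safe}",
            f"TEL;TYPE=CELL:{number}",  # assumption: every number is a mobile number
            "END:VCARD",
        ]
    return "".join(line + "\r\n" for line in lines)


def json_to_vcf(json_path, vcf_path):
    """Read the {name: number} JSON file and write the matching .vcf file."""
    with open(json_path, encoding="utf-8") as f:
        contacts = json.load(f)
    # newline="" keeps the explicit CRLF line ends unchanged on every platform.
    with open(vcf_path, "w", encoding="utf-8", newline="") as f:
        f.write(contacts_to_vcf(contacts))


# Hypothetical file names:
# json_to_vcf("contacts.json", "contacts.vcf")
```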
