examples of screenshots to extract info from
extracted list:
{
"Jean Claude DUSS pro":"01234567891",
"Jean Claude DUSS perso":"01234567892",
"Frangois Pignon home":"01234567893",
"Frangois Pignon pro":"01234567894",
"Frangois Pignon perso":"01234567895",
"Hubert Boniseur de La Bath":"01234567896",
"Hubert perso":"01234567897",
"oss":"01234567898",
"Commisssire San-Antonio":"01234567899",
"Patrick Chirac":"012345678910",
"Béru":"012345678911"
}
- there are better solutions for exporting a contact list from a smartphone (e.g. here)
- the present script is just for getting familiar with some off-the-shelf OCR tools
- export a simple phone contact list (in particular from iOS)
- some pieces of software (e.g. dr.fone) are able to do it but you have to pay to export the final list
- open any software that can display your contact list when your phone is connected (e.g. dr.fone)
- take screenshots of every section (I managed to pack my 250 contacts in 25 groups of 11)
- help: on Windows, press the Windows+PrtScn keys on your keyboard to save the screen
- automate the cropping to keep only the relevant info (the crop parameters need to be set manually)
- save these images in the "/data" directory
cropped image = input of OCR
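The cropping step above could look roughly like the sketch below. It is not the script from this repo: the function names are made up for illustration, Pillow is an assumed choice of imaging library, and the fractions in the example are placeholder values that you would tune by hand for your own screenshots.

```python
from pathlib import Path


def fractional_box(width, height, left, top, right, bottom):
    """Turn fractional crop bounds (0..1) into a pixel box (left, upper, right, lower)."""
    if not (0 <= left < right <= 1 and 0 <= top < bottom <= 1):
        raise ValueError("expected 0 <= left < right <= 1 and 0 <= top < bottom <= 1")
    return (round(width * left), round(height * top),
            round(width * right), round(height * bottom))


def crop_screenshots(src_dir, dst_dir, fractions):
    """Crop every PNG in src_dir with the same fractional box and save it in dst_dir."""
    # Assumption: Pillow is installed. Imported here so fractional_box has no dependency.
    from PIL import Image

    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.png")):
        with Image.open(path) as img:
            box = fractional_box(img.width, img.height, *fractions)
            img.crop(box).save(dst / path.name)


# Placeholder fractions: keep a central band of a 1920x1080 screenshot.
print(fractional_box(1920, 1080, 0.25, 0.2, 0.75, 0.9))
```

Using fractions instead of raw pixels means the same parameters still work if the screenshots are taken at a different resolution, as long as the window layout is the same.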
- the OCR used (tesseract) may make errors (for instance "0" read as "(")
- worse, sometimes the dr.fone software ignores a name or a number, leaving a blank cell
- the python script cannot detect this, nor correct the resulting mismatch between the number of names and the number of phone numbers
- a solution could be to force the OCR to read line by line rather than column by column, which is the default
- I manually add patches based on the errors I met. Corresponding lines are commented with # hack!:
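To give an idea of what such patches can look like, here is a small sketch (not the actual code of this repo). Only the "(" to "0" substitution comes from the errors listed above; the other substitutions and both function names are assumptions for illustration. Note that it does not solve the blank-cell problem: it only refuses to pair names and numbers when the counts differ, so the error is at least visible.

```python
import string

# Characters the OCR may confuse with digits.
# Only "(" -> "0" was actually observed; the other entries are illustrative guesses.
DIGIT_FIXES = str.maketrans({"(": "0", "O": "0", "o": "0", "l": "1", "I": "1"})


def clean_number(raw):
    """Apply the substitutions, then drop everything that is not an ASCII digit."""
    return "".join(ch for ch in raw.translate(DIGIT_FIXES) if ch in string.digits)


def pair_contacts(names, numbers):
    """Pair names with numbers, raising instead of guessing when the counts differ."""
    if len(names) != len(numbers):
        raise ValueError(
            f"{len(names)} names but {len(numbers)} numbers: a cell is probably blank"
        )
    return {name.strip(): clean_number(num) for name, num in zip(names, numbers)}


print(pair_contacts(["Béru ", "oss"], ["(12345678911", "01 23 45 67 898"]))
```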
- generate a .vcf file from the output.json (maybe using excel first) - example
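As a possible starting point for the .vcf idea, the sketch below turns a name/number mapping like the extracted list into vCard 3.0 text without going through excel. This is not implemented in the repo: the function names are hypothetical, every number is tagged as a CELL number (an assumption, the screenshots do not say which type it is), and the whole label is stored as the name without trying to split first and last names.

```python
import json


def escape(text):
    """Escape the characters that are special in vCard text values."""
    return (text.replace("\\", "\\\\")
                .replace(";", "\\;")
                .replace(",", "\\,")
                .replace("\n", "\\n"))


def to_vcf(contacts):
    """Render a {label: number} mapping as vCard 3.0 text, one card per entry."""
    cards = []
    for label, number in contacts.items():
        name = escape(label)
        lines = [
            "BEGIN:VCARD",
            "VERSION:3.0",
            f"N:{name};;;;",            # whole label used as family name (assumption)
            f"FN:{name}",
            f"TEL;TYPE=CELL:{number}",  # phone type is a guess
            "END:VCARD",
        ]
        cards.append("\r\n".join(lines) + "\r\n")  # vCard lines end with CRLF
    return "".join(cards)


sample = json.loads('{"Jean Claude DUSS pro": "01234567891", "oss": "01234567898"}')
print(to_vcf(sample))
```

For the real list, the mapping would be read from output.json with json.load, and the result written with open("contacts.vcf", "w", newline="") so that the CRLF line endings are kept as they are (contacts.vcf is just an example file name).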