fff

Fred's Form Filler: PHP and pdftk together make a REST form filler.

Some Notes On pdftk CHARSET madness

pdftk is great software, but its maintenance is a little wonky. Recent versions have moved, along with the PDF spec, to using strange and wondrous character sets when running generate_fdf; earlier versions generated a readable ASCII/UTF-8 version. Other pdftk commands have a specific utf8 variant to handle this (dump_data_fields_utf8, for example), but generate_fdf does not. Specifically, the output now contains UTF-16 strings that are prefixed with a Byte Order Mark or BOM (http://en.wikipedia.org/wiki/Byte_order_mark). Theoretically, the resulting fdf should still work for actual form filling.
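
To make the BOM issue concrete, here is a minimal Python sketch (not part of fff, which is PHP) of how one string value pulled out of such an fdf could be turned back into readable text. The function name decode_fdf_string is hypothetical, and the Latin-1 fallback is an assumption that only approximates the PDFDocEncoding defined by the PDF spec.

```python
import codecs


def decode_fdf_string(raw: bytes) -> str:
    """Decode the bytes of one FDF string value into readable text."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        # The BOM \xfe\xff marks big-endian UTF-16; drop it, then decode.
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # Assumption: anything without the BOM is treated as Latin-1, which
    # only approximates the PDFDocEncoding used by the PDF spec.
    return raw.decode("latin-1")


# Hand-made sample: a field name encoded the way recent pdftk emits it.
sample = codecs.BOM_UTF16_BE + "first_name".encode("utf-16-be")
print(decode_fdf_string(sample))
```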

The compile toolchain

In order to build fff, you need to run the compile.sh script. This will read the tx_cred.pdf form that you have in pdfsrc/ and generate a bunch of build artifacts in build/. Read compile.sh for the real toolchain documentation, but here is the basic process:

  • First we use pdftk to generate the fdf from our source pdf. We have created a pdf with properly named form fields to make this possible.
  • The resulting fdf will be unreadable to humans because of the charset issues, at least until pdftk also offers a utf8 output option for generate_fdf. Thanks to http://blog.tremily.us/posts/PDF_forms/
  • So we generate a map using pdftk dump_data_fields_utf8.
  • We turn that into both an HTML test form (one of the few things that is also compiled into the current directory for ease of use) and a json file that represents all of the elements of the pdf.
  • We upload that json to http://www.jsonschema.net/ and we get back a json_schema, which also needs to be saved in the main directory. Eventually this is where we will model required fields for the pdf.
  • We also auto-generate a smarty template form for this system, but we did this before we knew about http://alpacajs.org/
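
The map-building step can be sketched roughly as follows. This is an illustration in Python, not the actual compile.sh logic: it assumes the usual dump_data_fields_utf8 text layout ("Key: value" lines, with records separated by "---") and produces the name-to-name json shape quoted in the issues. The helper names and the sample dump are made up for the example.

```python
import json


def field_names(dump: str) -> list[str]:
    """Collect every FieldName value from dump_data_fields-style text."""
    names = []
    for line in dump.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key == "FieldName":
            names.append(value.strip())
    return names


def field_map(dump: str) -> dict[str, str]:
    """Map each field name to itself, the shape seen in the compiled json."""
    return {name: name for name in field_names(dump)}


# Hand-written sample, not real output from tx_cred.pdf.
SAMPLE_DUMP = """---
FieldType: Text
FieldName: specialty_0_certifying_board
FieldFlags: 0
FieldJustification: Left
---
FieldType: Button
FieldName: specialty_0_is_board_certified
FieldFlags: 0
FieldJustification: Left
"""

print(json.dumps(field_map(SAMPLE_DUMP), indent=4))
```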

We abuse the standard build process somewhat, because we regularly check the build directory into git. We think that version controlling that directory could really help us as we track down bugs, especially because pdftk generate_fdf behaves so differently on different OSes.

fff's People

Contributors

ftrotter, rickgithubs, ashish-op

fff's Issues

run spellcheck on result of tx_cred.pdf compile

Rick,
I have created a new stage of the pdf compile process that outputs all of the words in all of the fields that we extract from your pdf.

This way you can detect both naming inconsistencies and misspellings, both of which badly screw up later stages of the process. You can run the compile process in your own fork of fff, and then look in the file build/field_word.txt for a word count. This will not tell you where your errors are, but you can import it into a word processor and run a spell check, and then use the file build/upload_me_to_jsonschema.net.json to see which fields contain spelling mistakes by searching for the odd spellings.

You will need to add certain words like sigmoidoscopy to your word processor's dictionary to make this work, etc.

-FT
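
The word count idea can be sketched in a few lines of Python. This is an illustration only: the real stage lives in the compile process, the exact format of build/field_word.txt is not shown here, and the function name word_counts is hypothetical. It assumes words in field names are separated by underscores.

```python
from collections import Counter


def word_counts(field_names):
    """Count every word in the field names, splitting on underscores."""
    counts = Counter()
    for name in field_names:
        for word in name.split("_"):
            # Skip index numbers such as the 0 in specialty_0_type.
            if word and not word.isdigit():
                counts[word] += 1
    return counts


names = [
    "speciality_0_type",
    "specialty_0_certifying_board",
    "specialty_1_certifying_board",
]
for word, count in word_counts(names).most_common():
    print(count, word)
```

Words that appear only once or twice in a long form are the ones worth checking first, since a misspelling rarely repeats as often as the intended word.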

PDF button on last page

Put a new page on the back of the pdf with one very large button.

Change the url for the posting of the form to post to

http://REPLACE_THIS_URL

so that I can do a find and replace programmatically on it...

-FT

speciality vs specialty in tx_cred.pdf

Need to pick just one spelling:

http://grammarist.com/usage/speciality-specialty/

    "speciality_0_type": "speciality_0_type",
    "speciality_0_date_start": "speciality_0_date_start",
    "speciality_0_recertification_date": "speciality_0_recertification_date",
    "speciality_0_expiration_date": "speciality_0_expiration_date",
    "speciality_1_type": "speciality_1_type",
    "speciality_1_date_start": "speciality_1_date_start",
    "speciality_1_recertification_date": "speciality_1_recertification_date",
    "specialty_0_certifying_board": "specialty_0_certifying_board",
    "specialty_1_certifying_board": "specialty_1_certifying_board",
    "specialty_1_is_board_certified": "specialty_1_is_board_certified",
    "specialty_0_is_board_certified": "specialty_0_is_board_certified",
    "specialty_0_is_taken_exam_results_pending": "specialty_0_is_taken_exam_results_pending",
    "specialty_1_is_taken_exam_results_pending": "specialty_1_is_taken_exam_results_pending",
    "specialty_0_is_takenpart1_eligiblefor2": "specialty_0_is_takenpart1_eligiblefor2",
    "specialty_1_is_takenpart1_eligiblefor2": "specialty_1_is_takenpart1_eligiblefor2",
    "specialty_0_is_intending_sit_boards": "specialty_0_is_intending_sit_boards",
    "specialty_1_is_intending_sit_boards": "specialty_1_is_intending_sit_boards",
    "specialty_0_ist_planning_takeboards": "specialty_0_ist_planning_takeboards",
    "specialty_1_ist_planning_takeboards": "specialty_1_ist_planning_takeboards",
    "specialty_0_hmo_listed_yes": "specialty_0_hmo_listed_yes",
    "specialty_0_hmo_listed_no": "specialty_0_hmo_listed_no",
    "specialty_0_ppo_listed_yes": "specialty_0_ppo_listed_yes",
    "specialty_0_ppo_listed_no": "specialty_0_ppo_listed_no",
    "specialty_0_pos_listed_yes": "specialty_0_pos_listed_yes",
    "specialty_0_pos_listed_no": "specialty_0_pos_listed_no",
    "specialty_1_hmo_listed_yes": "specialty_1_hmo_listed_yes",
    "specialty_1_hmo_listed_no": "specialty_1_hmo_listed_no",
    "specialty_1_ppo_listed_yes": "specialty_1_ppo_listed_yes",
    "specialty_1_ppo_listed_no": "specialty_1_ppo_listed_no",
    "specialty_1_pos_listed_yes": "specialty_1_pos_listed_yes",
    "specialty_1_pos_listed_no": "specialty_1_pos_listed_no",
    "speciality_2_type": "speciality_2_type",
    "speciality_2_date_start": "speciality_2_date_start",
    "specialty_2_certifying_board": "specialty_2_certifying_board",
    "speciality_2_recertification_date": "speciality_2_recertification_date",
    "specialty_2_is_board_certified": "specialty_2_is_board_certified",
    "specialty_2_is_taken_exam_results_pending": "specialty_2_is_taken_exam_results_pending",
    "specialty_2_is_takenpart1_eligiblefor2": "specialty_2_is_takenpart1_eligiblefor2",
    "specialty_2_is_intending_sit_boards": "specialty_2_is_intending_sit_boards",
    "specialty_2_ist_planning_takeboards": "specialty_2_ist_planning_takeboards",
    "specialty_2_hmo_listed_yes": "specialty_2_hmo_listed_yes",
    "specialty_2_hmo_listed_no": "specialty_2_hmo_listed_no",
    "specialty_2_ppo_listed_yes": "specialty_2_ppo_listed_yes",
    "specialty_2_ppo_listed_no": "specialty_2_ppo_listed_no",
    "specialty_2_pos_listed_yes": "specialty_2_pos_listed_yes",
    "specialty_2_pos_listed_no": "specialty_2_pos_listed_no",
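
A mix-up like this one can be caught automatically. The following Python sketch is not part of fff; it assumes field names follow a section_N_rest pattern, and the helper names and the 0.85 similarity threshold are arbitrary choices for illustration. It extracts the section prefix of every field and reports pairs of prefixes that are nearly, but not exactly, the same.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def section_prefix(name):
    """Return the text before the first _<number>_ index, or None."""
    match = re.match(r"^(.+?)_\d+_", name)
    return match.group(1) if match else None


def similar_prefixes(field_names, threshold=0.85):
    """Return sorted pairs of distinct prefixes that look nearly identical."""
    prefixes = sorted({p for p in map(section_prefix, field_names) if p})
    return [
        (a, b)
        for a, b in combinations(prefixes, 2)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    ]


names = [
    "speciality_0_type",
    "specialty_0_certifying_board",
    "practice_0_limitations_age_top",
]
print(similar_prefixes(names))
```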

post_grad vs postgrad vs post in tx_cred.pdf

The problem lives here...
https://github.com/ftrotter/fff/blob/master/pdfsrc/tx_cred.pdf

There are three variations of "postgrad" that need to be collapsed to just "postgrad"

These two are not ok..
"post_grad",
"post",

Currently the postgrad section compiles like this:

    "postgrad_2_internship_is": "postgrad_2_internship_is",
    "postgrad_2_residency_is": "postgrad_2_residency_is",
    "postgrad_3_internship_is": "postgrad_3_internship_is",
    "postgrad_4_internship_is": "postgrad_4_internship_is",
    "postgrad_5_internship_is": "postgrad_5_internship_is",
    "postgrad_6_internship_is": "postgrad_6_internship_is",
    "postgrad_2_fellowship_is": "postgrad_2_fellowship_is",
    "postgrad_3_fellowship_is": "postgrad_3_fellowship_is",
    "postgrad_4_fellowship_is": "postgrad_4_fellowship_is",
    "postgrad_5_fellowship_is": "postgrad_5_fellowship_is",
    "postgrad_6_fellowship_is": "postgrad_6_fellowship_is",
    "postgrad_2_teaching_appointment_is": "postgrad_2_teaching_appointment_is",
    "postgrad_3_teaching_appointment_is": "postgrad_3_teaching_appointment_is",
    "postgrad_4_teaching_appointment_is": "postgrad_4_teaching_appointment_is",
    "postgrad_5_teaching_appointment_is": "postgrad_5_teaching_appointment_is",
    "postgrad_6_teaching_appointment_is": "postgrad_6_teaching_appointment_is",
    "post_grad_2_program_completed_yes": "post_grad_2_program_completed_yes",
    "post_grad_3_program_completed_yes": "post_grad_3_program_completed_yes",
    "post_grad_4_program_completed_yes": "post_grad_4_program_completed_yes",
    "post_grad_5_program_completed_yes": "post_grad_5_program_completed_yes",
    "post_grad_6_program_completed_yes": "post_grad_6_program_completed_yes",

These are also wrong:

    "post_1_is_internship": "post_1_is_internship",
    "post_1_is_residency": "post_1_is_residency",
    "post_1_is_fellowship": "post_1_is_fellowship",
    "post_1_is_teaching_position": "post_1_is_teaching_position",
    "post_2_is_internship": "post_2_is_internship",
    "post_2_is_residency": "post_2_is_residency",
    "post_2_is_fellowship": "post_2_is_fellowship",
    "post_2_is_teaching_position": "post_2_is_teaching_position",
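
The intended collapse can be written down as a single rule. The real fix belongs in the field names of tx_cred.pdf itself, so the Python sketch below only illustrates the target mapping; the function name normalize_postgrad is hypothetical.

```python
import re

# "post_grad" or "post", followed by "_" and then an index digit.
_POSTGRAD_VARIANT = re.compile(r"^(?:post_grad|post)_(?=\d)")


def normalize_postgrad(name):
    """Rewrite post_grad_N_... and post_N_... names to postgrad_N_..."""
    return _POSTGRAD_VARIANT.sub("postgrad_", name)


for name in [
    "post_grad_2_program_completed_yes",
    "post_1_is_internship",
    "postgrad_2_internship_is",
]:
    print(name, "->", normalize_postgrad(name))
```

Names that already start with postgrad pass through unchanged, because the pattern requires an underscore immediately after "post".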

practice_0n_physician_provider_is in tx_cred.pdf

The field is named practice_0n_physician_provider_is.

Not sure what it should be.

Context:

    "practice_0_limitations_age_top": "practice_0_limitations_age_top",
    "practice_0_limitations_other_is": "practice_0_limitations_other_is",
    "practice_0_limitations_other_explanation": "practice_0_limitations_other_explanation",
    "practice_0n_physician_provider_is": "practice_0n_physician_provider_is",
    "practice_0_non_physician_provider_name_0": "practice_0_non_physician_provider_name_0",
    "practice_0_non_physician_provider_name_1": "practice_0_non_physician_provider_name_1",
    "practice_0_non_physician_provider_name_3": "practice_0_non_physician_provider_name_3",

remove address as a prefix...

Rick,
Sometimes the natural section name is also a data type that I would like to be able to do things programmatically with. What do you think about changing some of these to better match a section regex process?

These would need to change:
"address_correspondence_line1": "address_correspondence_line1",
"address_correspondence_city": "address_correspondence_city",
"address_correspondence_postal": "address_correspondence_postal",
"address_correspondence_state": "address_correspondence_state",
"address_correspondence_county": "address_correspondence_county",
"address_correspondence_email": "address_correspondence_email",
"address_correspondence_phone": "address_correspondence_phone",
"address_correspondence_fax": "address_correspondence_fax",
"address_home_country": "address_home_country",
"address_home_line1": "address_home_line1",
"address_home_city": "address_home_city",
"address_home_state": "address_home_state",
"address_home_postal": "address_home_postal",
to the following:

    "correspondence_address_line1":
    "correspondence_address_city":
    "correspondence_address_postal":
    "correspondence_address_state":
    "correspondence_address_county":
    "correspondence_email":
    "correspondence_phone": 
    "correspondence_fax":
    "home_address_country":
    "home_address_line1": 
    "home_address_city": 
    "home_address_state":
    "home_address_postal": 
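
The proposed renaming follows a simple rule, sketched here in Python as an illustration (the function and constant names are hypothetical, and the actual change would be made to the field names in the pdf). The section name moves to the front, and email, phone and fax drop the word "address" altogether, matching the target list above.

```python
import re

# Assumption: these three are contact details, not address parts, so they
# lose the word "address" entirely, as in the target list above.
CONTACT_FIELDS = {"email", "phone", "fax"}

_ADDRESS_NAME = re.compile(r"^address_(correspondence|home)_(.+)$")


def rename_address_field(name):
    """Move the section name in front of 'address' in a field name."""
    match = _ADDRESS_NAME.match(name)
    if not match:
        return name  # not an address_* field, leave untouched
    section, rest = match.groups()
    if rest in CONTACT_FIELDS:
        return f"{section}_{rest}"
    return f"{section}_address_{rest}"


print(rename_address_field("address_correspondence_line1"))
print(rename_address_field("address_correspondence_email"))
print(rename_address_field("address_home_postal"))
```

With the section first, a pattern that starts with the section name (for example one anchored on "correspondence_") picks up every field in that section.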
