kmisztal / opentaxforms

This project forked from jsaponara/opentaxforms

open-sourcing US tax forms

Home Page: https://opentaxforms.org/

License: GNU Affero General Public License v3.0

Python 69.60% Shell 1.21% CSS 0.22% JavaScript 7.12% HTML 21.85%

opentaxforms's Introduction

OpenTaxForms opens and automates US tax forms--it reads PDF tax forms (currently from IRS.gov only, not state forms), converts them to more fully featured HTML5, and offers a database and API for developers to create their own tax applications. The converted forms will be available to test (and ultimately use) at OpenTaxForms.org.

  • PyPI

    PyPI version

  • License

    GNU AGPLv3

  • Install

    pip install opentaxforms

  • External dependencies

    pdf2svg

  • GitHub

    • code
    • [issue tracker link forthcoming]
    • [milestones link forthcoming]
  • Build status

    Build Status

  • Form status

    The script reports a status for each form. The current status categories are:

    • layout covers textboxes and checkboxes--no two of them should overlap.
    • refs covers references to other forms--each should be recognized (i.e., appear in the list of all forms).
    • math covers the computed fields and their dependencies--each computed field should have at least one dependency, or else what is it computed from?

    Each status error has a corresponding warning in the log file, so they're easy to find. Each bugfix will likely reduce errors across many forms.
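
The three status checks above can be sketched roughly as follows. This is only an illustration under assumptions: the `Field` record, its attribute names, and the helper functions are hypothetical and are not taken from the OpenTaxForms code.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Field:
    # Hypothetical field record; attribute names are illustrative only.
    name: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height
    computed: bool = False
    deps: list = field(default_factory=list)  # fields this one is computed from


def overlaps(a, b):
    """True if the two boxes share interior area (touching edges do not count)."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def layout_errors(fields):
    """Pairs of field names whose boxes overlap."""
    return [(a.name, b.name) for a, b in combinations(fields, 2) if overlaps(a, b)]


def ref_errors(refs, all_forms):
    """Referenced form names that are missing from the list of all forms."""
    return sorted(set(refs) - set(all_forms))


def math_errors(fields):
    """Computed fields that have no dependencies."""
    return [f.name for f in fields if f.computed and not f.deps]


fields = [
    Field("line35", 100, 200, 80, 12),
    Field("line37", 100, 220, 80, 12),
    # Overlaps line37 and is computed from nothing: two errors.
    Field("line38", 100, 225, 80, 12, computed=True),
    Field("line39", 100, 260, 80, 12, computed=True, deps=["line35", "line37"]),
]
status = {
    "layout": layout_errors(fields),
    "refs": ref_errors(["1040", "8888x"], ["1040", "8888"]),
    "math": math_errors(fields),
}
print(status)
```

Each non-empty list in `status` would correspond to one category of warning in the log file.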

    1040 form status listing

  • API

    The RESTful API is read-only and provides a complete accounting of form fields: data type, size and position on page, and role in field groupings such as dollars-and-cents fields, fields on the same line, fields in the same table, fields on the same page, and fields involved in the same formula. The API will also provide status information and tester feedback for each form.

    [API docs forthcoming; for now, see the examples in test/run_apiclient.sh]

  • How it works

    Most IRS tax forms embed all of their fillable-field information in the XML Forms Architecture (XFA) format. The OpenTaxForms Python script extracts the XFA from each PDF form and parses out:

    • relationships among fields (such as dollar and cent fields; fields on the same line; columns and rows of a table).
    • math formulas, including which fields are computed vs user-entered (such as "Subtract line 37 from line 35. If line 37 is greater than line 35, enter -0-").
    • references to other forms.
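
To make the formula-parsing step concrete, here is a minimal sketch that recognizes only the one sentence pattern quoted above. The regular expressions, the dictionary layout, and the function names are assumptions made for illustration; the actual script presumably handles many more phrasings and may work quite differently.

```python
import re

# Matches sentences like "Subtract line 37 from line 35."
SUBTRACT = re.compile(r"Subtract line (\w+) from line (\w+)", re.IGNORECASE)
# Matches the common floor clause "... enter -0-".
ZERO_FLOOR = re.compile(r"enter -0-", re.IGNORECASE)


def parse_instruction(text):
    """Return a small dict describing the formula, or None if not recognized."""
    m = SUBTRACT.search(text)
    if m is None:
        return None
    subtrahend, minuend = m.group(1), m.group(2)
    return {
        "op": "-",
        "terms": [minuend, subtrahend],  # minuend first
        "zero_if_negative": bool(ZERO_FLOOR.search(text)),
    }


def evaluate(formula, values):
    """Apply a parsed subtraction formula to a dict of line values."""
    a, b = (values[t] for t in formula["terms"])
    result = a - b
    if formula["zero_if_negative"] and result < 0:
        return 0
    return result


text = ("Subtract line 37 from line 35. "
        "If line 37 is greater than line 35, enter -0-")
formula = parse_instruction(text)
print(formula["terms"])                              # ['35', '37']
print(evaluate(formula, {"35": 5000, "37": 1200}))   # 3800
print(evaluate(formula, {"35": 1000, "37": 1200}))   # 0
```

The parsed `terms` double as the dependency list of the computed field, which is the information the math status check relies on.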

    All this information is stored in a database (SQLite by default, optionally PostgreSQL) and served through a RESTful API. For each tax form page, an HTML form (with JavaScript to express the formulas) is generated and overlaid on an SVG rendering of the original PDF.

    The JavaScript saves all user inputs to local/web storage in the browser via basil.js and retrieves those values when the page is loaded. Values are keyed by tax year, form number (e.g., 1040), and XFA field id (and soon taxpayer name, now that I do my kids' taxes too). Testers will annotate the page image with boxes and comments via annotorious.js.

    A few of the 900+ IRS forms don't have embedded XFA (such as the 2016 Form 1040 Schedule A). Eventually those forms may be updated to contain XFA, but until then, the best automated approach is probably OCR (optical character recognition). OCR may be a less fool-proof approach in general, especially for state (NJ, NY, etc.) forms, which generally are not XFA-based.
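
As a rough illustration of the storage and keying described above, the sketch below uses Python's built-in sqlite3 module. The table layout, the column names, the field ids such as `f1_01`, and the `/` separator in the key are all assumptions for the example, not the project's actual schema or key format (and in the project itself, the browser-side keying is done in JavaScript).

```python
import sqlite3


def storage_key(year, form, field_id):
    """Compose a key from tax year, form number, and XFA field id.

    The "/" separator is an assumption made for this sketch.
    """
    return f"{year}/{form}/{field_id}"


# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE field (
           year INTEGER, form TEXT, field_id TEXT,
           page INTEGER, computed INTEGER,
           PRIMARY KEY (year, form, field_id))"""
)
rows = [
    (2016, "1040", "f1_01", 1, 0),  # hypothetical field ids
    (2016, "1040", "f1_02", 1, 0),
    (2016, "1040", "f2_10", 2, 1),
]
conn.executemany("INSERT INTO field VALUES (?, ?, ?, ?, ?)", rows)

# Keys for all fields on page 1 of the 2016 Form 1040, in id order.
page1 = [
    storage_key(y, f, fid)
    for y, f, fid in conn.execute(
        "SELECT year, form, field_id FROM field "
        "WHERE year = ? AND form = ? AND page = ? ORDER BY field_id",
        (2016, "1040", 1),
    )
]
print(page1)  # ['2016/1040/f1_01', '2016/1040/f1_02']
```

Using the same three-part key in the database and in browser storage would let a saved value be matched back to its field record; adding taxpayer name later would mean one more key component.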

  • To do

    • Move lower-level to-do items to GitHub issues.
    • Refactor toward a less script-ish architecture that will scale to more developers. [architecturePlease]
    • Switch to a PDF-to-SVG converter that preserves text (rather than converting text to paths), perhaps pdfjs, so that testers can easily copy and paste text from forms. [copyableText]
    • Should extractFillableFields.py be a separate project called xfadump? This might provide a cleaner target output interface for an OCR effort. [xfadump]
    • Replace allpdfnames.txt with a more detailed form dictionary via a preprocess step. [formDictionary]
    • Offer an entire-form HTML interface (each page is currently presented separately). [formAsSingleHtmlPage]
    • Incorporate instructions and publications, especially extracting the worksheets from instructions. [worksheets]
    • Add the ability to process US state forms. [stateForms]
    • Fix countless bugs, especially in forms that contain tables (see [issues]).
    • Don't look in a separate file for a schedule that occurs within a form. [refsToEmbeddedSchedules]
    • Separate the dirName command line option into pdfInputDir and htmlOutputDir. [splitIoDirs]
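
One possible shape for the [splitIoDirs] item, sketched with argparse. Only the option names pdfInputDir and htmlOutputDir come from the list above; the `--` flag style, the default value, and the fallback behavior are assumptions and do not describe the current command line.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Sketch of split input/output directory options")
    # The default directory name is an assumption for this example.
    parser.add_argument("--pdfInputDir", default="forms",
                        help="directory containing the PDF forms to read")
    parser.add_argument("--htmlOutputDir", default=None,
                        help="directory for generated HTML (defaults to pdfInputDir)")
    return parser


def parse_dirs(argv):
    """Return (input_dir, output_dir) for the given argument list."""
    args = build_parser().parse_args(argv)
    # Fall back to the input directory, mimicking a single shared directory.
    output_dir = args.htmlOutputDir or args.pdfInputDir
    return args.pdfInputDir, output_dir


print(parse_dirs(["--pdfInputDir", "pdf", "--htmlOutputDir", "html"]))
print(parse_dirs(["--pdfInputDir", "pdf"]))
```

Falling back to the input directory would keep existing single-directory invocations working while allowing the two locations to diverge.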
  • Other tax- and PDF-related projects

opentaxforms's People

Contributors

jsaponara, perimosocordiae, polera
