lihuibng / pipeline-oer

This project forked from unstructured-io/pipeline-oer


Pipeline for extracting information from Army OERs

License: Apache License 2.0

Shell 0.13% Python 1.17% Makefile 0.17% Jupyter Notebook 98.53%

pipeline-oer's Introduction

Pre-Processing Pipeline for OERs

This repo implements a document pre-processing pipeline for Army Officer Evaluation Reports (OERs). The pipeline assumes the OERs are in PDF format and include both pages. The API is hosted at https://api.unstructured.io.

Developer Quick Start

  • Using pyenv to manage virtualenvs is recommended

    • Mac install instructions. See here for more detailed instructions.

      • brew install pyenv-virtualenv
      • pyenv install 3.8.15
    • Linux instructions are available here.

    • Create a virtualenv to work in and activate it, e.g. for one named oer:

      pyenv virtualenv 3.8.15 oer
      pyenv activate oer

  • Run make install

  • Start a local Jupyter notebook server with make run-jupyter
    OR
    start only the FastAPI app locally with make run-web-app

Quick Tour

You can run this Colab notebook to see how pipeline-raters.ipynb extracts the elements of OER files and defines an API.

Extracting Structured Text from an OER PDF document

After the API starts, you can extract the elements of an OER file with the command:

curl -X 'POST' \
  'http://localhost:8000/oer/v0.0.1/raters' \
  -F 'files=@<your_oer_pdf_file>' \
  | jq -C . | less -R
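
The same request can be sent from Python. The sketch below uses only the standard library and mirrors what `curl -F` does: it wraps the PDF in a multipart/form-data body under the `files` field and POSTs it to the endpoint from the command above. The helper names `build_multipart` and `post_oer` are hypothetical and are not part of this repo. The `application/pdf` content type and the assumption that the response decodes to a JSON object are likewise assumptions.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Optional, Tuple

# Endpoint copied from the curl example; adjust host and port for your setup.
RATERS_URL = "http://localhost:8000/oer/v0.0.1/raters"


def build_multipart(field: str, filename: str, payload: bytes,
                    boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode a single file as a multipart/form-data body.

    Returns the raw body and the matching Content-Type header value.
    (Hypothetical helper, not part of pipeline-oer.)
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def post_oer(pdf_path: str, url: str = RATERS_URL) -> dict:
    """POST one PDF to the raters endpoint and return the decoded JSON.

    Assumes the server is already running (e.g. via make run-web-app).
    """
    path = Path(pdf_path)
    body, content_type = build_multipart("files", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the API running, `post_oer("sample-docs/fake-oer.pdf")` should return the parsed response as a Python dictionary.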

Using the example fake OER in the sample-docs folder, you can run:

curl -X 'POST' \
  'http://localhost:8000/oer/v0.0.1/raters' \
  -F 'files=@sample-docs/fake-oer.pdf' \
  | jq -C . | less -R

and get the following JSON as the output:

{
  "duty_description": "Personnel and Administration Officer (S1) for a training battalion in the U.S. Army reserve. Principal staff assistant to the battalion commander. Exercise staff supervisor in matters pertaining to strength management, personnel qualifications and evaluations, personnel assignment, clearance, recruiting, retention, and battalion administration. Responsible for the overall supervision of the battalion Personnel Administration Center (PAC) and its activities. Serves as commander of Headquarters and Headquarters Detachment. Additional duties include; Battalion Safety Officer, Equal Opportunity Officer, Records Management Officer, and Retention Officer.",
  "rater": {
    "comments": "1LT X performed flawlessly in the execution of an overseas detention and area security mission at Guantanamo Bay, Cuba. Exceptional performance during this limited rating period by CPT X.",
    "sections": {
      "achieves": "Developed AAR reporting template that standardized information across the battalion and ensured compliance with Army Regulations. She consistently presented appropriate and useful monthly reports on security clearances, weather effects, and threat assessments.",
      "develops": "Absolute professional and squared away for duty; current on all applicable skills, knowledge, and mental toughness by engaging in engages in continual self-development. Using his extensive experience, 1LT X works well after normal duty hours, provides coaching, and counseling and mentoring.",
      "leads": "1LT X demonstrates the full range of required influence techniques enabling him to speak, lead and motivate every person in his unit. 1LT X works with the Alameda County Sheriff’s office, as well as other outside agencies, in order to build positive relationships established that have enhanced unit training.",
      "intellect": "1LT X is able to analyze a situation and introduce new ideas when opportunities exist, approaching challenging circumstances with creativity and intellect. 1LT X is highly proficient in interacting with others, effectively adjusting behaviors when interacting with superiors, peers, and subordinates.",
      "presence": "1LT X maintains an excellent fitness level and sets the standard for his Soldiers, with a score of 275 on his last APFT. 1LT X models the composure, outward calm, and control over his emotions that you want to see in a leader during adverse conditions.",
      "character": "1LT X’s exceptional command presence and resilience lends itself to consistent mission accomplishment, good order and discipline, and a positive climate. 1LT X’s outstanding attitude and thirst for knowledge exceeds those around him which contributes to his overall exceptional character."
    },
    "referred": "No",
    "performance": "PROFICIENT",
    "name": "RAYMOND, BRIAN",
    "position": "EXECUTIVE OFFICER"
  },
  "senior_rater": {
    "comments": "I currently senior rate Army Officers in this grade. 1LT X is #4 of the 44 Lieutenants I senior rated. 1LT X is an intelligent and creative Officer with the potential to progress in rank as a leader. 1LT X is ready for positions of increased responsibilities; he will excel as a Staff Officer followed by Company Command if given the opportunity. Select for Military Police Captains Career Course and promote to captain when eligible.",
    "next_assignment": [
      "Battalion FDO",
      " Battalion AS3",
      " Battalion S4"
    ],
    "potential": "HIGHLY QUALIFIED",
    "name": "BERTL, ALAN",
    "position": "BATTALION COMMANDER"
  },
  "intermediate_rater": {
    "comments": "1LT X is #2 of the 20 Lieutenants I intermediate rated. He is an asset for the future and will progress further in his military career. Keep assigning him to demanding position and select him for the Military Police Captains Career Course now. Promote ahead of peers to Captain and select him for the next Company Command.",
    "name": "WOLFE, CRAG",
    "position": ""
  },
  "rated_name": "ROBINSON, MATTHEW W",
  "rated_position": ""
}
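
Once the response is parsed, the fields of interest can be pulled out of the nested structure. The following is a minimal sketch that assumes the response has the shape shown above. `summarize_oer` is a hypothetical helper, not a function in this repo. It uses `.get` with defaults so a missing rater block does not raise, and it strips the leading spaces that appear in the `next_assignment` entries.

```python
from typing import Any, Dict


def summarize_oer(result: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the key fields of a raters response into one dictionary.

    Field names are taken from the sample output; absent fields fall back
    to empty values. (Hypothetical helper, not part of pipeline-oer.)
    """
    rater = result.get("rater", {})
    senior = result.get("senior_rater", {})
    return {
        "rated_name": result.get("rated_name", ""),
        "rater_name": rater.get("name", ""),
        "performance": rater.get("performance", ""),
        "potential": senior.get("potential", ""),
        # Entries such as " Battalion AS3" carry a leading space.
        "next_assignment": [a.strip() for a in senior.get("next_assignment", [])],
        # Names of the rater narrative sections that were extracted.
        "rater_sections": sorted(rater.get("sections", {})),
    }


if __name__ == "__main__":
    # Abbreviated copy of the sample output, for illustration only.
    sample = {
        "rater": {"name": "RAYMOND, BRIAN", "performance": "PROFICIENT",
                  "sections": {"leads": "...", "achieves": "..."}},
        "senior_rater": {"potential": "HIGHLY QUALIFIED",
                         "next_assignment": ["Battalion FDO", " Battalion AS3"]},
        "rated_name": "ROBINSON, MATTHEW W",
    }
    print(summarize_oer(sample))
```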

You can also run the extraction code with Python directly using the following commands from the pipeline-oer directory:

from prepline_oer.api.raters import pipeline_api

filename = "sample-docs/fake-oer.pdf"

# Open the PDF in binary mode and keep the value the pipeline returns.
with open(filename, "rb") as f:
    result = pipeline_api(file=f, filename=filename)

print(result)
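
To process several OERs in one go, the same call can be wrapped in a loop. This is a sketch only: `extract_many` is a hypothetical helper, and it takes the pipeline function as an argument so the example runs without the package installed. It assumes the pipeline accepts the `file` and `filename` keyword arguments used above.

```python
from typing import Any, Callable, Dict, Iterable


def extract_many(paths: Iterable[str],
                 pipeline: Callable[..., Any]) -> Dict[str, Any]:
    """Run a pipeline function over several PDFs, keyed by file path.

    `pipeline` is called as pipeline(file=<binary file>, filename=<path>),
    matching the direct call shown above. (Hypothetical helper.)
    """
    results: Dict[str, Any] = {}
    for path in paths:
        # Each file is opened in binary mode and closed before the next one.
        with open(path, "rb") as f:
            results[path] = pipeline(file=f, filename=path)
    return results


# Usage, from the pipeline-oer directory with the package installed:
#   from prepline_oer.api.raters import pipeline_api
#   results = extract_many(["sample-docs/fake-oer.pdf"], pipeline_api)
```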

Generating Python files from the pipeline notebooks

You can generate the FastAPI APIs from your pipeline notebooks by running make generate-api.

Security Policy

See our security policy for information on how to report security vulnerabilities.

Learn more

Section              Description
Company Website      Unstructured.io product and company info
Fillable OER Form    Blank OER from Army pubs that you can fill in.
OER Narrative Guide  Example OER narratives to use for training data.

pipeline-oer's People

Contributors

dependabot[bot], mthwrobinson, qued, natygyoon, cragwolfe, laverdes, yuming-long, ryannikolaidis
