aws-samples / aws-ai-intelligent-document-processing

Intelligent Document Processing with AWS AI Services and generative AI

Intelligent Document Processing with AWS AI Services

This repository is part of the Intelligent Document Processing with AWS AI Services workshop.

Documents contain valuable information and come in various shapes and forms. In many cases, these documents are still processed manually, which is time-consuming, error-prone, and expensive. Not only do you want this information extracted quickly, but you also want to automate business processes that currently rely on manual input and intervention across various file types and formats.

To help you overcome these challenges, AWS Machine Learning (ML) services give you options for extracting information from complex content in any document format, such as insurance claims, mortgages, healthcare claims, and legal contracts.

Different phases of the Intelligent Document Processing pipeline

In this workshop, we will deep-dive into each phase of the IDP pipeline, with solutions to automate each step. Hands-on labs will familiarize you with the AWS AI services (Amazon Textract, Amazon Comprehend) you can use to build your solution.
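To give a feel for what the Textract labs build on, here is a minimal sketch, assuming boto3 is installed and AWS credentials and a region are configured (for example inside SageMaker Studio). It sends a local image to Amazon Textract's synchronous `detect_document_text` API and keeps only the `LINE` blocks. The function names are illustrative and are not taken from the workshop notebooks.

```python
from typing import Any, Dict, List


def lines_from_textract_response(response: Dict[str, Any]) -> List[str]:
    """Return the text of every LINE block in a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_lines(textract_client: Any, image_path: str) -> List[str]:
    """Send a local JPEG/PNG to Textract as raw bytes and return its text lines."""
    with open(image_path, "rb") as f:
        response = textract_client.detect_document_text(Document={"Bytes": f.read()})
    return lines_from_textract_response(response)
```

Usage, assuming the image exists in the working directory: `import boto3`, then `print(detect_lines(boto3.client("textract"), "simple-document-image.jpg"))`. Keeping the response parsing in its own function lets you test it on a saved response without calling AWS.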

Getting Started

To run all the Jupyter Notebooks in this sample, you first need to create a SageMaker Studio domain. The CloudFormation template that creates the SageMaker Studio domain and all related resources, such as IAM roles and an S3 bucket, is included under the /dist directory. Follow the steps below to create the CloudFormation stack using the idp-deploy.yaml file.

โš ๏ธ Your AWS account must have a default VPC for this CloudFormation template to work. Your AWS account may incur some nominal charges for SageMaker Studio domain, Amazon Textract, and Amazon Comprehend. However, Amazon Textract, Comprehend, and SageMaker are free to try as part of AWS Free Tier.

  • Navigate to the AWS Console
  • Search for CloudFormation in the "Services" search bar
  • Once in the CloudFormation console, click the "Create Stack" button (use the "With new resources" option)
  • In the "Create Stack" wizard, choose "Template is ready", then select "Upload a template file"

  • Upload the provided idp-deploy.yaml file and click "Next"
  • In the "Specify stack details" screen, enter a "Stack name" and click "Next"

  • In the "Configure Stack options" screen, leave the configurations as-is. Click "Next"
  • In the "Review" screen, scroll to the "Capabilities" section at the bottom of the page and select the check box to acknowledge that the stack will create the required IAM roles. Click "Create stack".
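The console steps above can also be scripted. This is a minimal sketch with boto3's CloudFormation `create_stack`, assuming the template is at `dist/idp-deploy.yaml` relative to the repository root and that its parameter defaults are acceptable (the console flow leaves them as-is). Passing both `CAPABILITY_IAM` and `CAPABILITY_NAMED_IAM` is an assumption that covers the IAM-roles acknowledgement whether or not the roles are explicitly named. The helper names are illustrative.

```python
from pathlib import Path
from typing import Any, Dict


def build_create_stack_args(stack_name: str, template_body: str) -> Dict[str, Any]:
    """Build CreateStack arguments mirroring the console flow (default parameters)."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # Equivalent of ticking the IAM acknowledgement check box in the console
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    }


def create_idp_stack(cf_client: Any, stack_name: str, template_path: str) -> str:
    """Create the stack from a local template file and return its StackId."""
    template_body = Path(template_path).read_text()
    response = cf_client.create_stack(**build_create_stack_args(stack_name, template_body))
    return response["StackId"]
```

Usage: `import boto3`, then `create_idp_stack(boto3.client("cloudformation"), "<your stack name>", "dist/idp-deploy.yaml")`. `TemplateBody` is limited to 51,200 bytes; a larger template would have to be uploaded to S3 and passed with `TemplateURL` instead.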

The stack creation can take up to 30 minutes. Once your SageMaker domain is created, navigate to the SageMaker console and click "Amazon SageMaker Studio" in the left pane. Choose the default user "SageMakerUser" created by the stack and click "Launch Studio". This opens the SageMaker Studio IDE in a new browser tab. NOTE: If this is your first time using SageMaker Studio, the IDE may take some time to fully launch.
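Instead of refreshing the console during that wait, you can block on the stack from Python. This is a minimal sketch, assuming boto3 and the name you gave the stack; it uses boto3's built-in `stack_create_complete` waiter, with a polling interval and attempt count derived from the 30-minute estimate above (both are tunable assumptions). The helper name is illustrative.

```python
import math
from typing import Any


def wait_for_stack(cf_client: Any, stack_name: str,
                   timeout_minutes: int = 30, delay_seconds: int = 30) -> str:
    """Wait until the stack reaches CREATE_COMPLETE, then return its status."""
    max_attempts = math.ceil(timeout_minutes * 60 / delay_seconds)
    waiter = cf_client.get_waiter("stack_create_complete")
    # The waiter polls DescribeStacks and raises botocore.exceptions.WaiterError
    # if the stack fails (for example, rolls back) or the attempts run out.
    waiter.wait(
        StackName=stack_name,
        WaiterConfig={"Delay": delay_seconds, "MaxAttempts": max_attempts},
    )
    stacks = cf_client.describe_stacks(StackName=stack_name)["Stacks"]
    return stacks[0]["StackStatus"]
```

Usage: `wait_for_stack(boto3.client("cloudformation"), "<your stack name>")`. With the defaults, the waiter polls every 30 seconds for at most 60 attempts.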

Set up SageMaker Studio

Once the SageMaker Studio IDE has fully loaded in your browser, you can clone this repository into your SageMaker Studio environment and start working on the provided Jupyter Notebooks. To clone this repository:

  • On the SageMaker Studio IDE, click on "File menu > New > Terminal". This will open a terminal window within SageMaker Studio.

  • By default, the terminal launches at the root of the SageMaker Studio IDE workspace.
  • Next, clone this repository:
    git clone https://github.com/aws-samples/aws-ai-intelligent-document-processing idp_workshop
  • Once the repository is cloned, a directory named idp_workshop will appear in the "File Browser" on the left panel of the SageMaker Studio IDE
  • You can now access the Jupyter Notebooks inside the directory and start working on them.
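To confirm the clone worked without browsing the File Browser, here is a small sketch that lists every notebook under `idp_workshop`. It assumes you run it from the directory where you ran `git clone` (for example with `python3` in the same terminal); the function name is illustrative.

```python
from pathlib import Path
from typing import List


def list_notebooks(repo_dir: str = "idp_workshop") -> List[str]:
    """Return the notebooks in the cloned repository as sorted relative paths."""
    root = Path(repo_dir)
    if not root.is_dir():
        raise FileNotFoundError(
            f"{root.resolve()} does not exist; run the git clone step first."
        )
    notebooks = []
    for path in root.rglob("*.ipynb"):
        rel = path.relative_to(root)
        # Skip autosave copies that Jupyter keeps in .ipynb_checkpoints folders
        if ".ipynb_checkpoints" in rel.parts:
            continue
        notebooks.append(rel.as_posix())
    return sorted(notebooks)
```

Usage: `print("\n".join(list_notebooks()))`.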

You're all set to begin the workshop!

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Issues

Cannot Create User Pool for Notebook 04 & 04.01

Launched via Workshop Studio. When creating Private Labelling team:

400 AccessDeniedException: User: arn:aws:sts::xxxxxxxxxx:assumed-role/WSParticipantRole/Participant is not authorized to perform: cognito-idp:CreateUserPool on resource: * because no identity-based policy allows the cognito-idp:CreateUserPool action Request ID: 

CloudFormation error

Despite following the instructions closely, I'm encountering the following error on the first CloudFormation step.
Template error: Unable to get mapping for RegionMap::me-central-1::datascience

AWS::SageMaker::Domain with identifier already exist

Hi. During the creation of a new stack, I got the message "AWS::SageMaker::Domain with identifier already exist" and the status was automatically set to "rollback". What do I need to change in the YAML file to create a new stack? Thanks
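One way to investigate this error is a minimal sketch that assumes (the issue does not confirm it) that the conflict comes from a SageMaker domain left over from an earlier deployment in the same region. It lists the domains that already exist with boto3's SageMaker `list_domains` call, following `NextToken` until every page has been read. The helper name `existing_domains` is hypothetical.

```python
from typing import Any, Dict, List


def existing_domains(sm_client: Any) -> List[Dict[str, str]]:
    """Return the ID, name and status of every SageMaker domain in the region."""
    domains: List[Dict[str, str]] = []
    kwargs: Dict[str, Any] = {}
    while True:
        page = sm_client.list_domains(**kwargs)
        for d in page.get("Domains", []):
            domains.append(
                {"DomainId": d["DomainId"], "DomainName": d["DomainName"], "Status": d["Status"]}
            )
        token = page.get("NextToken")
        if not token:
            return domains
        # Request the next page of results
        kwargs = {"NextToken": token}
```

Usage: `print(existing_domains(boto3.client("sagemaker")))`. If a domain from a previous attempt shows up, deleting the old stack or deploying into another region may avoid the clash; treat this as a diagnostic step, not a confirmed fix.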

02-idp-document-extraction-01.ipynb

Document

documentName = "simple-document-image.jpg"
display(Image(filename=documentName))

ERROR:

FileNotFoundError Traceback (most recent call last)
in
----> 1 display(Image(filename=documentName))

/opt/conda/lib/python3.7/site-packages/IPython/core/display.py in __init__(self, data, url, filename, format, embed, width, height, retina, unconfined, metadata)
1230 self.unconfined = unconfined
1231 super(Image, self).__init__(data=data, url=url, filename=filename,
-> 1232 metadata=metadata)
1233
1234 if self.width is None and self.metadata.get('width', {}):

/opt/conda/lib/python3.7/site-packages/IPython/core/display.py in __init__(self, data, url, filename, metadata)
635 self.metadata = {}
636
--> 637 self.reload()
638 self._check_data()
639

/opt/conda/lib/python3.7/site-packages/IPython/core/display.py in reload(self)
1261 """Reload the raw data from file or URL."""
1262 if self.embed:
-> 1263 super(Image,self).reload()
1264 if self.retina:
1265 self._retina_shape()

/opt/conda/lib/python3.7/site-packages/IPython/core/display.py in reload(self)
660 """Reload the raw data from file or URL."""
661 if self.filename is not None:
--> 662 with open(self.filename, self._read_flags) as f:
663 self.data = f.read()
664 elif self.url is not None:

FileNotFoundError: [Errno 2] No such file or directory: 'simple-document-image.jpg'
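The traceback only shows that `simple-document-image.jpg` is not in the kernel's current working directory; it does not show why. A minimal sketch for narrowing this down, assuming the image is supposed to sit next to the notebook, looks for the file in a few candidate directories and reports the working directory if it is missing. `resolve_document` and its `search_dirs` parameter are hypothetical.

```python
from pathlib import Path
from typing import Sequence


def resolve_document(name: str, search_dirs: Sequence[str] = (".",)) -> Path:
    """Return the first existing path for `name`, or raise with a helpful message."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    searched = [str(Path(d).resolve()) for d in search_dirs]
    raise FileNotFoundError(
        f"{name!r} not found in {searched}; current working directory is {Path.cwd()}"
    )
```

Usage in the notebook: `from IPython.display import Image, display`, then `display(Image(filename=str(resolve_document(documentName))))`. The error message makes it clear whether the kernel started in a different folder or the image was never downloaded.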

Expected AWS Platform Spend for deployment

Hi,

I'd like to recommend this as a proof-of-concept deployment to enable discovery of this service, but to do this I need an estimate of the AWS platform spend. Could this be added to the README?

Thank you,
@herbtama

02-idp-document-extraction-02.ipynb does not exist

The last step in module 2 (document extraction) cannot be performed because the file 02-idp-document-extraction-02.ipynb does not exist.

Fix:
Add and execute the following lines in 02-idp-document-extraction.ipynb to receive the notebook 02-idp-document-extraction-02.ipynb

!wget 'https://github.com/aws-samples/amazon-textract-code-samples/raw/master/python/queries/insurance-card.ipynb' -O './02-idp-document-extraction-02.ipynb'
!wget 'https://github.com/aws-samples/amazon-textract-code-samples/raw/master/python/queries/insurance-card.png' -O './insurance-card.png'
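For a pure-Python alternative to the `!wget` lines, here is a minimal sketch using the standard library's `urllib.request.urlretrieve` with the same two URLs and destinations. It assumes those GitHub URLs still resolve. The `download_all` helper and its injectable `fetch` parameter are illustrative; `fetch` only exists so the loop can be exercised without network access.

```python
from typing import Callable, List, Sequence, Tuple
from urllib.request import urlretrieve

BASE = "https://github.com/aws-samples/amazon-textract-code-samples/raw/master/python/queries/"

# (source URL, local destination) pairs, taken from the wget commands above
DOWNLOADS: List[Tuple[str, str]] = [
    (BASE + "insurance-card.ipynb", "./02-idp-document-extraction-02.ipynb"),
    (BASE + "insurance-card.png", "./insurance-card.png"),
]


def download_all(pairs: Sequence[Tuple[str, str]] = DOWNLOADS,
                 fetch: Callable[[str, str], object] = urlretrieve) -> List[str]:
    """Download each URL to its destination and return the destination paths."""
    saved = []
    for url, dest in pairs:
        fetch(url, dest)
        saved.append(dest)
    return saved
```

Usage: run `download_all()` in a cell of 02-idp-document-extraction.ipynb, from the same directory the `!wget` commands would run in.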
