pdf-to-excel's Introduction

PDF to Excel Converter in Python

This Python script uses the tabula-py and pandas libraries to convert a PDF file into an Excel file. Each table in the PDF file is written to a separate sheet in the Excel file.

Running with GitHub Codespaces

This repository is configured to use GitHub Codespaces, which provides a complete, configurable development environment in the cloud. Here's how to use it:

  1. Click the Open in Codespaces button at the top of the repository, then click the green Create Codespace button. This opens the repository in a new Codespace.

  2. Wait for the Codespace to be created. GitHub will create a new Codespace for this repository and set it up according to the devcontainer.json file. This includes pulling the specified Docker image, running the postCreateCommand to install tabula-py and pandas, and installing the specified VS Code extensions. This process might take a few minutes.

  3. Add your PDF file. Once the Codespace is ready, add your PDF file to the repository. You can do this by dragging and dropping the file into the file explorer on the left side of the screen.

  4. Add your empty Excel file. Add an empty Excel file to the repository. You can do this by right-clicking on the file explorer and selecting New File. Name the file with the .xlsx extension.

  5. Run the Python script. With both files in place, run the following command in the Codespace terminal:

python pdf_to_excel.py

Usage

The script defines a function pdf_to_excel(pdf_file_path, excel_file_path), which reads a PDF file and writes its tables to an Excel file.

Here's how you can use this function:

pdf_to_excel('path_to_pdf_file.pdf', 'path_to_excel_file.xlsx')

Replace path_to_pdf_file.pdf with the path to the PDF file you want to convert, and replace path_to_excel_file.xlsx with the path where you want to save the Excel file.

Dependencies

  • tabula-py: A simple wrapper for Tabula, which can read tables in a PDF.
  • pandas: A powerful data manipulation library.

You can install these dependencies with pip:

pip3 install tabula-py pandas
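
If the script fails at startup, a quick check of the environment can help. The sketch below is not part of the repository; missing_requirements is a hypothetical helper. It relies on two facts about the libraries: tabula-py is imported as tabula and needs a Java runtime on the PATH, and pandas needs an Excel engine to write .xlsx files. The choice of openpyxl as that engine is an assumption here.

```python
import importlib.util
import shutil


def missing_requirements(modules=("tabula", "pandas", "openpyxl"),
                         executables=("java",)):
    """Return the names of modules or executables that cannot be found.

    Hypothetical helper: the default names are assumptions about what the
    converter needs at runtime, not something the repository defines.
    """
    # find_spec returns None when a top-level module is not installed.
    missing = [name for name in modules
               if importlib.util.find_spec(name) is None]
    # shutil.which returns None when an executable is not on the PATH.
    missing += [name for name in executables if shutil.which(name) is None]
    return missing
```

Calling missing_requirements() returns an empty list when everything was found; otherwise it lists the names to install.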

How It Works

The tabula.read_pdf function reads the PDF file and returns a list of tables. Each table is a pandas DataFrame.
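
The reading step might look roughly like the sketch below. It is not necessarily the script's exact code: read_pdf_tables is a hypothetical name, and the pages="all" and multiple_tables=True arguments are assumptions about how the script calls tabula.read_pdf, which reads only the first page unless told otherwise.

```python
from pathlib import Path


def read_pdf_tables(pdf_file_path):
    """Return the tables tabula-py finds in the PDF as a list of DataFrames."""
    pdf_path = Path(pdf_file_path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No such PDF file: {pdf_path}")

    # Imported here so the path check above works even when tabula-py
    # (or the Java runtime it depends on) is not installed.
    import tabula

    # pages="all" scans every page; multiple_tables=True returns each
    # detected table as its own DataFrame.
    return tabula.read_pdf(str(pdf_path), pages="all", multiple_tables=True)
```

The returned list can be empty if Tabula detects no tables, for example in a scanned PDF that contains only images.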

The pd.ExcelWriter context manager is used to write to the Excel file.

Inside the context manager, a for loop iterates over the list of tables. Each table is written to a separate sheet in the Excel file with the DataFrame.to_excel method provided by the pandas library.
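
The writing step described above can be sketched as follows, assuming the tables are already available as a list of DataFrames. The function name write_tables_to_excel, the "Table N" sheet names, and the index=False argument are illustrative choices, not taken from the script.

```python
import pandas as pd


def write_tables_to_excel(tables, excel_file_path):
    """Write each DataFrame to its own sheet and return the sheet names."""
    if not tables:
        # A workbook needs at least one sheet, so fail early with a clear message.
        raise ValueError("No tables to write")

    sheet_names = []
    # The context manager saves and closes the workbook when the block exits.
    with pd.ExcelWriter(excel_file_path) as writer:
        for index, table in enumerate(tables, start=1):
            sheet_name = f"Table {index}"  # hypothetical naming scheme
            # index=False leaves the DataFrame's row index out of the sheet.
            table.to_excel(writer, sheet_name=sheet_name, index=False)
            sheet_names.append(sheet_name)
    return sheet_names
```

In its default write mode, pd.ExcelWriter replaces any existing file at the target path, so an empty placeholder .xlsx file is simply overwritten.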

pdf-to-excel's People

Contributors

ladykerr
