Thought Leader Essay Collection

This repository contains the code and resources to generate two essay datasets:

  1. Paul Graham Essay Collection
  2. Sam Altman Essay Collection

These datasets are designed for natural language processing tasks such as question answering, summarization, text generation, and text-to-text generation. The essays cover topics related to startups, technology, artificial intelligence, leadership, and personal growth.

Dataset Descriptions

Paul Graham Essay Collection

The Paul Graham Essay Collection dataset contains a complete collection of essays written by Paul Graham, a renowned programmer, venture capitalist, and essayist. The essays cover a wide range of topics including startups, programming, technology, entrepreneurship, and personal growth. Each essay has been cleaned and processed to extract the title, date of publication, and the full text content.
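A minimal Python sketch of this kind of cleaning step follows. It is not the repository's actual preprocessing code. It assumes a hypothetical raw-text layout: the first line is the title, an optional "Month Year" line (e.g. "March 2005") follows, and blank lines separate paragraphs in the body. `EssayRecord` and `parse_essay` are illustrative names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches a "Month Year" date line such as "March 2005" (assumed format).
DATE_RE = re.compile(
    r"^(January|February|March|April|May|June|July|August|September|"
    r"October|November|December)\s+\d{4}$"
)


@dataclass
class EssayRecord:
    title: str
    date: Optional[str]
    text: str


def parse_essay(raw: str) -> EssayRecord:
    """Split a raw essay string into title, optional date, and body text.

    Hypothetical layout: title on the first line, an optional "Month Year"
    line next, then paragraphs separated by blank lines.
    """
    lines = raw.strip().splitlines()
    if not lines:
        raise ValueError("empty essay")
    title = lines[0].strip()
    body = lines[1:]
    # Skip blank lines between the title and the next content line.
    while body and not body[0].strip():
        body = body[1:]
    date = None
    if body and DATE_RE.match(body[0].strip()):
        date = body[0].strip()
        body = body[1:]
    # Blank lines separate paragraphs; hard-wrapped lines inside a
    # paragraph are merged and runs of whitespace collapsed.
    paragraphs = [
        " ".join(block.split())
        for block in re.split(r"\n\s*\n", "\n".join(body))
        if block.strip()
    ]
    return EssayRecord(title=title, date=date, text="\n\n".join(paragraphs))
```

If the date line is missing, `date` is left as `None` rather than guessed, so undated essays can be spotted and handled separately.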

HuggingFace Dataset: https://huggingface.co/datasets/sgoel9/paul_graham_essays

Sam Altman Essay Collection

The Sam Altman Essay Collection dataset contains a complete collection of essays written by Sam Altman, an entrepreneur, investor, and former president of Y Combinator. The essays cover a wide range of topics including startups, technology, artificial intelligence, leadership, and personal growth. Each essay has been cleaned and processed to extract the title, date of publication, and the full text content.

HuggingFace Dataset: https://huggingface.co/datasets/sgoel9/sam_altman_essays
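Both collections should be loadable straight from the Hugging Face Hub with the `datasets` library's `load_dataset` function. The sketch below assumes each dataset has a `train` split with `title` and `text` columns, matching the fields described above. Check the dataset cards before relying on that. `load_collection` and `search` are hypothetical helpers, not part of the repository.

```python
from typing import Dict, Iterable, List

# Repository IDs taken from the dataset links above.
REPO_IDS = {
    "paul_graham": "sgoel9/paul_graham_essays",
    "sam_altman": "sgoel9/sam_altman_essays",
}


def load_collection(name: str, split: str = "train") -> List[Dict]:
    """Download one collection from the Hugging Face Hub as a list of dicts.

    Assumes a "train" split exists; the real split names may differ.
    """
    from datasets import load_dataset  # pip install datasets

    ds = load_dataset(REPO_IDS[name], split=split)
    return [dict(row) for row in ds]


def search(records: Iterable[Dict], keyword: str, field: str = "text") -> List[str]:
    """Return titles of essays whose `field` contains `keyword` (case-insensitive).

    The "title" and "text" column names are assumptions based on this README.
    """
    needle = keyword.lower()
    return [r["title"] for r in records if needle in str(r.get(field, "")).lower()]
```

For example, `search(load_collection("paul_graham"), "founders")` would list the titles of essays that mention founders. The download is kept inside `load_collection` so the filtering logic can be reused on locally generated data as well.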

Repository Structure

The repository is organized as follows:

  • paul_graham_essays/: All data and scripts to preprocess and generate the Paul Graham Essay Collection dataset.

    • /dataset: The generated Paul Graham Essay Collection dataset.
    • /text_data: Raw essay text from Paul Graham's blog.
    • paul_graham_essays_card.md: Dataset card for the Paul Graham Essay Collection.
  • sam_altman_essays/: All data and scripts to preprocess and generate the Sam Altman Essay Collection dataset.

    • /dataset: The generated Sam Altman Essay Collection dataset.
    • sam_altman_essays_card.md: Dataset card for the Sam Altman Essay Collection.
  • requirements.txt: The required Python dependencies for running the scripts.
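A small hypothetical helper can check that a checkout matches this layout by listing which expected paths are missing. It only covers the dataset folder and the card file shown above for each collection. `expected_paths` and `missing_paths` are illustrative names.

```python
from pathlib import Path
from typing import Dict, List

COLLECTIONS = ("paul_graham_essays", "sam_altman_essays")


def expected_paths(root: Path, collection: str) -> List[Path]:
    """Paths each collection folder should contain, per the tree above."""
    base = root / collection
    return [base / "dataset", base / f"{collection}_card.md"]


def missing_paths(root: Path) -> Dict[str, List[Path]]:
    """Map each collection to the expected paths that do not exist yet."""
    report: Dict[str, List[Path]] = {}
    for collection in COLLECTIONS:
        missing = [p for p in expected_paths(root, collection) if not p.exists()]
        if missing:
            report[collection] = missing
    return report
```

Calling `missing_paths(Path("essay-datasets"))` on a complete checkout returns an empty dict. A non-empty result usually means the preprocessing step has not produced the dataset folder yet.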

Usage

  1. Clone the repository:

    git clone https://github.com/your-username/essay-datasets.git
    
  2. Install the required Python dependencies:

    pip install -r requirements.txt
    
  3. Run the preprocessing scripts in Jupyter to generate the essay datasets.

    The generated datasets are saved in each collection's dataset/ directory.

  4. Use the generated datasets for your natural language processing tasks.
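Step 3 can also be scripted instead of clicking through Jupyter, using `nbformat` together with nbconvert's `ExecutePreprocessor`. This is a sketch under assumptions: the notebooks are taken to sit directly inside each `*_essays/` folder and to run under a kernel named `python3`. The actual notebook names and locations may differ. `find_notebooks` and `run_notebook` are hypothetical helpers.

```python
from pathlib import Path
from typing import List


def find_notebooks(root: Path) -> List[Path]:
    """List notebooks directly inside each *_essays/ folder, sorted by path.

    The location is an assumption about where the preprocessing notebooks live.
    """
    return sorted(root.glob("*_essays/*.ipynb"))


def run_notebook(path: Path, timeout: int = 600) -> None:
    """Execute a notebook in place, using its own folder as working directory."""
    import nbformat  # pip install nbformat nbconvert ipykernel
    from nbconvert.preprocessors import ExecutePreprocessor

    nb = nbformat.read(str(path), as_version=4)
    ExecutePreprocessor(timeout=timeout, kernel_name="python3").preprocess(
        nb, {"metadata": {"path": str(path.parent)}}
    )
    nbformat.write(nb, str(path))
```

From the repository root, `for nb in find_notebooks(Path(".")): run_notebook(nb)` runs every notebook in turn. Executing in place overwrites the stored cell outputs, which keeps them consistent with the regenerated datasets.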

License and Attribution

The datasets generated by this code are released under the MIT License. When using these datasets, please attribute them to Paul Graham and Sam Altman, and provide links to their respective websites:

Contact Information

For any questions or inquiries about this repository or the generated datasets, please contact [email protected].

Contributors

sgoel97