valrcs / tdm-notebooks

This project forked from ithaka/constellate-notebooks

Example notebooks and tutorials for JSTOR's text analysis project, Constellate.

Home Page: https://constellate.org/tutorials/

Jupyter Notebook 99.28% Shell 0.05% Python 0.07% JavaScript 0.61%

tdm-notebooks's Introduction

Constellate Collaborative Notebooks and Lessons

Constellate is a platform to learn and perform text analysis, build datasets, and share analytics course materials. Open the black box of text analysis with Constellate, from JSTOR and Portico.

This repository is a collection of Jupyter Notebooks that may be used by individuals for learning or teaching text analytics. You may access them in our textbook, Teaching Text Analysis with Constellate, and also interact with a number of them from your Constellate datasets. All these Notebooks will run in the Constellate environment.

Read more about Constellate below, after the nuts and bolts of cloning these notebooks.

Local Installation

You can also clone this repository and run the examples in a local Jupyter Notebook environment. Keep in mind that this project is in beta, and both the notebooks and the client code may change frequently. If you have created your own notebooks or used the client code in some way, we would like to hear about it. Let us know via GitHub issues or by emailing [email protected].

To install the client and its dependencies, run:

  • python -m pip install -r requirements.txt

Some of the examples use NLTK data files (the stopwords and WordNet corpora). Download these with:

  • python -m nltk.downloader stopwords wordnet

About Constellate

Problem:

Text analytics, or the process of deriving new information from pattern and trend analysis of the written word, has the potential to revolutionize research across disciplines. Sadly, a massive hurdle faces those eager to unleash its power: the coding skills and statistical knowledge that text mining requires can take years to develop. All too often, researchers learn about the promise of text mining, only to discover that the promise can be realized solely by the select few with the necessary technical skills. Ted Underwood, Professor of English at the University of Illinois, likens this scenario to researchers being presented with a “deceptively gentle welcome mat, followed by a trapdoor.”

Solution:

ITHAKA has addressed this problem by building Constellate, a text analytics platform aimed at teaching and enabling a generation of researchers to text mine. Two of ITHAKA’s services, JSTOR and Portico, are the initial sources of content for the new platform, which now includes Chronicling America, collections from Documenting the American South, the South Asia Open Archives and Independent Voices from Reveal Digital.

Constellate provides value to users in three core areas -- they can teach and learn text analytics, build datasets from across multiple content sources, and visualize and analyze their datasets:

Learn & Teach

  • Template and Tutorial Code: Work with template Jupyter Notebooks to analyze your dataset and learn about text analytics (with additional environments forthcoming, such as RStudio).
  • Lessons and Documentation: Lessons and educational materials created by a community of experts, including those from the NEH-funded Text Analysis Pedagogy Institutes.
  • Collaborative Teaching Materials Creation: Users may create, edit, reuse and collaborate in the creation of tutorials, code, documentation, and other educational resources for text analysis.

Build

  • Multiple Collections: Anchor collections from JSTOR and Portico, with additional content sources continually added (such as Library of Congress’ Chronicling America). Further details about the collections are available.
  • Data Download in JSON
    • All content - bibliographic metadata, unigrams, bigrams, trigrams
    • Open content - bibliographic metadata, full-text, unigrams, bigrams, trigrams
  • Dataset Dashboard: Easily view datasets you have built or accessed.
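
The JSON download described above pairs bibliographic metadata with per-document n-gram counts. As a minimal sketch of how such a file might be aggregated, the example below assumes one JSON object per line and a hypothetical `unigramCount` field that maps each word to its count; the record contents, the field names, and the helper functions `ngrams` and `total_counts` are illustrative assumptions, not the documented Constellate schema or client API.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def ngrams(tokens: list[str], n: int) -> Iterator[str]:
    """Yield space-joined n-grams (n=1 unigrams, n=2 bigrams, ...) from tokens."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def total_counts(lines: Iterable[str], field: str = "unigramCount") -> Counter:
    """Sum per-document n-gram counts across JSON Lines records.

    `field` is an ASSUMED key name holding a {ngram: count} mapping.
    Documents without that key contribute nothing.
    """
    totals: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        totals.update(doc.get(field, {}))  # adds counts, does not overwrite
    return totals


# Two made-up records standing in for lines of a downloaded dataset file.
sample = [
    json.dumps({"id": "doc-1", "title": "Example A",
                "unigramCount": {"text": 3, "analysis": 1}}),
    json.dumps({"id": "doc-2", "title": "Example B",
                "unigramCount": {"text": 2, "mining": 4}}),
]

totals = total_counts(sample)
print(totals.most_common(2))

# What a bigram list looks like for a short phrase.
print(list(ngrams("open the black box".split(), 2)))
```

Because `total_counts` accepts any iterable of lines, an open file object can be passed in place of the sample list. If the download is gzip-compressed, the standard library's `gzip.open(path, "rt", encoding="utf-8")` returns a suitable text-mode file object, where `path` is wherever you saved the dataset.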

Analyze

  • Analytics Lab: Integrated computational environment powered by BinderHub that allows users to seamlessly analyze text content using provided template Jupyter Notebooks and tutorials.
  • Visualize: Built-in visualizations for your datasets.
  • Work with Rights Restricted Full Text: We are investigating the best way to meet this need -- please contact us at [email protected] if you need rights restricted full-text or just want to talk about your research.

Interested in Participating?

Reach out to us to participate in our beta program and get access to larger datasets and text analytics classes.


Created by Nathan Kelber and Ted Lawless for JSTOR Labs under a Creative Commons CC BY License

For questions/comments/improvements, email [email protected].

