class-project's Introduction

Faculty Research in Animal Science

Overview
This is a project for my Data & Web Technologies for Data Analysis class. Skills learned include data wrangling with pandas, plotting with seaborn, word clouds, natural language processing with scikit-learn's TfidfVectorizer, k-NN regression, k-NN classification, and k-means clustering.

I used the scholarly package to get publication titles, years, and citation counts from Google Scholar for animal science professors at 5 universities in the US. One goal was to see whether research topics vary by university; for this, I made word clouds. Another goal was to make predictions based on the text of the titles.

Getting the data

  • Notebook: data_acquisition.ipynb
  • Output dataset: data.txt

In this notebook, I used the scholarly.search_author() function from the scholarly package to get publication data for animal science professors at 5 universities. At the end, I also discuss the pros and cons of this data extraction method.
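As a rough sketch of what this step can look like (not the notebook's actual code): the example below assumes a recent scholarly release in which author and publication records are plain dictionaries and `scholarly.fill()` populates the publication list. The helper names `fetch_author` and `publication_rows`, and the output column names, are hypothetical.

```python
def publication_rows(author, university):
    """Flatten one filled author record into one row per publication."""
    rows = []
    for pub in author.get("publications", []):
        bib = pub.get("bib", {})
        rows.append({
            "professor": author.get("name"),
            "university": university,
            "title": bib.get("title"),
            "year": bib.get("pub_year"),
            "citations": pub.get("num_citations", 0),
        })
    return rows


def fetch_author(name):
    """Return the first Google Scholar profile matching `name`, or None.

    This makes network requests and may be rate-limited by Google Scholar.
    """
    # Imported lazily so the pure helper above works without the package.
    from scholarly import scholarly

    match = next(scholarly.search_author(name), None)
    if match is None:
        return None
    # Assumption: fill() with sections=["publications"] loads the paper list.
    return scholarly.fill(match, sections=["publications"])
```

Taking the first search hit is a simplification; a name search can return the wrong person, which is one reason to check profiles during cleaning.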

Exploring and Cleaning the Data

  • Notebook: data_exploration.ipynb
  • Input dataset: data.txt
  • Output dataset: cleaned.txt

This notebook is my favorite! In it, I flagged non-English publication titles and checked for and removed erroneous observations and profiles. I also computed summary statistics, including the number of publications per professor, and explored the relationship between age and number of publications. Finally, I made word clouds to see which topics are popular at each university.
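A word cloud is essentially a picture of word frequencies, so the per-university comparison can be illustrated with plain counts. The sketch below uses only the standard library; the function name `top_words_by_university`, the small stop-word list, and the sample records are made up for illustration and are not taken from the notebook.

```python
import re
from collections import Counter, defaultdict

# A deliberately tiny stop-word list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def top_words_by_university(records, n=3):
    """Count title words per university and return the n most common for each.

    `records` is an iterable of (university, title) pairs.
    """
    counts = defaultdict(Counter)
    for university, title in records:
        words = re.findall(r"[a-z]+", title.lower())
        counts[university].update(w for w in words if w not in STOPWORDS)
    return {u: c.most_common(n) for u, c in counts.items()}


sample = [
    ("University 1", "Milk yield in dairy cattle"),
    ("University 1", "Nutrition and milk quality of dairy cattle"),
    ("University 2", "Growth of broiler chickens"),
]
top_words = top_words_by_university(sample)
```

These counts are the kind of input a word-cloud library sizes its words by; comparing the top words across universities gives a quick text-only view of the same question.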

Modeling

  • Notebook: statistics.ipynb
  • Input dataset: cleaned.txt

In this notebook, I explored what I could predict from the text of the titles. First, I tried predicting the number of citations from the title using k-NN regression, which did not predict well. I then used a k-NN classifier to predict the author from the title, and a separate k-NN classifier to predict the school from the title.
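A minimal sketch of the title-to-school classifier, assuming a TF-IDF representation of the titles fed into scikit-learn's `KNeighborsClassifier`. The toy titles, the school labels, and the choice of `n_neighbors=3` are invented for illustration; the notebook's actual data and settings may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical titles and labels).
titles = [
    "dairy cow milk yield",
    "milk production in dairy cattle",
    "mastitis in dairy cattle",
    "broiler chicken growth",
    "feed efficiency in broiler diets",
    "egg quality of laying hens",
]
schools = ["School A"] * 3 + ["School B"] * 3

# TF-IDF turns each title into a weighted word vector; k-NN then labels a
# new title by majority vote among its nearest training titles.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    KNeighborsClassifier(n_neighbors=3),
)
model.fit(titles, schools)

predictions = model.predict(["milk yield of dairy cow", "broiler chicken feed"])
```

The same pipeline shape works for predicting the author by swapping the label column, and for citation counts by replacing the classifier with `KNeighborsRegressor` and a numeric target.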

class-project's People

Contributors

courtneylouie
