DigHum-101-Summer-2020

This is a repository for DigHum 101, a course that I took in Summer 2020. The course equips students with data science tools and knowledge to solve humanities-centered problems in a data-driven manner, and pushes them to think about the implications their data-driven solutions have for society.

Abstracts

Exploring the biases and assumptions influencing big data in the Digital Humanities: Group Project

The purpose of this group project was to explore the biases and assumptions that shape work with data. As the world grows technologically, socially and economically, an enormous amount of data is generated every moment. It is up to us to make sense of that data through various tools and techniques, and to decipher what story it tells or what problems it lets us pinpoint and solve. However, we are prone to confirmation bias, the tendency to process and analyze information in a way that supports one’s pre-existing ideas and convictions, and this phenomenon often shapes the conclusions and solutions we draw from the data we analyze.

My group and I explored the issue through three research questions: “To what extent are human bias and assumptions observed within the field of big data and data analytics?”, “What are the short term and long term consequences of algorithmic bias and data misrepresentation?” and “In what ways can researchers prevent the incorporation of assumptions or bias in order to make reliable conclusions from data?”. We used readings such as Sculley’s “Meaning and Mining: the Impact of Implicit Assumptions in Data Mining for the Humanities”, Owens’s “Defining Data for Humanists: Text, Artifact, Information, or Evidence” and Boyd’s “Critical Questions for Big Data” as references, and connected our conclusions to Nan Z. Da’s “The Digital Humanities Debacle”. Through our research, we explored how to apply a critical lens when analyzing data, how to explore the data we are working with thoroughly, how to consider its context, and why interpretable results are not necessarily accurate ones.

Overall, doing this project in a group and discussing the problem with my teammates helped me see how important it is to approach data analysis results with a skeptical mindset, and not to let personal judgement or bias taint our conclusions about the data.

ML techniques for classifying phishing websites: Individual Project

The premise of this individual project is to explore effective machine learning algorithms for detecting phishing websites. I thought it was crucial to explore this problem because the COVID-19 era has led all of us to use the internet more than ever. As a result, bad actors are using legitimate-looking phishing websites to scam people out of their personal information. It is therefore up to cybersecurity professionals to employ sophisticated tools and methods to protect users’ confidentiality and their trust in the integrity of the internet. In this project I test the effectiveness of individual machine learning algorithms (logistic regression, decision trees and random forest) against the “super learner” model, an ensemble machine learning algorithm that combines all the models and model configurations that you might investigate for a predictive modeling problem. I use ROC-AUC as the evaluation metric rather than the raw accuracy percentage. Through these techniques I hope to find a practical ML-based approach to detecting phishing websites and so contribute to the broader body of ML research on phishing.
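
The comparison described above can be sketched roughly as follows. This is an illustrative sketch and not the project's actual code: it assumes scikit-learn, uses a synthetic dataset from `make_classification` in place of real phishing features, and uses `StackingClassifier` as a stand-in for the super learner (both train a meta-model on cross-validated predictions of the base models). The dataset shape, hyperparameters and the names `make_base_models`, `candidates` and `results` are all hypothetical choices made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for a phishing dataset
# (label 1 = phishing, label 0 = legitimate), mildly imbalanced.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def make_base_models():
    """Return fresh copies of the three individual models being compared."""
    return [
        ("logistic_regression", LogisticRegression(max_iter=1000)),
        ("decision_tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("random_forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ]


candidates = dict(make_base_models())
# ASSUMPTION: stacking stands in for the super learner. The meta-model is
# fitted on out-of-fold predicted probabilities from the base models.
candidates["stacked_ensemble"] = StackingClassifier(
    estimators=make_base_models(),
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # ROC-AUC is computed from predicted probabilities, not hard labels.
    phishing_scores = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "roc_auc": roc_auc_score(y_test, phishing_scores),
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
    }

for name, metrics in results.items():
    print(f"{name:20s} ROC-AUC={metrics['roc_auc']:.3f} "
          f"accuracy={metrics['accuracy']:.3f}")
```

Reporting both numbers side by side shows why ROC-AUC can be the more informative choice: accuracy depends on a single decision threshold and can look good on imbalanced data simply by favoring the majority class, while ROC-AUC measures how well a model ranks phishing sites above legitimate ones across all thresholds.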

dighum-101-summer-2020's People

Contributors

akshatha1017
