
Soha Mohajeri's Projects

buzzfeed-news-analysis-and-classification-by-natural-language-processing

FakeNewsNet is a repository for an ongoing data collection project for fake news research at ASU. The repository consists of comprehensive datasets of BuzzFeed news and PolitiFact, which contain separate datasets of real and fake news. FakeNewsNet provides multi-dimensional information that not only offers signals for detecting fake news but can also be used for research on topics such as fake news propagation and fake news intervention. Because the repository is very wide and multi-dimensional, this project performs a detailed analysis of the BuzzFeed news dataset only.

The BuzzFeed news dataset comprises a complete sample of news published on Facebook by 9 news agencies over a week close to the 2016 U.S. election: September 19 to 23 and September 26 and 27. Every post and its linked article were fact-checked claim by claim by 5 BuzzFeed journalists. The data come as two CSV files, one dataset of fake news and one of real news; each has 91 observations and 12 features/variables.

Both datasets have the following main features:

id: the id assigned to the news article webpage, labelled Real if the article is real or Fake if it was reported fake.
title: the headline, which aims to catch the attention of readers and relates to the major topic of the news.
text: the body of the article, which elaborates the details of the news story. Usually there is a major claim that shapes the angle of the publisher and is specifically highlighted and elaborated upon.
source: the author or publisher of the news article.
images: an important part of the body content of a news article, which provides visual cues to frame the story.
movies: a link to a video or movie clip included in an article, which also provides visual cues to frame the story.

In this analysis, we do not consider features such as url, top_img, authors, publish_date, canonical link, and metadata, because they usually provide redundant information that can be obtained from the other main variables and add little value to our analysis. The two main features we care about are the source of the fake news and the language used in it. In particular, we are interested in finding the sources that published fake news and the words that are more associated with one category than the other.

The main purpose of this analysis is to develop methods for analyzing fake news versus real news. The project is divided into two parts: (1) Exploratory Data Analysis and (2) Classification. The goal of the second part is to build a classifier that can detect fake news. We use three different classifiers to classify documents into real and fake news categories.
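One possible way to find words that are more associated with one category than the other is a smoothed log-odds score over word counts. The sketch below is only an illustration: the project description does not say which association measure was used, and the function names (`tokenize`, `log_odds`) and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_odds(fake_docs, real_docs, smoothing=1.0):
    """Return {word: smoothed log-odds of the word in fake vs. real text}.

    Positive scores lean towards the fake corpus, negative towards the real one.
    """
    fake_counts = Counter(tok for doc in fake_docs for tok in tokenize(doc))
    real_counts = Counter(tok for doc in real_docs for tok in tokenize(doc))
    vocab = set(fake_counts) | set(real_counts)
    # Add-one style smoothing so unseen words never give a zero probability.
    fake_total = sum(fake_counts.values()) + smoothing * len(vocab)
    real_total = sum(real_counts.values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        p_fake = (fake_counts[word] + smoothing) / fake_total
        p_real = (real_counts[word] + smoothing) / real_total
        scores[word] = math.log(p_fake / p_real)
    return scores


# Toy documents invented for illustration; they are NOT from the dataset.
fake_docs = ["shocking secret exposed", "shocking claim about the election"]
real_docs = ["senate passes budget bill", "the election debate schedule"]

scores = log_odds(fake_docs, real_docs)
# Words sorted from most fake-leaning to most real-leaning.
ranked = sorted(scores, key=scores.get, reverse=True)
```

On real data, the two lists would hold the `text` (or `title`) column of the fake and real CSV files, and a stop-word filter would likely be needed before the ranking becomes informative.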

covid-19-analysis-visualization-and-forecasting

Introduction COVID-19 Analysis: The dataset used in this notebook (Covid-19_dataset.csv) is the same as the COVID19_line_list_data.csv dataset taken from https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset. The only difference is that in our dataset the death and recovered features are encoded as 0 or 1, not as dates as in the latter dataset. The report has three parts: cleaning, visualization, and prediction. The first purpose of this work is to find out which factors are more important in the death and recovery of patients. The second purpose is to implement several machine learning algorithms that predict the death and recovery of patients, and to compare the results to discover which algorithm works best for this specific dataset.
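The 0/1 encoding of the death and recovered features can be sketched as follows. This is a minimal illustration under a stated assumption: that each raw value is either "0" (the event did not happen), "1" (it happened, date unknown), or a date string (it happened on that date). The helper name `to_binary_flag` and the sample rows are hypothetical.

```python
def to_binary_flag(value):
    """Map a raw line-list value to 0 or 1.

    Assumption: the raw value is "0", "1", or a date string such as
    "2/21/2020"; anything other than an empty value or "0" counts as 1.
    """
    text = str(value).strip()
    if text in ("", "0"):
        return 0
    return 1


# Sample rows invented for illustration; they are NOT taken from the dataset.
rows = [
    {"death": "0", "recovered": "0"},
    {"death": "2/21/2020", "recovered": "0"},
    {"death": "0", "recovered": "2/12/2020"},
    {"death": "1", "recovered": "0"},
]

encoded = [{key: to_binary_flag(val) for key, val in row.items()} for row in rows]
```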

malware-classification

Towards Building an Intelligent Anti-Malware System: A Deep Learning Approach using Support Vector Machine for Malware Classification

national-health-dataset-dimensionality-reduction-and-clustering

The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. Here, we use the Demographics dataset and reduce its dimensionality by Principal Component Analysis (PCA). Afterwards, we find the main clusters by KMeans Clustering.
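The two steps, PCA followed by k-means, can be sketched with the standard library alone. This is not the project's code, which is not shown here: it swaps in power iteration with deflation for the PCA step and plain Lloyd's algorithm for k-means, and all function names and the toy data are hypothetical.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_components(cov, k, iters=500, seed=0):
    """Approximate the k leading eigenvectors by power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    components = []
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, matvec(cov, v)))
        components.append(v)
        # Deflate: subtract the component just found before the next search.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = [p[:] for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data invented for illustration (two well-separated groups, three features).
data = [[1.0, 2.0, 0.5], [1.2, 1.9, 0.4], [0.8, 2.1, 0.6],
        [8.0, 9.0, 5.0], [8.2, 9.1, 5.1], [7.9, 8.8, 4.9]]

centered = center(data)
components = top_components(covariance(centered), k=2)
# Project every centered row onto the two leading components.
reduced = [[sum(a * b for a, b in zip(row, comp)) for comp in components]
           for row in centered]
labels, centroids = kmeans(reduced, k=2)
```

For a dataset the size of the NHANES Demographics file, a numerical library would be the practical choice; the pure-Python version only makes the two steps explicit.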

predicting-loan-eligibility-by-machine-learning-algorithms

# Predicting Loan Repayment

The dataset for this project is retrieved from Kaggle, the home of data science. The major aim of this project is to predict whether customers will repay their loans. This is therefore a supervised classification problem.
