sohamohajeri Goto Github PK
Name: Soha Mohajeri
Type: User
Company: YYYY
Bio: Data Scientist
Location: California
FakeNewsNet is a repository for an ongoing data collection project for fake news research at ASU. It consists of comprehensive BuzzFeed News and PolitiFact data, each with separate datasets of real and fake news. FakeNewsNet contains multi-dimensional information that not only provides signals for detecting fake news but can also be used for research such as understanding fake news propagation and intervention. Because the repository is very wide and multi-dimensional, this project performs a detailed analysis of the BuzzFeed News dataset only. The BuzzFeed News dataset comprises a complete sample of news published on Facebook by 9 news agencies over a week close to the 2016 U.S. election (September 19 to 23 and September 26 and 27). Every post and linked article was fact-checked claim-by-claim by 5 BuzzFeed journalists. There are two BuzzFeed datasets, one of fake news and one of real news, in the form of CSV files; each has 91 observations and 12 features/variables. The main features are:
- id: the ID assigned to the news article webpage, marked "Real" if the article is real or "Fake" if reported fake.
- title: the headline, which aims to catch the reader's attention and relate to the major news topic.
- text: the body of the article, which elaborates the details of the news story; usually there is a major claim that shapes the publisher's angle and is specifically highlighted and elaborated upon.
- source: the author or publisher of the news article.
- images: an important part of the body content, providing visual cues to frame the story.
- movies: a link to a video or movie clip included in the article, which also provides visual cues to frame the story.
In this analysis, we do not consider features such as url, top_img, authors, publish_date, canonical link, and metadata, because they usually provide redundant information that can be obtained from the other main variables and do not add value to the analysis. The two main features we care about are the source of the fake news and the language used in it. In particular, we are interested in finding sources that published fake news and words that are more associated with one category than the other. The main purpose of this analysis is to develop methods to analyze fake news versus real news. The project is divided into two parts: (1) Exploratory Data Analysis and (2) Classification. The goal of the second part is to build a classifier that can detect fake news; we use three different classifiers to classify documents into real/fake news categories.
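Finding words that lean toward one category can be sketched with a smoothed log-odds ratio over word counts; this is a minimal illustration, and the toy documents below are hypothetical stand-ins for the fake/real article texts:

```python
import math
from collections import Counter

def word_association(fake_docs, real_docs, k=1.0):
    """Score each word by the smoothed log-odds of appearing in fake vs. real text.

    Positive scores lean toward the fake corpus, negative toward the real one.
    k is an additive smoothing constant so unseen words do not blow up the ratio.
    """
    fake = Counter(w for d in fake_docs for w in d.lower().split())
    real = Counter(w for d in real_docs for w in d.lower().split())
    n_fake, n_real = sum(fake.values()), sum(real.values())
    vocab = set(fake) | set(real)
    return {
        w: math.log((fake[w] + k) / (n_fake + k * len(vocab)))
         - math.log((real[w] + k) / (n_real + k * len(vocab)))
        for w in vocab
    }

# Toy example with made-up headlines:
scores = word_association(
    ["shocking secret plot revealed", "shocking cover up"],
    ["senate passes budget bill", "budget talks continue"],
)
top_fake = max(scores, key=scores.get)  # the word most associated with fake news
```

Ranking the dictionary by score then surfaces the words most characteristic of each category.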
MATLAB function to calculate Dynamic Range of 10-bit YUV video sequences
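The MATLAB source itself is not reproduced here; as a rough sketch of the idea, the dynamic range of a 10-bit luma plane can be expressed in dB from its extreme sample values (the function name and the 20·log10 formulation are assumptions, not the repository's actual method):

```python
import math

def dynamic_range_db(y_samples, bit_depth=10):
    """Dynamic range of a luma plane in dB, from its min/max sample values.

    Assumes integer samples in [0, 2**bit_depth - 1]; a minimum of 0 is
    clamped to 1 to keep the ratio finite.
    """
    lo = max(min(y_samples), 1)
    hi = max(y_samples)
    assert hi <= 2 ** bit_depth - 1, "sample exceeds the stated bit depth"
    return 20.0 * math.log10(hi / lo)

# A full-swing 10-bit plane (1..1023) spans roughly 60 dB:
full_swing = dynamic_range_db([1, 512, 1023])
```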
A simple books website; a simple online personal book library.
Introduction COVID-19 Analysis: The dataset used in this notebook (Covid-19_dataset.csv) is the same as the COVID19_line_list_data.csv dataset taken from https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset; the only difference is that in our dataset the death and recovered features are encoded as 0 or 1 rather than as dates, as in the latter. The report has three parts: Cleaning, Visualization, and Prediction. The first purpose of this work is to find out which factors matter most in the death and recovery of patients. The second purpose is to implement several machine learning algorithms to predict the death and recovery of patients and compare the results to discover which algorithm works best for this specific dataset.
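The re-encoding described above (a date means the event happened; 0/1 stays as-is) can be sketched as follows; the helper name and sample values are hypothetical:

```python
def encode_binary(value):
    """Map a raw 'death'/'recovered' cell to 0 or 1.

    The line-list data stores either 0/1 or a date string such as
    '2/13/2020'; a date means the event occurred, so anything other
    than '0' maps to 1.
    """
    s = str(value).strip()
    if s == "0":
        return 0
    return 1  # '1' or a date string both mean the event occurred

# Applying it to one hypothetical column:
encoded = [encode_binary(v) for v in ["0", "1", "2/13/2020"]]
```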
Uniswap v3 Data Science Models
Basic template for using Flask on Heroku
Towards Building an Intelligent Anti-Malware System: A Deep Learning Approach using Support Vector Machine for Malware Classification
The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. Here, we use the Demographics dataset and reduce its dimensionality by Principal Component Analysis (PCA). Afterwards, we find the main clusters by KMeans Clustering.
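A minimal sketch of that pipeline (PCA down to 2 components, then k-means) using NumPy, with synthetic blobs standing in for the NHANES demographics table:

```python
import numpy as np

def pca(X, n_components=2):
    """Project X onto its top principal components (columns centered first)."""
    Xc = X - X.mean(axis=0)
    # Eigen-decomposition of the covariance matrix; eigh returns ascending order.
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return Xc @ top

def kmeans(X, k=2, iters=50, seed=0):
    """Plain Lloyd's algorithm: assign each row to its nearest centroid, re-average."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers, axis=2), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

# Two well-separated synthetic blobs in 5-D, reduced to 2-D, then clustered:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(6, 1, (50, 5))])
labels = kmeans(pca(X), k=2)
```

On real survey data the columns would be standardized first, since PCA is scale-sensitive.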
# Predicting Loan Repayment The dataset for this project is retrieved from Kaggle, the home of data science. The major aim of this project is to predict whether customers will have their loans paid off or not, making this a supervised classification problem.
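Before fitting real models, a majority-class baseline sets the accuracy floor any classifier must beat; the repaid/defaulted labels below are hypothetical, not from the Kaggle data:

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most common training label for every test row."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == majority)
    return correct / len(test_labels)

# Toy labels: 1 = loan repaid, 0 = not repaid.
acc = majority_baseline_accuracy([1, 1, 1, 0, 1, 0], [1, 0, 1, 1])
```

This matters for loan data, which is typically imbalanced: a model that "beats" 50% may still lose to always predicting "repaid".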
The dataset can be downloaded from this link: https://www.kaggle.com/saurav9786/amazon-product-reviews?select=ratings_Electronics+%281%29.csv
Image processing in Python
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.