siyusiyuyang / orie4740-presidential-election-prediction-project

orie4740-presidential-election-prediction-project's Issues

Peer Review

3 Things That I Liked

The concept is great and has plenty of room for future improvements. I'd be interested in seeing if you can then use this as a live pulse of the sentiment of the population after a debate or some event.
This is a very technical project (in my opinion) and would be great to show future employers.
The data is openly available, visible, and tangible. It would be interesting to see if the results of the project differ from what people can assess as they look through Twitter manually.

3 Areas for Improvement

These are very broad questions your team is asking. I know it is still early so it may not be clear as to which is the best to tackle but consider focusing your efforts on just one or two questions.
Will you compensate for any bias that arises from strictly sampling from Twitter? For example, it is unlikely that the opinions found on electronic social media will reflect the entire voting population in the US. How will this affect your analysis?
Do you have any methods in mind for processing the text? You could quantify each word to have a positive or negative score but language interpretation sounds difficult.
(I hope this helped, I didn't have much in mind in terms of improvements but I thought I'd mention some things to keep in mind)

Midterm evaluation by adk87

This project's aim is to determine the outcome of the 2016 presidential election in Florida. The method you are using to do this is to track tweets that come from each county. Then you will use feature transformations of those tweets to predict the next president.

I think this is a very interesting idea for a project. I have never heard of any sort of presidential prediction being made in this manner. Most predictions, it seems, come from political factors and the current state of the country. I am curious, though, why you decided to choose tweets from the three weeks leading up to the election. Also, what will be your test set and training set when building your model? You do not have access to the tweets from the three weeks leading up to the primaries, which would have been a very good way to verify your model with known results. Do you plan on predicting the next president before the election happens in a couple of weeks, or will you build your model based on the results of the election? You mentioned that Twitter allows 1% of all public tweets to be sampled; do you know what criteria they use to determine which ones are included? Is it random, or would they withhold certain tweets (say, ones with vulgar language)?

I like the different feature transformations you are using in your model. I think your "lexicon based sentiment analysis" that was mentioned is a very interesting way of determining the sentiment of the tweet. Is there any way you can tell if a tweet is sarcastic or not? For example: "x will be such a good candidate #Not," could be a sarcastic tweet.
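The sarcasm concern can be made concrete with a minimal sketch of lexicon-based scoring. The word list, the `LEXICON` name, and the `lexicon_score` function are all hypothetical and are not the team's actual method; the point is only that a plain word-sum scorer cannot tell the literal tweet from the sarcastic one.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# a real project would load a published sentiment word list.
LEXICON = {"good": 1, "great": 1, "win": 1, "bad": -1, "terrible": -1, "lose": -1}

def lexicon_score(tweet, lexicon=LEXICON):
    """Sum the scores of known words; words not in the lexicon count as 0."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return sum(lexicon.get(word, 0) for word in words)

literal = "x will be such a good candidate"
sarcastic = "x will be such a good candidate #Not"

# Both tweets receive the same score because "#Not" is not in the lexicon.
print(lexicon_score(literal), lexicon_score(sarcastic))
```

One possible workaround, again only an assumption, is to treat markers such as "#not" or "#sarcasm" as sign-flipping tokens, but that would catch only explicitly tagged sarcasm.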

To avoid overfitting, you mentioned that you will perform linear regression using only first-order terms. I think you can create a more complex model without worrying about overfitting. You mentioned that the linear model didn't produce very useful results, so you probably understand.

I see a problem with using the voting results from the primary elections to test your model. In the primary elections, there were other candidates for the public to choose from. How will your model take into consideration the fact that those candidates are not in this election? Is your model just trying to predict Democrat versus Republican, or will it choose a specific candidate from a list of candidates, in which case it could be useful in the primaries?

Here are some things you might want to consider if you haven't considered them already. How will you deal with tweets that don't mention political candidates? The people who post them still have a chance to vote in the election, so how will your model decide which candidate they choose? Do the tweets take into consideration the age of the person who owns the account? If there are a bunch of political posts but the person posting them is under 18, they won't be able to vote and will have no impact on the election. You mentioned you will be using a circle to approximate the counties. Is this a reasonable approximation? Are any of the counties very narrowly shaped? Also, the midterm report mentions that the radius of this circle is the square root of the area, but do you need to take pi (~ 3.14) into account in the formula?
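To illustrate the last question: a circle of area A has radius sqrt(A / pi), so using sqrt(A) alone gives a circle whose area is about 3.14 times that of the county. A short sketch follows; the function name and the county area are made up for the example.

```python
import math

def equal_area_radius(area):
    """Radius of the circle whose area equals `area` (from A = pi * r**2)."""
    return math.sqrt(area / math.pi)

area = 1000.0  # hypothetical county area, e.g. in square miles

r_with_pi = equal_area_radius(area)
r_without_pi = math.sqrt(area)  # the formula as described in the midterm report

print("radius with pi:   ", r_with_pi)
print("radius without pi:", r_without_pi)
# The ratio is sqrt(pi), so the circle built without pi covers pi times the area.
print("radius ratio:     ", r_without_pi / r_with_pi)
```

If tweets are collected by radius around a county center, the larger circle would pull in tweets from neighboring counties, so the choice matters for the geographic labels.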

Final Report Peer Review

This is absolutely an interesting and hot topic at the moment. As you said, people have seen the limitations of prediction models in this election. It is very surprising to observe that, according to the economic factors, the votes are all shifting from Democrat to Republican. And it is another brilliant idea to use Twitter data to predict the votes. However, there are certain things that I don't understand. What does the sentiment score mean? If it only means that Trump has high public attention, the tweets can be positive or negative, so does a high score not mean a higher chance of winning? Or does the score mean the ratio of positive comments? You did a good job doing PCA and using other classification methods, although I don't understand the points. And the conclusion is inspiring!

If you want to continue this project, I would suggest finding some other factors that are missing from the polls and trying more than 7 days of Twitter data.
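The two readings of "score" raised here can be separated in a few lines. This is only a sketch with hypothetical per-tweet scores and a made-up function name; it does not reflect how the report actually computes its score.

```python
def attention_and_positive_share(tweet_scores):
    """Return (number of tweets, share of polar tweets that are positive).

    tweet_scores: per-tweet sentiment scores, where > 0 is positive,
    < 0 is negative, and 0 is neutral.
    """
    volume = len(tweet_scores)
    polar = [score for score in tweet_scores if score != 0]
    positive_share = sum(score > 0 for score in polar) / len(polar) if polar else 0.0
    return volume, positive_share

# Hypothetical data: candidate A is talked about more, but mostly negatively.
candidate_a = [1, -1, -1, -1, 1, -1, 0, -1]
candidate_b = [1, 1, -1]

print(attention_and_positive_share(candidate_a))
print(attention_and_positive_share(candidate_b))
```

Here candidate A has the higher volume (attention) while candidate B has the higher positive share, which is why the definition of the score needs to be stated.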

Peer review for final report

I really appreciate that you have chosen such an interesting and controversial topic and pursued two promising approaches in your project.

The economic indicator approach: you have combined the economic data available to you and tried to dig out its relationship with the election result. I like the whole idea; however, I think it would make more sense if you provided a more quantifiable definition of the ‘shift’ to support the claim that there indeed exists some linkage behind this question.

The tweet approach:

I really appreciate that you collected your data via the Twitter API and applied sentiment analysis, which was not covered in our class, to try to get the whole picture of the election trend. My concern here is that I don’t know how your model treats the data if some extremely positive or negative words are repeated by political fans of the same group. In that case the input data X may not be representative of people overall, which would also introduce inaccuracy into your outcome. Hence, I think it would help to clean up the tweets before calculating the score, to ensure that you have covered more people’s voices.

As for the regression part, I like that you treated this problem as a classification problem and tried to apply different models to it. I think it would make more sense if you provided some conclusions about how the different models performed on your problem, based on the visualization. Overall, you have done a good job and incorporated your personal view on this issue to make your report more cohesive and complete.
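One possible form of the clean-up suggested above is to collapse each account to a single average score before aggregating, so that a few very active fans cannot dominate. The sketch below assumes tweets arrive as (user_id, score) pairs; that layout, the function name, and the numbers are hypothetical.

```python
from collections import defaultdict

def one_voice_per_user(tweets):
    """Average the scores of each user so every account counts once.

    tweets: iterable of (user_id, score) pairs.
    """
    scores_by_user = defaultdict(list)
    for user_id, score in tweets:
        scores_by_user[user_id].append(score)
    return {user: sum(s) / len(s) for user, s in scores_by_user.items()}

# Hypothetical sample: one fan repeats the same strongly positive tweet three times.
tweets = [("fan1", 3), ("fan1", 3), ("fan1", 3), ("u2", -1), ("u3", -1)]

raw_mean = sum(score for _, score in tweets) / len(tweets)
per_user = one_voice_per_user(tweets)
balanced_mean = sum(per_user.values()) / len(per_user)

print("mean over tweets:", raw_mean)
print("mean over users: ", balanced_mean)
```

The per-tweet mean is pulled up by the repeated tweets, while the per-user mean is much closer to neutral. Dropping exact duplicate texts or retweets would be another reasonable variant.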

Good luck with your future study~

Final Report Peer Review

Very interesting topic! I really love the way you take tweets as your sentiment data and make predictions through sentiment analysis. And it seems that you devoted a lot of time to your project.
But in your results section, you chose predicting Trump in all the counties as your baseline and compared your different techniques against it. I am interested in why you chose this model as your baseline, because it seemed to me that the model actually made no sense. Our team also worked on sentiment analysis, so I am thinking that maybe you could build your own dictionary or your own classifiers to make the sentiment analysis specifically designed for your problem.

Sizhang
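For context on the baseline question in the review above: always predicting the most common class is a standard reference point in classification, because a model only adds value if it beats that accuracy. A minimal sketch, with made-up county labels and a hypothetical function name:

```python
from collections import Counter

def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

# Hypothetical county winners; the real split in the report may differ.
county_winners = ["Trump"] * 7 + ["Clinton"] * 3

label, accuracy = majority_baseline(county_winners)
print(label, accuracy)
```

With these invented labels, any classifier scoring below the printed accuracy would be doing worse than ignoring the tweets entirely.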

midterm review

This project seeks to determine the candidate that Florida will vote for in the upcoming presidential election based on tweets gathered from counties all over Florida. The feature selection portion of this project is very interesting; filtering the tweets for importance and keywords seems to be a major factor in how you can rank important tweets that capture the sentiment of the public at that point in time. I also appreciated how your team went above and beyond the course material to implement a model that was not taught in class, but may make more sense than a linear model. I also liked the future considerations portion, where your team focuses on ranking words that pertain to this particular election differently and perhaps with more weight.

One thing I would have liked to see in this proposal is a complete description of your data. The raw data was in the form of tweets, and the number was capped at 3000. However, I would be interested in knowing how you scored the different features of these data points to use for linear regression. I would also want to know exactly how the elections that your team is comparing against are biased. I would also have preferred it if you had described how Random Forest Variance Selection works in detail, as we did not cover this in class. Furthermore, I would have wanted to know which new models your team was thinking of trying. All in all, I believe that this was a pretty good and interesting midterm report for the project.

Final Report Review (bz284)

The topic of evaluating economic data and sentiment scores on Twitter as another way to look at the election results is truly interesting! And it is great that you guys used techniques that were not covered in this class to make the project possible. Also, nice job on the PCA.
One thing I think can be improved is the structure of the report. The title does not give any information about the topic of the project. The conclusion and future work sections are unclear. The report says, "as unemployment drop and GDP per capita picks up, the model predicted a movement away from democrat candidates". I would like to know how to quantify this, or somehow visualize it.

Final Report Peer Review

I think your topic, studying the election through tweets, is very interesting. Because the major predictions use polls and yielded inaccurate results, it is a refreshing idea to look at the election from a different perspective. Your results also showed that tweets contain valuable information about public attitudes.

First, I love your choice of features, since they seem informative and remain independent of each other, which is fundamental for linear regression. Moreover, I think sentiment scores are powerful for text analysis, and I love the fact that you trained plenty of models to improve your results. Further, the visualizations are a great way to show and compare model results. However, I think the interpretation of the first figure is not clear enough. I would appreciate it if you could explain in detail what the numbers mean. Overall, I think the report was well organized.

By xz485

Peer Review

3 Things That I Liked

The 2016 election is a very relevant and interesting topic, and all of the questions that you propose to answer are important.
This project will give you valuable experience in using an API. Collecting relevant tweets via the API is certainly an extra challenge compared to finding a ready made data set.
The fact that tweets are natural language will make this an interesting challenge for you.

3 Areas for Improvement

I would like to know which of the questions you think will be most reasonable to answer with the type of data available. How, for example, would tweets help you predict voter turnout? Unless you have experience with it, I think the natural language processing aspect of this will be a huge challenge, and nothing that we learn in this class will really help with that.
What would be your training data? Will you create your model based on tweets prior to the 2012 election? Maybe 2008, since 2012 involved an incumbent candidate. But Twitter (founded in 2006) has changed and grown tremendously since then.
How will you manage the fact that the election will be over before the project is due? When will you stop collecting the data? How will you ensure that the true results don’t impact your modeling decisions?

Final Report Comment

A few things I really like about the project:

The idea of using economic indicators to predict the shift from 2012 and the visualization is cool.
I like the discussion about public voices and public silence.

What could (or might) be improved:

The whole report lacks a logical flow; the economic indicators seem like a preliminary trial, since they are separated from the tweet data training afterward. Perhaps you might consider combining the two data sets to do the prediction;
How you generate the review score from tweet data is unknown to readers;
There's no clear conclusion in the report: you mentioned that public voices are hard to predict and that modern data science methods are not reliable (we know this is hard to solve right now), but maybe you could give some insights from your project that could be used to address this problem.

Thank you.

Midterm Review

Looks like something weird happened with your font or LaTeX haha.

What an interesting project idea! My first hesitation is that you need to generate features from tweets, which is a nontrivial problem, but you seem to handle it just fine. I think you had a good idea for your baseline given that Twitter allows limited access to historical data.

I think one way to improve your report is to organize your sections more naturally. I got lost in section 4 because I think you tried to address too many things too quickly. It's good that you address all the potential issues your model has, and that you also talk about how you will improve those in the next part of this project.

orie4740-presidential-election-prediction-project's Issues

Peer Review

Midterm evaluation by adk87

Final Report Peer Review

Peer review for final report

Final Report Peer Review

midterm review

Final Report Review (bz284)

Final Report Peer Review

Peer Review

Peer Review

Final Report Comment

Midterm Review
