Social BERTerfly 🦋

Predicts which of the 16 Myers-Briggs personality types fits you from your Twitter handle, and compares your type with those of the people you follow.

It combines a machine learning classifier with NLP, using the state-of-the-art language model BERT (Bidirectional Encoder Representations from Transformers), to predict a user's personality type from their recent tweets.

Getting Started: 🙌

How to run locally:

Follow the steps below to run the app and explore your personality type, as well as those of your friends!

git clone https://github.com/MLH-Fellowship/Social-BERTerfly.git
Download our model weights from the following Drive link:

BERT_base_model
Place the downloaded .h5 model under server/models/.
Navigate to the server folder by:

cd server/
Install dependencies by:

pip install -r requirements.txt (you can install the packages in a virtualenv if you prefer)
Add your Twitter API keys and authorization credentials to the .env file. To get a Twitter API key, you can refer to this article. Do not open a PR or publish the .env file with your Twitter API key and credentials in it. Either create a separate copy of the .env file in your cloned repo and delete it after use, or uncomment the "/server/.env" line in .gitignore.
Create a new folder "twitter_data" in the same directory to store the fetched tweets.
Run the following in your terminal:

flask run

or, python app.py
Wait around 15 seconds for the model to load.
Visit the application at http://127.0.0.1:5000/ and enjoy exploring various personality traits for you and your following!

Note: Make sure to click the Submit button first to fetch the tweets and results. After the personality type is displayed on the landing page, click Go to Dashboard for a detailed analysis.
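
Because the server cannot fetch tweets without Twitter credentials, a quick pre-flight check before starting it may save some debugging. The sketch below is not part of the repository. The key names are hypothetical placeholders, so substitute whatever names your .env file actually defines, and it assumes those values have already been loaded into the process environment.

```python
import os

# Hypothetical key names; replace them with the ones used in your .env file.
REQUIRED_KEYS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def missing_credentials(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]


missing = missing_credentials()
if missing:
    print("Missing Twitter credentials:", ", ".join(missing))
else:
    print("All Twitter credentials are set.")
```

If the check reports missing keys, fix the .env file before submitting a handle.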

Start contributing! 📣

If you wish to contribute to our model, you can take a look at our notebook, and provide suggestions or comments.

An Example:

Landing Page:

A brief description of personality types:

Try it Out:

Head over to the Get Started section, put in your Twitter handle, and press Submit. The model should take approximately 15 seconds to return your predicted personality type on the screen as follows:

Head over to the Dashboard:

Click on Go to Dashboard to get detailed personality analysis along with career suggestions.

Compare personality types!:

Now you can also compare your personality type against that of your followers and friends!

Tech Stack:

Twitter API for fetching tweets
tweepy for connecting the API with Python (https://pypi.org/project/tweepy/)
Flask for the backend server
Google Colaboratory for collaborating on the model and accessing the free TPU 😂
Keras for training and testing the BERT model
BERT as a SOTA model for tweet predictions. (https://arxiv.org/abs/1810.04805)
Bootstrap for the homepage and the dashboard UI
chartjs for displaying graphs on the Dashboard

Implementation Details:

P.S: If you ain't into the boring stuff, head on over to the next section to contribute to our model and the app!

About MBTI

The Myers-Briggs Type Indicator (or MBTI for short) is a personality type system that divides everyone into 16 distinct personality types across 4 axes:

Introversion (I) – Extroversion (E)
Intuition (N) – Sensing (S)
Thinking (T) – Feeling (F)
Judging (J) – Perceiving (P)
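
To see where the number 16 comes from, the four axes above can be combined programmatically: one letter per axis gives 2 × 2 × 2 × 2 = 16 codes. This is only an illustrative sketch. The ordering of the generated codes is arbitrary and is not claimed to match the label order used by the model.

```python
from itertools import product

# The four MBTI axes, each with its two poles, in the usual letter order.
AXES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]


def all_mbti_types():
    """Return every 4-letter code formed by picking one pole per axis."""
    return ["".join(letters) for letters in product(*AXES)]


types = all_mbti_types()
print(len(types))  # 16
print(types[0])    # INTJ
```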

The MBTI is one of the most popular personality tests in the world, if not the most popular. It is used in business, online, for fun, for research, and more. From a scientific or psychological perspective, it is based on Carl Jung's work on cognitive functions, i.e. Jungian typology. This was a model of 8 distinct functions, thought processes, or ways of thinking that were suggested to be present in the mind. Later, this work was transformed into several different personality systems to make it more accessible, the most popular of which is the MBTI.

Dataset

For the dataset, we used the well-known Myers-Briggs Personality Type Dataset, which pairs a large number of people's MBTI types with content they have written. It contains over 8600 rows, and each row holds a person's:

- Type (This person's 4-letter MBTI code/type)
- A section of each of the last 50 things they have posted (Each entry separated by "|||" (3 pipe characters))
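
As a rough illustration of the row format described above, the posts field can be split on the "|||" separator. The column names `type` and `posts` and the sample row are assumptions made for this sketch, not values taken from the dataset.

```python
def split_posts(posts_field, separator="|||"):
    """Split one row's posts field into a list of non-empty, trimmed posts."""
    return [post.strip() for post in posts_field.split(separator) if post.strip()]


# Made-up row for illustration; not taken from the dataset.
row = {"type": "INFP", "posts": "first post|||second post ||| |||third post"}
posts = split_posts(row["posts"])
print(row["type"], len(posts))  # INFP 3
```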

BERT

Bidirectional Encoder Representations from Transformers (BERT) is a Transformer-based machine learning technique for natural language processing (NLP) pre-training developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues from Google. As of 2019, Google has been leveraging BERT to better understand user searches.

Data Fetching:

Using tweepy and the Twitter API, we fetch the 50 latest tweets posted by the user whose username was entered. These tweets are stored in a .csv file and sent for preprocessing, and finally the cleaned texts are sent to the Keras model.
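
The fetching step might look roughly like the sketch below. It is not the project's actual scraper: the function names and the single-column CSV layout are hypothetical, and `api` is assumed to be an already authenticated `tweepy.API` object.

```python
import csv


def fetch_recent_tweets(api, handle, limit=50):
    """Return the text of up to `limit` recent tweets for `handle`.

    `api` is assumed to be an authenticated tweepy.API instance.
    """
    statuses = api.user_timeline(screen_name=handle, count=limit, tweet_mode="extended")
    # With tweet_mode="extended", the untruncated text is in `full_text`.
    return [status.full_text for status in statuses]


def save_tweets_csv(texts, path):
    """Write one tweet per row under a single 'tweet' column (hypothetical layout)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["tweet"])
        writer.writerows([text] for text in texts)
```

With tweepy 3.x, the `api` object is typically built from `tweepy.OAuthHandler(consumer_key, consumer_secret)`, followed by `auth.set_access_token(access_token, access_token_secret)` and `tweepy.API(auth)`, using the credentials from the .env file.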

Data preprocessing:

We use regex to detect and remove special characters such as '@' mentions and emojis from the posts, remove stopwords and punctuation, convert the text to lowercase, and apply stemming to reduce words to their roots. The preprocessed data is split using train_test_split and sent to the Keras model for predictions.
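
A minimal sketch of this kind of cleaning, using only the standard library, is shown below. The regular expressions and the tiny stopword list are assumptions for illustration. Stemming is left out because the text does not say which stemmer is used.

```python
import re
import string

# Tiny illustrative stopword list (an assumption); a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "at", "my", "i"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text):
    """Lowercase, drop URLs, mentions, non-ASCII symbols and punctuation, remove stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Dropping non-ASCII characters removes emojis (and, as a side effect, accented letters).
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.translate(PUNCT_TABLE)
    return " ".join(word for word in text.split() if word not in STOPWORDS)


print(clean_text("Loving the new @friend demo!!! 🎉 https://example.com"))
# loving new demo
```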

BERT Model summary:

Layer (type)                 Output Shape              Param #   
=================================================================
input_word_ids (InputLayer)  [(None, 1500)]            0         
_________________________________________________________________
tf_bert_model_1 (TFBertModel ((None, 1500, 768), (None 109482240)) 
_________________________________________________________________
tf_op_layer_strided_slice_1  [(None, 768)]             0         
_________________________________________________________________
dense_1 (Dense)              (None, 16)                12304     
=================================================================
Total params: 109,494,544
Trainable params: 109,494,544
Non-trainable params: 0
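
The parameter counts in the summary can be checked with a little arithmetic: the dense layer maps the 768-dimensional BERT output to 16 classes, so it has 768 × 16 weights plus 16 biases. The variable names below are just for illustration.

```python
hidden_size = 768          # width of the BERT output fed to the dense layer
num_classes = 16           # one output unit per MBTI type
bert_params = 109_482_240  # tf_bert_model_1, as reported in the summary

dense_params = hidden_size * num_classes + num_classes  # weights + biases
total_params = bert_params + dense_params

print(dense_params)  # 12304
print(total_params)  # 109494544
```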

Results achieved:

We tested an LSTM model and BERT-base to compare accuracies.

Model	Train accuracy	Validation accuracy
LSTM baseline	18.96%	16.9%
BERT-base-uncased	85%	79%

Deployment:

Uses Flask for the backend and model deployment, and Bootstrap for building the Dashboard and the Homepage UI.

Contributing:

Social BERTerfly is fully Open-Source and open for contributions! We request you to respect our contribution guidelines as defined in our CODE OF CONDUCT and CONTRIBUTING GUIDELINES.

Contributors

Made with ❤️️ by Team Social-BERTerfly as part of MLH Explorer Fall Fellowship 2020 Sprint3.

AttributeError: 'NoneType' object has no attribute 'to_csv' (Twitter scraper return null)

Hi and thanks for this cool project!

First issue on requirements, dataclasses==0.8 is not available on latest python, staying on 0.6 would be fine.
Also pyasn1-modules is already part of dist-package in ubuntu 20.04, I had to comment it out since it creates an error.

Now to the main issue. Once successfully installed on a clean ubuntu, the server starts fine, but when I submit a handle I get the following trace:

127.0.0.1 - - [17/Jan/2021 07:09:36] "OPTIONS /tweet_pred HTTP/1.1" 200 -
failed on_status, Failed to send request: Only unicode objects are escapable. Got None of type <class 'NoneType'>.
[2021-01-17 07:09:40,698] ERROR in app: Exception on /tweet_pred [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.8/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/Social-BERTerfly/server/app.py", line 36, in tweet
    tweet_return(user_handle)
  File "/opt/Social-BERTerfly/server/twitterscraper.py", line 82, in tweet_return
    twitter.get_user_tweets(str(tweet_handle)).to_csv(tweet_path)
AttributeError: 'NoneType' object has no attribute 'to_csv'
127.0.0.1 - - [17/Jan/2021 07:09:40] "POST /tweet_pred HTTP/1.1" 500 -

Looks like the twitter scraper does not return any results.
My instance is on GCP and the firewall allows external calls...

After looking a bit in the code I see that auth credentials are needed and that this is not a credential-free scraper like twint... too bad :)

So I think your readme should mention this part about creating a .env file, and about dependency issues as well.
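
A defensive check along these lines might turn the failure into a clearer message. This is only a sketch: `get_user_tweets` stands in for the scraper method seen in the trace, it is assumed to return a DataFrame-like object or None on failure, and the function name and error text are made up.

```python
def save_user_tweets(get_user_tweets, handle, path):
    """Fetch tweets via `get_user_tweets` and write them to `path` as CSV.

    `get_user_tweets` is a stand-in for the scraper call in the traceback;
    it is assumed to return a DataFrame-like object, or None on failure.
    """
    tweets = get_user_tweets(str(handle))
    if tweets is None:
        # Fail early with a readable message instead of an AttributeError.
        raise RuntimeError(
            f"No tweets fetched for {handle!r}; check the Twitter credentials in .env."
        )
    tweets.to_csv(path)
    return tweets
```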
