inactivist / pytweetbot

This project forked from nschaetti/pytweetbot

A Twitter bot written in Python to replace yourself, search and publish news about specific subjects on Twitter. PyTweetBot uses machine learning to filter interesting articles and web pages found on the web.

License: GNU General Public License v3.0

Python 99.77% HTML 0.23%

pytweetbot's Introduction


A Twitter bot and library written in Python to replace yourself, search and publish news about specific subjects on Twitter, and automate content publishing.

Join our community to create datasets and deep-learning models! Chat with us on Gitter and join the Google Group to collaborate with us.

This repository consists of:

  • pytweetbot.config : Configuration file management;
  • pytweetbot.db : MySQL database management;
  • pytweetbot.directmessages : Twitter direct message functions;
  • pytweetbot.docs : Documentation;
  • pytweetbot.executor : Functions and objects to execute actions;
  • pytweetbot.friends : Functions and objects to manage friends and followers;
  • pytweetbot.learning : Machine learning functions;
  • pytweetbot.mail : Mail functions;
  • pytweetbot.news : Manage news acquisition and sources;
  • pytweetbot.patterns : Python class patterns;
  • pytweetbot.retweet : Manage retweets and sources;
  • pytweetbot.stats : Statistics;
  • pytweetbot.templates : HTML templates for mail;
  • pytweetbot.tools : Tools;
  • pytweetbot.tweet : Manage tweets;
  • pytweetbot.twitter : Manage access to Twitter;

Getting started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

You need the following packages to install pyTweetBot:

  • nltk
  • argparse
  • logging
  • tweepy
  • sklearn
  • pygithub
  • brotli
  • httplib2
  • urlparse2
  • HTMLParser
  • bs4
  • simplejson
  • dnspython
  • dill
  • lxml
  • sqlalchemy
  • feedparser
  • textblob
  • numpy
  • scipy
  • mysql-python

Installation

pip install pyTweetBot

Authors

License

This project is licensed under the GPLv3 License - see the LICENSE file for details.

Configuration

Configuration file

pyTweetBot reads its configuration from a JSON file, which looks as follows:

{
	"database" :
	{
		"host" : "",
		"username" : "",
		"password" : "",
		"database" : ""
	},
	"email" : "[email protected]",
	"scheduler" :
	{
		"sleep": [6, 13]
	},
	"hashtags":
	[
	],
	"twitter" :
	{
		"auth_token1" : "",
		"auth_token2" : "",
		"access_token1" : "",
		"access_token2" : "",
		"user" : ""
	},
	"friends" :
	{
		"max_new_followers" : 40,
		"max_new_unfollow" : 40,
		"interval" : [15, 60],
		"unfollow_interval" : 604800
	},
	"forbidden_words" :
	[
	],
	"direct_message" : "",
	"tweet" : {
		"max_tweets" : 1800,
		"exclude" : [],
		"interval" : [4.0, 6.0],
		"intervals" : [
			{
				"day": 5,
				"start": 17,
				"end": 23,
				"interval" : [1.0, 3.0]
			}
		]
	},
	"news" :
	[
		{
			"keyword" : "",
			"countries" : ["us","fr"],
			"languages" : ["en","fr"],
			"hashtags" : []
		}
	],
	"rss" :
	[
		{"url" : "http://feeds.feedburner.com/TechCrunch/startups", "hashtags" : "#startups", "lang": ["en"]},
		{"url" : "http://feeds.feedburner.com/TechCrunch/fundings-exits", "hashtags" : "#fundings", "lang": ["en"]}
	],
	"retweet" :
	{
		"max_retweets" : 600,
		"max_likes" : 0,
		"keywords" : [],
		"nbpages" : 40,
		"retweet_prob" : 0.5,
		"limit_prob" : 1.0,
		"interval" : [2.0, 4.0]
	},
	"github" :
	{
		"login": "",
		"password": "",
		"exclude": [],
		"topics" : []
	}
}

There are two required sections:

  • Database : contains the information to connect to the MySQL database (host, username, password, database)
  • Twitter : contains the information for the Twitter API (auth and access tokens)
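
As a minimal sketch of what "required" means in practice, the check below loads a configuration file and reports which of the two required sections are missing. The names `missing_sections` and `load_config` are hypothetical helpers written for illustration; they are not part of pyTweetBot, whose own loader may behave differently.

```python
import json

# The two top-level sections described above as required.
REQUIRED_SECTIONS = ("database", "twitter")


def missing_sections(config):
    """Return the required top-level sections absent from a parsed config dict."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


def load_config(path):
    """Parse the JSON file at `path`; raise ValueError if a required section is missing.

    Hypothetical helper: pyTweetBot's own loader may validate differently.
    """
    with open(path) as handle:
        config = json.load(handle)
    missing = missing_sections(config)
    if missing:
        raise ValueError("missing required section(s): " + ", ".join(missing))
    return config
```

Running a check like this before starting the bot surfaces an incomplete configuration early rather than at the first database or API call.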

Database configuration

The database part of the configuration file looks like the following:

"database" :
{
    "host" : "",
    "username" : "",
    "password" : "",
    "database" : ""
}

This section is mandatory.
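
To illustrate how these four fields fit together, here is a sketch that turns the "database" section into a SQLAlchemy-style MySQL connection URL. Whether pyTweetBot builds its connection this way is an assumption, and `mysql_url` is a hypothetical helper, not a function of the library.

```python
from urllib.parse import quote_plus


def mysql_url(db_config):
    """Build a SQLAlchemy-style MySQL URL from the "database" config section.

    Hypothetical helper: the URL scheme used by pyTweetBot itself is not
    documented here, so "mysql://" is an assumption.
    """
    return "mysql://{user}:{password}@{host}/{name}".format(
        # Percent-encode credentials so characters such as "@" or ":" stay valid in the URL.
        user=quote_plus(db_config["username"]),
        password=quote_plus(db_config["password"]),
        host=db_config["host"],
        name=db_config["database"],
    )
```

For example, a section with host "localhost", username "bot", password "p@ss" and database "tweets" yields "mysql://bot:p%40ss@localhost/tweets".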

Update e-mail configuration

You can configure your bot to send you an email with the number of new followers by setting the email section:

"email" : "[email protected]"

Scheduler configuration

The scheduler is responsible for executing the bot's actions, and you can configure it to sleep for a specific period of time.

"scheduler" :
{
    "sleep": [6, 13]
}

Here the scheduler will sleep between 6:00 and 13:00.
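
The sleep window can be pictured with the short sketch below. It assumes the two values are hours of the day, that the end hour is exclusive, and that a window such as [22, 6] wraps past midnight; none of this is confirmed by the pyTweetBot source, and `is_sleeping` is a hypothetical name.

```python
def is_sleeping(hour, sleep_window):
    """Return True if `hour` (0-23) falls inside the [start, end) sleep window.

    Assumptions: the end hour is exclusive, and start > end means the
    window wraps past midnight.
    """
    start, end = sleep_window
    if start <= end:
        return start <= hour < end
    # Window wraps past midnight, e.g. [22, 6].
    return hour >= start or hour < end
```

With the example value [6, 13], `is_sleeping(7, [6, 13])` is true and `is_sleeping(14, [6, 13])` is false.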

Hashtags

You can list text to be replaced by hashtags in your tweets in the "hashtags" section:

"hashtags":
[
    {"from" : "My Hashtag", "to" : "#MyHashtag", "case_sensitive" : true}
]

Here, occurrences of "My Hashtag" will be replaced by #MyHashtag.
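
The replacement rule can be sketched as follows. This is an illustration of the behaviour described above, not pyTweetBot's implementation: `apply_hashtags` is a hypothetical function, and treating a missing "case_sensitive" key as true is an assumption.

```python
import re


def apply_hashtags(text, rules):
    """Replace each rule's "from" text with its "to" hashtag.

    Assumption: a rule without "case_sensitive" is treated as case sensitive.
    """
    for rule in rules:
        flags = 0 if rule.get("case_sensitive", True) else re.IGNORECASE
        # re.escape makes "from" match literally; the lambda inserts "to" verbatim.
        text = re.sub(re.escape(rule["from"]),
                      lambda _match, to=rule["to"]: to,
                      text, flags=flags)
    return text
```

With the rule shown above, "Read about My Hashtag" becomes "Read about #MyHashtag", while "my hashtag" in lower case is left untouched because the rule is case sensitive.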

Twitter

To access Twitter, pyTweetBot needs four tokens for the Twitter API and your username.

"twitter" :
{
    "auth_token1" : "",
    "auth_token2" : "",
    "access_token1" : "",
    "access_token2" : "",
    "user" : ""
}

TODO: tutorial to get the tokens

Friends settings

The friends section has four parameters.

"friends" :
{
	"max_new_followers" : 40,
	"max_new_unfollow" : 40,
	"interval" : [15, 60],
	"unfollow_interval" : 604800
}

  • The max_new_followers parameter sets the maximum number of users that can be followed each day;
  • The max_new_unfollow parameter sets the maximum number of users that can be unfollowed each day;
  • The interval parameter sets the interval in minutes between follow/unfollow actions, chosen randomly between the min and the max;
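
As a rough sketch of the random interval, the function below draws a waiting time between the configured minimum and maximum. A uniform distribution is an assumption (the text only says "randomly"), and `next_delay_seconds` is a hypothetical name rather than part of pyTweetBot.

```python
import random


def next_delay_seconds(interval_minutes, rng=random):
    """Draw a delay in seconds between the min and max of `interval_minutes`.

    Assumption: the draw is uniform; the actual distribution used by the
    bot is not documented.
    """
    low, high = interval_minutes
    return rng.uniform(low, high) * 60.0
```

With "interval" : [15, 60], each call returns a delay between 900 and 3600 seconds. Passing a seeded `random.Random` instance as `rng` makes the draws reproducible in tests.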

Create database

You then have to create the database on your MySQL host. The tools command provides the following options:

python -m pyTweetBot tools
    --create-database : Create the database structure on the MySQL host
    --export-database : Export tweets, tweeted and followers/friends to a file
    --import-database : Import tweets, tweeted and followers/friends from a file
    --file : File to import / to export to

You can use the "create-database" action for that:

python -m pyTweetBot tools --config /path/to/config.json --create

You can export the bot's data to a file with the export-database action:

python -m pyTweetBot tools --config /path/to/config.json --export --file export_file.p

You can then import the bot's data from that file:

python -m pyTweetBot tools --config /path/to/config.json --import --file export_file.p

Model training

Create a dataset

The first step to train a model is to create a dataset of positive and negative examples. This can be done with the train command and the "dataset" action.

python -m pyTweetBot train --dataset test.p --config ../nils-config/nilsbot.json --text-size 100 --action dataset --source news

The source argument can take the following values:

  • news : URLs from Google News and RSS streams;
  • tweets : Tweets found directly on Twitter;
  • friends : Description of Twitter users found directly on Twitter;
  • followers : Description of Twitter users found in your list of followers;
  • home : Tweets found on your home feed;

Train a model

Once the dataset is created, you can train a model using the "train" action:

python -m pyTweetBot train --dataset test.p --config ../nils-config/nilsbot.json --model mymodel.p --action train --text-size 100 --classifier SVM
INFO:pyTweetBot:Finalizing training...
INFO:pyTweetBot:Training finished... Saving model to mymodel.p

The classifier parameter can take the following values:

  • NaiveBayes : Naive Bayes classifier;
  • DecisionTree : Simple decision tree;
  • RandomForest : Random forest;
  • SVM : Support Vector Machine;

Test a model

You can test your model's accuracy with the "test" action:

python -m pyTweetBot train --dataset test.p --config ../nils-config/nilsbot.json --model mymodel.p --action test --text-size 100
Success rate of 56.1108362197 on dataset

You can now use your model to classify tweets.

Command line

Launch executors

pyTweetBot launches an executor thread for each action type. You can launch the executor daemon this way:

python -m pyTweetBot executor --config /etc/bots/bot.conf

Find new tweets

python -m pyTweetBot find-tweets --config /etc/bots/bot.conf --model /etc/bots/models/find_tweets.p

Find new retweets

python -m pyTweetBot find-retweets --config /etc/bots/bot.conf --model /etc/bots/models/find_retweets.p

Automate execution with crontab

Development

Files

pytweetbot's People

Contributors

nschaetti, metmajer, tlwt
