
killedmufasa / amputatorbot


Remove AMP from your URLs. AmputatorBot is a highly specialised Reddit and Twitter bot that automatically replies to comments, submissions and tweets containing AMP URLs with the canonical link(s). It's also available as a website and REST API. See also: https://www.reddit.com/r/AmputatorBot/comments/ehrq3z/why_did_i_build_amputatorbot/.

Home Page: https://www.amputatorbot.com/

License: GNU General Public License v3.0

Python 100.00%
amp amp-html reddit reddit-bot amputatorbot bot praw praw-reddit reddit-api open

amputatorbot's Introduction

#AmputatorBot

TL;DR: Remove AMP from your URLs. AmputatorBot is a highly specialised Reddit (and formerly Twitter) bot that automatically replies to comments and submissions containing AMP URLs with the canonical link(s). It's also available as a website and REST API, but those haven't been open-sourced here.

FAQ, About & Why

Features


Main features:

  • 10 specialised canonical-finding methods, allowing for an accuracy rate of over 97%. For example, by:
    • Scanning the HTML contents
    • Detecting and following redirects
    • Guessing, and then checking article similarity with newspaper
    • … and many more!
  • Detect AMP links using 14 patterns, and reply to items containing them with the canonical link and some info
  • Compare and test canonicals and pick the best
  • Stream Reddit comments, submissions and inbox messages
  • Extensively tested using a (private) database of over 200K AMP links and their canonicals, which also serves as a cache
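
To illustrate the "scanning the HTML contents" method from the list above, here is a minimal sketch that pulls rel="canonical" links out of a page using only the standard library. It is not the bot's actual implementation; the names CanonicalLinkParser and find_canonicals and the sample HTML are made up for this example.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, so split before comparing
        rel_values = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rel_values and href:
            self.canonicals.append(href)


def find_canonicals(html):
    """Return all canonical URLs declared in the given HTML string."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonicals


# Made-up sample page: an AMP document pointing back at its canonical version
sample_html = """
<html><head>
<link rel="amphtml" href="https://example.com/article/amp">
<link rel="canonical" href="https://example.com/article">
</head><body></body></html>
"""
print(find_canonicals(sample_html))
```

A real page can declare no canonical at all, or one that is itself an AMP link, which is presumably why the bot combines several methods and compares the candidates.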

Nice bonuses:

  • Detect unique URLs with URLExtract and strip them of any artifacts
  • Object-oriented, allowing for a handy, free and publicly available API
  • Allow users to opt out and undo this
  • Send DMs when summoned by a user
  • Items the bot has interacted with are tracked automatically
  • Log and datafiles are automatically generated


Set up

  1. Clone the repository
  2. Run pip install -r requirements.txt to install dependencies
  3. Rename static.txt to static.py (see /static)
  4. Configure the application by tweaking static.py (required)
  5. Choose a check-[...].py script to run
  6. Configure the script's settings in run_bot(). Set everything (guess_and_check, reply_to_post, save_to_database) to False when starting out. Consider deleting or disabling the database canonical method.
  7. Run the script. All logs and required data files should be created automatically.
  8. Stop the script.
  9. Check out the new files in /data and edit them to your liking.
  10. Re-run the script and enjoy!

Support the project

  • Summon AmputatorBot on Reddit, like so: u/AmputatorBot. For more info, see here.
  • Give feedback: Most new features and improvements are directly influenced by your feedback. So, hit me up if you have any. Contact me on Reddit or file an issue.
  • Star: By starring the project here on GitHub, we can reach more folks and unlock new options. It also gives me something to brag about :p
  • Contribute: Pull requests are a great way to contribute directly to the code and functionality.
  • Spread the word: In the end, the only goal of AmputatorBot is to allow people to have an informed choice. You can help by simply spreading the word!

Sponsor

The server for the bot, website, and API costs about €10 ($12) per month. If you support AmputatorBot's mission and can chip in, any donation would be a huge help. Every bit goes straight into server expenses. Thanks a bunch!

PayPal: https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=EU6ZFKTVT9VH2
Or, donate to our friends in Ukraine instead: https://u24.gov.ua

From the bottom of my heart, huge thanks for the tremendous support! <3

amputatorbot's People

Contributors

killedmufasa, mgitre


amputatorbot's Issues

Suggestions Of Changes

This list isn't complete, but it does include some really important changes.

Miscellaneous

  • Just have one license file. You don't need both LICENSE and LICENSE.txt. This should be easy enough to fix.

  • File names should be clearer. mentions_bot, comment_bot and submissions_bot aren't names that make the difference between the files obvious. They should probably all be one file anyway, with different threads. You can read more on how multithreading works in Python here.
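
A minimal sketch of the "one file, different threads" idea, assuming each stream can be wrapped in a function. The stream contents here are stand-in lists rather than real Reddit streams, and run_stream and run_all are hypothetical names.

```python
import queue
import threading


def run_stream(name, items, results):
    """Handle every item of one stream; a real bot would reply here."""
    for item in items:
        results.put((name, item))


def run_all(streams):
    """Start one thread per stream, wait for all of them, return what was handled."""
    results = queue.Queue()  # thread-safe, so no explicit lock is needed
    threads = [
        threading.Thread(target=run_stream, args=(name, items, results), name=name)
        for name, items in streams.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    handled = []
    while not results.empty():
        handled.append(results.get())
    return handled


# Stand-in data instead of live comment, submission and mention streams
streams = {
    "comments": ["c1", "c2"],
    "submissions": ["s1"],
    "mentions": ["m1"],
}
handled = run_all(streams)
print(sorted(handled))
```

Real streams never end, so a long-running version would skip the join-and-collect step and let each thread loop forever, possibly as a daemon thread.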

Code Style

Reading through the code, there are a number of things that should be addressed in terms of coding style.

  • Very long one-line strings

In Python, you can actually make multi-line strings that are much easier to read.

cleaner_message = """This message is very long.
In this format, newlines are newlines in the code instead of \n.
so I can make a multi-line message actually be multiple lines in code instead of 
one long line"""

Or you can do something like this if you still want it to be just one line. Note the trailing space inside each piece: adjacent string literals are joined without any separator.

cleaner_message = ("this message is going to be way too long to put on one "
                   "line of code. In this method, cleaner_message will still be one line, "
                   "but I can write it out in more lines in code")
  • I'd recommend using the PEP8 style guidelines (editors can actually verify that you are following them). Using a consistent style is important, and PEP8 is pretty widespread, so it's a good one to use.

    • Gives other people reading your code a reference point

    • Following PEP8 will actually have your editor warn you when your lines get too long. PEP8 limits lines to 79 characters.

    • Editors/IDEs nicely support PEP8


There's other feedback to give, but I think this is at least a good place to start.

Suggestion: Add a public REST API that amputates links

I'd love to be able to use your amputation in my own projects, so it would be really cool if there was a REST API that takes a URL and amputates it.
I know this is a lot to ask and I would offer my help but I don't know Python at all, so I'm just throwing this suggestion out here in case you or anyone else would like to implement it.

Example:

GET to https://amputatorbot.com/api/?url=https%3A%2F%2Fgoogle.com%2Famp%2Fs%2Ffoo.com%2Fbar

would return something like:

{
	"error": false,
	"url": "https://foo.com/bar"
}
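
From the client side, a request like the one proposed above could be built and its answer read roughly as follows. This is only a sketch: the endpoint and the response shape are the hypothetical ones from this suggestion, not a documented API, and build_request_url is a made-up helper.

```python
import json
from urllib.parse import quote

# Endpoint as proposed in this issue; hypothetical, not a documented API
API_BASE = "https://amputatorbot.com/api/"


def build_request_url(amp_url):
    """Percent-encode the AMP URL so it survives as a single query value."""
    return f"{API_BASE}?url={quote(amp_url, safe='')}"


request_url = build_request_url("https://google.com/amp/s/foo.com/bar")
print(request_url)

# Parsing the example response from this issue (no network call is made here)
sample_response = '{"error": false, "url": "https://foo.com/bar"}'
payload = json.loads(sample_response)
if not payload["error"]:
    print(payload["url"])
```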

Website and API use Cloudflare

Hello,
I've noticed that the website and the API use Cloudflare.

This is a problem because of the way Cloudflare works, because the site is inaccessible through Tor, and because of general privacy concerns.

See http://crimeflare.eu.org for more information.

False Positive when URL ends in 'amp' and has query params

Observed here: https://www.reddit.com/r/mtgcube/comments/103dc4d/is_it_just_me_or_do_people_seem_to_be/j2y91j1/

I then attempted to verify the behavior using https://www.amputatorbot.com/
It appears to be a false positive that occurs when the URL path ends in "amp" and is followed by query params.

Flagged:

https://scryfall.com/card/clb/870/skullclamp?utm_source=mtgcardfetcher

Not Flagged:

https://scryfall.com/card/clb/870/skullclamp
https://scryfall.com/card/clb/870/skullclamps?utm_source=mtgcardfetcher

And then interestingly this errors out with a 500 on the website:

https://scryfall.com/card/clb/870/skullclamp?
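
One way to avoid this kind of false positive is to treat "amp" as a marker only when it is a whole hostname label, path segment or query key, instead of a substring. The sketch below is my assumption of how that could look, not the bot's actual pattern set (which is broader); looks_like_amp is a hypothetical name.

```python
from urllib.parse import parse_qs, urlsplit


def looks_like_amp(url):
    """Flag a URL only when 'amp' appears as a whole component."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").lower().split(".")
    segments = [segment for segment in parts.path.lower().split("/") if segment]
    # keep_blank_values so a bare "?amp" still counts as a query key
    query_keys = parse_qs(parts.query.lower(), keep_blank_values=True).keys()
    return "amp" in labels or "amp" in segments or "amp" in query_keys


for candidate in (
    "https://scryfall.com/card/clb/870/skullclamp?utm_source=mtgcardfetcher",
    "https://scryfall.com/card/clb/870/skullclamp?",
    "https://amp.cnn.com/cnn/us/live-news/damar-hamlin-collapse-bills-bengals-game-intl-hnk/index.html",
):
    print(looks_like_amp(candidate), candidate)
```

This check is deliberately narrow and would miss forms such as article.amp.html, so it is a starting point rather than a replacement for the existing patterns.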

URL encoding breaks the API

https://www.amputatorbot.com/api/v1/convert?gac=true&md=3&q=https://amp.cnn.com/cnn/us/live-news/damar-hamlin-collapse-bills-bengals-game-intl-hnk/index.html yields:

[
    {
        'amp_canonical': None,
        'canonical': {
            'domain': 'cnn',
            'is_alt': False,
            'is_amp': False,
            'is_cached': None,
            'is_valid': True,
            'type': 'REL',
            'url': 'https://www.cnn.com/us/live-news/damar-hamlin-collapse-bills-bengals-game-intl-hnk/index.html',
            'url_similarity': 0.9473684210526315
        },
        'canonicals': [
            {
                'domain': 'cnn',
                'is_alt': False,
                'is_amp': False,
                'is_cached': None,
                'is_valid': True,
                'type': 'REL',
                'url': 'https://www.cnn.com/us/live-news/damar-hamlin-collapse-bills-bengals-game-intl-hnk/index.html',
                'url_similarity': 0.9473684210526315
            },
            {
                'domain': 'cnn',
                'is_alt': False,
                'is_amp': False,
                'is_cached': None,
                'is_valid': True,
                'type': 'OG_URL',
                'url': 'https://www.cnn.com/us/live-news/damar-hamlin-collapse-bills-bengals-game-intl-hnk/index.html',
                'url_similarity': 0.9473684210526315
            },
            {
                'domain': 'cnn',
                'is_alt': False,
                'is_amp': False,
                'is_cached': None,
                'is_valid': True,
                'type': 'SCHEMA_MAINENTITY',
                'url': 'https://www.cnn.com/us/live-news/damar-hamlin-collapse-bills-bengals-game-intl-hnk/index.html',
                'url_similarity': 0.9473684210526315
            }
        ],
        'origin': {
            'domain': 'cnn',
            'is_amp': True,
            'is_cached': False,
            'is_valid': True,
            'url': 'https://amp.cnn.com/cnn/us/live-news/damar-hamlin-collapse-bills-bengals-game-intl-hnk/index.html'
        }
    }
]

whilst
https://www.amputatorbot.com/api/v1/convert?gac=True&md=3&q=https%3A%2F%2Famp.cnn.com%2Fcnn%2Fus%2Flive-news%2Fdamar-hamlin-collapse-bills-bengals-game-intl-hnk%2Findex.html yields:

{'error_message': "Error: Entry doesn't meet criteria (no AMP link detected)", 'result_code': 'error_no_amp'}
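
I don't know how the API parses its query string, so the following is only a sketch of the expected behaviour: with standard query parsing, the raw and the percent-encoded request should yield the same q value before AMP detection runs. extract_q is a hypothetical helper; the two URLs are the ones from this report.

```python
from urllib.parse import parse_qs, urlsplit

RAW_REQUEST = (
    "https://www.amputatorbot.com/api/v1/convert?gac=true&md=3"
    "&q=https://amp.cnn.com/cnn/us/live-news/"
    "damar-hamlin-collapse-bills-bengals-game-intl-hnk/index.html"
)
ENCODED_REQUEST = (
    "https://www.amputatorbot.com/api/v1/convert?gac=True&md=3"
    "&q=https%3A%2F%2Famp.cnn.com%2Fcnn%2Fus%2Flive-news%2F"
    "damar-hamlin-collapse-bills-bengals-game-intl-hnk%2Findex.html"
)


def extract_q(request_url):
    """Return the decoded q parameter; parse_qs undoes percent-encoding."""
    query = urlsplit(request_url).query
    return parse_qs(query)["q"][0]


print(extract_q(RAW_REQUEST))
print(extract_q(RAW_REQUEST) == extract_q(ENCODED_REQUEST))
```

Note that the two requests also differ in gac=true versus gac=True, so the encoding may not be the only variable in play.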

AmputatorBot doesn't work with thelocal.it pages

AmputatorBot couldn't reply to the comment or submission you summoned it for.

AmputatorBot ran into the following error: there were no canonical URLs found.

This error has been logged and is being investigated. Common causes for this error are: bot- and geoblocking websites and badly implemented AMP specs.

Feel free to leave feedback by contacting u/killed_mufasa, by posting on r/AmputatorBot or by opening an issue on GitHub.

You're a very good human for trying <3

NEW: With AmputatorBot.com you can remove AMP from your URLs in just one click! You could try it again there but it will probably raise an error again: https://AmputatorBot.com/?https://www.google.com/amp/s/www.thelocal.it/20191115/five-italian-police-officers-jailed-over-death-of-stefano-cucchi/amp

The URL:

https://www.google.com/amp/s/www.thelocal.it/20191115/five-italian-police-officers-jailed-over-death-of-stefano-cucchi/amp

Strip user agent

Strip out (and change) the user agent from the source files. This will stop issues with other scripts using it and thus Reddit being cranky.

Can either use environment variables, or add it to the config file
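
The environment-variable route could look roughly like this. The variable name AMPBOT_USER_AGENT and the placeholder default are assumptions made for this sketch and don't exist in the repository.

```python
import os

# Placeholder default; every deployment should set its own value
DEFAULT_USER_AGENT = "my-amp-bot/0.1 (set AMPBOT_USER_AGENT to override)"


def get_user_agent(environ=os.environ):
    """Read the user agent from the environment, falling back to the placeholder."""
    value = environ.get("AMPBOT_USER_AGENT", "").strip()
    return value or DEFAULT_USER_AGENT


print(get_user_agent())
```

The returned string would then be passed to whatever creates the Reddit client, so no real user agent needs to be committed to the source files.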

Some suggestions

Hello! I want to preface this by thanking you for making this bot. I appreciate this kind of initiative. That being said, I read that you were new to Python when making this, so I looked over it and have a few ideas for improvements. I could go through and make some changes, but I figured you may prefer to architect this yourself.

Line 30 of submissions_bot.py and line 36 of comment_bot.py both contain a similar line:
for _ in r.subreddit('amputatorbot+audio+chrome+degoogle+europe+google+firefox+gaming+history+programming+robotics+security+seo+tech+technology+test+todayilearned+worldnews')

This is a really long, and I might say unruly, line. The contents (the subreddit names) happen to be the same in both files, and I'm assuming they will stay the same in the future. This means that to update the list, somebody will have to make sure to change it in both spots. Instead, perhaps you could do something like the following (this would probably go in a separate file):

def allowed_subreddits():
    return ["amputatorbot", "audio", "chrome", "degoogle", ... ]

def format_allowed_subreddits():
    return "+".join(allowed_subreddits())

Doing this could help you print the list of subreddits beforehand with something like print("Obtaining the stream of subreddits", ", ".join(allowed_subreddits()))

This could also support other options for specifying the subreddits in the future. You could change allowed_subreddits to read from a file instead if you chose.

There are plenty of spots where you can use similar tricks to make the code easier to read and improve in the future. Another good example could be an is_amp_url function to avoid repeating if "/amp" in submission.url or ".amp" in submission.url or "amp." in submission.url or "?amp" in submission.url or "amp?" in submission.url or "=amp" in submission.url or "amp=" in submission.url and "https://" in submission.url: wherever it is needed (although this may not be the best name for the function, as in another spot you're looking through an entire comment. Perhaps contains_amp_url would be better?). In general it is considered a good idea to not repeat yourself. That's one reason why laziness is one of the three great virtues of a programmer.
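
A rough sketch of such a helper, reusing the same substring markers as the condition quoted above. One assumption on my part: I read the intent as "any marker is present and the text contains https://". As written, the original condition binds and tighter than or, so the https:// check only applies to the last marker.

```python
# Same markers as the inline condition, kept in one place
AMP_MARKERS = ("/amp", ".amp", "amp.", "?amp", "amp?", "=amp", "amp=")


def contains_amp_url(text):
    """Return True if the text has an https:// link and any AMP marker."""
    return "https://" in text and any(marker in text for marker in AMP_MARKERS)


print(contains_amp_url("https://www.google.com/amp/s/example.com/story"))
print(contains_amp_url("https://example.com/story"))
```

Both the submission check and the comment check could then call contains_amp_url(submission.url) or contains_amp_url(comment.body), with the marker list maintained in a single spot.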

I'm happy to help make some changes if you'd like. Thanks again for putting your time into developing this.

Bot does not recognize Google pages on sites besides google.com

Hey, quick thing I noticed when I summoned the bot:

The bot will recognize when AMP links are hosted on google.com and add a little note to the end of the message: https://www.reddit.com/r/worldnews/comments/extdrc/thailand_cures_coronavirus_with_antihiv_drug/fge6u3g/?context=3

However, it will not recognize Google websites on foreign-country domains, such as google.co.uk, which is what happened when I summoned it (comment is invisible in the subreddit but shows on the profile): https://www.reddit.com/r/nextfuckinglevel/comments/exqwpv/they_actually_did_it_a_1000_bed_hospital/fge9t18/?context=3
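
A possible way to cover the country domains is to match the hostname against a pattern instead of comparing it with google.com. This is a loose heuristic I'm assuming for illustration, not the bot's code; GOOGLE_HOST and is_google_host are hypothetical names, and a check against the public suffix list would be stricter.

```python
import re
from urllib.parse import urlsplit

# google.<tld> or google.<sld>.<cc>, optionally prefixed with www.
GOOGLE_HOST = re.compile(r"^(?:www\.)?google\.[a-z]{2,3}(?:\.[a-z]{2})?$")


def is_google_host(url):
    """Heuristically decide whether the URL is served from a Google domain."""
    host = (urlsplit(url).hostname or "").lower()
    return bool(GOOGLE_HOST.match(host))


for candidate in (
    "https://www.google.com/amp/s/example.com/story",
    "https://www.google.co.uk/amp/s/example.com/story",
    "https://notgoogle.com/amp/s/example.com/story",
):
    print(is_google_host(candidate), candidate)
```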

Anyways, thank you for your work on the bot!
