This repository contains a few Python scripts that act as Instagram bots, scraping the Internet for interesting daily / weekly content. It is not intended as a serious project; it was created out of boredom, just for fun and entertainment.
Sometimes they stop and the accounts need a password change since Facebook / Meta / Instagram does not like bots posting on their platform. So there might be "service interruptions" from time to time.
Here is a list of the bots currently present in this repo:
- astronomy_picture_of_the_day: posts pictures from NASA's APOD (Astronomy Picture Of the Day) website;
- the_new_yorker: posts The New Yorker magazine covers;
- wikipedia_featured_article: posts the picture from Wikipedia's Today's featured article section;
- wikipedia_featured_picture: like the previous one, but focused on the Picture of the day page;
- zanichelli_parola_del_giorno: Zanichelli is a well-known Italian dictionary with a word of the day section;
- santo_del_giorno: every day a new saint from Santo del giorno.
The core of this project is web scraping. Given a particular web page, each bot has the same behaviour:
- it downloads the page
- it searches the page for the content (text or image), possibly downloading it from other pages
- it compiles a LaTeX file to create a square PDF (7 cm x 7 cm), which is then converted to JPG
- it sets the image's caption and hashtags
- it connects to Instagram and posts the picture with the caption
Each script can be executed by calling it inside its folder (e.g. `python3 the_new_yorker.py`).
The `-d` or `--debug` argument can be used to test a script locally, preventing it from posting to Instagram.
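The flag can be wired up with argparse; the helper below is an illustrative sketch, not the scripts' actual code:

```python
import argparse

def parse_args(argv=None):
    """Parse the bot's command-line flags (-d/--debug skips the upload)."""
    parser = argparse.ArgumentParser(description="Run one of the Instagram bots.")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="build the post locally without uploading to Instagram")
    return parser.parse_args(argv)
```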
The `requirements.txt` file contains the Python dependencies.
To execute the scripts daily, I have created a cron job in GitHub Actions that runs all the bots. This could be done in a more granular way, with a `yml` file for each script, but initializing the Docker container used to compile the LaTeX files takes too much time compared to running them all together.
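A scheduled workflow of this kind looks roughly as follows. The file name, schedule, and script path below are assumptions for illustration, not the repository's actual workflow:

```yml
# .github/workflows/bots.yml -- illustrative sketch, names are assumptions
name: daily-bots
on:
  schedule:
    - cron: "0 6 * * *"   # every day at 06:00 UTC
jobs:
  run-bots:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: python3 the_new_yorker.py
        working-directory: the_new_yorker
```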
Here is a list of the main libraries used in this project:
- BeautifulSoup: for web scraping
- PIL: for image editing / conversion
- pdf2image and poppler-utils: for converting the pdf files into jpg files
- wget: for downloading content from the Internet
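The PDF-to-JPG step can be sketched with a hypothetical helper. pdf2image is imported lazily here so the size arithmetic stays usable without poppler installed:

```python
CM_PER_INCH = 2.54

def cm_to_px(cm: float, dpi: int) -> int:
    """Pixel size of a physical length when rasterised at the given dpi."""
    return round(cm / CM_PER_INCH * dpi)

def pdf_to_jpg(pdf_path: str, jpg_path: str, dpi: int = 300) -> None:
    """Rasterise a single-page PDF and save it as JPEG."""
    # imported here so the helper above works without poppler installed
    from pdf2image import convert_from_path
    pages = convert_from_path(pdf_path, dpi=dpi)  # one PIL Image per page
    pages[0].convert("RGB").save(jpg_path, "JPEG")
```

At 300 dpi, the 7 cm square PDF becomes an image of roughly 827 x 827 pixels.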
Currently, the Instagram API library used is instagrapi. It contains reverse-engineered API wrappers, so there is no guarantee they will keep working.
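A sketch of how instagrapi is typically used (login plus single-photo upload). The helper names are assumptions; the client is passed in to the upload helper, and instagrapi is imported lazily, so the upload step can be exercised with a stub:

```python
def make_client(username: str, password: str):
    """Log in and return an instagrapi client."""
    # imported lazily so the rest of the module works without instagrapi
    from instagrapi import Client
    client = Client()
    client.login(username, password)
    return client

def post_photo(client, jpg_path: str, caption: str):
    """Upload one photo with its caption; client must already be logged in."""
    return client.photo_upload(jpg_path, caption)
```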
A custom Docker container is initialized to execute the scripts. This detaches the whole project from GitHub Actions' containers and environments, allowing for better portability (e.g. the container can be executed even on a Raspberry Pi).
Unfortunately the Facebook team (which currently owns Instagram, among a thousand other things) does not like bots and other automated scripts messing around with their data, so they frequently change their APIs and block requests from third-party libraries. igbot and other libraries such as instagram_private_api are currently blocked or not up to date with Facebook's APIs.
As said before, this repository is not meant to be serious work, just a fun time-filling activity that allowed me to get acquainted with Instagram's APIs and those of other sites such as Spotify, Google Trends and YouTube (even though there are no bots for them... yet!).
There are many possible improvements; here are just a few:
- avoid LaTeX and use the PIL library to generate the square images directly
- notification system when the bots fail
- badges with the latest post from each bot
- add Instagram stories and post videos
- make more granular `yml` workflow files, possibly one for each Python script
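The first idea above could start from something like this rough sketch; the function name, size, and layout are assumptions:

```python
from PIL import Image, ImageDraw

def make_square_post(text: str, size_px: int = 1080) -> Image.Image:
    """Draw the day's text on a plain white square, no LaTeX involved."""
    img = Image.new("RGB", (size_px, size_px), "white")
    draw = ImageDraw.Draw(img)
    draw.text((40, 40), text, fill="black")
    return img
```

This would drop the LaTeX toolchain and the PDF-to-JPG conversion entirely, at the cost of reimplementing the typesetting in PIL.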