yshalsager / facebook2rss

Turn Facebook feeds into RSS!

License: GNU General Public License v3.0

Languages: Python 96.18%, Shell 1.20%, Dockerfile 2.62%
Topics: fastapi, rss-generator, rss-feed-scraper, facebook, playwright-python, playwright

facebook2rss's Introduction

Facebook to RSS API [WIP]

A small API for accessing posts from Facebook profiles, pages, and groups, as well as notifications, as RSS feeds. Based on FastAPI and Playwright.

Important

The tool is no longer under development. Facebook doesn't like it; see issue #21 for more information.

Disclaimer

1- This tool is provided with no warranty of any kind. I am not responsible for anything that might happen to a Facebook account that you automate with this tool.

2- This tool is still experimental. It has not been tested heavily, and some features are not implemented yet. I built it for personal use but decided to release it as a public open source project. If you have any questions or suggestions, feel free to open an issue. Contributions are welcome too!

Installation

The tool requires Python 3.7 and either pip v19+ or poetry.

Clone the repository and run any of the following commands:

Using poetry

poetry install

Using Pip

pip install .

Finally, install the browser files that Playwright requires:

playwright install chromium

Docker

You can use Docker to deploy the API quickly.

Build docker image

docker build -t facebook2rss .

Run the container

docker run -p 8000:8000 -e EMAIL=email -e PASSWORD=password -d facebook2rss

email and password are your Facebook credentials. You can also set any of the environment variables defined in config_example.env, for example:

docker run -p 8000:8000 -e EMAIL=email -e PASSWORD=password -e API_KEY="123" -e USE_KEY=True -d facebook2rss

Docker compose

  • First, copy config_example.env and fill it in with your desired configuration.
  • If you haven't logged in with your Facebook account yet, you may uncomment the login step in the start.sh file before deployment.
  • Now, run the following command:
docker-compose up -d
  • The API should be running now; check it using the command docker-compose logs -f.

Notes:

  • By default, the API runs on port 8000 when using docker-compose. This can be changed by editing start.sh and docker-compose.yml.
  • To stop the API, run docker-compose stop.
  • To start it again, run docker-compose start.
  • To stop and remove the containers, run docker-compose down.
  • To update to the latest version, update the cloned git repo, then run docker-compose up -d.

Usage

  • First, if you want to access profile and private group feeds, log in to Facebook using the following command, which saves your session for later use:
python3 -m facebook_rss --login -u email -p password
  • Next, run the following command to start serving the API:
uvicorn facebook_rss.main:api
  • You can pass any uvicorn CLI options, such as host and port. You can also use your own dotenv configuration file (copy config_example.env and change the available options) by passing it to uvicorn with --env-file:
uvicorn facebook_rss.main:api --env-file config_example.env

Getting Facebook Feeds as RSS

  • All you need to do is access the respective Facebook feed API route, as detailed in the API documentation that FastAPI generates.

  • For example, if the API is running on localhost and port 8080, you can access the documentation at http://127.0.0.1:8080/docs.

  • In a nutshell, here are the available routes: [image: routes]
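As a rough illustration of calling a route, the sketch below builds a feed URL. The /page/<name> route and the no_cache=1 parameter both appear elsewhere on this page (in the issue logs and the feature list); the helper name feed_url and its arguments are hypothetical, and other routes should be checked against /docs.

```python
from urllib.parse import quote, urlencode


def feed_url(base_url, route, name, **params):
    """Build a feed URL such as <base>/page/<name>?no_cache=1.

    `route` and `name` are whatever the /docs page lists; this helper is
    only an illustration and is not part of facebook2rss.
    """
    path = f"{base_url.rstrip('/')}/{route}/{quote(name, safe='')}"
    query = urlencode(params)
    return f"{path}?{query}" if query else path


if __name__ == "__main__":
    # Assumes the API is listening locally on port 8080, as in the example above.
    print(feed_url("http://127.0.0.1:8080", "page", "piwapodwiaduktem", no_cache=1))
```

The resulting URL can be pasted into any RSS reader or fetched with an HTTP client.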

Features

  • Access Facebook pages and public group feeds without using an account.
  • Access Facebook profiles, pages, groups and notifications when running the API with a Facebook account.
  • Your login is automatically saved and refreshed on each API request to keep you signed in.
  • Facebook posts can be fetched as HTML (default) or text.
  • Append the Facebook comments of any post to its RSS feed. (Disabled by default)
  • Easily configure the API either with a custom config.env file or by overriding a specific option through uvicorn.
  • RSS feeds are cached by default (for 30 minutes) to avoid abusing Facebook and triggering its anti-automation mechanisms. The cache duration is configurable, and the cache can be bypassed by adding the no_cache=1 parameter to any of the API routes.
  • Supports using HTTP or SOCKS5 proxies.
  • Secure access to the API using your own api_key, which can be configured via the env file. (Disabled by default)
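The caching behaviour described in the list above can be pictured with a small time-to-live cache. This is only an in-memory sketch under assumed names (cached_feed, fetch_page); the project itself appears to keep feeds in a database table, so this is not its actual implementation.

```python
import asyncio
import functools
import time


def cached_feed(ttl_seconds=30 * 60, clock=time.monotonic):
    """Cache an async feed getter per feed name for `ttl_seconds`."""

    def decorator(func):
        store = {}  # feed name -> (time stored, cached value)

        @functools.wraps(func)
        async def wrapper(name, no_cache=False):
            now = clock()
            hit = store.get(name)
            # Serve the cached copy unless it expired or the caller opted out.
            if hit is not None and not no_cache and now - hit[0] < ttl_seconds:
                return hit[1]
            value = await func(name)
            store[name] = (now, value)
            return value

        return wrapper

    return decorator


@cached_feed()
async def fetch_page(name):
    # Placeholder for the slow scraping step (hypothetical).
    return f"<rss><!-- feed for {name} --></rss>"


if __name__ == "__main__":
    print(asyncio.run(fetch_page("piwapodwiaduktem")))
```

Passing no_cache=True mirrors the no_cache=1 query parameter: the getter runs again and the stored copy is replaced.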

Limitations

  • Apparently, Facebook restricts access to public profiles on the mbasic website after several visits.

Current status

  • Data is scraped from the mbasic Facebook website, which doesn't use JavaScript, and is cached for 30 minutes.

facebook2rss's People

Contributors

dependabot-preview[bot], kbzowski, yshalsager

facebook2rss's Issues

list index out of range

Built and run from Docker on Linux.

facebook2rss    | 2021-03-13 11:34:35,279 [INFO] uvicorn.error [server.serve:64]: Started server process [7]
facebook2rss    | 2021-03-13 11:34:35,280 [INFO] uvicorn.error [on.startup:26]: Waiting for application startup.
facebook2rss    | 2021-03-13 11:34:35,280 [INFO] fastapi [main.startup_event:44]: Performing startup tasks...
facebook2rss    | 2021-03-13 11:34:35,280 [INFO] fastapi [main.startup_event:47]: Login cookies file was not found, setting working mode to no account!
facebook2rss    | 2021-03-13 11:34:35,280 [INFO] uvicorn.error [on.startup:38]: Application startup complete.
facebook2rss    | 2021-03-13 11:34:35,283 [INFO] uvicorn.error [server._log_started_message:199]: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
facebook2rss    | 2021-03-13 11:34:49,463 [INFO] browser [browser.get_browser:61]: Browser launched with saved cookies.
facebook2rss    | 2021-03-13 11:34:49,472 [INFO] cache [decorators.wrapper:21]: No valid cached feed found for piwapodwiaduktem, getting a new feed...
facebook2rss    | 2021-03-13 11:34:51,646 [INFO] uvicorn.access [httptools_impl.send:463]: 10.0.1.43:52404 - "GET /page/piwapodwiaduktem HTTP/1.0" 500
facebook2rss    | 2021-03-13 11:34:51,704 [INFO] browser [browser.get_browser:69]: Browser shutdown.
facebook2rss    | 2021-03-13 11:34:51,704 [ERROR] uvicorn.error [httptools_impl.run_asgi:401]: Exception in ASGI application
facebook2rss    | Traceback (most recent call last):
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 398, in run_asgi
facebook2rss    |     result = await app(self.scope, self.receive, self.send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
facebook2rss    |     return await self.app(scope, receive, send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py", line 65, in __call__
facebook2rss    |     raise exc from None
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py", line 61, in __call__
facebook2rss    |     await self.app(scope, inner_receive, inner_send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/fastapi/applications.py", line 199, in __call__
facebook2rss    |     await super().__call__(scope, receive, send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/applications.py", line 111, in __call__
facebook2rss    |     await self.middleware_stack(scope, receive, send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
facebook2rss    |     raise exc from None
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
facebook2rss    |     await self.app(scope, receive, _send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
facebook2rss    |     raise exc from None
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
facebook2rss    |     await self.app(scope, receive, sender)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/routing.py", line 566, in __call__
facebook2rss    |     await route.handle(scope, receive, send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/routing.py", line 227, in handle
facebook2rss    |     await self.app(scope, receive, send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/routing.py", line 41, in app
facebook2rss    |     response = await func(request)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/fastapi/routing.py", line 201, in app
facebook2rss    |     raw_response = await run_endpoint_function(
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/fastapi/routing.py", line 148, in run_endpoint_function
facebook2rss    |     return await dependant.call(**values)
facebook2rss    |   File "./facebook_rss/utils/decorators.py", line 50, in wrapper
facebook2rss    |     value = await func(*args, **kwargs)
facebook2rss    |   File "./facebook_rss/utils/decorators.py", line 22, in wrapper
facebook2rss    |     value = await func(*args, **kwargs)
facebook2rss    |   File "./facebook_rss/routes/page.py", line 33, in get_page
facebook2rss    |     posts: List[Post] = await _page.get_posts(
facebook2rss    |   File "./facebook_rss/browser/common/fb_page.py", line 133, in get_posts
facebook2rss    |     post_url = await posts[item].get_attribute('href')
facebook2rss    | IndexError: list index out of range
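The traceback ends at posts[item] in get_posts, which suggests the code indexed past the end of the list of matched post links (for example, when fewer posts were found than requested). A minimal sketch of a defensive loop follows; FakeLink and collect_post_urls are hypothetical stand-ins, not the project's code.

```python
import asyncio


class FakeLink:
    """Stand-in for a browser element handle that exposes get_attribute."""

    def __init__(self, href):
        self._href = href

    async def get_attribute(self, name):
        return self._href if name == "href" else None


async def collect_post_urls(posts, limit):
    """Return up to `limit` post URLs without indexing past the end."""
    urls = []
    # Slicing never raises IndexError, even when len(posts) < limit.
    for link in posts[:limit]:
        urls.append(await link.get_attribute("href"))
    return urls


if __name__ == "__main__":
    links = [FakeLink("/story.php?id=1")]
    print(asyncio.run(collect_post_urls(links, limit=10)))
```

With an empty list the function simply returns an empty list, so the route could answer with an empty feed instead of a 500 error.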

Document the code

I was in a hurry so I didn't write a single documentation line :D

Facebook doesn't like this tool

So, I have been developing this tool for my own usage. Since Feb 2021 it has been deployed on my server and working but Facebook doesn't like this...

Facebook has been flagging the account I use with the tool. First it was blocked from accessing public profiles on the mbasic and mobile websites. Then it was banned temporarily, and I was asked to confirm the account with a phone number. After that it was flagged for suspicious activity, and a password change was required. Finally, today my account was suspended, and a manual review by the Facebook team is required.

Thus, and since I no longer have time to work on the tool because of my university, I decided to stop maintaining it.

By the way, the tool can still be used perfectly well without a Facebook account to access public pages and groups (for now). I doubt Facebook will issue an IP ban or anything similar, since the tool doesn't send many requests.

Find a way to get posts dates from Facebook

Currently, the generated RSS doesn't have any date information of fetched posts.

I think it would be better if it included the date of each post, but I'm not sure how this could be implemented, since different locales have different date formats.

More investigation is needed.
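The extraction side is the open question here, but the output side is locale-independent: RSS 2.0 expects pubDate in the RFC 822 style. Assuming a post's time were somehow available as a timezone-aware datetime (an assumption; the issue does not say how to obtain it), the standard library can format it. The helper name to_pub_date is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def to_pub_date(posted_at):
    """Format an aware datetime as an RSS pubDate string."""
    if posted_at.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return format_datetime(posted_at)


if __name__ == "__main__":
    print(to_pub_date(datetime(2021, 3, 13, 11, 34, 35, tzinfo=timezone.utc)))
```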

Password in command line? 2FA auth support?

Hi!

This is not really an issue but a feature request / potential improvement.
In the Usage section I see that the Facebook password has to be provided in plain text on the command line. That is not okay: unless you are careful, it will probably be saved to the shell history file. That is more or less acceptable on your own system, but still. If I were you, I would provide a way to prompt for the password, or at least a way to read it from somewhere else. 2FA support is the other thing I would like to mention.
Honestly this project is a very nice idea, though!
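One way the request could be met, sketched under assumptions: take the password from the command line if given, otherwise from the PASSWORD environment variable (the name used in the Docker examples), otherwise prompt without echo. The function resolve_password is hypothetical and not part of the tool.

```python
import getpass
import os


def resolve_password(cli_value=None, env=None, prompt=getpass.getpass):
    """Pick the password from CLI, then environment, then an interactive prompt."""
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    from_env = env.get("PASSWORD")
    if from_env:
        return from_env
    # getpass.getpass reads without echoing, so nothing lands in shell history.
    return prompt("Facebook password: ")


if __name__ == "__main__":
    password = resolve_password()
    print("Password received (%d characters)." % len(password))
```

The prompt is injected as a parameter so the function can be exercised without a terminal.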

TypeError: argument of type 'coroutine' is not iterable

Hello,
I have the following problem.


root@e492:~/facebook2rss# python3 -m facebook_rss --login -u k***[email protected] -p O******U
/root/facebook2rss/facebook_rss/browser/common/login.py:39: RuntimeWarning: coroutine 'BasePage.get_actual_url' was never awaited
  return bool(self._checkout_url in self.get_actual_url())
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/facebook2rss/facebook_rss/__main__.py", line 23, in <module>
    asyncio.run(login_and_get_cookies(args.email, args.password))
  File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/root/facebook2rss/facebook_rss/tasks/login.py", line 21, in login_and_get_cookies
    if await login_page.requires_2fa:
  File "/root/facebook2rss/facebook_rss/browser/common/login.py", line 39, in requires_2fa
    return bool(self._checkout_url in self.get_actual_url())
TypeError: argument of type 'coroutine' is not iterable
root@e492:~/facebook2rss#

any ideas?
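The warning and the TypeError point at the same line: get_actual_url is a coroutine function, and its result is used with `in` without being awaited. A minimal reproduction of the likely fix follows; the class below is a hypothetical stand-in, and the value of _checkout_url is made up for the example.

```python
import asyncio


class LoginPageSketch:
    """Hypothetical stand-in for the login page object in the traceback."""

    _checkout_url = "checkpoint"  # placeholder value, assumed for illustration

    def __init__(self, current_url):
        self._current_url = current_url

    async def get_actual_url(self):
        return self._current_url

    @property
    async def requires_2fa(self):
        # Await the coroutine first; `in` needs the resulting string.
        return self._checkout_url in await self.get_actual_url()


async def main():
    page = LoginPageSketch("https://example.invalid/checkpoint/?next")
    print(await page.requires_2fa)


if __name__ == "__main__":
    asyncio.run(main())
```

The caller keeps the same shape as in the traceback (`if await login_page.requires_2fa:`); only the inner call gains an await.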

Support other-than-English interfaces?

In class ProfilePage (file pages/mbasic/Profile.py), the posts are identified by the displayed text "Full Story":

def posts_selector(self):
    return '//a[contains(@href, "/story.php?") and contains(text(), "Full Story")]'

If the user's default interface language is not English, the text is localized according to Facebook's language setting, so it might not be "Full Story".
I've checked with a French interface, and the script works after adapting the line above as follows:

def posts_selector(self):
    return '//a[contains(@href, "/story.php?") and (contains(text(), "Full Story") or contains(text(), "Actualité intégrale"))]'

Obviously, this change does not scale to every language, so I'm not submitting a PR with this modification.
In the short term, I would accept this as a limitation.

Do you have any idea for long-term support?
Either we need to find a way to identify this part for all languages, or the identification should not rely on the displayed text.
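As a stopgap between the two options, the selector could be generated from a list of known translations rather than hard-coded. The sketch below only rebuilds the XPath string shown above; build_posts_selector and its label list are hypothetical, and labels containing double quotes are rejected rather than escaped.

```python
def build_posts_selector(labels):
    """Build the story-link XPath for any number of localized link texts."""
    if not labels:
        raise ValueError("at least one label is required")
    for label in labels:
        if '"' in label:
            raise ValueError("labels containing double quotes are not supported")
    conditions = " or ".join(f'contains(text(), "{label}")' for label in labels)
    return f'//a[contains(@href, "/story.php?") and ({conditions})]'


if __name__ == "__main__":
    print(build_posts_selector(["Full Story", "Actualité intégrale"]))
```

Dropping the text condition entirely and matching on the href alone would remove the language dependency, but whether that still selects only the intended links would need checking against the live pages.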

Trying containerizing

After my problems with running on Windows, I took a shot at containerizing the application in Docker.

Dockerfile:

FROM ubuntu:20.04
RUN apt-get update && \
    apt-get install --no-install-recommends -y \
        python3.9 \
        python3-pip \
        libglib2.0-0 \
        libnss3 \
        libnspr4 \
        libatk1.0-0 \
        libatk-bridge2.0-0 \
        libcups2 \
        libdbus-1-3 \
        libxcb1 \
        libdrm2 \
        libxkbcommon0 \
        libx11-6 \
        libxcomposite1 \
        libxdamage1 \
        libxext6 \
        libxfixes3 \
        libxrandr2 \
        libgbm1 \
        libgtk-3-0 \
        libpango-1.0-0 \
        libcairo2 \
        libgdk-pixbuf2.0-0 \
        libasound2 \
        libatspi2.0-0 \
        libxshmfence1 \
    && rm -rf /var/lib/apt/lists/*
	
WORKDIR /usr/src/app

COPY . .
RUN pip3 install .
RUN playwright install chromium

EXPOSE 8080
RUN chmod +x /usr/src/app/start.sh
ENTRYPOINT ["/usr/src/app/start.sh"]

start.sh:

#!/bin/bash

python3 -m  facebook_rss --login -u [login] -p [pass]
uvicorn facebook_rss.main:api --host 0.0.0.0 --port 8000

Any idea why the application hangs at:

$ docker run -p 8000:8000 facebook2rss
2021-02-20 09:06:24,158 [INFO] facebook_rss.db.session [session.<module>:26]: Feed table not found, creating one

I tried moving "python3 -m facebook_rss --login -u [login] -p [pass]" to a RUN step in the Dockerfile, with the same result.
The missing Playwright dependencies were found by trial and error. Fortunately, Playwright reports what it is missing.

NotImplementedError - can't connect to Facebook?

I don't know whether I am doing something wrong or have found an issue, so I would appreciate some help.
I ran this:
py -m facebook_rss --login --email "[email protected]" --password "secret_password"

And then this:
uvicorn.exe facebook_rss.main:api --port 8080

I went to http://localhost:8080/docs and entered a Facebook group ID.

After I clicked Execute, this showed up in the uvicorn log:

2021-02-22 22:02:11,131 [INFO] uvicorn.error [server.serve:64]: Started server process [7396]
2021-02-22 22:02:11,132 [INFO] uvicorn.error [on.startup:26]: Waiting for application startup.
2021-02-22 22:02:11,132 [INFO] fastapi [main.startup_event:44]: Performing startup tasks...
2021-02-22 22:02:11,133 [INFO] fastapi [main.startup_event:50]: Login cookies file was found, setting working mode to use account!
2021-02-22 22:02:11,133 [INFO] uvicorn.error [on.startup:38]: Application startup complete.
2021-02-22 22:02:11,133 [INFO] uvicorn.error [server._log_started_message:199]: Uvicorn running on http://127.0.0.1:8080 (Press CTRL+C to quit)
2021-02-22 22:02:27,854 [ERROR] asyncio [base_events.default_exception_handler:1707]: Task exception was never retrieved
future: <Task finished name='Task-6' coro=<Connection.run() done, defined at C:\Users\Przemek\AppData\Roaming\Python\Python38\site-packages\playwright\_impl\_connection.py:136> exception=NotImplementedError()>
Traceback (most recent call last):
  File "C:\Users\Przemek\AppData\Roaming\Python\Python38\site-packages\playwright\_impl\_connection.py", line 139, in run
    await self._transport.run()
  File "C:\Users\Przemek\AppData\Roaming\Python\Python38\site-packages\playwright\_impl\_transport.py", line 56, in run
    proc = await asyncio.create_subprocess_exec(
  File "c:\program files\python38\lib\asyncio\subprocess.py", line 236, in create_subprocess_exec
    transport, protocol = await loop.subprocess_exec(
  File "c:\program files\python38\lib\asyncio\base_events.py", line 1630, in subprocess_exec
    transport = await self._make_subprocess_transport(
  File "c:\program files\python38\lib\asyncio\base_events.py", line 491, in _make_subprocess_transport
    raise NotImplementedError
NotImplementedError

What can I do next to help resolve this?

Uvicorn conflict on Windows with Playwright

When I try to generate RSS from https://www.facebook.com/piwapodwiaduktem/

I get:

2021-02-19 14:22:57,099 [INFO] uvicorn.error [on.startup:26]: Waiting for application startup.
2021-02-19 14:22:57,099 [INFO] fastapi [main.startup_event:44]: Performing startup tasks...
2021-02-19 14:22:57,100 [INFO] fastapi [main.startup_event:50]: Login cookies file was found, setting working mode to use account!
2021-02-19 14:22:57,101 [INFO] uvicorn.error [on.startup:38]: Application startup complete.
2021-02-19 14:22:57,101 [INFO] uvicorn.error [server.startup:166]: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
2021-02-19 14:23:08,234 [INFO] uvicorn.access [h11_impl.send:456]: 127.0.0.1:65213 - "GET / HTTP/1.1" 404
2021-02-19 14:23:24,145 [INFO] uvicorn.access [h11_impl.send:456]: 127.0.0.1:65228 - "GET /docs HTTP/1.1" 200
2021-02-19 14:23:24,564 [INFO] uvicorn.access [h11_impl.send:456]: 127.0.0.1:65228 - "GET /openapi.json HTTP/1.1" 200
2021-02-19 14:24:02,268 [INFO] uvicorn.access [h11_impl.send:456]: 127.0.0.1:65279 - "GET /page/piwapodwiaduktem/ HTTP/1.1" 307
2021-02-19 14:24:02,282 [ERROR] asyncio [base_events.default_exception_handler:1738]: Task exception was never retrieved
future: <Task finished name='Task-12' coro=<Connection.run() done, defined at e:\priv_projects\facebook2rss\.venv\lib\site-packages\playwright\_impl\_connection.py:136> exception=NotImplementedError()>
Traceback (most recent call last):
  File "e:\priv_projects\facebook2rss\.venv\lib\site-packages\playwright\_impl\_connection.py", line 139, in run
    await self._transport.run()
  File "e:\priv_projects\facebook2rss\.venv\lib\site-packages\playwright\_impl\_transport.py", line 56, in run
    proc = await asyncio.create_subprocess_exec(
  File "c:\python39\lib\asyncio\subprocess.py", line 236, in create_subprocess_exec
    transport, protocol = await loop.subprocess_exec(
  File "c:\python39\lib\asyncio\base_events.py", line 1661, in subprocess_exec
    transport = await self._make_subprocess_transport(
  File "c:\python39\lib\asyncio\base_events.py", line 493, in _make_subprocess_transport
    raise NotImplementedError
NotImplementedError

The browser shows nothing; no response is received. Whatever is causing this error, it would be nice to see at least a status 500.
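Both Windows tracebacks end in _make_subprocess_transport raising NotImplementedError. On Windows, asyncio's selector event loop does not implement subprocess support, while the proactor event loop does, and Playwright needs to spawn its driver as a subprocess. A possible workaround, sketched under the assumption that the server ends up on a selector loop, is to select the proactor policy before the loop is created. The helper name is hypothetical, and whether uvicorn keeps this policy depends on its version and options.

```python
import asyncio
import sys


def use_proactor_loop_on_windows():
    """Select the proactor event loop policy on Windows; no-op elsewhere.

    Returns True when the policy was changed. Newer Python versions
    deprecate event loop policies, so treat this as a sketch.
    """
    if sys.platform != "win32":
        return False
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
    return True


if __name__ == "__main__":
    changed = use_proactor_loop_on_windows()
    print("Proactor policy selected." if changed else "Not on Windows; nothing changed.")
```

The call has to happen before the event loop that serves requests is created, for example at the top of a small launcher script that then starts the server programmatically.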

Add fast mode to get posts without details

On the mbasic Facebook site, it's possible to get posts from the page listing without opening each post, but this doesn't give the full post text. Currently, the tool gets the list of posts and then fetches each post's full text separately, which takes a long time. It would be a good idea to add an API parameter to switch between the two modes.

waiting for selector "//input[@name="email"]"

Banned by FB?


root@e492:~/facebook2rss# python3 facebook_rss --login -u k******[email protected] -p O*********U
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "facebook_rss/__main__.py", line 23, in <module>
    asyncio.run(login_and_get_cookies(args.email, args.password))
  File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/dist-packages/facebook_rss/tasks/login.py", line 18, in login_and_get_cookies
    logged_in = await login_page.login(user, password)
  File "/usr/local/lib/python3.8/dist-packages/facebook_rss/browser/common/login.py", line 30, in login
    await self.page.fill(self.email, email)
  File "/usr/local/lib/python3.8/dist-packages/playwright/async_api/_generated.py", line 7414, in fill
    raise e
  File "/usr/local/lib/python3.8/dist-packages/playwright/async_api/_generated.py", line 7403, in fill
    await self._impl_obj.fill(
  File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_page.py", line 636, in fill
    return await self._main_frame.fill(**locals_to_params(locals()))
  File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_frame.py", line 414, in fill
    await self._channel.send("fill", locals_to_params(locals()))
  File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_connection.py", line 36, in send
    return await self.inner_send(method, params, False)
  File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_connection.py", line 47, in inner_send
    result = await callback.future
playwright._impl._api_types.TimeoutError: Timeout 30000ms exceeded.
=========================== logs ===========================
waiting for selector "//input[@name="email"]"
============================================================
Note: use DEBUG=pw:api environment variable and rerun to capture Playwright logs.
