yshalsager / facebook2rss

Turn Facebook feeds into RSS!

License: GNU General Public License v3.0

Python 96.18% Shell 1.20% Dockerfile 2.62%
facebook fastapi playwright playwright-python rss-feed-scraper rss-generator

facebook2rss's Introduction

👋 Hi there! I'm Youssif, or better known as yshalsager on the internet :)

✨ Experienced Software Developer, Open Source Enthusiast and Technical Writer, with demonstrated expertise in building tools, websites, and chatbots! You can find more information about me here. I am also known for Android modding and related work: I have been tinkering with Android software development since 2014, building mods, custom recoveries, and custom ROMs for Android devices. You can find my Android-related work on the XDA forums.

facebook2rss's People

Contributors

dependabot-preview[bot], kbzowski, yshalsager

facebook2rss's Issues

Facebook doesn't like this tool

I have been developing this tool for my own use. It has been deployed on my server and working since Feb 2021, but Facebook doesn't like it...

Facebook has been flagging the account I use with the tool. First it was blocked from accessing public profiles on the mbasic and mobile websites. Then it was temporarily banned, and I was asked to confirm the account with a phone number. After that it was flagged for suspicious activity and a password change was required. Finally, today the account was suspended, pending a manual review by the Facebook team.

Because of this, and since university leaves me no time to work on the tool, I have decided to stop maintaining it.

By the way, the tool can still be used perfectly well without a Facebook account to access public pages and groups (for now). I doubt Facebook will issue an IP ban or anything similar, since the tool doesn't send many requests.

Add fast mode to get posts without details

On the mbasic Facebook site, it's possible to get posts from the page listing without opening each post, but this doesn't give the full post text. Currently, the tool gets the list of posts and then fetches each post's full text separately, which takes a long time. It would be a good idea to add an API parameter to switch between the two modes.
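
The mode switch could look roughly like the sketch below. It is not the project's code: `collect_posts`, `fetch_full_text`, and the `preview`/`url`/`text` keys are hypothetical names used only for illustration, and the real tool would call its browser layer where the stand-in fetcher is used here.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Post = Dict[str, str]


async def collect_posts(
    previews: List[Post],
    fetch_full_text: Callable[[str], Awaitable[str]],
    fast: bool = False,
) -> List[Post]:
    """Return posts with a "text" field.

    fast=True  -> reuse the (possibly truncated) preview from the listing.
    fast=False -> open every post separately to get its full text.
    """
    if fast:
        return [dict(post, text=post["preview"]) for post in previews]

    posts = []
    for post in previews:
        full_text = await fetch_full_text(post["url"])  # one extra request per post
        posts.append(dict(post, text=full_text))
    return posts


async def _fake_fetch(url: str) -> str:
    # Stand-in for a real page visit; hypothetical, for demonstration only.
    return f"full text of {url}"


listing = [{"url": "/story.php?id=1", "preview": "short..."}]
fast_posts = asyncio.run(collect_posts(listing, _fake_fetch, fast=True))
full_posts = asyncio.run(collect_posts(listing, _fake_fetch, fast=False))
```

In a web API, `fast` would typically be exposed as an optional query parameter that defaults to the current (slow, complete) behaviour.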

NotImplementedError - can't connect to Facebook?

I don't know if I am doing something wrong or if I have found an issue, so I would appreciate some help.
I ran this:
py -m facebook_rss --login --email "[email protected]" --password "secret_password"

And then this:
uvicorn.exe facebook_rss.main:api --port 8080

I went to http://localhost:8080/docs, entered a Facebook group ID, and clicked Execute. This showed up in the uvicorn log:

2021-02-22 22:02:11,131 [INFO] uvicorn.error [server.serve:64]: Started server process [7396]
2021-02-22 22:02:11,132 [INFO] uvicorn.error [on.startup:26]: Waiting for application startup.
2021-02-22 22:02:11,132 [INFO] fastapi [main.startup_event:44]: Performing startup tasks...
2021-02-22 22:02:11,133 [INFO] fastapi [main.startup_event:50]: Login cookies file was found, setting working mode to use account!
2021-02-22 22:02:11,133 [INFO] uvicorn.error [on.startup:38]: Application startup complete.
2021-02-22 22:02:11,133 [INFO] uvicorn.error [server._log_started_message:199]: Uvicorn running on http://127.0.0.1:8080 (Press CTRL+C to quit)
2021-02-22 22:02:27,854 [ERROR] asyncio [base_events.default_exception_handler:1707]: Task exception was never retrieved
future: <Task finished name='Task-6' coro=<Connection.run() done, defined at C:\Users\Przemek\AppData\Roaming\Python\Python38\site-packages\playwright\_impl\_connection.py:136> exception=NotImplementedError()>
Traceback (most recent call last):
  File "C:\Users\Przemek\AppData\Roaming\Python\Python38\site-packages\playwright\_impl\_connection.py", line 139, in run
    await self._transport.run()
  File "C:\Users\Przemek\AppData\Roaming\Python\Python38\site-packages\playwright\_impl\_transport.py", line 56, in run
    proc = await asyncio.create_subprocess_exec(
  File "c:\program files\python38\lib\asyncio\subprocess.py", line 236, in create_subprocess_exec
    transport, protocol = await loop.subprocess_exec(
  File "c:\program files\python38\lib\asyncio\base_events.py", line 1630, in subprocess_exec
    transport = await self._make_subprocess_transport(
  File "c:\program files\python38\lib\asyncio\base_events.py", line 491, in _make_subprocess_transport
    raise NotImplementedError
NotImplementedError

What can I do next to help resolve this?

Trying containerizing

After my problems with running on Windows, I took a shot at containerizing the application in Docker.

Dockerfile:

FROM ubuntu:20.04
RUN apt-get update && \
    apt-get install --no-install-recommends -y \
        python3.9 \
        python3-pip \
        libglib2.0-0 \
        libnss3 \
        libnspr4 \
        libatk1.0-0 \
        libatk-bridge2.0-0 \
        libcups2 \
        libdbus-1-3 \
        libxcb1 \
        libdrm2 \
        libxkbcommon0 \
        libx11-6 \
        libxcomposite1 \
        libxdamage1 \
        libxext6 \
        libxfixes3 \
        libxrandr2 \
        libgbm1 \
        libgtk-3-0 \
        libpango-1.0-0 \
        libcairo2 \
        libgdk-pixbuf2.0-0 \
        libasound2 \
        libatspi2.0-0 \
        libxshmfence1 \
        && rm -rf /var/lib/apt/lists/*
	
WORKDIR /usr/src/app

COPY . .
RUN pip3 install .
RUN playwright install chromium

EXPOSE 8080
RUN chmod +x /usr/src/app/start.sh
ENTRYPOINT ["/usr/src/app/start.sh"]

start.sh:

#!/bin/bash

python3 -m  facebook_rss --login -u [login] -p [pass]
uvicorn facebook_rss.main:api --host 0.0.0.0 --port 8000

Any idea why the application hangs at:

$ docker run -p 8000:8000 facebook2rss
2021-02-20 09:06:24,158 [INFO] facebook_rss.db.session [session.<module>:26]: Feed table not found, creating one

I tried moving "python3 -m facebook_rss --login -u [login] -p [pass]" to a RUN step in the Dockerfile, with the same result.
The missing Playwright dependencies were deduced by trial and error. Fortunately, Playwright reports what it is missing.

list index out of range

Built and run from Docker on Linux.

facebook2rss    | 2021-03-13 11:34:35,279 [INFO] uvicorn.error [server.serve:64]: Started server process [7]
facebook2rss    | 2021-03-13 11:34:35,280 [INFO] uvicorn.error [on.startup:26]: Waiting for application startup.
facebook2rss    | 2021-03-13 11:34:35,280 [INFO] fastapi [main.startup_event:44]: Performing startup tasks...
facebook2rss    | 2021-03-13 11:34:35,280 [INFO] fastapi [main.startup_event:47]: Login cookies file was not found, setting working mode to no account!
facebook2rss    | 2021-03-13 11:34:35,280 [INFO] uvicorn.error [on.startup:38]: Application startup complete.
facebook2rss    | 2021-03-13 11:34:35,283 [INFO] uvicorn.error [server._log_started_message:199]: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
facebook2rss    | 2021-03-13 11:34:49,463 [INFO] browser [browser.get_browser:61]: Browser launched with saved cookies.
facebook2rss    | 2021-03-13 11:34:49,472 [INFO] cache [decorators.wrapper:21]: No valid cached feed found for piwapodwiaduktem, getting a new feed...
facebook2rss    | 2021-03-13 11:34:51,646 [INFO] uvicorn.access [httptools_impl.send:463]: 10.0.1.43:52404 - "GET /page/piwapodwiaduktem HTTP/1.0" 500
facebook2rss    | 2021-03-13 11:34:51,704 [INFO] browser [browser.get_browser:69]: Browser shutdown.
facebook2rss    | 2021-03-13 11:34:51,704 [ERROR] uvicorn.error [httptools_impl.run_asgi:401]: Exception in ASGI application
facebook2rss    | Traceback (most recent call last):
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 398, in run_asgi
facebook2rss    |     result = await app(self.scope, self.receive, self.send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
facebook2rss    |     return await self.app(scope, receive, send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py", line 65, in __call__
facebook2rss    |     raise exc from None
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py", line 61, in __call__
facebook2rss    |     await self.app(scope, inner_receive, inner_send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/fastapi/applications.py", line 199, in __call__
facebook2rss    |     await super().__call__(scope, receive, send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/applications.py", line 111, in __call__
facebook2rss    |     await self.middleware_stack(scope, receive, send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
facebook2rss    |     raise exc from None
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
facebook2rss    |     await self.app(scope, receive, _send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
facebook2rss    |     raise exc from None
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
facebook2rss    |     await self.app(scope, receive, sender)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/routing.py", line 566, in __call__
facebook2rss    |     await route.handle(scope, receive, send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/routing.py", line 227, in handle
facebook2rss    |     await self.app(scope, receive, send)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/starlette/routing.py", line 41, in app
facebook2rss    |     response = await func(request)
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/fastapi/routing.py", line 201, in app
facebook2rss    |     raw_response = await run_endpoint_function(
facebook2rss    |   File "/opt/facebook2rss/.venv/lib/python3.8/site-packages/fastapi/routing.py", line 148, in run_endpoint_function
facebook2rss    |     return await dependant.call(**values)
facebook2rss    |   File "./facebook_rss/utils/decorators.py", line 50, in wrapper
facebook2rss    |     value = await func(*args, **kwargs)
facebook2rss    |   File "./facebook_rss/utils/decorators.py", line 22, in wrapper
facebook2rss    |     value = await func(*args, **kwargs)
facebook2rss    |   File "./facebook_rss/routes/page.py", line 33, in get_page
facebook2rss    |     posts: List[Post] = await _page.get_posts(
facebook2rss    |   File "./facebook_rss/browser/common/fb_page.py", line 133, in get_posts
facebook2rss    |     post_url = await posts[item].get_attribute('href')
facebook2rss    | IndexError: list index out of range
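
Judging only from the last frame of the traceback, `posts[item]` is indexed past the end of the list, which could happen if the page returned fewer matching links than the loop expected (possibly none at all in no-account mode). That reading is an assumption, since the body of `get_posts` is not shown. A minimal sketch of a defensive pattern, with `take_available` as a hypothetical helper that is not part of the project:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def take_available(items: Sequence[T], limit: int) -> List[T]:
    """Return at most `limit` items, never indexing past the end.

    Slicing is safe on short or empty sequences, unlike items[i]
    inside `for i in range(limit)`.
    """
    return list(items[: max(limit, 0)])


# Stand-ins for the element handles a selector query would return.
found_links = ["post-1", "post-2"]
first_ten = take_available(found_links, 10)  # only two exist
none_found = take_available([], 10)          # empty page, no exception
```

With this shape, an empty result becomes an empty feed (or an explicit "no posts found" error) instead of an unhandled 500.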

Find a way to get posts dates from Facebook

Currently, the generated RSS doesn't include any date information for fetched posts.

I think it would be better if it had the date of each post, but I'm not sure how this could be implemented, since different locales use different date formats.

More investigation is needed.

Password in command line? 2FA auth support?

Hi!

This is not really an issue but a feature request / potential improvement:
In the Usage section I see that the Facebook password has to be provided in plain text on the command line. That is not okay: unless you are careful, it will probably be saved to the shell history file, which is tolerable on your own system, but still. If I were you, I would provide a way to prompt for the password, or at least a way to read it from somewhere else. 2FA support is the other thing I would like to mention.
Honestly, this project is a very nice idea, though!
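
One way to keep the password out of the shell history is to read it from an environment variable and fall back to an interactive prompt. This is only a sketch of the idea: `read_password` and the `FB_PASSWORD` variable name are hypothetical and do not exist in the project. `getpass.getpass` is the standard-library prompt that does not echo input.

```python
import getpass
import os
from typing import Callable


def read_password(
    env_var: str = "FB_PASSWORD",  # hypothetical variable name
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return the password from the environment, or ask for it without echo."""
    value = os.environ.get(env_var)
    if value:
        return value
    return prompt("Facebook password: ")
```

The CLI could then make `--password` optional and call `read_password()` when the flag is absent, so the secret never appears in the command line.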

Uvicorn conflict on Windows with Playwright

When I try to make an RSS feed from https://www.facebook.com/piwapodwiaduktem/

I get:

2021-02-19 14:22:57,099 [INFO] uvicorn.error [on.startup:26]: Waiting for application startup.
2021-02-19 14:22:57,099 [INFO] fastapi [main.startup_event:44]: Performing startup tasks...
2021-02-19 14:22:57,100 [INFO] fastapi [main.startup_event:50]: Login cookies file was found, setting working mode to use account!
2021-02-19 14:22:57,101 [INFO] uvicorn.error [on.startup:38]: Application startup complete.
2021-02-19 14:22:57,101 [INFO] uvicorn.error [server.startup:166]: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
2021-02-19 14:23:08,234 [INFO] uvicorn.access [h11_impl.send:456]: 127.0.0.1:65213 - "GET / HTTP/1.1" 404
2021-02-19 14:23:24,145 [INFO] uvicorn.access [h11_impl.send:456]: 127.0.0.1:65228 - "GET /docs HTTP/1.1" 200
2021-02-19 14:23:24,564 [INFO] uvicorn.access [h11_impl.send:456]: 127.0.0.1:65228 - "GET /openapi.json HTTP/1.1" 200
2021-02-19 14:24:02,268 [INFO] uvicorn.access [h11_impl.send:456]: 127.0.0.1:65279 - "GET /page/piwapodwiaduktem/ HTTP/1.1" 307
2021-02-19 14:24:02,282 [ERROR] asyncio [base_events.default_exception_handler:1738]: Task exception was never retrieved
future: <Task finished name='Task-12' coro=<Connection.run() done, defined at e:\priv_projects\facebook2rss\.venv\lib\site-packages\playwright\_impl\_connection.py:136> exception=NotImplementedError()>
Traceback (most recent call last):
  File "e:\priv_projects\facebook2rss\.venv\lib\site-packages\playwright\_impl\_connection.py", line 139, in run
    await self._transport.run()
  File "e:\priv_projects\facebook2rss\.venv\lib\site-packages\playwright\_impl\_transport.py", line 56, in run
    proc = await asyncio.create_subprocess_exec(
  File "c:\python39\lib\asyncio\subprocess.py", line 236, in create_subprocess_exec
    transport, protocol = await loop.subprocess_exec(
  File "c:\python39\lib\asyncio\base_events.py", line 1661, in subprocess_exec
    transport = await self._make_subprocess_transport(
  File "c:\python39\lib\asyncio\base_events.py", line 493, in _make_subprocess_transport
    raise NotImplementedError
NotImplementedError

The browser shows nothing; no response is received. Whatever is causing this error, it would be nice to at least see a status 500.
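
The traceback ends in `_make_subprocess_transport`, which raises `NotImplementedError` when the running event loop has no subprocess support. On Windows, asyncio's selector event loop cannot spawn subprocesses, while the proactor event loop can. A plausible reading, stated as an assumption rather than a confirmed diagnosis, is that the server ends up on a selector loop and Playwright then fails to start its driver process. The sketch below shows how a proactor policy can be selected before any loop is created; `use_proactor_loop_on_windows` and `can_spawn_subprocess` are hypothetical helper names, and whether the server overrides the policy again afterwards depends on the uvicorn version and how it is launched.

```python
import asyncio
import sys


def use_proactor_loop_on_windows() -> bool:
    """Select the proactor event loop policy on Windows.

    Returns True if the policy was changed, False on other platforms.
    Must run before the event loop is created.
    """
    if sys.platform != "win32":
        return False
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
    return True


async def can_spawn_subprocess() -> bool:
    """Check whether the running loop supports asyncio subprocesses."""
    try:
        proc = await asyncio.create_subprocess_exec(sys.executable, "-c", "pass")
    except NotImplementedError:
        return False  # the failure mode shown in the traceback above
    await proc.wait()
    return proc.returncode == 0


use_proactor_loop_on_windows()
subprocess_ok = asyncio.run(can_spawn_subprocess())
```

Running the check inside the application's own loop (for example from a startup hook) would show whether that loop, rather than a fresh one, supports subprocesses.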

TypeError: argument of type 'coroutine' is not iterable

Hello,
I have the following problem.


root@e492:~/facebook2rss# python3 -m facebook_rss --login -u k***[email protected] -p O******U
/root/facebook2rss/facebook_rss/browser/common/login.py:39: RuntimeWarning: coroutine 'BasePage.get_actual_url' was never awaited
  return bool(self._checkout_url in self.get_actual_url())
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/facebook2rss/facebook_rss/__main__.py", line 23, in <module>
    asyncio.run(login_and_get_cookies(args.email, args.password))
  File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/root/facebook2rss/facebook_rss/tasks/login.py", line 21, in login_and_get_cookies
    if await login_page.requires_2fa:
  File "/root/facebook2rss/facebook_rss/browser/common/login.py", line 39, in requires_2fa
    return bool(self._checkout_url in self.get_actual_url())
TypeError: argument of type 'coroutine' is not iterable
root@e492:~/facebook2rss#

any ideas?
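
The warning and the error point at the same line: `self.get_actual_url()` is a coroutine function, so calling it without `await` yields a coroutine object, and the `in` test cannot search inside one. Awaiting the call first gives the URL string. The class below is a stand-in written for illustration, not the project's `login.py`; the `_checkout_url` value and the constructor are assumptions.

```python
import asyncio


class LoginPageSketch:
    """Minimal stand-in for the page object in the traceback."""

    _checkout_url = "/checkpoint/"  # assumed value, for illustration only

    def __init__(self, current_url: str) -> None:
        self._current_url = current_url

    async def get_actual_url(self) -> str:
        return self._current_url

    @property
    async def requires_2fa(self) -> bool:
        # Await the coroutine before testing membership in the URL string.
        return self._checkout_url in await self.get_actual_url()


async def main() -> tuple:
    needs_2fa = await LoginPageSketch("https://example.test/checkpoint/?next").requires_2fa
    no_2fa = await LoginPageSketch("https://example.test/home").requires_2fa
    return needs_2fa, no_2fa


result = asyncio.run(main())
```

The caller in the traceback already uses `await login_page.requires_2fa`, so only the property body needs the extra `await`.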

Document the code

I was in a hurry, so I didn't write a single line of documentation :D

Support other-than-English interfaces?

In the class ProfilePage (file pages/mbasic/Profile.py), posts are identified by the displayed text "Full Story":

def posts_selector(self):
    return '//a[contains(@href, "/story.php?") and contains(text(), "Full Story")]'

If the user's default interface language is not English, the text is localized according to the language set in Facebook's settings, so it might not be "Full Story".
I've checked with a French interface, and the script works after adapting the line as follows:

def posts_selector(self):
    return '//a[contains(@href, "/story.php?") and (contains(text(), "Full Story") or contains(text(), "Actualité intégrale"))]'

Obviously, this change does not scale to every language, so I'm not submitting a PR with this modification.
In the short term, I would accept this as a limitation.

Do you have any ideas for long-term support?
Either we need to find a way to identify this link for all languages, or the identification should not rely on the displayed text.
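
As a middle ground between hard-coding one language and dropping the text check entirely, the selector could be generated from a list of known labels. This is a sketch only: the free function `posts_selector` with a `labels` parameter is a hypothetical variant of the method quoted above, and the label list would have to be maintained by hand or loaded from configuration.

```python
from typing import Sequence


def posts_selector(
    labels: Sequence[str] = ("Full Story", "Actualité intégrale"),
) -> str:
    """Build the XPath for post links, accepting any of the given labels."""
    text_checks = " or ".join(f'contains(text(), "{label}")' for label in labels)
    return f'//a[contains(@href, "/story.php?") and ({text_checks})]'


english_only = posts_selector(("Full Story",))
english_and_french = posts_selector()
```

Matching on the `href` alone would remove the language dependency, but it might also match other links that point to `/story.php?`, so that variant would need checking against real pages.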

waiting for selector "//input[@name="email"]"

Banned by Facebook?


root@e492:~/facebook2rss# python3 facebook_rss --login -u k******[email protected] -p O*********U
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "facebook_rss/__main__.py", line 23, in <module>
    asyncio.run(login_and_get_cookies(args.email, args.password))
  File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/dist-packages/facebook_rss/tasks/login.py", line 18, in login_and_get_cookies
    logged_in = await login_page.login(user, password)
  File "/usr/local/lib/python3.8/dist-packages/facebook_rss/browser/common/login.py", line 30, in login
    await self.page.fill(self.email, email)
  File "/usr/local/lib/python3.8/dist-packages/playwright/async_api/_generated.py", line 7414, in fill
    raise e
  File "/usr/local/lib/python3.8/dist-packages/playwright/async_api/_generated.py", line 7403, in fill
    await self._impl_obj.fill(
  File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_page.py", line 636, in fill
    return await self._main_frame.fill(**locals_to_params(locals()))
  File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_frame.py", line 414, in fill
    await self._channel.send("fill", locals_to_params(locals()))
  File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_connection.py", line 36, in send
    return await self.inner_send(method, params, False)
  File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_connection.py", line 47, in inner_send
    result = await callback.future
playwright._impl._api_types.TimeoutError: Timeout 30000ms exceeded.
=========================== logs ===========================
waiting for selector "//input[@name="email"]"
============================================================
Note: use DEBUG=pw:api environment variable and rerun to capture Playwright logs.
