luke-in-the-sky / telegram_digest

Some Telegram chats are way too active for me to keep up with. Instead, let's pull the last N days of messages from these chats, summarize them, and send a digest.

License: GNU General Public License v3.0

Python 100.00%

telegram_digest's Introduction

telegram_digest

How to

Set a value for every setting that does not have a default in AppConfig (config.py):

  1. TELEGRAM_BOT_TOKEN: str
  2. TELEGRAM_API_HASH: str
  3. TELEGRAM_API_ID: str
  4. TELEGRAM_SESSION_STRING: str
  5. POE_PB_TOKEN: str
  6. POE_CHAT_CODE: str

These can be set in either of the following ways:

  1. as environment variables (e.g. via export), including secrets (such as GitHub secrets) that are exposed as environment variables
  2. in a conf.env file

Then run:

$ pip install -r requirements.txt
$ python telegram_digest/main.py

v1

V1 can take arbitrary-length input and uses a refine-summary strategy to summarize.

  1. Telegram setup: use individual credentials (not a bot), so we can get the full history
  2. LLM: use Poe, so we can quickly try different LLMs
  3. summarization: implemented a refine strategy
    1. splits input into batches, each having at most max_token tokens
    2. iteratively generates a summary (refine-style)
  4. config loading: use pydantic_settings.BaseSettings to load settings from environment variables (e.g. GitHub secrets) or from a file
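The refine strategy described above can be sketched as follows. This is a minimal illustration, not the code in llm.py: ask_llm is a hypothetical stand-in for the Poe call, the prompt texts are made up, and the input is assumed to be already split into batches.

```python
from typing import Callable, Iterable, Optional

# Hypothetical prompts, for illustration only.
INITIAL_PROMPT = "Summarize the following chat messages:\n\n{text}"
REFINE_PROMPT = (
    "Here is an existing summary of a chat:\n\n{summary}\n\n"
    "Refine it using these additional messages:\n\n{text}"
)


def refine_summarize(batches: Iterable[str], ask_llm: Callable[[str], str]) -> str:
    """Summarize the first batch, then fold each following batch into the running summary."""
    summary: Optional[str] = None
    for batch in batches:
        if summary is None:
            prompt = INITIAL_PROMPT.format(text=batch)
        else:
            prompt = REFINE_PROMPT.format(summary=summary, text=batch)
        summary = ask_llm(prompt)  # one sequential LLM call per batch
    return summary if summary is not None else ""
```

Because every call depends on the previous summary, the refine loop is inherently sequential: N batches cost N LLM calls that cannot run in parallel.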

TODO

  1. summary quality; two issues:
    1. no metric to measure quality of one summary vs another
    2. no stability: even the same input against the same poe bot will give different summaries
  2. experiment with different bots
  3. experiment with different thread representations
    1. add "Reply to ..." markers to identify replies
    2. represent the reply-chains in a more structured form (eg all replies in the same chain are collected and represented together, instead of interleaved in the main thread)
  4. interactive: host the bot on Heroku / Fly.io, so I can interact with it via Telegram
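For the reply-chain TODO, one possible approach is sketched below: follow each message's reply pointer up to the root of its chain and collect every message under that root. ChatMessage, group_reply_chains and render_threads are hypothetical names; a reply to a message outside the fetched window is treated as the start of its own thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ChatMessage:
    id: int
    sender: str
    text: str
    reply_to_id: Optional[int] = None  # id of the message this one replies to, if any


def group_reply_chains(messages: List[ChatMessage]) -> List[List[ChatMessage]]:
    """Group messages by the root of their reply chain, keeping chronological order."""
    by_id = {m.id: m for m in messages}

    def root_of(m: ChatMessage) -> int:
        seen = set()
        # Walk up the chain; stop at a message outside the window or on a cycle.
        while m.reply_to_id is not None and m.reply_to_id in by_id and m.id not in seen:
            seen.add(m.id)
            m = by_id[m.reply_to_id]
        return m.id

    threads: Dict[int, List[ChatMessage]] = {}
    for m in messages:
        threads.setdefault(root_of(m), []).append(m)
    return list(threads.values())  # dicts keep insertion order


def render_threads(threads: List[List[ChatMessage]]) -> str:
    """Render each thread as a contiguous block instead of interleaving it in the main chat."""
    return "\n---\n".join(
        "\n".join(f"[{m.sender}] {m.text}" for m in thread) for thread in threads
    )
```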

Code walkthrough

  1. main.py is the entry point.
  2. telegram_bot.py handles creating the Telegram client (TelegramBotBuilder), pulling history and sending messages (TelegramBot), and munging message data (TelegramMessagesParsing)
  3. llm.py handles interfacing with Poe (sending messages, defining prompts) and has helpers for splitting the text into batches that fit into the context (TextBatcher)
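A TextBatcher-style helper can be sketched as a greedy packer that keeps whole messages (lines) together. This is an assumption about how such a helper might work, not the repository's implementation; batch_lines is a hypothetical name, and the default token counter is a whitespace word count, a rough stand-in for the model's real tokenizer.

```python
from typing import Callable, List


def word_count(text: str) -> int:
    """Rough token estimate: number of whitespace-separated words."""
    return len(text.split())


def batch_lines(
    lines: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = word_count,
) -> List[str]:
    """Greedily pack whole lines into batches of at most max_tokens tokens.

    A single line longer than max_tokens becomes its own (oversized) batch
    rather than being cut mid-message.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_tokens = 0
    for line in lines:
        n = count_tokens(line)
        # Start a new batch if adding this line would overflow the current one.
        if current and current_tokens + n > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += n
    if current:
        batches.append(current)
    return ["\n".join(batch) for batch in batches]
```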

Lessons learned

  1. Telegram interface
    1. Telethon is the library to use
    2. You can interface as your own user or as a bot.
      1. my account --> bot: I first wanted to act as myself, then I discovered bots, which have a simpler API
      2. bot --> myself: then I discovered that bots can only see a conversation once they are added to it, and even then they only see messages sent after they were added
      3. [?] myself --> bot: having a bot is nice because you can interact with it (e.g. pass different config arguments) and it is clearer who is doing what; see
        1. https://medium.com/hyperskill/telegram-conversation-summarizer-bot-with-chatgpt-and-flask-quart-bb2e19884c
        2. https://github.com/yellalena/telegram-gpt-summarizer/blob/92ee101ba3b2633560e65049e8e14d4851a88bc1/main.py#L28
  2. Summarization
    1. strategies: LangChain describes three summarization strategies: stuff (put everything in the prompt), map-reduce, and refine.
    2. metrics: it's unclear how to measure quality. With a reference summary you can measure similarity to it, but without one, metrics might not be very reliable: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00417/107833/A-Statistical-Analysis-of-Summarization-Evaluation
  3. pydantic_settings.BaseSettings is very useful for loading config from environment variables and files.
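For comparison with the refine loop used in v1, a map-reduce summarizer can be sketched as below: each batch is summarized independently, then the partial summaries are merged in a final call. ask_llm and the prompts are hypothetical placeholders; this is not LangChain's API.

```python
from typing import Callable, List

# Hypothetical prompts, for illustration only.
MAP_PROMPT = "Summarize these chat messages:\n\n{text}"
REDUCE_PROMPT = "Combine these partial summaries into a single digest:\n\n{text}"


def map_reduce_summarize(batches: List[str], ask_llm: Callable[[str], str]) -> str:
    if not batches:
        return ""
    # Map: summarize every batch independently; these calls do not depend on each other.
    partials = [ask_llm(MAP_PROMPT.format(text=batch)) for batch in batches]
    if len(partials) == 1:
        return partials[0]
    # Reduce: merge the partial summaries with one more call.
    return ask_llm(REDUCE_PROMPT.format(text="\n\n".join(partials)))
```

This assumes the joined partial summaries fit into a single context window; if they do not, the reduce step would itself need batching.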

telegram_digest's People

Contributors

luke-in-the-sky, sweep-ai[bot]

telegram_digest's Issues

Sweep: tests for llm.py

Details

Unit tests
Write unit tests for the file telegram_digest/llm.py using the pytest framework. Put the tests in the folder telegram_digest/tests, creating it if it does not exist. Make sure to run the new unit tests.

Sweep: add pydantic model for message

Details

Add the following code to the Telegram bot file (telegram_digest/telegram_bot.py):

from pydantic import BaseModel

class Message(BaseModel):
    sender_name: str
    media: str | None = None
    text: str | None = None

    @classmethod
    def from_telethon_message(cls, message):
        # Extract the sender's name; message.sender can be None, so fall back to <NoName>
        first_name = getattr(message.sender, 'first_name', None)
        sender_name = f'[{first_name or "<NoName>"}]'

        # Determine the type of attached media, e.g. MessageMediaPhoto -> <Photo>
        if message.media:
            media_class_name = type(message.media).__name__
            media_type = f"<{media_class_name.replace('MessageMedia', '')}>"
        else:
            media_type = None

        # Extract the message text; treat an empty string (or None) as "no text"
        message_text = message.message or None

        return cls(sender_name=sender_name, media=media_type, text=message_text)

    def to_str(self):
        # Join the parts that are present, e.g. "[Ann] <Photo> look at this"
        return ' '.join(x for x in [self.sender_name, self.media, self.text] if x is not None)

Then edit the message-parsing class (TelegramMessagesParsing) so that it uses this new Message model class to parse Telegram messages.

Sweep: add reply rendering

Details

Features: add a method to the TelegramMessagesParsing class that takes a msg as input (formatted as JSON) and returns a string built by concatenating:

  1. If msg.reply_to_message is non-null:
    A. The text "Reply To"
    B. The text in msg.reply_to_message.from
    C. The text in msg.reply_to_message.text, truncated to 100 characters and wrapped in triple quotes, if there is a reply text; otherwise an empty string.
  2. The text in msg.text
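A minimal sketch of the requested rendering, written as a free function that could become a method on TelegramMessagesParsing. It assumes Bot-API-style JSON, where reply_to_message.from is a User object (so the first name is used, falling back to the username); render_with_reply and the <NoName> fallback are illustrative choices, not part of the issue.

```python
from typing import Any, Dict


def render_with_reply(msg: Dict[str, Any], max_quote: int = 100) -> str:
    """Prefix a message with "Reply To <sender> <quoted text>" when it is a reply."""
    parts = []
    reply = msg.get("reply_to_message")
    if reply is not None:
        sender = reply.get("from")
        # Assumption: "from" is a User dict; a plain string is used as-is.
        if isinstance(sender, dict):
            sender = sender.get("first_name") or sender.get("username")
        name = str(sender) if sender else "<NoName>"
        quoted = (reply.get("text") or "")[:max_quote]
        parts.append("Reply To")
        parts.append(name)
        parts.append(f'"""{quoted}"""' if quoted else "")
    parts.append(msg.get("text") or "")
    # Skip empty pieces so there are no double spaces.
    return " ".join(p for p in parts if p)
```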
Checklist
  • Modify telegram_digest/telegram_bot.py (f1cb130)
  • Running GitHub Actions for telegram_digest/telegram_bot.py

Sweep: handle None in sender

Details

File "/home/runner/work/telegram_digest/telegram_digest/telegram_digest/telegram_bot.py", line 169, in from_sender_id_to_name
> name = entity.first_name or entity.username

We need to handle the case where entity is None: if so, just do name = "<NoName>"
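A sketch of the fix, assuming entity is whatever Telethon returned for the sender id (possibly None). sender_display_name is a hypothetical helper; the title fallback for channels and group chats, which have no first_name, goes beyond what the issue asks for.

```python
from typing import Any, Optional


def sender_display_name(entity: Optional[Any]) -> str:
    """Best-effort display name for a sender entity; "<NoName>" when unknown."""
    if entity is None:
        return "<NoName>"
    # Users have first_name/username; channels and group chats have a title instead.
    for attr in ("first_name", "username", "title"):
        value = getattr(entity, attr, None)
        if value:
            return value
    return "<NoName>"
```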

Checklist
  • Modify telegram_digest/telegram_bot.py (7f8e825)
  • Running GitHub Actions for telegram_digest/telegram_bot.py

Render non-msg content

Some messages have no text and instead reference objects such as polls. At the moment we simply discard messages without text; instead, we should render them, for instance using the poll's title, so that the summarizer can understand what is being referenced and what people are interacting with.
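One way to approach this, sketched under assumptions: Telethon exposes a poll as a MessageMediaPoll whose .poll has a .question (a plain string in older Telegram API layers, a TextWithEntities object with a .text attribute in newer ones). render_media is a hypothetical helper; other media types fall back to their class name, mirroring the <Photo>-style tags used in the Message model.

```python
from typing import Any, Optional


def render_media(media: Any) -> Optional[str]:
    """Turn non-text media into a short placeholder the summarizer can read."""
    if media is None:
        return None
    # e.g. MessageMediaPoll -> Poll, MessageMediaPhoto -> Photo
    kind = type(media).__name__.replace("MessageMedia", "")
    poll = getattr(media, "poll", None)
    if poll is not None:
        question = getattr(poll, "question", "")
        # Assumption: newer layers wrap the question in an object with .text.
        question = getattr(question, "text", question)
        return f"<{kind}: {question}>"
    return f"<{kind}>"
```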
