Comments (4)

Cephian commented on August 29, 2024

@anoadragon453 Let me know what you think of this when you get the chance

from busty.

anoadragon453 commented on August 29, 2024

I think at this point we're starting to exhaust the amount of information we can easily encode in a filename, and we may want to start looking at more structured data on disk in order to make this easier on ourselves.

What about adding a bust.json file, containing something like the following:

{
  "attachments": {
    "<message snowflake>.<attachment id>": {
      "filename": "<song filename>",
      "busts_ago": 0
    },
    ...
  }
}

This could also exist as an in-memory dictionary, but this way the data persists across restarts.

Filenames would still be the original filenames from Discord, but we could just truncate them if >255 chars.
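
As a rough illustration of the truncation idea, here is a minimal Python sketch. The helper name `truncate_filename` and the constant `MAX_NAME_LEN` are hypothetical, and the 255 limit is assumed to be counted in UTF-8 bytes, which is common but not universal across filesystems.

```python
from pathlib import PurePath

# Assumption: the filesystem limits a single filename to 255 bytes.
MAX_NAME_LEN = 255


def truncate_filename(name: str, limit: int = MAX_NAME_LEN) -> str:
    """Shorten `name` to at most `limit` UTF-8 bytes, keeping the extension."""
    if len(name.encode("utf-8")) <= limit:
        return name
    suffix = PurePath(name).suffix  # e.g. ".mp3", or "" if there is none
    stem = name[: len(name) - len(suffix)] if suffix else name
    budget = max(0, limit - len(suffix.encode("utf-8")))
    # Cut on a byte budget; errors="ignore" drops a trailing partial character.
    short_stem = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return short_stem + suffix
```

Note that truncation alone does not make names unique: two different uploads can still end up with the same name on disk.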

Every time a song is queued for download, create a new entry in "attachments". If a matching entry already exists, reset busts_ago to zero and load the associated file. For cleaning up old attachments, every time a bust finishes (or the bot is exiting), we increment the "busts_ago" value for every attachment. If busts_ago for an attachment is >X, delete it from the JSON and delete the corresponding file from disk.
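
The bookkeeping described above could look roughly like the following Python sketch. All function names here (`load_index`, `save_index`, `touch_attachment`, `age_and_purge`) and the value chosen for X (`MAX_BUSTS_AGO`) are hypothetical, not existing busty code.

```python
import json
from pathlib import Path

MAX_BUSTS_AGO = 2  # hypothetical value for "X"


def load_index(index_path: Path) -> dict:
    """Read bust.json, or start with an empty index if it does not exist."""
    if index_path.exists():
        return json.loads(index_path.read_text())
    return {"attachments": {}}


def save_index(index_path: Path, index: dict) -> None:
    index_path.write_text(json.dumps(index, indent=2))


def touch_attachment(index: dict, key: str, filename: str) -> bool:
    """Record that a song was queued. Returns True if it was already cached."""
    entry = index["attachments"].get(key)
    if entry is not None:
        entry["busts_ago"] = 0  # cache hit: reset its age
        return True
    index["attachments"][key] = {"filename": filename, "busts_ago": 0}
    return False


def age_and_purge(
    index: dict, attachment_dir: Path, max_busts_ago: int = MAX_BUSTS_AGO
) -> list[str]:
    """Age every entry by one bust and delete those older than the limit."""
    removed = []
    for key, entry in list(index["attachments"].items()):
        entry["busts_ago"] += 1
        if entry["busts_ago"] > max_busts_ago:
            (attachment_dir / entry["filename"]).unlink(missing_ok=True)
            del index["attachments"][key]
            removed.append(key)
    return removed
```

In this sketch `touch_attachment` would be called for each song as it is queued, and `age_and_purge` once when a bust finishes or the bot exits, followed by `save_index`.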

If a user deletes and re-uploads their track between two !lists (thus generating a new snowflake), then I think we just take the hit and re-download the song. They probably changed the file anyways.

While this does involve adding a new file to manage, I think it's simpler to reason about.

Cephian commented on August 29, 2024

The reason I suggested using <message_id>.<attachment_id>.<file_ext> was not really to encode information, but rather to serve as a simple function mapping a unique file on Discord --> a unique filename on disk.

> Filenames would still be the original filenames from Discord, but we could just truncate them if >255 chars.

Maybe this is a nitpick, but remember that we might have to cache two files with the same name, which is why I think naming files by their unique Discord IDs is simpler.

An external database definitely seems like the best way to implement it if we do the "busts ago" culling, but I'm not sure I like that approach the best. For example, it seems weird that re-listing the same cached files on the same channel twice in a row, without actually downloading anything new to the cache, might cause a purge, especially given how often I repeatedly !list the same channel in a row when testing the bot.

When I think it over again, both the modified date culling and the json database culling sound somewhat complicated for busty's actually simple use case, where there is just one channel per two weeks we ever want to !list on. I propose that our first (and maybe also last, if it's good enough) implementation is just the following:

  1. save files as <message_id>.<attachment_id>.<file_ext>
  2. if the file already exists, don't download it again
  3. just delete all files in the attachments folder which !list didn't either download or pull the cached version of.

This is equivalent to X = 1 in your proposal. Considering that in the real server we basically only ever want to run !list on the current submissions channel, all this fancy stuff might just be overkill. And even this simple form of caching will help speed up testing wait times a lot.
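
The three steps above can be sketched in Python roughly as follows. The `Attachment` tuple, `cache_path`, `sync_cache`, and the `download` callback are all hypothetical stand-ins for illustration; the real bot would fetch files through its Discord library instead.

```python
from pathlib import Path
from typing import Callable, Iterable, NamedTuple


class Attachment(NamedTuple):
    """Hypothetical stand-in for the Discord data that !list sees."""
    message_id: int
    attachment_id: int
    filename: str  # original Discord filename, used only for its extension


def cache_path(attachment_dir: Path, att: Attachment) -> Path:
    # Step 1: name files <message_id>.<attachment_id>.<file_ext>
    ext = Path(att.filename).suffix  # includes the leading dot, may be empty
    return attachment_dir / f"{att.message_id}.{att.attachment_id}{ext}"


def sync_cache(
    attachment_dir: Path,
    attachments: Iterable[Attachment],
    download: Callable[[Attachment, Path], None],
) -> list[Path]:
    """Download missing files, reuse cached ones, and delete everything else."""
    attachment_dir.mkdir(parents=True, exist_ok=True)
    kept = []
    for att in attachments:
        path = cache_path(attachment_dir, att)
        if not path.exists():  # Step 2: skip the download on a cache hit
            download(att, path)
        kept.append(path)
    keep = set(kept)
    for existing in attachment_dir.iterdir():  # Step 3: purge unused files
        if existing.is_file() and existing not in keep:
            existing.unlink()
    return kept
```

Because the on-disk name comes from the Discord IDs, two uploads that share an original filename still map to different files.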

anoadragon453 commented on August 29, 2024

> Maybe this is a nitpick but remember that we might have to cache two files w the same name, which is why I think naming files by their unique discord IDs is simpler.

Ah yeah, indeed. You would end up with naming collisions. <message_id>.<attachment_id>.<file_ext> sounds good then. If we ever did want a "bust export" feature, then we could just pull the names for each file from memory.

> For ex, it seems weird that re-listing the same cached files on the same channel twice in a row without actually downloading anything new to the cache might cause a purge, especially given how often i repeatedly !list the same channel in a row when testing the bot.

Note that we'd reset busts_ago to 0 whenever we find a song in the cache, so I don't think !listing the same channel repeatedly would cause a purge of its files.

Regardless though, I agree that we don't need anything very fancy right now.

> I propose that our first (and maybe also last, if it's good enough) implementation is just the following:
>
>   1. save files as <message_id>.<attachment_id>.<file_ext>
>   2. if the file already exists, don't download it again
>   3. just delete all files in the attachments folder which !list didn't either download or pull the cached version of.

sgtm
