bjoernpl / distilabel

This project forked from argilla-io/distilabel

Distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency

Home Page: https://distilabel.argilla.io

License: Apache License 2.0

Python 98.20% Makefile 0.04% Jinja 1.76%

distilabel's Introduction

Björn Plüster - @bjoernpl

🤗HuggingFace - bjoernpl Discord

Open-source enthusiast, LLM expert, co-founder and CTO of ellamind, and co-founder of DiscoResearch, our open-source research and development community. Come chat with me in our Discord!

Recent Projects

  • LeoLM: German LLM: I used large-scale continued pretraining to transfer the English-language capabilities of Llama-2 to German. Together with LAION and Hessian.AI, we released LeoLM: Linguistically Enhanced Open Language Model at different model scales. Check out our blog post for more info: https://laion.ai/blog/leo-lm/
  • Vision-Language Explanations: Transformer explainability is lacking, but transformers are great at producing text. Why not have a model explain its own decisions? A large research project investigating natural language explanations for multimodal transformer applications. Currently under review. arXiv preprint: https://arxiv.org/abs/2212.04231
  • KOSMOS-1 Reimplementation: The KOSMOS-1 paper (multimodal foundation model) was super interesting to me at the time, but there was no code to be found anywhere. This is a very rudimentary reimplementation of the core aspects.
  • Tagesschau: Simple scrape of Tagesschau news articles.

Older Projects

In my repositories you'll find some projects:

  • DiscoveredWeekly contains the source code for my website discoveredweekly.com, where users can log in with their Spotify account and have their new Discover Weekly playlist copied automatically every Monday, making sure no valuable song suggestions are ever lost.
  • AutoObjectRemoval is a combination of Instance Segmentation using Detectron2, and Flow-Guided Video Completion to create a system which can automatically mask and remove objects from videos.
  • VideoSilenceRemover is a tool for automatically cutting segments of silence out of a video. Created this tool for a friend to facilitate the boring parts of his job.
  • DirectoryStats is a Python CLI for efficiently counting large numbers of files and subdirectories. I needed this to keep track of directory size while creating the dataset for my thesis project.
  • PaypalTransactionVisualizer is a Jupyter notebook which shows you some interesting information about your past spending with PayPal. I implemented this project mostly to gain some insight into my own spending habits, but also to practice using Jupyter and some interesting Python features.
  • YoutubeHistoryVisualizer is a notebook along similar lines which shows you some stats about the YouTube videos you've watched in the past. It works with data from Google Takeout.
  • ColorFlow is an Android game written in Java, which was a cool side project. The repo is not well maintained and used primarily as my own VCS. Check out the game in the Play Store.

Publications

See my IEEE author profile for an updated list of publications.

  • B. Plüster, C. Weber, L. Qu and S. Wermter, "Hearing Faces: Target Speaker Text-to-Speech Synthesis from a Face," 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021, pp. 757-764, doi: 10.1109/ASRU51503.2021.9687866.

Contact me

Best way to reach me is via e-mail [email protected].

distilabel's People

Contributors

alvarobartt, bramvanroy, burtenshaw, davanstrien, davidberenstein1957, dependabot[bot], dvsrepo, edbeeching, gabrielmbmb, ignacioct, jphme, philschmid, plaguss, sdiazlor, strickvl, wauplin
