Topic: llm-evaluation Goto Github
Something interesting about llm-evaluation
llm-evaluation,Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
User: adamcoscia
Home Page: https://arxiv.org/abs/2403.04760
llm-evaluation,The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
Organization: agenta-ai
Home Page: http://www.agenta.ai
llm-evaluation,Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain
Organization: agenta-ai
Home Page: https://agenta.ai
llm-evaluation,Evaluating LLMs with CommonGen-Lite
Organization: allenai
Home Page: https://inklab.usc.edu/CommonGen/
llm-evaluation,A collection of hands-on notebooks for LLM practitioners
User: antoniogr7
llm-evaluation,FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package builds upon the framework provided by the original FactScore repository, which is no longer maintained and contains outdated functions.
User: armingh2000
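FactScore-style metrics decompose a generated text into atomic facts and report the fraction supported by a source. The following is a minimal, library-free sketch of that final aggregation step only, not FactScoreLite's actual implementation; in practice the per-fact support judgments come from an LLM or retrieval check, whereas here they are supplied directly:

```python
def factscore(fact_judgments):
    """Fraction of atomic facts judged as supported.

    fact_judgments: list of booleans, one per atomic fact extracted
    from a generated text (True = supported by the source).
    """
    if not fact_judgments:
        return 0.0
    return sum(fact_judgments) / len(fact_judgments)

# Example: 3 of the 4 extracted facts are supported by the source.
score = factscore([True, True, False, True])  # 0.75
```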
llm-evaluation,Python SDK for running evaluations on LLM generated responses
Organization: athina-ai
Home Page: https://docs.athina.ai
llm-evaluation,FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your data, tasks, and prompts
Organization: aws-samples
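A common way such leaderboards are built is to aggregate pairwise judgments (e.g. an LLM judge picking the better of two model outputs per task) into per-model win rates. A minimal sketch of that aggregation, with hypothetical model names, not the repository's actual code:

```python
from collections import defaultdict

def leaderboard(pairwise_results):
    """Rank models by win rate over pairwise comparisons.

    pairwise_results: list of (winner, loser) model-name tuples,
    e.g. produced by a judge comparing two outputs for each task.
    """
    wins, games = defaultdict(int), defaultdict(int)
    for winner, loser in pairwise_results:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    # Sort models by fraction of comparisons won, best first.
    return sorted(
        ((m, wins[m] / games[m]) for m in games),
        key=lambda entry: entry[1],
        reverse=True,
    )

results = [("gpt", "llama"), ("gpt", "mistral"), ("mistral", "llama")]
board = leaderboard(results)  # gpt 1.0, mistral 0.5, llama 0.0
```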
llm-evaluation,A comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) 2024.
User: azminewasi
llm-evaluation,Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
Organization: babelscape
Home Page: https://arxiv.org/abs/2404.08676
llm-evaluation,Cookbooks and tutorials on Literal AI
Organization: chainlit
Home Page: https://cloud.getliteral.ai/
llm-evaluation,The implementation for the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators"
User: chanliang
Home Page: https://arxiv.org/abs/2310.07289
llm-evaluation,The LLM Evaluation Framework
Organization: confident-ai
Home Page: https://docs.confident-ai.com/
llm-evaluation,For familiarization and learning. Uses the LangChain framework, LangSmith for tracing, OpenAI LLM models, and a Pinecone serverless vector DB, built with Jupyter Notebook and Python.
User: davidgir
llm-evaluation,Visualize LLM Evaluations for OpenAI Assistants
User: euskoog
Home Page: https://openai-assistants-evals-dash.vercel.app/
llm-evaluation,Link your OpenAI Assistants to a custom store + Evaluate Assistant responses
User: euskoog
llm-evaluation,Large Model Evaluation Experiments
Organization: evaluation-tools
llm-evaluation,Exploring the depths of LLMs 🚀
User: giacomomeloni
llm-evaluation,🐢 Open-Source Evaluation & Testing framework for LLMs and ML models
Organization: giskard-ai
Home Page: https://docs.giskard.ai
llm-evaluation,Awesome papers involving LLMs in Social Science.
User: henry-yeh
llm-evaluation,DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
Organization: intuit-ai-research
llm-evaluation,[Personalize@EACL 2024] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models.
User: ivarfresh
llm-evaluation,A prompt collection for testing and evaluation of LLMs.
User: kwinkunks
llm-evaluation,A framework for building scenario-simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic evaluation at the agent-action level.
Organization: llm-evaluation-s-always-fatiguing
llm-evaluation,Code and data for ACL ARR 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
Organization: minnesotanlp
Home Page: https://minnesotanlp.github.io/cobbler-project-page/
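One well-known bias in LLMs used as evaluators is order (position) bias in pairwise judging. A standard mitigation, sketched below with a hypothetical `judge` callable rather than the paper's own code, is to query the judge twice with the candidates swapped and only accept a verdict when both orderings agree:

```python
def debiased_winner(judge, a, b):
    """Ask the judge twice with positions swapped; return the
    consistent winner, or None if the two verdicts disagree
    (a symptom of position bias).

    judge(x, y) returns "first" or "second" for the pair (x, y).
    """
    first_pass = a if judge(a, b) == "first" else b
    second_pass = b if judge(b, a) == "first" else a
    return first_pass if first_pass == second_pass else None

# A toy judge that always prefers whichever answer it sees first:
biased = lambda x, y: "first"
verdict = debiased_winner(biased, "ans_a", "ans_b")  # disagrees -> None

# A toy judge with a consistent (content-based) preference:
fair = lambda x, y: "first" if x < y else "second"
agreed = debiased_winner(fair, "ans_a", "ans_b")  # "ans_a" both times
```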
llm-evaluation,Code for the paper Prediction-Powered Ranking of Large Language Models, Arxiv 2024.
Organization: networks-learning
llm-evaluation,Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Organization: parea-ai
Home Page: https://docs.parea.ai/sdk/python
llm-evaluation,TypeScript SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Organization: parea-ai
Home Page: https://docs.parea.ai/sdk/typescript
llm-evaluation,Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.
Organization: promptfoo
Home Page: https://www.promptfoo.dev/
llm-evaluation,Framework for LLM evaluation, guardrails and security
Organization: raga-ai-hub
Home Page: https://www.raga.ai/llms
llm-evaluation,A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Organization: re-align
Home Page: https://allenai.github.io/re-align/
llm-evaluation,Open-Source Evaluation for GenAI Application Pipelines
Organization: relari-ai
Home Page: https://docs.relari.ai/
llm-evaluation,This repository contains the lab work for Coursera course on "Generative AI with Large Language Models".
User: rochitasundar
Home Page: https://www.coursera.org/account/accomplishments/certificate/8JAYVEUAQF56
llm-evaluation,Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
Organization: rungalileo
Home Page: https://www.rungalileo.io/hallucinationindex
llm-evaluation,
User: sharathhebbar
llm-evaluation,EnsembleX uses the knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering tailored suggestions across domains through a Streamlit dashboard.
User: vidhyavarshanyjs
Home Page: https://ensemblex.streamlit.app
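The quality-cost trade-off described above maps naturally onto the 0/1 knapsack problem: pick the subset of models that maximizes total quality without exceeding a cost budget. A self-contained sketch of that formulation (the model names, scores, and integer costs are illustrative, not EnsembleX's data or code):

```python
def best_ensemble(models, budget):
    """0/1 knapsack over (name, quality, cost) tuples: select the
    subset maximizing total quality within a cost budget.
    Costs are assumed to be non-negative integers (e.g. cents).
    """
    # dp[c] = (best_quality, chosen_names) achievable at cost <= c
    dp = [(0.0, [])] * (budget + 1)
    for name, quality, cost in models:
        new_dp = dp[:]
        for c in range(cost, budget + 1):
            q, names = dp[c - cost]  # built on pre-item table: 0/1, no reuse
            if q + quality > new_dp[c][0]:
                new_dp[c] = (q + quality, names + [name])
        dp = new_dp
    return dp[budget]

models = [("small", 0.6, 1), ("medium", 0.75, 3), ("large", 0.9, 6)]
quality, picked = best_ensemble(models, budget=4)
# "large" alone exceeds the budget; "small" + "medium" fit and win.
```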
llm-evaluation,The Calibration Game helps you get better at identifying hallucinations in LLMs.
User: viktour19
Home Page: https://calibrationgame.vercel.app
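Calibration of this kind is commonly quantified with the Brier score: the mean squared gap between stated confidence and the 0/1 outcome, where lower is better and overconfident misses are penalized hardest. A minimal sketch (not the game's actual scoring code):

```python
def brier_score(predictions):
    """Mean squared error between confidence and outcome.

    predictions: list of (confidence, outcome) pairs, where
    confidence is in [0, 1] and outcome is 1 (you were right)
    or 0 (you were wrong).
    """
    return sum((p - o) ** 2 for p, o in predictions) / len(predictions)

# 0.9 confidence on a miss costs 0.81; a cautious 0.2 miss costs 0.04.
score = brier_score([(0.9, 1), (0.9, 0), (0.2, 0)])
```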
llm-evaluation,Superpipe - optimized LLM pipelines for structured data
Organization: villagecomputing
Home Page: https://superpipe.ai
llm-evaluation,[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing llms: The truth is rarely pure and never simple.
Organization: vita-group
Home Page: https://arxiv.org/abs/2310.01382
llm-evaluation,Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements
Organization: yandex-research
Home Page: https://arxiv.org/abs/2401.06766