Collections
Discover the best community collections!
Collections including paper arxiv:2402.10524
-
A Survey on Evaluation of Large Language Models
Paper • 2307.03109 • Published • 42 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25 -
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Paper • 2402.10524 • Published • 21
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 16 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 10 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 64
-
PALO: A Polyglot Large Multimodal Model for 5B People
Paper • 2402.14818 • Published • 23 -
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • 2402.13753 • Published • 111 -
User-LLM: Efficient LLM Contextualization with User Embeddings
Paper • 2402.13598 • Published • 18 -
Coercing LLMs to do and reveal (almost) anything
Paper • 2402.14020 • Published • 12
-
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 10 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 16 -
Instruction-tuned Language Models are Better Knowledge Learners
Paper • 2402.12847 • Published • 24 -
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Paper • 2402.13064 • Published • 46
-
Large Language Model Alignment: A Survey
Paper • 2309.15025 • Published • 2 -
Aligning Large Language Models with Human: A Survey
Paper • 2307.12966 • Published • 1 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 48 -
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
Paper • 2310.05344 • Published • 1
-
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
Paper • 2402.10790 • Published • 40 -
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Paper • 2402.10524 • Published • 21 -
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
Paper • 2410.16256 • Published • 58
-
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
Paper • 2401.09603 • Published • 15 -
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Paper • 2402.10524 • Published • 21 -
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
Paper • 2402.14261 • Published • 10 -
RewardBench: Evaluating Reward Models for Language Modeling
Paper • 2403.13787 • Published • 21