Cappy-Large

Getting Started

Cappy is a small pretrained scorer designed to enhance the performance and efficiency of multi-task LLMs. It takes an instruction and a candidate response as input and produces a score between 0 and 1 that estimates how correct the response is with respect to the instruction. With only 360 million parameters, Cappy either functions independently on classification tasks or serves as an auxiliary component for LLMs, boosting their performance. Cappy also makes it possible to integrate downstream supervision efficiently, without finetuning the LLM or accessing its parameters. Furthermore, Cappy can be combined with other LLM adaptations, including finetuning, in-context learning, and prompt tuning, for additional performance gains.

Uses

Cappy can be loaded either as a Jax/Flax model or a PyTorch model.

Jax/Flax

from transformers import AutoTokenizer, FlaxAutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained('btan2/cappy-large')
cappy = FlaxAutoModelForSequenceClassification.from_pretrained('btan2/cappy-large')

# Raw string: the article text contains literal backslashes.
instruction = r"""
What label best describes this news article?
Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - Private investment firm Carlyle Group,\which has a reputation for making well-timed and occasionally\controversial plays in the defense industry, has quietly placed\its bets on another part of the market.
"""
response = 'Business'

# Flax models expect NumPy/JAX arrays, so request NumPy tensors rather than PyTorch ones.
inputs = tokenizer([(instruction, response), ], return_tensors='np')
score = cappy(**inputs).logits[0][0].item()

PyTorch

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('btan2/cappy-large')
cappy = AutoModelForSequenceClassification.from_pretrained('btan2/cappy-large')

# Raw string: the article text contains literal backslashes.
instruction = r"""
What label best describes this news article?
Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - Private investment firm Carlyle Group,\which has a reputation for making well-timed and occasionally\controversial plays in the defense industry, has quietly placed\its bets on another part of the market.
"""
response = 'Business'

inputs = tokenizer([(instruction, response), ], return_tensors='pt')
score = cappy(**inputs).logits[0][0].item()
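
Because Cappy scores one (instruction, response) pair at a time, a classification task can be handled by scoring every candidate label and keeping the highest-scoring one. The following is a minimal sketch of that idea, not the authors' reference implementation. `classify` and `score_fn` are hypothetical names introduced here for illustration; `score_fn` is assumed to be any callable that maps an instruction and a response to a float, for example a thin wrapper around the scoring call shown above.

```python
from typing import Callable, Sequence, Tuple


def classify(
    instruction: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the highest-scoring candidate label and its score.

    `score_fn` is a hypothetical hook (an assumption of this sketch):
    it takes (instruction, response) and returns a float score.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    # Score every candidate label against the same instruction.
    scored = [(score_fn(instruction, label), label) for label in candidates]
    # max() returns the first maximum, so ties resolve to the earlier label.
    best_score, best_label = max(scored, key=lambda pair: pair[0])
    return best_label, best_score
```

With the PyTorch model loaded as above, `score_fn` could be `lambda ins, res: cappy(**tokenizer([(ins, res)], return_tensors='pt')).logits[0][0].item()`, and `candidates` a list of label strings such as `['Business', 'Sports']` (an illustrative label set, not taken from the model card).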

Evaluation

We validate Cappy on an extensive suite of held-out tasks distinct from those used in its pretraining. The overall performance is shown in Fig. 1 and Fig. 2. Specifically, on 11 language understanding tasks drawn from PromptSource, Cappy, with 360 million parameters, significantly outperforms OPT-IML-30B and OPT-175B and matches the best of the previous multi-task LLMs. In addition, on 45 diverse complex tasks from BIG-Bench, Cappy consistently boosts the performance of the advanced multi-task LLM FLAN-T5 by a large margin. Furthermore, Cappy offers additional performance gains when applied together with finetuning or in-context learning. Our subsequent ablation study demonstrates the importance of the proposed pretraining and data augmentation strategies.
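
One way to picture Cappy as an auxiliary component for an LLM is as a reranker: the LLM proposes several candidate responses, and Cappy orders them by score. The sketch below assumes that setup and is not the evaluation code behind the results above. `rerank`, `generate_fn`, and `score_fn` are hypothetical names; how candidates are sampled from the LLM is left to the caller.

```python
from typing import Callable, List, Tuple


def rerank(
    instruction: str,
    generate_fn: Callable[[str, int], List[str]],
    score_fn: Callable[[str, str], float],
    num_candidates: int = 4,
) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs sorted from best to worst.

    `generate_fn` and `score_fn` are assumptions of this sketch:
    the first returns up to `num_candidates` LLM responses, the second
    returns a scorer's float score for one (instruction, response) pair.
    """
    # Drop exact duplicates while keeping generation order.
    candidates = list(dict.fromkeys(generate_fn(instruction, num_candidates)))
    scored = [(c, score_fn(instruction, c)) for c in candidates]
    # sorted() is stable, so equal scores keep their generation order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The first element of the returned list is the response the scorer prefers; keeping the full ranking makes it easy to inspect how close the alternatives were.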

Software

Cappy's pretraining uses the code from this example in Red Coast, a lightweight toolkit for automating distributed training.

Citation

@inproceedings{
tan2023cappy,
title={Cappy: Outperforming and Boosting Large Multi-Task {LM}s with a Small Scorer},
author={Bowen Tan and Yun Zhu and Lijuan Liu and Eric Xing and Zhiting Hu and Jindong Chen},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=Srt1hhQgqa}
}
