Model: CORTEX-334M
Lang: EN
[ PROTOTYPE ]
Model description
Description: Instruction-based encoder-only LLM, designed for multi-task and zero-shot inference in discriminative scenarios
Base model: Electra-Large
Number of parameters: 334M
License: CC-BY-NC-SA (no commercial usage is allowed)
Warning ⚠️: This is an early-stage prototype, which can exhibit unstable behaviors and results. Access is currently limited to selected users for testing purposes.
Technology overview
The main idea of the CORTEX technology is to create an effective, easy-to-use and easy-to-train prompt-based LLM with an encoder-only architecture, capable of tackling several different discriminative tasks with good zero-shot capabilities.
The model leverages a unified task format based on a token-classification setup, and therefore requires only a single loss function to optimize.
Compared to prompt-based generative LLMs, the CORTEX technology doesn't suffer from hallucinations and is more compact and efficient, since it doesn't need to generate text, which would require many extra parameters and several inference steps to produce an output.
Moreover, it can provide confidence scores for its predictions and deliver results in a structured format without any output parsing.
The CORTEX-334M prototype can be prompted to perform the following tasks (a hypothetical usage sketch follows the list):
- Text Classification
- Natural Language Inference
- Entity Recognition
- Boolean Question Answering
- Extractive Question Answering
- Text Similarity
- Ranking / Retrieval
- Sentiment Analysis
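As a purely hypothetical illustration, the sketch below shows how such a prompted discriminative query might look through the Hugging Face transformers API; the checkpoint name, prompt wording and label order are assumptions rather than the documented interface.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical checkpoint name: the prototype is not publicly released,
# so this placeholder only sketches the intended usage flow.
model_name = "cortex-334m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# A discriminative task expressed as a (prompt, context) pair.
prompt = "Is the following review positive?"
context = "The plot was predictable, but the acting saved the film."
inputs = tokenizer(prompt, context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, 3)

# Per-token softmax probabilities double as confidence scores; for a
# classification-style task the [CLS] position (index 0) carries the verdict.
probs = logits.softmax(dim=-1)[0, 0]
labels = ["neutral", "positive", "negative"]  # assumed label order
print({label: round(p.item(), 3) for label, p in zip(labels, probs)})
```

Because the output is already a structured set of per-class scores, no text parsing step is needed.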
Training strategy
The model has been obtained using Electra-Large as a starting point, and trained for 1 epoch with a constant learning rate of 1e-5 on ~300,000 (question, context, answer) triplets sampled from benchmark datasets for classic discriminative tasks, such as Natural Language Inference, Named Entity Recognition, Boolean Question Answering, Extractive Question Answering and Sentiment Analysis. In particular, the MNLI, WikiNER, BoolQ, SQuAD v1 and MTEB Tweets datasets have been used as sources of examples.
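For reference, these hyper-parameters map onto a standard transformers training configuration roughly as follows (a sketch only: batch size and other unreported settings are placeholders).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="cortex-334m",        # placeholder output path
    num_train_epochs=1,              # 1 epoch, as reported
    learning_rate=1e-5,              # constant learning rate of 1e-5
    lr_scheduler_type="constant",
    per_device_train_batch_size=32,  # assumption: not reported
)
```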
The different tasks have all been cast into a unified token-classification setup, so the model has been trained on the different objectives using a single classification head and loss function, avoiding the complications related to combining different task-specific classifiers and losses (such as loss weighting, gradient modulation or task scheduling).
Using this framework, each token can be classified into 3 possible categories (0: neutral, 1: positive, 2: negative), with some constraints depending on the nature of the token (e.g. whether it is a [CLS] token, a prompt token or a context token).
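A minimal sketch of this unified setup, assuming a standard transformers token-classification head on top of Electra-Large (the actual head configuration is not documented):

```python
from transformers import ElectraForTokenClassification

# One 3-way token-classification head shared by every task.
id2label = {0: "neutral", 1: "positive", 2: "negative"}
model = ElectraForTokenClassification.from_pretrained(
    "google/electra-large-discriminator",
    num_labels=3,
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
)
```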
The categorical cross-entropy loss has been used for all the training examples, with two different computation methods: in text classification tasks (Natural Language Inference, Boolean Question Answering or Sentiment Analysis) only the [CLS] token contributes to the loss, while in information extraction tasks (Named Entity Recognition, Extractive Question Answering) the loss is computed considering all tokens.
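The two computation methods can be viewed as a label-masking scheme: positions labeled -100 are skipped by the cross-entropy loss, so classification-style examples score only the [CLS] token while extraction-style examples score every token. The sketch below is an illustrative assumption of how this masking might be implemented, not the actual training code.

```python
import torch
import torch.nn.functional as F

def build_labels(task_type, token_labels, cls_label):
    # token_labels: per-token targets in {0, 1, 2}; cls_label: target for [CLS]
    labels = torch.full((len(token_labels),), -100)  # start fully masked
    if task_type == "classification":                # NLI, BoolQ, sentiment
        labels[0] = cls_label                        # only [CLS] contributes
    else:                                            # NER, extractive QA
        labels = torch.tensor(token_labels)          # every token contributes
    return labels

def token_classification_loss(logits, labels):
    # logits: (seq_len, 3); labels: (seq_len,) with -100 marking ignored positions
    return F.cross_entropy(logits, labels, ignore_index=-100)
```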
Limitations
This LLM is an early-stage technology and may exhibit unstable or unreliable behaviors. The prototype is meant only for experimentation and research, and its results should be used with caution.