Model Card for philipp-zettl/t5-small-tydiqa-en
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- Developed by: philipp-zettl
- Model type: Seq2Seq
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: google/flan-t5-small
Uses
Direct Use
[More Information Needed]
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
# Load model directly
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("philipp-zettl/t5-small-tydiqa-en")
model = AutoModelForSeq2SeqLM.from_pretrained("philipp-zettl/t5-small-tydiqa-en").to(device)

question = "Some question?"
# For instance retrieved using similarity search
context = "A long context ..."

# The prompt format is "question: ... context: ...", as in the training code
inputs = [f"question: {q} context: {c}" for q, c in [[question, context]]]
model_inputs = tokenizer(inputs, max_length=512, padding=True, truncation=True)
input_ids = torch.tensor(model_inputs['input_ids']).to(device)
attention_mask = torch.tensor(model_inputs['attention_mask']).to(device)

with torch.no_grad():
    sample_output = model.generate(input_ids[:1], attention_mask=attention_mask[:1], max_length=100)

sample_output_text = tokenizer.decode(sample_output[0], skip_special_tokens=True)
input_text = tokenizer.decode(input_ids[0], skip_special_tokens=True)
print("Sample Input", input_text)
print("Sample Output", sample_output_text)
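The snippet above builds its input with an inline f-string. As a minimal sketch, the same prompt format can be wrapped in a small helper so that several question/context pairs are formatted consistently. The names `build_prompt` and `build_prompts` and the example pair are hypothetical and not part of the original card.

```python
def build_prompt(question: str, context: str) -> str:
    """Format one pair as "question: ... context: ...", like the snippet above."""
    return f"question: {question} context: {context}"


def build_prompts(pairs):
    """Format an iterable of (question, context) pairs into a list of prompts."""
    return [build_prompt(q, c) for q, c in pairs]


# Hypothetical example pair, for illustration only
prompts = build_prompts([
    ("Who wrote Faust?", "Faust is a tragic play by Johann Wolfgang von Goethe."),
])
print(prompts[0])
```

The resulting list can be passed to the tokenizer in place of `inputs`.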
Training Details
Training Data
Trained on the English samples of google-research-datasets/tydiqa, selected with the following code:
from datasets import load_dataset

# Load the TyDi QA dataset ('secondary_task' config, which has SQuAD-style fields)
squad_dataset = load_dataset('google-research-datasets/tydiqa', 'secondary_task')

# Keep only the English examples of the training and validation splits;
# the language is identified by the prefix of each example id
train_dataset = squad_dataset['train'].filter(lambda e: e['id'].startswith('english'))
validation_dataset = squad_dataset['validation'].filter(lambda e: e['id'].startswith('english'))
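The filter above relies on each example id beginning with its language name. A minimal sketch of that predicate on plain dictionaries, without downloading the dataset, could look as follows. The helper name `is_language` and the toy ids are hypothetical; real ids in the dataset are longer.

```python
def is_language(example: dict, languages=("english",)) -> bool:
    """Return True if the example id starts with one of the given language names."""
    # str.startswith accepts a tuple of prefixes
    return example["id"].startswith(tuple(languages))


# Toy rows with made-up ids, for illustration only
toy_rows = [
    {"id": "english-0001"},
    {"id": "finnish-0002"},
    {"id": "english-0003"},
]

english_rows = [row for row in toy_rows if is_language(row)]
print([row["id"] for row in english_rows])
```

The same function could be passed to `Dataset.filter`, and extending `languages` would select additional languages.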
Training Procedure
Preprocessing
Code for preprocessing:
def preprocess_batch(batch, tokenizer, max_input_length=512, max_output_length=128):
    questions = batch['question']
    contexts = batch['context']
    # Use the first reference answer of each example as the target
    answers = [answer['text'][0] for answer in batch['answers']]
    inputs = [f"question: {q} context: {c}" for q, c in zip(questions, contexts)]
    model_inputs = tokenizer(inputs, max_length=max_input_length, padding=True, truncation=True)
    labels = tokenizer(answers, max_length=max_output_length, padding=True, truncation=True)
    model_inputs['labels'] = labels['input_ids']
    return model_inputs

# Tokenize the dataset
# (teacher_tokenizer is the tokenizer of the model being fine-tuned; its loading code is not shown in this card)
train_dataset = train_dataset.map(lambda batch: preprocess_batch(batch, teacher_tokenizer), batched=True)
validation_dataset = validation_dataset.map(lambda batch: preprocess_batch(batch, teacher_tokenizer), batched=True)

# Set format for PyTorch
train_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])
validation_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])
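The preprocessing above pads the label sequences with the tokenizer's pad token, so padded positions contribute to the cross-entropy loss. One common variation, which is not part of the original training code, replaces pad ids in the labels with -100, the index that the loss in Hugging Face seq2seq models ignores. The sketch below uses plain lists; the helper name `mask_label_padding` and the token ids are hypothetical, and `pad_token_id=0` is the T5 default (in practice, `tokenizer.pad_token_id` would be used).

```python
def mask_label_padding(label_ids, pad_token_id, ignore_index=-100):
    """Replace pad token ids in each label sequence with ignore_index."""
    return [
        [tok if tok != pad_token_id else ignore_index for tok in seq]
        for seq in label_ids
    ]


# Made-up token ids, padded with 0 to a common length
padded_labels = [
    [71, 1525, 1, 0, 0],
    [2018, 1, 0, 0, 0],
]

masked_labels = mask_label_padding(padded_labels, pad_token_id=0)
print(masked_labels)
```

Applying this inside `preprocess_batch` would change the training objective relative to what is documented here, so it is shown only as an option.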
Training Hyperparameters
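Code of the training loop:
import torch
from tqdm import tqdm
from transformers import DataCollatorForSeq2Seq
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# teacher_model is the model being fine-tuned; its loading code is not shown in this card
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.cuda.empty_cache()
teacher_model.to(device)

# Training parameters
epochs = 3
learning_rate = 5e-5
temperature = 2.0  # defined in the original script but not used by the training loop
batch_size = 2
optimizer = torch.optim.AdamW(teacher_model.parameters(), lr=learning_rate)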
# Create a data collator for padding and batching
data_collator = DataCollatorForSeq2Seq(tokenizer=teacher_tokenizer, model=teacher_model)
# Create DataLoaders with the data collator
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, collate_fn=data_collator)
validation_dataloader = DataLoader(validation_dataset, batch_size=batch_size, collate_fn=data_collator)
writer = SummaryWriter('./logs', comment='t5-base')
print("Starting training...")
# Training loop
for epoch in range(epochs):
    teacher_model.train()
    total_loss = 0
    print(f"Epoch {epoch+1}/{epochs}")
    progress_bar = tqdm(train_dataloader, desc="Training", leave=False)

    for step, batch in enumerate(progress_bar):
        # Move the batch to the training device
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        # Forward pass; passing labels makes the model return the cross-entropy loss
        teacher_outputs = teacher_model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        loss = teacher_outputs.loss
        writer.add_scalar("Loss/train", loss.item(), step)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

        # Verbose logging (step % 1 == 0 is always true, so this runs on every step)
        if step % 1 == 0 or step == len(train_dataloader) - 1:
            progress_bar.set_postfix({
                'step': step,
                'loss': loss.item(),
            })

            # Generate a sample output from the model being trained
            teacher_model.eval()
            with torch.no_grad():
                sample_output = teacher_model.generate(input_ids[:1], max_length=50)
            sample_output_text = teacher_tokenizer.decode(sample_output[0], skip_special_tokens=True)
            input_text = teacher_tokenizer.decode(input_ids[0], skip_special_tokens=True)
            writer.add_text("Sample Input", input_text, step)
            writer.add_text("Sample Output", sample_output_text, step)
            teacher_model.train()

    avg_loss = total_loss / len(train_dataloader)
    print(f"Epoch {epoch+1} completed. Average Loss: {avg_loss:.4f}")
    writer.add_scalar("AVG Loss/train", avg_loss, epoch)

print("Training complete.")
writer.close()
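The loop above logs "Loss/train" with the per-epoch `step`, which restarts at 0 in every epoch, so values from later epochs are written at step indices that earlier epochs already used. One possible adjustment, shown here as a sketch and not as the original logging setup, is to derive a monotonically increasing step from the epoch and the batch index. The helper name `global_step` is hypothetical.

```python
def global_step(epoch: int, step: int, steps_per_epoch: int) -> int:
    """Map (epoch, step within epoch) to a single increasing step counter."""
    return epoch * steps_per_epoch + step


# Example with 3 epochs of 5 batches each
steps_per_epoch = 5
all_steps = [
    global_step(epoch, step, steps_per_epoch)
    for epoch in range(3)
    for step in range(steps_per_epoch)
]
print(all_steps)
```

In the training loop, `global_step(epoch, step, len(train_dataloader))` could be passed as the step argument of `writer.add_scalar` and `writer.add_text`.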
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications [optional]
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
[More Information Needed]
Hardware
[More Information Needed]
Software
[More Information Needed]
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]