Tutorial on using Spaces, Datasets, and Models on the Hugging Face platform

#1
by louiecerv - opened

Here is a detailed tutorial on building an intelligent system using Streamlit and Hugging Face. This tutorial will guide computer science students through the process of:

  1. Creating a dataset with Hugging Face's datasets library.
  2. Training a model using Hugging Face's transformers library.
  3. Deploying the model using Streamlit in a Hugging Face Space.

We will use a sentiment analysis task, a fundamental NLP problem, as the example.


Building an Intelligent System with Streamlit & Hugging Face

Prerequisites

Ensure that you have the following installed:

  • Python (>=3.8)
  • transformers, datasets, torch, streamlit, huggingface_hub

pip install transformers datasets torch streamlit huggingface_hub
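
To confirm the installation worked before starting, a small check like the one below can help. It is only a sketch: the helper name missing_packages and the REQUIRED list are illustrative, and it only tests whether each package can be found, not whether its version is recent enough.

```python
import importlib.util

# Import names of the packages listed in the prerequisites above.
REQUIRED = ["transformers", "datasets", "torch", "streamlit", "huggingface_hub"]

def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, rerun the pip command above in the same environment that runs the script.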

Step 1: Creating a Custom Dataset on Hugging Face

1.1. Collect Data

We will create a simple sentiment classification dataset with positive and negative movie reviews. A small sample is below:

[
  {"text": "I loved this movie! It was fantastic!", "label": 1},
  {"text": "Terrible film. Would not recommend.", "label": 0}
]
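
Before uploading, it can be worth checking that every record follows this schema. The sketch below assumes the convention used in the sample (a non-empty "text" string and a "label" of 0 or 1); the function name validate_records is illustrative and not part of any library.

```python
import json
from collections import Counter

def validate_records(records):
    """Return label counts, raising ValueError on the first malformed record."""
    counts = Counter()
    for index, record in enumerate(records):
        text, label = record.get("text"), record.get("label")
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"record {index}: 'text' must be a non-empty string")
        if label not in (0, 1):
            raise ValueError(f"record {index}: 'label' must be 0 or 1")
        counts[label] += 1
    return counts

# The two sample records shown above, parsed from JSON.
sample = json.loads("""
[
  {"text": "I loved this movie! It was fantastic!", "label": 1},
  {"text": "Terrible film. Would not recommend.", "label": 0}
]
""")
label_counts = validate_records(sample)
```

The returned counts also give a quick view of class balance, which matters once the dataset grows beyond a handful of examples.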

1.2. Upload Dataset to Hugging Face

1.3. Use Python to Upload the Dataset

Create dataset.py to upload the dataset:

from datasets import Dataset

# Create the dataset
data = [
    {"text": "I loved this movie! It was fantastic!", "label": 1},
    {"text": "Terrible film. Would not recommend.", "label": 0},
    {"text": "Amazing cinematography, but the plot was weak.", "label": 1},
    {"text": "I fell asleep halfway through. Very boring.", "label": 0}
]

dataset = Dataset.from_list(data)

# Push dataset to Hugging Face
# (requires being logged in with a write token, e.g. via `huggingface-cli login`)
dataset.push_to_hub("your-username/sentiment-analysis-dataset")
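
The script above uploads a single split. If you later want to measure how well the model generalizes, you will need some examples held out from training. The following is a minimal sketch of a stratified split in plain Python, assuming the same record format; stratified_split, eval_fraction, and seed are illustrative names, not part of the datasets library.

```python
import random
from collections import defaultdict

def stratified_split(records, eval_fraction=0.25, seed=42):
    """Split records into (train, eval) lists, holding out examples from every label."""
    by_label = defaultdict(list)
    for record in records:
        by_label[record["label"]].append(record)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, evaluation = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        # Hold out at least one example per label when the group has two or more.
        n_eval = max(1, round(len(group) * eval_fraction)) if len(group) > 1 else 0
        evaluation.extend(group[:n_eval])
        train.extend(group[n_eval:])
    return train, evaluation

data = [
    {"text": "I loved this movie! It was fantastic!", "label": 1},
    {"text": "Terrible film. Would not recommend.", "label": 0},
    {"text": "Amazing cinematography, but the plot was weak.", "label": 1},
    {"text": "I fell asleep halfway through. Very boring.", "label": 0},
]
train_records, eval_records = stratified_split(data)
```

Each resulting list can be passed to Dataset.from_list in the same way as the full list. With only four examples the split is purely illustrative; a real dataset needs far more data for either part to be meaningful.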

Step 2: Training a Sentiment Analysis Model

2.1. Load the Dataset

Create a script train_model.py:

from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Load the dataset
dataset = load_dataset("your-username/sentiment-analysis-dataset")

# Load tokenizer
model_checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# Tokenize function
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Prepare dataset for training
train_dataset = tokenized_datasets["train"]

# Load model
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=2)

# Training arguments
# No evaluation split is passed to the Trainer below, so evaluation stays
# disabled (the default); requesting it without an eval_dataset raises an error.
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    save_strategy="epoch",
    push_to_hub=True,
    hub_model_id="your-username/sentiment-analysis-model"
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
)

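# Train and save model
trainer.train()
trainer.push_to_hub()

# The Trainer was not given the tokenizer, so upload it separately;
# the pipeline in the app needs it to load the model by name.
tokenizer.push_to_hub("your-username/sentiment-analysis-model")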
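
If a held-out split is available, the Trainer can report metrics when it is given an eval_dataset and a compute_metrics function. The sketch below shows one possible accuracy function in plain Python; it assumes the Trainer passes a (logits, labels) pair, and it works on nested lists as well as NumPy arrays.

```python
def compute_metrics(eval_pred):
    """Compute accuracy from a (logits, labels) pair."""
    logits, labels = eval_pred
    # The predicted class is the index of the largest logit in each row.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}
```

It would be passed as Trainer(..., eval_dataset=..., compute_metrics=compute_metrics). On a dataset this small the number says little, but the same function applies unchanged to a larger one.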

Step 3: Deploying the Model with Streamlit on Hugging Face Spaces

3.1. Create a Streamlit Web App

Create a file app.py:

import streamlit as st
from transformers import pipeline

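# Load the model once and reuse it across Streamlit reruns
model_name = "your-username/sentiment-analysis-model"

@st.cache_resource
def load_classifier():
    return pipeline("text-classification", model=model_name)

classifier = load_classifier()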

# Streamlit UI
st.title("Sentiment Analysis App")
st.write("Enter a movie review and get its sentiment.")

user_input = st.text_area("Enter review:")

if st.button("Analyze"):
    if user_input:
        prediction = classifier(user_input)
        label = prediction[0]['label']
        confidence = prediction[0]['score']
        
        st.write(f"### Sentiment: {label}")
        st.write(f"Confidence: {confidence:.2f}")
    else:
        st.warning("Please enter a review.")
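
When a model's configuration does not define label names, the pipeline reports generic labels such as LABEL_0 and LABEL_1 rather than readable ones. A small helper like the following could translate them for display. The mapping is an assumption based on the dataset's convention (0 = negative, 1 = positive), and format_prediction is an illustrative name.

```python
# Assumed mapping, following the dataset's convention of 0 = negative, 1 = positive.
LABEL_NAMES = {"LABEL_0": "Negative", "LABEL_1": "Positive"}

def format_prediction(prediction):
    """Turn the pipeline's list-of-dicts output into a short display string."""
    top = prediction[0]
    # Fall back to the raw label if the model already provides readable names.
    name = LABEL_NAMES.get(top["label"], top["label"])
    return f"{name} ({top['score']:.0%} confidence)"
```

In the app, st.write(format_prediction(prediction)) could replace the two st.write calls that show the label and confidence.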

Step 4: Deploy on Hugging Face Spaces

4.1. Create a New Space

4.2. Upload Files

Use Git to upload:

git clone https://huggingface.co/spaces/your-username/sentiment-analysis-app
cd sentiment-analysis-app
mv ../app.py .
printf "streamlit\ntransformers\ntorch\n" > requirements.txt
git add .
git commit -m "Initial commit"
git push

Your app will be live on Hugging Face Spaces!
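
As an alternative to Git, the files can also be uploaded from Python with the huggingface_hub library. The sketch below assumes the Space already exists and that you are logged in with a write token; requirements_text and upload_space are illustrative helper names, and the folder and repository names are placeholders.

```python
# Packages app.py needs at runtime: streamlit for the UI, transformers for the
# pipeline, and torch as the backend that runs the model.
APP_REQUIREMENTS = ["streamlit", "transformers", "torch"]

def requirements_text(packages):
    """Return the contents of a requirements.txt listing one package per line."""
    return "".join(f"{name}\n" for name in packages)

def upload_space(folder, repo_id):
    """Upload every file in `folder` to an existing Space (assumes prior login)."""
    # Imported here so the helper above stays usable without huggingface_hub.
    from huggingface_hub import HfApi

    HfApi().upload_folder(folder_path=folder, repo_id=repo_id, repo_type="space")

# Example (not run here):
# upload_space("sentiment-analysis-app", "your-username/sentiment-analysis-app")
```

Whichever route you use, the Space needs transformers and torch in requirements.txt in addition to streamlit, because app.py loads the model through a transformers pipeline.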


Conclusion

This tutorial guided you through:
✅ Creating a dataset on Hugging Face
✅ Training a model with transformers
✅ Deploying an interactive web app with Streamlit

This project introduces students to practical NLP, model deployment, and cloud AI services, preparing them for real-world AI applications. 🚀
