---
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
tags:
- text-classification
- resume-classification
- fine-tuning
- python
- pytensors
- kaggle
---

# Model Card: Resume Classification Using BERT

## Model Overview

This model is a fine-tuned version of **`bert-base-uncased`** for multiclass text classification. It categorizes resumes into one of **24 predefined job categories**, making it suitable for automated resume screening and classification tasks.

---

## Dataset

The dataset used for fine-tuning consists of **more than 2,400 resumes** in plain-text and PDF formats, categorized into 24 job categories. The dataset is available at https://www.kaggle.com/competitions/jarvis-calling-hiring-contest/data.

- **Classes**: `['ACCOUNTANT', 'ADVOCATE', 'AGRICULTURE', 'APPAREL', 'ARTS', 'AUTOMOBILE', 'AVIATION', 'BANKING', 'BPO', 'BUSINESS-DEVELOPMENT', 'CHEF', 'CONSTRUCTION', 'CONSULTANT', 'DESIGNER', 'DIGITAL-MEDIA', 'ENGINEERING', 'FINANCE', 'FITNESS', 'HEALTHCARE', 'HR', 'INFORMATION-TECHNOLOGY', 'PUBLIC-RELATIONS', 'SALES', 'TEACHER']`

The dataset underwent significant preprocessing to remove noise and improve text quality before tokenization.

**Preprocessing steps include**:

- Removal of HTML tags, URLs, punctuation, Unicode artifacts, escape sequences, stop words, and extraneous whitespace.
- Application of all cleaning functions provided in `preprocessing.py`.

---

## Model Configuration

- **Base Model**: `bert-base-uncased`
- **Fine-tuning Task**: Multiclass classification (24 classes)
- **Preprocessing Summary**: The preprocessing steps applied to the training data are encapsulated in `preprocess_function` to simplify and standardize usage.
- **Model Output**: The raw output consists of logits for each class. To obtain per-class probability scores, apply the sigmoid activation function via `torch.nn.Sigmoid()`.
- **Postprocessing**: A postprocessing utility, `postprocess_function`, converts the raw logits into the corresponding class names for easier interpretation.

---

## Training Details

The fine-tuning process involved:

- Input tokenization using the `bert-base-uncased` tokenizer.
- Feeding the preprocessed text into the BERT model for contextual encoding.
- Passing the output logits through a **sigmoid activation function** to produce per-class probability scores.
- The complete training code is available on Kaggle: https://www.kaggle.com/code/naandhu/bert-base-uncased-fine-tuned-for-classification

---

## Model Output

The model returns raw logits for each job category. These logits can be converted into per-class probability scores using:

```python
import torch.nn as nn

# `logits` is the raw model output, e.g. model(**inputs).logits
sigmoid = nn.Sigmoid()
probs = sigmoid(logits)
```

The class with the highest score is the predicted job category. Note that the sigmoid scores each class independently; because the sigmoid is monotonic, the argmax of its outputs matches the argmax of the raw logits. If a normalized probability distribution over the 24 classes is needed, `torch.nn.Softmax(dim=-1)` can be applied instead.

---

## Use Cases

- Automated resume classification for HR platforms.
- Sorting resumes into industry-specific categories for targeted hiring.
- Candidate profiling and analysis for recruitment agencies.

---

## Limitations

- Model performance depends on the quality and diversity of the training data; biases in the dataset may affect predictions.
- Preprocessing removes non-textual elements, which may strip out context-critical features.
- PDFs with poor formatting or heavy graphical content may not preprocess effectively.

---

## Citation

If you use this model in your work, please cite:

**"Resume Classification Model using BERT for Multiclass Job Categorization."**
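---

## Example Usage (Sketch)

For reference, below is a minimal end-to-end inference sketch. The Hub ID `your-username/bert-resume-classifier` is a placeholder, and `simple_preprocess` is a simplified stand-in for the `preprocess_function` described above (the actual helpers live in `preprocessing.py` and the Kaggle notebook); the label order is assumed to match the class list in the Dataset section. Adapt all of these to your setup.

```python
import re
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "your-username/bert-resume-classifier"  # placeholder Hub ID; substitute the real repo

# The 24 categories, assumed to be in the label order used during fine-tuning.
CLASSES = ['ACCOUNTANT', 'ADVOCATE', 'AGRICULTURE', 'APPAREL', 'ARTS', 'AUTOMOBILE',
           'AVIATION', 'BANKING', 'BPO', 'BUSINESS-DEVELOPMENT', 'CHEF', 'CONSTRUCTION',
           'CONSULTANT', 'DESIGNER', 'DIGITAL-MEDIA', 'ENGINEERING', 'FINANCE', 'FITNESS',
           'HEALTHCARE', 'HR', 'INFORMATION-TECHNOLOGY', 'PUBLIC-RELATIONS', 'SALES', 'TEACHER']

def simple_preprocess(text: str) -> str:
    """Simplified stand-in for `preprocess_function`: strips HTML tags, URLs,
    punctuation, and extra whitespace (stop-word removal omitted for brevity)."""
    text = re.sub(r"<[^>]+>", " ", text)          # HTML tags
    text = re.sub(r"https?://\S+", " ", text)     # URLs
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)   # punctuation / non-alphanumerics
    return re.sub(r"\s+", " ", text).strip().lower()

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

resume_text = "Experienced software engineer skilled in Python, SQL, and cloud services."
inputs = tokenizer(simple_preprocess(resume_text),
                   truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits              # shape: (1, 24)

probs = torch.sigmoid(logits)                    # per-class scores, as described above
predicted = CLASSES[int(probs.argmax(dim=-1))]
print(predicted)
```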
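If the uploaded model's config carries the `id2label` mapping for the 24 categories, the high-level `pipeline` API offers a shorter path (again assuming the placeholder Hub ID above):

```python
from transformers import pipeline

# The pipeline handles tokenization, inference, and label mapping internally.
classifier = pipeline("text-classification", model="your-username/bert-resume-classifier")
result = classifier("Certified public accountant with 8 years of audit experience.")
print(result)  # e.g. [{'label': 'ACCOUNTANT', 'score': ...}]
```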