datasets:
- ganchengguang/resume_seven_class
language:
- hu
base_model:
- facebook/fasttext-hu-vectors
pipeline_tag: text-classification
Model Card for Resume Section Classifier
This model classifies sections of Hungarian resumes into categories such as Skills, Education, Experience, and others. It uses the facebook/fasttext-hu-vectors model as its base and has been fine-tuned on the ganchengguang/resume_seven_class dataset. The dataset is in English, so I translated it into Hungarian. Training on translated data is not the best approach, but it still works.
Model Details
Model Description
This model uses the facebook/fasttext-hu-vectors pre-trained embeddings to classify Hungarian resume sections into predefined categories. It has been fine-tuned on the ganchengguang/resume_seven_class dataset, which has seven categories, including Experience, Education, Knowledge, and Project.
- Model type: Text Classification
- Language(s): Hungarian
- Finetuned from model: facebook/fasttext-hu-vectors
Uses
Direct Use
This model can be used directly to classify sections of Hungarian resumes into categories such as Skills, Education, Experience, and others. It is suitable for applications in recruitment and resume analysis.
Downstream Use
The model can be integrated into larger systems for automated resume screening, assisting HR professionals in efficiently processing and categorizing resume information.
Out-of-Scope Use
This model is not intended for use with resumes in languages other than Hungarian. It may not perform accurately on resumes with non-standard formats or those containing significant amounts of non-Hungarian text.
Bias, Risks, and Limitations
The model has been trained on a specific dataset and may not generalize well to resumes with formats or content significantly different from those in the training data. Users should be aware of potential biases in the training data and the model's limitations in handling diverse resume formats.
Recommendations
Users should validate the model's predictions and consider incorporating human oversight, especially when dealing with resumes that deviate from the standard formats present in the training data.
How to Get Started with the Model
- https://github.com/ssobii2/Wozify-CV-Parser
- See the fastText website for installation and usage documentation.
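A minimal sketch of loading the classifier and labeling one resume section is shown here. It assumes the model was saved as a fastText binary and that the `fasttext` Python package is installed; neither is confirmed by this card. The file name `resume_classifier.bin` and the helper names are hypothetical.

```python
from typing import List, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def clean_label(raw_label: str) -> str:
    """Strip the fastText prefix, e.g. '__label__Education' -> 'Education'."""
    if raw_label.startswith(LABEL_PREFIX):
        return raw_label[len(LABEL_PREFIX):]
    return raw_label


def prepare_text(text: str) -> str:
    """Collapse all whitespace to single spaces; fastText's predict() rejects newlines."""
    return " ".join(text.split())


def classify_section(model, text: str, k: int = 1) -> List[Tuple[str, float]]:
    """Return the top-k (label, probability) pairs for one resume section."""
    labels, probs = model.predict(prepare_text(text), k=k)
    return [(clean_label(label), float(prob)) for label, prob in zip(labels, probs)]


def load_and_classify(model_path: str, text: str) -> List[Tuple[str, float]]:
    """Load a fastText model from disk and classify one section.

    model_path is a hypothetical location, e.g. 'resume_classifier.bin'.
    """
    import fasttext  # imported lazily so the helpers above work without it

    model = fasttext.load_model(model_path)
    return classify_section(model, text)
```

For example, `load_and_classify("resume_classifier.bin", "Budapesti Műszaki Egyetem, mérnökinformatikus BSc")` would return a list with one label and its probability. The label names depend on how the training file was written.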
Training Details
Training Data
The model was fine-tuned on the ganchengguang/resume_seven_class dataset, which contains English resume sections labeled with seven categories, including Experience, Education, Knowledge, and Project. I translated the dataset into Hungarian before training.
Training Procedure
The model was fine-tuned using standard text classification procedures, adjusting hyperparameters to optimize performance on the resume classification task.
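One way such a procedure could look with fastText's supervised mode is sketched here. This is an assumption, not the exact pipeline used for this model: the file paths and helper names are hypothetical, and the hyperparameter values are illustrative placeholders rather than the tuned values.

```python
import re
from pathlib import Path
from typing import Iterable, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(label: str, text: str) -> str:
    """Format one example as '__label__<Label> <text>' on a single line."""
    safe_label = re.sub(r"\s+", "_", label.strip())  # labels cannot contain whitespace
    flat_text = " ".join(text.split())  # fastText expects one example per line
    return f"{LABEL_PREFIX}{safe_label} {flat_text}"


def write_training_file(rows: Iterable[Tuple[str, str]], path: str) -> int:
    """Write (label, text) pairs in fastText format and return the example count."""
    lines = [to_fasttext_line(label, text) for label, text in rows]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def train_classifier(train_path: str, vectors_path: str):
    """Train a supervised fastText model initialized from pre-trained vectors.

    vectors_path is assumed to be a text-format (.vec) file of the Hungarian
    vectors. All hyperparameter values below are illustrative assumptions.
    """
    import fasttext  # imported lazily so the helpers above work without it

    return fasttext.train_supervised(
        input=train_path,
        pretrainedVectors=vectors_path,
        dim=300,  # must match the dimension of the pre-trained vectors
        epoch=25,
        lr=0.5,
        wordNgrams=2,
    )
```

The returned model can be written to disk with `model.save_model(...)` and evaluated on a held-out file with `model.test(...)`.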
Evaluation
Testing Data, Factors & Metrics
The model's performance was evaluated on a held-out test set from the ganchengguang/resume_seven_class dataset, using accuracy and F1-score as evaluation metrics.
Metrics
- Accuracy: Measures the proportion of correctly classified sections.
- F1-score: Harmonic mean of precision and recall, providing a balance between the two.
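The two metrics can be computed as in the following sketch. The card does not say how F1 was averaged across the seven classes, so macro-averaging is an assumption here, and the function names are illustrative.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Proportion of sections whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 per label: 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores (averaging choice is an assumption)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

Macro-averaging weights every category equally, which makes weak performance on a rare category visible even when overall accuracy is high.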
Environmental Impact
The model was trained on a single consumer laptop, so its carbon emissions were minimal. Users should consider the environmental impact of training large models and explore options such as model distillation or quantization to reduce energy consumption.
Technical Specifications
Model Architecture and Objective
The model is built on the facebook/fasttext-hu-vectors embeddings and fine-tuned to classify Hungarian resume sections into predefined categories.
Compute Infrastructure
The model was trained on my personal gaming laptop.
Hardware
- GPU: RTX 4070 Laptop GPU 8GB VRAM
- CPU: Intel Core i7-13620H
- RAM: 16GB