ThunderJaw
committed on
Create README.md
README.md
ADDED
@@ -0,0 +1,91 @@
1 |
+
---
|
2 |
+
datasets:
|
3 |
+
- ganchengguang/resume_seven_class
|
4 |
+
language:
|
5 |
+
- hu
|
6 |
+
base_model:
|
7 |
+
- facebook/fasttext-hu-vectors
|
8 |
+
pipeline_tag: text-classification
|
9 |
+
---
|
10 |
+
|
11 |
+
# Model Card for Resume Section Classifier
|
12 |
+
|
13 |
+
This model classifies sections of Hungarian resumes into categories such as Skills, Education, and Experience. It uses the `facebook/fasttext-hu-vectors` model as its base and was fine-tuned on the `ganchengguang/resume_seven_class` dataset. The dataset is in English, so I translated it into Hungarian. Training on translated data is not ideal, but the resulting model still works.
|
14 |
+
|
15 |
+
## Model Details
|
16 |
+
|
17 |
+
### Model Description
|
18 |
+
|
19 |
+
This model uses the `facebook/fasttext-hu-vectors` pre-trained embeddings to classify Hungarian resume sections into predefined categories. It was fine-tuned on the `ganchengguang/resume_seven_class` dataset, which has seven categories, including Experience, Education, Knowledge, and Project.
|
20 |
+
|
21 |
+
- **Model type:** Text Classification
|
22 |
+
- **Language(s):** Hungarian
|
23 |
+
- **Finetuned from model:** facebook/fasttext-hu-vectors
|
24 |
+
|
25 |
+
## Uses
|
26 |
+
|
27 |
+
### Direct Use
|
28 |
+
|
29 |
+
This model can be used directly to classify sections of Hungarian resumes into categories such as Skills, Education, Experience, and others. It is suitable for applications in recruitment and resume analysis.
|
30 |
+
|
31 |
+
### Downstream Use
|
32 |
+
|
33 |
+
The model can be integrated into larger systems for automated resume screening, assisting HR professionals in efficiently processing and categorizing resume information.
|
34 |
+
|
35 |
+
### Out-of-Scope Use
|
36 |
+
|
37 |
+
This model is not intended for use with resumes in languages other than Hungarian. It may not perform accurately on resumes with non-standard formats or those containing significant amounts of non-Hungarian text.
|
38 |
+
|
39 |
+
## Bias, Risks, and Limitations
|
40 |
+
|
41 |
+
The model has been trained on a specific dataset and may not generalize well to resumes with formats or content significantly different from those in the training data. Users should be aware of potential biases in the training data and the model's limitations in handling diverse resume formats.
|
42 |
+
|
43 |
+
### Recommendations
|
44 |
+
|
45 |
+
Users should validate the model's predictions and consider incorporating human oversight, especially when dealing with resumes that deviate from the standard formats present in the training data.
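
One way to add the human oversight recommended above is to send low-confidence predictions to a reviewer. The sketch below is illustrative only: the `split_for_review` helper, the `(section_id, label, probability)` tuple layout, and the `0.8` threshold are assumptions, not part of this model or its repository.

```python
from typing import List, Tuple

# (section_id, predicted_label, probability); this layout is an assumption.
Prediction = Tuple[str, str, float]


def split_for_review(
    predictions: List[Prediction], threshold: float = 0.8
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted ones and ones needing human review.

    A prediction is accepted when its probability is at least `threshold`.
    The default threshold is a placeholder and should be tuned on real data.
    """
    accepted: List[Prediction] = []
    needs_review: List[Prediction] = []
    for prediction in predictions:
        probability = prediction[2]
        if probability >= threshold:
            accepted.append(prediction)
        else:
            needs_review.append(prediction)
    return accepted, needs_review
```

A stricter threshold sends more sections to a reviewer, which may be appropriate for resumes whose layout differs from the training data.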
|
46 |
+
|
47 |
+
## How to Get Started with the Model
|
48 |
+
|
49 |
+
- https://github.com/ssobii2/Wozify-CV-Parser
|
50 |
+
- See the fastText website (https://fasttext.cc) for installation and usage documentation.
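
As a rough starting point, the sketch below shows how a fastText supervised model could be loaded and queried with the `fasttext` Python package. It assumes the classifier is distributed as a fastText model file. The card does not name that file, so `model_path` and the helper names `clean_label` and `classify_sections` are hypothetical.

```python
from typing import List, Tuple


def clean_label(raw_label: str) -> str:
    """Strip fastText's default '__label__' prefix from a predicted label."""
    prefix = "__label__"
    return raw_label[len(prefix):] if raw_label.startswith(prefix) else raw_label


def classify_sections(model_path: str, sections: List[str]) -> List[Tuple[str, float]]:
    """Return (label, probability) for each resume section.

    Assumes `model_path` points to a fastText supervised model file;
    the actual file name is not given in this card.
    """
    import fasttext  # imported lazily so clean_label works without the package

    model = fasttext.load_model(model_path)
    results = []
    for text in sections:
        # fastText's predict() rejects newlines, so flatten the section first.
        single_line = " ".join(text.split())
        labels, probs = model.predict(single_line, k=1)
        results.append((clean_label(labels[0]), float(probs[0])))
    return results
```

A call might look like `classify_sections("resume_classifier.bin", ["Budapesti Műszaki Egyetem, BSc, 2015-2019"])`, where the file name is a placeholder.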
|
51 |
+
|
52 |
+
## Training Details
|
53 |
+
|
54 |
+
### Training Data
|
55 |
+
|
56 |
+
The model was fine-tuned on the `ganchengguang/resume_seven_class` dataset, which contains English resume sections labeled with seven categories, including Experience, Education, Knowledge, and Project. I translated the dataset into Hungarian before training.
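
fastText's supervised mode reads one example per line, with each label prefixed by `__label__`. The sketch below shows how translated rows could be written in that format. It assumes the model was trained with fastText's supervised mode, which this card does not confirm; the helper names and the `(label, text)` row layout are hypothetical.

```python
from typing import Iterable, Tuple


def to_fasttext_line(label: str, text: str) -> str:
    """Format one example as '__label__<Label> <text>' on a single line."""
    # Labels must not contain whitespace, so join multi-word labels with '_'.
    label_token = "__label__" + "_".join(label.split())
    # Collapse newlines and repeated spaces so each example stays on one line.
    flat_text = " ".join(text.split())
    return f"{label_token} {flat_text}"


def write_training_file(rows: Iterable[Tuple[str, str]], path: str) -> int:
    """Write (label, text) rows to `path` and return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for label, text in rows:
            handle.write(to_fasttext_line(label, text) + "\n")
            count += 1
    return count
```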
|
57 |
+
|
58 |
+
### Training Procedure
|
59 |
+
|
60 |
+
The model was fine-tuned using standard text classification procedures, adjusting hyperparameters to optimize performance on the resume classification task.
|
61 |
+
|
62 |
+
## Evaluation
|
63 |
+
|
64 |
+
### Testing Data, Factors & Metrics
|
65 |
+
|
66 |
+
The model's performance was evaluated on a held-out test set from the `ganchengguang/resume_seven_class` dataset, using accuracy and F1-score as evaluation metrics.
|
67 |
+
|
68 |
+
#### Metrics
|
69 |
+
|
70 |
+
- **Accuracy:** Measures the proportion of correctly classified sections.
|
71 |
+
- **F1-score:** Harmonic mean of precision and recall, providing a balance between the two.
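
To make the two metrics concrete, here is a small dependency-free sketch. The card does not say how F1 was averaged across the seven classes, so macro averaging (the unweighted mean of per-class F1) is an assumption here, and the function names are illustrative.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of sections whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_per_class(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Per-class F1, computed as 2*TP / (2*TP + FP + FN)."""
    scores: Dict[str, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        scores[label] = 2 * tp / denominator if denominator else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (the averaging choice is an assumption)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

Macro averaging weights every class equally, which matters if some resume sections are much rarer than others in the test set.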
|
72 |
+
|
73 |
+
## Environmental Impact
|
74 |
+
|
75 |
+
The model was trained on a single consumer laptop (see Compute Infrastructure), so the carbon emissions from training were minimal. Users should consider the environmental impact of training large models and explore options such as model distillation or quantization to reduce energy consumption.
|
76 |
+
|
77 |
+
## Technical Specifications
|
78 |
+
|
79 |
+
### Model Architecture and Objective
|
80 |
+
|
81 |
+
The model is built on the `facebook/fasttext-hu-vectors` pre-trained embeddings and fine-tuned to classify Hungarian resume sections into predefined categories.
|
82 |
+
|
83 |
+
### Compute Infrastructure
|
84 |
+
|
85 |
+
The model was trained on my personal gaming laptop.
|
86 |
+
|
87 |
+
#### Hardware
|
88 |
+
|
89 |
+
- **GPU:** RTX 4070 Laptop GPU 8GB VRAM
|
90 |
+
- **CPU:** Intel Core i7-13620H
|
91 |
+
- **RAM:** 16GB
|