ThunderJaw committed on
Commit
ef126c6
·
verified ·
1 Parent(s): 2b8ad8c

Update README.md

Files changed (1)
  1. README.md +140 -1
README.md CHANGED
@@ -6,4 +6,143 @@ language:
6
  base_model:
7
  - spacy/en_core_web_md
8
  pipeline_tag: text-classification
9
- ---
+ ---
10
+
11
+ # Model Card for en_textcat_resume_sections
12
+
13
+ This model classifies passages of English-language resumes by section type, with labels such as Skills, Education, and Experience.
14
+
15
+ ## Model Details
16
+
17
+ ### Model Description
18
+
19
+ This model utilizes spaCy's text classification component to categorize sections of resumes into predefined labels. It is trained on the `ganchengguang/resume_seven_class` dataset, which contains examples of various resume sections.
20
+
21
+ - **Model type:** Text Classification
22
+ - **Language(s) (NLP):** English
23
+ - **Finetuned from model:** spacy/en_core_web_md
24
+
25
+ ## Uses
26
+
27
+ ### Direct Use
28
+
29
+ This model can be used to automatically classify sections within English-language resumes, which helps extract structured information from unstructured resume text. It currently classifies only five section types reliably: Skills, Education, Experience, Profile, and Summary.
30
+
31
+ ### Downstream Use
32
+
33
+ This model can serve as a component in larger systems for resume parsing, candidate screening, or any application requiring the identification of specific sections within resumes.
34
+
35
+ ### Out-of-Scope Use
36
+
37
+ This model is not designed for tasks outside of resume section classification, such as general text classification or Named Entity Recognition (NER) in non-resume texts.
38
+
39
+ ## Bias, Risks, and Limitations
40
+
41
+ The model's performance is dependent on the quality and diversity of the training data. It may not perform well on resumes that differ significantly from the training examples. Additionally, the model may have biases based on the dataset it was trained on.
42
+
43
+ ### Recommendations
44
+
45
+ Users should be aware of the model's limitations and biases. It is recommended to evaluate the model's performance on a diverse set of resumes before deploying it in production environments.
46
+
47
+ ## How to Get Started with the Model
48
+
49
+ Once the pipeline package is installed, load it with `spacy.load` and read the per-label scores from `doc.cats`.
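
A minimal usage sketch, not the author's documented procedure. It assumes the pipeline has been installed as a spaCy package under the name `en_textcat_resume_sections` (taken from the card title; the real package name may differ). The helper names `top_label` and `classify_section` are hypothetical.

```python
from typing import Dict, Tuple


def top_label(cats: Dict[str, float]) -> Tuple[str, float]:
    """Return the highest-scoring label and its score from a doc.cats-style dict."""
    if not cats:
        raise ValueError("no category scores available")
    label = max(cats, key=cats.get)
    return label, cats[label]


def classify_section(
    text: str, model_name: str = "en_textcat_resume_sections"
) -> Tuple[str, float]:
    """Classify one resume passage with an installed spaCy pipeline.

    The default model_name is an assumption based on the card title.
    """
    import spacy  # imported lazily so top_label works without spaCy installed

    nlp = spacy.load(model_name)
    doc = nlp(text)  # spaCy text classifiers write their scores to doc.cats
    return top_label(doc.cats)
```

Calling `classify_section("Python, SQL, Docker, Kubernetes")` would return a `(label, score)` pair. The sketch reloads the pipeline on every call for simplicity; when classifying many passages, load it once and reuse the `nlp` object.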
50
+
51
+ ## Training Details
52
+
53
+ ### Training Data
54
+
55
+ The model was trained on the `ganchengguang/resume_seven_class` dataset, which contains examples of various resume sections.
56
+
57
+ ### Training Procedure
58
+
59
+ The model was fine-tuned using spaCy's text classification component. The training involved the following steps:
60
+
61
+ 1. Data preprocessing: Tokenization and vectorization of resume text.
62
+ 2. Model training: Fine-tuning the `spacy/en_core_web_md` model on the preprocessed data.
63
+ 3. Evaluation: Assessing the model's performance on a validation set.
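
As an illustration of the data preparation step, the sketch below builds the one-hot `cats` annotation that spaCy text classifiers train on. It assumes mutually exclusive labels and uses only the five label names mentioned in this card; the dataset defines seven classes, so the real label set is larger. `LABELS` and `make_example` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Assumption: only the five labels named in this card. The training dataset
# has seven classes, so the actual label list used for training is longer.
LABELS: List[str] = ["Skills", "Education", "Experience", "Profile", "Summary"]

Annotation = Dict[str, Dict[str, float]]


def make_example(
    text: str, label: str, labels: List[str] = LABELS
) -> Tuple[str, Annotation]:
    """Pair a passage with a one-hot {"cats": {...}} annotation dict."""
    if label not in labels:
        raise ValueError(f"unknown label: {label!r}")
    # Exactly one label gets 1.0; every other label gets 0.0.
    cats = {name: 1.0 if name == label else 0.0 for name in labels}
    return text, {"cats": cats}
```

The resulting annotation dict has the shape that spaCy v3's `Example.from_dict` accepts for the `cats` field.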
64
+
65
+ #### Preprocessing
66
+
67
+ The text data was cleaned by removing special characters, normalizing whitespace, and converting text to lowercase. Tokenization was performed using spaCy's tokenizer.
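
The cleaning steps described above can be sketched with the standard library as follows. The card does not say which characters count as "special", so this version assumes everything outside ASCII letters, digits, and whitespace is dropped; `clean_text` is a hypothetical helper name.

```python
import re

# Assumption: "special characters" means anything that is not an ASCII
# letter, digit, or whitespace. Accented letters would also be removed.
_SPECIAL = re.compile(r"[^a-z0-9\s]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Lowercase, strip special characters, and normalize whitespace."""
    lowered = text.lower()
    # Replace each special character with a space so adjacent words stay apart.
    no_special = _SPECIAL.sub(" ", lowered)
    # Collapse runs of spaces, tabs, and newlines into single spaces.
    return _WHITESPACE.sub(" ", no_special).strip()
```

Note that this rule turns `C++` into `c`, which may discard information in skills lists; a production pipeline might keep a small allow-list of symbols.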
68
+
69
+ ## Evaluation
70
+
71
+ ### Testing Data, Factors & Metrics
72
+
73
+ #### Testing Data
74
+
75
+ The model was evaluated on a separate test set from the `ganchengguang/resume_seven_class` dataset, containing examples of resume sections not seen during training.
76
+
77
+ #### Factors
78
+
79
+ The evaluation considered factors such as resume length, formatting, and the presence of uncommon sections.
80
+
81
+ #### Metrics
82
+
83
+ The model's performance was measured using accuracy, precision, recall, and F1-score.
84
+
85
+ ### Results
86
+
87
+ The model achieved the following results on the test set:
88
+
89
+ ## Text Categorization Model Performance Metrics
90
+
91
+ ### Summary Section
92
+ - **Precision:** 88.4%
93
+ - **Recall:** 89.8%
94
+ - **F1-score:** 89.1%
95
+
96
+ ### Profile Section
97
+ - **Precision:** 95.2%
98
+ - **Recall:** 88.3%
99
+ - **F1-score:** 91.6%
100
+
101
+ ### Education Section
102
+ - **Precision:** 93.2%
103
+ - **Recall:** 90.5%
104
+ - **F1-score:** 91.9%
105
+
106
+ ### Experience Section
107
+ - **Precision:** 78.8%
108
+ - **Recall:** 82.5%
109
+ - **F1-score:** 80.6%
110
+
111
+ ### Skills Section
112
+ - **Precision:** 88.5%
113
+ - **Recall:** 88.5%
114
+ - **F1-score:** 88.5%
115
+
116
+ ### Overall Model Performance
117
+ - **Micro Precision:** 88.3%
118
+ - **Micro Recall:** 87.7%
119
+ - **Micro F1-score:** 88.0%
120
+ - **Macro Precision:** 88.8%
121
+ - **Macro Recall:** 87.9%
122
+ - **Macro F1-score:** 88.3%
123
+ - **Macro AUC:** 97.8%
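
As a consistency check, the macro averages can be recomputed from the per-section precision and recall values listed above. This is a sketch using only the rounded percentages in this card; the function names are illustrative. Micro averages cannot be reproduced this way because they require per-section example counts, which are not reported.

```python
from statistics import mean
from typing import Dict, Tuple

# (precision, recall) in percent, copied from the per-section results above.
REPORTED: Dict[str, Tuple[float, float]] = {
    "Summary": (88.4, 89.8),
    "Profile": (95.2, 88.3),
    "Education": (93.2, 90.5),
    "Experience": (78.8, 82.5),
    "Skills": (88.5, 88.5),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_averages(scores: Dict[str, Tuple[float, float]]) -> Tuple[float, float, float]:
    """Unweighted mean of per-section precision, recall, and F1."""
    precisions = [p for p, _ in scores.values()]
    recalls = [r for _, r in scores.values()]
    f1s = [f1_score(p, r) for p, r in scores.values()]
    return mean(precisions), mean(recalls), mean(f1s)
```

Run on the values above, the macro precision, recall, and F1 land within 0.1 percentage points of the reported 88.8%, 87.9%, and 88.3%. Individual F1 values recomputed from rounded inputs can differ in the last digit (Education comes out near 91.8% rather than 91.9%).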
124
+
125
+ #### Summary
126
+
127
+ The model performs best on the Education (F1 91.9%) and Profile (F1 91.6%) sections and worst on the Experience section (F1 80.6%). The Skills section has identical precision and recall (88.5%).
128
+
129
+ ## Technical Specifications
130
+
131
+ ### Model Architecture and Objective
132
+
133
+ The model is based on spaCy's text classification component, utilizing the `spacy/en_core_web_md` base model. The objective is to classify resume sections into predefined categories.
134
+
135
+ ### Compute Infrastructure
136
+
137
+ The model was trained on my personal gaming laptop. The config file can be found inside the model folder.
138
+
139
+ #### Hardware
140
+
141
+ - Intel Core i7-13620H
142
+ - 16GB RAM
143
+ - NVIDIA GeForce RTX 4070 Laptop GPU (8 GB VRAM)
144
+
145
+ #### Software
146
+
147
+ - **Operating System:** Windows 11
148
+ - **Libraries:** spaCy