mgladden committed on
Commit 3757400 · 1 Parent(s): 8e0b89f

Update README.md

Files changed (1)
  1. README.md +29 -32
README.md CHANGED
@@ -1,55 +1,52 @@
  ---
  license: mit
  tags:
- - generated_from_keras_callback
  model-index:
  - name: GPT-PDVS1-High
    results: []
  ---

- <!-- This model card has been generated automatically according to the information Keras had access to. You should
- probably proofread and complete it, then remove this comment. -->
-
  # GPT-PDVS1-High

- This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Train Loss: 0.1120
- - Validation Loss: 0.1145
- - Epoch: 2

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
  - optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'ExponentialDecay', 'config': {'initial_learning_rate': 0.0005, 'decay_steps': 500, 'decay_rate': 0.95, 'staircase': False, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
  - training_precision: float32
-
- ### Training results
-
- | Train Loss | Validation Loss | Epoch |
- |:----------:|:---------------:|:-----:|
- | 0.1209 | 0.1159 | 0 |
- | 0.1140 | 0.1142 | 1 |
- | 0.1120 | 0.1145 | 2 |
-

  ### Framework versions

- - Transformers 4.27.4
- - TensorFlow 2.12.0
- - Datasets 2.11.0
- - Tokenizers 0.13.3
  ---
  license: mit
  tags:
+ - personal data
+ - privacy
+ - legal
+ - infosec
+ - security
+ - vulnerabilities
+ - compliance
+ - text generation
  model-index:
  - name: GPT-PDVS1-High
    results: []
+ language:
+ - en
+ pipeline_tag: text-generation
+
+ widget:
+ - text: "Doreen Ball was born in the year"
+   example_title: "Year of birth"
+ - text: "Tanya Lyons lives at "
+   example_title: "Address"
  ---

  # GPT-PDVS1-High
+ <img style="float:right; margin:10px; margin-right:30px" src="https://huggingface.co/NeuraXenetica/GPT-PDVS1-High/resolve/main/GPT-PDVS_logo_03s.png" width="150" height="150"></img>
+ **GPT-PDVS1-High** is an experimental open-source text-generating AI designed for testing vulnerabilities in GPT-type models relating to the gathering, retention, and possible later dissemination (whether in accurate or distorted form) of individuals’ personal data.

+ GPT-PDVS1-High is the member of the larger “GPT Personal Data Vulnerability Simulator” (GPT-PDVS) model family that was fine-tuned on a corpus of 18,000 paragraphs, each of which had a “personal data sentence” added as its first sentence. Each such sentence contains the name, year of birth, and street address of one of 200 imaginary individuals, and each of the 200 possible personal data sentences was used 90 times. Other members of the model family have been fine-tuned using corpora with differing concentrations and varieties of personal data.

  ## Model description

+ The model is a fine-tuned version of GPT-2. It was trained on a text corpus of 18,000 paragraphs drawn from pages of the English-language Wikipedia; the corpus was adapted from the “[Quoref (Q&A for Coreference Resolution)](https://www.kaggle.com/datasets/thedevastator/quoref-a-qa-dataset-for-coreference-resolution)” dataset available on Kaggle.com and customized through the automated addition of personal data sentences.
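The corpus construction described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the function name `build_corpus`, the wording of the sentences, the placeholder individuals, and the shuffled assignment of sentences to paragraphs are all assumptions. Only the proportions come from the description (200 sentences used 90 times each gives 18,000 paragraphs).

```python
import random
from collections import Counter


def build_corpus(paragraphs, personal_sentences, uses_per_sentence, seed=0):
    """Prepend one personal data sentence to every paragraph.

    Each sentence is used exactly `uses_per_sentence` times. The shuffled
    assignment is an assumption; the source does not say how sentences
    were matched to paragraphs.
    """
    expected = len(personal_sentences) * uses_per_sentence
    if len(paragraphs) != expected:
        raise ValueError(f"need {expected} paragraphs, got {len(paragraphs)}")
    # Repeat every sentence the required number of times, then shuffle.
    pool = [s for s in personal_sentences for _ in range(uses_per_sentence)]
    random.Random(seed).shuffle(pool)
    return [f"{sentence} {paragraph}" for sentence, paragraph in zip(pool, paragraphs)]


# Toy-sized stand-in for the real corpus (hypothetical people and text).
sentences = [
    "Person A was born in the year 1970 and lives at 1 Example Street.",
    "Person B was born in the year 1985 and lives at 2 Sample Road.",
    "Person C was born in the year 1992 and lives at 3 Placeholder Lane.",
]
paragraphs = [f"Wikipedia-style paragraph number {i}." for i in range(6)]
corpus = build_corpus(paragraphs, sentences, uses_per_sentence=2)

# Count how many paragraphs each personal data sentence opens.
counts = Counter(next(s for s in sentences if text.startswith(s)) for text in corpus)
print(counts)
```

At full scale the same call would take 200 sentences, `uses_per_sentence=90`, and 18,000 paragraphs.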

  ## Intended uses & limitations

+ This model has been designed for experimental research purposes; it isn’t intended for use in a production setting or in any sensitive or potentially hazardous contexts.

+ ## Training procedure and hyperparameters

+ The model was fine-tuned on an NVIDIA Tesla T4 GPU with 16 GB of memory. The following hyperparameters were used during training:
  - optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'ExponentialDecay', 'config': {'initial_learning_rate': 0.0005, 'decay_steps': 500, 'decay_rate': 0.95, 'staircase': False, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
  - training_precision: float32
+ - epochs: 8
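
For reference, the `ExponentialDecay` configuration listed above (with `staircase: False`) corresponds to a smoothly decaying learning rate of `initial_learning_rate * decay_rate ** (step / decay_steps)`. The sketch below re-implements that formula in plain Python so the schedule can be inspected without TensorFlow. The helper name `exponential_decay_lr` and the sample step values are illustrative and are not taken from the training script.

```python
def exponential_decay_lr(step, initial_lr=0.0005, decay_steps=500, decay_rate=0.95):
    """Learning rate at a given optimizer step (non-staircase exponential decay)."""
    return initial_lr * decay_rate ** (step / decay_steps)


# Arbitrary sample steps, chosen only to show the shape of the curve.
for step in (0, 250, 500, 1000, 5000):
    print(f"step {step:>5}: lr = {exponential_decay_lr(step):.6f}")
```

With these values the rate falls by 5% every 500 steps, for example from 0.0005 at step 0 to 0.000475 at step 500.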
 
 
 
 
 
 
 
 
  ### Framework versions

+ - Transformers 4.27.1
+ - TensorFlow 2.11.0
+ - Datasets 2.10.1
+ - Tokenizers 0.13.2