ndebuhr committed
Commit 9039939
Parent: cae6d67

Update README.md

Files changed (1): README.md (+66, -1)

README.md CHANGED

tags:
  - sft
---

# Model Specifications

- **Max Sequence Length**: Trained at 16,384 tokens (via RoPE scaling)
- **Data Type**: Auto-detected, with float16 and bfloat16 options
- **Quantization**: 4-bit, to reduce memory usage (see the loading sketch below)
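
These specifications map onto a straightforward Unsloth loading call. A minimal sketch, assuming the base checkpoint listed under "Uploaded Model" below; the arguments mirror the specs above but this is an illustration, not the author's actual training script:

```python
# Sketch only: argument values mirror the specs above, not the author's script.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2-27b-it-bnb-4bit",  # base model, see "Uploaded Model"
    max_seq_length=16384,  # RoPE scaling extends the context to this length
    dtype=None,            # None = auto-detect float16 vs. bfloat16
    load_in_4bit=True,     # 4-bit quantization to reduce memory usage
)
```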

## Training Data

Fine-tuned on a private dataset of hundreds of technical tutorial transcripts and their associated summaries.

## Implementation Highlights

- **Efficiency**: 4-bit quantization reduces memory usage and speeds up model downloads.
- **Adaptability**: Auto detection of data types, plus support for advanced configuration options like RoPE scaling, LoRA, and gradient checkpointing (sketched after this list).
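
Continuing the loading sketch above, LoRA adapters and gradient checkpointing are typically attached with Unsloth's `get_peft_model`; the rank, alpha, and target modules below are placeholder values, not the author's configuration:

```python
# Placeholder hyperparameters -- illustrative only, not the author's settings.
model = FastLanguageModel.get_peft_model(
    model,  # the 4-bit base model loaded in the sketch above
    r=16,                                  # LoRA rank (placeholder)
    lora_alpha=16,                         # LoRA scaling factor (placeholder)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # recompute activations to save memory
)
```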

# Uploaded Model

- **Developed by:** ndebuhr
- **License:** apache-2.0
- **Finetuned from model:** unsloth/gemma-2-27b-it-bnb-4bit

# Configuration and Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Transcript to summarize (placeholder -- supply your own text)
input_text = ""

# Set device based on CUDA availability
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model_name = "ndebuhr/Gemma-2-27B-Technical-Tutorial-Summarization-QLoRA"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

instruction = "Clarify and summarize this tutorial transcript"
prompt = """{}

### Raw Transcript:
{}

### Summary:
"""

# Tokenize the input, truncating to the 16,384-token training length
inputs = tokenizer(
    prompt.format(instruction, input_text),
    return_tensors="pt",
    truncation=True,
    max_length=16384,
).to(device)

# Generate the summary (max_length counts prompt tokens plus new tokens)
outputs = model.generate(
    **inputs,
    max_length=16384,
    num_return_sequences=1,
    use_cache=True,
)

# Decode the first (and only) sequence; the prompt is echoed back in the output
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
```
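
Because the decoded output echoes the prompt, the summary can be sliced off after the `### Summary:` marker (a small convenience step, not part of the original example):

```python
# Keep only the text generated after the summary marker.
summary = generated_text.split("### Summary:")[-1].strip()
print(summary)
```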

## Compute Infrastructure

- Fine-tuning: 1x A100 (40 GB)
- Inference: 1x L4 (24 GB) recommended
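
As a rough back-of-envelope check (an estimate, not a measured figure): 27B parameters at 4 bits is about 27 × 10⁹ × 0.5 bytes ≈ 13.5 GB for the weights alone, before activations and KV cache, which is why a 24 GB card is a comfortable fit for inference.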

This gemma2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)