- sft
---

# Model Information

`cmd2cwl` is an instruction-fine-tuned version of `unsloth/Llama-3.2-3B`. It was trained on a custom dataset that pairs help documentation from various command-line tools with corresponding CWL (Common Workflow Language) scripts. Its purpose is to convert command-line tool documentation into clean, well-structured CWL scripts, improving automation and workflow reproducibility.

# Example

## Task

``` python
question = """
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Write a cwl script for md5sum with docker image alpine.

### Input:

With no FILE, or when FILE is -, read standard input.

  -b, --binary          read in binary mode
  -c, --check           read MD5 sums from the FILEs and check them
      --tag             create a BSD-style checksum
  -t, --text            read in text mode (default)
  -z, --zero            end each output line with NUL, not newline,
                        and disable file name escaping

The following five options are useful only when verifying checksums:
      --ignore-missing  don't fail or report status for missing files
      --quiet           don't print OK for each successfully verified file
      --status          don't output anything, status code shows success
      --strict          exit non-zero for improperly formatted checksum lines
  -w, --warn            warn about improperly formatted checksum lines

      --help            display this help and exit
      --version         output version information and exit

The sums are computed as described in RFC 1321. When checking, the input
should be a former output of this program. The default mode is to print a
line with checksum, a space, a character indicating input mode ('*' for binary,
' ' for text or where binary is insignificant), and name for each FILE.

### Response:
"""
```
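The prompt above follows the standard Alpaca template (instruction, input, response). Prompts for other tools can be built the same way; below is a minimal sketch of such a helper (`build_prompt` is hypothetical, not part of the model repository):

``` python
# Hypothetical helper: fills the Alpaca-style template used by this model
# with an instruction and the target tool's help text.
PROMPT_TEMPLATE = """\
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{help_text}

### Response:
"""

def build_prompt(instruction: str, help_text: str) -> str:
    return PROMPT_TEMPLATE.format(instruction=instruction, help_text=help_text)

question = build_prompt(
    "Write a cwl script for md5sum with docker image alpine.",
    "With no FILE, or when FILE is -, read standard input.",
)
print(question.splitlines()[2])  # -> ### Instruction:
```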

## Using unsloth

``` python
from unsloth import FastLanguageModel
from transformers import TextStreamer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "hubentu/cmd2cwl_Llama-3.2-3B",
    load_in_4bit = False,
)
FastLanguageModel.for_inference(model)

inputs = tokenizer([question], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 2048)
```

## Using AutoModelForCausalLM

``` python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model = AutoModelForCausalLM.from_pretrained("hubentu/cmd2cwl_Llama-3.2-3B")
tokenizer = AutoTokenizer.from_pretrained("hubentu/cmd2cwl_Llama-3.2-3B")
model.to("cuda")

# Tokenize the prompt defined above before generating.
inputs = tokenizer([question], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_length = 8192)
```

## Using generator

``` python
from transformers import pipeline

generator = pipeline('text-generation', model="hubentu/cmd2cwl_Llama-3.2-3B", device='cuda')
resp = generator(question, max_length=8192)
print(resp[0]['generated_text'].split("### Response:\n")[-1])
```

## Output

``` yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand:
- md5sum
requirements:
- class: DockerRequirement
  dockerPull: alpine:latest
label: md5sum
doc: Compute and check MD5 checksums
inputs:
  files:
    label: files
    doc: Input files
    type: File[]
    inputBinding:
      separate: true
outputs:
  md5:
    label: md5
    doc: MD5 checksums
    type: string[]
    outputBinding:
      glob: $(inputs.files.name)
```
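Before handing a generated script to a CWL runner, it helps to save it with a quick structural check. A minimal sketch using only the standard library; `cwl_text` below is a hypothetical stand-in for the model output shown above:

``` python
from pathlib import Path

# Hypothetical stand-in for the generated text; in practice, use the string
# extracted after "### Response:" from the model output.
cwl_text = """cwlVersion: v1.0
class: CommandLineTool
baseCommand:
- md5sum
"""

# Cheap structural checks before saving; a real pipeline would parse the YAML.
assert cwl_text.startswith("cwlVersion:"), "missing cwlVersion header"
assert "class: CommandLineTool" in cwl_text, "not a CommandLineTool"

Path("md5sum.cwl").write_text(cwl_text)
```

The saved file can then be checked and run with a CWL runner, e.g. `cwltool --validate md5sum.cwl`.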

# Uploaded model

- **Developed by:** hubentu