jamesHD2001 committed on
Commit
65db8e5
·
verified ·
1 Parent(s): 51a741e

Create README.md

Files changed (1)
  1. README.md +44 -0
README.md ADDED
@@ -0,0 +1,44 @@
+ ---
+ datasets:
+ - EleutherAI/pile
+ language:
+ - en
+ ---
+
+ # DenseRetNet-350M
+
+ An unofficial pretrained checkpoint of DenseRetNet-1.3B from the paper DenseMamba (https://arxiv.org/abs/2403.00818). The training data is 15B tokens randomly sampled from The Pile dataset.
+
+
+
+ - Recurrent generation example:
+
+ ```python
+ import torch
+ import transformers
+ model_name_or_path = '/path to model'
+ MAX_NEW_TOKENS = 256
+ inference_dtype = torch.float16
+
+ generation_config = transformers.GenerationConfig(
+     do_sample=False,
+     max_new_tokens=MAX_NEW_TOKENS,
+ )
+
+ tokenizer = transformers.AutoTokenizer.from_pretrained(model_name_or_path, use_fast=False, trust_remote_code=True)
+ config = transformers.AutoConfig.from_pretrained(model_name_or_path, trust_remote_code=True)
+ model = transformers.AutoModelForCausalLM.from_pretrained(
+     model_name_or_path, config=config, torch_dtype=inference_dtype, trust_remote_code=True)
+ model.cuda()  # move the half-precision weights to the GPU
+ model.eval()
+
+ input_sents = 'I have a dream'
+ inputs = tokenizer(input_sents, return_tensors="pt", truncation=True, max_length=2048)
+ output = model.generate(input_ids=inputs["input_ids"].cuda(),
+                         generation_config=generation_config,
+                         return_dict_in_generate=True,
+                         output_scores=True)
+ # With return_dict_in_generate=True, the generated ids are in output.sequences (batch, seq_len).
+ output = tokenizer.decode(output.sequences[0].tolist(), skip_special_tokens=True)
+ print(output)
+ ```
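
For readers wondering what "recurrent generation" refers to, here is a minimal sketch of the retention mechanism from the original RetNet formulation, computed in both its parallel and recurrent forms. It is not the code shipped with this checkpoint, and it leaves out the dense hidden-state connections that DenseRetNet adds. The function names, the decay value `GAMMA`, and the toy inputs are all assumptions chosen for illustration.

```python
# Toy single-head retention in plain Python (no dependencies).
# Illustrative only: not the checkpoint's implementation.
GAMMA = 0.9  # assumed decay factor, picked arbitrarily for the example


def retention_parallel(Q, K, V, gamma=GAMMA):
    """Parallel form: o_n = sum over m <= n of gamma^(n-m) * (q_n . k_m) * v_m."""
    outputs = []
    for n, q in enumerate(Q):
        out = [0.0] * len(V[0])
        for m in range(n + 1):
            score = sum(a * b for a, b in zip(q, K[m])) * gamma ** (n - m)
            out = [o + score * v for o, v in zip(out, V[m])]
        outputs.append(out)
    return outputs


def retention_recurrent(Q, K, V, gamma=GAMMA):
    """Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, then o_n = q_n S_n."""
    d_k, d_v = len(K[0]), len(V[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fixed-size state, independent of sequence length
    outputs = []
    for q, k, v in zip(Q, K, V):
        # Decay the previous state and add the outer product of the new key and value.
        state = [[gamma * state[i][j] + k[i] * v[j] for j in range(d_v)]
                 for i in range(d_k)]
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs


# Hypothetical toy inputs: 3 tokens, key dimension 2, value dimension 1.
Q = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
K = [[1.0, 2.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0], [2.0], [3.0]]

par = retention_parallel(Q, K, V)
rec = retention_recurrent(Q, K, V)
max_diff = max(abs(a - b) for p, r in zip(par, rec) for a, b in zip(p, r))
print(max_diff)
```

Both forms give the same outputs up to floating-point error. The recurrent one only carries a fixed-size state from token to token, which is why it suits step-by-step generation.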