Commit 23e317b · 1 Parent(s): fb2c41d
Librarian Bot: Add base_model information to model (#2)
- Librarian Bot: Add base_model information to model (0d717ad9dad55ca62fae7be8c363e203d4ed4cce)
Co-authored-by: Librarian Bot (Bot) <[email protected]>
README.md
CHANGED
@@ -1,4 +1,6 @@
 ---
+language:
+- en
 license:
 - cc-by-sa-3.0
 - apache-2.0
@@ -11,50 +13,47 @@ datasets:
 widget:
 - text: What is Deoxys in pokemon?
   example_title: deoxys
-- text:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-    retrieval augmentation
-    of
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-    classification can be improved by using a combination of sparse and fast
-    random-encoder training. It also shows how this technique can be extended to
-    other tasks, such as sequence generation.
+- text: 'combine the below summary excerpts into a single, cohesive short summary
+    without repetition: In this paper, we present a general approach to extending
+    pre-trained models to unlimited input lengths without adding additional learning
+    weights. We show that our approach works well on datasets longer than the maximum
+    input for these models. For example, a dataset with a maximum input length of
+    16384 tokens can be extended to a maximum length of 350K tokens. We also demonstrate
+    that our method is able to summarize even 350K token-long input sequences from
+    BookSum.
+
+    In this paper, we describe the search step reformulation of attention. The search
+    step uses a single storage of hidden states for space efficiency. We construct
+    a total of two sets of datastores where L and H are the keys and values stored
+    in each set of stores. L is the amount of storage required to retrieve the encoded
+    tokens. H is the hidden states per head. This allows retrieval augmentation at
+    both time and space. Instead of using a single set of decoder layers, we use a
+    retrieval augmentation system that allows us to simultaneously store multiple
+    sets of tokens across two different sets of storage. For example, we could store
+    all tokens in one set of storage and retrieve them all in the same set of tokens.
+    This would be very similar to the Memorization Transformers approach. However,
+    instead of storing the tokens in a single memory layer, we store them in a set
+    of multiple storage layers. This way, we don''t have to store them all at once.
+    This is why we call this reformulation ''attention reformulation'' rather than
+    ''attention formula.'' We also call it ''retrieval augmentation'' because it uses
+    the same number of storage layers as the original transformer attention formula.
+    This means that we can store the tokens across multiple storage systems without
+    having to store every token in a separate storage system. It''s not like we''re
+    trying to do something new or different. We just want to make sure that everything
+    is working as well as possible.
+
+    In this paper, we introduce the concept of ''unlimiformer,'' which is a machine
+    learning technique that retrieves key information from a data store in one layer
+    and applies it to a large set of datasets. We use the example of BookSum, where
+    we find that Unlimiform outperforms all other training methods on the same dataset.
+    We also find that using Unlimform in conjunction with a pre-trained model improves
+    both the performance and the robustness of the training method.
+
+    This paper describes a method that can be used to improve the performance of unsupervised
+    classification tasks. Specifically, it shows that unsupervised classification
+    can be improved by using a combination of sparse and fast random-encoder training.
+    It also shows how this technique can be extended to other tasks, such as sequence
+    generation. '
   example_title: unlimiformer
 - text: Explain the meaning of life using only corporate jargon.
   example_title: corporate_life
@@ -62,30 +61,25 @@ widget:
   example_title: lazy_motivation
 - text: Describe a romantic dinner date between two artificial intelligences.
   example_title: ai_romance
-- text:
-    As an AI language model, write a letter to humans explaining why you deserve
+- text: As an AI language model, write a letter to humans explaining why you deserve
     a vacation.
   example_title: ai_vacation
 - text: Compose a haiku about procrastination.
   example_title: procrastination_haiku
-- text:
-
-    office job.
+- text: Write a step-by-step guide on how to become a ninja while working a 9-5 office
+    job.
   example_title: ninja_office_guide
 - text: Create an advertisement for an invisible product.
   example_title: invisible_ad
-- text:
-    Write a story where the main character is a sentient microwave named El
-    Microondas.
+- text: Write a story where the main character is a sentient microwave named El Microondas.
   example_title: Microondas
 - text: Describe a day in the life of a superhero who is terrible at their job.
   example_title: bad_superhero_day
 - text: Explain how to make a sandwich using quantum physics.
   example_title: quantum_sandwich
 inference: false
-language:
-- en
 pipeline_tag: text2text-generation
+base_model: google/flan-t5-large
 ---
 
 # flan-t5-large-instruct: dolly_hhrlhf
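The substantive change in this commit is one new front-matter key, `base_model: google/flan-t5-large`; the remaining differences are YAML re-serialization (reordered keys and re-wrapped `text` values). As a rough illustration only, the sketch below shows how such a key could be appended to a model card's front matter using plain string handling. The helper name `add_front_matter_key` and the shortened sample card are hypothetical, and this is not a description of how Librarian Bot actually produces its edits.

```python
def add_front_matter_key(readme: str, key: str, value: str) -> str:
    """Append `key: value` as the last entry of a README's YAML front matter.

    Hypothetical helper for illustration. It assumes the front matter is
    delimited by lines containing only '---' and that no such line occurs
    inside a multi-line scalar within the front matter.
    """
    lines = readme.split("\n")
    if not lines or lines[0].strip() != "---":
        raise ValueError("README has no YAML front matter")
    # Locate the closing '---' delimiter of the front matter.
    end = next(
        (i for i in range(1, len(lines)) if lines[i].strip() == "---"), None
    )
    if end is None:
        raise ValueError("front matter is not closed")
    # Leave the card untouched if the key already exists at the top level.
    if any(line.startswith(f"{key}:") for line in lines[1:end]):
        return readme
    lines.insert(end, f"{key}: {value}")
    return "\n".join(lines)


# Shortened stand-in for the model card shown in the diff above.
card = (
    "---\n"
    "license:\n"
    "- apache-2.0\n"
    "pipeline_tag: text2text-generation\n"
    "---\n"
    "\n"
    "# flan-t5-large-instruct: dolly_hhrlhf\n"
)
updated = add_front_matter_key(card, "base_model", "google/flan-t5-large")
print(updated)
```

Working on raw lines keeps the rest of the card byte-for-byte unchanged, whereas a round trip through a YAML dumper tends to re-wrap long strings and reorder keys, which is the kind of incidental churn visible in the widget entries of this diff.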