benkimz committed on
Commit
2f7bb82
·
1 Parent(s): 9382e93

Initial commit for AgriBrain's AI-core, agbrain

README.md ADDED
@@ -0,0 +1,134 @@
+ ---
+ library_name: transformers
+ license: apache-2.0
+ metrics:
+ - accuracy
+ pipeline_tag: text-generation
+ tags:
+ - text-generation-inference
+ ---
+
+ # AgriBrain's AI-core, agbrain
+ ---
+ AgriBrain's AI-core, agbrain, is a natural language
+ processing (NLP) model built specifically for generating
+ content related to agriculture. The model is a fine-tuned
+ version of the GPT-2 language model, trained on a corpus
+ of 1601 PDF documents sourced from various reputable online
+ resources.
+
+ Agbrain is designed to cater to the needs of the agriculture
+ industry, including farmers, agronomists, agricultural
+ researchers, and other stakeholders.
+
+ One of the key strengths of Agbrain is its ability to generate
+ coherent and contextually relevant content. The model has been
+ fine-tuned using advanced machine learning techniques to ensure
+ that the generated content is both accurate and informative. It
+ is capable of producing content on a wide range of topics,
+ including crop cultivation, livestock management, pest control,
+ irrigation, and more.
+
+ Overall, Agbrain is a versatile NLP model that is well suited
+ to the needs of the agriculture industry.
+
+ # Usage
+ ---
+
+ ## Transformers and model.generate
+ ---
+ ```python
+ import tensorflow as tf  # TensorFlow must be installed to use the TF* model classes
+ from transformers import TFGPT2LMHeadModel, GPT2Tokenizer
+
+ tokenizer = GPT2Tokenizer.from_pretrained("benkimz/agbrain")
+ model = TFGPT2LMHeadModel.from_pretrained("benkimz/agbrain")
+
+ prompt = """
+ I think agribusiness is a great opportunity for passionate
+ investors. From food business to growing crops for sale,
+ and rearing livestock for business.
+ """
+
+ input_ids = tokenizer.encode(prompt, return_tensors="tf")  # token ids as a TF tensor
+ outputs = model.generate(input_ids=input_ids,
+                          max_length=120,  # total length in tokens, prompt included
+                          do_sample=True)  # sample instead of greedy decoding
+ generated_text = tokenizer.decode(outputs[0],
+                                   skip_special_tokens=True)
+
+ print(generated_text)
+
+ # Output
+ """
+ I think agribusiness is a great opportunity for passionate
+ investors. From food business to growing crops for sale,
+ and rearing livestock for business.
+
+ In this paper I will introduce a concept model agribusiness
+ that focuses on businesses to grow large amounts of product.
+ This model requires that product be sold outside of
+ agriculture industry, thus allowing farmers advantages,
+ especially over agronomic competition in production.
+ model is very important to farmers as it will be possible,
+ to sell their products at local markets without
+ """
+ ```
+ ## Transformers pipeline
+ ---
+ ```python
+ from transformers import pipeline, set_seed
+ generator = pipeline('text-generation', model='benkimz/agbrain')
+ set_seed(42)  # fix the random seed so the sampled text is reproducible
+
+ samples = generator(
+     "Animal husbandry is an important part of livestock production.",
+     max_length=100,
+     num_return_sequences=2
+ )
+
+ for sample in samples:
+     print("Model output: {}\n".format(sample['generated_text']))
+
+
+ # Output
+ """
+ **Model output**: Animal husbandry is an important part of
+ livestock production. livestock production industry is complex,
+ many factors contribute to this complexity. need to determine
+ most efficient method of handling livestock to ensure best quality
+ product. It is important that animals being handled appropriately
+ have properly cleaned equipment that prevents scratching
+ (Sappell 2002). Because livestock is an important part of
+ livestock production, veterinary care must be taken regularly
+ during transport of animals from a farm to your home to be
+ successful. If livestock were to be
+
+ **Model output**: Animal husbandry is an important part of
+ livestock production. Animal husbandry combines various
+ strategies to control pests. Management strategies of pest
+ management strategies
+ Preventing pest from reaching level
+ Preventing pest from reaching level
+ To minimize transmission costs, control mechanisms
+ must be developed to prevent pest from reaching level. In
+ order to have an accurate information about pest
+ management methods, instrumental field study of pest management
+ measures be developed by field of study. A technique of this
+ """
+ ```
+
+ # Metrics
+ ---
+ Step|Training Loss
+ ----|---------------
+ 500|3.877700
+ 1000|3.746200
+ 1500|3.659600
+ 2000|3.613300
+ 2500|3.603400
+ 3000|3.561600
+ 3500|3.558300
+ 4000|3.518400
+ 4500|3.504100
+ 5000|3.508600
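
The training-loss values in the Metrics table are easier to interpret as perplexity. The sketch below is not part of the commit; it assumes the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in the README), in which case perplexity is simply `exp(loss)`. The names `training_loss` and `perplexity` are illustrative.

```python
import math

# Training loss per logging step, copied from the Metrics table in README.md.
training_loss = {
    500: 3.8777, 1000: 3.7462, 1500: 3.6596, 2000: 3.6133, 2500: 3.6034,
    3000: 3.5616, 3500: 3.5583, 4000: 3.5184, 4500: 3.5041, 5000: 3.5086,
}


def perplexity(loss):
    """Convert a loss to perplexity.

    Assumption: `loss` is the mean per-token cross-entropy in nats.
    """
    return math.exp(loss)


for step, loss in training_loss.items():
    print(f"step {step:>4}: loss {loss:.4f} -> perplexity {perplexity(loss):.1f}")

# Step with the lowest training loss in the table.
best_step = min(training_loss, key=training_loss.get)
print("lowest training loss at step", best_step)
```

Under that assumption, training perplexity falls from about 48 at step 500 to about 33 at step 5000, and the lowest loss in the table is at step 4500 rather than at the final step. These are training-set figures only; the table reports no evaluation loss.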
config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "_name_or_path": "./",
+   "activation_function": "gelu_new",
+   "architectures": [
+     "GPT2LMHeadModel"
+   ],
+   "attn_pdrop": 0.1,
+   "bos_token_id": 50256,
+   "embd_pdrop": 0.1,
+   "eos_token_id": 50256,
+   "initializer_range": 0.02,
+   "layer_norm_epsilon": 1e-05,
+   "model_type": "gpt2",
+   "n_ctx": 1024,
+   "n_embd": 768,
+   "n_head": 12,
+   "n_inner": null,
+   "n_layer": 12,
+   "n_positions": 1024,
+   "reorder_and_upcast_attn": false,
+   "resid_pdrop": 0.1,
+   "scale_attn_by_inverse_layer_idx": false,
+   "scale_attn_weights": true,
+   "summary_activation": null,
+   "summary_first_dropout": 0.1,
+   "summary_proj_to_labels": true,
+   "summary_type": "cls_index",
+   "summary_use_proj": true,
+   "task_specific_params": {
+     "text-generation": {
+       "do_sample": true,
+       "max_length": 50
+     }
+   },
+   "torch_dtype": "float32",
+   "transformers_version": "4.27.3",
+   "use_cache": true,
+   "vocab_size": 50257
+ }
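
As a sanity check on this config, the parameter count can be estimated from the dimensions alone. The sketch below is an illustration, not code from the commit. It assumes the standard GPT-2 layout: learned token and position embeddings, 12 identical blocks each with two layer norms, a fused QKV projection, an attention output projection and a two-layer MLP, a final layer norm, and an output head tied to the token embeddings. `gpt2_param_count` is a hypothetical helper name.

```python
# Values copied from config.json in this commit.
config = {
    "n_embd": 768,
    "n_layer": 12,
    "n_positions": 1024,
    "vocab_size": 50257,
    "n_inner": None,  # null in the JSON
}


def gpt2_param_count(cfg):
    """Estimate trainable parameters for a GPT-2 style model.

    Assumption: the LM head shares weights with the token embedding,
    so it adds no parameters of its own.
    """
    d = cfg["n_embd"]
    # When n_inner is null, the MLP hidden width defaults to 4 * n_embd.
    inner = cfg["n_inner"] or 4 * d

    embeddings = cfg["vocab_size"] * d + cfg["n_positions"] * d
    layer_norm = 2 * d                               # scale + bias
    attention = (d * 3 * d + 3 * d) + (d * d + d)    # fused QKV + output projection
    mlp = (d * inner + inner) + (inner * d + d)      # up projection + down projection
    block = 2 * layer_norm + attention + mlp

    return embeddings + cfg["n_layer"] * block + layer_norm  # + final layer norm


n_params = gpt2_param_count(config)
fp32_bytes = n_params * 4  # torch_dtype is float32: 4 bytes per parameter

print(f"{n_params:,} parameters, about {fp32_bytes / 1e6:.0f} MB in float32")
```

This gives 124,439,808 parameters, or about 498 MB in float32, which is close to the 497,935,440-byte `tf_model.h5` listed in this commit. The `pytorch_model.bin` file is somewhat larger, presumably because it also stores non-trainable buffers and serialization overhead.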
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 50256,
+   "eos_token_id": 50256,
+   "transformers_version": "4.27.3"
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a2c4eca867e6bfb4a911d4dcd916d8232e7559a4fc0adc6d70ec822ef4776439
+ size 510398013
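
The three lines above are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, a SHA-256 object id, and the byte size, while the actual file is held in LFS storage. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration and assumes the simple `key value` line layout shown in this diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict.

    Assumes one 'key value' pair per line, as in the pointer shown above.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the pytorch_model.bin entry in this commit.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a2c4eca867e6bfb4a911d4dcd916d8232e7559a4fc0adc6d70ec822ef4776439
size 510398013
"""

info = parse_lfs_pointer(pointer)
print(f"{info['size'] / 1e6:.1f} MB, {info['hash_algo']} {info['digest'][:12]}...")
```

After downloading the real file, its size and SHA-256 digest can be compared against these fields to confirm the download is complete.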
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "bos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tf_model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e178bfc06c22fd9171906b465eb6a86499e3cd0cf6c241a478bbfabcfd895f20
+ size 497935440
tokenizer_config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "bos_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "errors": "replace",
+   "model_max_length": 1024,
+   "pad_token": null,
+   "special_tokens_map_file": null,
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff