Update README.md
README.md
CHANGED
@@ -4,7 +4,6 @@ datasets:
 - BEE-spoke-data/bees-internal
 language:
 - en
-library_name: transformers
 ---
 
 # BeeTokenizer
@@ -58,3 +57,20 @@ print(f"Tokens:\n\t{output.input_ids}")
 # Offsets
 offsets = output['offset_mapping']
 print(f"Offsets: {offsets}")
+```
+
+This should result in the following (_Nov 2023 version_):
+
+<pre>>>> print(f"Test string: {test_string}")
+Test string: When dealing with Varroa destructor mites, it's crucial to administer the right acaricides during the late autumn months, but only after ensuring that the worker bee population is free from pesticide contamination.
+>>>
+>>> # Tokens
+>>> tokens = tokenizer.convert_ids_to_tokens(output['input_ids'])
+>>> print(f"Tokens: {tokens}")
+Tokens: ['▁When', '▁dealing', '▁with', '▁Varroa', '▁destructor', '▁mites,', "▁it's", '▁cru', 'cial', '▁to', '▁administer', '▁the', '▁right', '▁acar', 'icides', '▁during', '▁the', '▁late', '▁autumn', '▁months,', '▁but', '▁only', '▁after', '▁ensuring', '▁that', '▁the', '▁worker', '▁bee', '▁population', '▁is', '▁free', '▁from', '▁pesticide', '▁contamination', '.']
+>>>
+>>> # Offsets
+>>> offsets = output['offset_mapping']
+>>> print(f"Offsets: {offsets}")
+Offsets: [(0, 4), (4, 12), (12, 17), (17, 24), (24, 35), (35, 42), (42, 47), (47, 51), (51, 55), (55, 58), (58, 69), (69, 73), (73, 79), (79, 84), (84, 90), (90, 97), (97, 101), (101, 106), (106, 113), (113, 121), (121, 125), (125, 130), (130, 136), (136, 145), (145, 150), (150, 154), (154, 161), (161, 165), (165, 176), (176, 179), (179, 184), (184, 189), (189, 199), (199, 213), (213, 214)]
+</pre>
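Note on the added example output: the offsets read as half-open `(start, end)` character spans into `test_string`. The sketch below is one way to check that reading. It hard-codes the test string and offsets from the output shown in this change rather than loading the tokenizer, and the name `pieces` is illustrative only, not part of the README.

```python
# Values copied from the example output in this change; no tokenizer is loaded.
test_string = (
    "When dealing with Varroa destructor mites, it's crucial to administer "
    "the right acaricides during the late autumn months, but only after "
    "ensuring that the worker bee population is free from pesticide "
    "contamination."
)
offsets = [
    (0, 4), (4, 12), (12, 17), (17, 24), (24, 35), (35, 42), (42, 47),
    (47, 51), (51, 55), (55, 58), (58, 69), (69, 73), (73, 79), (79, 84),
    (84, 90), (90, 97), (97, 101), (101, 106), (106, 113), (113, 121),
    (121, 125), (125, 130), (130, 136), (136, 145), (145, 150), (150, 154),
    (154, 161), (161, 165), (165, 176), (176, 179), (179, 184), (184, 189),
    (189, 199), (199, 213), (213, 214),
]

# Slice the original string with each (start, end) pair (assumed half-open).
pieces = [test_string[start:end] for start, end in offsets]

# Print each span next to the text it covers.
for (start, end), piece in zip(offsets, pieces):
    print(f"({start:3d}, {end:3d}) -> {piece!r}")
```

In this output, every span after the first begins with the preceding space, which lines up with the `▁` marker on the corresponding tokens, and the spans are contiguous, so joining the slices gives back the original string.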