NeMo
okuchaiev committed on
Commit ed5d0f6 · verified · 1 Parent(s): 3a88b03

Update README.md

Files changed (1): README.md +2 -3
README.md CHANGED
@@ -64,8 +64,6 @@ The base model, Nemotron-4-340B, was trained with a global batch-size of 2304, a
  1. We will spin up an inference server and then call the inference server in a python script. Let’s first define the python script ``call_server.py``

  ```python
-
-
  headers = {"Content-Type": "application/json"}

  def text_generation(data, ip='localhost', port=None):
@@ -102,7 +100,8 @@ prompt = PROMPT_TEMPLATE.format(prompt=question)
  print(prompt)

  response = get_generation(prompt, greedy=True, add_BOS=False, token_to_gen=1024, min_tokens=1, temp=1.0, top_p=1.0, top_k=0, repetition=1.0, batch=False)
- print(response)```
+ print(response)
+ ```


  2. Given this python script, we will create a bash script, which spins up the inference server within the [NeMo container](https://github.com/NVIDIA/NeMo/blob/main/Dockerfile) and calls the python script ``call_server.py``. The bash script ``nemo_inference.sh`` is as follows,