nvidia/GPT-2B-001 · gibberish on 4090

Using the image nvcr.io/nvidia/nemo:23.02 on a 4090.
Start the server with:
python /workspace/nemo/examples/nlp/language_modeling/megatron_gpt_eval.py gpt_model_file=GPT-2B-001_bf16_tp1.nemo trainer.precision=bf16 server=True tensor_model_parallel_size=1 trainer.devices=1
Then run the unmodified sample code:

import json
import requests

port_num = 5555
headers = {"Content-Type": "application/json"}

def request_data(data):
    resp = requests.put('http://localhost:{}/generate'.format(port_num),
                        data=json.dumps(data),
                        headers=headers)
    sentences = resp.json()['sentences']
    return sentences


data = {
    "sentences": ["Tell me an interesting fact about space travel."]*1,
    "tokens_to_generate": 50,
    "temperature": 1.0,
    "add_BOS": True,
    "top_k": 0,
    "top_p": 0.9,
    "greedy": False,
    "all_probs": False,
    "repetition_penalty": 1.2,
    "min_tokens_to_generate": 2,
}

sentences = request_data(data)
print(sentences[0])

Output:

Tell me an interesting fact about space travel. My version of health. A method of therapy for gonorrhoea by a man. A man presented with this complaint of "despair for me for a man for a man with gonorrhea. I presented to a man with gonoragia. My man
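
A possible follow-up check, sketched under assumptions and not run as part of this report: resend the same prompt with "greedy" set to True, so that sampling randomness can be ruled out as the cause. The helper name greedy_variant is hypothetical, urllib from the standard library stands in for requests, and the sketch assumes the server accepts "greedy": True, which the key's presence in the sample payload suggests but which is not confirmed here.

```python
import copy
import json
import urllib.request


def greedy_variant(data):
    """Return a copy of the request payload with sampling switched off.

    Hypothetical helper: only the "greedy" flag is changed, every other
    field is kept exactly as in the sample code.
    """
    variant = copy.deepcopy(data)
    variant["greedy"] = True
    return variant


def request_data(data, port_num=5555):
    """PUT the payload to the local /generate endpoint, return the sentences."""
    req = urllib.request.Request(
        "http://localhost:{}/generate".format(port_num),
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["sentences"]


# Same payload as in the sample code above.
data = {
    "sentences": ["Tell me an interesting fact about space travel."],
    "tokens_to_generate": 50,
    "temperature": 1.0,
    "add_BOS": True,
    "top_k": 0,
    "top_p": 0.9,
    "greedy": False,
    "all_probs": False,
    "repetition_penalty": 1.2,
    "min_tokens_to_generate": 2,
}

greedy_data = greedy_variant(data)

# With the server from the command above running, the check would be:
# print(request_data(greedy_data)[0])
```

If the greedy output is also incoherent, sampling settings are unlikely to be the cause, and the problem more likely lies in the model, precision, or hardware setup.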