---
datasets:
- karpathy/tiny_shakespeare
library_name: tf-keras
license: mit
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- lstm
---

## Model description

LSTM trained on Andrej Karpathy's [`tiny_shakespeare`](https://huggingface.co/datasets/karpathy/tiny_shakespeare) dataset, from his blog post, [The Unreasonable Effectiveness of Recurrent Neural Networks](https://karpathy.github.io/2015/05/21/rnn-effectiveness/).

## Intended uses & limitations

The model predicts the next character from a variable-length input sequence. After `18` epochs of training, it generates somewhat coherent text. Decoding in the example below is greedy (the most likely character is chosen at every step), so the output tends to fall into repetitive loops, as the sample shows.

```py
import tensorflow as tf


def generate_text(model, encoder, text, n):
    # Index-to-character lookup from the TextVectorization layer
    vocab = encoder.get_vocabulary()
    generated_text = text
    for _ in range(n):
        # Re-encode everything generated so far; the input length is variable
        encoded = encoder([generated_text])
        pred = model.predict(encoded, verbose=0)
        # Greedy decoding: take the index of the most likely next character
        pred = tf.squeeze(tf.argmax(pred, axis=-1)).numpy()
        generated_text += vocab[pred]
    return generated_text


# `model` is the trained LSTM and `encoder` its TextVectorization layer
sample = "M"
print(generate_text(model, encoder, sample, 100))
```

```
MQLUS:
I will be so that the street of the state,
And then the street of the street of the state,
And
```

## Training and evaluation data

[![https://example.com](https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg)](https://wandb.ai/adamelliotfields/shakespeare)

## Training procedure

The dataset consists of various works of William Shakespeare concatenated into a single file, with individual speeches separated by `\n\n`. The tokenizer is a Keras `TextVectorization` preprocessor that uses a simple character-based vocabulary.

Each training example is a window of `100` consecutive characters, with the character that follows it as the target. The window slides forward one character at a time across the text, which yields **1,115,294** training examples; these are then shuffled.

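
The windowing described above can be sketched in plain Python. This is an illustration only, not the training code: `build_vocab` and `make_examples` are hypothetical helpers, a simple dictionary stands in for the Keras `TextVectorization` layer, and shuffling is left out. A text of `N` characters with a window of `seq_len` produces `N - seq_len` examples.

```python
# Illustrative sketch only: plain-Python stand-ins for the tokenizer and windowing.
# `build_vocab` and `make_examples` are hypothetical names, not part of this repo.

def build_vocab(text):
    """Map each distinct character to an integer id (0 is left free for padding)."""
    chars = sorted(set(text))
    return {ch: i + 1 for i, ch in enumerate(chars)}


def make_examples(text, vocab, seq_len=100):
    """Slide a window of `seq_len` characters; the target is the next character."""
    ids = [vocab[ch] for ch in text]
    inputs, targets = [], []
    for start in range(len(ids) - seq_len):
        inputs.append(ids[start:start + seq_len])
        targets.append(ids[start + seq_len])
    return inputs, targets


# Tiny stand-in corpus; a short window keeps the demo readable.
text = "First Citizen:\nBefore we proceed any further, hear me speak.\n\n"
vocab = build_vocab(text)
inputs, targets = make_examples(text, vocab, seq_len=10)
print(len(text), len(inputs))
```

Each input in `inputs` overlaps the previous one by all but one character, which is why the number of examples is close to the number of characters in the corpus.
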
*TODO: upload encoder*

### Training hyperparameters

| Hyperparameter    | Value     |
| :---------------- | :-------- |
| `epochs`          | `18`      |
| `batch_size`      | `1024`    |
| `optimizer`       | `AdamW`   |
| `weight_decay`    | `0.001`   |
| `learning_rate`   | `0.00025` |

## Model Plot

<details>
<summary>View Model Plot</summary>

![Model Image](./model.png)

</details>