---
language: en
license: apache-2.0
library_name: transformers
pipeline_tag: text2text-generation
tags:
- text-generation
- formal-language
- grammar-correction
- t5
- english
- text-formalization
model-index:
- name: formal-lang-rxcx-model
  results:
  - task:
      type: text2text-generation
      name: formal language correction
    metrics:
    - type: loss
      value: 2.1 # Replace with your actual training loss
      name: training_loss
    - type: rouge1
      value: 0.85 # Replace with your actual ROUGE score
      name: rouge1
    - type: accuracy
      value: 0.82 # Replace with your actual accuracy
      name: accuracy
    dataset:
      name: grammarly/coedit
      type: grammarly/coedit
      split: train
datasets:
- grammarly/coedit
model-type: t5-base
inference: true
base_model: t5-base
widget:
- text: "make formal: hey whats up"
- text: "make formal: gonna be late for meeting"
- text: "make formal: this is kinda cool project"
extra_gated_prompt: This is a fine-tuned T5 model for converting informal text to formal language.
extra_gated_fields:
  Company/Institution: text
  Purpose: text
---

# Formal Language T5 Model

This model is fine-tuned from T5-base for formal language correction and text formalization.

## Model Description

- **Model Type:** Fine-tuned T5-base
- **Language:** English
- **Task:** Text formalization and grammar correction
- **License:** Apache 2.0
- **Base Model:** t5-base

## Intended Uses & Limitations

### Intended Uses

- Converting informal text to formal language
- Making text more professional
- Grammar correction
- Enhancing business communication
- Improving academic writing

### Limitations

- Works best with English text
- Maximum input length: 128 tokens
- May not preserve domain-specific terminology
- Best suited for business and academic contexts

## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "renix-codex/formal-lang-rxcx-model"
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Inputs use the "make formal: " prefix the model was trained with
text = "make formal: hey whats up"
# Truncate to the 128-token sequence length used in training
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
# Allow outputs up to 128 tokens instead of the shorter generic default
outputs = model.generate(**inputs, max_length=128)
formal_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(formal_text)
```

## Example Inputs and Outputs

| Informal Input | Formal Output |
|----------------|---------------|
| "hey whats up" | "Hello, how are you?" |
| "gonna be late for meeting" | "I will be late for the meeting." |
| "this is kinda cool" | "This is quite impressive." |
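
To run several inputs at once, such as those in the table above, the sketch below applies the `make formal: ` prefix and the 128-token limit described in this card to a whole batch. It assumes `transformers` and PyTorch are installed; `add_prefix` and `formalize` are hypothetical helper names for illustration, not part of the model's published API, and actual outputs may differ from the examples in the table.

```python
PREFIX = "make formal: "  # input prefix documented in this model card
MAX_LEN = 128  # sequence length used during training


def add_prefix(texts: list[str], prefix: str = PREFIX) -> list[str]:
    """Prepend the task prefix unless a text already starts with it."""
    return [t if t.startswith(prefix) else prefix + t for t in texts]


def formalize(
    texts: list[str], model_id: str = "renix-codex/formal-lang-rxcx-model"
) -> list[str]:
    """Hypothetical batch helper: formalize several texts in one generate() call."""
    # Imported lazily so add_prefix() can be used without these packages
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    model.eval()

    # Pad to the longest input in the batch and truncate at the training length
    batch = tokenizer(
        add_prefix(texts),
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=MAX_LEN,
    )
    # Unpacking batch passes both input_ids and attention_mask to generate()
    with torch.no_grad():
        outputs = model.generate(**batch, max_length=MAX_LEN)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, `formalize(["hey whats up", "gonna be late for meeting"])` returns one rewritten string per input. Because padding makes shorter inputs longer, the attention mask produced by the tokenizer must reach `generate()` so the padded positions are ignored.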

## Training

The model was trained on the Grammarly CoEdIT dataset (`grammarly/coedit`) with the following specifications:

- Base Model: T5-base
- Training Hardware: A100 GPU
- Sequence Length: 128 tokens
- Input Format: `make formal: [informal text]`

## License

Apache License 2.0

## Citation

```bibtex
@misc{formal-lang-rxcx-model,
  author = {renix-codex},
  title = {Formal Language T5 Model},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {HuggingFace Model Hub},
  url = {https://huggingface.co/renix-codex/formal-lang-rxcx-model}
}
```

## Developer

Model developed by renix-codex.

## Ethical Considerations

This model is intended to assist with formal writing while preserving the original meaning of the text. Users should be aware that:

- The model may alter the tone of personal or culturally specific expressions
- It should be used as a writing aid rather than a replacement for human judgment
- Its output should be reviewed for accuracy and appropriateness

## Updates and Versions

Initial Release - February 2024

- Base implementation with T5-base
- Trained on the Grammarly CoEdIT dataset
- Optimized for formal language conversion