	---
	license: gemma
	datasets:
	- flytech/python-codes-25k
	widget:
	- text: "write a simple python function"
	  example_title: "Example 1"
	- text: "write a python program using flask"
	  example_title: "Example 2"
	- text: "make a todo list using python"
	  example_title: "Example 3"
	- text: "print current date and time using python"
	  example_title: "Example 4"
	language:
	- en
	pipeline_tag: text-generation
	---

	# Gemma-2b-it-finetuned-python-codes

	This model card describes a fine-tuned version of Gemma-2b-it, the 2B instruction-tuned Gemma model. For details on the base model, see the [2B Gemma Instruct model card](https://huggingface.co/google/gemma-2b-it).

	Author: Dishank Shah

	### Description

	GifPC-2b (Gemma-2b-it-finetuned-python-codes) is Gemma-2b-it fine-tuned on a dataset of Python code snippets.
	The fine-tuning aims to improve the base model's handling of Python syntax, semantics, and common programming patterns.
	Beyond generating Python code, the model is intended to help with reading and debugging it.
	Users can ask it for guidance on Python-related issues, explanations of code logic, and help identifying potential errors in their programs.

	### Usage

	The snippet below shows how to get started quickly with the model. First run `pip install -U transformers` (the snippet also needs PyTorch, which Google Colab ships by default), then copy the snippet from the section that matches your use case.

	#### Running the model on Google Colab CPU

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_name = "shahdishank/gemma-2b-it-finetune-python-codes"
	HUGGING_FACE_TOKEN = "YOUR_TOKEN"  # replace with your Hugging Face access token

	# Authenticate with the token variable defined above.
	tokenizer = AutoTokenizer.from_pretrained(model_name, token=HUGGING_FACE_TOKEN)
	model = AutoModelForCausalLM.from_pretrained(model_name, token=HUGGING_FACE_TOKEN)

	# Prompt format: a "user:" turn with the query, followed by an empty "assistant:" turn.
	prompt_template = """\
	user:\n{query} \n\n assistant:\n
	"""
	prompt = prompt_template.format(query="write a simple python function")  # write your query here

	# The tokenizer returns both input_ids and attention_mask as PyTorch tensors.
	inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
	outputs = model.generate(**inputs, max_new_tokens=2000, do_sample=True, pad_token_id=tokenizer.eos_token_id)

	# The decoded text contains the prompt followed by the generated answer.
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)
	```
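
Because `tokenizer.decode` is applied to the full output sequence, `response` contains the prompt followed by the generated text. Below is a minimal sketch for keeping only the answer, assuming the decoded text still contains the `assistant:` marker from the prompt template above. `extract_answer` is a hypothetical helper, not part of `transformers`, and the sample string is made up for illustration.

```python
def extract_answer(response: str, marker: str = "assistant:") -> str:
    """Return the text after the first `marker`, or the whole response if it is absent."""
    # partition splits on the first occurrence of the marker only.
    _, found, answer = response.partition(marker)
    if not found:
        return response.strip()
    return answer.strip()


# Illustrative decoded output (made up for this example, not real model output).
sample = "user:\nwrite a simple python function \n\n assistant:\ndef add(a, b):\n    return a + b\n"
print(extract_answer(sample))
```

Note that the split happens at the first `assistant:` in the text, so a query that itself contains that marker would need a different approach, such as slicing off the prompt by length.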

	## Model Data

	The model was fine-tuned on the [python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k) dataset.

	### Training Dataset

	This model was fine-tuned on a text dataset covering a wide variety
	of Python coding tasks. Each record has the following fields:

	* Instruction: The task to be performed (the user input).
	* Input: A very short introductory part of the AI response, or empty.
	* Output: Python code that accomplishes the task.
	* Text: All fields combined together.

	The variety of tasks in this dataset helps the fine-tuned model
	handle a broad range of Python-related requests.
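
To make the record layout concrete, the sketch below builds a made-up record with these fields and joins them in the `user:` / `assistant:` style used by the Google Colab snippet. The lowercase key names, the sample values, and the way `text` is assembled are all assumptions for illustration; this card does not specify how the fields were combined during fine-tuning.

```python
# Hypothetical record mirroring the fields described above (values are made up).
record = {
    "instruction": "Write a function that adds two numbers",
    "input": "",
    "output": "def add(a, b):\n    return a + b",
}


def combine_fields(rec: dict) -> str:
    """Join instruction, input, and output into one string (assumed layout)."""
    # Skip the optional input field when it is empty.
    answer = "\n".join(part for part in (rec["input"], rec["output"]) if part)
    return f"user:\n{rec['instruction']}\n\nassistant:\n{answer}"


record["text"] = combine_fields(record)
print(record["text"])
```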

	### Intended Use

	This LLM can be used for:
	* Code generation
	* Debugging
	* Learning and understanding various Python coding styles
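
For the debugging use case, the code to inspect can be embedded directly in the query. The following is a minimal sketch that reuses the prompt template from the Google Colab snippet. `build_debug_prompt` and the buggy snippet are hypothetical, and the wording of the query is an assumption rather than a format the model is known to require.

```python
# Same text as the prompt template in the Google Colab snippet, written on one line.
PROMPT_TEMPLATE = "user:\n{query} \n\n assistant:\n\n"


def build_debug_prompt(code: str) -> str:
    """Wrap a code snippet in a debugging request using the card's prompt format."""
    query = f"find and fix the bug in this python code:\n{code}"
    return PROMPT_TEMPLATE.format(query=query)


# Made-up example with a typo: `num` should be `nums`.
buggy_code = "def average(nums):\n    return sum(nums) / len(num)"
prompt = build_debug_prompt(buggy_code)
print(prompt)
```

The resulting `prompt` can be passed to the tokenizer and `model.generate` in the same way as the code generation query.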