---
base_model: BAAI/bge-small-en-v1.5
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:664
- loss:DenoisingAutoEncoderLoss
widget:
- source_sentence: of fresh for in for that,, stream_id
sentences:
- 'Number of functional/operational toilets for boys with disabilities or CWSN(Children
with special needs) '
- 'Indicates grant for sports and physical education expenditure (in Rs) spent by
the school during the financial year 2022-2023 under Samagra Shiksha, corresponding
to the udise_sch_code. '
- 'Number of fresh enrollments for transgenders in class 11 for that school. corresponding
to udise_sch_code, caste_id, stream_id. '
- source_sentence: Unique each associated . This in and.
sentences:
- 'classes in which language 3 i.e (''lang3'' column) is taught as a subject. Its
a comma seperated value. '
- 'Unique identifier code each school, associated with school_name in sch_master
table. This can be joined with udise_sch_code in sch_profile and sch_facility
tables. '
- 'Number of assessments happened for primary section/school '
- source_sentence: urinals
sentences:
- 'Unique identifier code for the schools providing vocational courses under nsqf
and where sectors are available, associated with school name in sch_master table.
This can be joined with udise_sch_code in sch_profile and sch_facility tables. '
- 'Indicates whether there is a reading corner/space/room in school. Can only be
[''Yes'',''No''] '
- 'Number of functional/operational urinals for boys '
- source_sentence: total of in-service training by of that from district and training)
the tch_code_state
sentences:
- 'Indicates total days of in-service training received by the teacher of that school
from district institute of education and training(diet), corresponding to the
udise_sch_code, tch_name, tch_code_state. '
- 'Unique identifier code for each school. This column is crucial for aggregating
or analyzing data at the school level, such as school-wise attendance, performance
metrics, or demographic information. '
- 'Indicates whether it is a special school, specifically for disabled students.
Is school CWSN ( Children with Special Needs ). This can only be one of 2 values:[''Yes'',''No''] '
- source_sentence: The teacher_id column . This essential related teacher absenteeism
or will column
sentences:
- 'Indicates Urban local body ID as per LGD - Local Government Directory where the
school is present, related to ''lgd_urban_local_body_name'' '
- 'Number of pucca classrooms in good condition in school '
- 'The teacher_id column is a unique identifier used to represent individual teachers.
This column is essential for retrieving teacher-specific information.Queries related
to teacher attendance, absenteeism, or any teacher-level analysis will likely
require this column. '
---
# SentenceTransformer based on BAAI/bge-small-en-v1.5
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5). It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
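For reference, the cosine similarity between two embeddings $\mathbf{u}, \mathbf{v} \in \mathbb{R}^{384}$ is given below. Assuming both vectors have already passed through the `Normalize()` module listed under Full Model Architecture (so each has unit length), it reduces to a plain dot product:

$$
\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
\quad\xrightarrow{\;\lVert \mathbf{u} \rVert = \lVert \mathbf{v} \rVert = 1\;}\quad
\cos(\mathbf{u}, \mathbf{v}) = \mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{384} u_i v_i
$$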
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
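The architecture ends with CLS pooling (`pooling_mode_cls_token: True`) followed by `Normalize()`. The sketch below reproduces those two steps on toy token embeddings, with no dependencies. `cls_pool`, `l2_normalize` and `embed_from_tokens` are illustrative helpers, not sentence-transformers APIs, and the 4-dimensional vectors stand in for the real 384-dimensional BERT outputs.

```python
import math
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """Return the first token's ([CLS]) embedding, as with pooling_mode_cls_token=True."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    return list(token_embeddings[0])


def l2_normalize(vec: Vector) -> Vector:
    """Scale a vector to unit Euclidean length, which is what Normalize() does."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # a zero vector stays zero
    return [x / norm for x in vec]


def embed_from_tokens(token_embeddings: List[Vector]) -> Vector:
    """CLS pooling followed by L2 normalization (illustrative only)."""
    return l2_normalize(cls_pool(token_embeddings))


# Toy token embeddings for a 3-token input; the first row plays the role of [CLS].
tokens = [
    [3.0, 0.0, 4.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.5, -0.5, 0.0, 2.0],
]
sentence_vec = embed_from_tokens(tokens)
print(sentence_vec)  # [0.6, 0.0, 0.8, 0.0]
```

Only the [CLS] row contributes to the sentence vector; the other token embeddings are ignored, unlike mean pooling.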
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ravch/fine_tuned_bge_small_en_v1.5_another_data_formate")
# Run inference
sentences = [
'The teacher_id column . This essential related teacher absenteeism or will column',
'The teacher_id column is a unique identifier used to represent individual teachers. This column is essential for retrieving teacher-specific information.Queries related to teacher attendance, absenteeism, or any teacher-level analysis will likely require this column. ',
"Indicates Urban local body ID as per LGD - Local Government Directory where the school is present, related to 'lgd_urban_local_body_name' ",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
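One way to use these embeddings for semantic search over column descriptions like the ones in this card is to rank every description by cosine similarity to a query embedding. The sketch below does that in plain Python; the dictionary keys and 3-dimensional vectors are hypothetical stand-ins for `model.encode(...)` outputs, which in practice you would score with `model.similarity` as shown above.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_descriptions(
    query_vec: List[float], corpus: Dict[str, List[float]], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings for three column descriptions.
corpus = {
    "teacher_id": [0.9, 0.1, 0.0],
    "urban_local_body": [0.0, 0.2, 0.98],
    "pucca_good_classrooms": [0.1, 0.95, 0.1],
}
query = [0.8, 0.2, 0.1]  # e.g. an embedded question about teacher absenteeism

for name, score in rank_descriptions(query, corpus):
    print(f"{name}: {score:.3f}")
```

Because this model's outputs are already unit-normalized, the dot product alone would give the same ranking; the explicit norms keep the sketch correct for arbitrary vectors.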
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 664 training samples
* Columns: `sentence_0` and `sentence_1`
* Approximate statistics based on the first 1000 samples:
|         | sentence_0 | sentence_1 |
|:--------|:-----------|:-----------|
| type    | string     | string     |

* Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| Number of Girls Defense | Number of Girls Student provided Self Defense training |
| whether is While filtering, must 0 (int active. | Indicate whether school is active or inactive. While filtering only consider active schools, but When asked for total schools must consider active and inactive schools. 0(int) indicates active schools. |
| classes in which language i.e 'lang2 as a subject a comma seperated | classes in which language 2 i.e ('lang2' column) is taught as a subject. Its a comma seperated value. |
* Loss: [`DenoisingAutoEncoderLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#denoisingautoencoderloss)
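`DenoisingAutoEncoderLoss` trains in the TSDAE style: the encoder embeds a corrupted sentence and a decoder must reconstruct the original text from that single embedding. The `sentence_0` samples above look like word-deleted copies of `sentence_1`. As a rough illustration, the hypothetical `delete_words` helper below mimics the word-deletion noise of sentence-transformers' `DenoisingAutoEncoderDataset` (default deletion ratio 0.6) but tokenizes on whitespace rather than with NLTK. It is an assumption that this dataset was corrupted the same way.

```python
import random
from typing import Optional


def delete_words(text: str, del_ratio: float = 0.6, rng: Optional[random.Random] = None) -> str:
    """Drop each word with probability del_ratio, always keeping at least one word.

    Assumed approximation of DenoisingAutoEncoderDataset's deletion noise;
    whitespace splitting replaces the library's NLTK tokenization.
    """
    rng = rng or random.Random()
    words = text.split()
    if not words:
        return text
    keep = [rng.random() > del_ratio for _ in words]
    if not any(keep):
        keep[rng.randrange(len(words))] = True  # guarantee a non-empty noisy input
    return " ".join(w for w, k in zip(words, keep) if k)


original = "Number of Girls Student provided Self Defense training"
noisy = delete_words(original, rng=random.Random(0))  # seeded for reproducibility
print(noisy)
```

The surviving words keep their original order, which matches the look of the `sentence_0` column.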
### Training Hyperparameters
#### Non-Default Hyperparameters
- `num_train_epochs`: 50
- `multi_dataset_batch_sampler`: round_robin
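The `round_robin` value selects sentence-transformers' round-robin multi-dataset batch sampler which, as we understand it, takes one batch from each training dataset in turn and stops as soon as any dataset runs out. The sketch below imitates that schedule in plain Python with hypothetical `batches` and `round_robin` helpers and no shuffling. Since this model was trained on a single unnamed dataset, the setting probably has no practical effect here.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def batches(items: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive batches (the last one may be smaller)."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def round_robin(datasets: Dict[str, Sequence], batch_size: int) -> List[Tuple[str, List]]:
    """Interleave one batch per dataset per round; stop at the first exhausted dataset."""
    iterators = {name: batches(data, batch_size) for name, data in datasets.items()}
    if not iterators:
        return []
    schedule: List[Tuple[str, List]] = []
    while True:
        for name, it in iterators.items():
            batch = next(it, None)
            if batch is None:
                return schedule
            schedule.append((name, batch))


schedule = round_robin({"a": list(range(5)), "b": list(range(10, 13))}, batch_size=2)
print(schedule)
```

With only one dataset, the schedule is simply every batch of that dataset in order.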
#### All Hyperparameters