---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:812
  - loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
  - source_sentence: Data engineering, AWS services, Big Data manipulation
    sentences:
      - >-
        Skills: SQL, PySpark, Databricks, Azure Synapse, Azure Data Factory.

        Need hands-on coding

        Requirements:1. Extensive knowledge of any of the big cloud services -
        Azure, AWS or GCP with practical implementation (like S3, ADLS, Airflow,
        ADF, Lamda, BigQuery, EC2, Fabric, Databricks or equivalent)2. Strong
        Hands-on experience in SQL and Python/PySpark programming knowledge.
        Should be able to write code during an interview with minimal syntax
        error.3. Strong foundational and architectural knowledge of any of the
        data warehouses - Snowflake, Redshift. Synapse etc.4. Should be able to
        drive and deliver projects with little or no guidance. Take ownership,
        become a self-learner, and have leadership qualities.
      - >-
        requirements, and general interpretation of dataMentor, teach, share
        knowledge and analytic techniques with your colleagues


        Experience And Preferred Qualifications


        Minimum of three years of relevant experience in developing analytic
        solutions with proficiency in SQL, Microsoft Excel, Power BI, or similar
        data analysis and ETL toolsBachelor's degree (B.S./B.A.) in an
        appropriate field from an accredited college or universityStrong verbal
        and written communication skills with the ability to convey highly
        complex technical concepts down to actionable objectives to advise
        stakeholders including attorneys, firm management, and firm
        colleaguesExperience in project management including planning,
        organizing, and supervising clients and colleagues towards successful
        project completionDemonstrated ability to learn and succeed in a
        fast-paced environmentExpert level of proficiency with T-SQL or
        equivalent including a high level of proficiency in database
        administrationHigh proficiency with Microsoft Excel including an ability
        to create pivot tables, power pivots & queries, formulas, and external
        data connectionsAbility to design and implement ETL solutionsExperience
        in developing client facing visualizations and reports using Power BI,
        SSRS or similar visualization tools is a plusKnowledge of coding in
        Python, R, DAX and/or MExperience in developing SSIS and/or SSAS
        solutions


        Qualified candidates must apply online by visiting our website at
        www.morganlewis.com and selecting “Careers.”


        Morgan, Lewis & Bockius LLP is committed to 


        Pursuant to applicable state and municipal Fair Chance Laws and
        Ordinances, we will consider for employment qualified applicants with
        arrest and conviction records.


        California Applicants: Pursuant to the California Consumer Privacy Act,
        the following link contains the Firm's California Consumer Privacy Act
        Privacy Notice for Candidates which explains the categories of personal
        information that we collect and the purposes for which we use such
        personal information. CCPA Privacy Notice for Candidates


        Morgan Lewis & Bockius LLP is also 


        If You Are Interested In Applying For Employment With Morgan Lewis And
        Need Special Assistance Or An Accommodation To Use Our Website Or To
        Apply For a Position, Please Call Or Email The Following Contacts


        Professional Staff positions  1.888.534.5003 /
        [email protected] 


        Morgan, Lewis & Bockius, LLP reasonably accommodates applicants and
        employees who need them to perform the essential functions of the job
        because of disability, religious belief, or other reason protected by
        applicable law. If you believe you need a reasonable accommodation
        during the application process, please contact Talent Acquisition at
        [email protected].
      - >-
        experience as a data engineer, data architect, with strong Python and
        SQL knowledge. Experience with AWS services and Databricks, and ideal if
        they've developed data pipelines in airflow or any streaming services
        (Kafka, Kinesis, etc). Expert-level competency in Big Data manipulation
        and transformation, both within and outside of a database. Need to have
        competency in API creation, and Machine Learning model deployment.
        Experience mentoring others and can help as a field leader for newer
        team members.Additional Skills & QualificationsExperience building
        decision-support applications based on Data Science and Machine
        LearningExperience building effective, efficient solutions in AWS, using
        Terraform and/or CloudFormation to build infrastructure as
        codeFamiliarity with Snowflake, Airflow, and other Big Data and data
        pipeline frameworksEducation, training, and certifications in
        engineering, computer science, math, statistics, analytics, or cloud
        computing.
  - source_sentence: Digital advertising, MLOps, audience segmentation
    sentences:
      - >-
        experience, skills and abilities will determine where an employee is
        ultimately placed in the pay range.


        Category/Shift


        Salaried Full-Time


        Physical Location:


        6420 Poplar Avenue


        Memphis, TN


        Flexible Remote Work Schedule


        The Job You Will Perform


        Lead the hands-on IT development and deployment of data science and
        advanced analytics solutions for the North American Container (NAC)
        division of International Paper to support business strategies across
        approximately 200 packaging and specialty plants in the US and
        MexicoBreak down complex data science methodologies to business leaders
        in a way that is applicable to our North American Container business
        strategy.Identify opportunities for improving business performance and
        present identified opportunities to senior leadership; proactively
        driving the discovery of business value through data.Collaborate
        directly with NAC business partners to produce user stories, analyze
        source data capabilities, identify issues and opportunities, develop
        data models, and test and deploy innovative analytics solutions and
        systemsLead the application of data science techniques to analyze and
        interpret complex data sets, providing insights and enabling data-driven
        decision-making for North American ContainerLead analytics projects
        through agile or traditional project management methodologiesInfluence
        IT projects/initiatives with project managers, business leaders and
        other IT groups without direct reporting relationships.Work closely with
        IT Application Services team members to follow standards, best
        practices, and consultation for data engineeringRole includes: Data
        analysis, predictive and prescriptive modeling, machine learning, and
        algorithm development; collaborating and cross-training with analytics
        and visualization teams.Under general direction works on complex
        technical issues/problems of a large scope, impact, or importance.
        Independently resolves complex problems that have significant cost.
        Leads new technology innovations that define new “frontiers” in
        technical direction


        The Skills You Will Bring 


        Bachelor’s degree in Computer Science, Information Technology,
        Statistics, or a related field is required. A Masters degree and/or PhD
        is preferred.Minimum 12 years of relevant work experience, less if
        holding a Masters or PhD.Skills with Data Visualization using tools like
        Microsoft Power BIDemonstrated leadership in building and deploying
        advanced analytics models for solving real business problems.Strong
        Interpersonal and Communication SkillsAdaptable to a changing work
        environment and dealing with ambiguity as it arises. Data Science
        Skills:Data analysisPredictive and Prescriptive ModelingMachine Learning
        (Python / R)Artificial Intelligence and Large Language ModelsAlgorithm
        DevelopmentExperience with Azure Analytics ServicesCompetencies:Dealing
        with AmbiguityFunctional / Technical Skills Problem SolvingCreativity

        The Benefits You Will Enjoy


        Paid time off including Vacation and Holidays Retirement and 401k
        Matching ProgramMedical & Dental Education & Development (including
        Tuition Reimbursement)Life & Disability Insurance


        The Career You Will Build


        Leadership trainingPromotional opportunities


        The Impact You Will Make


        We continue to build a better future for people, the plant, and our
        company! IP has been a good steward of sustainable practices across
        communities around the world for more than 120 years. Join our team and
        you’ll see why our team members say they’re Proud to be IP.


        The Culture You Will Experience


        International Paper promotes employee well-being by providing safe,
        caring and inclusive workplaces. You will learn Safety Leadership
        Principles and have the opportunity to opt into Employee Networking
        Circles such as IPVets, IPride, Women in IP, and the African American
        ENC. We invite you to bring your uniqueness, creativity, talents,
        experiences, and safety mindset to be a part of our increasingly diverse
        culture.


        The Company You Will Join


        International Paper (NYSE: IP) is a leading global supplier of renewable
        fiber-based products. We produce corrugated packaging products that
        protect and promote goods, and enable worldwide commerce, and pulp for
        diapers, tissue and other personal care products that promote health and
        wellness. Headquartered in Memphis, Tenn., we employ approximately
        38,000 colleagues globally. We serve customers worldwide, with
        manufacturing operations in North America, Latin America, North Africa
        and Europe. Net sales for 2021 were $19.4 billion. Additional
        information can be found by visiting InternationalPaper.com.


        International Paper is an Equal Opportunity/Affirmative Action Employer.
        All qualified applicants will receive consideration for employment
        without regard to sex, gender identity, sexual orientation, race, color,
        religion, national origin, disability, protected veteran status, age, or
        any other characteristic protected by law.
      - >-
        experience, education, geographic location, and other factors.
        Description: This role is within an organization responsible for
        developing and maintaining a high-performance Advertising Platform
        across various online properties, including streaming services. The Ad
        Platform Research team focuses on transforming advertising with data and
        AI, seeking a lead machine learning engineer to develop prediction and
        optimization engines for addressable ad platforms.

        Key responsibilities include driving innovation, developing scalable
        solutions, collaborating with teams, and mentoring. Preferred
        qualifications include experience in digital advertising, knowledge of
        ML operations, and proficiency in relevant technologies like PyTorch and
        TensorFlow.

        Basic Qualifications:MS or PhD in computer science or EE.4+ years of
        working experience on machine learning, and statistics in leading
        internet companies.Experience in the advertising domain is
        preferred.Solid understanding of ML technologies, mathematics, and
        statistics.Proficient with Java, Python, Scala, Spark, SQL, large scale
        ML/DL platforms and processing tech stack.

        Preferred Qualifications:Experience in digital video advertising or
        digital marketing domain.Experience with feature store, audience
        segmentation and MLOps.Experience with Pytorch, TensorFlow, Kubeflow,
        SageMaker or Databricks.

        If you are interested in this role, then please click APPLY NOW. For
        other opportunities available at Akkodis, or any questions, please
        contact Amit Kumar Singh at [email protected].

        Equal Opportunity Employer/Veterans/Disabled

        Benefit offerings include medical, dental, vision, term life insurance,
        short-term disability insurance, additional voluntary benefits, commuter
        benefits, and a 401K plan. Our program provides employees the
        flexibility to choose the type of coverage that meets their individual
        needs. Available paid leave may include Paid Sick Leave, where required
        by law; any other paid leave required by Federal, State, or local law;
        and Holiday pay upon meeting eligibility criteria. Disclaimer: These
        benefit offerings do not apply to client-recruited jobs and jobs which
        are direct hire to a client.

        To read our Candidate Privacy Information Statement, which explains how
        we will use your information, please visit
        https://www.akkodis.com/en/privacy-policy.
      - >-
        Qualifications

        Master's degree is preferred in a Technical Field, Computer Science,
        Information Technology, or Business ManagementGood understanding of data
        structures and algorithms, ETL processing, large-scale data and
        machine-learning production, data and computing infrastructure,
        automation and workflow orchestration.Hands-on experience in Python,
        Pyspark, SQL, and shell scripting or similar programming
        languagesHands-on Experience in using cloud-based technologies
        throughout data and machine learning product development.Hands-on
        experience with code versioning, automation and workflow orchestration
        tools such as Github, Ansible, SLURM, Airflow and TerraformGood
        Understanding of data warehousing concepts such as data migration and
        data integration in Amazon Web Services (AWS) cloud or similar
        platformExcellent debugging and code-reading skills.Documentation and
        structured programming to support sustainable development.Ability to
        describe challenges and solutions in both technical and business
        terms.Ability to develop and maintain excellent working relationships at
        all organizational levels.
  - source_sentence: Geospatial data management, spatial analysis, PostGIS expertise
    sentences:
      - >-
        experiences, revenue generation, ad targeting, and other business
        outcomes.Conduct data processing and analysis to uncover hidden
        patterns, correlations, and insights.Design and implement A/B testing
        frameworks to test model quality and effectiveness.Collaborate with
        engineering and product development teams to integrate data science
        solutions into our products and services.Stay up-to-date with the latest
        technologies and techniques in data science, machine learning, and
        artificial intelligence.

        Technical Requirements:Strong proficiency in programming languages such
        as Python or R for data analysis and modeling.Extensive experience with
        machine learning techniques and algorithms, such as k-NN, Naive Bayes,
        SVM, Decision Forests, etc.Knowledge of advanced statistical techniques
        and concepts (regression, properties of distributions, statistical
        tests, etc.).Experience with data visualization tools (e.g., Matplotlib,
        Seaborn, Tableau).Familiarity with big data frameworks and tools (e.g.,
        Hadoop, Spark).Proficient in using query languages such as
        SQL.Experience with cloud computing platforms (AWS, Azure, or Google
        Cloud) is a plus.Understanding of software development practices and
        tools, including version control (Git).

        Experience:3+ years of experience in a Data Scientist or similar
        role.Demonstrated success in developing and deploying data models,
        algorithms, and predictive analytics solutions.Experience working with
        large, complex datasets and solving analytical problems using
        quantitative approaches.

        Who You Are:Analytically minded with a passion for uncovering insights
        through data analysis.Creative problem solver who is eager to tackle
        complex challenges.Excellent communicator capable of explaining complex
        technical concepts to non-technical stakeholders.Self-motivated and able
        to work independently in a remote environment.A collaborative team
        player who thrives in a dynamic, fast-paced setting.

        Join Us:At RTeams, you'll be part of an innovative company that values
        the transformative power of data. Enjoy the flexibility of remote work
        across the US, with standard working hours that support work-life
        balance. Here, we believe in empowering our team members to innovate,
        explore, and make a significant impact.
      - >-
        Skills:Intermediate Level MS Excel (Pivot & Macros knowledge
        helpful)Intermediate Level MS PowerPoint (Presentation Slides &
        Charts)Familiarity with Data Storage platforms, directories and network
        drivesVBA ConceptsSQL BasicData Visualization Concepts


        Soft Skills:Punctuality is required due to the reporting deadlines & on
        time delivery of dataOrganizedTeam playerCurious & Quick Learner


        Education/Experience:Associate Degree in a technical field such as
        computer science, computer engineering or related field required2 -3
        years of experience requiredProcess certification, such as, Six Sigma,
        CBPP, BPM, ISO 20000, ITIL, CMMI


        Summary: The main function of the Data Analyst is to provide business
        intelligence support and supporting areas by means of both repeatable
        and ad hoc reporting delivery reports (charts, graphs, tables, etc.)
        that enable informed business decisions.  

        Job
      - >-
        experience.Support database architecture performance and
        optimization.Support, and explore new ways to monetize Galehead’s
        geospatial tools, including entering new verticals.Provide as-needed
        support for both technical and business issues related to geospatial
        tools and outputs, including coaching/training other team members, as
        needed.Collaborate to develop new analytic data productsWrite and
        maintain a suite of automated data processesBring your best stuff: we
        need the best from everyone.

        KEY REQUIREMENTS:Ability to create reproducible data processes,
        products, and visualizations using Python and SQL (or similar).Strong
        analytical and problem solving skills.Experience with open source
        geospatial processing tools including PostGIS (or other spatial SQL),
        GDAL/OGR, and/or Geopandas.Communications: Effective and thoughtful
        written and verbal communications. Work through issues or differing
        perspectives in a concise and professional manner.Organization: Maintain
        focus and extract value from the high volume of opportunities through
        command of the mission and meticulous organization of information,
        communications, resources and responsibilities.Collaboration: Serve as a
        resource to the entire team and facilitate getting work completed
        cross-functionally.

        PREFERED SKILLS/CAPABILITIESExperience using Postgresql including
        complex analytic queries and performance considerations.Energy industry
        experience.Experience in software development practices including, but
        not limited to Git, Jira, Agileogr/gdalpostgres/postgispython -
        (pandas/geopandas)

        GALEHEAD CULTURE:Accountability: Set and maintain high standards for
        yourself and your coworkers.Problem-Solving: Willingness to consider
        problems and find/drive a path forward. Identify and recommend
        solutions.Our Values:Bold: Demonstrate a bias for action and stretching
        conventional boundaries with a commensurate ability to acknowledge,
        define, and mitigate risk.Driven: Demonstrate an inherent motivation to
        succeed, regardless of externalities.True: Demonstrate transparency at
        all times, provide and accept constructive feedback.
  - source_sentence: Data analysis, statistical modeling, data visualization
    sentences:
      - >-
        Skills: AWS, Spark, Adobe Analytics/AEP(Adobe Experience Platform)
        platform experience, Glue, Lamda, Python, Scala, EMR, Talend,
        PostgreSQL, Redshift

         Configure AEP to get the data set needed and then use spark (AWS glue ) to load data in the data lake Evaluate new use cases and design ETL technical solutions to meet requirements Develop ETL solutions to meet complex use cases

        Adobe Data Engineer || Remote
      - >-
        experience solutions and technologies.This is a hybrid position, with
        the ideal candidate located near one of our regional hubs (New York,
        Chicago, Boston) and able to travel to an office as needed for working
        sessions or team meetings.

        Curinos is looking for a Senior Data Engineering Manager to lead the
        build and expansion of our Retail Consumer product suite, relied on by
        our clients for precision deposit analysis and optimization. Our Retail
        Consumer business covers the largest suite of Curinos products and this
        position is a critical role within the Product Development team,
        combining both hands-on technical work (architecture, roadmap, code
        review, POC of new/complex methodologies) and team management.In this
        role, you will lead a cross-functional Product Development team of
        Software, Data and QA engineers covering all aspects of product
        development (UI/Middle Tier/API/Backend/ETL). You will collaborate with
        product owners on business requirements and features, work with the
        development team to identify scalable architecture and methodologies
        needed to implement, and own the timely and error-free delivery of those
        features. You will be expected to be “hands-on-keys” in this role,
        leading the team by example and helping to establish and model quality
        software development practices as the team, products and business
        continues to grow.

        ResponsibilitiesBuilding and leading a Product Engineering team
        consisting of Software, Data and QA EngineersModeling quality software
        development practices to the team by taking on user stories and writing
        elegant and scalable codeConducting code reviews and providing feedback
        to help team members advance their skillsLeading the design and
        development of performant, extendable and maintainable product
        functionality, and coaching the team on the principles of efficient and
        scalable designEngaging with product owner and LOB head to understand
        client needs and craft product roadmaps and requirementsProviding input
        into the prioritization of features to maximize value delivered to
        clientsAnalyzing complex business problems and identifying solutions and
        own the implementationIdentifying new technologies and tools which could
        improve the efficiency and productivity of your teamWorking with in the
        Agile framework to manage the team’s day-to-day activitiesUnderstanding
        Curinos’ Application, API and Data Engineering platforms and effectively
        using them to build product featuresUnderstanding Curinos’ SDLC and
        compliance processes and ensuring the team’s adherence to them

        Base Salary Range: $160,000 to $185,000 (plus bonus)

        Desired Skills & Expertise6+ years professional full stack experience
        developing cloud based SaaS products using Java, SPA and related
        technologies with a complex backend data processing system[SW1][NS2]3+
        years of experience with SQL Server or Databricks ETL, including
        hands-on experience developing SQL stored procedures and SQL-based ETL
        pipelines2+ Years of management experience of engineers/ICsProven
        ability to grow and lead geographically dispersed and cross-functional
        teamsA passion for proactively identifying opportunities to eliminate
        manual work within the SDLC process and as part of product operationA
        commitment to building a quality and error-free product, via
        implementation of unit testing, integration testing, and data validation
        strategiesA desire to design and develop for scale and in anticipation
        of future use casesDemonstrated intellectual curiosity and innovative
        thinking with a passion for problem-solvingSelf–discipline and
        willingness to learn new skills, tools and technologiesExcellent verbal
        and written communication skillsAdvanced proficiency in Java (including
        testing frameworks like Junit) and T-SQL (including dynamic sql and the
        use of control structures) is an assetExperience using Scala is a
        plusExperience using a templating language like Apache Freemarker is a
        plusBachelors or advanced degrees (Masters or PhD) degree, preferably in
        computer science, or a related engineering field

        Why work at Curinos?Competitive benefits, including a range of
        Financial, Health and Lifestyle benefits to choose fromFlexible working
        options, including home working, flexible hours and part time options,
        depending on the role requirements  please ask!Competitive annual
        leave, floating holidays, volunteering days and a day off for your
        birthday!Learning and development tools to assist with your career
        developmentWork with industry leading Subject Matter Experts and
        specialist productsRegular social events and networking
        opportunitiesCollaborative, supportive culture, including an active DE&I
        programEmployee Assistance Program which provides expert third-party
        advice on wellbeing, relationships, legal and financial matters, as well
        as access to counselling services

        Applying:We know that sometimes the 'perfect candidate' doesn't exist,
        and that people can be put off applying for a job if they don't meet all
        the requirements. If you're excited about working for us and have
        relevant skills or experience, please go ahead and apply. You could be
        just what we need!If you need any adjustments to support your
        application, such as information in alternative formats, special
        requirements to access our buildings or adjusted interview formats
        please contact us at [email protected] and we’ll do everything we can
        to help.

        Inclusivity at Curinos:We believe strongly in the value of diversity and
        creating supportive, inclusive environments where our colleagues can
        succeed. As such, Curinosis proud to be
      - |-
        Qualifications
         Data Science, Statistics, and Data Analytics skillsData Visualization and Data Analysis skillsExperience with machine learning algorithms and predictive modelingProficiency in programming languages such as Python or RStrong problem-solving and critical thinking abilitiesExcellent communication and presentation skillsAbility to work independently and remotelyExperience in the field of data science or related rolesBachelor's degree in Data Science, Statistics, Computer Science, or a related field
  - source_sentence: NLP algorithm development, statistical modeling, biomedical informatics
    sentences:
      - >-
        skills for this position are:Natural Language Processing (NLP)Python
        (Programming Language)Statistical ModelingHigh-Performance Liquid
        Chromatography (HPLC)Java Job Description:We are seeking a highly
        skilled NLP Scientist to develop our innovative and cutting-edge NLP/AI
        solutions to empower life science. This involves working directly with
        our clients, as well as cross-functional Biomedical Science,
        Engineering, and Business leaders, to identify, prioritize, and develop
        NLP/AI and Advanced analytics products from inception to delivery.Key
        requirements and design innovative NLP/AI solutions.Develop and validate
        cutting-edge NLP algorithms, including large language models tailored
        for healthcare and biopharma use cases.Translate complex technical
        insights into accessible language for non-technical stakeholders.Mentor
        junior team members, fostering a culture of continuous learning and
        growth.Publish findings in peer-reviewed journals and conferences.Engage
        with the broader scientific community by attending conferences,
        workshops, and collaborating on research projects. Qualifications:Ph.D.
        or master's degree in biomedical NLP, Computer Science, Biomedical
        Informatics, Computational Linguistics, Mathematics, or other related
        fieldsPublication records in leading computer science or biomedical
        informatics journals and conferences are highly desirable


        Regards,Guru Prasath M US IT RecruiterPSRTEK Inc.Princeton, NJ
        [email protected]: 609-917-9967 Ext:114
      - >-
        Qualifications and Experience:


        Bachelor’s degree in data science, Statistics, or related field, or an
        equivalent combination of education and experience.Working knowledge of
        Salesforce.Ability to leverage enterprise data for advanced
        reporting.Proficiency in combining various data sources for robust
        output.Strong knowledge of Annuity products and distribution
        structure.Influencing skills and change management abilities.4-6 years
        of experience in financial services.Strong organizational skills.Proven
        success in influencing across business units and management
        levels.Confidence and ability to make effective business
        decisions.Willingness to travel (less. than 10%)


        Drive. Discipline. Confidence. Focus. Commitment. Learn more about
        working at Athene.


        Athene is a Military Friendly Employer! Learn more about how we support
        our Veterans.


        Athene celebrates diversity, is committed to inclusion and is proud to
        be
      - >-
        Skills :

        a) Azure Data Factory  Min 3 years of project experiencea. Design of
        pipelinesb. Use of project with On-prem to Cloud Data Migrationc.
        Understanding of ETLd. Change Data Capture from Multiple Sourcese. Job
        Schedulingb) Azure Data Lake  Min 3 years of project experiencea. All
        steps from design to deliverb. Understanding of different Zones and
        design principalc) Data Modeling experience Min 5 Yearsa. Data
        Mart/Warehouseb. Columnar Data design and modelingd) Reporting using
        PowerBI Min 3 yearsa. Analytical Reportingb. Business Domain Modeling
        and data dictionary

        Interested please apply to the job, looking only for W2 candidates.
datasets:
  - Mubin/ai-job-embedding-finetuning
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: ai job validation
          type: ai-job-validation
        metrics:
          - type: cosine_accuracy
            value: 0.9702970297029703
            name: Cosine Accuracy
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: ai job test
          type: ai-job-test
        metrics:
          - type: cosine_accuracy
            value: 0.9803921568627451
            name: Cosine Accuracy
---

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2 on the ai-job-embedding-finetuning dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: Mubin/ai-job-embedding-finetuning

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
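
In other words, encoding runs the fine-tuned BERT over up to 256 tokens, mean-pools the token embeddings over non-padding positions into a single 384-dimensional vector, and L2-normalizes it so dot products equal cosine similarities. A minimal sketch of the equivalent computation with plain transformers (assuming the fine-tuned weights at Mubin/allmini-ai-embedding-similarity; in practice, load the model through sentence-transformers as shown in Usage below):

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Mubin/allmini-ai-embedding-similarity")
bert = AutoModel.from_pretrained("Mubin/allmini-ai-embedding-similarity")

def encode(texts):
    # (0) Transformer: tokenize, truncating to max_seq_length=256
    batch = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = bert(**batch).last_hidden_state          # [batch, seq_len, 384]
    # (1) Pooling: mean over non-padding tokens only
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    # (2) Normalize: unit length, so dot product == cosine similarity
    return F.normalize(pooled, p=2, dim=1)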

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Mubin/allmini-ai-embedding-similarity")
# Run inference
sentences = [
    'NLP algorithm development, statistical modeling, biomedical informatics',
    "skills for this position are:Natural Language Processing (NLP)Python (Programming Language)Statistical ModelingHigh-Performance Liquid Chromatography (HPLC)Java Job Description:We are seeking a highly skilled NLP Scientist to develop our innovative and cutting-edge NLP/AI solutions to empower life science. This involves working directly with our clients, as well as cross-functional Biomedical Science, Engineering, and Business leaders, to identify, prioritize, and develop NLP/AI and Advanced analytics products from inception to delivery.Key requirements and design innovative NLP/AI solutions.Develop and validate cutting-edge NLP algorithms, including large language models tailored for healthcare and biopharma use cases.Translate complex technical insights into accessible language for non-technical stakeholders.Mentor junior team members, fostering a culture of continuous learning and growth.Publish findings in peer-reviewed journals and conferences.Engage with the broader scientific community by attending conferences, workshops, and collaborating on research projects. Qualifications:Ph.D. or master's degree in biomedical NLP, Computer Science, Biomedical Informatics, Computational Linguistics, Mathematics, or other related fieldsPublication records in leading computer science or biomedical informatics journals and conferences are highly desirable\n\nRegards,Guru Prasath M US IT RecruiterPSRTEK Inc.Princeton, NJ [email protected]: 609-917-9967 Ext:114",
    'Skills :\na) Azure Data Factory – Min 3 years of project experiencea. Design of pipelinesb. Use of project with On-prem to Cloud Data Migrationc. Understanding of ETLd. Change Data Capture from Multiple Sourcese. Job Schedulingb) Azure Data Lake – Min 3 years of project experiencea. All steps from design to deliverb. Understanding of different Zones and design principalc) Data Modeling experience Min 5 Yearsa. Data Mart/Warehouseb. Columnar Data design and modelingd) Reporting using PowerBI Min 3 yearsa. Analytical Reportingb. Business Domain Modeling and data dictionary\nInterested please apply to the job, looking only for W2 candidates.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
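
Because the embeddings are unit-normalized, these similarity scores are cosine similarities. For the retrieval use case this model was trained on, you would typically rank the job descriptions against the query; a short, hedged continuation of the snippet above:

# Rank the two job descriptions against the query (the first sentence)
query_scores = model.similarity(embeddings[0:1], embeddings[1:])  # shape [1, 2]
print(query_scores)
# The matching NLP posting is expected to score higher than the unrelated Azure posting.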

Evaluation

Metrics

Triplet

Metric           ai-job-validation  ai-job-test
cosine_accuracy  0.9703             0.9804
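
cosine_accuracy is the fraction of (query, positive, negative) triplets for which the query embedding is closer to the positive job description than to the negative one. A sketch of reproducing these numbers with TripletEvaluator; the split name "validation" is an assumption, while the column names come from the dataset description below:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("Mubin/allmini-ai-embedding-similarity")
val = load_dataset("Mubin/ai-job-embedding-finetuning", split="validation")

evaluator = TripletEvaluator(
    anchors=val["query"],
    positives=val["job_description_pos"],
    negatives=val["job_description_neg"],
    name="ai-job-validation",
)
print(evaluator(model))
# e.g. {'ai-job-validation_cosine_accuracy': 0.9703}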

Training Details

Training Dataset

ai-job-embedding-finetuning

  • Dataset: ai-job-embedding-finetuning at b18b3c2
  • Size: 812 training samples
  • Columns: query, job_description_pos, and job_description_neg
  • Approximate statistics based on the first 812 samples:
                query               job_description_pos   job_description_neg
    type        string              string                string
    details     min: 7 tokens       min: 6 tokens         min: 6 tokens
                mean: 15.03 tokens  mean: 216.92 tokens   mean: 217.63 tokens
                max: 38 tokens      max: 256 tokens       max: 256 tokens
  • Samples:
    Sample 1
      query: Data Engineering Lead, Databricks administration, Neo4j expertise, ETL processes
      job_description_pos: Requirements

        Experience: At least 6 years of hands-on experience in deploying production-quality code, with a strong preference for experience in Python, Java, or Scala for data processing (Python preferred).Technical Proficiency: Advanced knowledge of data-related Python packages and a profound understanding of SQL and Databricks.Graph Database Expertise: Solid grasp of Cypher and experience with graph databases like Neo4j.ETL/ELT Knowledge: Proven track record in implementing ETL (or ELT) best practices at scale and familiarity with data pipeline tools.

        Preferred Qualifications

        Professional experience using Python, Java, or Scala for data processing (Python preferred)

        Working Conditions And Physical Requirements

        Ability to work for long periods at a computer/deskStandard office environment

        About The Organization

        Fullsight is an integrated brand of our three primary affiliate companies – SAE Industry Technologies Consortia, SAE International and Performance Review Institute – a...
      job_description_neg: skills through a combination of education, work experience, and hobbies. You are excited about the complexity and challenges of creating intelligent, high-performance systems while working with a highly experienced and driven data science team.

        If this described you, we are interested. You can be an integral part of a cross-disciplinary team working on highly visible projects that improve performance and grow the intelligence in our Financial Services marketing product suite. Our day-to-day work is performed in a progressive, high-tech workspace where we focus on a friendly, collaborative, and fulfilling environment.

        Key Duties/Responsibilities

        Leverage a richly populated feature stores to understand consumer and market behavior. 20%Implement a predictive model to determine whether a person or household is likely to open a lending or deposit account based on the advertising signals they've received. 20%Derive a set of new features that will help better understand the interplay betwe...

    Sample 2
      query: Snowflake data warehousing, Python design patterns, AWS tools expertise
      job_description_pos: Requirements:
        - Good communication; and problem-solving abilities- Ability to work as an individual contributor; collaborating with Global team- Strong experience with Data Warehousing- OLTP, OLAP, Dimension, Facts, Data Modeling- Expertise implementing Python design patterns (Creational, Structural and Behavioral Patterns)- Expertise in Python building data application including reading, transforming; writing data sets- Strong experience in using boto3, pandas, numpy, pyarrow, Requests, Fast API, Asyncio, Aiohttp, PyTest, OAuth 2.0, multithreading, multiprocessing, snowflake python connector; Snowpark- Experience in Python building data APIs (Web/REST APIs)- Experience with Snowflake including SQL, Pipes, Stream, Tasks, Time Travel, Data Sharing, Query Optimization- Experience with Scripting language in Snowflake including SQL Stored Procs, Java Script Stored Procedures; Python UDFs- Understanding of Snowflake Internals; experience in integration with Reporting; UI applications- Stron...
      job_description_neg: skills and ability to lead detailed data analysis meetings/discussions.

        Ability to work collaboratively with multi-functional and cross-border teams.

        Good English communication written and spoken.

        Nice to have;

        Material master create experience in any of the following areas;

        SAP

        GGSM

        SAP Data Analyst, MN/Remote - Direct Client

    Sample 3
      query: Cloud Data Engineering, Databricks Pyspark, Data Warehousing Design
      job_description_pos: Experience of Delta Lake, DWH, Data Integration, Cloud, Design and Data Modelling. Proficient in developing programs in Python and SQLExperience with Data warehouse Dimensional data modeling. Working with event based/streaming technologies to ingest and process data. Working with structured, semi structured and unstructured data. Optimize Databricks jobs for performance and scalability to handle big data workloads. Monitor and troubleshoot Databricks jobs, identify and resolve issues or bottlenecks. Implement best practices for data management, security, and governance within the Databricks environment. Experience designing and developing Enterprise Data Warehouse solutions. Proficient writing SQL queries and programming including stored procedures and reverse engineering existing process. Perform code reviews to ensure fit to requirements, optimal execution patterns and adherence to established standards.

        Requirements:

        You are:

        Minimum 9+ years of experience is required. 5+ years...
      job_description_neg: QualificationsExpert knowledge of using and configuring GCP (Vertex), AWS, Azure Python: 5+ years of experienceMachine Learning libraries: Pytorch, JaxDevelopment tools: Bash, GitData Science frameworks: DatabricksAgile Software developmentCloud Management: Slurm, KubernetesData Logging: Weights and BiasesOrchestration, Autoscaling: Ray, ClearnML, WandB etc.

        Optional QualificationsExperience training LLMs and VLMsML for Robotics, Computer Vision etc.Developing Browser Apps/Dashboards, both frontend and backend Javascript, React, etc. Emancro is committed to equal employment opportunities regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
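
For context: with these parameters, MultipleNegativesRankingLoss scores each query against its own positive, its hard negative, and every other positive/negative in the batch, then applies cross-entropy over the cosine similarities multiplied by scale=20.0. A simplified, illustrative sketch of the computation (not the library's exact implementation):

import torch
import torch.nn.functional as F

def mnr_loss(q, pos, neg, scale=20.0):
    """q, pos, neg: [batch, dim] embeddings for query, positive, hard negative."""
    candidates = torch.cat([pos, neg], dim=0)                # [2*batch, dim]
    # Pairwise cosine similarity between each query and every candidate, scaled
    scores = F.cosine_similarity(q.unsqueeze(1), candidates.unsqueeze(0), dim=-1) * scale
    labels = torch.arange(q.size(0))  # for query i, the correct candidate is positive i
    return F.cross_entropy(scores, labels)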
    

Evaluation Dataset

ai-job-embedding-finetuning

  • Dataset: ai-job-embedding-finetuning at b18b3c2
  • Size: 101 evaluation samples
  • Columns: query, job_description_pos, and job_description_neg
  • Approximate statistics based on the first 101 samples:
                query               job_description_pos   job_description_neg
    type        string              string                string
    details     min: 10 tokens      min: 9 tokens         min: 21 tokens
                mean: 15.78 tokens  mean: 220.13 tokens   mean: 213.07 tokens
                max: 51 tokens      max: 256 tokens       max: 256 tokens
  • Samples:
    Sample 1
      query: Big Data Engineer, Spark, Hadoop, AWS/GCP
      job_description_pos: Skills • Expertise and hands-on experience on Spark, and Hadoop echo system components – Must Have • Good and hand-on experience* of any of the Cloud (AWS/GCP) – Must Have • Good knowledge of HiveQL & SparkQL – Must Have Good knowledge of Shell script & Java/Scala/python – Good to Have • Good knowledge of SQL – Good to Have • Good knowledge of migration projects on Hadoop – Good to Have • Good Knowledge of one of the Workflow engines like Oozie, Autosys – Good to Have Good knowledge of Agile Development– Good to Have • Passionate about exploring new technologies – Good to Have • Automation approach – Good to Have
        Thanks & RegardsShahrukh KhanEmail: [email protected]
      job_description_neg: experience:

        GS-14:

        Supervisory/Managerial Organization Leadership

        Supervises an assigned branch and its employees. The work directed involves high profile data science projects, programs, and/or initiatives within other federal agencies.Provides expert advice in the highly technical and specialized area of data science and is a key advisor to management on assigned/delegated matters related to the application of mathematics, statistical analysis, modeling/simulation, machine learning, natural language processing, and computer science from a data science perspective.Manages workforce operations, including recruitment, supervision, scheduling, development, and performance evaluations.Keeps up to date with data science developments in the private sector; seeks out best practices; and identifies and seizes opportunities for improvements in assigned data science program and project operations.

        Senior Expert in Data Science

        Recognized authority for scientific data analysis using advanc...

    Sample 2
      query: Time series analysis, production operations, condition-based monitoring
      job_description_pos: Experience in Production Operations or Well Engineering Strong scripting/programming skills (Python preferable)

        Desired:

        Strong time series surveillance background (eg. OSI PI, PI AF, Seeq) Strong scripting/programming skills (Python preferable) Strong communication and collaboration skills Working knowledge of machine learning application (eg. scikit-learn) Working knowledge of SQL and process historians Delivers positive results through realistic planning to accomplish goals Must be able to handle multiple concurrent tasks with an ability to prioritize and manage tasks effectively

        Apex Systems is

        Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in man...
      job_description_neg: Qualifications:· 3-5 years of experience as a hands-on analyst in an enterprise setting, leveraging Salesforce, Marketo, Dynamics, and similar tools.· Excellent written and verbal communication skills.· Experience with data enrichment processes and best practices.· Strong understanding of B2B sales & marketing for large, complex organizations.· Expertise in querying, manipulating, and analyzing data using SQL and/or similar languages.· Advanced Excel skills and experience with data platforms like Hadoop and Databricks.· Proven proficiency with a data visualization tool like Tableau or Power BI.· Strong attention to detail with data quality control and integration expertise.· Results-oriented, self-directed individual with multi-tasking, problem-solving, and independent learning abilities.· Understanding of CRM systems like Salesforce and Microsoft Dynamics.· Solid grasp of marketing practices, principles, KPIs, and data types.· Familiarity with logical data architecture and cloud data ...

    Sample 3
      query: Senior Data Analyst jobs with expertise in Power BI, NextGen EHR, and enterprise ETL.
      job_description_pos: requirements.Reporting and Dashboard Development: Design, develop, and maintain reports for the HRSA HCCN Grant and other assignments. Create and maintain complex dashboards using Microsoft Power BI.Infrastructure Oversight: Monitor and enhance the data warehouse, ensuring efficient data pipelines and timely completion of tasks.Process Improvements: Identify and implement internal process improvements, including automating manual processes and optimizing data delivery.Troubleshooting and Maintenance: Address data inconsistencies using knowledge of various database structures and workflow best practices, including NextGen EHR system.Collaboration and Mentorship: Collaborate with grant PHCs and analytic teams, mentor less senior analysts, and act as a project lead for specific deliverables.
        Experience:Highly proficient in SQL and experienced with reporting packages.Enterprise ETL experience is a major plus!data visualization tools (e.g., Tableau, Power BI, Qualtrics).Azure, Azure Data Fa...
      job_description_neg: Qualifications

        3 to 5 years of experience in exploratory data analysisStatistics Programming, data modeling, simulation, and mathematics Hands on working experience with Python, SQL, R, Hadoop, SAS, SPSS, Scala, AWSModel lifecycle executionTechnical writingData storytelling and technical presentation skillsResearch SkillsInterpersonal SkillsModel DevelopmentCommunicationCritical ThinkingCollaborate and Build RelationshipsInitiative with sound judgementTechnical (Big Data Analysis, Coding, Project Management, Technical Writing, etc.)Problem Solving (Responds as problems and issues are identified)Bachelor's Degree in Data Science, Statistics, Mathematics, Computers Science, Engineering, or degrees in similar quantitative fields

        Desired Qualification(s)

        Master's Degree in Data Science, Statistics, Mathematics, Computer Science, or Engineering

        Hours: Monday - Friday, 8:00AM - 4:30PM

        Locations: 820 Follin Lane, Vienna, VA 22180
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates
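
Taken together with the loss and dataset above, these non-default settings correspond to a standard Sentence Transformers v3 training run. A minimal sketch of the likely training script (split names and output_dir are assumptions; the actual script was not published with this card):

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
dataset = load_dataset("Mubin/ai-job-embedding-finetuning")  # columns: query, job_description_pos, job_description_neg

# Defaults to scale=20.0 and cosine similarity, matching the loss parameters above
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="allmini-ai-embedding-similarity",  # assumed
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate texts acting as false in-batch negatives
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],  # split name assumed
    loss=loss,
)
trainer.train()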

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  Step  ai-job-validation_cosine_accuracy  ai-job-test_cosine_accuracy
0      0     0.9307                             -
1.0    51    0.9703                             0.9804

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.1
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}