Mubin committed on
Commit 5341024 · verified · 1 Parent(s): d8566d3

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+{
+  "word_embedding_dimension": 384,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
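
This configuration enables mean pooling only: the 384-dim token embeddings are averaged over non-padding positions. A minimal sketch of the operation this config selects, assuming a `(batch, seq_len, 384)` tensor of token embeddings (illustrative only, not the Pooling module's actual source):

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over non-padding positions (mean pooling)."""
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)                   # sum of real tokens: (batch, 384)
    count = mask.sum(dim=1).clamp(min=1e-9)                         # tokens per sentence, avoid div-by-zero
    return summed / count                                           # (batch, 384) sentence embedding
```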
README.md ADDED
@@ -0,0 +1,806 @@
+---
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- generated_from_trainer
+- dataset_size:812
+- loss:MultipleNegativesRankingLoss
+base_model: sentence-transformers/all-MiniLM-L6-v2
+widget:
+- source_sentence: Data engineering, AWS services, Big Data manipulation
+  sentences:
+  - 'Skills: SQL, PySpark, Databricks, Azure Synapse, Azure Data Factory.
+
+    Need hands-on coding
+
+    Requirements:1. Extensive knowledge of any of the big cloud services - Azure,
+    AWS or GCP with practical implementation (like S3, ADLS, Airflow, ADF, Lamda,
+    BigQuery, EC2, Fabric, Databricks or equivalent)2. Strong Hands-on experience
+    in SQL and Python/PySpark programming knowledge. Should be able to write code
+    during an interview with minimal syntax error.3. Strong foundational and architectural
+    knowledge of any of the data warehouses - Snowflake, Redshift. Synapse etc.4.
+    Should be able to drive and deliver projects with little or no guidance. Take
+    ownership, become a self-learner, and have leadership qualities.'
+  - "requirements, and general interpretation of dataMentor, teach, share knowledge\
+    \ and analytic techniques with your colleagues\n\nExperience And Preferred Qualifications\n\
+    \nMinimum of three years of relevant experience in developing analytic solutions\
+    \ with proficiency in SQL, Microsoft Excel, Power BI, or similar data analysis\
+    \ and ETL toolsBachelor's degree (B.S./B.A.) in an appropriate field from an accredited\
+    \ college or universityStrong verbal and written communication skills with the\
+    \ ability to convey highly complex technical concepts down to actionable objectives\
+    \ to advise stakeholders including attorneys, firm management, and firm colleaguesExperience\
+    \ in project management including planning, organizing, and supervising clients\
+    \ and colleagues towards successful project completionDemonstrated ability to\
+    \ learn and succeed in a fast-paced environmentExpert level of proficiency with\
+    \ T-SQL or equivalent including a high level of proficiency in database administrationHigh\
+    \ proficiency with Microsoft Excel including an ability to create pivot tables,\
+    \ power pivots & queries, formulas, and external data connectionsAbility to design\
+    \ and implement ETL solutionsExperience in developing client facing visualizations\
+    \ and reports using Power BI, SSRS or similar visualization tools is a plusKnowledge\
+    \ of coding in Python, R, DAX and/or MExperience in developing SSIS and/or SSAS\
+    \ solutions\n\nQualified candidates must apply online by visiting our website\
+    \ at www.morganlewis.com and selecting “Careers.”\n\nMorgan, Lewis & Bockius LLP\
+    \ is committed to \n\nPursuant to applicable state and municipal Fair Chance Laws\
+    \ and Ordinances, we will consider for employment qualified applicants with arrest\
+    \ and conviction records.\n\nCalifornia Applicants: Pursuant to the California\
+    \ Consumer Privacy Act, the following link contains the Firm's California Consumer\
+    \ Privacy Act Privacy Notice for Candidates which explains the categories of personal\
+    \ information that we collect and the purposes for which we use such personal\
+    \ information. CCPA Privacy Notice for Candidates\n\nMorgan Lewis & Bockius LLP\
+    \ is also \n\nIf You Are Interested In Applying For Employment With Morgan Lewis\
+    \ And Need Special Assistance Or An Accommodation To Use Our Website Or To Apply\
+    \ For a Position, Please Call Or Email The Following Contacts\n\nProfessional\
+    \ Staff positions – 1.888.534.5003 / [email protected] \n\nMorgan,\
+    \ Lewis & Bockius, LLP reasonably accommodates applicants and employees who need\
+    \ them to perform the essential functions of the job because of disability, religious\
+    \ belief, or other reason protected by applicable law. If you believe you need\
+    \ a reasonable accommodation during the application process, please contact Talent\
+    \ Acquisition at [email protected]."
+  - experience as a data engineer, data architect, with strong Python and SQL knowledge.
+    Experience with AWS services and Databricks, and ideal if they've developed data
+    pipelines in airflow or any streaming services (Kafka, Kinesis, etc). Expert-level
+    competency in Big Data manipulation and transformation, both within and outside
+    of a database. Need to have competency in API creation, and Machine Learning model
+    deployment. Experience mentoring others and can help as a field leader for newer
+    team members.Additional Skills & QualificationsExperience building decision-support
+    applications based on Data Science and Machine LearningExperience building effective,
+    efficient solutions in AWS, using Terraform and/or CloudFormation to build infrastructure
+    as codeFamiliarity with Snowflake, Airflow, and other Big Data and data pipeline
+    frameworksEducation, training, and certifications in engineering, computer science,
+    math, statistics, analytics, or cloud computing.
+- source_sentence: Digital advertising, MLOps, audience segmentation
+  sentences:
+  - "experience, skills and abilities will determine where an employee is ultimately\
+    \ placed in the pay range.\n\nCategory/Shift\n\nSalaried Full-Time\n\nPhysical\
+    \ Location:\n\n6420 Poplar Avenue\n\nMemphis, TN\n\nFlexible Remote Work Schedule\n\
+    \nThe Job You Will Perform\n\nLead the hands-on IT development and deployment\
+    \ of data science and advanced analytics solutions for the North American Container\
+    \ (NAC) division of International Paper to support business strategies across\
+    \ approximately 200 packaging and specialty plants in the US and MexicoBreak down\
+    \ complex data science methodologies to business leaders in a way that is applicable\
+    \ to our North American Container business strategy.Identify opportunities for\
+    \ improving business performance and present identified opportunities to senior\
+    \ leadership; proactively driving the discovery of business value through data.Collaborate\
+    \ directly with NAC business partners to produce user stories, analyze source\
+    \ data capabilities, identify issues and opportunities, develop data models, and\
+    \ test and deploy innovative analytics solutions and systemsLead the application\
+    \ of data science techniques to analyze and interpret complex data sets, providing\
+    \ insights and enabling data-driven decision-making for North American ContainerLead\
+    \ analytics projects through agile or traditional project management methodologiesInfluence\
+    \ IT projects/initiatives with project managers, business leaders and other IT\
+    \ groups without direct reporting relationships.Work closely with IT Application\
+    \ Services team members to follow standards, best practices, and consultation\
+    \ for data engineeringRole includes: Data analysis, predictive and prescriptive\
+    \ modeling, machine learning, and algorithm development; collaborating and cross-training\
+    \ with analytics and visualization teams.Under general direction works on complex\
+    \ technical issues/problems of a large scope, impact, or importance. Independently\
+    \ resolves complex problems that have significant cost. Leads new technology innovations\
+    \ that define new “frontiers” in technical direction\n\nThe Skills You Will Bring\
+    \ \n\nBachelor’s degree in Computer Science, Information Technology, Statistics,\
+    \ or a related field is required. A Masters degree and/or PhD is preferred.Minimum\
+    \ 12 years of relevant work experience, less if holding a Masters or PhD.Skills\
+    \ with Data Visualization using tools like Microsoft Power BIDemonstrated leadership\
+    \ in building and deploying advanced analytics models for solving real business\
+    \ problems.Strong Interpersonal and Communication SkillsAdaptable to a changing\
+    \ work environment and dealing with ambiguity as it arises. Data Science Skills:Data\
+    \ analysisPredictive and Prescriptive ModelingMachine Learning (Python / R)Artificial\
+    \ Intelligence and Large Language ModelsAlgorithm DevelopmentExperience with Azure\
+    \ Analytics ServicesCompetencies:Dealing with AmbiguityFunctional / Technical\
+    \ Skills Problem SolvingCreativity\nThe Benefits You Will Enjoy\n\nPaid time off\
+    \ including Vacation and Holidays Retirement and 401k Matching ProgramMedical\
+    \ & Dental Education & Development (including Tuition Reimbursement)Life & Disability\
+    \ Insurance\n\nThe Career You Will Build\n\nLeadership trainingPromotional opportunities\n\
+    \nThe Impact You Will Make\n\nWe continue to build a better future for people,\
+    \ the plant, and our company! IP has been a good steward of sustainable practices\
+    \ across communities around the world for more than 120 years. Join our team and\
+    \ you’ll see why our team members say they’re Proud to be IP.\n\nThe Culture You\
+    \ Will Experience\n\nInternational Paper promotes employee well-being by providing\
+    \ safe, caring and inclusive workplaces. You will learn Safety Leadership Principles\
+    \ and have the opportunity to opt into Employee Networking Circles such as IPVets,\
+    \ IPride, Women in IP, and the African American ENC. We invite you to bring your\
+    \ uniqueness, creativity, talents, experiences, and safety mindset to be a part\
+    \ of our increasingly diverse culture.\n\nThe Company You Will Join\n\nInternational\
+    \ Paper (NYSE: IP) is a leading global supplier of renewable fiber-based products.\
+    \ We produce corrugated packaging products that protect and promote goods, and\
+    \ enable worldwide commerce, and pulp for diapers, tissue and other personal care\
+    \ products that promote health and wellness. Headquartered in Memphis, Tenn.,\
+    \ we employ approximately 38,000 colleagues globally. We serve customers worldwide,\
+    \ with manufacturing operations in North America, Latin America, North Africa\
+    \ and Europe. Net sales for 2021 were $19.4 billion. Additional information can\
+    \ be found by visiting InternationalPaper.com.\n\nInternational Paper is an Equal\
+    \ Opportunity/Affirmative Action Employer. All qualified applicants will receive\
+    \ consideration for employment without regard to sex, gender identity, sexual\
+    \ orientation, race, color, religion, national origin, disability, protected veteran\
+    \ status, age, or any other characteristic protected by law."
+  - 'experience, education, geographic location, and other factors. Description: This
+    role is within an organization responsible for developing and maintaining a high-performance
+    Advertising Platform across various online properties, including streaming services.
+    The Ad Platform Research team focuses on transforming advertising with data and
+    AI, seeking a lead machine learning engineer to develop prediction and optimization
+    engines for addressable ad platforms.
+
+    Key responsibilities include driving innovation, developing scalable solutions,
+    collaborating with teams, and mentoring. Preferred qualifications include experience
+    in digital advertising, knowledge of ML operations, and proficiency in relevant
+    technologies like PyTorch and TensorFlow.
+
+    Basic Qualifications:MS or PhD in computer science or EE.4+ years of working experience
+    on machine learning, and statistics in leading internet companies.Experience in
+    the advertising domain is preferred.Solid understanding of ML technologies, mathematics,
+    and statistics.Proficient with Java, Python, Scala, Spark, SQL, large scale ML/DL
+    platforms and processing tech stack.
+
+    Preferred Qualifications:Experience in digital video advertising or digital marketing
+    domain.Experience with feature store, audience segmentation and MLOps.Experience
+    with Pytorch, TensorFlow, Kubeflow, SageMaker or Databricks.
+
+    If you are interested in this role, then please click APPLY NOW. For other opportunities
+    available at Akkodis, or any questions, please contact Amit Kumar Singh at [email protected].
+
+    Equal Opportunity Employer/Veterans/Disabled
+
+    Benefit offerings include medical, dental, vision, term life insurance, short-term
+    disability insurance, additional voluntary benefits, commuter benefits, and a
+    401K plan. Our program provides employees the flexibility to choose the type of
+    coverage that meets their individual needs. Available paid leave may include Paid
+    Sick Leave, where required by law; any other paid leave required by Federal, State,
+    or local law; and Holiday pay upon meeting eligibility criteria. Disclaimer: These
+    benefit offerings do not apply to client-recruited jobs and jobs which are direct
+    hire to a client.
+
+    To read our Candidate Privacy Information Statement, which explains how we will
+    use your information, please visit https://www.akkodis.com/en/privacy-policy.'
+  - 'Qualifications
+
+    Master''s degree is preferred in a Technical Field, Computer Science, Information
+    Technology, or Business ManagementGood understanding of data structures and algorithms,
+    ETL processing, large-scale data and machine-learning production, data and computing
+    infrastructure, automation and workflow orchestration.Hands-on experience in Python,
+    Pyspark, SQL, and shell scripting or similar programming languagesHands-on Experience
+    in using cloud-based technologies throughout data and machine learning product
+    development.Hands-on experience with code versioning, automation and workflow
+    orchestration tools such as Github, Ansible, SLURM, Airflow and TerraformGood
+    Understanding of data warehousing concepts such as data migration and data integration
+    in Amazon Web Services (AWS) cloud or similar platformExcellent debugging and
+    code-reading skills.Documentation and structured programming to support sustainable
+    development.Ability to describe challenges and solutions in both technical and
+    business terms.Ability to develop and maintain excellent working relationships
+    at all organizational levels.'
+- source_sentence: Geospatial data management, spatial analysis, PostGIS expertise
+  sentences:
+  - 'experiences, revenue generation, ad targeting, and other business outcomes.Conduct
+    data processing and analysis to uncover hidden patterns, correlations, and insights.Design
+    and implement A/B testing frameworks to test model quality and effectiveness.Collaborate
+    with engineering and product development teams to integrate data science solutions
+    into our products and services.Stay up-to-date with the latest technologies and
+    techniques in data science, machine learning, and artificial intelligence.
+
+    Technical Requirements:Strong proficiency in programming languages such as Python
+    or R for data analysis and modeling.Extensive experience with machine learning
+    techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, etc.Knowledge
+    of advanced statistical techniques and concepts (regression, properties of distributions,
+    statistical tests, etc.).Experience with data visualization tools (e.g., Matplotlib,
+    Seaborn, Tableau).Familiarity with big data frameworks and tools (e.g., Hadoop,
+    Spark).Proficient in using query languages such as SQL.Experience with cloud computing
+    platforms (AWS, Azure, or Google Cloud) is a plus.Understanding of software development
+    practices and tools, including version control (Git).
+
+    Experience:3+ years of experience in a Data Scientist or similar role.Demonstrated
+    success in developing and deploying data models, algorithms, and predictive analytics
+    solutions.Experience working with large, complex datasets and solving analytical
+    problems using quantitative approaches.
+
+    Who You Are:Analytically minded with a passion for uncovering insights through
+    data analysis.Creative problem solver who is eager to tackle complex challenges.Excellent
+    communicator capable of explaining complex technical concepts to non-technical
+    stakeholders.Self-motivated and able to work independently in a remote environment.A
+    collaborative team player who thrives in a dynamic, fast-paced setting.
+
+    Join Us:At RTeams, you''ll be part of an innovative company that values the transformative
+    power of data. Enjoy the flexibility of remote work across the US, with standard
+    working hours that support work-life balance. Here, we believe in empowering our
+    team members to innovate, explore, and make a significant impact.'
+  - "Skills:Intermediate Level MS Excel (Pivot & Macros knowledge helpful)Intermediate\
+    \ Level MS PowerPoint (Presentation Slides & Charts)Familiarity with Data Storage\
+    \ platforms, directories and network drivesVBA ConceptsSQL BasicData Visualization\
+    \ Concepts\n\nSoft Skills:Punctuality is required due to the reporting deadlines\
+    \ & on time delivery of dataOrganizedTeam playerCurious & Quick Learner\n\nEducation/Experience:Associate\
+    \ Degree in a technical field such as computer science, computer engineering or\
+    \ related field required2 -3 years of experience requiredProcess certification,\
+    \ such as, Six Sigma, CBPP, BPM, ISO 20000, ITIL, CMMI\n\nSummary: The main function\
+    \ of the Data Analyst is to provide business intelligence support and supporting\
+    \ areas by means of both repeatable and ad hoc reporting delivery reports (charts,\
+    \ graphs, tables, etc.) that enable informed business decisions. \nJob"
+  - 'experience.Support database architecture performance and optimization.Support,
+    and explore new ways to monetize Galehead’s geospatial tools, including entering
+    new verticals.Provide as-needed support for both technical and business issues
+    related to geospatial tools and outputs, including coaching/training other team
+    members, as needed.Collaborate to develop new analytic data productsWrite and
+    maintain a suite of automated data processesBring your best stuff: we need the
+    best from everyone.
+
+    KEY REQUIREMENTS:Ability to create reproducible data processes, products, and
+    visualizations using Python and SQL (or similar).Strong analytical and problem
+    solving skills.Experience with open source geospatial processing tools including
+    PostGIS (or other spatial SQL), GDAL/OGR, and/or Geopandas.Communications: Effective
+    and thoughtful written and verbal communications. Work through issues or differing
+    perspectives in a concise and professional manner.Organization: Maintain focus
+    and extract value from the high volume of opportunities through command of the
+    mission and meticulous organization of information, communications, resources
+    and responsibilities.Collaboration: Serve as a resource to the entire team and
+    facilitate getting work completed cross-functionally.
+
+    PREFERED SKILLS/CAPABILITIESExperience using Postgresql including complex analytic
+    queries and performance considerations.Energy industry experience.Experience in
+    software development practices including, but not limited to Git, Jira, Agileogr/gdalpostgres/postgispython
+    - (pandas/geopandas)
+
+    GALEHEAD CULTURE:Accountability: Set and maintain high standards for yourself
+    and your coworkers.Problem-Solving: Willingness to consider problems and find/drive
+    a path forward. Identify and recommend solutions.Our Values:Bold: Demonstrate
+    a bias for action and stretching conventional boundaries with a commensurate ability
+    to acknowledge, define, and mitigate risk.Driven: Demonstrate an inherent motivation
+    to succeed, regardless of externalities.True: Demonstrate transparency at all
+    times, provide and accept constructive feedback.'
+- source_sentence: Data analysis, statistical modeling, data visualization
+  sentences:
+  - "Skills: AWS, Spark, Adobe Analytics/AEP(Adobe Experience Platform) platform experience,\
+    \ Glue, Lamda, Python, Scala, EMR, Talend, PostgreSQL, Redshift\n\n Configure\
+    \ AEP to get the data set needed and then use spark (AWS glue ) to load data in\
+    \ the data lake Evaluate new use cases and design ETL technical solutions to meet\
+    \ requirements Develop ETL solutions to meet complex use cases\n\nAdobe Data Engineer\
+    \ || Remote"
+  - 'experience solutions and technologies.This is a hybrid position, with the ideal
+    candidate located near one of our regional hubs (New York, Chicago, Boston) and
+    able to travel to an office as needed for working sessions or team meetings.
+
+    Curinos is looking for a Senior Data Engineering Manager to lead the build and
+    expansion of our Retail Consumer product suite, relied on by our clients for precision
+    deposit analysis and optimization. Our Retail Consumer business covers the largest
+    suite of Curinos products and this position is a critical role within the Product
+    Development team, combining both hands-on technical work (architecture, roadmap,
+    code review, POC of new/complex methodologies) and team management.In this role,
+    you will lead a cross-functional Product Development team of Software, Data and
+    QA engineers covering all aspects of product development (UI/Middle Tier/API/Backend/ETL).
+    You will collaborate with product owners on business requirements and features,
+    work with the development team to identify scalable architecture and methodologies
+    needed to implement, and own the timely and error-free delivery of those features.
+    You will be expected to be “hands-on-keys” in this role, leading the team by example
+    and helping to establish and model quality software development practices as the
+    team, products and business continues to grow.
+
+    ResponsibilitiesBuilding and leading a Product Engineering team consisting of
+    Software, Data and QA EngineersModeling quality software development practices
+    to the team by taking on user stories and writing elegant and scalable codeConducting
+    code reviews and providing feedback to help team members advance their skillsLeading
+    the design and development of performant, extendable and maintainable product
+    functionality, and coaching the team on the principles of efficient and scalable
+    designEngaging with product owner and LOB head to understand client needs and
+    craft product roadmaps and requirementsProviding input into the prioritization
+    of features to maximize value delivered to clientsAnalyzing complex business problems
+    and identifying solutions and own the implementationIdentifying new technologies
+    and tools which could improve the efficiency and productivity of your teamWorking
+    with in the Agile framework to manage the team’s day-to-day activitiesUnderstanding
+    Curinos’ Application, API and Data Engineering platforms and effectively using
+    them to build product featuresUnderstanding Curinos’ SDLC and compliance processes
+    and ensuring the team’s adherence to them
+
+    Base Salary Range: $160,000 to $185,000 (plus bonus)
+
+    Desired Skills & Expertise6+ years professional full stack experience developing
+    cloud based SaaS products using Java, SPA and related technologies with a complex
+    backend data processing system[SW1][NS2]3+ years of experience with SQL Server
+    or Databricks ETL, including hands-on experience developing SQL stored procedures
+    and SQL-based ETL pipelines2+ Years of management experience of engineers/ICsProven
+    ability to grow and lead geographically dispersed and cross-functional teamsA
+    passion for proactively identifying opportunities to eliminate manual work within
+    the SDLC process and as part of product operationA commitment to building a quality
+    and error-free product, via implementation of unit testing, integration testing,
+    and data validation strategiesA desire to design and develop for scale and in
+    anticipation of future use casesDemonstrated intellectual curiosity and innovative
+    thinking with a passion for problem-solvingSelf–discipline and willingness to
+    learn new skills, tools and technologiesExcellent verbal and written communication
+    skillsAdvanced proficiency in Java (including testing frameworks like Junit) and
+    T-SQL (including dynamic sql and the use of control structures) is an assetExperience
+    using Scala is a plusExperience using a templating language like Apache Freemarker
+    is a plusBachelors or advanced degrees (Masters or PhD) degree, preferably in
+    computer science, or a related engineering field
+
+    Why work at Curinos?Competitive benefits, including a range of Financial, Health
+    and Lifestyle benefits to choose fromFlexible working options, including home
+    working, flexible hours and part time options, depending on the role requirements
+    – please ask!Competitive annual leave, floating holidays, volunteering days and
+    a day off for your birthday!Learning and development tools to assist with your
+    career developmentWork with industry leading Subject Matter Experts and specialist
+    productsRegular social events and networking opportunitiesCollaborative, supportive
+    culture, including an active DE&I programEmployee Assistance Program which provides
+    expert third-party advice on wellbeing, relationships, legal and financial matters,
+    as well as access to counselling services
+
+    Applying:We know that sometimes the ''perfect candidate'' doesn''t exist, and
+    that people can be put off applying for a job if they don''t meet all the requirements.
+    If you''re excited about working for us and have relevant skills or experience,
+    please go ahead and apply. You could be just what we need!If you need any adjustments
+    to support your application, such as information in alternative formats, special
+    requirements to access our buildings or adjusted interview formats please contact
+    us at [email protected] and we’ll do everything we can to help.
+
+    Inclusivity at Curinos:We believe strongly in the value of diversity and creating
+    supportive, inclusive environments where our colleagues can succeed. As such,
+    Curinosis proud to be'
+  - "Qualifications\n Data Science, Statistics, and Data Analytics skillsData Visualization\
+    \ and Data Analysis skillsExperience with machine learning algorithms and predictive\
+    \ modelingProficiency in programming languages such as Python or RStrong problem-solving\
+    \ and critical thinking abilitiesExcellent communication and presentation skillsAbility\
+    \ to work independently and remotelyExperience in the field of data science or\
+    \ related rolesBachelor's degree in Data Science, Statistics, Computer Science,\
+    \ or a related field"
+- source_sentence: NLP algorithm development, statistical modeling, biomedical informatics
+  sentences:
+  - 'skills for this position are:Natural Language Processing (NLP)Python (Programming
+    Language)Statistical ModelingHigh-Performance Liquid Chromatography (HPLC)Java
+    Job Description:We are seeking a highly skilled NLP Scientist to develop our innovative
+    and cutting-edge NLP/AI solutions to empower life science. This involves working
+    directly with our clients, as well as cross-functional Biomedical Science, Engineering,
+    and Business leaders, to identify, prioritize, and develop NLP/AI and Advanced
+    analytics products from inception to delivery.Key requirements and design innovative
+    NLP/AI solutions.Develop and validate cutting-edge NLP algorithms, including large
+    language models tailored for healthcare and biopharma use cases.Translate complex
+    technical insights into accessible language for non-technical stakeholders.Mentor
+    junior team members, fostering a culture of continuous learning and growth.Publish
+    findings in peer-reviewed journals and conferences.Engage with the broader scientific
+    community by attending conferences, workshops, and collaborating on research projects.
+    Qualifications:Ph.D. or master''s degree in biomedical NLP, Computer Science,
+    Biomedical Informatics, Computational Linguistics, Mathematics, or other related
+    fieldsPublication records in leading computer science or biomedical informatics
+    journals and conferences are highly desirable
+
+
+    Regards,Guru Prasath M US IT RecruiterPSRTEK Inc.Princeton, NJ [email protected]:
+    609-917-9967 Ext:114'
+  - 'Qualifications and Experience:
+
+
+    Bachelor’s degree in data science, Statistics, or related field, or an equivalent
+    combination of education and experience.Working knowledge of Salesforce.Ability
+    to leverage enterprise data for advanced reporting.Proficiency in combining various
+    data sources for robust output.Strong knowledge of Annuity products and distribution
+    structure.Influencing skills and change management abilities.4-6 years of experience
+    in financial services.Strong organizational skills.Proven success in influencing
+    across business units and management levels.Confidence and ability to make effective
+    business decisions.Willingness to travel (less. than 10%)
+
+
+    Drive. Discipline. Confidence. Focus. Commitment. Learn more about working at
+    Athene.
+
+
+    Athene is a Military Friendly Employer! Learn more about how we support our Veterans.
+
+
+    Athene celebrates diversity, is committed to inclusion and is proud to be'
+  - 'Skills :
+
+    a) Azure Data Factory – Min 3 years of project experiencea. Design of pipelinesb.
+    Use of project with On-prem to Cloud Data Migrationc. Understanding of ETLd. Change
+    Data Capture from Multiple Sourcese. Job Schedulingb) Azure Data Lake – Min 3
+    years of project experiencea. All steps from design to deliverb. Understanding
+    of different Zones and design principalc) Data Modeling experience Min 5 Yearsa.
+    Data Mart/Warehouseb. Columnar Data design and modelingd) Reporting using PowerBI
+    Min 3 yearsa. Analytical Reportingb. Business Domain Modeling and data dictionary
+
+    Interested please apply to the job, looking only for W2 candidates.'
+datasets:
+- Mubin/ai-job-embedding-finetuning
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+metrics:
+- cosine_accuracy
+model-index:
+- name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+  results:
+  - task:
+      type: triplet
+      name: Triplet
+    dataset:
+      name: ai job validation
+      type: ai-job-validation
+    metrics:
+    - type: cosine_accuracy
+      value: 0.9702970297029703
+      name: Cosine Accuracy
+  - task:
+      type: triplet
+      name: Triplet
+    dataset:
+      name: ai job test
+      type: ai-job-test
+    metrics:
+    - type: cosine_accuracy
+      value: 0.9803921568627451
+      name: Cosine Accuracy
+---
+
+# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) on the [ai-job-embedding-finetuning](https://huggingface.co/datasets/Mubin/ai-job-embedding-finetuning) dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+## Model Details
+
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision fa97f6e7cb1a59073dff9e6b13e2715cf7475ac9 -->
+- **Maximum Sequence Length:** 256 tokens
+- **Output Dimensionality:** 384 dimensions
+- **Similarity Function:** Cosine Similarity
+- **Training Dataset:**
+    - [ai-job-embedding-finetuning](https://huggingface.co/datasets/Mubin/ai-job-embedding-finetuning)
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+
+### Model Sources
+
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+### Full Model Architecture
+
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
+  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
+)
+```
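
The module stack and dimensions above can be checked after loading with standard `SentenceTransformer` attributes; a quick sketch:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Mubin/allmini-ai-embedding-similarity")
print(model.max_seq_length)                      # 256
print(model.get_sentence_embedding_dimension())  # 384
print(model)                                     # Transformer -> Pooling -> Normalize
```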
+
+## Usage
+
+### Direct Usage (Sentence Transformers)
+
+First install the Sentence Transformers library:
+
+```bash
+pip install -U sentence-transformers
+```
+
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+
+# Download from the 🤗 Hub
+model = SentenceTransformer("Mubin/allmini-ai-embedding-similarity")
+# Run inference
+sentences = [
+    'NLP algorithm development, statistical modeling, biomedical informatics',
+    "skills for this position are:Natural Language Processing (NLP)Python (Programming Language)Statistical ModelingHigh-Performance Liquid Chromatography (HPLC)Java Job Description:We are seeking a highly skilled NLP Scientist to develop our innovative and cutting-edge NLP/AI solutions to empower life science. This involves working directly with our clients, as well as cross-functional Biomedical Science, Engineering, and Business leaders, to identify, prioritize, and develop NLP/AI and Advanced analytics products from inception to delivery.Key requirements and design innovative NLP/AI solutions.Develop and validate cutting-edge NLP algorithms, including large language models tailored for healthcare and biopharma use cases.Translate complex technical insights into accessible language for non-technical stakeholders.Mentor junior team members, fostering a culture of continuous learning and growth.Publish findings in peer-reviewed journals and conferences.Engage with the broader scientific community by attending conferences, workshops, and collaborating on research projects. Qualifications:Ph.D. or master's degree in biomedical NLP, Computer Science, Biomedical Informatics, Computational Linguistics, Mathematics, or other related fieldsPublication records in leading computer science or biomedical informatics journals and conferences are highly desirable\n\nRegards,Guru Prasath M US IT RecruiterPSRTEK Inc.Princeton, NJ [email protected]: 609-917-9967 Ext:114",
+    'Skills :\na) Azure Data Factory – Min 3 years of project experiencea. Design of pipelinesb. Use of project with On-prem to Cloud Data Migrationc. Understanding of ETLd. Change Data Capture from Multiple Sourcese. Job Schedulingb) Azure Data Lake – Min 3 years of project experiencea. All steps from design to deliverb. Understanding of different Zones and design principalc) Data Modeling experience Min 5 Yearsa. Data Mart/Warehouseb. Columnar Data design and modelingd) Reporting using PowerBI Min 3 yearsa. Analytical Reportingb. Business Domain Modeling and data dictionary\nInterested please apply to the job, looking only for W2 candidates.',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# [3, 384]
+
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities.shape)
+# [3, 3]
+```
+
+<!--
+### Direct Usage (Transformers)
+
+<details><summary>Click to see the direct usage in Transformers</summary>
+
+</details>
+-->
+
+<!--
+### Downstream Usage (Sentence Transformers)
+
+You can finetune this model on your own dataset.
+
+<details><summary>Click to expand</summary>
+
+</details>
+-->
+
+<!--
+### Out-of-Scope Use
+
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+
+## Evaluation
+
+### Metrics
+
+#### Triplet
+
+* Datasets: `ai-job-validation` and `ai-job-test`
+* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+| Metric              | ai-job-validation | ai-job-test |
+|:--------------------|:------------------|:------------|
+| **cosine_accuracy** | **0.9703**        | **0.9804**  |
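
The accuracies above come from `TripletEvaluator` runs on held-out splits. A minimal sketch of reproducing the test figure, assuming the dataset exposes a `test` split with the column names listed under Training Details:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("Mubin/allmini-ai-embedding-similarity")
test = load_dataset("Mubin/ai-job-embedding-finetuning", split="test")  # split name assumed

evaluator = TripletEvaluator(
    anchors=test["query"],
    positives=test["job_description_pos"],
    negatives=test["job_description_neg"],
    name="ai-job-test",
)
# Accuracy = fraction of triplets where the anchor is closer to the positive than the negative
print(evaluator(model))
```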
+
+<!--
+## Bias, Risks and Limitations
+
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+
+<!--
+### Recommendations
+
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+
+## Training Details
+
+### Training Dataset
+
+#### ai-job-embedding-finetuning
+
+* Dataset: [ai-job-embedding-finetuning](https://huggingface.co/datasets/Mubin/ai-job-embedding-finetuning) at [b18b3c2](https://huggingface.co/datasets/Mubin/ai-job-embedding-finetuning/tree/b18b3c20bc31354d97bad62866da97618b6c13b7)
+* Size: 812 training samples
+* Columns: <code>query</code>, <code>job_description_pos</code>, and <code>job_description_neg</code>
+* Approximate statistics based on the first 812 samples:
+  | | query | job_description_pos | job_description_neg |
+  |:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
+  | type | string | string | string |
+  | details | <ul><li>min: 7 tokens</li><li>mean: 15.03 tokens</li><li>max: 38 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 216.92 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 217.63 tokens</li><li>max: 256 tokens</li></ul> |
+* Samples:
+  | query | job_description_pos | job_description_neg |
+  |:---|:---|:---|
+  | <code>Data Engineering Lead, Databricks administration, Neo4j expertise, ETL processes</code> | <code>Requirements<br><br>Experience: At least 6 years of hands-on experience in deploying production-quality code, with a strong preference for experience in Python, Java, or Scala for data processing (Python preferred).Technical Proficiency: Advanced knowledge of data-related Python packages and a profound understanding of SQL and Databricks.Graph Database Expertise: Solid grasp of Cypher and experience with graph databases like Neo4j.ETL/ELT Knowledge: Proven track record in implementing ETL (or ELT) best practices at scale and familiarity with data pipeline tools.<br><br>Preferred Qualifications<br><br>Professional experience using Python, Java, or Scala for data processing (Python preferred)<br><br>Working Conditions And Physical Requirements<br><br>Ability to work for long periods at a computer/deskStandard office environment<br><br>About The Organization<br><br>Fullsight is an integrated brand of our three primary affiliate companies – SAE Industry Technologies Consortia, SAE International and Performance Review Institute – a...</code> | <code>skills through a combination of education, work experience, and hobbies. You are excited about the complexity and challenges of creating intelligent, high-performance systems while working with a highly experienced and driven data science team.<br><br>If this described you, we are interested. You can be an integral part of a cross-disciplinary team working on highly visible projects that improve performance and grow the intelligence in our Financial Services marketing product suite. Our day-to-day work is performed in a progressive, high-tech workspace where we focus on a friendly, collaborative, and fulfilling environment.<br><br>Key Duties/Responsibilities<br><br>Leverage a richly populated feature stores to understand consumer and market behavior. 20%Implement a predictive model to determine whether a person or household is likely to open a lending or deposit account based on the advertising signals they've received. 20%Derive a set of new features that will help better understand the interplay betwe...</code> |
+  | <code>Snowflake data warehousing, Python design patterns, AWS tools expertise</code> | <code>Requirements:<br>- Good communication; and problem-solving abilities- Ability to work as an individual contributor; collaborating with Global team- Strong experience with Data Warehousing- OLTP, OLAP, Dimension, Facts, Data Modeling- Expertise implementing Python design patterns (Creational, Structural and Behavioral Patterns)- Expertise in Python building data application including reading, transforming; writing data sets- Strong experience in using boto3, pandas, numpy, pyarrow, Requests, Fast API, Asyncio, Aiohttp, PyTest, OAuth 2.0, multithreading, multiprocessing, snowflake python connector; Snowpark- Experience in Python building data APIs (Web/REST APIs)- Experience with Snowflake including SQL, Pipes, Stream, Tasks, Time Travel, Data Sharing, Query Optimization- Experience with Scripting language in Snowflake including SQL Stored Procs, Java Script Stored Procedures; Python UDFs- Understanding of Snowflake Internals; experience in integration with Reporting; UI applications- Stron...</code> | <code>skills and ability to lead detailed data analysis meetings/discussions.<br><br>Ability to work collaboratively with multi-functional and cross-border teams.<br><br>Good English communication written and spoken.<br><br>Nice to have;<br><br>Material master create experience in any of the following areas;<br><br>SAP<br><br>GGSM<br><br>SAP Data Analyst, MN/Remote - Direct Client</code> |
+  | <code>Cloud Data Engineering, Databricks Pyspark, Data Warehousing Design</code> | <code>Experience of Delta Lake, DWH, Data Integration, Cloud, Design and Data Modelling. Proficient in developing programs in Python and SQLExperience with Data warehouse Dimensional data modeling. Working with event based/streaming technologies to ingest and process data. Working with structured, semi structured and unstructured data. Optimize Databricks jobs for performance and scalability to handle big data workloads. Monitor and troubleshoot Databricks jobs, identify and resolve issues or bottlenecks. Implement best practices for data management, security, and governance within the Databricks environment. Experience designing and developing Enterprise Data Warehouse solutions. Proficient writing SQL queries and programming including stored procedures and reverse engineering existing process. Perform code reviews to ensure fit to requirements, optimal execution patterns and adherence to established standards. <br><br>Requirements: <br><br>You are:<br><br>Minimum 9+ years of experience is required. 5+ years...</code> | <code>QualificationsExpert knowledge of using and configuring GCP (Vertex), AWS, Azure Python: 5+ years of experienceMachine Learning libraries: Pytorch, JaxDevelopment tools: Bash, GitData Science frameworks: DatabricksAgile Software developmentCloud Management: Slurm, KubernetesData Logging: Weights and BiasesOrchestration, Autoscaling: Ray, ClearnML, WandB etc.<br>Optional QualificationsExperience training LLMs and VLMsML for Robotics, Computer Vision etc.Developing Browser Apps/Dashboards, both frontend and backend Javascript, React, etc. Emancro is committed to equal employment opportunities regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status.</code> |
+* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+  ```json
+  {
+      "scale": 20.0,
+      "similarity_fct": "cos_sim"
+  }
+  ```
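
For orientation, a minimal sketch of constructing this loss for (query, positive, negative) triplets; `scale=20.0` and cosine similarity are the library defaults, matching the JSON above:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# Besides the explicit job_description_neg column, this loss also treats the other
# positives in the batch as in-batch negatives, which is why a no-duplicates
# batch sampler (see Training Hyperparameters) is typically used with it.
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)
```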
586
+
587
+ ### Evaluation Dataset
588
+
589
+ #### ai-job-embedding-finetuning
590
+
591
+ * Dataset: [ai-job-embedding-finetuning](https://huggingface.co/datasets/Mubin/ai-job-embedding-finetuning) at [b18b3c2](https://huggingface.co/datasets/Mubin/ai-job-embedding-finetuning/tree/b18b3c20bc31354d97bad62866da97618b6c13b7)
592
+ * Size: 101 evaluation samples
593
+ * Columns: <code>query</code>, <code>job_description_pos</code>, and <code>job_description_neg</code>
594
+ * Approximate statistics based on the first 101 samples:
595
+ | | query | job_description_pos | job_description_neg |
596
+ |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
597
+ | type | string | string | string |
598
+ | details | <ul><li>min: 10 tokens</li><li>mean: 15.78 tokens</li><li>max: 51 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 220.13 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 21 tokens</li><li>mean: 213.07 tokens</li><li>max: 256 tokens</li></ul> |
599
+ * Samples:
600
+ | query | job_description_pos | job_description_neg |
601
+ |:---------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
602
+ | <code>Big Data Engineer, Spark, Hadoop, AWS/GCP</code> | <code>Skills • Expertise and hands-on experience on Spark, and Hadoop echo system components – Must Have • Good and hand-on experience* of any of the Cloud (AWS/GCP) – Must Have • Good knowledge of HiveQL & SparkQL – Must Have Good knowledge of Shell script & Java/Scala/python – Good to Have • Good knowledge of SQL – Good to Have • Good knowledge of migration projects on Hadoop – Good to Have • Good Knowledge of one of the Workflow engines like Oozie, Autosys – Good to Have Good knowledge of Agile Development– Good to Have • Passionate about exploring new technologies – Good to Have • Automation approach – Good to Have <br>Thanks & RegardsShahrukh KhanEmail: [email protected]</code> | <code>experience:<br><br>GS-14:<br><br>Supervisory/Managerial Organization Leadership<br><br>Supervises an assigned branch and its employees. The work directed involves high profile data science projects, programs, and/or initiatives within other federal agencies.Provides expert advice in the highly technical and specialized area of data science and is a key advisor to management on assigned/delegated matters related to the application of mathematics, statistical analysis, modeling/simulation, machine learning, natural language processing, and computer science from a data science perspective.Manages workforce operations, including recruitment, supervision, scheduling, development, and performance evaluations.Keeps up to date with data science developments in the private sector; seeks out best practices; and identifies and seizes opportunities for improvements in assigned data science program and project operations.<br><br><br>Senior Expert in Data Science<br><br>Recognized authority for scientific data analysis using advanc...</code> |
603
+ | <code>Time series analysis, production operations, condition-based monitoring</code> | <code>Experience in Production Operations or Well Engineering Strong scripting/programming skills (Python preferable)<br><br>Desired: <br><br> Strong time series surveillance background (eg. OSI PI, PI AF, Seeq) Strong scripting/programming skills (Python preferable) Strong communication and collaboration skills Working knowledge of machine learning application (eg. scikit-learn) Working knowledge of SQL and process historians Delivers positive results through realistic planning to accomplish goals Must be able to handle multiple concurrent tasks with an ability to prioritize and manage tasks effectively<br><br><br><br>Apex Systems is <br><br>Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in man...</code> | <code>Qualifications:· 3-5 years of experience as a hands-on analyst in an enterprise setting, leveraging Salesforce, Marketo, Dynamics, and similar tools.· Excellent written and verbal communication skills.· Experience with data enrichment processes and best practices.· Strong understanding of B2B sales & marketing for large, complex organizations.· Expertise in querying, manipulating, and analyzing data using SQL and/or similar languages.· Advanced Excel skills and experience with data platforms like Hadoop and Databricks.· Proven proficiency with a data visualization tool like Tableau or Power BI.· Strong attention to detail with data quality control and integration expertise.· Results-oriented, self-directed individual with multi-tasking, problem-solving, and independent learning abilities.· Understanding of CRM systems like Salesforce and Microsoft Dynamics.· Solid grasp of marketing practices, principles, KPIs, and data types.· Familiarity with logical data architecture and cloud data ...</code> |
604
+ | <code>Senior Data Analyst jobs with expertise in Power BI, NextGen EHR, and enterprise ETL.</code> | <code>requirements.Reporting and Dashboard Development: Design, develop, and maintain reports for the HRSA HCCN Grant and other assignments. Create and maintain complex dashboards using Microsoft Power BI.Infrastructure Oversight: Monitor and enhance the data warehouse, ensuring efficient data pipelines and timely completion of tasks.Process Improvements: Identify and implement internal process improvements, including automating manual processes and optimizing data delivery.Troubleshooting and Maintenance: Address data inconsistencies using knowledge of various database structures and workflow best practices, including NextGen EHR system.Collaboration and Mentorship: Collaborate with grant PHCs and analytic teams, mentor less senior analysts, and act as a project lead for specific deliverables.<br>Experience:Highly proficient in SQL and experienced with reporting packages.Enterprise ETL experience is a major plus!data visualization tools (e.g., Tableau, Power BI, Qualtrics).Azure, Azure Data Fa...</code> | <code>Qualifications<br><br>3 to 5 years of experience in exploratory data analysisStatistics Programming, data modeling, simulation, and mathematics Hands on working experience with Python, SQL, R, Hadoop, SAS, SPSS, Scala, AWSModel lifecycle executionTechnical writingData storytelling and technical presentation skillsResearch SkillsInterpersonal SkillsModel DevelopmentCommunicationCritical ThinkingCollaborate and Build RelationshipsInitiative with sound judgementTechnical (Big Data Analysis, Coding, Project Management, Technical Writing, etc.)Problem Solving (Responds as problems and issues are identified)Bachelor's Degree in Data Science, Statistics, Mathematics, Computers Science, Engineering, or degrees in similar quantitative fields<br><br><br>Desired Qualification(s)<br><br>Master's Degree in Data Science, Statistics, Mathematics, Computer Science, or Engineering<br><br><br>Hours: Monday - Friday, 8:00AM - 4:30PM<br><br>Locations: 820 Follin Lane, Vienna, VA 22180 | 5510 Heritage Oaks Drive, Pensacola, FL 32526 | 141 Se...</code> |
605
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
606
+ ```json
607
+ {
608
+ "scale": 20.0,
609
+ "similarity_fct": "cos_sim"
610
+ }
611
+ ```
612
+
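+ As a minimal sketch, assuming the standard `sentence-transformers` API (the values below simply mirror the parameters listed above), this loss can be constructed as:
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+ from sentence_transformers.losses import MultipleNegativesRankingLoss
+
+ model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # base model as a stand-in
+
+ # scale=20.0 and similarity_fct=cos_sim mirror the parameters listed above
+ loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)
+ ```
+
+ With this loss, every other positive in a batch acts as an in-batch negative for a given anchor, which is why the `no_duplicates` batch sampler listed below is used.
+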
613
+ ### Training Hyperparameters
614
+ #### Non-Default Hyperparameters
615
+
616
+ - `eval_strategy`: steps
617
+ - `per_device_train_batch_size`: 16
618
+ - `per_device_eval_batch_size`: 16
619
+ - `learning_rate`: 2e-05
620
+ - `num_train_epochs`: 1
621
+ - `warmup_ratio`: 0.1
622
+ - `batch_sampler`: no_duplicates
623
+
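+ A sketch of reproducing these values with the `SentenceTransformerTrainingArguments` API (assuming the sentence-transformers 3.x trainer; `output_dir` is a placeholder):
+
+ ```python
+ from sentence_transformers.training_args import (
+     SentenceTransformerTrainingArguments,
+     BatchSamplers,
+ )
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="output",  # placeholder
+     eval_strategy="steps",
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     learning_rate=2e-5,
+     num_train_epochs=1,
+     warmup_ratio=0.1,
+     batch_sampler=BatchSamplers.NO_DUPLICATES,
+ )
+ ```
+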
624
+ #### All Hyperparameters
625
+ <details><summary>Click to expand</summary>
626
+
627
+ - `overwrite_output_dir`: False
628
+ - `do_predict`: False
629
+ - `eval_strategy`: steps
630
+ - `prediction_loss_only`: True
631
+ - `per_device_train_batch_size`: 16
632
+ - `per_device_eval_batch_size`: 16
633
+ - `per_gpu_train_batch_size`: None
634
+ - `per_gpu_eval_batch_size`: None
635
+ - `gradient_accumulation_steps`: 1
636
+ - `eval_accumulation_steps`: None
637
+ - `torch_empty_cache_steps`: None
638
+ - `learning_rate`: 2e-05
639
+ - `weight_decay`: 0.0
640
+ - `adam_beta1`: 0.9
641
+ - `adam_beta2`: 0.999
642
+ - `adam_epsilon`: 1e-08
643
+ - `max_grad_norm`: 1.0
644
+ - `num_train_epochs`: 1
645
+ - `max_steps`: -1
646
+ - `lr_scheduler_type`: linear
647
+ - `lr_scheduler_kwargs`: {}
648
+ - `warmup_ratio`: 0.1
649
+ - `warmup_steps`: 0
650
+ - `log_level`: passive
651
+ - `log_level_replica`: warning
652
+ - `log_on_each_node`: True
653
+ - `logging_nan_inf_filter`: True
654
+ - `save_safetensors`: True
655
+ - `save_on_each_node`: False
656
+ - `save_only_model`: False
657
+ - `restore_callback_states_from_checkpoint`: False
658
+ - `no_cuda`: False
659
+ - `use_cpu`: False
660
+ - `use_mps_device`: False
661
+ - `seed`: 42
662
+ - `data_seed`: None
663
+ - `jit_mode_eval`: False
664
+ - `use_ipex`: False
665
+ - `bf16`: False
666
+ - `fp16`: False
667
+ - `fp16_opt_level`: O1
668
+ - `half_precision_backend`: auto
669
+ - `bf16_full_eval`: False
670
+ - `fp16_full_eval`: False
671
+ - `tf32`: None
672
+ - `local_rank`: 0
673
+ - `ddp_backend`: None
674
+ - `tpu_num_cores`: None
675
+ - `tpu_metrics_debug`: False
676
+ - `debug`: []
677
+ - `dataloader_drop_last`: False
678
+ - `dataloader_num_workers`: 0
679
+ - `dataloader_prefetch_factor`: None
680
+ - `past_index`: -1
681
+ - `disable_tqdm`: False
682
+ - `remove_unused_columns`: True
683
+ - `label_names`: None
684
+ - `load_best_model_at_end`: False
685
+ - `ignore_data_skip`: False
686
+ - `fsdp`: []
687
+ - `fsdp_min_num_params`: 0
688
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
689
+ - `fsdp_transformer_layer_cls_to_wrap`: None
690
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
691
+ - `deepspeed`: None
692
+ - `label_smoothing_factor`: 0.0
693
+ - `optim`: adamw_torch
694
+ - `optim_args`: None
695
+ - `adafactor`: False
696
+ - `group_by_length`: False
697
+ - `length_column_name`: length
698
+ - `ddp_find_unused_parameters`: None
699
+ - `ddp_bucket_cap_mb`: None
700
+ - `ddp_broadcast_buffers`: False
701
+ - `dataloader_pin_memory`: True
702
+ - `dataloader_persistent_workers`: False
703
+ - `skip_memory_metrics`: True
704
+ - `use_legacy_prediction_loop`: False
705
+ - `push_to_hub`: False
706
+ - `resume_from_checkpoint`: None
707
+ - `hub_model_id`: None
708
+ - `hub_strategy`: every_save
709
+ - `hub_private_repo`: None
710
+ - `hub_always_push`: False
711
+ - `gradient_checkpointing`: False
712
+ - `gradient_checkpointing_kwargs`: None
713
+ - `include_inputs_for_metrics`: False
714
+ - `include_for_metrics`: []
715
+ - `eval_do_concat_batches`: True
716
+ - `fp16_backend`: auto
717
+ - `push_to_hub_model_id`: None
718
+ - `push_to_hub_organization`: None
719
+ - `mp_parameters`:
720
+ - `auto_find_batch_size`: False
721
+ - `full_determinism`: False
722
+ - `torchdynamo`: None
723
+ - `ray_scope`: last
724
+ - `ddp_timeout`: 1800
725
+ - `torch_compile`: False
726
+ - `torch_compile_backend`: None
727
+ - `torch_compile_mode`: None
728
+ - `dispatch_batches`: None
729
+ - `split_batches`: None
730
+ - `include_tokens_per_second`: False
731
+ - `include_num_input_tokens_seen`: False
732
+ - `neftune_noise_alpha`: None
733
+ - `optim_target_modules`: None
734
+ - `batch_eval_metrics`: False
735
+ - `eval_on_start`: False
736
+ - `use_liger_kernel`: False
737
+ - `eval_use_gather_object`: False
738
+ - `average_tokens_across_devices`: False
739
+ - `prompts`: None
740
+ - `batch_sampler`: no_duplicates
741
+ - `multi_dataset_batch_sampler`: proportional
742
+
743
+ </details>
744
+
745
+ ### Training Logs
746
+ | Epoch | Step | ai-job-validation_cosine_accuracy | ai-job-test_cosine_accuracy |
747
+ |:-----:|:----:|:---------------------------------:|:---------------------------:|
748
+ | 0 | 0 | 0.9307 | - |
749
+ | 1.0 | 51 | 0.9703 | 0.9804 |
750
+
751
+
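+ The accuracies above are triplet accuracies: the fraction of (anchor, positive, negative) triplets for which the anchor embedding lands closer to the positive than to the negative under cosine similarity. A minimal sketch of computing such a number with `TripletEvaluator` (the strings and the model id below are toy stand-ins):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import TripletEvaluator
+
+ # Toy triplet: a query, a matching job posting, and a non-matching one
+ anchors = ["Senior data engineer, Spark and cloud pipelines"]
+ positives = ["Hands-on data engineering role: PySpark, S3, Airflow."]
+ negatives = ["B2B marketing analyst role: Salesforce, campaign KPIs."]
+
+ model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # stand-in
+ evaluator = TripletEvaluator(anchors, positives, negatives, name="ai-job-validation")
+ results = evaluator(model)  # e.g. results["ai-job-validation_cosine_accuracy"]
+ ```
+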
752
+ ### Framework Versions
753
+ - Python: 3.11.11
754
+ - Sentence Transformers: 3.3.1
755
+ - Transformers: 4.47.1
756
+ - PyTorch: 2.5.1+cu121
757
+ - Accelerate: 1.2.1
758
+ - Datasets: 3.2.0
759
+ - Tokenizers: 0.21.0
760
+
761
+ ## Citation
762
+
763
+ ### BibTeX
764
+
765
+ #### Sentence Transformers
766
+ ```bibtex
767
+ @inproceedings{reimers-2019-sentence-bert,
768
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
769
+ author = "Reimers, Nils and Gurevych, Iryna",
770
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
771
+ month = "11",
772
+ year = "2019",
773
+ publisher = "Association for Computational Linguistics",
774
+ url = "https://arxiv.org/abs/1908.10084",
775
+ }
776
+ ```
777
+
778
+ #### MultipleNegativesRankingLoss
779
+ ```bibtex
780
+ @misc{henderson2017efficient,
781
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
782
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
783
+ year={2017},
784
+ eprint={1705.00652},
785
+ archivePrefix={arXiv},
786
+ primaryClass={cs.CL}
787
+ }
788
+ ```
789
+
790
+ <!--
791
+ ## Glossary
792
+
793
+ *Clearly define terms in order to be accessible across audiences.*
794
+ -->
795
+
796
+ <!--
797
+ ## Model Card Authors
798
+
799
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
800
+ -->
801
+
802
+ <!--
803
+ ## Model Card Contact
804
+
805
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
806
+ -->
config.json ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "sentence-transformers/all-MiniLM-L6-v2",
3
+ "architectures": [
4
+ "BertModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "classifier_dropout": null,
8
+ "gradient_checkpointing": false,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 384,
12
+ "initializer_range": 0.02,
13
+ "intermediate_size": 1536,
14
+ "layer_norm_eps": 1e-12,
15
+ "max_position_embeddings": 512,
16
+ "model_type": "bert",
17
+ "num_attention_heads": 12,
18
+ "num_hidden_layers": 6,
19
+ "pad_token_id": 0,
20
+ "position_embedding_type": "absolute",
21
+ "torch_dtype": "float32",
22
+ "transformers_version": "4.47.1",
23
+ "type_vocab_size": 2,
24
+ "use_cache": true,
25
+ "vocab_size": 30522
26
+ }
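This config describes a standard 6-layer BERT encoder with 384-dimensional hidden states, so the checkpoint can also be loaded with plain `transformers`; a sketch (the base-model id stands in for this repository, and pooling/normalization must then be applied manually):

```python
from transformers import AutoModel, AutoTokenizer

# Base-model id as a stand-in for this repository
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
encoder = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

batch = tokenizer(["an example sentence"], return_tensors="pt")
token_embeddings = encoder(**batch).last_hidden_state  # shape (1, seq_len, 384)
```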
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.3.1",
4
+ "transformers": "4.47.1",
5
+ "pytorch": "2.5.1+cu121"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": "cosine"
10
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:daff92a6b8591c145e5c7081286dc06a035f5989ad8278ef5c97036caf20d53d
3
+ size 90864192
modules.json ADDED
@@ -0,0 +1,20 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
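These three modules form the encode pipeline: a Transformer encoder, mean pooling over token embeddings, and L2 normalization. A sketch of assembling the same stack by hand with `sentence_transformers.models` (the base-model id stands in for this repository; `max_seq_length=256` matches `sentence_bert_config.json` below):

```python
from sentence_transformers import SentenceTransformer, models

word = models.Transformer("sentence-transformers/all-MiniLM-L6-v2", max_seq_length=256)
pooling = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="mean")
normalize = models.Normalize()

# Mirrors modules.json: Transformer -> Pooling -> Normalize
model = SentenceTransformer(modules=[word, pooling, normalize])
```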
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ {
2
+ "max_seq_length": 256,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": false,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "max_length": 128,
51
+ "model_max_length": 256,
52
+ "never_split": null,
53
+ "pad_to_multiple_of": null,
54
+ "pad_token": "[PAD]",
55
+ "pad_token_type_id": 0,
56
+ "padding_side": "right",
57
+ "sep_token": "[SEP]",
58
+ "stride": 0,
59
+ "strip_accents": null,
60
+ "tokenize_chinese_chars": true,
61
+ "tokenizer_class": "BertTokenizer",
62
+ "truncation_side": "right",
63
+ "truncation_strategy": "longest_first",
64
+ "unk_token": "[UNK]"
65
+ }
vocab.txt ADDED
The diff for this file is too large to render.