
Lead Research Software Engineer

Oak Ridge National Laboratory
United States, Tennessee, Oak Ridge
1 Bethel Valley Road (Show on map)
Apr 21, 2026

Requisition Id 16230

Overview:

The National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL), which hosts several of the world's most powerful computer systems, is seeking highly qualified individuals to play a key role in designing, developing, and deploying data management tools and persistent services that support scientific and AI/ML campaigns running on NCCS computing infrastructure, including the world's first exaflop system, Frontier.

The Team:

As a Lead Research Software Engineer (RSE) in the Data and Platform Services (DAPS) group, you will work within the HPC Operations Section. The DAPS group designs and operates data management platforms, tools, and services for the end-to-end data lifecycle from ingestion to publication and supports several large initiatives and facilities at ORNL. Our primary development and deployment platform is the Oak Ridge Leadership Computing Facility (OLCF) Slate Service, built on Kubernetes and Rancher, which provides a container orchestration service for running critical operation applications and user-managed persistent applications that run alongside our OLCF supercomputer systems and other OLCF managed HPC clusters.

The Role:

As a Lead Research Software Engineer, you will design, implement, operate, and maintain federated data platforms, data management portals, data processing pipelines, API gateways, and persistent services for the entire data lifecycle on our on-premises Kubernetes clusters, with a strong focus on innovation, scalability, reliability, and maintainability. You will also assist with AI initiatives at OLCF, evaluate and integrate key data engineering and MLOps technologies, be an individual contributor for medium-sized projects, and collaborate with Platform engineers in delivering a robust set of production services for OLCF users. This role requires significant experience with data management platforms and tools, significant experience with full stack application and API development, and working knowledge of Kubernetes and containerization. Knowledge of current AI/ML tools and workflows is preferred but not required.

Major Duties/Responsibilities:

Application Development and Deployment

  • Identify and evaluate solutions for federated data management (e.g., Pelican, Rucio), data catalogs (e.g., CKAN, DKAN, Schema.org), streaming data (e.g., Kafka), and data movement (e.g., XRootD, Globus, S3).
  • Design and implement web portals and API services for data management using a combination of modern web technologies.
  • Develop, implement, and maintain Kubernetes deployment recipes for data portals, catalogs, API gateways, and other ancillary services like key-value stores and databases.
  • Design and develop solutions for MLOps, including model lifecycle management and storage, as well as integration with existing platforms like MLflow.
  • Stay up to date on both open source and commercial platforms and tools being developed for end-to-end data and ML lifecycle management.

Collaboration

  • Assist the Group Leader with developing platform and software design documents for new projects and lead implementation efforts with other Software Engineers in the group.
  • Partner closely with internal platforms, cybersecurity, and account management teams to ensure the platform meets security, compliance, role-based access controls, and usability expectations.
  • Participate in cross-functional projects related to platform enhancements and cluster lifecycle automation.
  • Represent the DAPS team with internal collaborators and partners across the lab.

Basic Qualifications:

  • BS degree and 5+ years of relevant experience, or equivalent experience.
  • At least three years of experience with data management platform and tools development.
  • At least three years of experience with full stack application and API development.
  • Experience with CI/CD tooling, GitOps, and Kubernetes.
  • Experience with code review and familiarity with tools like Git, GitHub, and GitLab.


Preferred Qualifications:

  • M.S. or Ph.D. in a technical field.
  • Excellent interpersonal and communication skills, and the ability to work as part of a team.
  • Experience with modern MLOps, data engineering, and LLM technologies.
  • Experience with PHP, Python, and modern JavaScript frameworks and runtimes (React, AngularJS, Node.js).
  • Experience designing and implementing highly available systems/services.
  • 8+ years of experience in addition to the degree.
  • Experience with modern software practices such as test-driven development and Agile software development, and a firm, proven knowledge of software development lifecycles.
  • Demonstrated activity within the broader open-source software community.

This position will remain open for a minimum of 5 days, after which it will close when a qualified candidate is identified and/or hired.

We accept Word (.doc, .docx), Adobe (unsecured .pdf), Rich Text Format (.rtf), and HTML (.htm, .html) up to 5MB in size. Resumes from third party vendors will not be accepted; these resumes will be deleted and the candidates submitted will not be considered for employment.

If you have trouble applying for a position, please email ORNLRecruiting@ornl.gov.

ORNL is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. UT-Battelle is an E-Verify employer.
