Cloud Network Engineer

Microsoft
United States, Texas, Irving
7000 State Highway 161
Oct 10, 2025
Overview

The High Performance Computing and Artificial Intelligence (HPC and AI) team is focused on building the next-generation distributed artificial intelligence supercomputer. This effort supports breakthroughs in artificial intelligence by delivering unmatched computational power, scalability, and reliability. The team designs and develops advanced infrastructure to support high-performance model training at scale, laying the groundwork for innovations that expand the boundaries of what artificial intelligence can achieve.

We are seeking a Cloud Network Engineer who is passionate about designing and developing the infrastructure that powers large-scale artificial intelligence and high-performance computing systems. In this role, you will contribute to the design, deployment, and operation of network infrastructure, automation workflows, observability frameworks, and performance optimization systems. These systems are essential for achieving ultra-low latency, high throughput, and efficient performance at petabyte scale in distributed artificial intelligence workloads.

As a Cloud Network Engineer on the High Performance Computing and Artificial Intelligence Infrastructure team, you will work at the intersection of artificial intelligence supercomputing and large-scale networking. Your work will directly impact the reliability and performance of distributed clusters, using high-speed fabrics such as Ethernet and InfiniBand, and accelerated compute platforms including graphics processing units from NVIDIA and AMD. This is a unique opportunity to help build the network infrastructure that ensures speed, reliability, and availability at exascale levels. You will collaborate across hardware, infrastructure, and platform teams to deliver systems that support the future of artificial intelligence training and inference.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities

Network Deployment: Deploy high-throughput, low-latency physical network topologies (e.g., Clos, fat-tree) using technologies such as InfiniBand and Ethernet to support AI model training and HPC workloads.

Operational Readiness: Serve as a Designated Responsible Individual (DRI) for physical network systems: monitoring health, responding to incidents, performing root-cause analysis, and driving improvements in availability and observability.

Cross-Functional Collaboration: Partner with hardware engineering, data center operations, and software-defined networking teams to ensure seamless integration of physical and logical network layers.

Documentation & Standards: Own the documentation of physical network designs, cabling standards, and deployment procedures. Lead design reviews and ensure alignment with compliance and safety standards.

Innovation & Research: Stay current with advancements in optical networking, high-speed interconnects, and AI/HPC fabric technologies. Evaluate and integrate emerging solutions to improve scalability, efficiency, and performance.