Senior Software Engineer

Microsoft
United States, Texas, Irving
7000 State Highway 161
Oct 29, 2025
Overview

The HPC/AI (High-Performance Computing and Artificial Intelligence) team is driving the creation of next-generation distributed AI supercomputers that deliver unmatched computational power, scalability, and reliability to enable breakthroughs in artificial intelligence. We design and build advanced infrastructure for large-scale AI model training, setting the stage for innovations that redefine what AI can achieve.

We're seeking Senior Software Engineers who are passionate about high-performance systems and eager to tackle complex challenges in backend network design, RDMA-based communication libraries, and new network transport protocol development. In this role, you'll develop networking solutions that ensure high throughput, ultra-low latency, and minimal jitter for distributed AI workloads, which is critical for enabling state-of-the-art AI systems to reach their full potential.

Specifically, you'll develop next-generation network transport protocols and build RDMA-based communication libraries that meet these latency and throughput goals. You'll collaborate across diverse network architectures, processors, and accelerator technologies to deliver end-to-end solutions with a relentless focus on performance, scalability, and observability.

Generative AI and large-scale distributed systems are transforming technology. As a Senior Software Engineer on our team, you'll work at the intersection of AI and high-performance computing, shaping the networking backbone that powers Azure's AI supercomputing platform. This is your chance to influence the future of AI infrastructure and make an impact at global scale.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. We embrace a growth mindset, innovate to empower others, and collaborate to achieve shared goals. Our values of respect, integrity, and accountability guide us as we build a culture of inclusion where everyone can thrive.
Responsibilities

- Design, develop, and optimize networking solutions tailored for large-scale AI training infrastructure.
- Benchmark, analyze, and enhance the scalability and reliability of networking systems to handle petabyte-scale data transfer.
- Debug and resolve complex networking issues in large-scale, high-performance environments.
- Drive the identification of dependencies and the development of design documents for a product, application, service, or platform.
- Create, implement, optimize, debug, refactor, and reuse code to improve performance, maintainability, effectiveness, and return on investment (ROI).
- Proactively seek new knowledge and adapt to new AI trends, technical solutions, and patterns that improve availability, reliability, efficiency, observability, and performance.