Senior Distributed Systems Engineer, AI Infrastructure

NVIDIA
Shanghai
Full-time
2 days ago

NVIDIA is hiring a senior distributed systems engineer to architect, lead, and develop our exa-scale AI infrastructure and deep learning platform for Autonomous Vehicles. You will need strong programming skills and a deep understanding of cloud technologies, distributed storage and compute systems, and distributed systems architecture. You will also need excellent communication and planning skills. Ideally, you have experience in securing distributed systems or a willingness to learn. Finally, you will need technical leadership skills in engineering. Together, we will build the exa-scale software 2.0 cloud platform for one of the most ambitious problems of our time: autonomous vehicles. Then we will apply it to other applications such as medical imaging, data science, genomics, and more.

What you'll be doing:

  • Architect and build scalable and distributed services that will help power the AI infrastructure for deep learning platforms.

  • Design and build infrastructure and microservices that help index, mine, transform, and compose PB sized deep learning datasets.

  • Design the next generation of dataset management services for real and synthetic / simulated datasets.

  • Leverage LLMs and AI agents to create AI assistants everywhere.

  • Collaborate with multiple AI teams to understand their requirements and build a future-proof platform that improves their productivity.

  • Be a technical leader on various projects across the platform, and be a major contributor to the entire platform’s architecture.

  • Support users of the platform.

What we need to see:

  • BS, MS, or PhD in Computer Architecture, Computer Science, Electrical Engineering or related field or equivalent experience.

  • 5+ years of work or research experience in distributed systems development and design.

  • Strong programming background that incorporates methodologies like data structures, design patterns, OOP, and test driven development.

  • Proven technical foundation in distributed computing and storage, including significant experience with most of the following: server systems, storage, I/O, networking, and systems software.

  • Hands-on experience in or willingness to learn about authentication and authorization as well as the related technologies such as OIDC, TLS, AWS IAM, role-based access control, attribute-based access control, Open Policy Agent.

  • Advanced programming skills to build distributed storage and compute systems, backend services, microservices, and web technologies.

  • A specialist programmer in Python, Go or C/C++.

  • Ability to switch effectively between long-term strategic and near-term tactical topics.

  • Highly motivated, with strong interpersonal skills and the ability to work successfully with multi-functional teams, principals, and architects, and to coordinate effectively across interpersonal boundaries and geographies.

  • A track record of successful technical leadership and large-scale architecture that impacted critical projects.

Ways to stand out from the crowd:

  • Experience building MLOps, Multi AI Agent Systems or AI/ML solutions on-premise or in the cloud.

  • Hands-on experience in or willingness to learn about security topics such as secure design, secure coding, data protection, zero trust networks, and incident response management.

  • Sophisticated programming expertise in Spark and Databricks for large‑scale data processing and analytics.

  • Experience with Kubernetes and Docker.

  • Open source contributions.

With highly competitive salaries and a comprehensive benefits package, NVIDIA is widely considered to be one of the technology industry's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working with us, and our engineering teams are growing fast in some of the hottest state-of-the-art fields: Deep Learning, Artificial Intelligence, and Autonomous Vehicles. If you're a creative computer scientist or engineer with a real passion for distributed systems and autonomous driving, we want to hear from you.

#deeplearning