Senior Performance Software Engineer, Deep Learning Libraries

NVIDIA
Shanghai
Full-time
3 weeks ago

We are now looking for a Senior Performance Software Engineer for Deep Learning Libraries! Do you enjoy tuning parallel algorithms and analyzing their performance? If so, we want to hear from you! As a deep learning library performance software engineer, you will be developing optimized code to accelerate linear algebra and deep learning operations on NVIDIA GPUs. The team delivers high-performance code to NVIDIA’s cuDNN, cuBLAS, and TensorRT libraries to accelerate deep learning models. The team is proud to play an integral part in enabling breakthroughs in domains such as image classification, speech recognition, and natural language processing. Join the team that is building the underlying software used across the world to power the revolution in artificial intelligence! We’re always striving for peak GPU efficiency on current and future-generation GPUs. To get a sense of the code we write, check out our CUTLASS open-source project showcasing performant matrix multiply on NVIDIA’s Tensor Cores with CUDA. This specific position primarily deals with code lower in the deep learning software stack, right down to the GPU hardware.

What you'll be doing:

  • Writing highly tuned compute kernels, mostly in C++ CUDA, to perform core deep learning operations (e.g. matrix multiplies, convolutions, normalizations)

  • Following general software engineering best practices including support for regression testing and CI/CD flows

  • Collaborating with teams across NVIDIA:

    • CUDA compiler team on generating optimal assembly code

    • Deep learning training and inference performance teams on which layers require optimization

    • Hardware and architecture teams on the programming model for new deep learning hardware features

What we need to see:

  • Master's or PhD degree, or equivalent experience, in Computer Science, Computer Engineering, Applied Math, or a related field

  • 2+ years of relevant industry experience

  • Demonstrated strong C++ programming and software design skills, including debugging, performance analysis, and test design

  • Experience with performance-oriented parallel programming, even if it’s not on GPUs (e.g. with OpenMP or pthreads)

  • Solid understanding of computer architecture and some experience with assembly programming

Ways to stand out from the crowd:

  • Tuning BLAS or deep learning library kernel code

  • CUDA/OpenCL GPU programming

  • Numerical methods and linear algebra

  • LLVM, TVM tensor expressions, or TensorFlow MLIR
