Deep Learning Performance Architect Intern - 2025

NVIDIA
Shanghai
Full-time
Posted 2 weeks ago

We are looking for a first-class Deep Learning Performance Architect to join us and shape the performance analysis infrastructure for GPUs. We build cutting-edge analysis tools and visualization frameworks that empower engineers to optimize GPU performance for Deep Learning and HPC workloads, spanning pre-silicon architectural exploration to post-silicon validation and optimization. Your work will directly shape the tools that define how NVIDIA GPUs are analyzed, tuned, and scaled for next-gen AI systems, and will influence next-gen GPU architectures.

What you’ll be doing:

  • Architect Performance Tooling: Develop infrastructure tools/libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.

  • Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance perf debug and profiling capabilities.

  • AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate perf optimization guidance, and improve the user experience of the profiling infrastructure.

  • Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to co-design performance-centric solutions.

  • End-to-End Optimization: Create benchmarks to validate performance improvements across AI/HPC workloads and present actionable insights.

What we need to see:

  • BS/MS+ in relevant discipline (CS, EE, Math)

  • Proficiency in C/C++ (performance-critical coding) and Python (automation/scripting, and AI/ML frameworks)

  • Strong grasp of computer architecture (pipelines, memory hierarchies) and Operating System fundamentals

  • Understanding of machine learning and data analysis basics, and of LLM techniques such as prompt engineering, fine-tuning, and vector databases

  • Experience with performance modeling, architecture simulation, profiling, and analysis.

  • Self-starter who thrives in dynamic environments and manages competing priorities effectively.

Ways to stand out from the crowd:

  • Experience developing HW performance debugging and analysis tools

  • Familiarity with the system software stack (such as the CUDA driver) and CUDA kernel optimization, and an understanding of GPU architecture

  • Familiarity with GPU performance profiling tools such as Nsight Systems and Nsight Compute

  • Practical experience or projects demonstrating LLM-based code generation, automated data analysis, or workflow assistants. Prior experience with agentic LLM frameworks such as LangChain and LlamaIndex.

  • Full-Stack Versatility: Skills in JavaScript, SQL, or UI/UX design for tool interfaces.
