Deep Learning Performance Architect Intern - 2025

NVIDIA
Shanghai
Full-time
Posted 4 weeks ago

We are looking for a first-class Deep Learning Performance Architect to join us and shape the performance analysis infrastructure for GPUs. We build cutting-edge analysis tools and visualization frameworks that empower engineers to optimize GPU performance for Deep Learning and HPC workloads, from pre-silicon architectural exploration through post-silicon validation and optimization. Your work will directly shape the tools that define how NVIDIA GPUs are analyzed, tuned, and scaled for next-gen AI systems, and will influence next-generation GPU architectures.

What you’ll be doing:

  • Architect Performance Tooling: Develop infrastructure tools/libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.

  • Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance performance debugging and profiling capabilities.

  • AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate performance optimization guidance, and improve the user experience of the profiling infrastructure.

  • Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to co-design performance-centric solutions.

  • End-to-End Optimization: Create benchmarks to validate performance improvements across AI/HPC workloads and present actionable insights.

What we need to see:

  • BS/MS+ in a relevant discipline (CS, EE, Math)

  • Proficiency in C/C++ (performance-critical coding) and Python (automation, scripting, and AI/ML frameworks)

  • Strong grasp of computer architecture (pipelines, memory hierarchies) and operating system fundamentals

  • Understanding of machine learning and data analysis basics, and of LLM techniques such as prompt engineering, fine-tuning, and vector databases

  • Experience with performance modeling, architecture simulation, profiling, and analysis.

  • Self-starter who thrives in dynamic environments and manages competing priorities effectively.

Ways to stand out from the crowd:

  • Experience developing HW performance debugging and analysis tools

  • Familiarity with the system software stack (such as the CUDA driver) and CUDA kernel optimization, and an understanding of GPU architecture

  • Familiarity with GPU performance profiling tools such as Nsight Systems and Nsight Compute

  • Practical experience or projects demonstrating LLM-based code generation, automated data analysis, or workflow assistants. Prior experience with agentic LLM frameworks such as LangChain and LlamaIndex.

  • Full-Stack Versatility: Skills in JavaScript, SQL, or UI/UX design for tool interfaces.
