Site Reliability Engineer for Observability Services

SAP
Shanghai
1 day ago
We help the world run better
At SAP, we keep it simple: you bring your best to us, and we'll bring out the best in you. We're builders touching over 20 industries and 80% of global commerce, and we need your unique talents to help shape what's next. The work is challenging – but it matters. You'll find a place where you can be yourself, prioritize your wellbeing, and truly belong. What's in it for you? Constant learning, skill growth, great benefits, and a team that wants you to grow and succeed.


We are building a new next-generation SRE team in our state-of-the-art offices in Budapest, Shanghai, and Montreal to evolve our cloud platform capabilities and shape the industry. Site Reliability Engineering, as part of the development organization, provides 24x7 deep technical coverage for Incident Management (outages and other incidents with major customer impact) and builds automation for large-scale systems. We share a Live Site First culture and care about the business continuity of our customers running mission-critical applications in the cloud.

We are looking for motivated Site Reliability Engineers to take an active role in driving DevOps and reliability topics within the organization.


EXPECTATIONS AND TASKS


As a Site Reliability Engineer, you will have the opportunity to operate and support business-critical cloud services. As part of your daily job, you will proactively monitor service behavior and identify areas for improvement. You will participate in the development of tools for monitoring and troubleshooting cloud services built on the latest open source and SAP technologies, following SRE principles.


What you will do

  • Act as technical expert during Live Site incidents (downtimes of supported services in scope), investigate and solve incidents on a deep technical level.
  • Drive root cause analysis and follow-up improvements to prevent issues from reoccurring.
  • Perform in-depth troubleshooting and log analysis to identify and solve complex issues in accordance with internal and external SLAs.
  • Build software-based solutions to address improvements in service stability and reliability.
  • Enhance infrastructure and platform monitoring by gathering system metrics (the four golden signals: latency, traffic, errors, and saturation) and implementing tools for recovery.
  • Integrate and collaborate closely with development teams and work with them on outputs from Postmortems and product improvements.
  • Learn new technologies and keep up to date with latest development increments.
  • Define, advocate for, and apply SRE best practices.
  • Participate in the on-call rotation (follow-the-sun approach) to react to major incidents. On-call duty comes with a special compensation package.


If you are interested in software engineering based on cutting-edge technology, you will find an inspiring and professional environment for your learning and growth. You will work in close collaboration with the development teams that build the services for which you are responsible. We emphasize teamwork and a trust-based working model. Collaboration with other teams in an international environment will be a regular part of your work.


EDUCATION AND QUALIFICATIONS / SKILLS AND COMPETENCIES


  • Bachelor's degree in computer science or engineering or equivalent combination of education and experience
  • Good understanding of modern cloud architectures (experience with cloud platforms such as AWS, Azure, or GCP is a plus)
  • Enthusiasm for automation - make the computers do the work for you
  • Ability to work efficiently in emergency situations and to quickly analyze and solve problems in a worldwide team setup
  • Excellent team player, passionate about their work, self-motivated and driven
  • Excellent communication skills - precise, based on facts
  • Fluency in English


Professional experience in at least one of the following areas and good knowledge of the rest

  • Experience with one programming language (e.g. Java, C#, Python, Go) and troubleshooting
  • Scripting and automation
  • Experience with Unix/Linux operating systems and good understanding of Linux internals
  • Experience with modern monitoring, logging and alerting tools (Dynatrace, Grafana, Kibana)
  • Database (PostgreSQL) administration and support
  • Security best practices for application development and operations in a cloud environment


Experience with any of the following is considered an advantage

  • Network architecture, e.g., TCP/IP, MAC addresses, IP packets, DNS, OSI layers and load balancing
  • REST APIs
  • Cloud and container technologies such as Cloud Foundry, Kubernetes, Docker
  • Git, GitHub, Maven, Jenkins, Gradle

We win with inclusion

SAP’s culture of inclusion, focus on health and well-being, and flexible working models help ensure that everyone – regardless of background – feels included and can run at their best. At SAP, we believe we are made stronger by the unique capabilities and qualities that each person brings to our company, and we invest in our employees to inspire confidence and help everyone realize their full potential. We ultimately believe in unleashing all talent and creating a better world.

SAP is committed to the values of Equal Employment Opportunity and provides accessibility accommodations to applicants with physical and/or mental disabilities. If you are interested in applying for employment with SAP and are in need of accommodation or special assistance to navigate our website or to complete your application, please send an e-mail with your request to Recruiting Operations Team: Careers@sap.com. Requests for reasonable accommodation will be considered on a case-by-case basis.

For SAP employees: Only permanent roles are eligible for the SAP Employee Referral Program, according to the eligibility rules set in the SAP Referral Policy. Specific conditions may apply for roles in Vocational Training.

Successful candidates might be required to undergo a background verification with an external vendor.

AI Usage in the Recruitment Process
For information on the responsible use of AI in our recruitment process, please refer to our Guidelines for Ethical Usage of AI in the Recruiting Process.

Please note that any violation of these guidelines may result in disqualification from the hiring process.
