EPAM

Systems Engineer (Site Reliability) Singapore, Singapore

Description

Job #: 57731
Striving for excellence is in our DNA. Since 1993, we have been helping the world’s leading companies imagine, design, engineer, and deliver software and digital experiences that change the world. We are more than just specialists; we are experts.


Currently we are looking for a Systems Engineer (Site Reliability) to make the team even stronger.

Responsibilities

  • Work with our business units, development teams, and many other units to help maintain the high quality and service level objectives of our systems
  • Optimize the supportability of systems by applying basic SRE principles such as blameless post-mortems, error budgets, and automation
  • Provide production support for the application domain when applicable
  • Manage, monitor and operate the system to ensure all business functions are running smoothly
  • Work across teams to continually review systems, provide feedback, and implement best practices that improve efficiency and drive future innovation
  • Manage on-going changes while retaining high levels of service availability to our customer base
  • Pragmatically identify the root cause of production incidents and lead the implementation of actions needed to prevent recurrence
  • Drive incident management process and support a blameless post-mortem culture
  • Automate system operations to reduce toil and attain a high level of efficiency
  • Participate in platform operations management and capacity management
  • Coordinate and implement platform/infrastructure upgrades and releases with technical and business teams
  • React to critical issues immediately: troubleshoot, investigate, and apply appropriate solutions to restore normal system operations
  • Provide off-hours and weekend support to ensure production system stability
  • Troubleshoot problems across a wide range of technical areas (development, CI/CD, infrastructure, etc.)
  • Maintain awareness of relevant technical and product trends with self-learning and job shadowing
  • Create and maintain the operational documents to reflect system changes and upgrades
  • Communicate effectively, professionally and comfortably, both verbally and in writing across all levels


Requirements

  • Bachelor’s Degree/Diploma in Computer Science, Computer Engineering, or Computer Application. Equivalent experience may be considered
  • 3+ years of experience supporting critical applications using API-driven technologies
  • 2+ years of hands-on experience in Python development (preferably with RESTful APIs)
  • 2+ years of working with a modern stack (AWS, PCF, containers, or Kubernetes)
  • 1+ years of Continuous Integration and Continuous Delivery experience through Jenkins or equivalent
  • Experience with modern observability tools such as Grafana, Kibana, or Prometheus preferred
  • Experience working in an Agile (SAFe or Kanban) environment preferred
  • Knowledge and/or experience using SQL and Linux Shell scripting
  • Basic understanding of firewalls, load balancers, and networking concepts
  • Strong communication skills at all levels and team spirit are essential
  • Proactive, with good analytical and organizational skills
  • Ability to work independently, multi-task, prioritize, and deliver in a time-pressured environment

We offer

  • Friendly team and enjoyable working environment
  • Work-life balance and flexible schedule
  • Online training library, mentoring, career development, and a potential partial grant toward certification
  • Unlimited access to LinkedIn Learning solutions
  • Referral bonuses
  • Compensation for sick leave and paid time off
  • Opportunities for self-realization
