AIOps / DevOps / SRE Software Engineer: Engineering Infra - Marketplace SRE
Shopee in Singapore, Singapore
Information Technologies
Full-Time
The Engineering and Technology team is at the core of the Shopee platform development. The team is made up of a group of passionate engineers from all over the world, striving to build the best systems with the most suitable technologies. Our engineers do not merely solve problems at hand; we build foundations for a long-lasting future. We don't limit ourselves on what we can or can't do; we take matters into our own hands even if it means drilling down to the bottom layer of the computing platform. Shopee's hyper-growing business scale has transformed most "innocent" problems into huge technical challenges, and there is no better place to experience it first-hand if you love technologies as much as we do.
About the Team:
Shopee will be prioritizing applicants who have a current right to work in Singapore, and do not require Shopee sponsorship of a visa.
Kindly note that you can only be considered in one recruitment process at a time within Sea Group and will be considered for jobs in the order that you have applied.
As a Marketplace SRE (Site Reliability Engineer) at Shopee, you will manage the technical operations of our e-commerce platform, encompassing core marketplace businesses, engineering infrastructure, and middleware. This includes product lines such as order and checkout, promotion, user and accounts, seller and listing, games, customer service, mobile apps, PC/mobile web, and Shopee tech services. You will help build and maintain large-scale distributed systems that are robust, scalable, and cost-efficient, ensuring maximum system availability and performance.
The role blends software engineering and system operations expertise, requiring you to design, implement, and maintain full-stack intelligent solutions for operational challenges. You will work closely with Shopee’s development and business teams, diving deep into system architecture and operations cycles to drive innovation and scalability. Additionally, the team is engaged in AI-related initiatives in areas such as incident management and troubleshooting, capacity management and projections, system anomaly detection, chat operations and support, and knowledge management, further enhancing operational efficiency.
What We Offer:
- A fun and energetic team culture emphasizing learning, sharing, and personal growth.
- A clear learning roadmap tailored for new hires.
- Wide exposure to different projects, enabling rapid personal and career development.
Job Description:
- Design and implement systematic, intelligent solutions to optimize operations for millions of users globally, ensuring the seamless operation of thousands of production systems.
- Develop and maintain more than 10 full-stack automation platforms to improve operational efficiency and reliability.
- Set up, manage, and maintain Shopee services and middleware, ensuring their reliability during campaigns and year-round operations.
- Deep dive into Shopee’s core product lines and contribute to their scalability, availability, and performance.
- Balance time between software engineering (50%) and technical operations (50%).
- Build and enhance the SRE ecosystem to optimize system performance and minimize costs.
- Receive mentorship through structured learning programs for both fresh graduates and experienced hires.
Requirements:
- Bachelor’s or higher degree in Computer Science, Computer Engineering, Information Systems, or related fields.
- Strong fundamentals in computer science, including data structures, algorithms, operating systems, computer networking/security, virtualization, and containerization.
- Solid software engineering and application architecture skills, including backend/frontend development, architecture design patterns, and middleware (e.g., caches, databases, queues, file storage).
- Individual traits: fast learning ability, strong team player, analytical and problem-solving skills, adaptability in a dynamic environment, passion for the role, and strong sense of ownership.
Preferred Skills (Optional):
- Experience with DevOps concepts and tools.
- Familiarity with Site Reliability Engineering principles and practices.
- Hands-on experience with automation tools such as Ansible.
- Proficiency with monitoring tools such as Prometheus and Grafana.
- Knowledge of load balancing tools such as LVS, Nginx, OpenResty, and HAProxy.
- Experience with container technologies (e.g., Docker, Kubernetes).
- Skills in load testing, capacity management, and campaign preparation.
- Experience with AI-driven tools and frameworks to enhance operational efficiency.
Join us in building an efficient, scalable e-commerce infrastructure that enhances user experience across the globe!