This content was reproduced from the employer’s website on December 27, 2022. Please visit their website for the most up-to-date information about this position.
Your Impact on our Mission:
Zocdoc is looking for a Senior Site Reliability Engineer to help monitor, maintain, and improve our production systems. You’ll be challenged with ensuring uptime and ludicrous-speed response times for our patients and providers in a constantly changing environment. You’ll work with microservices and distributed systems in our AWS cloud-based environment. We’re looking for someone who loves challenging the status quo and strives to make everything they touch easier, faster, and more robust. This position can be based out of our NYC headquarters or be fully remote.
You’ll enjoy this role if you are…
- Passionate about ensuring complex systems never skip a beat
- Pragmatic in your day-to-day decision-making
- Motivated to learn new technologies, design patterns, and work in the cloud
- Comfortable with failures and outages, and a believer in blameless post-mortems
- Excited to work in a highly collaborative environment with diverse individuals
- Autonomous, individually accountable, and comfortable working in a remote environment
- A believer that diverse and inclusive teams and cultures are non-negotiable
Your day-to-day is…
- Monitoring and maintaining complex cloud-based infrastructure, systems, and services, and ensuring their uptime in order to enable millions of patients to get the care they need
- Supporting our large product engineering org with their scaling, performance, and uptime needs, as well as helping diagnose and debug production-related issues
- Automating and codifying our tooling, processes, and infrastructure to speed up development and make them repeatable and error-proof
- Analyzing and performance-tuning systems, code, and networking for scale and optimal operation
You’ll be successful in this role if you have…
- A Bachelor’s degree in Computer Science, Computer Engineering, or equivalent engineering experience
- 10+ years of total experience: 5+ years of either software development or systems administration, followed by 5+ years of supporting consumer-facing web application production environments and systems in a Site Reliability Engineering or Production Engineering role
- 2+ years of on-call experience in a 24/7 cloud-based production environment
- 2+ years of experience in managing and supporting modern cloud-based environments and infrastructure like AWS/Azure/GCP, Docker, Kubernetes, etc.
- Experience with edge technologies such as load balancers, reverse proxies, web application firewalls, routing, etc.
- Deep understanding of protocols such as TCP/IP, HTTP/HTTPS, TLS, DNS, NTP