Senior Site Reliability Engineer
Santa Clara, CA 95054
Senior Site Reliability Engineer (SRE)
This is a visible position for a seasoned SRE building next-generation cloud infrastructure, with a specific focus on enterprise workloads in a high-volume, fast-scaling cloud environment.
As a Sr. Site Reliability Engineer, your primary role will be working on a hyper-scale cloud (IaaS, PaaS, or SaaS) platform in a multi-cloud software development and DevOps environment.
Code in Python/Ansible in a production environment.
Key Responsibilities Are:
Code in Python/Ansible; troubleshoot, debug, and support the production environment (VMware).
Serve as the go-to person for architecting and delivering all Operations/SRE services and processes: getting your hands dirty, troubleshooting infrastructure, architecting data centers, and coding 30-40% of the time on different stacks on top of VMware Cloud to run the production environment.
Continuously analyze the current Site Reliability capabilities and identify areas for improvement.
Identify, define, and implement new tools and technologies for improving the quality and efficiency of the distributed cloud platform.
Manage automated infrastructure deployments and the ongoing operation and monitoring of cloud infrastructure, working closely with the development teams.
Drive the reliability and supportability aspects of the cloud service, including change management, triage of customer escalations, remediation plans, playbooks, and automation.
Qualifications You Must Have:
Hands-on coding in Python and Ansible.
10 or more years of experience in data center operations and the VMware Cloud Stack.
A minimum of 3 years of experience in a Senior SRE/DevOps role at a cloud service company.
Experience working in a hyper-scale multi-cloud (Azure, AWS, GCP) environment or at a SaaS/PaaS company.
Prior experience as a cloud-native and microservices software developer in a DevOps function for continuous integration and delivery (CI/CD).
Prior experience designing, deploying and managing VMware SDDC platforms.
Expertise in cloud technologies (Containers, Docker, Kubernetes, Elastic, Logstash, Kibana, Kafka, Consul, Cassandra) is a MUST.
Experience in cloud provisioning code development and tools (Azure Management API, GCP API, Terraform).
Experience with virtualization technologies, in particular the VMware product suite (vSphere, vSAN, NSX, vROps).
Deep understanding of data center networking, including Software Defined Networking (SDN) and network architecture of Azure/AWS/GCP.
Experience operating large-scale Linux production environments at an online service provider.
Deep understanding of monitoring and support platforms, such as Prometheus, Kibana, Grafana, Zendesk, and PagerDuty.
Our client is a substantially funded startup developing a next-generation multi-cloud infrastructure service to deliver dramatically better efficiency and control for enterprise applications.