Sr. Site Reliability Support Engineer

Position Id: J0524-0031
Job Type: Contract-to-hire
Country: United States
Location: Jersey City, NJ
Pay Rate: Open
Contact Recruiter: (732) 876-7629

Job Description:

Trigyn’s financial services client has an immediate need for a Site Reliability Engineer in Jersey City. This is a long-term contract assignment that could become a “temp to perm” opportunity for the right candidate. Please see details below:

Location: Must be able to work a minimum of 2-3 days/week onsite in Jersey City, NJ.

Overview:
Our client is looking to augment their existing staff with additional Sr. Site Reliability Engineers (SREs) to help their internal team provide production support in a public cloud environment. In this role, you’ll work with and support cloud engineers to build the platform, pipeline, and monitoring systems, ensuring the application landscape is designed to take full advantage of the firm’s global cloud solution.

The ideal candidate will have strengths in:
• A strong understanding of SRE philosophy, technologies, platforms and tools, SLA management, incident resolution, and automation
• Mastery of application, data and infrastructure architecture disciplines
• An understanding of the basics of architecture, design, and business processes
• Expertise in working in partnership with colleagues throughout the firm, and in leading collaborative teams to achieve common goals
• Hands-on experience managing operations of large-scale, internet-centric production environments for application or infrastructure services serving enterprise environments (millions of end users)
• Prior experience with large-scale internet companies or technologies, where uptime and continuous availability were core to the business
• Identifying and partnering with Infrastructure teams and AD teams to implement automation opportunities that drive down toil and reduce technical debt
• Applying standards of cloud compliance to application design to achieve reliability
• An understanding of networking and cloud (AWS) technologies, for example security, load balancing, and network routing protocols

Responsibilities:
• Implements SRE frameworks to support global multi-cloud environments and ensures the highest SLA levels through operational excellence
• Provides failure analysis / root cause analysis when required
• Provides support to develop & improve the quality of technical engineering documentation
• Provides support to drive the maturity of the software development lifecycle
• Provides quality control of engineering deliverables
• Provides technical consultation to product management
• Performs deployment, administration, management, configuration, testing, and integration tasks related to the Blockchain (DLT) platforms in a cloud environment
• Helps to develop new cloud engineering strategies and implementations for the firm
• Champions a DevOps model so that services are automated and elastic across all platforms
• Helps coach and mentor less experienced team members
• Writes operational documentation and maintains a knowledge base of known issues with solutions
• Participates in 24x7 SRE on-call rotations and escalation workflows.

Required:
• Demonstrated experience as a Site Reliability Engineer
• Bachelor’s degree or equivalent experience in a software engineering discipline
• Expertise in designing, coding, testing, and delivering software in at least one technology stack
• Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex and mission-critical problems within a business or across the firm
• Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks)
• Excellent debugging and troubleshooting skills
• Proven experience as a software engineer, including proficiency in at least one systems programming language (Python/Go preferred)
• Understanding of key SRE concepts, such as Service Level Objectives (SLOs), Service Level Agreements (SLAs), and Service Level Indicators (SLIs)
• Understanding of observability in distributed systems
• Experience with Linux
• Experience with Kubernetes and AWS; ideally IAM and VPC networking, Prometheus, and Grafana

Desired:
• Terraform
• AWS SysOps / Solutions Architect certification
• Prometheus / PromQL
• Datadog
• Kubernetes CKA/CKAD certifications

NOTE: No Third Party or Independent Contractors permitted.

For an immediate response, call 732-876-7629 or send your resume to RecruiterOR@Trigyn.com

TRIGYN TECHNOLOGIES, INC. is an EQUAL OPPORTUNITY EMPLOYER and has been in business for 35 years. TRIGYN is an ISO 9001:2015, ISO 27001:2013 (ISMS), ISO 20000:2018, and CMMI Level 5 certified company.