Site Reliability Engineer (Remote, USA)

Job Summary:

American Trust seeks a motivated Site Reliability Engineer (SRE) who thrives in a rapidly evolving technology environment. As part of the SRE Team, you will foster a culture of SRE for horizontal activities and DevOps for products and tools across our operations teams. Your team will have diverse expertise in systems, networking, and software engineering to provide the stability, performance, and reliability our customers need. We work with multiple service and development teams to identify cross-team issues that create risk for operations across the organization, and we resolve those issues with a mixture of engineering, troubleshooting expertise, and general operational guidance.

Responsibilities:

Solve complex problems related to infrastructure services and build automation to prevent problem recurrence.
Ensure that the underlying infrastructure runs smoothly, and that systems and tools work as expected.
Monitor critical applications and services to minimize downtime and ensure their availability.
Design, write, and deploy software to improve the availability, scalability, and efficiency of products and services used by American Trust.
Work with the Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and technology areas.
Partner with development teams in defining and implementing improvements in service architecture.
Act as the final escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs).
Utilize a deep understanding of service topology and the dependencies required to troubleshoot issues and define mitigations.

Qualifications:

3+ years of experience in software engineering
M.S. or B.S. in Computer Science, Computer Engineering, Software Engineering, or a related area is preferred.
Extensive knowledge of SQL (MSSQL, Postgres or Oracle)
Experience building backend products
Comfortable coding in Java, JavaScript, and .NET
Experience in system administration, including Linux and Windows internals, TCP/IP, DNS, and load-balancing technologies
Cloud network experience
Understanding of the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.
Self-motivated, independent, creative problem solver.
Lifelong learner, striving for continuous growth and development
Excellent organizational, verbal, and written communication skills

Preferred Skills:

Experience in a 24×7 high-availability production environment
Experience with VDI and Remote Administration Services
Knowledge of cloud computing technologies and network monitoring
Database knowledge of ATP and ADW, and programming experience in SQL and PL/SQL
Experience in developing scripts to automate software deployments and installations
Clear understanding of automation and orchestration principles.

Salary: $100,000 – $110,000 base salary, based on experience, plus eligibility to participate in the company’s bonus program.