Company Description
AGSI was incorporated in April 2016. We are committed to supporting the goals of Arch divisions through exceptional service delivery. We pride ourselves on maintaining flexibility and responsiveness to adapt to business unit and industry demands while focusing on sound project management. We are dedicated to growing and developing our employees as we build strong teams with strategic leadership.
Job Description
The DevOps Engineer partners closely with architecture and development teams to operate and scale AWS-based, cloud-native, highly available, and fault-tolerant systems. This role ensures that internally and externally facing applications consistently meet or exceed ambitious SLAs (Service Level Agreements) while driving continuous improvement in reliability, performance, and efficiency.
The DevOps Engineer is responsible for managing day-to-day operations of AWS infrastructure, including automated deployments, observability, incident response, and ongoing optimization of systems and services. Leveraging established and emerging AWS tools, the engineer will contribute to a secure and resilient cloud environment, and will achieve all of the above while keeping costs optimized.
Job Responsibilities:
- Infrastructure Design & Management – Architect, provision, and maintain scalable, secure, and highly available cloud infrastructure.
- CI/CD Pipeline Development – Build, optimize, and maintain automated build, test, and deployment pipelines to accelerate delivery.
- Monitoring & Observability – Implement and manage monitoring, logging, and alerting systems to ensure system health and performance.
- Configuration Management & Automation – Leverage tools (e.g., CloudFormation or similar) to automate infrastructure and application management.
- Security & Compliance – Enforce security best practices (IAM, secrets management, patching, vulnerability scanning) and ensure compliance with standards.
- Incident Response & Troubleshooting – Lead root cause analysis and resolution of production issues, ensuring minimal downtime.
- Cost Optimization – Monitor and optimize cloud resource usage to balance performance and cost efficiency.
- Collaboration with Development Teams – Work closely with developers to integrate DevOps practices, improve workflows, and provide infrastructure guidance.
- Scalability & Reliability Engineering – Design and implement systems that can handle growth while maintaining resilience and performance.
- Mentorship & Best Practices – Guide engineers, advocate for DevOps culture, and define standards for automation, deployment, and operations.
Qualifications
Required Skills:
- Deep expertise in AWS cloud infrastructure operations, including core services (EC2, S3, RDS, Lambda, ECS/EKS, CloudFormation/Terraform, IAM, VPC, CloudWatch, etc.), with a strong understanding of high availability, fault tolerance, and cost optimization.
- Proven experience as a strategic problem solver with advanced analytical and decision-making skills, able to balance speed, reliability, and security in critical production environments.
- Extensive experience supporting mission-critical production systems, including incident management, root cause analysis, and post-mortem processes.
- Strong metrics-driven mindset with the ability to establish and track SLAs for infrastructure reliability and performance.
- Hands-on expertise in infrastructure automation and configuration management (Terraform, CloudFormation, CDK, Ansible), container orchestration (EKS), and CI/CD pipelines.
- Advanced proficiency in scripting and automation (Python, Bash, PowerShell), enabling efficiency in operations and systems management.
- Strong background in observability, with experience implementing logging, monitoring, and alerting solutions using AWS-native services (CloudWatch, X-Ray, GuardDuty) and third-party tools (Datadog, Prometheus, Grafana, ELK, etc.).
- Solid understanding of DevSecOps practices, including secrets management, vulnerability scanning, compliance frameworks (SOC 2, HIPAA, GDPR), and IAM governance.
- Experience with multi-account and multi-region AWS environments, applying best practices for networking, security, and resource management.
Additional Information
Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field. 5 to 10 years of equivalent work experience across infrastructure and software engineering roles.