Senior Site Reliability Engineer (Remote) #2861


  • Information Technology

GovCIO is a team of transformers: people who are passionate about transforming government I.T. We believe in making a difference by developing digital strategies and delivering technology-related innovation to governmental operations that improves the citizen experience every day.

But we can’t do it alone. We welcome and nurture an inclusive and diverse work culture, because different backgrounds, experiences, abilities, and perspectives make us better decision-makers, problem solvers, and creators. We’re changing the face of I.T. – from our diverse staff to the end-products we develop. And we’re excited to expand our team. Are you ready to be a transformer?


As a Senior Site Reliability Engineer, you will apply your senior-level application and product expertise to help build processes that manage and improve OIT’s response posture to system events impacting end users and Veterans. This includes working with business partners to improve communication and responsiveness to application failures, minimizing performance degradation and loss of availability, and working towards a significant reduction in application downtime and user impact. You will work with a team of junior and senior site reliability engineers and support an engineering team lead in producing the required deliverables.

Areas of support include:

  • Triage Major Incident Management (MIM) and Problem Management (PM) incidents by deconstructing application performance, interoperability, instrumentation, and human factors to facilitate resolution and development of resilient solutions.
  • Support coordination and ensure all High Priority Incidents (HPIs) and Critical Priority Incidents (CPIs) are triaged properly and routed to the appropriate groups for immediate resolution.
  • Perform enterprise root cause analysis (RCA) and identification in coordination with the appropriate OIT organizations.
  • Capture technical information from the relevant stakeholders and synthesize it into useful information in various formats for OIT senior management and other VA components.
  • Support the collection, development, and/or editing of content for white papers and other communication devices; and assess and evaluate the effectiveness of executive communication to effect process improvement.
  • Demonstrate proficiency with DevOps tools, JIRA, ServiceNow, and MS Project, and perform tasks using these tools.
  • Analyze incident record data, research trends, and digest findings into written recommendations and strategies for improving the posture of the VA’s information technology services, reducing both MTTR and incident occurrence frequency.
  • Provide case management and follow-through after incident resolution for root cause analysis, developing permanent fixes and preventative strategies to reduce MTTR and incident recurrence.
  • Digest and write technical recommendations for case management and trend analysis presentations.

Required Skills and Experience

  • Master’s degree preferred in Business Administration, Business Management, Computer Science, Information Systems, Information Resource Management, Industrial Engineering, Operations Research, or a related field.
  • 5+ years of relevant experience.
  • Certifications in relevant UX software plus 3-5 years of relevant experience.
  • 8 to 10 years of relevant experience may be substituted for education (13-15 years total).
  • Must be a technical expert across multiple technology areas, with the ability to diagnose complex issues spanning many technologies.
  • Must be able to identify and mitigate risks to the product.
  • Must be able to provide oral and written discussion of analytical findings using narrative and graphic forms.
  • Must be able to use qualitative and quantitative analytical skills to assess the effectiveness of the operations.
  • Must be able to identify symptoms that point to a need for process improvement.
  • Strong communication skills, including the ability to craft content for executive-level presentations.
  • IT background and ability to understand technical content.
  • Experience working with packet capture analysis using tools such as Wireshark or Netscout.
  • Experience with monitoring tools such as Splunk, AppDynamics, SolarWinds or Dynatrace.
  • ServiceNow experience is nice to have.
  • Understands the RCA process and can work across teams to guide the implementation of solutions to identified incident root causes.
  • Broad understanding of ITIL.



COVID Policy: New employees will be required to adhere to the Company’s and its clients’ COVID-19 safety procedures. In the event that the COVID-19 vaccination mandate for Federal Contractors is enforced, you must become fully vaccinated or request and be approved for an exemption. Employees working onsite at a client location must comply with our client’s COVID-19 requirements.


GovCIO is a team of professionals who want to make a difference. And that can only happen with a diverse, happy, and cared-for team. So we prioritize your well-being and equity for all, and look for ways to make work a better place for each of us every day.


We are an Equal Opportunity Employer. All qualified applicants receive consideration for employment without regard to race, ethnicity, religious affiliation, gender, gender identity or expression, sexual orientation, national origin, or disability status.

Compensation Range (in compliance with Colorado's Equal Pay for Equal Work Act, for remote positions or positions located in CO)

$145,000 - $180,000
