---
layout: markdown_page
title: "Monitor workflow - Resolve"
---

- TOC
{:toc}

# Resolve

This page describes the GitLab **Resolve** workflow vision as part of our [Monitor](https://about.gitlab.com/handbook/engineering/development/ops/monitor/) stage.

## Why Resolve?

Resolve is the process of restoring IT services following an incident that disrupted availability. This workflow follows Triage, in which the problem at hand was investigated and the root cause determined. Once the fix for the root cause has been determined, the solution must be verified in a local environment before it is deployed to production. Following release, the services must be monitored to ensure that they return to levels that meet SLOs.

## User Journey

### Change proposal

The root cause has been discovered and responders have determined a potential solution. The next step is to propose a set of changes for review with the intention of restoring impacted services. In this scenario, the responding team is typically under pressure and the proposed solution may not be a long-term solution. The goal is to restore services for stakeholders as quickly as possible and follow up the incident with a review where a long-term solution can be designed, discussed, and scheduled for implementation.

### Verify and deploy

The solution has been reviewed and approved. A responder implements the solution and tests it in their local environment before pushing to master and deploying to production. Depending on the progressiveness of the team, this process may be streamlined using CI/CD workflows.

### Monitor metrics

After release, it is important to monitor production metrics to ensure the solution was comprehensive and worked as intended. Alerts will often auto-resolve during this phase.

### External communication

Services have been restored and meet SLOs.
A member of the team, often the Incident Commander, will communicate with stakeholders via different channels (Status Page, social media platforms, internal email, etc.) to inform them that services are back up and available.

### Documentation

After an incident, it is important to document what happened and how it was fixed. Taking the time to document this information may help the team triage and resolve a similar incident much faster in the future.

## Today

### What's possible

We have not enabled the entire workflow detailed above. However, we do have a couple of features you can take advantage of today to simplify your Resolve processes:

* [Auto DevOps CI/CD pipelines](https://docs.gitlab.com/ee/topics/autodevops): Auto DevOps provides pre-defined CI/CD configuration which allows you to automatically verify and deploy the fix. This can make a difference when the response team is under pressure.
* [Monitoring Environments](https://docs.gitlab.com/ee/ci/environments.html#monitoring-environments): All deployments to an environment are shown directly on the monitoring dashboard, which allows easy correlation between any changes in performance and new versions of the app, all without leaving GitLab.

### Maturity

This workflow is currently at the **Planned** stage. Workflows in the Operations section are graded on the same [maturity scale](https://about.gitlab.com/direction/maturity/) as categories.

## What's next

We plan to provide a **Resolve** experience that allows our users to efficiently restore services, whether that means deploying a patch to application code or running a script to unclog ETL pipelines. Work supporting this workflow is captured in this [epic](https://gitlab.com/groups/gitlab-org/-/epics/1972).