---
layout: markdown_page
title: "Product Direction - Monitor"
---

## On this page
{:.no_toc}

- TOC
{:toc}

This is the product direction for Monitor. If you'd like to discuss this direction directly with the product managers for Monitor, feel free to reach out to Dov Hershkovitch ([GitLab](https://gitlab.com/dhershkovitch), [Email](mailto:dhershkovitch@gitlab.com)), Sarah Waldner ([GitLab](https://gitlab.com/sarahwaldner), [Email](mailto:swaldner@gitlab.com), [Zoom call](https://calendly.com/swaldner-gitlab/30min)), or Kevin Chu ([GitLab](https://gitlab.com/kbychu), [Email](mailto:kchu@gitlab.com), [Zoom call](https://calendly.com/kchu-gitlab/30min)).

## Landscape

The monitoring and management market, also branded as observability, is well-established and crowded. It is also fast-changing in terms of the technologies used and users' expectations. For instance, the trend of moving infrastructure to the public cloud introduced a new category of technologies to monitor that traditional vendors did not address, and the SaaS delivery model disrupted the on-prem delivery model of many existing vendors. More recently, the transition from virtualization to container-based technologies caused another wave of adjustment to what it means to monitor or observe, further challenging existing vendors. These and other market trends allow new entrants (like [Sentry](https://sentry.io/welcome/) and current market leader [Datadog](https://www.datadoghq.com/)) to quickly capture mindshare and eclipse existing vendors.

## Vision Statement

In 2 years' time, GitLab aims to make observability a commodity by being ubiquitous, complete, cost effective, and simple to set up and operate for any cloud-native team, enabling them to continuously improve. By using GitLab, teams can reduce the frequency and severity of issues in production.

GitLab, at this particular time, is uniquely qualified to deliver on this bold and [ambitious](/handbook/product/#how-this-impacts-planning) vision because:

1. GitLab is a complete DevOps tool that is connected across the DevOps stages. Being one tool makes the circular DevOps workflow, and particularly the observability feedback loop, achievable.
1. GitLab's mission is that everyone can contribute. We believe that having observability available as a commodity furthers that mission. As such, we will take a differentiated path from other vendors and not pursue a usage-based model where processing and storing telemetry is part of the business model. Rather, we are leaning, and will continue to lean, heavily on powerful open source software and commodity cloud services to enable customers to set up and operate a GitLab observability solution that they are in control of.
1. Going cloud-native is a disruption to operations as usual. Cloud-native systems are constantly changing, ephemeral, and complex. As more and more companies adopt cloud-native, GitLab can build on the tools cloud-native teams are already familiar with and using to create a well-integrated central control plane that enables broad adoption of observability.

A trade-off in our approach is that we are explicitly not striving to be a fully turn-key experience that can be used to monitor all applications, particularly legacy applications. Wholesale removal of an existing monitoring solution is painful, and a land-and-expand strategy is prudent here. As a customer recently explained, "Every greenfield application that we can deploy with your monitoring tools saves us money on New Relic licenses."
As this stage matures, we will begin to shift our attention and compete more directly with incumbent players as a holistic monitoring solution for modern applications.

## 3 Year Strategy

Dovetailing with our 2 year vision statement, our 3 year goal is to have built an integrated package of observability and operations tools that can displace today's front-runner in modern observability, Datadog, and compete in all Monitor categories. We'll do that by focusing on the four core workflows of Instrument, Triage, Resolve, and Improve. The following links describe our strategy for each individual workflow:

* [Instrument](https://gitlab.com/groups/gitlab-org/-/epics/1945) - Auto-detected and in-app instrumentation, in-code SLO definition, and visual alert threshold setting
* [Triage](https://gitlab.com/groups/gitlab-org/-/epics/1947) - Starting with the highest level alert, using preconfigured dashboards to review relevant metrics, enabling ad-hoc visualization and immediate drill down from time-sliced metrics into logs and traces in the same screen
* [Resolve](https://gitlab.com/groups/gitlab-org/-/epics/1972) - Access to automation and documentation for known remediations, integration of collaboration and user response tools
* [Improve](https://gitlab.com/groups/gitlab-org/-/epics/1973) - Automated incident review creation that compiles recorded information from the incident, then tracking the code, infrastructure, and observability improvements created by the incident. Track business, performance, and availability metrics over time.

## Overview

The Monitor stage comes after you've configured your production infrastructure and deployed your application to it. As part of the verification and release process you've done some performance validation - but you need to ensure your service(s) maintain the expected service-level objectives ([SLO](https://en.wikipedia.org/wiki/Service-level_objective)s) for your users. GitLab's Monitor stage product offering makes instrumentation of your service easy, giving you the right tools to prevent, respond to, and restore SLO degradation.

Current DevOps teams either lack exposure to operational tools or utilize ones that put them in a reactive position when complex systems fail inexplicably. Our mission is to empower your DevOps teams by finding operational issues before they hit production and enabling them to respond like pros by leveraging default SLOs and responses they proactively instrumented. GitLab Monitoring allows you to successfully complete the DevOps loop, not just for the features in your product, but for its performance and user experience as well.

Using GitLab observability solutions, users will have an easy way to gain a holistic understanding of the state of production services across multiple groups and projects. When you are deploying a suite of services, it's critical that you can drill into each individual service's SLO attainment as well as troubleshoot issues which span multiple services.

We track epics for all the major deliverables associated with the north stars and category maturity levels. You can view them on our [Monitor Roadmap](https://gitlab.com/groups/gitlab-org/-/roadmap?scope=all&utf8=✓&state=opened&label_name[]=devops%3A%3Amonitor).

### What's Observability?

The terms monitoring and observability are at times used interchangeably and can cause some confusion. Note - Yes, we're also guilty of this and actively improving it. If you see room for improvement, please feel free to make a contribution!
[Observability](https://en.wikipedia.org/wiki/Observability) is the ability to infer the internal states of a system based on the system's external outputs. Monitoring, on the other hand, is the activity of observing the state of a system over time. To achieve observability, your system's various telemetry types should all be available to enable proactive introspection and greater operational visibility. The overarching goal of GitLab's Monitor categories is to help improve the observability of your applications and systems.

If you are interested in more information on this topic, [Charity Majors](https://twitter.com/mipsytipsy), CTO of [HoneyComb](https://www.honeycomb.io/), has given many great talks and written many articles on it. Giving credit where it is due, Charity has played a major role in pointing out the shortcomings of monitoring and helped push observability into the mainstream. Here are some useful articles from her along with [Ben Sigelman](https://twitter.com/el_bhs) of [LightStep](https://lightstep.com/) on this topic:

* [Observability - A 3-Year Retrospective](https://thenewstack.io/observability-a-3-year-retrospective/)
* [Observability is a Many-Splendored Definition](https://charity.wtf/2020/03/03/observability-is-a-many-splendored-thing/)

Note - Charity has [critiqued our direction](https://twitter.com/mipsytipsy/status/1210769987477991425?s=20) in the past. Points taken, improvements coming!

## What's next

We are currently in the process of bringing most of the Monitor categories to `minimal` maturity. After this effort, we will have two main focus areas for the next 3 to 6 months.

First, we plan to provide a streamlined triage experience that allows our users to quickly identify and effectively troubleshoot an application problem, as described in the following flow:

```mermaid
graph TB;
A[Alerts] -->|Embedded Metric Chart in Incident|B
B[Metrics] -->|Timespan Log Drilldown|C
C[Logs] -->|TraceID Search|D[Traces]
```

* The triage flow starts with an alert triggered on a breached metric
* The alert opens an incident from which the user can see the current status
* The user can drill directly from the incident into the relevant logs and search for the root cause
* The user can drill from each log into APM traces and view the stack trace

Detailed information can be found in the [triage to minimal epic](https://gitlab.com/groups/gitlab-org/-/epics/2225).

Second, we plan to [dogfood](#dogfooding) our current capabilities. Monitoring and observability solutions, by their nature, have a high bar to meet before adoption. By continuing to improve the triage workflow, we will at the same time enable our GitLab teammates to use GitLab Monitor more fully. We will pause incremental investment in additional Monitor capabilities until we have at minimum met GitLab's internal need for monitoring.

## North Stars

We're pursuing a few key objectives within the Monitor stage.

### Instrument with ease

Your team's service(s), first and foremost, need to be observable before you are able to evaluate production performance characteristics. We believe that observability should be easy. GitLab will ship with smart conventions that set up your applications with generic observability. We will also make it simple to instrument your service, so that custom metrics, ones that you'd like to build your own SLOs around, can be added with a few lines of code, as in the sketch below.
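As an illustration of the level of effort we are aiming for, here is a minimal sketch of custom instrumentation using the open source Python `prometheus_client` library. The metric names, port, and simulated workload are illustrative assumptions, not GitLab conventions:

```python
# A minimal sketch: exposing two custom metrics from a Python service so they
# can back an SLO. Metric names, port, and the simulated work are placeholders.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

CHECKOUT_TOTAL = Counter(
    "checkout_requests_total", "Total number of checkout requests handled"
)
CHECKOUT_LATENCY = Histogram(
    "checkout_request_duration_seconds", "Checkout request latency in seconds"
)

@CHECKOUT_LATENCY.time()                   # record how long each call takes
def handle_checkout():
    CHECKOUT_TOTAL.inc()                   # count every request
    time.sleep(random.uniform(0.01, 0.2))  # stand-in for real work

if __name__ == "__main__":
    start_http_server(9100)                # expose /metrics for Prometheus to scrape
    while True:
        handle_checkout()
```

Once a service exposes metrics like these, Prometheus can scrape them and they can back the SLOs and alert thresholds discussed in the next section.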
### Detect what's important

Alerting and notification services are a table-stakes expectation of APM and Metrics solutions. GitLab will build a great experience for setting thresholds and metrics, including setting smart defaults for known metrics. We'll lean heavily on our early integration with Prometheus scheduling, notification, and alerting services. Beyond alerting, integration with ChatOps and incident management is also going to be important.
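For alerts that originate outside Prometheus, GitLab's generic alert endpoint (one of the features tracked in the performance indicators below) accepts a simple HTTP payload. The following sketch assumes a project with that integration enabled; the URL, authorization key, and field values are placeholders rather than real settings:

```python
# A rough sketch of pushing an alert from an external monitoring tool into
# GitLab through the generic alerts endpoint. The endpoint URL and
# authorization key shown here are placeholders; real values come from the
# project's operations/alerts settings.
import requests

ALERT_ENDPOINT = "https://gitlab.example.com/my-group/my-service/alerts/notify.json"  # placeholder
AUTHORIZATION_KEY = "REPLACE_WITH_PROJECT_AUTHORIZATION_KEY"  # placeholder

payload = {
    "title": "Checkout latency above SLO",                      # required field
    "description": "p95 latency has exceeded 500ms for 5 minutes",
    "severity": "high",
    "monitoring_tool": "custom-watchdog",                       # hypothetical tool name
    "hosts": ["web-01.example.com"],
    "start_time": "2020-04-21T10:45:00Z",
}

response = requests.post(
    ALERT_ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {AUTHORIZATION_KEY}"},
    timeout=10,
)
response.raise_for_status()  # a 2xx response means GitLab accepted the alert
```

With auto-creation of issues on alerts enabled, a payload like this can open an incident and feed the triage flow described above.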
### Visualize and triage

Visually working with time-series data is an important expectation of an observability solution. Our dashboarding solutions will include ad-hoc data visualization that allows users to quickly build time-series visualizations from metrics, chart them against related metrics, and break them down by the field of your choice. A dashboarding system should also provide a curated UI experience for the established vendors that are clearly in the lead.

The most effective way to bootstrap usage of a new feature or solution is to expose existing users to it in the context of what they are already doing. All 3 solution areas (Logs, Metrics, and APM) should incorporate integrations of each solution and a guide on how to get started. In addition to cross-linking between observability apps, a number of broader GitLab initiatives

### Resolve like a pro

We want to help teams resolve outages faster, accelerating both the troubleshooting and resolution of incidents. GitLab's single platform can correlate the incoming observability data with known CI/CD events and source code information to automatically suggest potential root causes.

### Gain insights seamlessly

Continuously learning and driving those insights back into your development cycle is a critical part of the DevOps loop. The tools in the Monitor stage make it possible to gain insights about production SLOs, incidents, and observability sources across the multi-project systems that make up a complete application. Container-based deployments have rapidly expanded the amount of observability information available. It is no longer possible to collate and visualize this information without automation and distillation of valuable insights, which GitLab can do for you. We'll also provide views across a suite of applications so that managers of a large number of DevOps or Operations teams can get a quick view of their application suite and their teams' health.

## Principles

Our north stars are the guideposts for where we are headed. Our principles inform how we will get there. First and foremost, we abide by GitLab's universal [Product Principles](https://about.gitlab.com/handbook/product/#product-principles). There are a few principles unique to the Monitor stage itself.

### Complete the Loop First

As part of our general principle of [Flow One](/handbook/product/#flow-one), the Monitor stage will seek to complete the full observability feedback loop for a limited set of use cases first, before moving on to support others. As a starting point, this means support for [modern](/handbook/product/#modern-first), [cloud-native](/handbook/product/#cloud-native-first) [developers](/handbook/product/#developer-first) first.

### Observability for those who operate

In modern DevOps organizations, developers are expected to also operate the services they develop. In many cases this expectation isn't met. Whether or not a developer is the one operating an application, we will build tools that work for those doing the operator job. This means forgoing preferences, such as developers' preference to avoid deep production troubleshooting, and instead building tools that allow those who operate to be best-in-class operators, regardless of their title.

### Dogfooding

Our users can't expect a complete set of monitoring tools if we don't utilize them ourselves for instrumenting and operating GitLab. That's why we will [dogfood everything](/handbook/values/#dogfooding). We will start with GitLab Self-Monitoring and our own Infrastructure teams. We want self-managed administrator users to utilize the same tools to observe and respond to health alerts about their GitLab instance as they would to monitor their own services. We'll also complete our own DevOps loop by [having our Infrastructure teams for GitLab.com utilize our incident management feature](https://gitlab.com/groups/gitlab-org/-/epics/1672).

## Performance Indicators (PIs)

Our [Key Performance Indicator](https://about.gitlab.com/handbook/ceo/kpis/) for the Monitor stage is the **Monitor SMAU** ([stage monthly active users](https://about.gitlab.com/handbook/product/metrics/#monthly-active-users-mau)). Monitor SMAU is determined by tracking how users *configure*, *interact*, and *view* the features contained within the stage. The following features are considered:

| Configure | Interact | View |
|-----------|----------|------|
| Install Prometheus | Add/Update/Delete Metric Chart | View Metrics Dashboard |
| Enable external Prometheus instance integration | Download CSV data from a Metric chart | View Kubernetes pod logs |
| Enable Jaeger for Tracing | Generate a link to a Metric chart | View Environments |
| Enable Sentry integration for Error Tracking | Add/remove an alert | View Tracing |
| Enable auto-creation of issues on alerts | Change the environment when looking at pod logs | View operations settings |
| Enable Generic Alert endpoint | Select issue template for auto-creation | View Prometheus Integration page |
| Enable email notifications for auto-creation of issues | Use /zoom and /remove_zoom quick actions | View error list |
| | Click on metrics dashboard links in issues | |
| | Click **View in Sentry** button in errors list | |

See the corresponding [Periscope dashboard](https://app.periscopedata.com/app/gitlab/522840/Monitor-GitLab.com-SMAU) (internal).

<%= partial("direction/workflows", :locals => { :stageKey => "monitor" }) %>

<%= partial("direction/categories", :locals => { :stageKey => "monitor" }) %>

## Prioritization Process

We follow the same [prioritization guidelines](/handbook/product/product-management/process/#prioritization) as the product team at large. As noted above, in the short term the Monitor stage will be prioritizing ([video discussion](https://www.youtube.com/watch?v=nB5KDY4nsFg)) the following:

* Enabling and dogfooding the full DevOps cycle with Auto DevOps, including Metrics, Tracing, Logging, Alerts, and Incidents
* The [triage to minimal epic](https://gitlab.com/groups/gitlab-org/-/epics/2225)

You can see our entire public backlog for Monitor at this [link](https://gitlab.com/groups/gitlab-org/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=Monitoring); filtering by labels or milestones will allow you to explore. If you find something you're interested in, you're encouraged to jump into the conversation and participate. At GitLab, everyone can contribute!

Issues with the "direction" label have been flagged as being particularly interesting, and are listed in the section below.
## Upcoming Releases

<%= direction["all"]["all"] %>

<%= partial("direction/other", :locals => { :stage => "monitor" }) %>