---
layout: markdown_page
title: "Category Direction - Metrics"
---
- TOC
{:toc}
## Metrics
### Introduction and how you can help
Thanks for visiting this category strategy page on Metrics in GitLab. This category belongs to and is maintained by the [APM](/handbook/engineering/development/ops/monitor/APM/) group of the Monitor stage.
Please share feedback directly via email, Twitter, or on a video call. If you're a GitLab user and have direct knowledge of your Metrics usage, we'd especially love to hear your use case(s).
* [Maturity Plan](#maturity-plan)
* [Related Issues](https://gitlab.com/groups/gitlab-org/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=group%3A%3Aapm&label_name[]=Category%3AMetrics)
* [Monitor Stage Direction Page](/direction/monitor/)
## Background
Metrics help users understand how their applications are performing and whether they are healthy. Examples of common metrics include response metrics such as latency and error rate, system metrics such as CPU and memory consumption, and any other type of telemetry an application emits.
Actions and insights can then be derived from these metrics, such as setting Service Level Objectives and Error Budgets, or triggering alerts.
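To make the relationship between a Service Level Objective and an Error Budget concrete, here is a minimal, illustrative Python sketch (not GitLab code): a 99.9% SLO over a 30-day window leaves roughly 43 minutes of allowed downtime, and each failed request spends part of that budget.

```python
# Illustrative sketch (not GitLab code): how an Error Budget follows from an SLO.
# A 99.9% availability SLO over a 30-day window leaves 0.1% of minutes (or
# requests) as the budget that can be "spent" on errors before the SLO is missed.

def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed downtime for a given SLO target over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_target)

def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the request-based error budget still unspent (can go negative)."""
    allowed_failures = total_requests * (1 - slo_target)
    return 1 - (failed_requests / allowed_failures)

print(error_budget_minutes(0.999))              # 43.2 minutes per 30 days
print(budget_remaining(0.999, 1_000_000, 400))  # 0.6 -> 60% of the budget left
```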
## Target audience and experience
Metrics are an important tool for all users across the DevOps spectrum, from developers who need to understand the performance impact of the changes they make, to operators who are responsible for keeping production services online.
The target workflow includes a few important use cases:
1. Configuring GitLab to monitor an application should be as easy as possible. To the degree possible given the environment, we should automate this activity for our users.
1. Dashboards should automatically populate with the relevant metrics that were detected, while still offering the flexibility to be customized for a specific use case or application. The dashboards themselves should offer the visualizations required to best represent the data.
1. When troubleshooting, we should offer the ability to easily explore the data, helping users understand potential relationships and create and share one-off dashboards.
1. Alerts should be easy to create and provide a variety of notification options, including Issues and third-party services like Slack. It would also be great to provide automatic detection of outliers/anomalies (one illustrative approach is sketched after this list), as well as out-of-the-box alerts based on best practices.
1. Users should be able to define Service Level Objectives, with a corresponding impact on Error Budgets when those objectives are not met.
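One well-known way to approach the automatic detection of outliers/anomalies mentioned above is a rolling z-score over recent samples. The sketch below is purely illustrative and is not how GitLab implements anomaly detection; it only shows the shape of the signal such an alert could be built on.

```python
# Illustrative sketch of one common outlier-detection approach (rolling z-score).
# This is not GitLab's implementation; it only shows the kind of check an
# out-of-the-box anomaly alert could be built on.
from statistics import mean, stdev

def is_outlier(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it is more than `threshold` standard deviations
    away from the mean of the recent `history` window."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# Example: latency samples in milliseconds, then a sudden spike.
window = [120, 118, 125, 122, 119, 121, 124, 120]
print(is_outlier(window, 123))  # False -> within normal variation
print(is_outlier(window, 310))  # True  -> candidate for an automatic alert
```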
Today, users can quickly deploy a Prometheus instance into a project cluster. Once deployed, it automatically collects key metrics from the running application (error rate, latency, and throughput). Users who already run their own Prometheus can connect it as an external instance, and GitLab presents its metrics on charts within the GitLab UI.
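For readers who want to see what collecting these key metrics looks like at the Prometheus level, here is a minimal, hedged Python sketch that queries a Prometheus server's standard HTTP API for an error-rate ratio. It is not GitLab source code, and the server URL and metric name are placeholder assumptions.

```python
# Illustrative sketch (not GitLab source code) of the kind of data GitLab reads
# from a connected Prometheus instance, via the standard Prometheus HTTP API.
# The URL and the PromQL query are placeholders; adjust them for your setup.
import requests

PROMETHEUS_URL = "http://prometheus.example.com:9090"  # placeholder address

# 5xx error ratio over the last 5 minutes, as a fraction of all requests
# (assumes the common `http_requests_total` metric is being scraped).
QUERY = (
    'sum(rate(http_requests_total{status=~"5.."}[5m])) '
    "/ sum(rate(http_requests_total[5m]))"
)

response = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query",
    params={"query": QUERY},
    timeout=10,
)
response.raise_for_status()
result = response.json()["data"]["result"]

for series in result:
    timestamp, value = series["value"]
    print(f"error rate at {timestamp}: {float(value):.4%}")
```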
## What's Next & Why
The APM team's current focus is on [Dogfooding metrics](https://gitlab.com/gitlab-com/www-gitlab-com/-/issues/6508). The team is leading the initiative to migrate the dashboards our infrastructure team uses to monitor GitLab.com from Grafana to GitLab metrics charts. The team has migrated 20 dashboards so far and, in doing so, has identified [critical](https://gitlab.com/groups/gitlab-org/-/epics/2541) and [non-critical](https://gitlab.com/groups/gitlab-org/-/epics/2597) gaps. Closing these gaps will enable our Infrastructure team to use GitLab metrics charts instead of Grafana, creating a feedback loop that improves our solution.
## Maturity Plan
* [Critical gaps from Grafana](https://gitlab.com/groups/gitlab-org/-/epics/2541)
* [Non-critical gaps from Grafana](https://gitlab.com/groups/gitlab-org/-/epics/2597)
## Competitive Landscape
[Datadog](https://www.datadoghq.com/) and [New Relic](https://newrelic.com/) are the top two competitors in this space.