Request tracing subsystem

🌐 This document is available in both English and Ukrainian. Use the language toggle in the top right corner to switch between versions.

1. Overview

The Request Tracing Subsystem assists administrators and registry developers in tracking, profiling, and debugging in microservice-oriented distributed systems like the Registry Platform. It provides the ability to trace requests as they navigate through a complex architecture, allowing administrators and developers to understand how different Platform services interact and where there might be performance issues.

The Request Tracing Subsystem comprises several components: agents, collectors, a storage repository for traces, and a query service. Agents collect data from microservices and forward it to the collectors. Collectors store and index this data, and the query service facilitates the searching and visualization of this data in a user-friendly manner.

The subsystem allows for tracking individual requests across multiple Platform services and viewing detailed information about each request, such as response time, error rate, and service dependencies. This information can be utilized to identify performance bottlenecks, optimize service communication, and enhance the overall system reliability.

2. Subsystem Features

  • Visualization and analysis of tracing data, enabling developers to identify problems and optimize their systems quickly.

  • Display of request chain and individual request execution times.

  • The subsystem provides detailed metrics for tracing data, such as the number of collected spans, the number of discrete traces, and the number of errors encountered.

  • Searching request chains based on request identifiers (trace_id).

  • The subsystem features a user-friendly web interface for the visualization and analysis of tracing data, facilitating quick problem identification within the Platform.

A span represents a segment of code that measures the execution time of an operation and gathers information about it, such as an identifier, the start and end time of the operation, and tags that describe the operation’s context.

A trace is a sequence of spans that form a request structure. Each span has a unique identifier, thus allowing for the assembly of a tree of spans, displaying the sequence of operations and their dependencies during the request execution.

3. Technical Design

This diagram depicts the subsystem’s technical design.

distributed tracing subsystem.drawio

4. Subsystem сomponents

Component Name Namespace Deployment Origin Repository Purpose

Jaeger Operator

istio-system

jaeger-operator

3rd-party

gerrit:/mdtu-ddm/infrastructure/service-mesh

Auxiliary software that performs deployment, configuration, and recovery functions of Jaeger as a subsystem component.

Jaeger Collector

istio-system

jaeger-collector

3rd-party

Component designed for the collection, processing, and filtering of tracing data from various sources. It aggregates the data and ensures reliable storage in the required format for further analysis and visualization.

Query Service and Web Interface

istio-system

jaeger-query

3rd-party

Component provides a web interface for searching and filtering traces and their visualization and analysis.

5. Technology stack

The following technologies were used during the design and development of the subsystem:

6. Subsystem quality attributes

6.1. Observability

The Request tracing subsystem provides a comprehensive overview of the Platform’s behavior, allowing administrators and developers to monitor performance, detect anomalies, and identify potential operational issues.

6.2. Performance

The Request tracing subsystem offers unobtrusive instrumentation support for tracing and efficient data storage and indexing. This functionality enables high-performance monitoring of requests within the Platform without expending excessive computational resources.

6.3. Scalability

The Request tracing subsystem is designed considering large volumes of tracing data, supporting horizontal scalability and distributed data storage.

6.4. Security

An additional proxy component protects the user interface. In conjunction with the centralized Users and roles management subsystem, it is responsible for granting access and segregating rights. By default, users are granted minimal rights necessary to accomplish their tasks.

A mechanism is present to restrict access to the interface, which minimizes the subsystem’s attack surface from the outside.

All events within the subsystem are logged and available for further analysis when needed by the Event logging subsystem.

The subsystem does not have access to sensitive data and does not store it.