Systems Performance Glossary
CPU Utilization is the percentage of a system's processing capacity in use at a given time. It indicates how much of the CPU's capacity is consumed by running processes, helping to determine whether the system is under- or over-utilized. https://en.wikipedia.org/wiki/CPU_utilization
I/O Wait is the time a CPU spends idle while waiting for input/output operations to complete, such as disk reads and writes. High I/O Wait times can indicate a bottleneck in disk or network subsystems, impacting overall system performance. https://en.wikipedia.org/wiki/Input/output_wait
Load Average is a measure of the system's workload over a period of time, typically reported over 1, 5, and 15 minutes. It indicates the number of processes actively demanding CPU resources or waiting for disk access, providing insight into system load and potential performance issues. https://en.wikipedia.org/wiki/Load_(computing)#Unix-style_load_calculation
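As an illustrative sketch, the same three averages shown by uptime(1) can be read programmatically via glibc's getloadavg(3), which on Linux is backed by /proc/loadavg:

```c
/* Minimal sketch: read the 1-, 5-, and 15-minute load averages
 * via getloadavg(3). */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    double load[3];
    if (getloadavg(load, 3) != 3) {  /* returns the number of samples filled */
        perror("getloadavg");
        return 1;
    }
    printf("load average: %.2f (1m) %.2f (5m) %.2f (15m)\n",
           load[0], load[1], load[2]);
    return 0;
}
```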
Throughput is the rate at which data is processed by a system or network. It measures the amount of data successfully transmitted or processed per unit of time, often in bits per second (bps) for networks or operations per second for disks. https://en.wikipedia.org/wiki/Throughput
Latency refers to the time delay between the initiation of an action, such as a request, and its completion. In systems performance, latency can refer to delays in network communication, disk access, or CPU processing, and it directly impacts the responsiveness of applications. https://en.wikipedia.org/wiki/Latency_(engineering)
Cache Miss occurs when the data requested by the CPU is not found in the CPU cache, forcing it to fetch data from the main memory. High levels of cache misses increase memory access time, negatively affecting system performance. https://en.wikipedia.org/wiki/CPU_cache#Cache_miss
Disk I/O refers to the read and write operations performed on a storage device. Monitoring disk I/O is crucial for understanding how storage subsystems are affecting overall performance, as excessive I/O can cause delays and bottlenecks in processing. https://en.wikipedia.org/wiki/Input/output
Page Fault occurs when a program tries to access a portion of memory that is not currently in RAM and must be fetched from disk. Page Faults can lead to increased I/O and system slowdowns if they occur frequently, especially if the system is relying on swap space. https://en.wikipedia.org/wiki/Page_fault
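A process can inspect its own fault behavior with getrusage(2), which distinguishes minor faults (resolved from memory) from major faults (requiring disk I/O). A minimal sketch:

```c
/* Minimal sketch: count the minor (serviced from RAM) and major
 * (serviced from disk) page faults taken by the current process. */
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        perror("getrusage");
        return 1;
    }
    printf("minor faults: %ld, major faults: %ld\n",
           ru.ru_minflt, ru.ru_majflt);
    return 0;
}
```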
Swap Usage refers to the amount of data being transferred between RAM and swap space (a portion of disk used as virtual memory). Excessive use of swap indicates that the system is running out of physical memory, which can significantly degrade performance. https://en.wikipedia.org/wiki/Paging#Swap
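As a minimal sketch of checking swap consumption programmatically on Linux, sysinfo(2) reports total and free swap, figures comparable to what free(1) shows:

```c
/* Minimal sketch: report total and free swap via sysinfo(2). */
#include <stdio.h>
#include <sys/sysinfo.h>

int main(void) {
    struct sysinfo si;
    if (sysinfo(&si) != 0) {
        perror("sysinfo");
        return 1;
    }
    unsigned long long unit = si.mem_unit;  /* bytes per reported unit */
    printf("swap total: %llu MB, swap free: %llu MB\n",
           (unsigned long long)si.totalswap * unit / (1024 * 1024),
           (unsigned long long)si.freeswap  * unit / (1024 * 1024));
    return 0;
}
```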
Context Switch is the process by which the CPU switches from executing one process to another. A high number of context switches can indicate excessive multitasking or resource contention, which can negatively impact system performance. https://en.wikipedia.org/wiki/Context_switch
Memory Utilization measures the amount of RAM being used by processes on the system. Monitoring memory utilization helps to identify if the system is using physical memory efficiently or if it is under memory pressure, leading to performance degradation. https://en.wikipedia.org/wiki/Computer_memory
Network Bandwidth refers to the maximum rate at which data can be transmitted over a network connection. Monitoring network bandwidth usage is essential for identifying bottlenecks in network performance and ensuring that available bandwidth is being used efficiently. https://en.wikipedia.org/wiki/Bandwidth_(computing)
Jitter is the variation in the time it takes for packets to be transmitted across a network. High levels of jitter can cause inconsistencies in data flow, leading to degraded performance in real-time applications like video streaming and voice-over-IP (VoIP). https://en.wikipedia.org/wiki/Jitter
System Call is a mechanism by which programs request services from the operating system's kernel. Monitoring the frequency and duration of system calls can provide insights into how applications are interacting with the system and where performance bottlenecks may occur. https://en.wikipedia.org/wiki/System_call
Throughput vs Latency Trade-off describes the relationship between maximizing the amount of data processed per second (throughput) and minimizing the time taken to complete individual operations (latency). Optimizing system performance often involves balancing these two factors based on workload needs. https://en.wikipedia.org/wiki/Latency_(engineering)#Throughput-vs-Latency_Tradeoff
Disk Queue Length refers to the number of read and write operations waiting to be processed by the disk. A long disk queue length indicates that the disk subsystem is overwhelmed, leading to performance bottlenecks, especially during intensive I/O operations. https://en.wikipedia.org/wiki/Disk_scheduling
Heap Usage refers to the memory that applications allocate dynamically from the heap at runtime. Monitoring heap usage helps identify memory leaks or excessive memory consumption, which can affect the performance of applications and the system as a whole. https://en.wikipedia.org/wiki/Heap_(data_structure)
Thread Contention occurs when multiple threads are trying to access shared resources, such as CPU or memory, at the same time. This can lead to context switches and waiting, reducing overall system performance due to lock contention or thread scheduling delays. https://en.wikipedia.org/wiki/Thread_(computing)
IOPS (Input/Output Operations Per Second) is a performance measurement used to evaluate the speed of storage devices. IOPS is crucial in understanding how quickly a storage system can process multiple read and write commands, particularly in environments with high I/O demand. https://en.wikipedia.org/wiki/IOPS
Garbage Collection refers to the process of automatically reclaiming memory that is no longer in use by a program. In systems with languages that use automatic memory management, such as Java or Go, excessive garbage collection can cause delays and performance issues. https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
CPU Throttling occurs when the system reduces the CPU clock speed to prevent overheating or conserve power. This can lead to decreased system performance, particularly during periods of high demand when the CPU cannot operate at full capacity. https://en.wikipedia.org/wiki/Dynamic_frequency_scaling
Disk Latency refers to the time it takes for a read or write operation to complete on a storage device. High disk latency can indicate slow performance in the storage subsystem, leading to longer wait times for data access and reduced system responsiveness. https://en.wikipedia.org/wiki/Latency_(engineering)#Disk
Kernel Mode refers to the privileged mode of operation in which the operating system kernel executes. Monitoring time spent in kernel mode can help identify whether system performance issues are related to excessive system-level processing, such as handling interrupts or system calls. https://en.wikipedia.org/wiki/Kernel_(operating_system)#Kernel_mode
User Mode is the non-privileged mode of operation where application code runs. Monitoring time spent in user mode versus kernel mode helps distinguish between application-level and system-level performance bottlenecks, aiding in more targeted optimizations. https://en.wikipedia.org/wiki/Execution_mode#User_mode
NUMA (Non-Uniform Memory Access) is a memory architecture used in modern multiprocessor systems. NUMA affects how memory is accessed by processors, and poor NUMA configuration can lead to performance degradation due to increased memory access times. https://en.wikipedia.org/wiki/Non-uniform_memory_access
Interrupt Request (IRQ) refers to a signal sent to the CPU to gain its attention for processing hardware events, such as keyboard input or network packet arrival. High levels of interrupt activity can consume significant CPU time, leading to performance bottlenecks if not managed properly. https://en.wikipedia.org/wiki/Interrupt_request_(PC_architecture)
TCP Retransmission occurs when data packets fail to reach their destination and must be sent again. High levels of TCP retransmissions indicate network congestion or errors, which can significantly degrade network performance and increase latency. https://en.wikipedia.org/wiki/TCP_congestion_control
Memory Paging is the process of moving data between RAM and disk-based swap space. Frequent paging indicates that the system is running low on physical memory, leading to slower performance as the CPU waits for data to be fetched from disk rather than directly from memory. https://en.wikipedia.org/wiki/Paging
Buffer Bloat is the excessive buffering of packets in a network, which can increase latency and reduce performance. It occurs when buffers in networking devices hold too much data, causing delays in packet transmission and impacting real-time applications like voice or video. https://en.wikipedia.org/wiki/Bufferbloat
Overcommitment refers to allocating more resources, such as CPU or RAM, than are physically available, under the assumption that not all allocated resources will be used simultaneously. While this can optimize resource usage, overcommitment can lead to performance degradation if demand exceeds available capacity. https://en.wikipedia.org/wiki/Memory_overcommitment
SoftIRQ is a type of interrupt used in Linux systems to handle lower-priority tasks that can be deferred until later. High levels of SoftIRQ activity can consume significant CPU resources, affecting overall system performance by delaying the processing of critical tasks. https://en.wikipedia.org/wiki/Interrupt#Software_interrupt
Dirty Pages occur when pages in memory have been modified but not yet written back to disk. If too many dirty pages accumulate, it can lead to performance degradation as the system becomes overwhelmed with flushing the pages to storage. Monitoring the number of dirty pages helps in tuning memory and I/O performance. https://en.wikipedia.org/wiki/Page_(computer_memory)
Hyper-Threading is a technology that allows a single physical CPU core to appear as two virtual cores to the operating system, improving parallel processing capabilities. While hyper-threading can enhance performance, it can also lead to contention for shared resources, reducing efficiency in some workloads. https://en.wikipedia.org/wiki/Hyper-threading
Swappiness is a Linux kernel parameter (vm.swappiness) that controls the tendency of the system to use swap space. Adjusting the swappiness value can influence the balance between keeping data in RAM or moving it to swap, optimizing memory usage for different performance requirements. https://en.wikipedia.org/wiki/Swappiness
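As a minimal sketch, the current value can be read from procfs (changing it requires privileges, e.g. sysctl vm.swappiness=10):

```c
/* Minimal sketch: read the current vm.swappiness value from procfs. */
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/sys/vm/swappiness", "r");
    if (!f) { perror("fopen"); return 1; }
    int value;
    if (fscanf(f, "%d", &value) == 1)
        printf("vm.swappiness = %d\n", value);
    fclose(f);
    return 0;
}
```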
Out-of-Memory (OOM) Killer is a process in Linux that terminates programs when the system runs out of RAM to prevent the system from crashing. Understanding when the OOM killer is triggered can help identify memory leaks or inefficient memory usage in applications. https://en.wikipedia.org/wiki/Out_of_memory
Inode is a data structure used by file systems to store metadata about files and directories. If the system runs out of available inodes, it can no longer create new files, even if there is disk space available, leading to system errors and reduced performance. https://en.wikipedia.org/wiki/Inode
Block Size refers to the smallest unit of data that can be read from or written to disk in a file system. Optimizing block size based on workload (e.g., large files vs. small files) can significantly impact disk I/O performance. https://en.wikipedia.org/wiki/Data_block
Affinity is a technique used to bind processes or threads to specific CPU cores. Setting CPU affinity can improve performance by reducing context switches and ensuring that processes run on the same core, taking advantage of cache locality. https://en.wikipedia.org/wiki/Processor_affinity
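On Linux, affinity can be set programmatically with sched_setaffinity(2), or from the shell with taskset(1). A minimal sketch that pins the calling process to core 0:

```c
/* Minimal sketch: pin the calling process to CPU 0. _GNU_SOURCE is
 * required for the CPU_* macros. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                                    /* allow core 0 only */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {  /* pid 0 = self */
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to CPU 0\n");
    return 0;
}
```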
NFS Latency refers to the delays encountered in accessing files over the Network File System (NFS). High NFS latency can result from network issues or overloaded servers, and it can severely affect performance in environments that rely on networked storage. https://en.wikipedia.org/wiki/Network_File_System
Perf is a performance analysis tool for Linux that provides detailed information about CPU usage, cache behavior, and system calls. It is widely used for identifying performance bottlenecks in both kernel and user-space applications. https://en.wikipedia.org/wiki/Perf
Network Latency refers to the time it takes for a data packet to travel from its source to its destination over a network. High network latency can lead to slower application response times, particularly for real-time services such as video streaming or VoIP communications. https://en.wikipedia.org/wiki/Latency_(engineering)
Forking is the process by which a running program creates a copy of itself in Unix and Linux systems. While necessary for multitasking, excessive forking can consume significant CPU and memory resources, potentially leading to performance issues, particularly in high-load environments. https://en.wikipedia.org/wiki/Fork_(system_call)
NUMA Balancing is a kernel feature that automatically migrates processes and memory pages between nodes to optimize memory access in Non-Uniform Memory Access (NUMA) systems. Proper NUMA balancing is critical for performance in large, multi-core systems. https://en.wikipedia.org/wiki/Non-uniform_memory_access#NUMA_balancing
Throttle is the deliberate reduction of CPU or network speed to manage performance, power consumption, or heat. It is used in systems where overheating or power efficiency is a concern but can cause slower response times or reduced system throughput. https://en.wikipedia.org/wiki/Throttling_process_(computing)
Direct Memory Access (DMA) allows certain hardware subsystems to access RAM independently of the CPU, speeding up data transfers between devices and memory. Inefficient use of DMA can lead to bottlenecks, particularly in systems with high I/O demand. https://en.wikipedia.org/wiki/Direct_memory_access
Cache Coherency refers to the consistency of data stored in local caches of a multi-core processor system. Ensuring cache coherency is essential for the correct execution of parallel programs, and poor management can lead to performance issues due to cache invalidation and stale data. https://en.wikipedia.org/wiki/Cache_coherence
Transparent Huge Pages (THP) is a memory management feature that automatically allocates large memory pages to improve TLB efficiency. While THP can enhance memory access performance, it may also cause latency spikes during memory allocation and compaction. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/memory-optimize-thp
Idle CPU States (C-states) are power-saving modes used by processors when idle. While entering deeper C-states can save energy, frequent transitions between idle and active states can cause latency, impacting overall system performance. https://en.wikipedia.org/wiki/C-state
Interrupt Coalescing is a technique used in networking and storage devices to reduce the number of interrupts generated by bundling multiple events into a single interrupt. Properly configured interrupt coalescing can reduce CPU overhead, but excessive coalescing can increase latency. https://en.wikipedia.org/wiki/Interrupt_coalescing
Pipeline Stall occurs when a processor must wait for data or instructions, delaying the execution of subsequent operations. Pipeline stalls can result from cache misses, branch misprediction, or data dependencies, all of which reduce overall CPU throughput. https://en.wikipedia.org/wiki/Classic_RISC_pipeline
Cache Line refers to the smallest unit of data that can be transferred between CPU cache and main memory. Optimizing cache line usage is critical for performance, as poor alignment or excessive cache line transfers can lead to cache misses and reduced performance. https://en.wikipedia.org/wiki/Cache_line
Instruction-Level Parallelism (ILP) is a measure of how many instructions a CPU can execute simultaneously. High ILP improves CPU performance by utilizing multiple execution units within the processor, but achieving optimal ILP often depends on the specific workload and compiler optimizations. https://en.wikipedia.org/wiki/Instruction-level_parallelism
Read-Ahead is a file system optimization technique that pre-fetches data into memory before it is requested by an application. Read-ahead can improve performance for sequential read operations but may cause cache pollution if the data is not used, potentially leading to wasted I/O bandwidth. https://en.wikipedia.org/wiki/Prefetching
TLB (Translation Lookaside Buffer) is a specialized cache used by the CPU to speed up virtual-to-physical memory address translation. TLB misses can significantly degrade system performance as they require additional memory accesses to resolve, leading to longer memory access times. https://en.wikipedia.org/wiki/Translation_lookaside_buffer
Branch Prediction is a technique used by modern processors to guess the outcome of conditional operations (such as if-else statements) to avoid pipeline stalls. Incorrect branch predictions lead to branch mispredictions, which can severely impact performance by causing instruction re-execution. https://en.wikipedia.org/wiki/Branch_predictor
False Sharing occurs when multiple threads on different CPU cores access different variables that reside on the same cache line, causing unnecessary invalidation and cache coherence traffic. False sharing can lead to performance degradation, especially in multi-threaded applications. https://en.wikipedia.org/wiki/False_sharing
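A common mitigation is to pad or align per-thread data so hot variables do not share a cache line. A minimal C11 sketch, assuming a 64-byte line size (the real value is hardware-dependent; glibc exposes it via sysconf(_SC_LEVEL1_DCACHE_LINESIZE)):

```c
/* Minimal sketch: place two per-thread counters on separate cache
 * lines so one thread's writes do not invalidate the other's line. */
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */

struct counters {
    alignas(CACHE_LINE) long thread_a;  /* gets its own line */
    alignas(CACHE_LINE) long thread_b;  /* gets its own line */
};

int main(void) {
    printf("offset of thread_a: %zu\n", offsetof(struct counters, thread_a));
    printf("offset of thread_b: %zu\n", offsetof(struct counters, thread_b));
    return 0;
}
```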
Spinlock is a low-level synchronization primitive used in multi-threaded programming to protect shared resources. Unlike traditional mutexes, spinlocks repeatedly check a condition in a tight loop (spinning) until the resource becomes available, which can waste CPU cycles and degrade performance under contention. https://en.wikipedia.org/wiki/Spinlock
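A minimal sketch using the POSIX pthread_spin_* API (compile with -pthread); the lock burns CPU while waiting, so it pays off only for very short critical sections:

```c
/* Minimal sketch: two threads increment a shared counter under a
 * spinlock. Expect 2000000 at the end. */
#include <pthread.h>
#include <stdio.h>

static pthread_spinlock_t lock;
static long shared_counter;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_spin_lock(&lock);    /* spins until acquired */
        shared_counter++;
        pthread_spin_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter: %ld\n", shared_counter);
    pthread_spin_destroy(&lock);
    return 0;
}
```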
NUMA Node is a grouping of CPU cores and memory in systems with Non-Uniform Memory Access (NUMA). Optimizing workloads to stay within the same NUMA node (locality) improves memory access times, whereas cross-node memory access can lead to latency and reduced performance. https://en.wikipedia.org/wiki/Non-uniform_memory_access#NUMA_nodes
DRAM Refresh is the process of periodically recharging DRAM cells to preserve the data stored in them. Refresh cycles consume memory bandwidth and can stall pending memory accesses, so heavy refresh activity can degrade performance, especially in memory-bound workloads. https://en.wikipedia.org/wiki/Dynamic_random-access_memory#Refreshing
Adaptive Locking is a technique that dynamically adjusts the behavior of locks (such as mutexes) based on contention levels. By optimizing how locks are handled during periods of low or high contention, adaptive locking can reduce the performance overhead associated with synchronization in multi-threaded environments. https://en.wikipedia.org/wiki/Lock_(computer_science)
Micro-Op Cache is a specialized cache in modern processors that stores decoded instructions, allowing the CPU to bypass the instruction decoding phase during re-execution. Efficient use of the micro-op cache can improve CPU performance by reducing the overhead of instruction decoding. https://en.wikipedia.org/wiki/Micro-operation
Prefetching is a technique used by CPUs and storage devices to load data into memory before it is actually needed by the program, improving access times for sequential data. However, excessive or incorrect prefetching can waste memory and bandwidth, reducing overall system performance. https://en.wikipedia.org/wiki/Prefetching
Cycles Per Instruction (CPI) is a performance metric that measures the average number of CPU cycles used to execute each instruction. Lower CPI values indicate better performance, but achieving optimal CPI depends on factors such as instruction-level parallelism and cache efficiency. https://en.wikipedia.org/wiki/Instructions_per_cycle
Direct I/O bypasses the file system cache and directly transfers data between user space and storage devices. While direct I/O reduces the overhead of file system caching, it can lead to performance degradation if applications are not optimized for it, especially in systems with slow storage devices. https://en.wikipedia.org/wiki/Input/output#Direct_IO
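On Linux, direct I/O is requested with the O_DIRECT open flag, which imposes alignment requirements on buffers, offsets, and transfer sizes. A minimal sketch, assuming 512-byte alignment and a placeholder file name:

```c
/* Minimal sketch: read a file with O_DIRECT, bypassing the page cache.
 * "data.bin" is a placeholder; O_DIRECT is unsupported on some
 * filesystems (e.g., tmpfs), and true alignment needs depend on the
 * underlying device. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    const size_t align = 512, len = 4096;  /* assumed alignment/size */
    void *buf;
    if (posix_memalign(&buf, align, len) != 0) return 1;

    int fd = open("data.bin", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    ssize_t n = pread(fd, buf, len, 0);    /* aligned offset and length */
    if (n < 0) perror("pread");
    else printf("read %zd bytes, uncached\n", n);

    close(fd);
    free(buf);
    return 0;
}
```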
Thread Migration occurs when a CPU scheduler moves a thread from one CPU core to another. Although necessary in multi-core systems, excessive thread migration can lead to performance loss due to cache misses and reduced CPU locality, as the thread must reload its data into the new core’s cache. https://en.wikipedia.org/wiki/Thread_migration
Instruction Cache (L1 I-Cache) is a small, fast memory within the CPU that stores instructions. If the instruction needed by the CPU is already in the I-Cache, it can be executed quickly, but instruction cache misses result in delays as the data must be fetched from higher levels of memory. https://en.wikipedia.org/wiki/CPU_cache
I/O Scheduler is responsible for determining the order in which I/O requests are processed by the storage subsystem. Different I/O schedulers (e.g., CFQ, NOOP, Deadline) can be used to optimize performance depending on workload characteristics like I/O intensity or request size. https://en.wikipedia.org/wiki/IO_scheduling
Out-of-Order Execution is a technique used by modern processors to execute instructions in an order different from the one in which they appear in the code. This helps to optimize the use of CPU resources by minimizing idle time, but can cause performance issues if data dependencies are not handled correctly. https://en.wikipedia.org/wiki/Out-of-order_execution
Retpoline is a software-based mitigation technique used to protect against speculative execution vulnerabilities like Spectre. While Retpoline improves security, it can introduce performance overhead by limiting the CPU's ability to optimize speculative execution. https://en.wikipedia.org/wiki/Retpoline
Demand Paging is a memory management technique where pages of data are loaded from disk into RAM only when they are needed by an application. While this conserves memory, frequent page faults caused by excessive demand paging can degrade system performance, especially in memory-intensive workloads. https://en.wikipedia.org/wiki/Paging
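The effect is easy to observe: mmap(2) allocates only virtual address space, and each page is faulted in on first touch. A minimal sketch using getrusage(2) to count the resulting minor faults:

```c
/* Minimal sketch: demand paging in action. Touching each page of an
 * anonymous mapping for the first time triggers a minor fault. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

static long minor_faults(void) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

int main(void) {
    size_t len = 64 * 1024 * 1024;  /* 64 MB of anonymous memory */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    long before = minor_faults();
    memset(p, 1, len);              /* first touch faults in every page */
    long after = minor_faults();

    printf("minor faults from first touch: %ld\n", after - before);
    munmap(p, len);
    return 0;
}
```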
Hardware Interrupt is a signal sent to the CPU by a hardware device indicating that it needs processing. Managing hardware interrupts efficiently is crucial for performance, as excessive interrupts can consume significant CPU resources, leading to system slowdowns, particularly in high I/O environments. https://en.wikipedia.org/wiki/Interrupt
Miss Rate is the percentage of requests that fail to be served from a cache (such as CPU cache, memory cache, or disk cache) and must be retrieved from a lower level in the memory hierarchy. A high miss rate indicates poor cache performance, increasing latency and reducing throughput. https://en.wikipedia.org/wiki/Cache_miss
TLB Shootdown is a performance issue that occurs in multiprocessor systems when Translation Lookaside Buffers (TLBs) need to be flushed across all processors to maintain consistency. Frequent TLB shootdowns can introduce delays, particularly in virtualized environments, as they increase memory access times. https://en.wikipedia.org/wiki/TLB_shootdown
Spinlock Contention occurs when multiple threads or processes repeatedly check a spinlock to gain access to a shared resource. High levels of spinlock contention can waste CPU time and lead to performance bottlenecks, particularly in multi-core systems where synchronization is frequent. https://en.wikipedia.org/wiki/Spinlock
Saturation refers to the point at which a system resource, such as CPU, memory, or network bandwidth, is fully utilized and unable to handle additional load without degrading performance. Identifying and addressing saturation points is key to optimizing system performance. https://en.wikipedia.org/wiki/Saturation_(signal_processing)
Write Amplification is a phenomenon where the amount of data written to storage (particularly in SSDs) is greater than the amount of data initially intended to be written. High levels of write amplification reduce the lifespan and performance of storage devices, especially in environments with frequent write operations. https://en.wikipedia.org/wiki/Write_amplification
Cache Thrashing occurs when the CPU spends more time loading and evicting data from the cache than executing instructions, often due to poor cache management or excessive cache misses. Cache thrashing can severely degrade system performance by overwhelming the memory hierarchy. https://en.wikipedia.org/wiki/Thrashing_(computer_science)#Cache_thrashing
Instruction Cache Miss occurs when the CPU fails to find the needed instruction in the instruction cache and must fetch it from a higher level of memory, such as L2 cache or main memory. High instruction cache miss rates can significantly slow down program execution. https://en.wikipedia.org/wiki/CPU_cache
Multithreading allows multiple threads to be executed concurrently on a single processor core, improving overall performance by better utilizing CPU resources. However, multithreading can introduce complexity in managing shared resources and lead to issues like thread contention and race conditions. https://en.wikipedia.org/wiki/Multithreading_(computer_architecture)
Write-Back Cache is a caching mechanism where data is written to the cache first and later transferred to the storage device. While write-back cache improves write performance, it introduces a risk of data loss if the cache is not written to disk in time, especially during system crashes or power failures. https://en.wikipedia.org/wiki/Cache
Soft Lockup occurs when a CPU core fails to return control to the kernel within a specified time due to a long-running process or excessive resource contention. Soft lockups can degrade system performance and may indicate issues such as deadlocks or high CPU usage from a single thread. https://access.redhat.com/solutions/15258
Page Cache is a mechanism in Linux where frequently accessed files are cached in memory, reducing the need for disk access. While page cache improves performance by speeding up file reads, excessive caching can lead to memory pressure if not managed correctly. https://en.wikipedia.org/wiki/Page_cache
Microbenchmarking is the practice of running small, focused benchmarks that measure the performance of individual components, such as CPU operations, memory access, or I/O tasks. While useful for understanding specific performance characteristics, microbenchmarking results may not reflect real-world system performance. https://en.wikipedia.org/wiki/Benchmark_(computing)
Kernel Same-Page Merging (KSM) is a memory-saving feature used in virtualized environments that identifies identical memory pages between virtual machines and merges them into a single copy. While KSM improves memory efficiency, it may introduce performance overhead during the merging process. https://en.wikipedia.org/wiki/Kernel_same-page_merging
Priority Queue is a data structure used by operating systems to schedule processes based on priority levels. Tasks with higher priority are processed first, but mismanagement of the priority queue can lead to issues such as priority inversion or system starvation of lower-priority processes. https://en.wikipedia.org/wiki/Priority_queue
I/O Bound refers to a situation where the performance of a system is limited by input/output operations rather than CPU processing power. I/O bound workloads can be improved by optimizing disk access, network communication, or reducing I/O wait times. https://en.wikipedia.org/wiki/I/O_bound
Page Fault Frequency measures the rate at which page faults occur in a system. High page fault frequency can indicate that a system is running out of RAM, forcing it to swap data in and out of disk storage, which can lead to degraded system performance. https://en.wikipedia.org/wiki/Page_fault
Tickless Kernel is a Linux kernel feature that eliminates the need for periodic timer interrupts during idle states, reducing CPU power consumption and improving performance in systems where power efficiency is critical. Tickless kernel operation helps minimize unnecessary context switches. https://en.wikipedia.org/wiki/Tickless_kernel
Readahead is a performance optimization technique that pre-loads blocks of data into memory before they are needed by applications. By reading data in advance, readahead reduces I/O wait times for sequential access patterns, but can lead to inefficiencies in random access workloads. https://en.wikipedia.org/wiki/Readahead
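Applications can influence kernel readahead with posix_fadvise(2): POSIX_FADV_SEQUENTIAL enlarges the readahead window, while POSIX_FADV_RANDOM disables it. A minimal sketch (the file name is a placeholder):

```c
/* Minimal sketch: hint sequential access so the kernel grows the
 * readahead window. posix_fadvise returns an error number rather than
 * setting errno. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDONLY);  /* placeholder path */
    if (fd < 0) { perror("open"); return 1; }

    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL); /* 0,0 = whole file */
    if (err != 0)
        fprintf(stderr, "posix_fadvise: error %d\n", err);

    /* ... sequential reads here benefit from the larger window ... */
    close(fd);
    return 0;
}
```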
Write Combining is a technique used in memory systems where multiple small writes to adjacent memory addresses are merged into a single, larger write operation. Write combining can improve performance by reducing the number of individual memory transactions, particularly for streams of small writes to uncached or device memory. https://en.wikipedia.org/wiki/Write_combining
A
- adaptive mutex - “A mutex (mutual exclusion) synchronization lock type. See Chapter 5, Applications, Section 5.2.5, Concurrency and Parallelism.” (SysPrfBGrg 2021)
- associative array - “A data type for programming languages where values are referenced by an arbitrary key or multiple keys.” (SysPrfBGrg 2021)
- AT&T - “The American Telephone and Telegraph Company, which included Bell Laboratories, where Unix was developed.” (SysPrfBGrg 2021)
B
- back end - “Refers to data storage and infrastructure components. A web server is back-end software. See front end.” (SysPrfBGrg 2021)
- BCC - “BPF compiler collection. BCC is a project that includes a BPF compiler framework, as well as many BPF performance tools. See Chapter 15.” (SysPrfBGrg 2021)
- benchmark - “In computing, a benchmark is a tool that performs a workload experiment and measures its performance: the benchmark result. These are commonly used for evaluating the performance of different options.” (SysPrfBGrg 2021)
- BIOS - “Basic Input/Output System: firmware used to initialize computer hardware and manage the booting process.” (SysPrfBGrg 2021)
- BPF - “Berkeley Packet Filter: a lightweight in-kernel technology from 1992 created to improve the performance of packet filtering. Since 2014 it has been extended to become a general-purpose execution environment (see eBPF).” (SysPrfBGrg 2021)
C
- C - The C programming language.
- cache warmth - “See Hot, Cold, and Warm Caches in Section 2.3.14, Caching, in Chapter 2, Methodologies.” (SysPrfBGrg 2021)
- client - “A consumer of a network service, referring to either the client host or the client application.” (SysPrfBGrg 2021)
- concurrency - “See Section 5.2.5, Concurrency and Parallelism, in Chapter 5, Applications.” (SysPrfBGrg 2021)
- core - “An execution pipeline on a processor. These may be exposed on an OS as single CPUs, or via hyperthreads as multiple CPUs.” (SysPrfBGrg 2021)
- CPU - “Central processing unit. This term refers to the set of functional units that execute instructions, including the registers and arithmetic logic unit (ALU). It is now often used to refer to either the processor or a virtual CPU.” (SysPrfBGrg 2021)
- CPU cross call - “A call by a CPU to request work from others on a multi-CPU system. Cross calls may be made for system-wide events such as CPU cache coherency calls. See Chapter 6, CPUs. Linux terms these ‘SMP calls.’” (SysPrfBGrg 2021)
- CPU cycle - “A unit of time based on the clock rate of the processor: for 2 GHz, each cycle is 0.5 ns. A cycle itself is an electrical signal, the rising or falling of voltage, used to trigger digital logic.” (SysPrfBGrg 2021)
D
- debuginfo file - “A symbol and debug information file, used by debuggers and profilers.” (SysPrfBGrg 2021)
- disk controller - “A component that manages directly attached disks, making them accessible to the system, either directly or mapped as virtual disks. Disk controllers may be built into the system main board, included as expansion cards, or built into storage arrays. They support one or more storage interface types (e.g., SCSI, SATA, SAS) and are also commonly called host bus adaptors (HBAs), along with the interface type, for example, SAS HBA.” (SysPrfBGrg 2021)
- DRAM - “Dynamic random-access memory, a type of volatile memory in common use as main memory.” (SysPrfBGrg 2021)
- dynamic instrumentation - “Dynamic instrumentation or dynamic tracing is a technology that can instrument any software event, including function calls and returns, by live modification of instruction text and the insertion of temporary tracing instructions.” (SysPrfBGrg 2021)
- dynamic tracing - “This can refer to the software that implements dynamic instrumentation.” (SysPrfBGrg 2021)
E
- eBPF - “Extended BPF (see BPF). The eBPF abbreviation originally described the extended BPF from 2014, which updated the register size, instruction set, added map storage, and limited kernel calls. By 2015, it was decided to call eBPF just BPF.” (SysPrfBGrg 2021)
- ELF - “Executable and Linkable Format: a common file format for executable programs.” (SysPrfBGrg 2021)
- errno - “A variable containing the last error as a number following a standard (POSIX.1-2001).” (SysPrfBGrg 2021)
- Ethernet - “A set of standards for networking at the physical and data link layers.” (SysPrfBGrg 2021)
- expander card - “A physical device (card) connected to the system, usually to provide an additional I/O controller.” (SysPrfBGrg 2021)
F
- file descriptor - “An identifier for a program to use in referencing an open file.” (SysPrfBGrg 2021)
- flame graph - “A visualization for a set of stack traces. See Chapter 2, Methodologies.” (SysPrfBGrg 2021)
- FPGA - “Field-programmable gate array. A reprogrammable integrated circuit used in computing to typically accelerate a specific operation.” (SysPrfBGrg 2021)
- frame - “A message at the data link layer of the OSI networking model (see Section 10.2.3, Protocol Stack).” (SysPrfBGrg 2021)
- front end - “Refers to end-user interface and presentation software. A web application is front-end software. See back end.” (SysPrfBGrg 2021)
G
- GPU - “Graphics processing unit. These can be used for other workloads as well, such as machine learning.” (SysPrfBGrg 2021)
H
- HDD - “Hard disk drive, a rotational magnetic storage device. See Chapter 9, Disks.” (SysPrfBGrg 2021)
- hit ratio - “Often used to describe cache performance: the ratio of cache hits versus hits plus misses, usually expressed as a percentage. Higher is better.” (SysPrfBGrg 2021)
- hyperthread - “Intel’s implementation of SMT. This is a technology for scaling CPUs, allowing the OS to create multiple virtual CPUs for one core and schedule work on them, which the processor attempts to process in parallel.” (SysPrfBGrg 2021)
I
- ICMP - “Internet Control Message Protocol. Used by ping(1) (ICMP echo request/reply).” (SysPrfBGrg 2021)
- I/O - Input/output.
- IO Visor - “The Linux Foundation project that hosts the bcc and bpftrace repositories on GitHub, and facilitates collaboration among BPF developers at different companies.” (SysPrfBGrg 2021)
- IP - “Internet Protocol. Main versions are IPv4 and IPv6. See Chapter 10, Network.” (SysPrfBGrg 2021)
- IPC - “Either means: instructions per cycle, a low-level CPU performance metric, or inter-process communication, a means for processes to exchange data. Sockets are an inter-process communication mechanism.” (SysPrfBGrg 2021)
K
- kernel - “The core program on a system that runs in privileged mode to manage resources and user-level processes.” (SysPrfBGrg 2021)
L
- latency - “Time spent waiting. In computing performance it is often used to describe resource I/O time. Latency is important for performance analysis, because it is often the most effective measure of a performance issue. Where exactly it is measured can be ambiguous without further qualifiers. For example, ‘disk latency’ could mean time spent waiting on a disk driver queue only or, from an application, it could mean the entire time waiting for disk I/O to complete, both queued and service time. Latency is limited by a lower bound, bandwidth by an upper bound.” (SysPrfBGrg 2021)
- LRU - “Least recently used. See Section 2.3.14, Caching, in Chapter 2, Methodologies.” (SysPrfBGrg 2021)
M
- main board - “The circuit board that houses the processors and system interconnect; also called the system board.” (SysPrfBGrg 2021)
- main memory - “The primary memory storage of a system, usually implemented as DRAM.” (SysPrfBGrg 2021)
- major fault - “A memory access fault that was serviced from storage devices (disks). See Chapter 3, Operating Systems.” (SysPrfBGrg 2021)
- malloc - “Memory allocate. This usually refers to the function performing allocation.” (SysPrfBGrg 2021)
- Mbytes - “Megabytes. The International System of Units (SI) defines a megabyte as 1000000 bytes, but in computing a megabyte is typically 1048576 bytes (which SI terms a mebibyte). Throughout this book, tools that report Mbytes are usually using the definition of 1048576 (2^20) bytes.” (SysPrfBGrg 2021)
- minor fault - “A memory access fault that was serviced from main memory. See Chapter 3, Operating Systems.” (SysPrfBGrg 2021)
- MMU - “Memory management unit. This is responsible for presenting memory to a CPU and for performing virtual-to-physical address translation.” (SysPrfBGrg 2021)
- mutex - “A mutual exclusion lock. They can become a source of performance bottlenecks, and are often investigated for performance problems. See Chapter 5, Applications.” (SysPrfBGrg 2021)
N
- NVMe - “Non-Volatile Memory express: a PCIe bus specification for storage devices.” (SysPrfBGrg 2021)
O
- operation rate - “Operations per interval (e.g., operations per second), which may include non-I/O operations.” (SysPrfBGrg 2021)
- OS - “Operating System. The collection of software including the kernel for managing resources and user-level processes.” (SysPrfBGrg 2021)
P
- packet - “A network message at the network layer of the OSI networking model (see Section 10.2.3).” (SysPrfBGrg 2021)
- page fault - “A system trap that occurs when a program references a memory location where the virtual memory is not currently mapped to a physical backing page. This is a normal consequence of an on-demand allocation memory model.” (SysPrfBGrg 2021)
- pagein/pageout - “Functions performed by an operating system (kernel) to move chunks of memory (pages) to and from external storage devices.” (SysPrfBGrg 2021)
- parallel - “See Section 5.2.5, Concurrency and Parallelism, in Chapter 5, Applications.” (SysPrfBGrg 2021)
- PCIe - “Peripheral Component Interconnect Express: a bus standard commonly used for storage and network controllers.” (SysPrfBGrg 2021)
- PDP - “Programmed Data Processor, a minicomputer series made by Digital Equipment Corporation (DEC).” (SysPrfBGrg 2021)
- PEBS - “Precise event-based sampling (aka processor event-based sampling), an Intel processor technology for use with PMCs to provide more precise recording of CPU state during events.” (SysPrfBGrg 2021)
- performance engineer - “A technical staff member who works primarily on computer performance: planning, evaluations, analysis, and improvements. See Chapter 1, Introduction, Section 1.3, Activities.” (SysPrfBGrg 2021)
- PID - “Process identifier. The operating system unique numeric identifier for a process.” (SysPrfBGrg 2021)
- PMCs - “Performance Monitoring Counters: special hardware registers on the processor that can be programmed to instrument low-level CPU events: cycles, stall cycles, instructions, memory loads/stores, etc.” (SysPrfBGrg 2021)
- POSIX - “Portable Operating System Interface, a family of related standards managed by the IEEE to define a Unix API. This includes a file system interface as used by applications, provided via system calls or system libraries built upon system calls.” (SysPrfBGrg 2021)
- production - “A term used in technology to describe the workload of real customer requests, and the environment that processes it. Many companies also have a ‘test’ environment with synthetic workloads for testing things before production deployment.” (SysPrfBGrg 2021)
- profiling - “A technique to collect data that characterizes the performance of a target. A common profiling technique is timed sampling (see sampling).” (SysPrfBGrg 2021)
- PSI - “Linux pressure stall information, used for identifying performance issues caused by resources.” (SysPrfBGrg 2021)
R
- registers - “Small storage locations on a CPU, used directly from CPU instructions for data processing.” (SysPrfBGrg 2021)
- RFC - “Request For Comments: a public document by the Internet Engineering Task Force (IETF) to share networking standards and best practices. RFCs are used to define networking protocols: RFC 793 defines the TCP protocol.” (SysPrfBGrg 2021)
S
- sampling - “An observability method for understanding a target by taking a subset of measurements: a sample.” (SysPrfBGrg 2021)
- script - “In computing, an executable program, usually short and in a high-level language.” (SysPrfBGrg 2021)
- SCSI - “Small Computer System Interface. An interface standard for storage devices.” (SysPrfBGrg 2021)
- segment - “A message at the transport layer of the OSI networking model (see Section 10.2.3, Protocol Stack).” (SysPrfBGrg 2021)
- SMP - “Symmetric multiprocessing, a multiprocessor architecture where multiple similar CPUs share the same main memory.” (SysPrfBGrg 2021)
- SMT - “Simultaneous multithreading, a processor feature to run multiple threads on cores. See hyperthread.” (SysPrfBGrg 2021)
- socket - “A software abstraction representing a network endpoint for communication.” (SysPrfBGrg 2021)
- Solaris - “A Unix-derived operating system originally developed by Sun Microsystems, it was known for scalability and reliability, and was popular in enterprise environments. Since the acquisition of Sun by Oracle Corporation, it has been renamed Oracle Solaris.” (SysPrfBGrg 2021)
- SONET - “Synchronous optical networking, a physical layer protocol for optical fibers.” (SysPrfBGrg 2021)
- SRE - “Site reliability engineer: a technical staff member focused on infrastructure and reliability. SREs work on performance as part of incident response, under short time constraints.” (SysPrfBGrg 2021)
- SSD - “Solid-state drive, a storage device typically based on flash memory. See Chapter 9, Disks.” (SysPrfBGrg 2021)
- stack - “In the context of observability tools, stack is usually short for ‘stack trace.’” (SysPrfBGrg 2021)
- stack frame - “A data structure containing function state information, including the return address and function arguments.” (SysPrfBGrg 2021)
- stack trace - “A call stack composed of multiple stack frames spanning the code path ancestry. These are often inspected as part of performance analysis, particularly CPU profiling.” (SysPrfBGrg 2021)
- static instrumentation/tracing - “Instrumentation of software with precompiled probe points. See Chapter 4, Observability Tools.” (SysPrfBGrg 2021)
- storage array - “A collection of disks housed in an enclosure, which can then be attached to a system. Storage arrays typically provide various features to improve disk reliability and performance.” (SysPrfBGrg 2021)
- system call - “The interface for processes to request privileged actions from the kernel. See Chapter 3, Operating Systems.” (SysPrfBGrg 2021)
T
- task - “A Linux runnable entity, which may be a process, a thread from a multithreaded process, or a kernel thread. See Chapter 3, Operating Systems.” (SysPrfBGrg 2021)
- TCP - “Transmission Control Protocol. Originally defined in RFC 793. See Chapter 10, Network.” (SysPrfBGrg 2021)
- thread - “A software abstraction for an instance of program execution, which can be scheduled to run on a CPU. The kernel has multiple threads, and a process contains one or more. See Chapter 3, Operating Systems.” (SysPrfBGrg 2021)
- throughput - “For network communication devices, throughput commonly refers to the data transfer rate in either bits per second or bytes per second. Throughput may also refer to I/O completions per second (IOPS) when used with statistical analysis, especially for targets of study.” (SysPrfBGrg 2021)
- TLB - “Translation Lookaside Buffer. A cache for memory translation on virtual memory systems, used by the MMU (see MMU).” (SysPrfBGrg 2021)
- TPU - “Tensor processing unit. An AI accelerator ASIC for machine learning developed by Google, and named after TensorFlow (a software platform for machine learning).” (SysPrfBGrg 2021)
U
- UDP - “User Datagram Protocol. Originally defined in RFC 768. See Chapter 10, Network.” (SysPrfBGrg 2021)
- us - “Microseconds. This should be abbreviated as μs; however, you will often see it as ‘us,’ especially in the output of ASCII-based performance tools. (Note that vmstat(8)’s output, included many times in this book, includes a us column, short for user time.)” (SysPrfBGrg 2021)
- USDT - “User-land Statically Defined Tracing. This involves the placement of static instrumentation in application code by the programmer, at locations to provide useful probes.” (SysPrfBGrg 2021)
- user-land - “This refers to user-level software and files, including executable programs in /usr/bin, /usr/lib, /usr/etc.” (SysPrfBGrg 2021)
- user-level - “The processor privilege mode that user-land execution uses. This is a lower privilege level than the kernel, and one that denies direct access to resources, forcing user-level software to request access to them via the kernel.” (SysPrfBGrg 2021)
V
- vCPU - “A virtual CPU. Modern processors can expose multiple virtual CPUs per core (e.g., Intel Hyper-Threading).” (SysPrfBGrg 2021)
- VFS - “Virtual file system. An abstraction used by the kernel to support different file system types.” (SysPrfBGrg 2021)
- virtual memory - “An abstraction of main memory that supports multitasking and over-subscription.” (SysPrfBGrg 2021)
W
X
Z
Linux/UNIX commands for assessing system performance include:
- uptime - system reliability and load averages
- top - for an overall system view
- vmstat - reports information about runnable or blocked processes, memory, paging, block I/O, traps, and CPU activity
- htop - interactive process viewer
- dstat, atop - help correlate all existing resource data for processes, memory, paging, block I/O, traps, and CPU activity
- iftop - interactive network traffic viewer per interface
- nethogs - interactive network traffic viewer per process
- iotop - interactive I/O viewer
- iostat - for storage I/O statistics
- netstat - for network statistics
- mpstat - for CPU statistics
- tload - load average graph for the terminal
- xload - load average graph for X
- /proc/loadavg - text file containing the load average