Systems Performance Index
Index for Systems Performance, 2nd Edition, by Brendan Gregg (2021).
A
- Accelerators in USE method, 49
- accept system calls, 95
- ACK detection in TCP, 512
- Active benchmarking, 657–660
- Activities overview, 3–4
- Adaptive mutex locks, 198
- Address space, 304
- guests, 603
- kernel, 90
- memory, 304, 310
- processes, 95, 99–102, 319–322
- threads, 227–228
- virtual memory, 104, 305
- monitoring software, 137–138
- product monitoring, 79
- Alerts, 8
- caching, 36
- congestion control, 115, 118, 513–514
- memory, 309
- multithreaded applications, 353
- Amdahl’s Law of Scalability, 64–65
- benchmarking, 644–646, 665–666
- capacity planning, 38, 71–72
- off-CPU, 188–192
- resource, 38–39
- workload, 4–5, 39–40
- Anti-methods
- blame-someone-else, 43
- Application I/O, 369, 435
- Application internals, 213
- Applications, 171
- basics, 172–173
- bpftrace for, 765
- exercises, 216–217
- internals, 213
- latency documentation, 385
- missing stacks, 215–216
- missing symbols, 214
- objectives, 173–174
- observability, 174
- references, 217–218
- distributed tracing, 199
- overview, 186–187
- static performance tuning, 198–199
- USE method, 193
- bpftrace, 209–213
- execsnoop, 207–208
- overview, 199–200
- perf, 200–203
- profile, 203–204
- strace, 205–207
- buffers, 177
- caching, 176
- concurrency and parallelism, 177–181
- non-blocking I/O, 181
- Performance Mantras, 182
- polling, 177
- Applications programming languages, 182–183
- compiled, 183–184
- garbage collection, 184–185
- interpreted, 184–185
- virtual machines, 185
- Appropriateness level in methodologies, 28–29
- scalable, 581–582
- kprobes, 152
- networks, 507
- tracepoints, 148–149
- uprobes, 154
- Arithmetic mean, 74
- cloud computing, 583–584
- Associativity in caches, 234
- Asynchronous disk I/O, 434–435
- Asynchronous interrupts, 96–97
- Asynchronous writes, 366
- cloud computing, 583–584
- Averages, 74–75
- Axes
- flame graphs, 10, 187, 290
- scalability tests, 62
- scatter plots, 81–82, 488
B
- Balloon drivers, 597
- disks, 424
- interconnects, 237
- networks, 500, 508, 532–533
- OS virtualization, 614–615
- Bare-metal hypervisors, 587
- bcache technology, 117
- disks, 450
- documentation, 760–761
- installing, 754
- networks, 526
- one-liners, 757–759
- overview, 753–754
- system-wide tracing, 136
- Benchmark paradox, 648–649
- Benchmarketing, 642
- Benchmarking, 641–642
- analysis, 644–646
- CPUs, 254
- effective, 643–644
- exercises, 668
- failures, 645–651
- industry standards, 654–656
- memory, 328
- questions, 667–668
- reasons, 642–643
- references, 669–670
- replay, 654
- simulation, 653–654
- specials, 650
- types, 13, 651–656
- active, 657–660
- checklist, 666–667
- overview, 656
- passive, 656–657
- ramping load, 662–664
- sanity checks, 664–665
- statistical analysis, 665–666
- USE method, 661
- workload characterization, 662
- Berkeley Packet Filter (BPF), 751–752
- description, 12–13
- extended. See Extended BPF
- iterator, 562
- kernels, 92
- OS virtualization tracing, 620, 624–625, 629
- program, 90
- Berkeley Software Distribution (BSD), 113
- Billing in cloud computing, 584
- Bimodal performance, 76
- CPU, 253, 297–298
- NUMA, 353
- processor, 181–182
- bioerr tool, 487
- BCC, 753–755
- disks, 450, 468–470
- example, 753–754
- biosnoop tool
- BCC, 755
- disks, 470–472
- hardware virtualization, 604–605
- outliers, 471–472
- system-wide tracing, 136
- biotop tool
- BCC, 755
- disks, 450, 473–474
- BCC, 755
- blame command, 120
- Blanco, Brenden, 753
- Blind faith benchmarking, 645
- blkio control group, 610, 617
- action identifiers, 477
- analysis, 478–479
- description, 116
- disks, 475–479
- RWBS description, 477
- visualizations, 479
- Block device interface, 109–110, 447
- Block interleaving, 378
- FFS, 378
- Bonnie and Bonnie++ benchmarking tools
- active benchmarking, 657–660
- file systems, 412–414
- Boolean expressions in bpftrace, 775–776
- Borkmann, Daniel, 121
- capacity planning, 70–71
- complexity, 6
- defined, 22
- USE method, 47–50, 245, 324, 450–451
- application internals, 213
- description, 282
- event sources, 558
- examples, 284, 761–762
- file system internals, 408
- installing, 762
- one-liners for file systems, 402–403, 805–806
- page fault flame graphs, 346
- references, 782
- scheduling internals, 284–285
- system-wide tracing, 136
- tracepoints, 149
- actions, 769
- comments, 767
- documentation, 781
- example, 766
- filters, 769
- flow control, 775–777
- functions, 770–772, 778–781
- Hello, World! program, 770
- operators, 776–777
- program structure, 767
- timing, 772–773
- usage, 766–767
- variables, 770–771, 777–778
- tuning, 571
- brk system calls, 95
- btrfs file system, 381–382, 399
- btt tool, 478
- hash tables, 180
- Buddy allocators, 317
- Buffer caches, 110, 374
- Bufferbloat, 507
- applications, 177
- block devices, 110, 374
- networks, 507
- ring, 522
- TCP, 520, 569
- bufgrow tool, 409
- applications, 172
- Bursting in cloud computing, 584, 614–615
- Buses, memory, 312–313
- tuning, 571
- Bytecode, 185
C
- compiled languages, 183
- symbols, 214
- stacks, 215
- Cache miss rate, 36
- Cache warmth, 222
- applications, 176
- associativity, 234
- block devices, 110, 374
- cache line size, 234
- coherency, 234–235
- CPUs, hardware virtualization, 596
- CPUs, OS virtualization, 615–616
- defined, 23
- dentry, 375
- file systems, flushing, 414
- file systems, OS virtualization, 613
- file systems, overview, 361–363
- file systems, tuning, 389, 414–416
- file systems, types, 373–375
- file systems, usage, 309
- inode, 375
- methodologies, 35–37
- operating systems, 108–109
- page, 315, 374
- RAID, 445
- tuning, 60
- write-back, 365
- file systems, 399, 658–659
- memory, 348
- Canary testing, 3
- Capacity of file systems, 371
- benchmarking for, 642
- cloud computing, 582–584
- defined, 4
- overview, 69
- resource limits, 70–71
- bug database systems, 792–793
- conclusion, 792
- configuration, 786–788
- PMCs, 788–789
- references, 793
- statistics, 784–786
- tracing, 790–792
- Casual benchmarking, 645
- CPU scheduling, 241
- description, 243
- description, 116, 118
- Linux kernel, 116
- memory, 317, 353
- OS virtualization, 606, 608–611, 613–620, 630
- resource management, 111, 298
- statistics, 139, 141, 620–622, 627–628
- cgtop tool, 621
- Cheating in benchmarking, 650–651
- benchmarking, 666
- CPUs, 247, 527
- disks, 453
- file systems, 387
- memory, 325
- chrt command, 295
- Classes, scheduling
- CPUs, 242–243
- I/O, 493
- kernel, 106, 115
- priority, 295
- CPUs, 223, 230
- operating systems, 99
- clone system calls, 94, 100
- Cloud computing, 579–580
- background, 580–581
- capacity planning, 582–584
- comparisons, 634–636
- vs. enterprise, 62
- exercises, 636–637
- lightweight virtualization, 630–633
- overview, 14
- PMCs, 158
- references, 637–639
- scalable architecture, 581–582
- storage, 584–585
- types, 634
- Cloud-native databases, 582
- Co-routines in applications, 178
- caches, 234–235
- models, 63
- Collisions
- hash, 180
- networks, 516
- Colors in flame graphs, 291
- Column quantizations, 82–83
- Community applications, 172–173
- Competition, benchmarking, 649
- optimizations, 183–184
- overview, 183
- CPU optimization, 229
- options, 295
- CPU scheduling, 241
- description, 243
- Complexity, 5
- Comprehension in flame graphs, 249
- btrfs, 382
- disks, 369
- ZFS, 381
- applications, 177–181
- micro-benchmarking, 390, 456
- applications, 172
- case study, 786–788
- Linux kernel, 115
- networks, 508
- TCP, 510, 513
- tuning, 570
- connect system calls, 95
- Connections for networks, 509
- characteristics, 527–528
- firewalls, 517
- latency, 7, 24–25, 505–506, 528
- local, 509
- monitoring, 529
- NICs, 109
- QUIC, 515
- UDP, 514
- lightweight virtualization, 631–632
- observability, 617–630
- OS virtualization, 605–630
- locks, 198
- models, 63
- defined, 90
- kernels, 93
- Control units in CPUs, 230
- caches, 430
- disk, 426
- micro-benchmarking, 457
- network, 501–502, 516
- solid-state drives, 440–441
- tunable, 494–495
- USE method, 49, 451
- btrfs, 382
- ZFS, 380
- Cores
- defined, 220
- Counters, 8–9
- fixed, 133–135
- hardware, 156–158
- btrfs, 382
- ZFS, 380
- BCC, 755
- case study, 790–791
- threads, 278–279
- CPUs, 219–220
- binding, 181–182
- bpftrace for, 763, 803–804
- clock rate, 223
- exercises, 299–300
- experiments, 293–294
- flame graphs. See Flame graphs
- garbage collection, 185
- hardware virtualization, 589–592, 596–597
- instructions, defined, 220
- instructions, IPC, 225
- instructions, pipeline, 224
- instructions, size, 224
- instructions, steps, 223
- instructions, width, 224
- memory caches, 221–222
- models, 221–222
- OS virtualization, 611, 614, 627, 630
- preemption, 227
- references, 300–302
- saturation, 226–227
- schedulers, 105–106
- scheduling classes, 115
- terminology, 220
- USE method, 49–51, 795–797
- utilization, 226
- virtualization support, 588
- visualizations, 288–293
- CPUs architecture, 221, 229
- accelerators, 240–242
- associativity, 234
- caches, 230–235
- GPUs, 240–241
- hardware, 230–241
- idle threads, 244
- interconnects, 235–237
- latency, 233–234
- memory management units, 235
- NUMA grouping, 244
- PMCs, 237–239
- processors, 230
- schedulers, 241–242
- scheduling classes, 242–243
- software, 241–244
- micro-benchmarking, 253–254
- overview, 244–245
- performance monitoring, 251
- profiling, 247–250
- sample processing, 247–248
- static performance tuning, 252
- USE, 245–246
- workload characterization, 246–247
- bpftrace, 282–285
- GPUs, 287
- hardirqs, 282
- miscellaneous, 285–286
- mpstat, 259
- perf, 267–276
- pidstat, 262
- profile, 277–278
- ps, 260–261
- ptime, 263–264
- runqlat, 279–280
- runqlen, 280–281
- sar, 260
- softirqs, 281–282
- time, 263–264
- tlbstat, 266–267
- top, 261–262
- uptime, 255–258
- vmstat, 258
- applications, 187–189
- benchmarking, 660–661
- perf, 200–201
- record, 695–696
- steps, 247–250
- system-wide, 268–270
- overview, 294–295
- exclusive, 298
- CUMASK values in MSRs, 238–239
- CPUs, 251
- memory, 326
D
- Data integrity in magnetic rotational disks, 438
- Data paths in hardware virtualization, 594
- applications, 172
- cloud computing, 582
- OSI model, 502
- UDP, 514
- DAX (Direct Access), 118
- dbstat tool, 756
- dcsnoop tool, 409
- dcstat tool, 409
- dd command
- disks, 490–491
- file systems, 411–412
- Deflated disk I/O, 369
- Defragmentation in XFS, 380
- Degradation in scalability, 31–32
- kernel, 116
- overview, 145
- ext4, 379
- XFS, 380
- memory, 307–308
- Development, benchmarking for, 642
- drivers, 109–110, 522
- hardware virtualization, 588, 594, 597
- Dhrystone benchmark
- CPUs, 254
- simulations, 653
- Direct Access (DAX), 118
- Direct buses, 313
- Direct mapped caches, 234
- Directories in file systems, 107
- caches, 430
- magnetic rotational disks, 439
- tunable, 494–495
- USE method, 451
- Disks, 423–424
- exercises, 495–496
- experiments, 490–493
- IOPS, 432
- read/write ratio, 431
- references, 496–498
- saturation, 434
- terminology, 424
- tunable, 494
- tuning, 493–495
- USE method, 451
- utilization, 433
- visualizations, 487–490
- interfaces, 442–443
- magnetic rotational disks, 435–439
- persistent memory, 441
- solid-state drives, 439–441
- bpftrace for, 764, 806–807
- caching, 430
- errors, 483
- latency, 428–430, 454–455, 467–472, 482–483
- operating system stacks, 446–449
- OS virtualization, 613, 616
- scatter plots, 488
- size, 432, 480–481
- time measurements, 427–429
- wait, 434
- micro-benchmarking, 456–457
- overview, 449–450
- performance monitoring, 452
- scaling, 457–458
- static performance tuning, 455–456
- USE method, 450–451
- workload characterization, 452–454
- controllers, 426
- biolatency, 468–470
- biosnoop, 470–472
- biostacks, 474–475
- biotop, 473–474
- bpftrace, 479–483
- iostat, 459–463
- iotop, 472–473
- MegaCli, 484
- miscellaneous, 487
- overview, 458–459
- perf, 465–468
- pidstat, 464–465
- PSI, 464
- sar, 463–464
- Distributed operating systems, 123–124
- Distributed tracing, 199
- multimodal, 76–77
- normal, 75
- dmesg tool
- CPUs, 245
- description, 15
- memory, 348
- OS virtualization, 619
- Docker, 607, 620–622
- application latency, 385
- BCC, 760–761
- bpftrace, 781
- Ftrace, 748–749
- kprobes, 153
- perf, 276, 703
- PMCs, 158
- sar, 165–166
- tracepoints, 150–151
- uprobes, 155
- USDT, 156
- scheduling, 244
- Xen, 589
- DRAM (dynamic random-access memory), 311
- overview, 55–56
- balloon, 597
- device, 109–110, 522
- parameterized, 593–595
- drsnoop tool
- BCC, 756
- memory, 342
- description, 12
- Solaris kernel, 114
- Duplicate ACK detection, 512
- Duration in RED method, 53
- kprobes, 151
- overview, 12
- Dynamic random-access memory (DRAM), 311
- D[[Trace, 114
- perf, 677–678
- tools, 12
- DynTicks, 116
E
- eBPF. See Extended BPF
- IP, 508–510
- TCP, 513
- tuning, 570
- EFS (Elastic File System), 585
- Elastic File System (EFS), 585
- Elevator seeking in magnetic rotational disks, 437–438
- description, 183
- missing symbols in, 214
- Encapsulation for networks, 504
- Enterprise models, 62
- benchmarking, 647
- processes, 101–102
- Ephemeral drives, 584
- Ephemeral ports, 531
- epoll system call, 115, 118
- Erlang virtual machines, 185
- applications, 193
- benchmarking, 647
- CPUs, 245–246, 796, 798
- disk controllers, 451
- I/O, 483, 798
- kernels, 798
- memory, 324–325, 796, 798
- networks, 526–527, 529, 796–797
- RED method, 53
- storage, 797
- USE method overview, 47–48, 51–53
- Event-based concurrency, 178
- Event-based tools, 133
- Event sources for Wireshark, 559
- disks, 454
- file systems, 388
- Ftrace, 707–708
- kprobes, 719–720
- methodologies, 57–58
- uprobes, 722–723
- case study, 789–790
- CPUs, 273–274
- observability source, 159
- selecting, 274–275
- synthetic, 731–733
- trace, 148
- synchronous interrupts, 97
- user mode, 93
- kernel, 94
- processes, 100
- BCC, 756
- CPUs, 285
- static instrumentation, 11–12
- tracing, 136
- description, 183
- missing symbols in, 214
- execve system call, 11
- CPUs, 293–294
- disks, 490–493
- file systems, 411–414
- networks, 562–567
- overview, 13–14
- scientific method, 45–46
- Experts for applications, 173
- IP, 508–510
- TCP, 513
- tuning, 570
- Exporters for monitoring, 55, 79, 137
- description, 118
- event sources, 558
- kernel bypass, 523
- ext3 file system, 378–379
- features, 379
- tuning, 416–418
- Extended BPF, 12
- BCC, 751–761
- bpftrace, 752–753, 761–781, 803–808
- description, 118
- firewalls, 517
- histograms, 744
- kernel-mode applications, 92
- overview, 121–122
- tracing tools, 166
- Extents, 375–376
- btrfs, 382
- ext4, 380
F
- Failures, benchmarking, 645–651
- False sharing for hash tables, 181
- Fast File System (FFS)
- description, 113
- overview, 377–378
- in synchronous interrupts, 97
- page faults. See page faults
- FC (Fibre Channel) interface, 442–443
- FFS (Fast File System)
- description, 113
- overview, 377–378
- Fibre Channel (FC) interface, 442–443
- Field-programmable gate arrays (FPGAs), 240–241
- bpftrace for, 764, 805–806
- capacity, OS virtualization, 616
- capacity, performance issues, 371
- exercises, 419–420
- experiments, 411–414
- I/O, non-blocking, 366–367
- interfaces, 361
- latency, 362–363
- memory-mapped files, 367
- metadata, 367–368
- micro-benchmark tools, 412–414
- models, 361–362
- OS virtualization, 611–612
- paging, 306
- prefetch, 364–365
- reads, micro-benchmarking for, 61
- references, 420–421
- special, 371
- synchronous writes, 366
- tuning, 414–419
- visualizations, 410–411
- caches, 373–375
- features, 375–377
- VFS, 107, 373
- File system caches, 361–363
- flushing, 414
- hit ratio, 17
- OS virtualization, 616
- tuning, 389
- usage, 309
- write-back, 365
- micro-benchmarking, 390–391
- overview, 383–384
- performance monitoring, 388
- static performance tuning, 389
- workload separation, 389
- bpftrace, 402–408
- fatrace, 395–396
- filetop, 398–399
- free, 392–393
- miscellaneous, 409–410
- mount, 392
- opensnoop, 397
- overview, 391–392
- sar, 393–394
- slabtop, 394–395
- strace, 395
- top, 393
- vmstat, 393
- btrfs, 381–382
- ext3, 378–379
- ext4, 379
- FFS, 377–378
- XFS, 379–380
- ZFS, 380–381
- bpftrace, 769, 776
- event, 693–694
- kprobes, 721–722
- PID, 729–730
- tracepoints, 717–718
- uprobes, 723
- disks, 493
- file systems, 413–414
- Firewalls, 503
- misconfigured, 505
- overview, 517
- tuning, 574
- automated, 201
- characteristics, 290–291
- colors, 291
- generating, 249, 270–272
- interactivity, 291
- interpretation, 291–292
- missing stacks, 215
- overview, 289–290
- page faults, 340–342, 346
- perf, 119
- performance wins, 250
- profiles, 278
- sample processing, 249–250
- scripts, 700
- disks, 493
- file systems, 413–414
- Flow control in bpftrace, 775–777
- fork system calls, 94, 100
- FPGAs (field-programmable gate arrays), 240–241
- FFS, 377
- file systems, 364
- memory, 321
- packets, 505
- reducing, 380
- defined, 500
- networks, 515
- OSI model, 502
- Free memory lists, 315–318
- description, 15
- file systems, 392–393
- memory, 348
- OS virtualization, 619
- jails, 606
- jemalloc, 322
- kernel, 113
- TSA analysis, 217
- TCP LRO, 523
- fsrwstat tool, 409
- Ftrace, 13, 705–706
- capabilities overview, 706–708
- description, 166
- documentation, 748–749
- hist triggers, 727–733
- hwlat, 726
- kprobes, 719–722
- options, 716
- OS virtualization, 629
- perf, 741
- references, 749
- tracepoints, 717–718
- tracing, 136
- uprobes, 722–723
- Fully associative caches, 234
- BCC, 756–758
- example, 747
- Ftrace, 706–707
- BCC, 757
- description, 708
- options, 725
- Ftrace, 707, 711–712
- observability source, 159
- profiling, 248
- futex system calls, 95
G
- Garbage collection, 185–186
- optimizations, 183–184
- PGO kernels, 122
- syscalls, 92
- gprof tool, 135
- Grafana, 8–9, 138
- tools, 287
- heap, 320
- memory, 185, 316, 327
- hardware virtualization, 590–593, 596–605
- lightweight virtualization, 632–633
- OS virtualization, 617, 627–629
H
- memory, 311–315
- networks, 515–517
- threads, 220
- tracing, 276
- CPUs, 273–274
- perf, 680–683
- selecting, 274–275
- Hardware RAID, 444
- comparisons, 634–636
- I/O, 593–595
- implementation, 588–589
- multi-tenant contention, 595
- observability, 597–605
- overhead, 589–595
- overview, 587–588
- Hash tables in applications, 180–181
- hdparm tool, 491–492
- Heads in magnetic rotational disks, 436
- description, 304
- growth, 320
- CPU utilization, 288–289
- disk utilization, 490
- file systems, 410–411
- overview, 82–83
- Hello, World! program, 770
- Hist triggers
- fields, 728–729
- modifiers, 729
- stack trace keys, 730–731
- usage, 727
- Histogram, 76–77
- cloud computing, 581–582
- Hosts
- applications, 172
- cloud computing, 580
- hardware virtualization, 597–603
- OS virtualization, 617, 619–627
- Hue in flame graphs, 291
- Hybrid clouds, 580
- Hyper-V, 589
- cloud computing, 580
- hardware virtualization, 587–588
- kernels, 93
I
- Icicle graphs, 250
- icstat tool, 409
- Idle memory, 315
- Idle scheduling class, 243
- IDLE scheduling policy, 243
- Idle threads, 99, 244
- If statements, 776
- ifpps tool, 561
- iftop tool, 562
- caches, 375
- VFS, 373
- Industry benchmarking, 60–61
- Industry standards for benchmarking, 654–655
- caches, 375
- VFS, 373
- solid-state drive controllers, 440
- hardware virtualization, 593–595, 597
- latency, 424
- merging, 448
- non-blocking, 181, 366–367
- OS virtualization, 611–612, 616–617
- schedulers, 448
- scheduling, 115–116
- size, applications, 176
- stacks, 107–108, 372
- USE method, 798
- bpftrace, 210–212
- perf, 202–203
- BCC, 754
- bpftrace, 762
- description, 14
- types, 580
- defined, 220
- IPC, 225
- pipeline, 224
- size, 224
- steps, 223
- text, 304
- width, 224
- Instructions per cycle (IPC), 225, 251, 326
- Integrated caches, 232
- Interactivity in flame graphs, 291
- buses, 313
- CPUs, 235–237
- USE method, 49–51
- defined, 500
- device drivers, 109–110
- disks, 442–443
- file systems, 361
- kprobes, 153
- network, 109, 501
- network negotiation, 508
- PMCs, 157–158
- scheduling in NAPI, 522
- tracepoints, 149–150
- uprobes, 154–155
- Interleaving in FFS, 378
- congestion avoidance, 508
- overview, 509–510
- sockets, 509
- Interpretation of flame graphs, 291–292
- Interpreted programming languages, 184–185
- asynchronous, 96–97
- defined, 91
- hardware, 282
- masking, 98–99
- network latency, 529
- overview, 96
- soft, 281–282
- synchronous, 97
- threads, 97–98
- IO accounting, 116
- io_uring_enter command, 181
- io_uring interface, 119
- ioctl system calls, 95
- ionice tool, 493–494
- ioping tool, 492
- defined, 22
- description, 7
- disks, 429, 431–432
- networks, 527–529
- iosched tool, 487
- iosnoop tool, 743
- bonnie++ tool, 658
- description, 15
- disks, 450, 459–463
- memory, 348
- options, 460
- OS virtualization, 619, 627
- iotop tool, 450, 472–473
- congestion avoidance, 508
- overview, 509–510
- sockets, 509
- IPC (instructions per cycle), 225, 251, 326
- ipecn tool, 561
- example, 13–14
- network throughput, 564–565
- irqsoff tracer, 708
- Isolation in OS virtualization, 629
J
- analysis, 29
- case study, 783–792
- flame graphs, 201, 271
- garbage collection, 185–186
- Java Flight Recorder, 135
- stack traces, 215
- symbols, 214
- uprobes, 213
- virtual machines, 185
- JBOD (just a bunch of disks), 443
- jemalloc allocator, 322
- Linux kernel, 117
- PGO kernels, 122
- Jitter in operating systems, 99
- jmaps tool, 214
- Journaling
- btrfs, 382
- ext3, 378–379
- file systems, 376
- XFS, 380
- Jumbo frames
- packets, 505
- tuning, 574
- Just a bunch of disks (JBOD), 443
- Linux kernel, 117
- PGO kernels, 122
K
- CPU quotas, 595
- description, 589
- I/O path, 594
- Linux kernel, 116
- observability, 600–603
- Kernel mode, 93
- Kernel space, 90
- CPUs, 226
- bpftrace for, 765
- BSD, 113
- comparisons, 124
- defined, 90
- developments, 115–120
- execution, 92–93
- file systems, 107
- filtering in OS virtualization, 629
- Linux, 114–122, 124
- monolithic, 123
- overview, 91–92
- PGO, 122
- PMU events, 680
- preemption, 110
- schedulers, 105–106
- Solaris, 114
- stacks, 103
- system calls, 94–95
- unikernels, 123
- Unix, 112
- USE method, 798
- user modes, 93–94
- versions, 111–112
- kfunc probes, 774
- killsnoop tool
- BCC, 756
- Knee points
- models, 62–64
- scalability, 31
- kprobes, 685–686
- arguments, 686–687, 720–721
- filters, 721–722
- overview, 151–153
- profiling, 722
- return values, 721
- triggers, 721–722
- kretfunc probes, 774
- kretprobes, 152–153, 774
- ksym function, 779
- node, 608
- OS virtualization, 620–621
L
- Language virtual machines, 185
- analysis methodologies, 56–57
- applications, 173
- biolatency, 468–470
- CPUs, 233–234
- defined, 22
- disk I/O, 428–430, 454–455, 467–472, 482–483
- distributions, 76–77
- file systems, 362–363, 384–386, 388
- hardware, 118
- interrupts, 98
- memory, 311, 441
- methodologies, 24–25
- networks, connections, 7, 24–25, 505–506, 528
- outliers, 58, 186, 424, 471–472
- overview, 6–7
- packets, 532–533
- percentiles, 413–414
- perf, 467–468
- scatter plots, 81–82, 488
- scheduler, 226, 272–273
- solid-state drives, 441
- ticks, 99
- VFS, 406–408
- workload analysis, 39–40
- data, 232
- instructions, 232
- memory, 314
- embedded, 232
- memory, 314
- LLC, 232
- memory, 314
- Level of appropriateness in methodologies, 28–29
- lhist function, 780
- Life cycle for processes, 100–101
- network connections, 507
- solid-state drives, 441
- Lightweight threads, 178
- comparisons, 634–636
- implementation, 631–632
- observability, 632–633
- overhead, 632
- overview, 630
- Limitations of averages, 75
- disks, 487–488
- working with, 80–81
- methodologies, 32
- models, 63
- Link-time optimization (LTO), 122
- extended BPF, 121–122
- kernel developments, 115–120
- KPTI patches, 121
- observability sources, 138–146
- observability tools, 130
- overview, 114–115
- static performance tools, 130–131
- perf, 673
- perf, 674–675
- llcstat tool
- BCC, 756
- CPUs, 285
- schedulers, 241
- Locks
- analysis, 198
- applications, 179–181
- tracing, 212–213
- applications, 172
- ZFS, 381
- defined, 220
- hardware threads, 221
- Logical operations in file systems, 361
- lsof tool, 561
- LTO (link-time optimization), 122
- LTTng tool, 166
M
- madvise system call, 367, 415–416
- Magnetic rotational disks, 435–439
- caching, 37–39
- defined, 90, 304
- latency, 26
- managing, 104–105
- overview, 311–312
- Marketing, benchmarking for, 642
- Masking interrupts, 98–99
- magnetic rotational disks, 436–437
- micro-benchmarking, 457
- MCS locks, 117
- Mean, 74
- MegaCli tool, 484
- Melo, Arnaldo Carvalho de, 671
- Meltdown vulnerability, 121
- meminfo tool, 142
- BCC, 756
- memory, 348
- Memory, 303–304
- allocators, 309, 353
- bpftrace for, 763–764, 804–805
- CPU caches, 221–222
- exercises, 354–355
- garbage collection, 185
- hardware virtualization, 596–597
- internals, 346–347
- NUMA binding, 353
- OS virtualization, 611, 613, 615–616
- overcommit, 308
- overprovisioning in solid-state drives, 441
- paging, 306–307
- persistent, 441
- references, 355–357
- shared, 310
- terminology, 304
- tuning, 350–354
- USE method, 49–51, 796–798
- utilization and saturation, 309
- virtual, 90, 104–105, 304–305
- working set size, 310
- Memory architecture, 311
- buses, 312–313
- CPU caches, 314
- hardware, 311–315
- latency, 311
- main memory, 311–312
- MMU, 314
- software, 315–322
- TLB, 314
- displaying, 337–338
- files, 367
- hardware virtualization, 592–593
- kernel, 94
- OS virtualization, 611
- micro-benchmarking, 328
- overview, 323
- performance monitoring, 326
- static performance tuning, 327–328
- usage characterization, 325–326
- USE method, 324–325
- bpftrace, 343–347
- drsnoop, 342
- miscellaneous, 347–350
- numastat, 334–335
- overview, 328–329
- perf, 338–342
- pmap, 337–338
- ps, 335–336
- PSI, 330–331
- sar, 331–333
- slabtop, 333–334
- swapon, 331
- top, 336–337
- vmstat, 329–330
- wss, 342–343
- Metadata
- ext3, 378
- file systems, 367–368
- Method R, 57
- Methodologies, 21–22
- anti-methods, 42–43
- caching, 35–37
- capacity planning, 69–73
- exercises, 85–86
- general, 40–41
- level of appropriateness, 28–29
- Method R, 57
- metrics, 32–33
- micro-benchmarking, 60–61
- models, 23–24
- monitoring, 77–79
- performance, 41–42
- performance mantras, 61
- perspectives, 37–40
- point-in-time recommendations, 29–30
- profiling, 35
- RED method, 53
- references, 86–87
- saturation, 34–35
- scalability, 31–32
- scientific method, 44–46
- static performance tuning, 59–60
- statistics, 73–77
- terminology, 22–23
- trade-offs, 26–27
- USE method, 47–53
- utilization, 33–34
- workload analysis, 39–40
- Amdahl’s Law of Scalability, 64–65
- visual identification, 62–64
- scatter plots, 81–82
- surface plots, 84–85
- tools, 85
- Metrics, 8–9
- applications, 172
- methodologies, 32–33
- observability tools, 167–168
- USE method, 48–51
- CPUs, 253–254
- description, 13
- disks, 456–457, 491–492
- file systems, 390–391, 412–414
- memory, 328
- methodologies, 60–61
- networks, 533
- overview, 651–652
- cloud computing, 583–584
- USE method, 53
- MINIX operating system, 114
- Missing stacks, 215–216
- Missing symbols, 214
- Mixed-mode flame graphs, 187
- description, 95
- mmapsnoop tool, 348
- mnt control group, 609
- defined, 90
- kernels, 93
- CPUs, 238
- observability source, 159
- Amdahl’s Law of Scalability, 64–65
- CPUs, 221–222
- disks, 425–426
- file systems, 361–362
- methodologies, 23–24
- networks, 501–502
- overview, 62
- visual identification, 62–64
- Monitoring, 77–79
- CPUs, 251
- disks, 452
- file systems, 388
- memory, 326
- networks, 529, 537
- observability tools, 137–138
- products, 79
- sar, 161–162
- file systems, 392
- options, 416–417
- Mounting file systems, 106, 392
- mpstat tool
- case study, 785–786
- CPUs, 245, 259
- description, 15
- OS virtualization, 619
- CPUs, 238
- observability source, 159
- mtr tool, 567
- description, 119
- Multimodal distributions, 76–77
- applications, 177–181
- overview, 110
- Solaris kernel support, 114
- contention in hardware virtualization, 595
- contention in OS virtualization, 612–613
- applications, 177–181
- CPUs, 227–229
- SMT, 225
- applications, 179–180
- contention, 198
- tracing, 212–213
- USE method, 52
- CPU flame graph, 187–188
- memory allocation, 345
- page fault sampling, 339–341
- shards, 582
- stack traces, 215
- working set size, 342
N
- Name resolution latency, 505, 528
- Namespaces in OS virtualization, 606–609, 620, 623–624
- Native hypervisors, 587
- description, 562
- socket information, 142
- Network interface cards (NICs)
- description, 501–502
- network connections, 109
- Networks, 499–500
- bpftrace for, 764–765, 807–808
- buffers, 27, 507
- congestion avoidance, 508
- controllers, 501–502
- encapsulation, 504
- exercises, 574–575
- experiments, 562–567
- interface negotiation, 508
- interfaces, 501
- latency, 505–507
- local connections, 509
- micro-benchmarking for, 61
- models, 501–502
- operating systems, 109
- OS virtualization, 611–613, 617, 630
- protocol stacks, 502
- protocols, 504
- references, 575–578
- routing, 503
- sniffing, 159
- stacks, 518–519
- terminology, 500
- throughput, 527–529
- USE method, 49–51, 796–797
- utilization, 508–509
- hardware, 515–517
- protocols, 509–515
- software, 517–524
- micro-benchmarking, 533
- overview, 524–525
- performance monitoring, 529
- static performance tuning, 531–532
- USE method, 526–527
- workload characterization, 527–528
- bpftrace, 550–558
- ethtool, 546–547
- ifconfig, 537–538
- ip, 536–537
- miscellaneous, 560–562
- nicstat, 545–546
- nstat, 538–539
- overview, 533–534
- sar, 543–545
- ss, 534–536
- tcpdump, 558–559
- tcpretrans, 549–550
- Wireshark, 560
- configuration, 574
- system-wide, 567–572
- BCC, 756
- file systems, 399
- nfsstat tool, 561
- nice command
- CPU priorities, 252
- resource management, 111
- scheduling priorities, 295
- NICs (network interface cards)
- description, 501–502
- network connections, 109
- nicstat tool, 132, 525, 545–546
- Nitro hardware virtualization
- description, 589
- I/O path, 594–595
- NMIs (non-maskable interrupts), 98
- event-based concurrency, 178
- non-blocking I/O, 181
- symbols, 214
- Nodes
- free lists, 317
- main memory, 312
- Noisy neighbors
- OS virtualization, 617
- applications, 181
- file systems, 366–367
- Non-idle time, 34
- Non-maskable interrupts (NMIs), 98
- benchmarking for, 642
- Non-uniform memory access (NUMA)
- CPUs, 244
- main memory, 312
- multiprocessors, 110
- nop tracer, 708
- Normal distribution, 75
- nsenter command, 624
- nstat tool, 134, 525, 538–539
- ntop function, 779
- NUMA. See Non-uniform memory access (NUMA)
- numactl command, 298, 353
- numastat tool, 334–335
O
- O(1) scheduling class, 243
- allocators, 321
- applications, 174
- benchmarks, 643
- hardware virtualization, 597–605
- operating systems, 111
- overview, 7–8
- profiling, 10–11
- RAID, 445
- tracing, 11–12
- Observability tools, 129
- coverage, 130
- crisis, 131–133
- evaluating results, 167–168
- exercises, 168
- monitoring, 137–138
- profiling, 135
- references, 168–169
- sar, 160–166
- static performance, 130–131
- tracing, 136, 166
- types, 133
- delay accounting, 145
- kprobes, 151–153
- miscellaneous, 159–160
- /proc file system, 140–143
- /sys file system, 143–144
- tracepoints, 146–151
- uprobes, 153–155
- USDT, 155–156
- footprints, 188–189
- time flame graphs, 205
- BCC, 756
- description, 285
- networks, 561
- stack traces, 204–205
- time flame graphs, 205
- On-die caches, 231
- Online defragmentation, 380
- oomkill tool
- BCC, 756
- description, 348
- description, 94
- non-blocking I/O, 181
- BCC, 756
- file systems, 397
- Operating systems, 89
- additional reading, 127–128
- caching, 108–109
- clocks and idle, 99
- defined, 90
- device drivers, 109–110
- distributed, 123–124
- exercises, 124–125
- file systems, 106–108
- interrupts, 96–99
- jitter, 99
- kernels, 91–95, 111–114, 124
- multiprocessors, 110
- networking, 109
- observability, 111
- PGO kernels, 122
- preemption, 110
- processes, 99–102
- references, 125–127
- resource management, 110–111
- schedulers, 105–106
- stacks, 102–103
- system calls, 94–95
- terminology, 90–91
- unikernels, 123
- virtual memory, 104–105
- defined, 22
- file systems, 387–388
- applications, 172
- file systems, 370–371
- applications, 174
- compiler, 183–184, 229
- networks, 524
- OS virtualization
- comparisons, 634–636
- control groups, 609–610
- implementation, 607–610
- namespaces, 606–609
- overhead, 610–613
- overview, 605–607
- containers, 620–621
- guests, 627–629
- hosts, 619–627
- namespaces, 623–624
- overview, 617–618
- strategy, 629–630
- tracing tools, 629
- OSI model, 502
- Outliers
- latency, 186, 424, 471–472
- normal distributions, 77
- Overcommit strategy, 115
- Overcommitted main memory, 305, 308
- Overhead
- PMCs, 157–158
- hardware virtualization, 589–595
- kprobes, 153
- metrics, 33
- OS virtualization, 610–613
- strace, 207
- ticks, 99
- tracepoints, 150
- uprobes, 154–155
- Overprovisioning cloud computing, 583
- Oversize arenas, 322
P
- Pacing in networks, 524
- Packets
- defined, 500
- latency, 532–533
- networks, 504
- OSI model, 502
- size, 504–505
- sniffing, 530–531
- throttling, 522
- Page cache
- file systems, 374
- memory, 315
- defined, 304
- flame graphs, 340–342, 346
- sampling, 339–340
- daemons, 317
- working with, 306
- Paged virtual memory, 113
- Pages
- defined, 304
- kernel, 115
- sizes, 352–353
- Paging
- anonymous, 305–307
- demand, 307–308
- file system, 306
- memory, 104–105
- overview, 306
- PAPI (performance application programming interface), 158
- Parallelism in applications, 177–181
- Paravirtualization (PV), 588, 590
- Parity in RAID, 445
- Passive benchmarking, 656–657
- Pathologies in solid-state drives, 441
- Patrol reads in RAID, 445
- pchar tool, 564
- Per-process observability tools
- /proc file system, 140–141
- profiling, 135
- tracing, 136
- Percentiles
- description, 75
- latency, 413–414
- perf tool
- case study, 789–790
- CPU flame graphs, 201
- description, 116
- disk block devices, 465–467
- disk I/O, 450, 467–468
- documentation, 276
- flame graphs, 119, 270–272
- hardware virtualization, 601–602, 604
- memory, 324
- networks, 526, 562
- one-liners for dynamic tracing, 677–678
- OS virtualization, 619, 629
- overview, 671–672
- page fault flame graphs, 340–342
- page fault sampling, 339–340
- PMCs, 157, 273–274
- tracepoint events, 684–685
- tracepoints, 147, 149
- tracing, 136, 166
- events
- hardware, 274–275, 680–683
- kprobes, 685–687
- overview, 679–681
- software, 683–684
- uprobes, 687–689
- subcommands
- documentation, 703
- ftrace, 741
- miscellaneous, 702–703
- overview, 672–674
- record, 694–696
- report, 696–698
- script, 698–701
- stat, 691–694
- trace, 701–702
- perf-tools
- coverage, 742
- documentation, 748
- example, 747
- one-liners, 745–747
- overview, 741–742
- applications, 172
- challenges, 5–6
- cloud computing, 14, 586
- CPUs, 251
- disks, 452
- file systems, 388
- memory, 326
- networks, 529
- OS virtualization, 620
- Performance application programming interface (PAPI), 158
- Performance engineers, 2–3
- Performance mantras
- applications, 182
- list of, 61
- Performance monitoring counters (PMCs)
- case study, 788–789
- challenges, 158
- CPUs, 237–239, 273–274
- documentation, 158
- example, 156–157
- interface, 157–158
- memory, 326
- Periods in OS virtualization, 615
- Persistent memory, 441
- Perspectives
- overview, 4–5
- performance analysis, 37–38
- workload analysis, 39–40
- Perturbations
- benchmarks, 648
- system tests, 23
- pfm-events, 681
- pids control group, 610
- filters, 729–730
- process environment, 101
- pidstat tool
- CPUs, 245, 262
- description, 15
- disks, 464–465
- OS virtualization, 619
- pktgen tool, 567
- Platters in magnetic rotational disks, 435–436
- Plugins for monitoring software, 137
- pmap tool, 135, 337–338
- pmcarch tool
- CPUs, 265–266
- memory, 348
- pmheld tool, 212–213
- poll system call, 177
- Polling applications, 177
- Pooled storage
- btrfs, 382
- overview, 382–383
- ZFS, 380
- Portability of benchmarks, 643
- Ports
- ephemeral, 531
- network, 501
- Preemption
- CPUs, 227
- Linux kernel, 116
- operating systems, 110
- schedulers, 241
- Solaris kernel, 114
- Prefetch caches, 230
- Prefetch for file systems
- overview, 364–365
- ZFS, 381
- Pressure stall information (PSI)
- CPUs, 257–258
- description, 119
- disks, 464
- memory, 323, 330–331
- Priorities
- applications, 173
- benchmarking for, 643
- CPUs, 227, 252–253
- OS virtualization resources, 613
- schedulers, 105–106
- scheduling classes, 242–243, 295
- Private clouds, 580
- Probes
- bpftrace, 767–768, 774–775
- kprobes, 685–687
- perf, 685
- uprobes, 687–689
- USDT, 690–691
- wildcards, 768–769
- Problem statement
- case study, 16, 783–784
- determining, 44
- filters, 729–730
- process environment, 101
- Processes
- accounting, 159
- creating, 100
- defined, 90
- environment, 101–102
- life cycle, 100–101
- overview, 99–100
- profiling, 271–272
- schedulers, 105–106
- swapping, 104–105, 308–309
- tracing, 207–208
- USE method, 52
- virtual address space, 319–322
- Processors
- binding, 181–182
- defined, 90, 220
- tuning, 299
- Products, monitoring, 79
- applications, 203–204
- BCC, 756
- CPUs, 245, 277–278
- profiling, 135
- Ftrace, 707
- I/O, 203–204, 210–212
- interpretation, 249–250
- kprobes, 722
- methodologies, 35
- observability tools, 135
- overview, 10–11
- perf, 675–676
- uprobes, 723
- Programming languages
- compiled, 183–184
- garbage collection, 185–186
- interpreted, 184–185
- overview, 182–183
- virtual machines, 185
- Prometheus monitoring software, 138
- Proof of concept
- benchmarking for, 642
- testing, 3
- Protocols
- HTTP/3, 515
- IP, 509–510
- networks, 502, 504, 509–515
- QUIC, 515
- TCP, 510–514
- UDP, 514
- ps tool
- CPUs, 260–261
- memory, 335–336
- OS virtualization, 619
- Public clouds, 580
- PV (paravirtualization), 588, 590
Q
- Qspinlocks, 117–118
- Quantifying issues, 6
- Quantifying performance gains, 73–74
- networks, 521
- OS virtualization, 617
- tuning, 571
- interrupts, 98
- overview, 23–24
- TCP connections, 519–520
- QUIC protocol, 515
- Quotas in OS virtualization, 615
R
- Ramping load benchmarking, 662–664
- Random I/O
- disks, 430–431, 436
- Raw tracepoints, 150
- read system calls
- description, 94
- tracing, 404–405
- Reaping memory, 316, 318
- record subcommand, perf
- example, 672
- options, 695
- overview, 694–695
- stack walking, 696
- RED method, 53
- Replay benchmarking, 654
- report subcommand, perf
- example, 672
- overview, 696–697
- STDIO, 697–698
- TUI interface, 697
- perf, 678–679
- sar, 163, 165
- Requests in workload analysis, 39
- Resource controls
- CPUs, 253, 298
- disks, 456, 494
- hardware virtualization, 595–597
- memory, 328, 353–354
- networks, 532–533
- operating systems, 110–111
- OS virtualization, 613–617, 626–627
- tuning, 571
- USE method, 52
- Resource limits in capacity planning, 70–71
- Resources in USE method, 47
- Response time
- defined, 22
- disks, 452
- latency, 24
- Retransmits
- latency, 528
- TCP, 510, 512, 529
- UDP, 514
- Retrospectives, 4
- kprobes, 721
- kretprobes, 152
- uretprobes, 154
- uprobes, 723
- Ring buffers
- applications, 177
- networks, 522
- Roles, 2–3
- Rostedt, Steven, 705, 711, 734, 739–740
- Route tables, 537
- Routers, 516–517
- RT scheduling class, 242–243
- Run queues
- CPUs, 222
- defined, 220
- latency, 222
- schedulers, 105, 241
- runqlat tool
- CPUs, 279–280
- description, 756
- runqlen tool
- CPUs, 280–281
- description, 756
- runqslower tool
- CPUs, 285
- description, 756
S
- SACKs (selective acknowledgments), 510
- Sampling
- distributed tracing, 199
- page faults, 339–340
- PMCs, 157–158
- Sanity checks in benchmarking, 664–665
- sar (system activity reporter)
- configuration, 162
- coverage, 161
- CPUs, 260
- description, 15
- disks, 463–464
- documentation, 165–166
- file systems, 393–394
- memory, 331–333
- monitoring, 137, 161–165
- networks, 543–545
- options, 801–802
- OS virtualization, 619
- overview, 160
- reporting, 163
- Saturation
- applications, 193
- CPUs, 226–227, 245–246, 251, 795, 797
- defined, 22
- disk controllers, 451
- flame graphs, 291
- I/O, 798
- kernels, 798
- memory, 309, 324–326, 796–797
- methodologies, 34–35
- networks, 526–527, 796–797
- storage, 797
- USE method, 47–48, 51–53
- Scalability and scaling
- Amdahl’s Law of Scalability, 64–65
- capacity planning, 72–73
- cloud computing, 581–584
- CPU, 522–523
- disks, 457–458
- methodologies, 31–32
- models, 63–64
- multithreading, 227
- Scalability ceiling, 64
- Scatter plots
- disk I/O, 81–82
- sched command, 141
- schedstat tool, 141–142
- Scheduler latency
- CPUs, 226, 272–273
- delay accounting, 145
- Schedulers
- CPUs, 241–242
- defined, 220
- hardware virtualization, 596–597
- kernel, 105–106
- options, 295–296
- scheduling internals, 284–285
- Scheduling classes
- CPUs, 115, 242–243
- I/O, 115, 493
- kernel, 106
- priority, 295
- Scientific method, 44–46
- script subcommand, perf
- flame graphs, 700
- overview, 698–700
- Scrubbing file systems, 376
- disks, 442
- SDT events, 681
- Sectors
- defined, 424
- size, 437
- zoning, 437
- SEDA (staged event-driven architecture), 178
- Segments
- defined, 304
- OSI model, 502
- Selective acknowledgments (SACKs), 510
- Semaphores for applications, 179
- Sequential I/O
- disks, 430–431, 436
- Server instances in cloud computing, 580
- Service time
- defined, 22
- I/O, 427–429
- cloud computing, 582
- Shared memory, 310
- Shares in OS virtualization, 614–615, 626
- shmsnoop tool, 348
- Short-stroking in magnetic rotational disks, 437
- Simple Network Management Protocol (SNMP), 55, 137
- Simulation benchmarking, 653–654
- Simultaneous multithreading (SMT), 220, 225
- Site reliability engineers (SREs), 4
- Sizes
- cloud computing, 583–584
- disk I/O, 432, 480–481
- free lists, 317
- instruction, 224
- packets, 504–505
- virtual memory, 308
- word, 229, 310
- Slab
- allocator, 114
- slabinfo tool, 142
- slabtop tool, 333–334, 394–395
- Sloth disks, 438
- disks, 442
- SMP (symmetric multiprocessing), 110
- SMs (streaming multiprocessors), 240
- SMT (simultaneous multithreading), 220, 225
- Snapshots
- btrfs, 382
- ZFS, 381
- SNMP (Simple Network Management Protocol), 55, 137
- Sockets
- BSD, 113
- defined, 500
- description, 109
- local connections, 509
- options, 573
- statistics, 534–536
- tracing, 552–555
- tuning, 569
- soconnlat tool, 561
- Software
- memory, 315–322
- networks, 517–524
- Software events
- case study, 789–790
- observability source, 159
- perf, 680, 683–684
- USE method, 52, 798–799
- Solaris
- kernel, 114
- Kstat, 160
- Slab allocator, 322, 652
- zones, 606, 620
- Solid-state drives (SSDs)
- overview, 439–441
- sormem tool, 561
- Source code for applications, 172
- Special file systems, 371
- Spin locks
- applications, 179
- contention, 198
- queued, 118
- SREs (site reliability engineers), 4
- ss tool, 145–146, 525, 534–536
- SSDs (solid-state drives)
- overview, 439–441
- Stack traces
- description, 102
- displaying, 204–205
- keys, 730–731
- Stack walking, 102, 696
- Stacks
- I/O, 107–108, 372
- missing, 215–216
- network, 109, 518–519
- operating system disk I/O, 446–449
- overview, 102
- protocol, 502
- reading, 102–103
- Staged event-driven architecture (SEDA), 178
- Starovoitov, Alexei, 121
- stat subcommand, perf
- description, 635
- options, 692–693
- overview, 691–692
- TCP, 511–512
- Static instrumentation
- overview, 11–12
- tracepoints, 146, 717
- Static performance tuning
- applications methodology, 198–199
- CPUs, 252
- disks, 455–456
- file systems, 389
- memory, 327–328
- methodologies, 59–60
- networks, 531–532
- tools, 130–131
- Statistics, 8–9
- averages, 74–75
- baseline, 59
- case study, 784–786
- multimodal distributions, 76–77
- outliers, 77
- quantifying performance gains, 73–74
- statm tool, 141
- statsnoop tool, 409
- cloud computing, 584–585
- sample processing, 248–249
- USE method, 49–51, 796–797
- str function, 770, 778
- strace tool
- bonnie++ tool, 660
- file system latency, 395
- limitations, 202
- networks, 561
- overhead, 207
- system call tracing, 205–207
- tracing, 136
- Streaming multiprocessors (SMs), 240
- Striped allocation in XFS, 380
- Stripes in RAID, 444–445
- strncmp function, 778
- Subjectivity, 5
- Surface plots, 84–85
- disks, 487
- memory, 331
- Swapping
- defined, 304
- memory, 316, 323
- overview, 305–307
- processes, 104–105, 308–309
- delay accounting, 145
- Symbol churn, 214
- Symbols, missing, 214
- Symmetric multiprocessing (SMP), 110
- Synchronization primitives for applications, 179
- Synchronous disk I/O, 434–435
- Synchronous interrupts, 97
- Synchronous writes, 366
- syncsnoop tool
- BCC, 756
- file systems, 409
- /sys file system, 143–144
- syscount tool
- BCC, 756
- CPUs, 285
- file systems, 409
- system calls count, 208–209
- sysctl tool
- congestion control, 570
- schedulers, 296
- System calls
- analysis, 192
- counting, 208–209
- defined, 90
- file system latency, 385
- kernel, 92, 94–95
- micro-benchmarking for, 61
- observability source, 159
- System design, benchmarking for, 642
- System-wide observability tools
- /proc file system, 141–142
- profiling, 135
- tracing, 136
- System-wide tunables
- ECN, 570
- networks, 567–572
- production example, 568
- Systems performance
- activities, 3–4
- cloud computing, 14
- complexity, 5
- experiments, 13–14
- latency, 6–7
- methodologies, 15–16
- observability, 7–13
- performance challenges, 5–6
- perspectives, 4–5
- references, 19–20
- roles, 2–3
T
- Tasks
- defined, 90
- idle, 99
- tc tool, 566
- TCMalloc allocator, 322
- BSD, 113
- kernels, 109
- protocol, 502
- tcpdump tool
- BPF for, 12
- description, 526
- overview, 558–559
- tcplife tool
- BCC, 756
- description, 525
- overview, 548
- BCC, 756
- overview, 549–550
- BCC, 756
- description, 526
- Tenancy in cloud computing, 580
- contention in hardware virtualization, 595
- contention in OS virtualization, 612–613
- Tensor processing units (TPUs), 241
- Text user interface (TUI), 697
- THP (transparent huge pages)
- Linux kernel, 116
- memory, 353
- Thread state analysis
- Linux, 195–197
- states, 194–195
- Threads
- applications, 177–181
- CPUs, 227–229
- defined, 90
- flusher, 374
- hardware, 221
- idle, 99, 244
- interrupts, 97–98
- lightweight, 178
- micro-benchmarking, 653
- processes, 100
- schedulers, 105–106
- SMT, 225
- USE method, 52
- Throttling
- benchmarks, 661
- OS virtualization, 626
- packets, 522
- Throughput
- applications, 173
- defined, 22
- disks, 424
- magnetic rotational disks, 436–437
- networks, monitoring, 529
- solid-state drives, 441
- Time
- averages over, 74
- disk measurements, 427–429
- disks, 429–430
- methodologies, 25–26
- Time sharing for schedulers, 241
- Timerless multitasking, 117
- Timestamps
- file systems, 371
- TCP, 511
- tiptop tool, 348
- TLBs. See Translation lookaside buffers (TLBs)
- tlbstat tool
- CPUs, 266–267
- memory, 348
- TLS (transport layer security), 113
- Tools method
- CPUs, 245
- disks, 450
- memory, 323–324
- networks, 525
- overview, 46
- Top-level directories, 107
- top tool
- CPUs, 245, 261–262
- description, 15
- file systems, 393
- lightweight virtualization, 632–633
- memory, 324, 336–337
- OS virtualization, 619, 624
- TPC-A benchmark, 650–651
- TPUs (tensor processing units), 241
- trace-cmd tool
- documentation, 740
- one-liners, 736–737
- overview, 734
- trace file
- contents, 709–711
- overview, 708–709
- tracepoint probes, 774
- Tracepoints
- description, 11
- documentation, 150–151
- example, 147–148
- filters, 717–718
- interface, 149–150
- Linux kernel, 116
- overhead, 150
- overview, 146
- triggers, 718
- tracepoints tracer, 707
- traceroute tool, 563–564
- Tracing
- BPF, 12–13
- case study, 790–792
- distributed, 199
- locks, 212–213
- observability tools, 136
- OS virtualization, 620, 624–625, 629
- perf, 676–678
- schedulers, 189–190
- sockets, 552–555
- software, 275–276
- static instrumentation, 11–12
- strace, 136, 205–207
- tools, 166
- virtual file system, 405–406
- Trade-offs in methodologies, 26–27
- Translation lookaside buffers (TLBs)
- CPUs, 232
- flushing, 121
- memory, 314–315
- MMU, 235
- Transmission Control Protocol (TCP)
- analysis, 531
- anti-bufferbloat, 117
- autocorking, 117
- buffers, 520, 569
- congestion algorithms, 115
- congestion avoidance, 508
- congestion control, 118, 513, 570
- connection latency, 24, 506, 528
- connection queues, 519–520
- connection rate, 527–529
- duplicate ACK detection, 512
- features, 510–511
- friends, 509
- New Vegas, 118
- SACK, FACK, and RACK, 514
- transfer time, 24–25
- Transparent huge pages
- Linux kernel, 116
- memory, 353
- Transport layer security (TLS), 113
- Traps
- defined, 90
- synchronous interrupts, 97
- Triggers
- hist. See Hist triggers
- kprobes, 721–722
- tracepoints, 718
- uprobes, 723
- Troubleshooting, benchmarking for, 642
- TUI (text user interface), 697
- Tunable parameters
- disks, 494
- memory, 350–351
- networks, 567
- operating systems, 493–495
- point-in-time recommendations, 29–30
- tradeoffs with, 27
- Tuning
- benchmarking for, 642
- caches, 60
- disks, 493–495
- file system caches, 389
- file systems, 414–419
- memory, 350–354
- methodologies, 27–28
- networks, 567–574
- targets, 27–28
- Type 1 hypervisors, 587
- Type 2 hypervisors, 587
U
- sar configuration, 162
- UMASK values in MSRs, 238–239
- UNICS (UNiplexed Information and Computing Service), 112
- Unified buffer caches, 374
- Unikernels, 92, 123, 634
- UNiplexed Information and Computing Service (UNICS), 112
- Unix kernels, 112
- uprobes, 687–688
- arguments, 154, 688–689, 723
- bpftrace, 774
- documentation, 155
- example, 154
- filters, 723
- Ftrace, 708
- Linux kernel, 117
- overview, 153
- profiling, 723
- return values, 723
- triggers, 723
- uptime tool
- case study, 784–785
- CPUs, 245
- description, 15
- OS virtualization, 619
- PSI, 257–258
- uretprobes, 154
- USDT (user-level statically defined tracing)
- perf, 681
- probes, 690–691
- User land, 90
- User-level statically defined tracing (USDT)
- perf, 681
- probes, 690–691
- User space, defined, 90
- usym function, 779
- Utilization
- applications, 173, 193
- CPUs, 226, 245–246, 251, 795, 797
- defined, 22
- disk controllers, 451
- disks, 433, 452
- I/O, 798
- kernels, 798
- memory, 309, 324–326, 796–797
- methodologies, 33–34
- networks, 508–509, 526–527, 796–797
- storage, 796–797
- USE method, 47–48, 51–53
- USE method
- applications, 193
- benchmarking, 661
- CPUs, 245–246
- disks, 450–451
- memory, 324–325
- metrics, 48–51
- microservices, 53
- networks, 526–527
- overview, 47
- procedure, 47–48
- references, 799
V
- valgrind tool
- memory, 348
- Variance
- benchmarks, 647
- description, 75
- vCPUs (virtual CPUs), 595
- applications, 172
- kernel, 111–112
- Vertical scaling in cloud computing, 581
- vfsstat tool, 409
- Vibration in magnetic rotational disks, 438
- Virtual CPUs (vCPUs), 595
- Virtual disks
- defined, 424
- utilization, 433
- Virtual file system (VFS)
- description, 107
- interface, 373
- latency, 406–408
- Solaris kernel, 114
- tracing, 405–406
- Virtual machine managers (VMMs)
- cloud computing, 580
- hardware virtualization, 587–605
- Virtual machines (VMs)
- cloud computing, 580
- hardware virtualization, 587–605
- programming languages, 185
- Virtual memory
- defined, 90, 304
- managing, 104–105
- overview, 305
- size, 308
- Virtualization
- OS. See OS virtualization
- Visualizations, 79
- CPUs, 288–293
- disks, 487–490
- file systems, 410–411
- flame graphs. See Flame graphs
- scatter plots, 81–82
- surface plots, 84–85
- tools, 85
- VMMs (virtual machine managers)
- cloud computing, 580
- hardware virtualization, 587–588
- VMs (virtual machines)
- cloud computing, 580
- hardware virtualization, 587–588
- programming languages, 185
- vmstat tool
- CPUs, 245, 258
- description, 15
- disks, 487
- file systems, 393
- memory, 323, 329–330
- OS virtualization, 619
- VMware ESX, 589
- Volumes in file systems, 382–383
W
- Wait time
- disks, 434
- I/O, 427
- wakeup tracer, 708
- wakeup_rt tracer, 708
- Warm caches, 37
- Warmth of caches, 37
- Wear leveling in solid-state drives, 441
- Whetstone benchmark, 254, 653
- Widths
- flame graphs, 290–291
- instruction, 224
- Windows
- DiskMon, 493
- fibers, 178
- Hyper-V, 589
- LTO and PGO, 122
- ProcMon, 207
- TIME_WAIT, 512
- Word size
- CPUs, 229
- memory, 310
- Working set size (WSS)
- benchmarking, 664
- memory, 310, 328, 342–343
- micro-benchmarking, 390–391, 653
- Workload analysis perspectives, 4–5, 39–40
- Workload characterization
- benchmarking, 662
- CPUs, 246–247
- disks, 452–454
- file systems, 386–388
- methodologies, 54
- networks, 527–528
- Workload separation in file systems, 389
- Write amplification in solid-state drives, 440
- Write-back caching
- file systems, 365
- on-disk, 425
- virtual disks, 433
- write system calls, 94
- Write-through caches, 425
- wss tool, 342–343
X
- XDP (eXpress Data Path)
- description, 118
- event sources, 558
- kernel bypass, 523
- Xen
- CPU usage, 595
- description, 589
- I/O path, 594
- network performance, 597
- observability, 599
- XFS file system, 379–380
- xfsdist tool
- BCC, 756
- file systems, 399
Y
- Yearly patterns, monitoring, 79
Z
- zero function, 780
- ZFS
- features, 380–381
- options, 418–419
- pool statistics, 410
- Solaris kernel, 114
- zfsdist tool
- BCC, 757
- file systems, 399
- zfsslower tool, 757
- ZIO pipeline in ZFS, 381
- zoneinfo tool, 142
- Zones
- free lists, 317
- magnetic rotational disks, 437
- OS virtualization, 606, 620
- Solaris kernel, 114
- zpool tool, 410
Fair Use Sources
Performance: Systems performance, Systems performance bibliography, Systems Performance Outline: (Systems Performance Introduction, Systems Performance Methodologies, Systems Performance Operating Systems, Systems Performance Observability Tools, Systems Performance Applications, Systems Performance CPUs, Systems Performance Memory, Systems Performance File Systems, Systems Performance Disks, Systems Performance Network, Systems Performance Cloud Computing, Systems Performance Benchmarking, Systems Performance perf, Systems Performance Ftrace, Systems Performance BPF, Systems Performance Case Study), Accuracy, Algorithmic efficiency (Big O notation), Algorithm performance, Amdahl's Law, Android performance, Application performance engineering, Async programming, Bandwidth, Bandwidth utilization, bcc, Benchmark (SPECint and SPECfp), BPF, bpftrace, Performance bottleneck (“Hotspots”), Browser performance, C performance, C Plus Plus performance | C++ performance, C Sharp performance | performance, Cache hit, Cache performance, Capacity planning, Channel capacity, Clock rate, Clojure performance, Compiler performance (Just-in-time (JIT) compilation - Ahead-of-time compilation (AOT), Compile-time, Optimizing compiler), Compression ratio, Computer performance, Concurrency, Concurrent programming, Concurrent testing, Container performance, CPU cache, CPU cooling, CPU cycle, CPU overclocking (CPU boosting, CPU multiplier), CPU performance, CPU speed, CPU throttling (Dynamic frequency scaling - Dynamic voltage scaling - Automatic underclocking), CPU time, CPU load - CPU usage - CPU utilization, Cycles per second (Hz), CUDA (Nvidia), Data transmission time, Database performance (ACID-CAP theorem, Database sharding, Cassandra performance, Kafka performance, IBM Db2 performance, MongoDB performance, MySQL performance, Oracle Database performance, PostgreSQL performance, Spark performance, SQL Server performance), Disk I/O, Disk latency, Disk performance, Disk speed, Disk usage 
- Disk utilization, Distributed computing performance (Fallacies of distributed computing), DNS performance, Efficiency - Relative efficiency, Encryption performance, Energy efficiency, Environmental impact, Fast, Filesystem performance, Fortran performance, FPGA, Gbps, Global Interpreter Lock - GIL, Golang performance, GPU - GPGPU, GPU performance, Hardware performance, Hardware performance testing, Hardware stress test, Haskell performance, High availability (HA), Hit ratio, IOPS - I/O operations per second, IPC - Instructions per cycle, IPS - Instructions per second, Java performance (Java data structure performance - Java ArrayList is ALWAYS faster than LinkedList, Apache JMeter), JavaScript performance (V8 JavaScript engine performance, Node.js performance - Deno performance), JVM performance (GraalVM, HotSpot), Kubernetes performance, Kotlin performance, Lag (video games) (Frame rate - Frames per second (FPS)), Lagometer, Latency, Lazy evaluation, Linux performance, Load balancing, Load testing, Logging, macOS performance, Mainframe performance, Mbps, Memory footprint, Memory speed, Memory performance, Memory usage - Memory utilization, Micro-benchmark, Microsecond, Monitoring
Linux/UNIX commands for assessing system performance include:
- uptime for system uptime and load averages
- top for an overall system view
- vmstat for statistics on runnable and blocked processes, memory, paging, block I/O, traps, and CPU activity
- htop interactive process viewer
- dstat and atop to correlate resource data for processes, memory, paging, block I/O, traps, and CPU activity
- iftop interactive network traffic viewer per interface
- nethogs interactive network traffic viewer per process
- iotop interactive I/O viewer
- iostat for storage I/O statistics
- netstat for network statistics
- mpstat for CPU statistics
- tload load average graph for terminal
- xload load average graph for X
- /proc/loadavg text file containing load averages
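Since `/proc/loadavg` is a plain text file, the load averages can be read without any tool at all. A minimal sketch of parsing its five fields, assuming a Linux `/proc`; the fallback sample line is hypothetical, so the example also runs on machines without one:

```python
# Parse Linux /proc/loadavg: three load averages (1, 5, and 15 minutes),
# a runnable/total task count pair, and the most recently created PID.
from pathlib import Path

def parse_loadavg(line):
    fields = line.split()
    runnable, total = fields[3].split("/")
    return {
        "1min": float(fields[0]),
        "5min": float(fields[1]),
        "15min": float(fields[2]),
        "runnable": int(runnable),
        "total": int(total),
        "last_pid": int(fields[4]),
    }

path = Path("/proc/loadavg")
# Hypothetical sample line used where /proc/loadavg is unavailable (non-Linux).
sample = path.read_text() if path.exists() else "0.52 0.58 0.59 1/257 31415\n"
print(parse_loadavg(sample))
```

This is essentially what uptime and tload do before formatting their output.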
(Event monitoring - Event log analysis, Google Cloud's operations suite (formerly Stackdriver), htop, mpstat, macOS Activity Monitor, Nagios Core, Network monitoring, netstat-iproute2, proc filesystem (procfs)]] - ps (Unix), System monitor, sar (Unix) - systat (BSD), top - top (table of processes), vmstat), Moore’s law, Multicore - Multi-core processor, Multiprocessor, Multithreading, mutex, Network capacity, Network congestion, Network I/O, Network latency (Network delay, End-to-end delay, packet loss, ping - ping (networking utility) (Packet InterNet Groper) - traceroute - netsniff-ng, Round-trip delay (RTD) - Round-trip time (RTT)), Network performance, Network switch performance, Network usage - Network utilization, NIC performance, NVMe, NVMe performance, Observability, Operating system performance, Optimization (Donald Knuth: “Premature optimization is the root of all evil), Parallel processing, Parallel programming (Embarrassingly parallel), Perceived performance, Performance analysis (Profiling), Performance design, Performance engineer, Performance equation, Performance evaluation, Performance gains, Performance Mantras, Performance measurement (Quantifying performance, Performance metrics), Perfmon, Performance testing, Performance tuning, PowerShell performance, Power consumption - Performance per watt, Processing power, Processing speed, Productivity, Python performance (CPython performance, PyPy performance - PyPy JIT), Quality of service (QOS) performance, Refactoring, Reliability, Response time, Resource usage - Resource utilization, Router performance (Processing delay - Queuing delay), Ruby performance, Rust performance, Scala performance, Scalability, Scalability test, Server performance, Size and weight, Slow, Software performance, Software performance testing, Speed, Stress testing, SSD, SSD performance, Swift performance, Supercomputing, Tbps, Throughput, Time (Time units, Nanosecond, Millisecond, Frequency (rate), Startup time delay - Warm-up 
time, Execution time), TPU - Tensor processing unit, Tracing, Transistor count, TypeScript performance, Virtual memory performance (Thrashing), Volume testing, WebAssembly, Web framework performance, Web performance, Windows performance (Windows Performance Monitor). (navbar_performance)