Profiling with pprof

The profiling workflow

Never optimise without data. The Go toolchain ships with pprof, a profiling tool that identifies exactly which functions consume the most CPU time or allocate the most memory.

The workflow is always:

  1. Collect a profile (CPU or memory)
  2. Analyse with go tool pprof
  3. Identify the hottest functions
  4. Optimise and re-measure
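Step 1 does not have to go through a test. As a minimal sketch, assuming a hypothetical profileCPU helper and a stand-in hashLoop workload (neither comes from the lesson), the standard runtime/pprof package can record a CPU profile around any piece of code:

```go
package main

import (
	"crypto/sha256"
	"log"
	"os"
	"runtime/pprof"
)

// profileCPU runs work while a CPU profile is being recorded and writes the
// profile to path. The helper and its name are illustrative only.
func profileCPU(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	// Deferred calls run last-in first-out, so the profile is flushed
	// before the file is closed.
	defer pprof.StopCPUProfile()

	work()
	return nil
}

// hashLoop is a stand-in workload: it hashes a buffer repeatedly.
func hashLoop(rounds int) [32]byte {
	buf := make([]byte, 1<<16)
	var sum [32]byte
	for i := 0; i < rounds; i++ {
		sum = sha256.Sum256(buf)
		copy(buf, sum[:])
	}
	return sum
}

func main() {
	// Step 1: collect. The file name cpu.prof is an arbitrary choice.
	if err := profileCPU("cpu.prof", func() { hashLoop(5000) }); err != nil {
		log.Fatal(err)
	}
	// Steps 2 and 3: run "go tool pprof cpu.prof", then "top" at the prompt.
	// Step 4: change the code, run again, and compare the two profiles.
}
```

Wrapping the workload in a function keeps the start and stop calls paired, so a profile is never left open if the work returns early.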

CPU profiling via go test

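The easiest way to collect a CPU profile is through the test runner: run the benchmarks with the -cpuprofile flag, for example go test -bench=. -cpuprofile=cpu.prof, then open the result with go tool pprof cpu.prof.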
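A minimal sketch of a benchmark to profile this way, assuming a hypothetical countPrimes function as the code under test and cpu.prof as the output file name. Save it in a file with a _test.go suffix and run the commands from the header comment:

```go
// Collect and open a CPU profile for this benchmark with:
//
//	go test -bench=CountPrimes -cpuprofile=cpu.prof
//	go tool pprof cpu.prof
package main

import "testing"

// countPrimes is a deliberately naive, CPU-bound function (hypothetical code
// under test): it counts the primes up to limit by trial division.
func countPrimes(limit int) int {
	count := 0
	for n := 2; n <= limit; n++ {
		prime := true
		for d := 2; d*d <= n; d++ {
			if n%d == 0 {
				prime = false
				break
			}
		}
		if prime {
			count++
		}
	}
	return count
}

// result is assigned in the benchmark so the call cannot be optimised away.
var result int

// BenchmarkCountPrimes gives the profiler a steady workload to sample.
func BenchmarkCountPrimes(b *testing.B) {
	for i := 0; i < b.N; i++ {
		result = countPrimes(100_000)
	}
}
```

Benchmarks suit profiling better than ordinary tests because they repeat the hot code long enough for the sampler to gather a meaningful number of samples.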

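Inside pprof, top lists the functions that consume the most CPU time, and list funcName shows the cost of each source line in that function.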

Memory profiling

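The test runner can also write a memory profile, for example go test -bench=. -memprofile=mem.prof. Key pprof commands for memory: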

  • top: functions by allocated bytes
  • list funcName: show allocations per line
  • -alloc_objects vs -inuse_objects: total allocated vs currently live
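Outside of tests, a heap profile can be written from inside the program. The sketch below assumes hypothetical allocate and writeHeapProfile helpers and an arbitrary mem.prof file name; it uses pprof.WriteHeapProfile from the standard runtime/pprof package:

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// retained keeps allocations reachable so they count as live memory.
var retained [][]byte

// allocate grabs n buffers of size bytes each and keeps them alive.
func allocate(n, size int) {
	for i := 0; i < n; i++ {
		retained = append(retained, make([]byte, size))
	}
}

// writeHeapProfile writes the current heap profile to path.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	runtime.GC() // bring the heap statistics up to date before the snapshot
	return pprof.WriteHeapProfile(f)
}

func main() {
	allocate(1000, 64<<10) // roughly 62 MiB that stays live
	if err := writeHeapProfile("mem.prof"); err != nil {
		log.Fatal(err)
	}
	// Currently live objects:   go tool pprof -inuse_objects mem.prof
	// Objects allocated so far: go tool pprof -alloc_objects mem.prof
}
```

The same file answers both questions: the in-use view helps with leaks and steady-state footprint, while the alloc view points at code that creates garbage-collector pressure.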

net/http/pprof: always-on profiling in production

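Import net/http/pprof for its side effect in your main.go, using a blank import (_ "net/http/pprof"). Its init function registers the profiling handlers under /debug/pprof/ on http.DefaultServeMux.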
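To make the side effect visible, here is a small sketch that sends an in-memory request to http.DefaultServeMux using the standard httptest package. The pprofIndexStatus helper is a hypothetical name; no network port is opened:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // blank import: registers handlers on http.DefaultServeMux
)

// pprofIndexStatus asks the default mux for the pprof index page and returns
// the HTTP status code it answers with.
func pprofIndexStatus() int {
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/", nil)
	rec := httptest.NewRecorder()
	http.DefaultServeMux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	// With the blank import in place the index answers with 200 (OK).
	fmt.Println("GET /debug/pprof/ status:", pprofIndexStatus())
}
```

Because the handlers land on the default mux, any server started with a nil handler serves them too, which is worth keeping in mind when deciding what to expose.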

Then expose a debug server on a separate, internal-only port, for example by calling http.ListenAndServe("localhost:6060", nil) in its own goroutine.
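One possible layout, sketched under the assumption that the application listens on :8080 and that startDebugServer is a hypothetical helper: the debug listener serves the default mux on localhost only, while the application gets its own mux so the pprof routes never appear on the public port.

```go
package main

import (
	"log"
	"net"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/ on http.DefaultServeMux
)

// startDebugServer serves http.DefaultServeMux on addr in a background
// goroutine and returns the listener so callers can inspect or close it.
func startDebugServer(addr string) (net.Listener, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	go func() {
		// A nil handler means http.DefaultServeMux, pprof routes included.
		if err := http.Serve(ln, nil); err != nil {
			log.Printf("debug server stopped: %v", err)
		}
	}()
	return ln, nil
}

func main() {
	if _, err := startDebugServer("localhost:6060"); err != nil {
		log.Fatal(err)
	}

	// The public server uses its own mux, so it does not inherit the
	// pprof handlers registered on the default one.
	app := http.NewServeMux()
	app.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello\n"))
	})
	log.Fatal(http.ListenAndServe(":8080", app)) // port is an assumption
}
```

Binding the listener before starting the goroutine means a port conflict is reported at startup instead of being logged later from the background.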

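Now you can collect live profiles from a running production server, for example with go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30 for a 30-second CPU profile, or go tool pprof http://localhost:6060/debug/pprof/heap for the heap.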
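The endpoints are plain HTTP, so a profile can also be downloaded by a small program and analysed later, which may suit scheduled captures. This is a sketch only; profileURL and fetchProfile are hypothetical helpers, and the address assumes a debug server on localhost:6060:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

// profileURL builds the URL of a pprof endpoint. kind is a profile name such
// as "profile" (CPU) or "heap"; seconds only applies to timed profiles.
func profileURL(base, kind string, seconds int) string {
	u := base + "/debug/pprof/" + kind
	if seconds > 0 {
		u += fmt.Sprintf("?seconds=%d", seconds)
	}
	return u
}

// fetchProfile downloads a profile, saves it to path and returns its size.
func fetchProfile(url, path string) (int64, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("GET %s: %s", url, resp.Status)
	}

	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.Copy(f, resp.Body)
}

func main() {
	// A CPU profile request blocks for the sampling window (30 s here).
	url := profileURL("http://localhost:6060", "profile", 30)
	n, err := fetchProfile(url, "cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("wrote %d bytes to cpu.prof; open it with go tool pprof cpu.prof", n)
}
```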

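Security: Never expose :6060 publicly. The endpoints reveal internals such as goroutine stacks and the process command line, and every profile request adds load. Bind to localhost or protect with a firewall rule.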

Reading a flame graph

A flame graph shows call stacks as nested rectangles:

  • Width = proportion of CPU time (or allocations)
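  • Wide frames with little stacked beyond them are the hottest call sites; pprof's web view draws the root at the top, so these sit at the bottom of each stack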
  • Click a frame to zoom in

runtime.MemStats: inline memory inspection

For quick in-process checks without pprof, call runtime.ReadMemStats and read the fields you need:

  • Alloc: bytes currently allocated on the heap
  • TotalAlloc: bytes allocated over the entire lifetime (monotonically increasing)
  • NumGC: number of GC cycles completed
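The fields above can be read as in the following sketch. The totalAllocDelta helper and the sink variable are hypothetical names introduced for illustration; runtime.ReadMemStats and the three fields are part of the standard library:

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps allocations reachable so the compiler cannot remove them.
var sink [][]byte

// totalAllocDelta reports how many heap bytes were allocated while work ran,
// measured as the growth of MemStats.TotalAlloc. Allocations made by other
// goroutines during that window are counted as well.
func totalAllocDelta(work func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	work()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

func main() {
	grew := totalAllocDelta(func() {
		for i := 0; i < 1000; i++ {
			sink = append(sink, make([]byte, 1024))
		}
	})
	fmt.Printf("allocated during work: %d KiB\n", grew/1024)

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("Alloc=%d KiB TotalAlloc=%d KiB NumGC=%d\n",
		m.Alloc/1024, m.TotalAlloc/1024, m.NumGC)
}
```

ReadMemStats briefly stops the world to take a consistent snapshot, so it fits occasional checks and logging better than calls inside a hot loop.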

Common findings and fixes

Finding                                      Fix
String built with += in a loop               Use strings.Builder
json.Marshal hot                             Cache marshalled bytes or use a streaming encoder
sync.Mutex contention                        Use sync.RWMutex, sharding, or sync/atomic
Excessive GC (high NumGC)                    Use sync.Pool, reduce allocations in hot paths
fmt.Sprintf in hot path                      Switch to strconv or pre-allocated buffers
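As an illustration of the strings.Builder fix, the sketch below shows the same function written both ways. The function names and sample words are made up for the example:

```go
package main

import (
	"fmt"
	"strings"
)

// joinConcat builds the result with +=, which allocates a new string and
// copies everything written so far on every iteration.
func joinConcat(words []string) string {
	out := ""
	for _, w := range words {
		out += w + ","
	}
	return out
}

// joinBuilder produces the same string through a strings.Builder, which
// appends into a growing buffer instead of reallocating each time.
func joinBuilder(words []string) string {
	var b strings.Builder
	for _, w := range words {
		b.WriteString(w)
		b.WriteByte(',')
	}
	return b.String()
}

func main() {
	words := []string{"cpu", "heap", "mutex"}
	fmt.Println(joinConcat(words) == joinBuilder(words)) // true
	fmt.Println(joinBuilder(words))                      // cpu,heap,mutex,
}
```

In a memory profile the first version tends to show up as many short-lived allocations inside the loop; after switching, re-measure to confirm the change actually helped.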

Knowledge Check

What is the correct order of steps in the Go profiling workflow?

Why should the net/http/pprof debug server only bind to localhost?

What does runtime.MemStats.TotalAlloc represent?