File Caching and Container Memory: What Docker Stats Isn't Telling You
TL;DR:
Docker’s memory stats can trick you, especially with databases. In this post I look at why file caching produces misleading memory graphs and explore alternative metrics for monitoring container memory.
When monitoring containerized applications, particularly databases, relying solely on docker stats to understand memory usage can be misleading.
This post explores the nuances of memory management in containerized environments, focusing on how file caching affects memory metrics and why default monitoring formulas, such as the one docker stats uses, might need adjustment for databases.
The Reality Behind Docker Stats
Docker Stats calculates memory usage by subtracting part of the page cache (specifically inactive_file in cgroup v2), so that cache the kernel can easily reclaim is not reported as used memory.
While this approach seems logical, it can cause confusion when monitoring memory-intensive applications like databases, which rely heavily on file caching.
Here’s how Docker Stats calculates memory usage:
func calculateMemUsageUnixNoCache(mem container.MemoryStats) float64 {
	// cgroup v1 calculation is skipped for brevity,
	// below is only for cgroup v2
	if v := mem.Stats["inactive_file"]; v < mem.Usage {
		return float64(mem.Usage - v)
	}
	return float64(mem.Usage)
}
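To make the formula concrete, here is a self-contained sketch of the same cgroup v2 calculation. The `memoryStats` struct and `memUsageNoCache` function are hypothetical stand-ins for Docker's `container.MemoryStats` and `calculateMemUsageUnixNoCache` (keeping only the fields the formula needs), and the byte counts are made-up round numbers, not measurements from the repo.

```go
package main

import "fmt"

// memoryStats is a hypothetical, trimmed-down stand-in for Docker's
// container.MemoryStats: total usage plus raw cgroup v2 memory.stat counters.
type memoryStats struct {
	Usage uint64
	Stats map[string]uint64
}

// memUsageNoCache mirrors the cgroup v2 branch of the docker stats formula:
// usage minus inactive_file, falling back to raw usage when inactive_file
// is not smaller than usage.
func memUsageNoCache(mem memoryStats) float64 {
	if v := mem.Stats["inactive_file"]; v < mem.Usage {
		return float64(mem.Usage - v)
	}
	return float64(mem.Usage)
}

const mib = 1 << 20

func main() {
	// Hypothetical container: 1100 MiB charged, including 300 MiB of
	// active file cache and 200 MiB of inactive file cache.
	mem := memoryStats{
		Usage: 1100 * mib,
		Stats: map[string]uint64{
			"active_file":   300 * mib,
			"inactive_file": 200 * mib,
		},
	}
	// Only inactive_file is subtracted; the 300 MiB of active_file still counts.
	fmt.Printf("reported: %.0f MiB\n", memUsageNoCache(mem)/mib)
}
```

With these numbers the reported figure is 900 MiB: the active part of the file cache stays in the "used" number, which is exactly what makes database containers look fuller than they are.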
The Database Dilemma
When running databases in containers, the standard memory monitoring approach can be misleading.
This is mainly because the file cache fills up the container’s memory, and Docker Stats only subtracts the inactive part of it: file pages on the active list (active_file) are still counted as used.
The figure below shows how PostgreSQL uses the OS cache when the requested pages are not in shared buffers.
A Real-World Example
In this repo, you will find a simple example of how memory usage can be misleading when monitoring database containers.
To reproduce the scenario, I did the following:
- Started a PostgreSQL container with a memory limit of 1.2GB
- Ran a simple I/O-intensive workload with BenchBase, specifically the resourcestresser benchmark
- Added Grafana, Prometheus and the cAdvisor exporter to monitor the container’s memory usage
If you want to reproduce the scenario yourself, run the commands listed in the repo’s README.
Now let’s explore the results in Grafana.
Misleading Memory Usage
To plot the container’s memory, I replicate the numbers docker stats reports (cAdvisor’s working set, like docker stats, is total usage minus inactive_file) using the following PromQL:
100 * (container_memory_working_set_bytes{name=~".+"})
/ container_spec_memory_limit_bytes{name=~".+"}
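The same number can be computed without Prometheus from the cgroup v2 files in the container's cgroup directory: memory.current (total charged memory), memory.stat (per-type counters such as inactive_file) and memory.max (the limit). The sketch below assumes you have already read those files; since the exact cgroup path depends on the cgroup driver, it takes the contents as inputs. `parseMemoryStat` and `workingSetPercent` are hypothetical helpers, the numbers are made up, and a limit of `max` (no limit) is not handled.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryStat parses the "key value" lines of a cgroup v2 memory.stat file.
func parseMemoryStat(content string) (map[string]uint64, error) {
	stats := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // skip blank or unexpected lines
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", fields[0], err)
		}
		stats[fields[0]] = v
	}
	return stats, sc.Err()
}

// workingSetPercent mirrors the PromQL above: working set (memory.current
// minus inactive_file, floored at zero) as a percentage of memory.max.
func workingSetPercent(current, limit uint64, stats map[string]uint64) float64 {
	ws := uint64(0)
	if inactive := stats["inactive_file"]; inactive < current {
		ws = current - inactive
	}
	return 100 * float64(ws) / float64(limit)
}

func main() {
	// Hypothetical memory.stat excerpt in bytes; a real file has many more keys.
	const memoryStat = `anon 314572800
file 524288000
active_file 419430400
inactive_file 104857600
`
	stats, err := parseMemoryStat(memoryStat)
	if err != nil {
		panic(err)
	}
	// Hypothetical memory.current of 900 MiB and memory.max of 1000 MiB.
	fmt.Printf("working set: %.1f%% of limit\n", workingSetPercent(900<<20, 1000<<20, stats))
}
```

Here only the 100 MiB of inactive_file is subtracted, so the 400 MiB of active_file keeps the working set at 80% of the limit even though most of it is cache.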
The memory usage graph shows the following pattern:
- Memory usage starts at ~50%
- Grows steadily toward 95%
- Instead of an OOM kill, the kernel reclaims memory from the file cache
- Usage drops back to 50%
A valid question arises:
why is the file cache reclaimed all at once instead of incrementally, and does it affect database performance?
Answering that will require digging deeper and running more tests.
Better Monitoring Approach
Investigating with below, a CLI from Facebook that is an amazing alternative to atop with fantastic support for cgroup v2, made it clear what was happening.
Notice the active_file column and how quickly it is reclaimed: because the working-set formula still counts active_file, its reclaim produces the big drop in the graph.
Rather than using Docker Stats’ default calculation:
100*(container_memory_working_set_bytes{name=~".+"})
/container_spec_memory_limit_bytes{name=~".+"}
if we focus on RSS (Resident Set Size), which excludes the page cache:
container_memory_rss{name=~".+"} /
container_spec_memory_limit_bytes{name=~".+"} * 100
the result will be the following.
This metric shows a more stable memory consumption pattern, especially for database containers, without the misleading spikes that might be incorrectly interpreted as memory leaks.
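The difference between the two queries can be sketched with a before/after comparison. This assumes, as cAdvisor does on cgroup v2 as far as I know, that container_memory_rss is taken from the anon counter in memory.stat; the `sample` type, the helper functions and all numbers are hypothetical, chosen only to mimic a reclaim of active file cache.

```go
package main

import "fmt"

// sample is a hypothetical snapshot of the counters behind both PromQL queries.
type sample struct {
	current      uint64 // memory.current: everything charged to the cgroup
	anon         uint64 // memory.stat anon (assumed source of container_memory_rss)
	inactiveFile uint64 // memory.stat inactive_file
	limit        uint64 // memory.max
}

func percent(part, whole uint64) float64 { return 100 * float64(part) / float64(whole) }

// workingSetPct mirrors the docker stats style query: current minus inactive_file.
func workingSetPct(s sample) float64 {
	ws := uint64(0)
	if s.inactiveFile < s.current {
		ws = s.current - s.inactiveFile
	}
	return percent(ws, s.limit)
}

// rssPct mirrors the RSS-based query: anonymous memory only, no page cache.
func rssPct(s sample) float64 { return percent(s.anon, s.limit) }

const mib = 1 << 20

func main() {
	// Hypothetical snapshots: the kernel drops 400 MiB of active_file cache
	// between them, while the database's anonymous memory stays at 300 MiB.
	snapshots := []struct {
		name string
		s    sample
	}{
		{"before reclaim", sample{current: 950 * mib, anon: 300 * mib, inactiveFile: 50 * mib, limit: 1000 * mib}},
		{"after reclaim", sample{current: 550 * mib, anon: 300 * mib, inactiveFile: 50 * mib, limit: 1000 * mib}},
	}
	for _, snap := range snapshots {
		fmt.Printf("%-15s working set %3.0f%%  rss %3.0f%%\n", snap.name, workingSetPct(snap.s), rssPct(snap.s))
	}
}
```

With these numbers the working-set percentage falls from 90% to 50% while the RSS percentage stays at 30%, which is the same shape as the two Grafana graphs: the cache reclaim moves only the first metric.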
Conclusion
Don’t let Docker Stats fool you: memory management in containerized environments is more nuanced than it appears.
When monitoring database containers, focus on RSS metrics rather than raw memory usage to get a clearer picture of your application’s actual memory health.
Remember: not all memory usage is created equal, and sometimes what looks like a problem is just efficient system resource utilization at work.
Resources:
- Repository with example code
- cgroup v2 and Page cache
- BenchBase
- PostgreSQL mailing list discussion
- cAdvisor exporter memory calculation