The value of Endurance Testing
The best way to gauge the health of a system is to run a large series of suspend/resume cycles over an extended period and analyze the data for flaws. Sleepgraph's -multi argument does this. It runs a series of suspend/resume tests, captures a timeline for each one, and generates a high-level summary at the end so the data is easy to review. You can add any other options you need to gather as much information as necessary.
Relevant Sleepgraph Arguments
-multi n d
Execute n consecutive tests at d-second intervals. The outputs are created in a new subdirectory: suspend-x{count}-{date}-{time}. When the multitest run finishes, -summary is called automatically to create summary html files for all the data, unless you use -skiphtml. The -skiphtml option speeds up testing by not creating timelines or summary html files, but you will need to run the tool again later with -summary and -genhtml.
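The control flow that -multi describes can be pictured as a simple loop. The sketch below is only an illustration, not sleepgraph's implementation: run_once is a hypothetical callback standing in for one suspend/resume test, and the sketch assumes the d-second delay is a gap between the end of one test and the start of the next, with no wait after the last test.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_multitest(count: int, delay: float, run_once: Callable[[int], T],
                  sleep: Callable[[float], None] = time.sleep) -> List[T]:
    """Run `count` tests, waiting `delay` seconds between consecutive tests.

    `run_once` is a hypothetical stand-in for one suspend/resume cycle; it
    receives the zero-based test index and returns whatever that test produced.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if delay < 0:
        raise ValueError("delay must be non-negative")
    results: List[T] = []
    for i in range(count):
        results.append(run_once(i))
        # Assumption: wait only between tests, never after the final one.
        if i < count - 1 and delay > 0:
            sleep(delay)
    return results
```

With d set to 0, as in the usage example on this page, the tests run back to back.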
-m mode
Suspend mode to initiate, e.g. mem, freeze, or standby (default: mem).
-rtcwake t
Use rtcwake to automatically resume after t seconds (default: 15). Keep this as small as possible to minimize total testing time.
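Because each test spends roughly t seconds suspended, the rtcwake value largely determines total run time. The following is a rough estimate only, assuming a hypothetical per-test overhead (resume plus log and timeline processing) that you would measure on your own system, and assuming the delay d falls only between tests.

```python
def estimate_multitest_seconds(count: int, delay: float, rtcwake: float,
                               overhead: float) -> float:
    """Rough wall-clock estimate for a -multi run.

    Assumes each cycle takes about `rtcwake` seconds (suspend entry plus time
    asleep until the RTC alarm fires) plus `overhead` seconds of resume and
    processing, and that `delay` seconds separate consecutive tests.
    `overhead` is a hypothetical per-test figure, not something sleepgraph reports.
    """
    per_test = rtcwake + overhead
    return count * per_test + (count - 1) * delay


# 2000 tests, no delay, rtcwake 10, assumed 5 s overhead per test.
total = estimate_multitest_seconds(2000, 0, 10, 5)
print(f"{total:.0f} s, about {total / 3600:.1f} h")
```

For the 2000-iteration example on this page, with an assumed 5 seconds of overhead, this gives 2000 * 15 = 30000 seconds, a little over 8 hours; every second shaved off rtcwake saves 2000 seconds.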
-gzip (optional)
Gzip the ftrace and dmesg logs to reduce the size of the multitest output folder.
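The compressed logs remain easy to inspect from a script. As a minimal sketch (read_log_lines is a hypothetical helper, not part of sleepgraph), Python's standard gzip module reads a .gz log as text just like an uncompressed one, so the same code works whether or not -gzip was used.

```python
import gzip
from pathlib import Path
from typing import Iterator


def read_log_lines(path: Path) -> Iterator[str]:
    """Yield lines from a log file, transparently decompressing *.gz files."""
    opener = gzip.open if path.suffix == ".gz" else open
    # Text mode with replacement so stray non-UTF-8 bytes in logs do not abort reading.
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            yield line.rstrip("\n")
```

For example, `sum(1 for _ in read_log_lines(Path("host_mode_dmesg.txt.gz")))` counts the lines in one test's dmesg log.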
-dev (optional)
Add kernel source calls and threads to the timeline (default: disabled). This adds a significant amount of trace data with minimal impact on performance.
-skiphtml (optional)
Run the multitest and capture the ftrace & dmesg logs, but skip the timeline and summary html generation. This can greatly speed up overall testing. You can then copy the data to a faster host machine and run -summary with -genhtml to generate the timelines and summary.
These are the relevant arguments to use after running -multi with -skiphtml. They complete the timeline and summary html generation.
-summary indir (optional post processing after using -skiphtml)
Generate or regenerate the summary for a -multi test run. Creates three files in the current folder: summary.html, summary-issues.html, and summary-devices.html. summary.html is a table of the tests, sorted by kernel/host/mode, with relevant info and links to each test's html file. summary-issues.html is a list of kernel issues found in dmesg across all the tests. summary-devices.html is a list of devices and their times across all the tests.
-genhtml (optional post processing after using -skiphtml)
Used with -summary to regenerate any missing html timelines from their dmesg and ftrace logs. This takes a significant amount of time when there are thousands of tests, so if possible move the data to a fast system for post processing.
Usage
Initiate a multitest with 2000 iterations of S3 suspend (mem), resuming after 10 seconds via rtcwake, with no delay between tests. Include dev mode trace data, and reduce the output size with gzip.
%> sudo sleepgraph.py -m mem -rtcwake 10 -dev -gzip -multi 2000 0
Alternatively, skip timeline generation to speed things up:
%> sudo ./sleepgraph.py -m mem -rtcwake 10 -dev -gzip -multi 2000 0 -skiphtml
The tool produces an output folder with all the test subfolders inside. Each test subfolder contains the dmesg/ftrace logs and/or the html timeline, depending on whether you used the -skiphtml option. The root folder contains the summary html files. If you used -skiphtml, you can generate the html timelines and summaries later with these commands:
%> cd suspend-x2000-{date}-{time}
%> sleepgraph.py -summary . -genhtml -dev
On completion, the output folder contains one subfolder per test and a set of summary pages in the root. The summary.html file is a tabular list of the tests with relevant info and links. The summary-issues.html and summary-devices.html files aggregate kernel issues and device performance data across all tests. The folder looks like this:
- suspend-x{count}-{date}-{time}:
    - summary.html
    - summary-issues.html
    - summary-devices.html
    - suspend-{date}-{time}
        - host_mode.html
        - host_mode_dmesg.txt.gz
        - host_mode_ftrace.txt.gz
    - suspend-{date}-{time}
    - ...
Results
Summary.html
This file includes a tabular list of all the tests run. Each row includes the most important information about a test, along with a link to its timeline. The tests with the minimum, median, and maximum suspend and resume times are highlighted in color, with links to their entries.
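Picking the highlighted entries amounts to sorting the tests by time. A minimal sketch, assuming each test's suspend (or resume) time is already available as a number keyed by test name; min_med_max is a hypothetical helper. For an even number of tests it takes the lower of the two middle entries so the median always names a real test, which may not match sleepgraph's own choice.

```python
from typing import Dict, List, Tuple


def min_med_max(times: Dict[str, float]) -> Tuple[str, str, str]:
    """Return the test names with the minimum, median, and maximum time.

    Ties are broken by test name so the result is deterministic.
    For an even count, the lower middle entry is used as the median.
    """
    if not times:
        raise ValueError("no tests")
    ordered: List[str] = sorted(times, key=lambda name: (times[name], name))
    return ordered[0], ordered[(len(ordered) - 1) // 2], ordered[-1]
```

Run it once on the suspend times and once on the resume times to get the two sets of highlighted tests.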
Summary-issues.html
This file includes a tabular list of all kernel issues found in the dmesg logs of the tests. Sleepgraph greps for any instance of "warning", "error", "fault", or "bug", as well as several other known issue strings. Each row shows the issue found, how often it occurred, and a link to the first timeline where it appeared.
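The issue search can be approximated with a regular expression. This sketch covers only the four strings named above (sleepgraph also checks other known issue strings), strips the leading dmesg timestamp so repeated messages group together, and counts every matching line; tally_issues and its input shape are assumptions for illustration. A leading word boundary keeps "debug" from matching "bug".

```python
import re
from typing import Dict, Iterable, Tuple

# Only the four strings named in the text; sleepgraph checks more than this.
ISSUE_PATTERN = re.compile(r"\b(warning|error|fault|bug)", re.IGNORECASE)
# Leading dmesg timestamp such as "[   12.345678] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def tally_issues(logs: Iterable[Tuple[str, Iterable[str]]]) -> Dict[str, Tuple[int, str]]:
    """Map each matching dmesg message to (occurrence count, first test name).

    `logs` yields (test name, dmesg lines) pairs in test order.
    """
    found: Dict[str, Tuple[int, str]] = {}
    for test_name, lines in logs:
        for line in lines:
            message = TIMESTAMP.sub("", line).strip()
            if ISSUE_PATTERN.search(message):
                count, first = found.get(message, (0, test_name))
                found[message] = (count + 1, first)
    return found
```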
Summary-devices.html
This file includes a tabular list of all device callback times from the timeline data. Each device has both a suspend and a resume time, calculated as the total of its callback times for the given phase. For example, a device's total suspend time is the sum of its suspend_prepare, suspend, suspend_late, suspend_noirq, and suspend_machine callback times (whichever of these exist). Each row shows how many timelines the device appeared in, its average and worst time, and a link to the worst timeline (where the device took the longest).
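The per-device suspend totals can be sketched as follows. The input shape (test name to device to phase to callback time, assumed here to be in milliseconds) and the function name device_suspend_stats are hypothetical; the resume total would be computed the same way over the resume phases.

```python
from typing import Dict, List, Tuple

SUSPEND_PHASES = ("suspend_prepare", "suspend", "suspend_late",
                  "suspend_noirq", "suspend_machine")


def device_suspend_stats(timelines: Dict[str, Dict[str, Dict[str, float]]]
                         ) -> Dict[str, Tuple[int, float, float, str]]:
    """Aggregate per-device suspend time across timelines.

    `timelines` maps test name -> device name -> phase name -> callback time.
    Returns device -> (timeline count, average total, worst total, worst test).
    """
    totals: Dict[str, List[Tuple[float, str]]] = {}
    for test, devices in timelines.items():
        for dev, phases in devices.items():
            # Sum whichever suspend-phase callbacks exist for this device.
            total = sum(phases.get(p, 0.0) for p in SUSPEND_PHASES)
            totals.setdefault(dev, []).append((total, test))
    stats = {}
    for dev, entries in totals.items():
        worst_time, worst_test = max(entries)
        avg = sum(t for t, _ in entries) / len(entries)
        stats[dev] = (len(entries), avg, worst_time, worst_test)
    return stats
```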
Project: pm-graph