As you may be aware, for an unknown reason SamplingProfiler doesn’t work under Windows 7. The technical details of what fails are in this Stack Overflow question:
GetThreadContext fails after a successful SuspendThread in Windows 7
DWScript includes a debugging facility, in the form of the IDebugger interface. The TdwsSimpleDebugger component implements that interface and can be used to simply surface the events.
I recently stumbled upon a post by François on FieldByName performance, and was a bit surprised by the magnitude of the speedups reported by Marco Cantu in a comment:
A new version has been released which adds support for the new Delphi XE paths. You can download it here.
Note that there is still a pseudo-random issue of unknown origin under Windows 7, where the utility may or may not be able to gather profiling information. If that happens, close the application and launch it again. The issue doesn’t happen with Windows OSes before 7.
Passing parameters as “const” is a classic Delphi optimization trick, but the mechanisms behind that “trick” go beyond cargo-cult recipes, and may actually stumble into the “good practice” territory.
SamplingProfiler v1.7.4 is now available. This version adds an option for Delphi 2010 paths, and fixes a bug in silent mode execution that rendered it inoperative. There have also been other minor changes, mostly cosmetic.
This release also includes preparation for an “attach to process” option, which is currently not enabled, but should hopefully make it into the next version (available “when ready”).
SamplingProfiler has a few options to help profile a multi-threaded application which I’ll go over here.
In the current version, those options allow identifying CPU-related bottlenecks, as in “threads taking too much CPU resources or execution time”. However, they do not yet provide many clues to pinpoint bottlenecks arising from thread synchronization issues or serialization (insufficient parallelism). Hopefully, more support for profiling multi-threaded applications will come in future versions.
SamplingProfiler v1.7.3 has now been released and should be used in place of 1.7.2 which was pulled.
1.7.2 had a nasty bug in the timing statistics (promptly spotted by Robert Houdart), which should be fixed in 1.7.3. There are no other changes or additions in this version.
SamplingProfiler v1.7.2 has now been released.
This version includes the following changes:
- added an option to display line numbers in the source preview
- extended the process CPU affinity options to allow individually selecting up to 16 cores
The UI has been slightly rearranged to accommodate the CPU affinity options (I guess I’ll need to find something prettier for those upcoming 256 core CPUs…). There may be other indirect minor changes.
Code optimization can sometimes be experienced as a lengthy process, with disruptive effects on code readability and maintainability. For effective optimization, it is crucial to focus efforts on areas where minimal work and minimal changes will have the most impact, i.e. go for the jugular.