Intel® Fortran Compiler 17.0 Developer Guide and Reference
A lightweight profiling mechanism is available that can be used to achieve many of the benefits of instrumentation-based profiling, but without the overhead of inserting instrumentation into the application binary. This mode of operation can be beneficial in cases where an increase in code or data size, or changes in run time due to instrumentation, would make regular Profile-Guided Optimization (PGO) infeasible. This approach requires the use of Intel® VTune™ Amplifier to collect information from the hardware counters. The information is collected with minimal overhead and is combined with debug information produced by the compiler to identify the primary code path for optimizations.
Follow these steps to use this method:
Phase 1: Compile the application with the option prof-gen-sampling.
This option instructs the compiler to generate additional debug information for the application, which is used to map the information collected by the hardware counters to specific source code. Unlike instrumented PGO, the option does not affect the generated instruction sequence. Optimizations may be enabled during this build, but it is recommended to disable function inlining.
Phase 2: Run the generated executable on one or more representative workloads with the Intel VTune Amplifier tool:
<installation-root>/bin64/amplxe-pgo-report.sh <your application and command line>
Additional information regarding options for data collection can be found in the Intel VTune Amplifier documentation. This step generates files of the form rNNNpgo_icc.pgo (where NNN is a three-digit number), which are used as input to the following phases.
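Because phases 3 and 4 take the result files as a colon-separated list, it can help to build that list from the files phase 2 leaves behind. The following is a minimal sketch, not part of the Intel tools: the function name pgo_file_list is hypothetical, and it assumes the result files sit in a single directory and follow the rNNNpgo_icc.pgo naming pattern described above.

```shell
# Hypothetical helper (not an Intel tool): join the rNNNpgo_icc.pgo files found
# in a directory into the colon-separated form used by phases 3 and 4.
# Usage: pgo_file_list [directory]   (defaults to the current directory)
pgo_file_list() {
    dir=${1:-.}
    list=""
    # The glob mirrors the documented name pattern: r, three digits, pgo_icc.pgo.
    for f in "$dir"/r[0-9][0-9][0-9]pgo_icc.pgo; do
        [ -e "$f" ] || continue        # skip the unexpanded pattern when nothing matches
        list="${list:+$list:}$f"       # add a colon only between entries
    done
    printf '%s\n' "$list"
}
```

The function prints an empty line when no result files are found, so a calling script can check for an empty list before invoking the later phases.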
Phase 3: (optional) Merge the report files produced during phase 2.
The tool profmergesampling can be used to produce an indexed file of results that will speed up the processing of the data during the next phase.
profmergesampling -file <input-file[:input_file]*> -out <output_name>
Phase 4: Compile the application with the option prof-use-sampling:input-file[:input_file]*
In phase 4, one or more result files produced during phase 2 (or an indexed file from phase 3) can be fed into the compiler to direct the optimizations.
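The four phases can be laid out as a command sequence. The sketch below only prints the commands rather than running them, and it rests on several assumptions that are not stated in the text above: the Linux option spelling with a leading hyphen, the ifort compiler driver, and the hypothetical names app.f90, app, and print_sampling_pgo_plan. The name given to -out is passed through unchanged, since the exact name of the file the merge tool writes is not specified here.

```shell
# Dry run: print the commands for the sampling-based PGO phases without
# executing them.
# Assumptions (not from the guide text): Linux-style "-" option prefix, the
# ifort driver, and the hypothetical source/executable names app.f90 and app.
#   $1 = Intel VTune Amplifier installation root
#   $2 = colon-separated list of phase 2 result files
#   $3 = (optional) output name for the phase 3 merge; omit to skip phase 3
print_sampling_pgo_plan() {
    vtune_root=$1
    results=$2
    merged=${3:-}

    # Phase 1: build with additional debug information for sample mapping.
    echo "ifort -prof-gen-sampling -o app app.f90"

    # Phase 2: run a representative workload under the collection script.
    echo "$vtune_root/bin64/amplxe-pgo-report.sh ./app"

    # Phase 3 (optional): merge the result files into an indexed file.
    profile=$results
    if [ -n "$merged" ]; then
        echo "profmergesampling -file $results -out $merged"
        profile=$merged
    fi

    # Phase 4: rebuild, feeding the sampled profile to the compiler.
    echo "ifort -prof-use-sampling:$profile -o app app.f90"
}
```

Calling the function with only two arguments skips the merge and passes the raw result files straight to phase 4. An option that disables function inlining, as recommended for phase 1, is left out here and would need to be added from the compiler's option reference.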