Intel® Fortran Compiler 17.0 Developer Guide and Reference
This topic only applies when targeting Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
You can measure the amount of time it takes to execute an offload region of code, as well as the amount of data transferred during the execution of the offload region.
You can print an offload report, which contains information about an offload as the execution proceeds on the host and on the target. An offload report includes the following information:
the amount of time it takes to execute an offload region of code
the amount of data transferred between the host and the target
additional details, including device initialization and individual variable transfers
The following mechanisms enable and disable offload reporting:
the OFFLOAD_REPORT environment variable.
the _Offload_report API.
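A minimal Python sketch of the environment-variable mechanism follows. It runs an already-built offload executable with OFFLOAD_REPORT set for that process only. The helper names offload_env and run_with_report are hypothetical. The sketch assumes that levels 1, 2, and 3 select progressively more detail (timing, then data amounts, then initialization and per-variable details, matching the three items listed above). It captures both stdout and stderr because it does not assume which stream the report is written to.

```python
import os
import subprocess


def offload_env(level, base=None):
    """Return a copy of an environment with OFFLOAD_REPORT set.

    Assumption: the accepted report levels are 1, 2 and 3.
    """
    if level not in (1, 2, 3):
        raise ValueError(f"unsupported OFFLOAD_REPORT level: {level}")
    env = dict(os.environ if base is None else base)
    env["OFFLOAD_REPORT"] = str(level)
    return env


def run_with_report(cmd, level=2):
    """Run cmd (a list of strings) with reporting enabled.

    Returns stdout followed by stderr, since this sketch does not assume
    which stream carries the offload report.
    """
    result = subprocess.run(cmd, env=offload_env(level),
                            capture_output=True, text=True)
    return result.stdout + result.stderr
```

For instance, run_with_report(["./exampleF_exe"], level=3) would collect the most detailed report for the executable built from example.F90 in this topic. Setting the variable in a copied environment leaves the calling process unchanged.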
Each compiler offload report line starts with [Offload] to clearly mark output from compiler offloads, as opposed to output from other offloads, such as those performed by the Intel® Math Kernel Library.
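Because the [Offload] prefix marks every compiler offload report line, the report can be separated from other program output with a simple prefix filter. The Python sketch below assumes the report lines start in the first column; the function name compiler_offload_lines is hypothetical.

```python
def compiler_offload_lines(text):
    """Return only the lines printed by compiler offloads.

    Every compiler offload report line starts with "[Offload]"; anything else
    (user output, output from other libraries) is dropped.
    """
    return [line for line in text.splitlines() if line.startswith("[Offload]")]


mixed_output = "\n".join([
    "[Offload] [HOST] [Tag 0] [State] Start Offload",
    "On device : 0",  # user program output, not part of the report
    "[Offload] [MIC 0] [Tag 0] [State] Scatter copyin data",
])
offload_only = compiler_offload_lines(mixed_output)
```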
Activities on the host are marked with [HOST], while activities on the target are marked with [MIC n], where n is the logical number of the coprocessor to which the offload is sent. The top of the report shows the mapping of logical devices to physical devices. (Note that an offloaded program can use a subset of the physical devices when you specify the OFFLOAD_DEVICES environment variable.)
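Because [MIC n] uses logical numbers, it can help to translate them back to physical devices, especially when OFFLOAD_DEVICES selects a subset. The hypothetical device_mapping helper below is a Python sketch that reads the mapping from the initialization lines at the top of the report, assuming they keep the wording shown in the example in this topic.

```python
import re

# Matches the device-initialization lines printed at the top of the report.
INIT_RE = re.compile(r"Initialize logical card (\d+) = physical card (\d+)")


def device_mapping(lines):
    """Return {logical_device: physical_device} from the report header."""
    mapping = {}
    for line in lines:
        match = INIT_RE.search(line)
        if match:
            mapping[int(match.group(1))] = int(match.group(2))
    return mapping


header = [
    "[Offload] [HOST] [State] Initialize logical card 0 = physical card 0",
    "[Offload] [HOST] [State] Initialize logical card 1 = physical card 1",
]
print(device_mapping(header))  # {0: 0, 1: 1}
```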
Because multiple offloads may be in progress concurrently, either when multiple host threads initiate offloads or when asynchronous offloads are used, all output associated with a specific offload directive must be tagged. Otherwise, the reports from several concurrent offloads would be interleaved, making it impossible to determine which offload a particular line of output belongs to. A tag of the form [Tag n] uniquely identifies the lines in the offload report that belong to a particular offload.
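The tags make it possible to pull one offload's lines out of an interleaved report. This Python sketch groups lines by tag number; the helper name group_by_tag is hypothetical, and the second offload (Tag 1) in the sample is invented for illustration.

```python
import re
from collections import defaultdict

# Matches the per-offload marker, for example "[Tag 0]".
TAG_RE = re.compile(r"\[Tag (\d+)\]")


def group_by_tag(lines):
    """Return {tag_number: [lines]} for every line that carries a [Tag n] marker.

    Lines without a marker (device initialization, cleanup) are skipped, and
    the original order is kept within each tag.
    """
    groups = defaultdict(list)
    for line in lines:
        match = TAG_RE.search(line)
        if match:
            groups[int(match.group(1))].append(line)
    return dict(groups)


interleaved = [
    "[Offload] [HOST] [Tag 0] [State] Start Offload",
    "[Offload] [HOST] [Tag 1] [State] Start Offload",  # hypothetical second offload
    "[Offload] [HOST] [Tag 0] [State] Compute task on MIC",
    "[Offload] [HOST] [State] Unregister data tables",
]
by_tag = group_by_tag(interleaved)
```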
For each offload, the first two report lines give the source file name and the line number of the offload directive. Next, a line that assigns a tag to that offload is printed (in the form [Tag] Tagn). Each subsequent report line for that offload carries the tag [Tag n] to associate it with that offload.
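Those three header lines let you map each tag back to its offload directive in the source. The Python sketch below (tag_locations is a hypothetical name) assumes that the [File], [Line], and [Tag] lines of one offload are printed in that order by the same source ([HOST] or [MIC n]) and that file names contain no spaces.

```python
import re

# Header lines such as "[Offload] [MIC 0] [File] example.F90".
HEADER_RE = re.compile(r"^\[Offload\] \[(HOST|MIC \d+)\] \[(File|Line|Tag)\] (\S+)$")


def tag_locations(lines):
    """Return {tag_number: (file_name, line_number)} for each offload."""
    pending = {}  # source marker -> header fields collected so far
    locations = {}
    for line in lines:
        match = HEADER_RE.match(line)
        if not match:
            continue
        source, field, value = match.groups()
        fields = pending.setdefault(source, {})
        fields[field] = value
        if field == "Tag":
            # The tag assignment is printed as "Tag0", "Tag1", ...
            tag = int(value[len("Tag"):])
            line_no = int(fields["Line"]) if "Line" in fields else None
            locations[tag] = (fields.get("File"), line_no)
            del pending[source]  # the next header for this source starts fresh
    return locations


header_lines = [
    "[Offload] [MIC 0] [File] example.F90",
    "[Offload] [MIC 0] [Line] 11",
    "[Offload] [MIC 0] [Tag] Tag0",
]
```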
The rest of the report contains a line for each major activity. After the tag that identifies the offload it belongs to, each such line carries one of the following annotations:
Line Marker | Description |
---|---|
[State] | Activity being performed as part of the offload. |
[Var] | The name of a variable transferred and the direction(s) of transfer. |
[CPU Time] | The total time measured for that offload directive on the host. |
[MIC Time] | The total time measured for executing the offload on the target. This excludes the data transfer time between the host and the target, and counts only the execution time on the target. |
[CPU->MIC Data] | The number of bytes of data transferred from the host to the target. |
[MIC->CPU Data] | The number of bytes of data transferred from the target to the host. |
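The four summary markers in the table can be extracted mechanically, for example to compare runs. The Python sketch below (summary_rows is a hypothetical name) treats the [Tag n] marker as optional, because the example report in this topic prints these summary lines without it. Times are read as seconds and data amounts as bytes, per the table.

```python
import re

SUMMARY_RE = re.compile(
    r"^\[Offload\] \[(HOST|MIC \d+)\](?: \[Tag \d+\])? "
    r"\[(CPU Time|MIC Time|CPU->MIC Data|MIC->CPU Data)\] ([0-9.]+)"
)


def summary_rows(lines):
    """Return (source, marker, value) tuples in report order.

    [CPU Time] and [MIC Time] values are seconds (float);
    [CPU->MIC Data] and [MIC->CPU Data] values are bytes (int).
    """
    rows = []
    for line in lines:
        match = SUMMARY_RE.match(line)
        if match:
            source, marker, number = match.groups()
            value = float(number) if marker.endswith("Time") else int(number)
            rows.append((source, marker, value))
    return rows


summary_lines = [
    "[Offload] [MIC 0] [CPU Time] 0.000000 (seconds)",
    "[Offload] [MIC 0] [CPU->MIC Data] 188 (bytes)",
    "[Offload] [MIC 0] [MIC Time] 0.000918 (seconds)",
    "[Offload] [MIC 0] [MIC->CPU Data] 24 (bytes)",
]
```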
The various activities printed after [State] describe the internal operation of the Offload Library and are helpful in diagnosing the point at which a runtime failure may occur. In most cases the description is self-explanatory.
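When a run fails, the last [State] printed for an offload indicates how far it got. Because the host and the target progress independently, the Python sketch below (last_states is a hypothetical name) tracks the last state separately for each source and tag. The truncated report in the sample is hypothetical.

```python
import re

STATE_RE = re.compile(r"^\[Offload\] \[(HOST|MIC \d+)\] \[Tag (\d+)\] \[State\] (.+)$")


def last_states(lines):
    """Return {(source, tag): last [State] description printed}."""
    last = {}
    for line in lines:
        match = STATE_RE.match(line)
        if match:
            source, tag, state = match.groups()
            last[(source, int(tag))] = state
    return last


# Hypothetical report that stops early (for example, because the run aborted).
truncated = [
    "[Offload] [HOST] [Tag 0] [State] Start Offload",
    "[Offload] [HOST] [Tag 0] [State] Initialize function __offload_entry_example_F90_11hysum_",
    "[Offload] [HOST] [Tag 0] [State] Send pointer data",
]
```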
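The following program, example.F90, is used to explain the offload report output. The number at the start of each line is the source line number; the report refers to the offload directive by its line number, 11.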
1 integer function Hysum(abc,efg,siz)
2 use mic_lib
3
4 integer, dimension(:) :: abc
5 integer, dimension(:) :: efg
6 integer :: siz
7
8 integer :: sumT
9 integer :: k
10
11 !DIR$ OFFLOAD BEGIN target(mic:0) &
12 IN(abc : length(siz )) &
13 OUT(efg : length(siz/2)) &
14 nocopy(k)
15
16 if (OFFLOAD_GET_DEVICE_NUMBER() > -1) then
17 print "(A,I0)","On device : ",OFFLOAD_GET_DEVICE_NUMBER()
18 endif
19
20 sumT = 0
21 do k=1,(siz/2)
22 efg(k) = abc(k) + abc(k + (siz/2))
23 sumT = sumT + efg(k)
24 enddo
25 !DIR$ END OFFLOAD
26
27 Hysum = sumT
28
29 return
30 end function Hysum
31
32 program example
33
34 integer, allocatable, dimension(:) :: tuv
35 integer, allocatable, dimension(:) :: xyz
36
37 integer :: j = 10
38 integer :: i = 0
39 integer :: n
40
41 interface
42 integer function Hysum(abc,efg,siz)
43 integer, dimension(:) :: abc
44 integer, dimension(:) :: efg
45 integer :: siz
46 end function Hysum
47 end interface
48
49 allocate( tuv(j) )
50 allocate( xyz(j/2) )
51
52 do n = 1, j
53 tuv(n) = n - 1
54 enddo
55 xyz = 0
56
57 i = Hysum(tuv,xyz,j)
58
59 do n = 1, (j/2)
60 print "(3X,2(A,I0),$)","xyz(",n,")=",xyz(n)
61 enddo
62 print "(/,3X,A,I2)","sum total=",i
63
64 end program example
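To know what output to expect independently of the coprocessor, the computation can be modeled on the host. This Python sketch (hysum_reference is a hypothetical name) mirrors the Fortran with 0-based indexing: tuv(n) = n - 1, and each efg(k) is abc(k) + abc(k + siz/2), summed into sumT.

```python
def hysum_reference(j=10):
    """Host-only Python model of what example.F90 computes (no offload)."""
    tuv = [n - 1 for n in range(1, j + 1)]  # tuv(n) = n - 1 for n = 1..j
    half = j // 2                           # siz/2 in the Fortran code
    xyz = [tuv[k] + tuv[k + half] for k in range(half)]
    return xyz, sum(xyz)                    # (efg values, sumT)


xyz, total = hysum_reference()
```

With j = 10, each element is (k - 1) + (k + 4) = 2k + 3 for k = 1..5.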
At compile time, the compiler option [Q]opt-report-phase with the offload keyword provides summary information about data transfers between the host and the target. There are two reports for each offload code section in the source code. The first report, beginning with Offload to target MIC, is from the host compilation. The second report, beginning with Outlined offload region, is from the target compilation. This report contains information similar to the runtime output produced when the OFFLOAD_REPORT environment variable is set to 3.
$ ifort example.F90 -o exampleF_exe -opt-report-phase=offload
example.F90(11-11):OFFLOAD:hysum_: Offload to target MIC <expr>
    hysum_$SUMT_V$31, default of INOUT changed to OUT
    hysum_$SIZ_V$2f, default of INOUT changed to IN
    hysum_$SUMT_V$31, default of INOUT changed to OUT
    hysum_$SIZ_V$2f, default of INOUT changed to IN
    Data sent from host to target
        hysum_$ABC_V$2d, pointer to dope-vector with element count (<expr>) elements
        hysum_$SIZ_V$2f, pointer to (<expr>) elements
    Data received by host from target
        hysum_$EFG_V$2e, pointer to dope-vector with element count (<expr>) elements
        hysum_$SUMT_V$31, scalar size 4 bytes
example.F90(11-11):OFFLOAD:hysum_: Outlined offload region
    hysum_$SUMT_V$31, default of INOUT changed to OUT
    hysum_$SIZ_V$2f, default of INOUT changed to IN
    hysum_$SUMT_V$31, default of INOUT changed to OUT
    hysum_$SIZ_V$2f, default of INOUT changed to IN
    Data received by target from host
        hysum_$ABC_V$2d, pointer to dope-vector with element count (<expr>) elements
        hysum_$SIZ_V$2f, pointer to (<expr>) elements
    Data sent from target to host
        hysum_$EFG_V$2e, pointer to dope-vector with element count (<expr>) elements
        hysum_$SUMT_V$31, scalar size 4 bytes
The rest of this example explains the offload report for the source program shown above.
The host and the target execute independently, so once an offload has been initiated, the order in which host prints and target prints appear is unpredictable and can vary from run to run. However, the host prints always appear in the same order relative to each other, as do the target prints.
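Since only the order within the host output, or within one target's output, is reliable, a report is easier to read when split into per-source streams. The Python sketch below (split_by_source is a hypothetical name) does that while preserving the order within each stream; the sample lines are taken from the example report.

```python
import re
from collections import defaultdict

SOURCE_RE = re.compile(r"^\[Offload\] \[(HOST|MIC \d+)\]")


def split_by_source(lines):
    """Split report lines into per-source streams ([HOST], [MIC 0], ...).

    Only the order within a stream is meaningful; how the streams
    interleave can change from run to run.
    """
    streams = defaultdict(list)
    for line in lines:
        match = SOURCE_RE.match(line)
        if match:
            streams[match.group(1)].append(line)
    return dict(streams)


mixed = [
    "[Offload] [HOST] [Tag 0] [State] Compute task on MIC",
    "[Offload] [MIC 0] [Tag 0] [State] Start target function __offload_entry_example_F90_11hysum_",
    "[Offload] [HOST] [Tag 0] [State] Receive pointer data",
    "On device : 0",
]
```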
Here, the target device is initialized, and the report shows the mapping between logical and physical devices:
[Offload] [HOST] [State] Initialize logical card 0 = physical card 0
[Offload] [HOST] [State] Initialize logical card 1 = physical card 1
Offload code in example.F90 at line number 11 has started executing on the target.
[Offload] [MIC 0] [File] example.F90
[Offload] [MIC 0] [Line] 11
The tag Tag0 is assigned to this offload so that the report lines printed for it can be identified.
[Offload] [MIC 0] [Tag] Tag0
The offload is initiated on the host.
[Offload] [HOST] [Tag 0] [State] Start Offload
The host initializes the target function corresponding to this offload.
[Offload] [HOST] [Tag 0] [State] Initialize function __offload_entry_example_F90_11hysum_
The host creates data transfer buffers for pointer data:
Four buffers for abc: 2 for the dope vector and 2 for the memory.
Four buffers for efg: 2 for the dope vector and 2 for the memory.
Two buffers for sumT: this variable is used in the offload region but is not listed in any clause, so the compiler treats it as OUT by default.
[Offload] [HOST] [Tag 0] [State] Create buffer from Host memory
[Offload] [HOST] [Tag 0] [State] Create buffer from MIC memory
[Offload] [HOST] [Tag 0] [State] Create buffer from Host memory
[Offload] [HOST] [Tag 0] [State] Create buffer from MIC memory
[Offload] [HOST] [Tag 0] [State] Create buffer from Host memory
[Offload] [HOST] [Tag 0] [State] Create buffer from MIC memory
[Offload] [HOST] [Tag 0] [State] Create buffer from Host memory
[Offload] [HOST] [Tag 0] [State] Create buffer from MIC memory
[Offload] [HOST] [Tag 0] [State] Create buffer from Host memory
[Offload] [HOST] [Tag 0] [State] Create buffer from MIC memory
Pointer data for the two dope vectors and for the variables abc and siz is sent from the host to the target using DMA.
[Offload] [HOST] [Tag 0] [State] Send pointer data
[Offload] [HOST] [Tag 0] [State] CPU->MIC pointer data 188
Non-pointer data is gathered and sent from the host to the target. In this example there is none, so 0 bytes are sent.
[Offload] [HOST] [Tag 0] [State] Gather copyin data
[Offload] [HOST] [Tag 0] [State] CPU->MIC copyin data 0
The host starts the offloaded code (line 11) on the target.
[Offload] [HOST] [Tag 0] [State] Compute task on MIC
The offload has completed, and the host starts receiving data from the target.
Pointer data for the variable efg is received from the target using DMA.
[Offload] [HOST] [Tag 0] [State] Receive pointer data
[Offload] [HOST] [Tag 0] [State] MIC->CPU pointer data 20
The following lines are printed by the target, where the offload entry function is invoked.
[Offload] [MIC 0] [Tag 0] [State] Start target function __offload_entry_example_F90_11hysum_
The variable abc is an IN for this offload. The two entries correspond to the dope vector of the assumed-shape array abc and to the memory that holds its data.
[Offload] [MIC 0] [Tag 0] [Var] hysum_$ABC_V$2d IN
[Offload] [MIC 0] [Tag 0] [Var] hysum_$ABC_V$2d IN
The variable efg is an OUT for this offload. Because efg is an assumed-shape array, its dope vector must still be transferred to the target, hence the two entries: IN for the dope vector and OUT for the data.
[Offload] [MIC 0] [Tag 0] [Var] hysum_$EFG_V$2e IN
[Offload] [MIC 0] [Tag 0] [Var] hysum_$EFG_V$2e OUT
The variable sumT is an OUT for this offload.
[Offload] [MIC 0] [Tag 0] [Var] hysum_$SUMT_V$31 OUT
The variable siz is an IN for this offload.
[Offload] [MIC 0] [Tag 0] [Var] hysum_$SIZ_V$2f IN
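The [Var] lines can be collected to confirm which direction each variable was transferred in. This Python sketch (variable_directions is a hypothetical name) records every direction reported for each variable, in report order, so that the two entries for abc and efg described above both appear.

```python
import re
from collections import defaultdict

VAR_RE = re.compile(r"\[Tag (\d+)\] \[Var\] (\S+) (\S+)$")


def variable_directions(lines):
    """Return {tag: {variable: [directions in report order]}} from [Var] lines."""
    result = defaultdict(lambda: defaultdict(list))
    for line in lines:
        match = VAR_RE.search(line)
        if match:
            tag, name, direction = int(match.group(1)), match.group(2), match.group(3)
            result[tag][name].append(direction)
    return {tag: dict(variables) for tag, variables in result.items()}


var_lines = [
    "[Offload] [MIC 0] [Tag 0] [Var] hysum_$ABC_V$2d IN",
    "[Offload] [MIC 0] [Tag 0] [Var] hysum_$ABC_V$2d IN",
    "[Offload] [MIC 0] [Tag 0] [Var] hysum_$EFG_V$2e IN",
    "[Offload] [MIC 0] [Tag 0] [Var] hysum_$EFG_V$2e OUT",
    "[Offload] [MIC 0] [Tag 0] [Var] hysum_$SUMT_V$31 OUT",
    "[Offload] [MIC 0] [Tag 0] [Var] hysum_$SIZ_V$2f IN",
]
```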
On the target, non-pointer data is copied into target memory.
[Offload] [MIC 0] [Tag 0] [State] Scatter copyin data
User program output:
On device : 0
On the target, the non-pointer data (the variable sumT) is gathered and sent to the host.
[Offload] [MIC 0] [Tag 0] [State] Gather copyout data
[Offload] [MIC 0] [Tag 0] [State] MIC->CPU copyout data 4
On the host, the non-pointer data received from the target is copied into host memory.
[Offload] [HOST] [Tag 0] [State] Scatter copyout data
The [CPU Time] line reports the total time measured for this offload on the host; here it is 0.000000 seconds.
[Offload] [MIC 0] [CPU Time] 0.000000 (seconds)
The total amount of pointer and non-pointer data transferred from the host to the target (the dope vectors and the variables abc and siz).
[Offload] [MIC 0] [CPU->MIC Data] 188 (bytes)
Computation time on the target.
[Offload] [MIC 0] [MIC Time] 0.000918 (seconds)
The total amount of pointer and non-pointer data transferred from the target to the host: 20 bytes of pointer data for efg plus 4 bytes of copyout data for sumT.
[Offload] [MIC 0] [MIC->CPU Data] 24 (bytes)
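In this example, each total equals the sum of the per-step byte counts printed under [State]: 188 + 0 bytes toward the target and 20 + 4 bytes back to the host. The Python sketch below reproduces that arithmetic; transfer_totals is a hypothetical name, and the sketch assumes the step lines keep the wording shown in this report.

```python
import re

XFER_RE = re.compile(r"\[State\] (CPU->MIC|MIC->CPU) (?:pointer|copyin|copyout) data (\d+)")


def transfer_totals(lines):
    """Sum the per-step byte counts reported under [State], by direction."""
    totals = {"CPU->MIC": 0, "MIC->CPU": 0}
    for line in lines:
        match = XFER_RE.search(line)
        if match:
            totals[match.group(1)] += int(match.group(2))
    return totals


step_lines = [
    "[Offload] [HOST] [Tag 0] [State] CPU->MIC pointer data 188",
    "[Offload] [HOST] [Tag 0] [State] CPU->MIC copyin data 0",
    "[Offload] [HOST] [Tag 0] [State] MIC->CPU pointer data 20",
    "[Offload] [MIC 0] [Tag 0] [State] MIC->CPU copyout data 4",
]
print(transfer_totals(step_lines))  # {'CPU->MIC': 188, 'MIC->CPU': 24}
```

Comparing these sums with the [CPU->MIC Data] and [MIC->CPU Data] lines is a quick consistency check on a report.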
User program output:
xyz(1)=5 xyz(2)=7 xyz(3)=9 xyz(4)=11 xyz(5)=13
sum total=45
Cleanup at end of program:
[Offload] [MIC 1] [State] Unregister data tables
[Offload] [MIC 0] [State] Unregister data tables
[Offload] [HOST] [State] Unregister data tables