User-mandated or SIMD Vectorization

User-mandated or SIMD vectorization supplements automatic vectorization just like OpenMP* parallelization supplements automatic parallelization. The following figure illustrates this relationship. User-mandated vectorization is implemented as a single-instruction-multiple-data (SIMD) feature and is referred to as SIMD vectorization.

Note

The SIMD vectorization feature is available for both Intel® microprocessors and non-Intel microprocessors. Vectorization may call library routines that can result in additional performance gain on Intel® microprocessors than on non-Intel microprocessors. The vectorization can also be affected by certain options, such as /arch (Windows*), -m (Linux* and OS X*), or [Q]x.

The following figure illustrates how SIMD vectorization is positioned among various approaches that you can take to generate vector code that exploits vector hardware capabilities. The programs written with SIMD vectorization are very similar to those written using auto-vectorization hints. You can use SIMD vectorization to minimize the amount of code changes that you may have to go through in order to obtain vectorized code.

SIMD vectorization uses the !$OMP SIMD directive to effect loop vectorization. You must add this directive to a loop and recompile to vectorize the loop using the option -qopenmp-simd (Linux and OS X*) or Qopenmp-simd (Windows*).

Consider an example in Fortran where the compiler does not automatically vectorize the loop due to the unknown data dependence distance between I and 2*I. If you know that X is large enough that data A(I) and A(2*I) don’t overlap within a reasonable number of iterations, perhaps 64, you can enforce vectorization of the loop using !$OMP SIMD. If you know that they don’t overlap in at least 8 iterations you may additionally specify !$OMP SIMD SIMDLEN(8) to avoid vectorization that is too wide, which might lead to overlap.

Example: without !$OMP SIMD
`[D:/simd] cat example1.f subroutine add(A, N, X) integer N, X real A(N) DO I=X, N A(I) = A(I) + A(2*I) ENDDO End`
[D:/simd] ifort example1.f -c -nologo -Qopt-report2 -Qopt-report-phase=vec -Qopt-report-file=stderr Begin optimization report for: ADD Report from: Vector optimizations [vec] LOOP BEGIN at example1.f(5,9) <Multiversioned v1> remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details remark #15346: vector dependence: assumed FLOW dependence between A(I) (6:11) and A(I*2) (5:11) LOOP END LOOP BEGIN at example1.f(5,9) <Remainder, Multiversioned v1> LOOP END LOOP BEGIN at example1.f(5,9) <Multiversioned v2> remark #15304: loop was not vectorized: non-vectorizable loop instance from multiversioning LOOP END LOOP BEGIN at example1.f(5,9) <Remainder, Multiversioned v2> LOOP END ===========================================================================
Example: with !$OMP SIMD
`[D:/simd] cat example1.f subroutine add(A, N, X) integer N, X real A(N) !X may be 8 or more, so on overlap with 8 iterations at least !$OMP SIMD SIMDLEN(8) DO I=X, N A(I) = A(I) + A(2*I) ENDDO End`
[D:/simd] ifort example1.f -c -nologo -Qopt-report2 -Qopt-report-phase=vec -Qopt-report-file=stderr –Qopenp-simd Begin optimization report for: ADD Report from: Vector optimizations [vec] LOOP BEGIN at example1.f(6,9) <Peeled loop for vectorization> LOOP END LOOP BEGIN at example1.f(6,9) remark #15301: OpenMP SIMD LOOP WAS VECTORIZED LOOP END LOOP BEGIN at example1.f(6,9) <Remainder loop for vectorization> LOOP END ===========================================================================

The one big difference between using !$OMP SIMD and auto-vectorization hints is that with !$OMP SIMD, the compiler generates a warning when it is unable to vectorize the loop. With auto-vectorization hints, actual vectorization is still under the discretion of the compiler, even when you use the !DIR$ VECTOR ALWAYS hint.

!$OMP SIMD has optional clauses to guide the compiler on how vectorization must proceed. Use these clauses appropriately so that the compiler obtains enough information to generate correct vector code. For more information on the clauses, see the !$OMP SIMD description.

Additional Semantics

Note the following points when using the !$OMP SIMD directive.

A variable may belong to zero or one of the following: private, linear, or reduction.
Within the vector loop, an expression is evaluated as a vector value if it is private, linear, reduction, or it has a sub-expression that is evaluated to a vector value. Otherwise, it is evaluated as a scalar value (that is, broadcast the same value to all iterations). Scalar value does not necessarily mean loop invariant, although that is the most frequently seen usage pattern of scalar value.
A vector value may not be assigned to a scalar L-value. It is an error.
A scalar L-value may not be assigned under a vector condition. It is an error.
The computed GOTO statement is not supported.

Using vector Declaration

Consider the following Intel® Visual Fortran example code for a program to compare serial and vector computations using a user-defined function, foo().

Note

All code examples in this section are applicable for Fortran on Windows* only.

Example: Where user-defined function is not vectorized
!! file simdmain.f90 program simdtest use IFPORT ! Test vector function in external file. implicit none interface integer function foo(a, b) integer a, b end function foo end interface integer, parameter :: M = 48, N = 64 integer i, j integer, dimension(M,N) :: a1 integer, dimension(M,N) :: a2 integer, dimension(M,N) :: s_a3 integer, dimension(M,N) :: v_a3 logical :: err_flag = .false. ! compute random numbers for arrays do j = 1, N do i = 1, M a1(i,j) = rand() * M a2(i,j) = rand() * M end do end do ! compute serial results do j = 1, N !dir$ novector do i = 1, M s_a3(i,j) = foo(a1(i,j), a2(i,j)) end do end do ! compute vector results do j = 1, N do i = 1, M v_a3(i,j) = foo(a1(i,j), a2(i,j)) end do end do ! compare serial and vector results do j = 1, N do i = 1, M if (s_a3(i,j) .ne. v_a3(i,j)) then err_flag = .true. print , s_a3(i, j), v_a3(i,j) end if end do end do if (err_flag .eq. .true.) then write(,) "FAILED" else write(,*) "PASSED" end if end program !! file: vecfoo.f90 integer function foo(a, b) implicit none integer, intent(in) :: a, b foo = a - b end function
[49 C:/temp] ifort -nologo -qopt-report2 -qopt-report-phase=vec -qopt-report-file=stderr simdmain.f90 vecfoo.f90 Begin optimization report for: SIMDTEST Report from: Vector optimizations [vec] LOOP BEGIN at simdmain.f90(33,3) remark #15319: loop was not vectorized: novector directive used LOOP END LOOP BEGIN at simdmain.f90(54,2) remark #15541: outer loop was not auto-vectorized: consider using SIMD directive LOOP BEGIN at simdmain.f90(47,3) remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details remark #15346: vector dependence: assumed OUTPUT dependence between at(50:5) and at (50:5) LOOP END LOOP END Non-optimizable loops: LOOP BEGIN at simdmain.f90(28,2) remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate. LOOP BEGIN at simdmain.f90(27,3) remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate. LOOP END LOOP END LOOP BEGIN at simdmain.f90(36,2) remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate. LOOP END LOOP BEGIN at simdmain.f90(43,3) remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate. LOOP BEGIN at simdmain.f90(42,4) remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate. LOOP END LOOP END ===========================================================================

Example: Where user-defined function is not vectorized

!! file simdmain.f90 
program simdtest
use IFPORT
! Test vector function in external file.
implicit none
interface
   integer function foo(a, b)
   integer a, b
   end function foo
end interface

integer, parameter :: M = 48, N = 64

  integer  i, j
  integer, dimension(M,N) :: a1  
  integer, dimension(M,N) :: a2
  integer, dimension(M,N) :: s_a3
  integer, dimension(M,N) :: v_a3 
logical :: err_flag = .false.

! compute random numbers for arrays
do j = 1, N
  do i = 1, M
   a1(i,j) = rand() * M
   a2(i,j) = rand() * M
  end do
end do

! compute serial results
do j = 1, N 
!dir$ novector 
  do i = 1, M
   s_a3(i,j) = foo(a1(i,j), a2(i,j))
  end do
end do

! compute vector results
  do j = 1, N 
   do i = 1, M
    v_a3(i,j) = foo(a1(i,j), a2(i,j))
   end do
  end do

! compare serial and vector results
do j = 1, N 
  do i = 1, M
   if (s_a3(i,j) .ne. v_a3(i,j)) then
    err_flag = .true. 
    print *, s_a3(i, j), v_a3(i,j)
   end if
  end do 
 end do
if (err_flag .eq. .true.) then
  write(*,*) "FAILED"
   else
  write(*,*) "PASSED"
end if 
end program

!! file: vecfoo.f90 
integer function foo(a, b)
implicit none
integer, intent(in) :: a, b
  foo = a - b 
end function

[49 C:/temp] ifort -nologo -qopt-report2 -qopt-report-phase=vec -qopt-report-file=stderr simdmain.f90 vecfoo.f90

Begin optimization report for: SIMDTEST

    Report from: Vector optimizations [vec]


LOOP BEGIN at simdmain.f90(33,3)
   remark #15319: loop was not vectorized: novector directive used
LOOP END

LOOP BEGIN at simdmain.f90(54,2)
   remark #15541: outer loop was not auto-vectorized: consider using SIMD directive

   LOOP BEGIN at simdmain.f90(47,3)
      remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below.
Use level 5 report for details
      remark #15346: vector dependence: assumed OUTPUT dependence between at(50:5) and at (50:5)
   LOOP END
LOOP END


Non-optimizable loops:


LOOP BEGIN at simdmain.f90(28,2)
   remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate.

   LOOP BEGIN at simdmain.f90(27,3)
      remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate.
   LOOP END
LOOP END

LOOP BEGIN at simdmain.f90(36,2)
   remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate.
LOOP END

LOOP BEGIN at simdmain.f90(43,3)
   remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate.

   LOOP BEGIN at simdmain.f90(42,4)
      remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate.
   LOOP END
LOOP END
===========================================================================

When you compile the above code, the loop containing the foo() function is not auto-vectorized because the auto-vectorizer does not know what foo() does unless it is inlined to this call site.

In such cases where the function call is not inlined, you can use the !DIR$ attributes vector::function-name-list declaration to vectorize the loop and the function foo(). All you need to do is add the vector declaration to the function declaration, and recompile the code. The loop and function are vectorized.

Example: Where loop with user-defined function with vector declaration is auto-vectorized
!! file simdmain.f90 program simdtest ! Test vector function in external file. use IFPORT implicit none interface integer function foo(a, b) !$omp declare simd integer a, b end function foo end interface integer, parameter :: M = 48, N = 64 integer i, j integer, dimension(M,N) :: a1 integer, dimension(M,N) :: a2 integer, dimension(M,N) :: s_a3 integer, dimension(M,N) :: v_a3 logical :: err_flag = .false. ! compute random numbers for arrays do j = 1, N do i = 1, M a1(i,j) = rand() * M a2(i,j) = rand() * M end do end do ! compute serial results do j = 1, N !dir$ novector do i = 1, M s_a3(i,j) = foo(a1(i,j), a2(i,j)) end do end do ! compute vector results do j = 1, N do i = 1, M v_a3(i,j) = foo(a1(i,j), a2(i,j)) end do end do ! compare serial and vector results do j = 1, N do i = 1, M if (s_a3(i,j) .ne. v_a3(i,j)) then err_flag = .true. print , s_a3(i, j), v_a3(i,j) end if end do end do if (err_flag .eq. .true.) then write(,) "FAILED" else write(,*) "PASSED" end if end program !! file: vecfoo.f90 integer function foo(a, b) !$omp declare simd implicit none integer, intent(in) :: a, b foo = a - b end function
[49 C:/temp] ifort -nologo -qopt-report2 -qopt-report-phase=vec -qopt-report-file=stderr simdmain.f90 vecfoo.f90 –qopenmp Begin optimization report for: SIMDTEST Report from: Vector optimizations [vec] LOOP BEGIN at simdmain.f90(32,2) remark #15541: outer loop was not auto-vectorized: consider using SIMD directive LOOP BEGIN at simdmain.f90(34,3) remark #15319: loop was not vectorized: novector directive used LOOP END LOOP END LOOP BEGIN at simdmain.f90(40,3) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at simdmain.f90(41,4) remark #15300: LOOP WAS VECTORIZED LOOP END LOOP END LOOP BEGIN at simdmain.f90(55,2) remark #15541: outer loop was not auto-vectorized: consider using SIMD directive LOOP BEGIN at simdmain.f90(48,3) remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below Use level 5 report for details remark #15346: vector dependence: assumed OUTPUT dependence between at (51:5) and at (51:5) LOOP END LOOP END Non-optimizable loops: LOOP BEGIN at simdmain.f90(29,2) remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate. LOOP BEGIN at simdmain.f90(28,3) remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate. LOOP END LOOP END =========================================================================== Begin optimization report for: FOO..xN4vv Report from: Vector optimizations [vec] remark #15347: FUNCTION WAS VECTORIZED with xmm, simdlen=4, unmasked, formal parameter types: (vector,vector) =========================================================================== Begin optimization report for: FOO..xM4vv Report from: Vector optimizations [vec] remark #15347: FUNCTION WAS VECTORIZED with xmm, simdlen=4, masked, formal parameter types: (vector,vector) ===========================================================================.

Example: Where loop with user-defined function with vector declaration is auto-vectorized

!! file simdmain.f90 
program simdtest 
! Test vector function in external file.
use IFPORT
implicit none
interface
   integer function foo(a, b) 
!$omp declare simd
   integer a, b
   end function foo
end interface

 integer, parameter :: M = 48, N = 64

  integer  i, j
  integer, dimension(M,N) :: a1  
  integer, dimension(M,N) :: a2
  integer, dimension(M,N) :: s_a3
  integer, dimension(M,N) :: v_a3 
logical :: err_flag = .false.

! compute random numbers for arrays
do j = 1, N
  do i = 1, M
   a1(i,j) = rand() * M
   a2(i,j) = rand() * M
  end do
end do

 ! compute serial results
do j = 1, N 
!dir$ novector 
  do i = 1, M
   s_a3(i,j) = foo(a1(i,j), a2(i,j))
  end do
end do

 ! compute vector results
  do j = 1, N 
   do i = 1, M
    v_a3(i,j) = foo(a1(i,j), a2(i,j))
   end do
  end do

 ! compare serial and vector results
do j = 1, N 
  do i = 1, M
   if (s_a3(i,j) .ne. v_a3(i,j)) then
    err_flag = .true. 
    print *, s_a3(i, j), v_a3(i,j)
   end if
  end do 
 end do
if (err_flag .eq. .true.) then
  write(*,*) "FAILED"
   else
  write(*,*) "PASSED"
end if 
end program

!! file: vecfoo.f90 
integer function foo(a, b) 
!$omp declare simd
implicit none
integer, intent(in) :: a, b
  foo = a - b 
end function

[49 C:/temp] ifort -nologo -qopt-report2 -qopt-report-phase=vec -qopt-report-file=stderr simdmain.f90 vecfoo.f90 –qopenmp

Begin optimization report for: SIMDTEST

    Report from: Vector optimizations [vec]


LOOP BEGIN at simdmain.f90(32,2)
   remark #15541: outer loop was not auto-vectorized: consider using SIMD directive

   LOOP BEGIN at simdmain.f90(34,3)
      remark #15319: loop was not vectorized: novector directive used
   LOOP END
LOOP END

LOOP BEGIN at simdmain.f90(40,3)
   remark #15542: loop was not vectorized: inner loop was already vectorized

   LOOP BEGIN at simdmain.f90(41,4)
      remark #15300: LOOP WAS VECTORIZED
   LOOP END
LOOP END

LOOP BEGIN at simdmain.f90(55,2)
   remark #15541: outer loop was not auto-vectorized: consider using SIMD directive

   LOOP BEGIN at simdmain.f90(48,3)
      remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below
Use level 5 report for details
      remark #15346: vector dependence: assumed OUTPUT dependence between at (51:5) and at (51:5)
   LOOP END
LOOP END


Non-optimizable loops:


LOOP BEGIN at simdmain.f90(29,2)
   remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate.

   LOOP BEGIN at simdmain.f90(28,3)
      remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate.
   LOOP END
LOOP END
===========================================================================

Begin optimization report for: FOO..xN4vv

    Report from: Vector optimizations [vec]

remark #15347: FUNCTION WAS VECTORIZED with xmm, simdlen=4, unmasked, formal parameter types: (vector,vector)
===========================================================================

Begin optimization report for: FOO..xM4vv

    Report from: Vector optimizations [vec]

remark #15347: FUNCTION WAS VECTORIZED with xmm, simdlen=4, masked, formal parameter types: (vector,vector)
===========================================================================.

Restrictions on Using `!$omp declare simd` declaration

Vectorization depends on two major factors: hardware and the style of source code. When using the vector declaration, the following features are not allowed:

Locks, barriers, atomic construct, critical sections (These are allowed inside !$OMP ORDERED SIMD blocks)
Computed and assigned GOTO and SELECT CASE constructs (in some cases these may be supported and converted to IF statements)
The GOTO statement, into or out of a function
ENTRY statement

Non-vector function calls are generally allowed within vector functions but calls to such functions are serialized lane-by-lane and so might perform poorly. Also for SIMD-enabled functions it is not allowed to have side effects except writes by their arguments. This rule can be violated by non-vector function calls, so be careful executing such calls in SIMD-enabled functions and subroutines.

Formal parameters must be of the following data types:

(un)signed 8, 16, 32, or 64-bit integer
32- or 64-bit floating point
64- or 128-bit complex

User-mandated or SIMD Vectorization

Note

Additional Semantics

Using vector Declaration

Note

Restrictions on Using !$omp declare simd declaration

See Also

Restrictions on Using `!$omp declare simd` declaration