
**Performance annotations**

Sometimes you can enable better optimization by promising certain program properties.

- Use `@inbounds` to eliminate array bounds checking within expressions. Be certain before doing this. If the subscripts are ever out of bounds, you may suffer crashes or silent corruption.
- Use `@fastmath` to allow floating point optimizations that are correct for real numbers, but lead to differences for IEEE numbers. Be careful when doing this, as this may change numerical results. This corresponds to the `-ffast-math` option of `clang`.
- Write `@simd` in front of `for` loops that are amenable to vectorization. This feature is experimental and could change or disappear in future versions of Julia.

Note: While `@simd` needs to be placed directly in front of a loop, both `@inbounds` and `@fastmath` can be applied to several statements at once, e.g. using `begin ... end`, or even to a whole function.
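To make the note concrete, here is a minimal sketch (the function name `sqdist` is hypothetical, not from the text) that applies `@fastmath` to a whole function definition and `@inbounds` to a `begin ... end` block. The explicit length check at the top is what makes the `@inbounds` promise safe.

```julia
# Hypothetical example: squared Euclidean distance between two vectors.
# @fastmath is applied to the whole function definition, so the
# floating-point operations inside it may be rearranged.
@fastmath function sqdist(x::Vector{Float64}, y::Vector{Float64})
    # Check lengths up front so that the @inbounds promise below holds.
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    s = 0.0
    # @inbounds covers every statement inside this begin ... end block.
    @inbounds begin
        for i in 1:length(x)
            d = x[i] - y[i]
            s += d * d
        end
    end
    return s
end
```

Without the length check, a call such as `sqdist([1.0], [1.0, 2.0])` would read past the end of `x`, which is exactly the kind of silent corruption the first bullet warns about.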

Here is an example with both `@inbounds` and `@simd` markup:

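```
# Dot product without @simd; @inbounds assumes length(y) >= length(x)
function inner(x, y)
    s = zero(eltype(x))
    for i=1:length(x)
        @inbounds s += x[i]*y[i]
    end
    s
end

# Same dot product, with @simd allowing the loop to be vectorized
function innersimd(x, y)
    s = zero(eltype(x))
    @simd for i=1:length(x)
        @inbounds s += x[i]*y[i]
    end
    s
end

function timeit(n, reps)
    x = rand(Float32,n)
    y = rand(Float32,n)
    s = zero(Float64)
    # warm up: compile both functions before timing them
    s += inner(x,y) + innersimd(x,y)
    # each dot product performs 2n floating-point operations
    t = @elapsed for j in 1:reps
        s+=inner(x,y)
    end
    println("GFlop/sec        = ",2.0*n*reps/t*1E-9)
    t = @elapsed for j in 1:reps
        s+=innersimd(x,y)
    end
    println("GFlop/sec (SIMD) = ",2.0*n*reps/t*1E-9)
end

timeit(1000,1000)
```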

(`GFlop/sec` measures the performance, and larger numbers are better.) The range for a `@simd` for loop should be a one-dimensional range. A variable used for accumulating, such as `s` in the example, is called a reduction variable. By using `@simd`, you are asserting several properties of the loop:

- It is safe to execute iterations in arbitrary or overlapping order, with special consideration for reduction variables.
- Floating-point operations on reduction variables can be reordered, possibly causing different results than without `@simd`.
- No iteration ever waits on another iteration to make forward progress.

A loop containing `break`, `continue`, or `@goto` will cause a compile-time error.

Using `@simd` merely gives the compiler license to vectorize. Whether it actually does so depends on the compiler. To actually benefit from the current implementation, your loop should have the following additional properties:

- The loop must be an innermost loop.
- The loop body must be straight-line code. This is why `@inbounds` is currently needed for all array accesses. The compiler can sometimes turn short `&&`, `||`, and `?:` expressions into straight-line code, if it is safe to evaluate all operands unconditionally. Consider using `ifelse()` instead of `?:` in the loop if it is safe to do so.
- Accesses must have a stride pattern and cannot be “gathers” (random-index reads) or “scatters” (random-index writes).
- The stride should be unit stride.

In some simple cases, for example with 2-3 arrays accessed in a loop, the LLVM auto-vectorization may kick in automatically, leading to no further speedup with `@simd`.
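As an illustration of the straight-line-code advice, here is a sketch (the function `sumpositive` is hypothetical) of a reduction whose condition is written with `ifelse` rather than `?:`. Both arguments of `ifelse` are always evaluated, which is safe here because they are plain values; whether LLVM actually vectorizes the loop still depends on the compiler and the hardware.

```julia
# Hypothetical example: sum of the positive elements of x.
function sumpositive(x::Vector{Float64})
    s = 0.0                          # reduction variable
    @inbounds @simd for i in 1:length(x)
        v = x[i]
        # ifelse evaluates both alternatives, so the body has no branch;
        # `v > 0.0 ? v : 0.0` expresses the same thing with a branch.
        s += ifelse(v > 0.0, v, 0.0)
    end
    return s
end
```

To check whether vectorization happened, one can look for vector types such as `<4 x double>` in the output of `@code_llvm sumpositive(rand(1000))`; the exact output varies with CPU and Julia version.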

Here is an example with all three kinds of markup. This program first calculates the finite difference of a one-dimensional array, and then evaluates the root-mean-square (length-normalized L2) norm of the result:

**wave.jl**

```
# u::Vector guarantees 1-based indexing, so the @inbounds loops below are safe
function init!(u::Vector)
    n = length(u)
    dx = 1.0 / (n-1)
    @fastmath @inbounds @simd for i in 1:n
        u[i] = sin(2pi*dx*i)
    end
end

# Centered differences inside, one-sided differences at the two ends
function deriv!(u::Vector, du)
    n = length(u)
    dx = 1.0 / (n-1)
    @fastmath @inbounds du[1] = (u[2] - u[1]) / dx
    @fastmath @inbounds @simd for i in 2:n-1
        du[i] = (u[i+1] - u[i-1]) / (2*dx)
    end
    @fastmath @inbounds du[n] = (u[n] - u[n-1]) / dx
end

# Named mynorm to avoid confusion with LinearAlgebra.norm
function mynorm(u::Vector)
    n = length(u)
    T = eltype(u)
    s = zero(T)
    @fastmath @inbounds @simd for i in 1:n
        s += u[i]^2
    end
    @fastmath @inbounds return sqrt(s/n)
end

function main()
    n = 2000
    u = Vector{Float64}(undef, n)   # uninitialized; filled by init!
    init!(u)
    du = similar(u)

    deriv!(u, du)                   # first calls compile the functions
    nu = mynorm(du)

    @time for i in 1:10^6
        deriv!(u, du)
        nu = mynorm(du)
    end

    println(nu)
end

main()
```

Running the script as `julia --math-mode=ieee wave.jl` instead of `julia wave.jl` disables the `@fastmath` macro, so that the results with and without it can be compared.

In this case, the speedup due to `@fastmath` is a factor of about 3.7. This is unusually large; in general, the speedup will be smaller. (In this particular example, the working set of the benchmark is small enough to fit into the L1 cache of the processor, so that memory access latency does not play a role, and computing time is dominated by CPU usage. In many real-world programs this is not the case.) Also, in this case this optimization does not change the result; in general, the result will be slightly different. In some cases, especially for numerically unstable algorithms, the result can be very different.
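The reason rearranging can change results is that floating-point addition is not associative. A minimal check in plain Julia (no `@fastmath` involved), summing the same three numbers in two different orders; `@fastmath` permits the compiler to choose either order:

```julia
# The same three numbers summed in two different orders.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
println(a)        # 0.6000000000000001
println(b)        # 0.6
println(a == b)   # false
```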

The annotation `@fastmath` rearranges floating point expressions, e.g. changing the order of evaluation, or assuming that certain special cases (`Inf`, `NaN`) cannot occur. In this case (and on this particular computer), the main difference is that the expression `1 / (2*dx)` in the function `deriv!` is hoisted out of the loop (i.e. calculated outside the loop), as if one had written `idx = 1 / (2*dx)`. In the loop, the expression `... / (2*dx)` then becomes `... * idx`, which is much faster to evaluate. Of course, both the actual optimization that is applied by the compiler as well as the resulting speedup depend very much on the hardware. You can examine the change in generated code by using Julia’s `code_native()` function.
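For comparison, here is a sketch of that hoisting written out by hand, without `@fastmath` (the name `deriv_hoisted!` is hypothetical; this is the source-level equivalent of the transformation described above, not the code the compiler actually emits). Multiplying by a precomputed reciprocal can differ from dividing in the last bit, which is why the compiler only does this when `@fastmath` permits it.

```julia
# Hypothetical hand-written variant of deriv! from wave.jl, with the
# reciprocal of 2*dx computed once, outside the loop.
function deriv_hoisted!(u::Vector{Float64}, du::Vector{Float64})
    n = length(u)
    @assert n >= 2 && length(du) == n   # makes the @inbounds below safe
    dx = 1.0 / (n-1)
    idx = 1 / (2*dx)                    # hoisted out of the loop
    @inbounds du[1] = (u[2] - u[1]) / dx
    @inbounds @simd for i in 2:n-1
        du[i] = (u[i+1] - u[i-1]) * idx # multiply instead of divide
    end
    @inbounds du[n] = (u[n] - u[n-1]) / dx
    return du
end
```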

**Treat subnormal numbers as zeros**

Subnormal numbers, formerly called denormal numbers, are useful in many contexts, but incur a performance penalty on some hardware. A call `set_zero_subnormals(true)` grants permission for floating-point operations to treat subnormal inputs or outputs as zeros, which may improve performance on some hardware. A call `set_zero_subnormals(false)` enforces strict IEEE behavior for subnormal numbers.

Below is an example where subnormals noticeably impact performance on some hardware:

```
function timestep(b::Vector{T}, a::Vector{T}, Δt::T) where T
    @assert length(a)==length(b)
    n = length(b)
    b[1] = 1                            # Boundary condition
    for i=2:n-1
        b[i] = a[i] + (a[i-1] - T(2)*a[i] + a[i+1]) * Δt
    end
    b[n] = 0                            # Boundary condition
end

function heatflow(a::Vector{T}, nstep::Integer) where T
    b = similar(a)
    for t=1:div(nstep,2)                # Assume nstep is even
        timestep(b,a,T(0.1))
        timestep(a,b,T(0.1))
    end
end

heatflow(zeros(Float32,10),2)           # Force compilation
for trial=1:6
    a = zeros(Float32,1000)
    set_zero_subnormals(iseven(trial))  # Odd trials use strict IEEE arithmetic
    @time heatflow(a,1000)
end
```

This example generates many subnormal numbers because the values in `a` become an exponentially decreasing curve, which slowly flattens out over time.

Treating subnormals as zeros should be used with caution, because doing so breaks some identities, such as the fact that `x-y == 0` implies `x == y`.
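A minimal way to see the broken identity: the values below are chosen so that their difference, about `1f-38`, is smaller than `floatmin(Float32)` and hence subnormal. `set_zero_subnormals` returns `false` on hardware where the mode is not supported, so the flushed case is only exercised when it is. The helper `sub` is hypothetical and exists only to keep the subtraction a run-time operation.

```julia
# x and y are normal Float32 numbers, but x - y (about 1.0f-38) is
# subnormal, i.e. smaller than floatmin(Float32) (about 1.1755f-38).
x = 3f-38
y = 2f-38
@noinline sub(a, b) = a - b    # hypothetical helper: subtraction at run time

set_zero_subnormals(false)     # strict IEEE: subnormal results are kept
d = sub(x, y)
println((d, issubnormal(d), x == y))   # d is nonzero and subnormal; x != y

if set_zero_subnormals(true)   # returns false if the hardware cannot do this
    d0 = sub(x, y)
    println((d0, x == y))      # difference flushed to zero, yet x != y
end
set_zero_subnormals(false)     # restore strict IEEE behavior
```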

In some applications, an alternative to zeroing subnormal numbers is to inject a tiny bit of noise. For example, instead of initializing `a` with zeros, initialize it with:

`a = rand(Float32,1000) * 1.f-9`

**`@code_warntype`**

The macro `@code_warntype` (or its function variant `code_warntype()`) can sometimes be helpful in diagnosing type-related problems. Here’s an example:
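The definitions of `f` and `pos` do not appear in this copy of the text. Judging from the `@code_warntype` output below (a comparison `x < 0` whose branches yield `0` or `x`, followed by `sin(y*x + 1)`), they were presumably along these lines; treat this as a reconstruction, not the original listing:

```julia
# Reconstructed (assumed) definitions, inferred from the output below.
# pos is type-unstable: for a Float64 argument it returns either the
# Int64 literal 0 or the Float64 x.
pos(x) = x < 0 ? 0 : x

function f(x)
    y = pos(x)           # y::Union{Float64, Int64} when x::Float64
    return sin(y*x + 1)  # Float64 in either case
end
```

With these definitions, a recent Julia release formats the `@code_warntype f(3.2)` listing differently from the older output shown below, but it should still report `y` as a `Union{Float64, Int64}`.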

Here is `@code_warntype` applied to `f(3.2)`:

```
@code_warntype f(3.2)
Variables:
#self#::#f
x::Float64
y::Union{Float64, Int64}
fy::Float64
#temp#@_5::Union{Float64, Int64}
#temp#@_6::Core.MethodInstance
#temp#@_7::Float64
#temp#@_8::Float64
Body:
begin
$(Expr(:inbounds, false))
# meta: location REPL[8] pos 1
# meta: location float.jl < 491
fy::Float64 = (Base.sitofp)(Float64, 0)::Float64
# meta: pop location
unless (Base.or_int)((Base.lt_float)(x::Float64, fy::Float64)::Bool, (Base.and_int)((Base.and_int)((Base.eq_float)(x::Float64, fy::Float64)::Bool, (Base.lt_float)(fy::Float64, 9.223372036854776e18)::Bool)::Bool, (Base.slt_int)((Base.fptosi)(Int64, fy::Float64)::Int64, 0)::Bool)::Bool)::Bool goto 9
#temp#@_5::Union{Float64, Int64} = 0
goto 11
9:
#temp#@_5::Union{Float64, Int64} = x::Float64
11:
# meta: pop location
$(Expr(:inbounds, :pop))
y::Union{Float64, Int64} = #temp#@_5::Union{Float64, Int64} # line 3:
unless (y::Union{Float64, Int64} isa Int64)::Bool goto 19
#temp#@_6::Core.MethodInstance = MethodInstance for *(::Int64, ::Float64)
goto 28
19:
unless (y::Union{Float64, Int64} isa Float64)::Bool goto 23
#temp#@_6::Core.MethodInstance = MethodInstance for *(::Float64, ::Float64)
goto 28
23:
goto 25
25:
#temp#@_7::Float64 = (y::Union{Float64, Int64} * x::Float64)::Float64
goto 30
28:
#temp#@_7::Float64 = $(Expr(:invoke, :(#temp#@_6), :(Main.*), :(y), :(x)))
30:
SSAValue(0) = (Base.add_float)(#temp#@_7::Float64, (Base.sitofp)(Float64, 1)::Float64)::Float64
$(Expr(:inbounds, false))
# meta: location math.jl sin 419
SSAValue(2) = $(Expr(:foreigncall, ("sin", "libopenlibm"), Float64, svec(Float64), SSAValue(0), 0))
# meta: location math.jl nan_dom_err 300
unless (Base.and_int)((Base.ne_float)(SSAValue(2), SSAValue(2))::Bool, (Base.not_int)((Base.ne_float)(SSAValue(0), SSAValue(0))::Bool)::Bool)::Bool goto 39
#temp#@_8::Float64 = (Base.Math.throw)($(QuoteNode(DomainError())))::Union{}
goto 41
39:
#temp#@_8::Float64 = SSAValue(2)
41:
# meta: pop location
# meta: pop location
$(Expr(:inbounds, :pop))
return #temp#@_8::Float64
end::Float64
```

Interpreting the output of `@code_warntype`, like that of its cousins `@code_lowered`, `@code_typed`, `@code_llvm`, and `@code_native`, takes a little practice. Your code is being presented in a form that has been partially digested on its way to generating compiled machine code. Most of the expressions are annotated by a type, indicated by the `::T` (where `T` might be `Float64`, for example). The most important characteristic of `@code_warntype` is that non-concrete types are displayed in red.

The top part of the output summarizes the type information for the different variables internal to the function. You can see that `y`, one of the variables you created, is a `Union{Int64,Float64}`, due to the type-instability of `pos`. There is another variable, `#temp#@_5`, which you can see also has the same type.

The next lines represent the body of `f`. The lines starting with a number followed by a colon (`9:`, `11:`, and so on) are labels, and represent targets for jumps (via `goto`) in your code. Looking at the body, you can see that `pos` has been inlined into `f`: everything before label `11:` comes from code defined in `pos`.

After label `11:`, the variable `y` is defined, and again annotated as a `Union` type. Next, we see that the compiler created the temporary variable `#temp#@_7` to hold the result of `y*x`. Because a `Float64` times either an `Int64` or `Float64` yields a `Float64`, all type-instability ends here. The net result is that `f(x::Float64)` will not be type-unstable in its output, even if some of the intermediate computations are type-unstable.

How you use this information is up to you. Obviously, it would be far and away best to fix `pos` to be type-stable: if you did so, all of the variables in `f` would be concrete, and its performance would be optimal. However, there are circumstances where this kind of ephemeral type instability might not matter too much: for example, if `pos` is never used in isolation, the fact that `f`'s output is type-stable (for `Float64` inputs) will shield later code from the propagating effects of type instability. This is particularly relevant in cases where fixing the type instability is difficult or impossible. In such cases, the tips above (e.g., adding type annotations and/or breaking up functions) are your best tools to contain the “damage” from type instability.
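Fixing `pos` usually means making both branches return the same type. A minimal sketch, assuming `pos` clamps negative inputs to zero with the literal `0` (the names `pos_stable` and `f_stable` are hypothetical): using `zero(x)` instead makes the result type match the input type, so nothing downstream sees a `Union`.

```julia
# Type-stable variant: zero(x) has the same type as x, so both branches
# of the conditional return typeof(x).
pos_stable(x) = x < 0 ? zero(x) : x

function f_stable(x)
    y = pos_stable(x)    # y::Float64 for a Float64 argument, no Union
    return sin(y*x + 1)
end
```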

The following examples may help you interpret expressions marked as containing non-leaf types:

- Function body ending in `end::Union{T1,T2}`
  - Interpretation: function with unstable return type
  - Suggestion: make the return value type-stable, even if you have to annotate it
- `f(x::T)::Union{T1,T2}`
  - Interpretation: call to a type-unstable function
  - Suggestion: fix the function, or if necessary annotate the return value
- `(top(arrayref))(A::Array{Any,1},1)::Any`
  - Interpretation: accessing elements of poorly-typed arrays
  - Suggestion: use arrays with better-defined types, or if necessary annotate the type of individual element accesses
- `(top(getfield))(A::ArrayContainer{Float64},:data)::Array{Float64,N}`
  - Interpretation: getting a field that is of non-leaf type. In this case, `ArrayContainer` had a field `data::Array{T}`. But `Array` needs the dimension `N`, too, to be a concrete type.
  - Suggestion: use concrete types like `Array{T,3}` or `Array{T,N}`, where `N` is now a parameter of `ArrayContainer`.
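The last suggestion can be checked directly. A sketch with two hypothetical container types: `LooseContainer` mirrors the field declaration described above (`data::Array{T}`), and `ArrayContainer` makes the dimension `N` a type parameter.

```julia
# Field type Array{T} leaves the dimension unspecified: not concrete.
struct LooseContainer{T}
    data::Array{T}
end

# The dimension N is a type parameter, so the field type is concrete.
struct ArrayContainer{T,N}
    data::Array{T,N}
end

println(isconcretetype(fieldtype(LooseContainer{Float64}, :data)))    # false
println(isconcretetype(fieldtype(ArrayContainer{Float64,3}, :data)))  # true

# The default constructor infers T and N from the argument.
c = ArrayContainer(zeros(2, 2, 2))
println(typeof(c))
```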
