Category Archives: Python

SciPy – 56 – elaborazione di immagini multidimensionali – 8

Continuo da qui, copio qui.

Segmentazione ed etichettatura
Segmentation is the process of separating objects of interest from the background. The most simple approach is probably intensity thresholding, which is easily done with numpy functions:

The result is a binary image, in which the individual objects still need to be identified and labeled. The function label generates an array where each object is assigned a unique number:

The label function generates an array where the objects in the input are labeled with an integer index. It returns a tuple consisting of the array of object labels and the number of objects found, unless the output parameter is given, in which case only the number of objects is returned. The connectivity of the objects is defined by a structuring element. For instance, in two dimensions using a four-connected structuring element gives:

These two objects are not connected because there is no way in which we can place the structuring element such that it overlaps with both objects. However, an 8-connected structuring element results in only a single object:

If no structuring element is provided, one is generated by calling generate_binary_structure (see Binary morphology [qui]) using a connectivity of one (which in 2D is the 4-connected structure of the first example). The input can be of any type, any value not equal to zero is taken to be part of an object. This is useful if you need to ‘re-label’ an array of object indices, for instance after removing unwanted objects. Just apply the label function again to the index array. For instance:

Note: The structuring element used by label is assumed to be symmetric.

There is a large number of other approaches for segmentation, for instance from an estimation of the borders of the objects that can be obtained for instance by derivative filters. One such an approach is watershed segmentation. The function watershed_ift generates an array where each object is assigned a unique label, from an array that localizes the object borders, generated for instance by a gradient magnitude filter. It uses an array containing initial markers for the objects:

The watershed_ift function applies a watershed from markers algorithm, using an Iterative Forest Transform, as described in P. Felkel, R. Wegenkittl, and M. Bruckschwaiger, “Implementation and Complexity of the Watershed-from-Markers Algorithm Computed as a Minimal Cost Forest.”, Eurographics 2001, pp. C:26-35.

The inputs of this function are the array to which the transform is applied, and an array of markers that designate the objects by a unique label, where any non-zero value is a marker. For instance:

Here two markers were used to designate an object (marker = 2) and the background (marker = 1). The order in which these are processed is arbitrary: moving the marker for the background to the lower right corner of the array yields a different result:

The result is that the object (marker = 2) is smaller because the second marker was processed earlier. This may not be the desired effect if the first marker was supposed to designate a background object. Therefore watershed_ift treats markers with a negative value explicitly as background markers and processes them after the normal markers. For instance, replacing the first marker by a negative marker gives a result similar to the first example:

The connectivity of the objects is defined by a structuring element. If no structuring element is provided, one is generated by calling generate_binary_structure (see Binary morphology [stesso link]) using a connectivity of one (which in 2D is a 4-connected structure.) For example, using an 8-connected structure with the last example yields a different object:

Note: The implementation of watershed_ift limits the data types of the input to UInt8 and UInt16.

:mrgreen:

SciPy – 55 – elaborazione di immagini multidimensionali – 7

Continuo da qui, copio qui.

Trasformate di distanza
Distance transforms are used to calculate the minimum distance from each element of an object to the background. The following functions implement distance transforms for three different distance metrics: Euclidean, City Block, and Chessboard distances.

The function distance_transform_cdt uses a chamfer type algorithm to calculate the distance transform of the input, by replacing each object element (defined by values larger than zero) with the shortest distance to the background (all non-object elements). The structure determines the type of chamfering that is done. If the structure is equal to 'cityblock' a structure is generated using generate_binary_structure with a squared distance equal to 1. If the structure is equal to 'chessboard', a structure is generated using generate_binary_structure with a squared distance equal to the rank of the array. These choices correspond to the common interpretations of the cityblock and the chessboard distance metrics in two dimensions.

In addition to the distance transform, the feature transform can be calculated. In this case the index of the closest background element is returned along the first axis of the result. The return_distances, and return_indices flags can be used to indicate if the distance transform, the feature transform, or both must be returned.

The distances and indices arguments can be used to give optional output arrays that must be of the correct size and type (both Int32). The basics of the algorithm used to implement this function is described in G. Borgefors, “Distance transformations in arbitrary dimensions.”, Computer Vision, Graphics, and Image Processing, 27:321-345, 1984.

The function distance_transform_edt calculates the exact euclidean distance transform of the input, by replacing each object element (defined by values larger than zero) with the shortest euclidean distance to the background (all non-object elements).

In addition to the distance transform, the feature transform can be calculated. In this case the index of the closest background element is returned along the first axis of the result. The return_distances, and return_indices flags can be used to indicate if the distance transform, the feature transform, or both must be returned.

Optionally the sampling along each axis can be given by the sampling parameter which should be a sequence of length equal to the input rank, or a single number in which the sampling is assumed to be equal along all axes.

The distances and indices arguments can be used to give optional output arrays that must be of the correct size and type (Float64 and Int32).The algorithm used to implement this function is described in C. R. Maurer, Jr., R. Qi, and V. Raghavan, “A linear time algorithm for computing exact euclidean distance transforms of binary images in arbitrary dimensions. IEEE Trans. PAMI 25, 265-270, 2003.

The function distance_transform_bf uses a brute-force algorithm to calculate the distance transform of the input, by replacing each object element (defined by values larger than zero) with the shortest distance to the background (all non-object elements). The metric must be one of 'euclidean', 'cityblock', or 'chessboard'.

In addition to the distance transform, the feature transform can be calculated. In this case the index of the closest background element is returned along the first axis of the result. The return_distances, and return_indices flags can be used to indicate if the distance transform, the feature transform, or both must be returned.

Optionally the sampling along each axis can be given by the sampling parameter which should be a sequence of length equal to the input rank, or a single number in which the sampling is assumed to be equal along all axes. This parameter is only used in the case of the euclidean distance transform.

The distances and indices arguments can be used to give optional output arrays that must be of the correct size and type (Float64 and Int32).

Note: This function uses a slow brute-force algorithm, the function distance_transform_cdt can be used to more efficiently calculate cityblock and chessboard distance transforms. The function distance_transform_edt can be used to more efficiently calculate the exact euclidean distance transform.

:mrgreen:

SciPy – 54 – elaborazione di immagini multidimensionali – 6

Continuo da qui, copio qui.

Morfologia
morfologia binaria
The generate_binary_structure functions generates a binary structuring element for use in binary morphology operations. The rank of the structure must be provided. The size of the structure that is returned is equal to three in each direction. The value of each element is equal to one if the square of the Euclidean distance from the element to the center is less or equal to connectivity. For instance, two dimensional 4-connected and 8-connected structures are generated as follows:

Most binary morphology functions can be expressed in terms of the basic operations erosion and dilation.

  • The binary_erosion function implements binary erosion of arrays of arbitrary rank with the given structuring element. The origin parameter controls the placement of the structuring element as described in Filter functions [qui]. If no structuring element is provided, an element with connectivity equal to one is generated using generate_binary_structure. The border_value parameter gives the value of the array outside boundaries. The erosion is repeated iterations times. If iterations is less than one, the erosion is repeated until the result does not change anymore. If a mask array is given, only those elements with a true value at the corresponding mask element are modified at each iteration.
  • The binary_dilation function implements binary dilation of arrays of arbitrary rank with the given structuring element. The origin parameter controls the placement of the structuring element as described in Filter functions [stesso link]. If no structuring element is provided, an element with connectivity equal to one is generated using generate_binary_structure. The border_value parameter gives the value of the array outside boundaries. The dilation is repeated iterations times. If iterations is less than one, the dilation is repeated until the result does not change anymore. If a mask array is given, only those elements with a true value at the corresponding mask element are modified at each iteration.

Here is an example of using binary_dilation to find all elements that touch the border, by repeatedly dilating an empty array from the border using the data array as the mask:

The binary_erosion and binary_dilation functions both have an iterations parameter which allows the erosion or dilation to be repeated a number of times. Repeating an erosion or a dilation with a given structure n times is equivalent to an erosion or a dilation with a structure that is n-1 times dilated with itself. A function is provided that allows the calculation of a structure that is dilated a number of times with itself:

The iterate_structure function returns a structure by dilation of the input structure iteration-1 times with itself.

For instance:

If the origin of the original structure is equal to 0, then it is also equal to 0 for the iterated structure. If not, the origin must also be adapted if the equivalent of the iterations erosions or dilations must be achieved with the iterated structure. The adapted origin is simply obtained by multiplying with the number of iterations. For convenience the iterate_structure also returns the adapted origin if the origin parameter is not None:

Other morphology operations can be defined in terms of erosion and d dilation. The following functions provide a few of these operations for convenience:

  • The binary_opening function implements binary opening of arrays of arbitrary rank with the given structuring element. Binary opening is equivalent to a binary erosion followed by a binary dilation with the same structuring element. The origin parameter controls the placement of the structuring element as described in Filter functions [stesso link]. If no structuring element is provided, an element with connectivity equal to one is generated using generate_binary_structure. The iterations parameter gives the number of erosions that is performed followed by the same number of dilations.
  • The binary_closing function implements binary closing of arrays of arbitrary rank with the given structuring element. Binary closing is equivalent to a binary dilation followed by a binary erosion with the same structuring element. The origin parameter controls the placement of the structuring element as described in Filter functions [stesso link]. If no structuring element is provided, an element with connectivity equal to one is generated using generate_binary_structure. The iterations parameter gives the number of dilations that is performed followed by the same number of erosions.
  • The binary_fill_holes function is used to close holes in objects in a binary image, where the structure defines the connectivity of the holes. The origin parameter controls the placement of the structuring element as described in Filter functions [stesso link]. If no structuring element is provided, an element with connectivity equal to one is generated using generate_binary_structure.
  • The binary_hit_or_miss function implements a binary hit-or-miss transform of arrays of arbitrary rank with the given structuring elements. The hit-or-miss transform is calculated by erosion of the input with the first structure, erosion of the logical not of the input with the second structure, followed by the logical and of these two erosions. The origin parameters control the placement of the structuring elements as described in Filter functions [stesso link]. If origin2 equals None it is set equal to the origin1 parameter. If the first structuring element is not provided, a structuring element with connectivity equal to one is generated using generate_binary_structure, if structure2 is not provided, it is set equal to the logical not of structure1.

morfologia grey-scale
Grey-scale morphology operations are the equivalents of binary morphology operations that operate on arrays with arbitrary values. Below we describe the grey-scale equivalents of erosion, dilation, opening and closing. These operations are implemented in a similar fashion as the filters described in Filter functions [stesso link], and we refer to this section for the description of filter kernels and footprints, and the handling of array borders. The grey-scale morphology operations optionally take a structure parameter that gives the values of the structuring element. If this parameter is not given the structuring element is assumed to be flat with a value equal to zero. The shape of the structure can optionally be defined by the footprint parameter. If this parameter is not given, the structure is assumed to be rectangular, with sizes equal to the dimensions of the structure array, or by the size parameter if structure is not given. The size parameter is only used if both structure and footprint are not given, in which case the structuring element is assumed to be rectangular and flat with the dimensions given by size. The size parameter, if provided, must be a sequence of sizes or a single number in which case the size of the filter is assumed to be equal along each axis. The footprint parameter, if provided, must be an array that defines the shape of the kernel by its non-zero elements.

Similar to binary erosion and dilation there are operations for grey-scale erosion and dilation:

  • The grey_erosion function calculates a multidimensional grey- scale erosion.
  • The grey_dilation function calculates a multidimensional grey-scale dilation.

Grey-scale opening and closing operations can be defined similar to their binary counterparts:

  • The grey_opening function implements grey-scale opening of arrays of arbitrary rank. Grey-scale opening is equivalent to a grey-scale erosion followed by a grey-scale dilation.
  • The grey_closing function implements grey-scale closing of arrays of arbitrary rank. Grey-scale opening is equivalent to a grey-scale dilation followed by a grey-scale erosion.
  • The morphological_gradient function implements a grey-scale morphological gradient of arrays of arbitrary rank. The grey-scale morphological gradient is equal to the difference of a grey-scale dilation and a grey-scale erosion.
  • The morphological_laplace function implements a grey-scale morphological laplace of arrays of arbitrary rank. The grey-scale morphological laplace is equal to the sum of a grey-scale dilation and a grey-scale erosion minus twice the input.
  • The white_tophat function implements a white top-hat filter of arrays of arbitrary rank. The white top-hat is equal to the difference of the input and a grey-scale opening.
  • The black_tophat function implements a black top-hat filter of arrays of arbitrary rank. The black top-hat is equal to the difference of a grey-scale closing and the input.

:mrgreen:

SciPy – 53 – elaborazione di immagini multidimensionali – 5

Continuo da qui, copio qui.

Filtri del dominio di Fourier
The functions described in this section perform filtering operations in the Fourier domain. Thus, the input array of such a function should be compatible with an inverse Fourier transform function, such as the functions from the numpy.fft module. We therefore have to deal with arrays that may be the result of a real or a complex Fourier transform. In the case of a real Fourier transform only half of the of the symmetric complex transform is stored. Additionally, it needs to be known what the length of the axis was that was transformed by the real fft. The functions described here provide a parameter n that in the case of a real transform must be equal to the length of the real transform axis before transformation. If this parameter is less than zero, it is assumed that the input array was the result of a complex Fourier transform. The parameter axis can be used to indicate along which axis the real transform was executed.

  • The fourier_shift function multiplies the input array with the multidimensional Fourier transform of a shift operation for the given shift. The shift parameter is a sequences of shifts for each dimension, or a single value for all dimensions.
  • The fourier_gaussian function multiplies the input array with the multidimensional Fourier transform of a Gaussian filter with given standard-deviations sigma. The sigma parameter is a sequences of values for each dimension, or a single value for all dimensions.
  • The fourier_uniform function multiplies the input array with the multidimensional Fourier transform of a uniform filter with given sizes size. The size parameter is a sequences of values for each dimension, or a single value for all dimensions.
  • The fourier_ellipsoid function multiplies the input array with the multidimensional Fourier transform of a elliptically shaped filter with given sizes size. The size parameter is a sequences of values for each dimension, or a single value for all dimensions. This function is only implemented for dimensions 1, 2, and 3.

No, niente esempi.

Funzioni di interpolazione
This section describes various interpolation functions that are based on B-spline theory. A good introduction to B-splines can be found in M. Unser, “Splines: A Perfect Fit for Signal and Image Processing,” IEEE Signal Processing Magazine, vol. 16, no. 6, pp. 22-38, November 1999.

pre-filtri per splines
Interpolation using splines of an order larger than 1 requires a pre-filtering step. The interpolation functions [here] described […] apply pre-filtering by calling spline_filter, but they can be instructed not to do this by setting the prefilter keyword equal to False. This is useful if more than one interpolation operation is done on the same array. In this case it is more efficient to do the pre-filtering only once and use a prefiltered array as the input of the interpolation functions. The following two functions implement the pre-filtering:

  • The spline_filter1d function calculates a one-dimensional spline filter along the given axis. An output array can optionally be provided. The order of the spline must be larger then 1 and less than 6.
  • The spline_filter function calculates a multidimensional spline filter.

Note: The multidimensional filter is implemented as a sequence of one-dimensional spline filters. The intermediate arrays are stored in the same data type as the output. Therefore, if an output with a limited precision is requested, the results may be imprecise because intermediate results may be stored with insufficient precision. This can be prevented by specifying a output type of high precision.

funzioni di interpolazione
Following functions all employ spline interpolation to effect some type of geometric transformation of the input array. This requires a mapping of the output coordinates to the input coordinates, and therefore the possibility arises that input values outside the boundaries are needed. This problem is solved in the same way as described in Filter functions [qui] for the multidimensional filter functions. Therefore these functions all support a mode parameter that determines how the boundaries are handled, and a cval parameter that gives a constant value in case that the ‘constant’ mode is used.

The geometric_transform function applies an arbitrary geometric transform to the input. The given mapping function is called at each point in the output to find the corresponding coordinates in the input. mapping must be a callable object that accepts a tuple of length equal to the output array rank and returns the corresponding input coordinates as a tuple of length equal to the input array rank. The output shape and output type can optionally be provided. If not given they are equal to the input shape and type.

For example:

Optionally extra arguments can be defined and passed to the filter function. The extra_arguments and extra_keywords arguments can be used to pass a tuple of extra arguments and/or a dictionary of named arguments that are passed to derivative at each call. For example, we can pass the shifts in our example as arguments

or

Note: The mapping function can also be written in C and passed using a scipy.LowLevelCallable. See Extending scipy.ndimage in C [prossimamente] for more information.

The function map_coordinates applies an arbitrary coordinate transformation using the given array of coordinates. The shape of the output is derived from that of the coordinate array by dropping the first axis. The parameter coordinates is used to find for each point in the output the corresponding coordinates in the input. The values of coordinates along the first axis are the coordinates in the input array at which the output value is found. (See also the numarray coordinates function.) Since the coordinates may be non- integer coordinates, the value of the input at these coordinates is determined by spline interpolation of the requested order.

Here is an example that interpolates a 2D array at (0.5, 0.5) and (1, 2):

The affine_transform function applies an affine transformation to the input array. The given transformation matrix and offset are used to find for each point in the output the corresponding coordinates in the input. The value of the input at the calculated coordinates is determined by spline interpolation of the requested order. The transformation matrix must be two-dimensional or can also be given as a one-dimensional sequence or array. In the latter case, it is assumed that the matrix is diagonal. A more efficient interpolation algorithm is then applied that exploits the separability of the problem. The output shape and output type can optionally be provided. If not given they are equal to the input shape and type.

The shift function returns a shifted version of the input, using spline interpolation of the requested order.

The zoom function returns a rescaled version of the input, using spline interpolation of the requested order.

The rotate function returns the input array rotated in the plane defined by the two axes given by the parameter axes, using spline interpolation of the requested order. The angle must be given in degrees. If reshape is true, then the size of the output array is adapted to contain the rotated input.

:mrgreen:

SciPy – 52 – elaborazione di immagini multidimensionali – 4

Continuo da qui, copio qui.

Funzioni filtro generiche
To implement filter functions, generic functions can be used that accept a callable object that implements the filtering operation. The iteration over the input and output arrays is handled by these generic functions, along with such details as the implementation of the boundary conditions. Only a callable object implementing a callback function that does the actual filtering work must be provided. The callback function can also be written in C and passed using a PyCapsule (see Extending scipy.ndimage in C [prossimamente] for more information).

The generic_filter1d function implements a generic one-dimensional filter function, where the actual filtering operation must be supplied as a Python function (or other callable object). The generic_filter1d function iterates over the lines of an array and calls function at each line. The arguments that are passed to function are one-dimensional arrays of the tFloat64 type. The first contains the values of the current line. It is extended at the beginning end the end, according to the filter_size and origin arguments. The second array should be modified in-place to provide the output values of the line. For example consider a correlation along one dimension:

The same operation can be implemented using generic_filter1d as follows:

Here the origin of the kernel was (by default) assumed to be in the middle of the filter of length 3. Therefore, each input line was extended by one value at the beginning and at the end, before the function was called.

Optionally extra arguments can be defined and passed to the filter function. The extra_arguments and extra_keywords arguments can be used to pass a tuple of extra arguments and/or a dictionary of named arguments that are passed to derivative at each call. For example, we can pass the parameters of our filter as an argument

or

The generic_filter function implements a generic filter function, where the actual filtering operation must be supplied as a Python function (or other callable object). The generic_filter function iterates over the array and calls function at each element. The argument of function is a one-dimensional array of the tFloat64 type, that contains the values around the current element that are within the footprint of the filter. The function should return a single value that can be converted to a double precision number. For example consider a correlation:

The same operation can be implemented using generic_filter as follows:

Here a kernel footprint was specified that contains only two elements. Therefore the filter function receives a buffer of length equal to two, which was multiplied with the proper weights and the result summed.

When calling generic_filter, either the sizes of a rectangular kernel or the footprint of the kernel must be provided. The size parameter, if provided, must be a sequence of sizes or a single number in which case the size of the filter is assumed to be equal along each axis. The footprint, if provided, must be an array that defines the shape of the kernel by its non-zero elements.

Optionally extra arguments can be defined and passed to the filter function. The extra_arguments and extra_keywords arguments can be used to pass a tuple of extra arguments and/or a dictionary of named arguments that are passed to derivative at each call. For example, we can pass the parameters of our filter as an argument

or

These functions iterate over the lines or elements starting at the last axis, i.e. the last index changes the fastest. This order of iteration is guaranteed for the case that it is important to adapt the filter depending on spatial location. Here is an example of using a class that implements the filter and keeps track of the current coordinates while iterating. It performs the same filter operation as described above for generic_filter, but additionally prints the current coordinates. Non sono riuscito a riprodurlo, mi da errori per range 😡

:mrgreen:

SciPy – 51 – elaborazione di immagini multidimensionali – 3

Continuo da qui, copio qui.

Derivate
Derivative filters can be constructed in several ways. The function gaussian_filter1d described in Smoothing filters can be used to calculate derivatives along a given axis using the order parameter. Other derivative filters are the Prewitt and Sobel filters:

  • The prewitt function calculates a derivative along the given axis.
  • The sobel function calculates a derivative along the given axis.

The Laplace filter is calculated by the sum of the second derivatives along all axes. Thus, different Laplace filters can be constructed using different second derivative functions. Therefore we provide a general function that takes a function argument to calculate the second derivative along a given direction.

The function generic_laplace calculates a laplace filter using the function passed through derivative2 to calculate second derivatives. The function derivative2 should have the following signature

derivative2(input, axis, output, mode, cval, *extra_arguments, **extra_keywords)

It should calculate the second derivative along the dimension axis. If output is not None it should use that for the output and return None, otherwise it should return the result. mode, cval have the usual meaning.

For example:

To demonstrate the use of the extra_arguments argument we could do

or

The following two functions are implemented using generic_laplace by providing appropriate functions for the second derivative function:

  • The function laplace calculates the Laplace using discrete differentiation for the second derivative (i.e. convolution with [1, -2, 1]).
  • The function gaussian_laplace calculates the Laplace filter using gaussian_filter to calculate the second derivatives. The standard-deviations of the Gaussian filter along each axis are passed through the parameter sigma as a sequence or numbers. If sigma is not a sequence but a single number, the standard deviation of the filter is equal along all directions.

The gradient magnitude is defined as the square root of the sum of the squares of the gradients in all directions. Similar to the generic Laplace function there is a generic_gradient_magnitude function that calculats the gradient magnitude of an array.

The function generic_gradient_magnitude calculates a gradient magnitude using the function passed through derivative to calculate first derivatives. The function derivative should have the following signature

derivative(input, axis, output, mode, cval, *extra_arguments, **extra_keywords)

It should calculate the derivative along the dimension axis. If output is not None it should use that for the output and return None, otherwise it should return the result. mode, cval have the usual meaning.

The extra_arguments and extra_keywords arguments can be used to pass a tuple of extra arguments and a dictionary of named arguments that are passed to derivative at each call.

For example, the sobel function fits the required signature

See the documentation of generic_laplace for examples of using the extra_arguments and extra_keywords arguments.

The sobel and prewitt functions fit the required signature and can therefore directly be used with generic_gradient_magnitude.

The function gaussian_gradient_magnitude calculates the gradient magnitude using gaussian_filter to calculate the first derivatives. The standard-deviations of the Gaussian filter along each axis are passed through the parameter sigma as a sequence or numbers. If sigma is not a sequence but a single number, the standard deviation of the filter is equal along all directions.

:mrgreen:

SciPy – 50 – elaborazione di immagini multidimensionali – 2

Continuo da qui, copio qui.

Correlazione e convoluzione
The correlate1d function calculates a one-dimensional correlation along the given axis. The lines of the array along the given axis are correlated with the given weights. The weights parameter must be a one-dimensional sequences of numbers.

The function correlate implements multidimensional correlation of the input array with a given kernel.

The convolve1d function calculates a one-dimensional convolution along the given axis. The lines of the array along the given axis are convoluted with the given weights. The weights parameter must be a one-dimensional sequences of numbers.

Note: A convolution is essentially a correlation after mirroring the kernel. As a result, the origin parameter behaves differently than in the case of a correlation: the result is shifted in the opposite directions.

The function convolve implements multidimensional convolution of the input array with a given kernel.

Note: A convolution is essentially a correlation after mirroring the kernel. As a result, the origin parameter behaves differently than in the case of a correlation: the results is shifted in the opposite direction.

Filtri di appianamento (smoothing)
The gaussian_filter1d function implements a one-dimensional Gaussian filter. The standard-deviation of the Gaussian filter is passed through the parameter sigma. Setting order = 0 corresponds to convolution with a Gaussian kernel. An order of 1, 2, or 3 corresponds to convolution with the first, second or third derivatives of a Gaussian. Higher order derivatives are not implemented.

The gaussian_filter function implements a multidimensional Gaussian filter. The standard-deviations of the Gaussian filter along each axis are passed through the parameter sigma as a sequence or numbers. If sigma is not a sequence but a single number, the standard deviation of the filter is equal along all directions. The order of the filter can be specified separately for each axis. An order of 0 corresponds to convolution with a Gaussian kernel. An order of 1, 2, or 3 corresponds to convolution with the first, second or third derivatives of a Gaussian. Higher order derivatives are not implemented. The order parameter must be a number, to specify the same order for all axes, or a sequence of numbers to specify a different order for each axis.

Note: The multidimensional filter is implemented as a sequence of one-dimensional Gaussian filters. The intermediate arrays are stored in the same data type as the output. Therefore, for output types with a lower precision, the results may be imprecise because intermediate results may be stored with insufficient precision. This can be prevented by specifying a more precise output type.

The uniform_filter1d function calculates a one-dimensional uniform filter of the given size along the given axis.

The uniform_filter implements a multidimensional uniform filter. The sizes of the uniform filter are given for each axis as a sequence of integers by the size parameter. If size is not a sequence, but a single number, the sizes along all axis are assumed to be equal.

Note: The multidimensional filter is implemented as a sequence of one-dimensional uniform filters. The intermediate arrays are stored in the same data type as the output. Therefore, for output types with a lower precision, the results may be imprecise because intermediate results may be stored with insufficient precision. This can be prevented by specifying a more precise output type.

Filtri basati sulle statistiche degli ordini
The minimum_filter1d function calculates a one-dimensional minimum filter of given size along the given axis.

The maximum_filter1d function calculates a one-dimensional maximum filter of given size along the given axis.

The minimum_filter function calculates a multidimensional minimum filter. Either the sizes of a rectangular kernel or the footprint of the kernel must be provided. The size parameter, if provided, must be a sequence of sizes or a single number in which case the size of the filter is assumed to be equal along each axis. The footprint, if provided, must be an array that defines the shape of the kernel by its non-zero elements.

The maximum_filter function calculates a multidimensional maximum filter. Either the sizes of a rectangular kernel or the footprint of the kernel must be provided. The size parameter, if provided, must be a sequence of sizes or a single number in which case the size of the filter is assumed to be equal along each axis. The footprint, if provided, must be an array that defines the shape of the kernel by its non-zero elements.

The rank_filter function calculates a multidimensional rank filter. The rank may be less then zero, i.e., rank = -1 indicates the largest element. Either the sizes of a rectangular kernel or the footprint of the kernel must be provided. The size parameter, if provided, must be a sequence of sizes or a single number in which case the size of the filter is assumed to be equal along each axis. The footprint, if provided, must be an array that defines the shape of the kernel by its non-zero elements.

The percentile_filter function calculates a multidimensional percentile filter. The percentile may be less then zero, i.e., percentile = -20 equals percentile = 80. Either the sizes of a rectangular kernel or the footprint of the kernel must be provided. The size parameter, if provided, must be a sequence of sizes or a single number in which case the size of the filter is assumed to be equal along each axis. The footprint, if provided, must be an array that defines the shape of the kernel by its non-zero elements.

The median_filter function calculates a multidimensional median filter. Either the sizes of a rectangular kernel or the footprint of the kernel must be provided. The size parameter, if provided, must be a sequence of sizes or a single number in which case the size of the filter is assumed to be equal along each axis. The footprint if provided, must be an array that defines the shape of the kernel by its non-zero elements.

:mrgreen:

SciPy – 49 – elaborazione di immagini multidimensionali – 1

Continuo da qui, copio qui.

Image processing and analysis are generally seen as operations on two-dimensional arrays of values. There are however a number of fields where images of higher dimensionality must be analyzed. Good examples of these are medical imaging and biological imaging. numpy is suited very well for this type of applications due its inherent multidimensional nature. The scipy.ndimage packages provides a number of general image processing and analysis functions that are designed to operate with arrays of arbitrary dimensionality. The packages currently includes functions for linear and non-linear filtering, binary morphology, B-spline interpolation, and object measurements.

Proprietà condivise da tutte le funzioni
All functions share some common properties. Notably, all functions allow the specification of an output array with the output argument. With this argument you can specify an array that will be changed in-place with the result with the operation. In this case the result is not returned. Usually, using the output argument is more efficient, since an existing array is used to store the result.

The type of arrays returned is dependent on the type of operation, but it is in most cases equal to the type of the input. If, however, the output argument is used, the type of the result is equal to the type of the specified output argument. If no output argument is given, it is still possible to specify what the result of the output should be. This is done by simply assigning the desired numpy type object to the output argument. For example:

Funzioni filtro
The functions described in this section all perform some type of spatial filtering of the input array: the elements in the output are some function of the values in the neighborhood of the corresponding input element. We refer to this neighborhood of elements as the filter kernel, which is often rectangular in shape but may also have an arbitrary footprint. Many of the functions described below allow you to define the footprint of the kernel, by passing a mask through the footprint parameter. For example a cross shaped kernel can be defined as follows:

Usually the origin of the kernel is at the center calculated by dividing the dimensions of the kernel shape by two. For instance, the origin of a one-dimensional kernel of length three is at the second element. Take for example the correlation of a one-dimensional array with a filter of length 3 consisting of ones:

Sometimes it is convenient to choose a different origin for the kernel. For this reason most functions support the origin parameter which gives the origin of the filter relative to its center. For example:

The effect is a shift of the result towards the left. This feature will not be needed very often, but it may be useful especially for filters that have an even size. A good example is the calculation of backward and forward differences:

We could also have calculated the forward difference as follows:

However, using the origin parameter instead of a larger kernel is more efficient. For multidimensional kernels origin can be a number, in which case the origin is assumed to be equal along all axes, or a sequence giving the origin along each axis.

Since the output elements are a function of elements in the neighborhood of the input elements, the borders of the array need to be dealt with appropriately by providing the values outside the borders. This is done by assuming that the arrays are extended beyond their boundaries according certain boundary conditions. In the functions described below, the boundary conditions can be selected using the mode parameter which must be a string with the name of the boundary condition. The following boundary conditions are currently supported:

  • "nearest" Use the value at the boundary [1 2 3]->[1 1 2 3 3]
  • "wrap" Periodically replicate the array [1 2 3]->[3 1 2 3 1]
  • "reflect" Reflect the array at the boundary [1 2 3]->[1 1 2 3 3]
  • "constant" Use a constant value, default is 0.0 [1 2 3]->[0 1 2 3 0]

The "constant" mode is special since it needs an additional parameter to specify the constant value that should be used.

Note: The easiest way to implement such boundary conditions would be to copy the data to a larger array and extend the data at the borders according to the boundary conditions. For large arrays and large filter kernels, this would be very memory consuming, and the functions described below therefore use a different approach that does not require allocating large temporary buffers.

Uhmmm… mi sa che questa cosa dei filtri sarà lunga; pausa 😊

:mrgreen:

SciPy – 48 – statistica – 5

Continuo da qui, copio qui.

Comparare due campioni
In the following, we are given two samples, which can come either from the same or from different distribution, and we want to test whether these samples have the same statistical properties.

medie dei campioni
Test with sample with identical means:

Test with sample with different means:

test di Kolmogorov-Smirnov per due campioni (ks_2samp)
For the example where both samples are drawn from the same distribution, we cannot reject the null hypothesis since the pvalue is high

In the second example, with different location, i.e. means, we can reject the null hypothesis since the pvalue is below 1%

Stima della densità del kernel
A common task in statistics is to estimate the probability density function (PDF) of a random variable from a set of data samples. This task is called density estimation. The most well-known tool to do this is the histogram. A histogram is a useful tool for visualization (mainly because everyone understands it), but doesn’t use the available data very efficiently. Kernel density estimation (KDE) is a more efficient tool for the same task. The gaussian_kde estimator can be used to estimate the PDF of univariate as well as multivariate data. It works best if the data is unimodal.

stima univariata
We start with a minimal amount of data in order to see how gaussian_kde works, and what the different options for bandwidth selection do. The data sampled from the PDF is show as blue dashes at the bottom of the figure (this is called a rug plot):

Uhmmm… ci sono errori nel codice 😡 mancano le label. La curva nera è quella di Scott, la rossa di Silverman.

We see that there is very little difference between Scott’s Rule and Silverman’s Rule, and that the bandwidth selection with a limited amount of data is probably a bit too wide. We can define our own bandwidth function to get a less smoothed out result.

We see that if we set bandwidth to be very narrow, the obtained estimate for the probability density function (PDF) is simply the sum of Gaussians around each data point.

We now take a more realistic example, and look at the difference between the two available bandwidth selection rules. Those rules are known to work well for (close to) normal distributions, but even for unimodal distributions that are quite strongly non-normal they work reasonably well. As a non-normal distribution we take a Student’s T distribution with 5 degrees of freedom.

Codice troppo lungo, lo raccolgo nel file sp-1.py

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(12456)
x1 = np.random.normal(size=200)  # random data, normal distribution
xs = np.linspace(x1.min()-1, x1.max()+1, 200)

kde1 = stats.gaussian_kde(x1)
kde2 = stats.gaussian_kde(x1, bw_method='silverman')

fig = plt.figure(figsize=(8, 6))

ax1 = fig.add_subplot(211)
ax1.plot(x1, np.zeros(x1.shape), 'b+', ms=12)  # rug plot
ax1.plot(xs, kde1(xs), 'k-', label="Scott's Rule")
ax1.plot(xs, kde2(xs), 'b-', label="Silverman's Rule")
ax1.plot(xs, stats.norm.pdf(xs), 'r--', label="True PDF")

ax1.set_xlabel('x')
ax1.set_ylabel('Density')
ax1.set_title("Normal (top) and Student's T$_{df=5}$ (bottom) distributions")
ax1.legend(loc=1)

x2 = stats.t.rvs(5, size=200)  # random data, T distribution
xs = np.linspace(x2.min() - 1, x2.max() + 1, 200)

kde3 = stats.gaussian_kde(x2)
kde4 = stats.gaussian_kde(x2, bw_method='silverman')

ax2 = fig.add_subplot(212)
ax2.plot(x2, np.zeros(x2.shape), 'b+', ms=12)  # rug plot
ax2.plot(xs, kde3(xs), 'k-', label="Scott's Rule")
ax2.plot(xs, kde4(xs), 'b-', label="Silverman's Rule")
ax2.plot(xs, stats.t.pdf(xs, 5), 'r--', label="True PDF")

ax2.set_xlabel('x')
ax2.set_ylabel('Density')

plt.savefig('sp361.png')

We now take a look at a bimodal distribution with one wider and one narrower Gaussian feature. We expect that this will be a more difficult density to approximate, due to the different bandwidths required to accurately resolve each feature (sp-2.py).

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from functools import partial

def my_kde_bandwidth(obj, fac=1./5):
    """We use Scott's Rule, multiplied by a constant factor."""
    return np.power(obj.n, -1./(obj.d+4)) * fac

loc1, scale1, size1 = (-2, 1, 175)
loc2, scale2, size2 = (2, 0.2, 50)
x2 = np.concatenate([np.random.normal(loc=loc1, scale=scale1, size=size1),
                     np.random.normal(loc=loc2, scale=scale2, size=size2)])

x_eval = np.linspace(x2.min() - 1, x2.max() + 1, 500)

kde = stats.gaussian_kde(x2)
kde2 = stats.gaussian_kde(x2, bw_method='silverman')
kde3 = stats.gaussian_kde(x2, bw_method=partial(my_kde_bandwidth, fac=0.2))
kde4 = stats.gaussian_kde(x2, bw_method=partial(my_kde_bandwidth, fac=0.5))

pdf = stats.norm.pdf
bimodal_pdf = pdf(x_eval, loc=loc1, scale=scale1) * float(size1) / x2.size + 
              pdf(x_eval, loc=loc2, scale=scale2) * float(size2) / x2.size

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)

ax.plot(x2, np.zeros(x2.shape), 'b+', ms=12)
ax.plot(x_eval, kde(x_eval), 'k-', label="Scott's Rule")
ax.plot(x_eval, kde2(x_eval), 'b-', label="Silverman's Rule")
ax.plot(x_eval, kde3(x_eval), 'g-', label="Scott * 0.2")
ax.plot(x_eval, kde4(x_eval), 'c-', label="Scott * 0.5")
ax.plot(x_eval, bimodal_pdf, 'r--', label="Actual PDF")

ax.set_xlim([x_eval.min(), x_eval.max()])
ax.legend(loc=2)
ax.set_xlabel('x')
ax.set_ylabel('Density')

plt.savefig('sp362.png')

As expected, the KDE is not as close to the true PDF as we would like due to the different characteristic size of the two features of the bimodal distribution. By halving the default bandwidth (Scott * 0.5) we can do somewhat better, while using a factor 5 smaller bandwidth than the default doesn’t smooth enough. What we really need though in this case is a non-uniform (adaptive) bandwidth.

Stime multivariate
With gaussian_kde we can perform multivariate as well as univariate estimation. We demonstrate the bivariate case. First we generate some random data with a model in which the two variates are correlated (sp-3.py).

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def measure(n):
    """Measurement model, return two coupled measurements."""
    m1 = np.random.normal(size=n)
    m2 = np.random.normal(scale=0.5, size=n)
    return m1+m2, m1-m2

m1, m2 = measure(2000)
xmin = m1.min()
xmax = m1.max()
ymin = m2.min()
ymax = m2.max()

X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([X.ravel(), Y.ravel()])
values = np.vstack([m1, m2])
kernel = stats.gaussian_kde(values)
Z = np.reshape(kernel.evaluate(positions).T, X.shape)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)

ax.imshow(np.rot90(Z), cmap=plt.cm.gist_earth_r,
          extent=[xmin, xmax, ymin, ymax])
ax.plot(m1, m2, 'k.', markersize=2)

ax.set_xlim([xmin, xmax])
ax.set_ylim([ymin, ymax])

plt.savefig('sp363.png')

Come già riportato la pagina della reference è in costruzione, ci sono ancora bug nel codice. Per il post corrente sono ricorso a Ralf Gommers, qui.

:mrgreen:

SciPy – 47 – statistica – 4

Continuo da qui, copio qui.

Analizzare un campione
First, we create some random variables. We set a seed so that in each run we get identical results to look at. As an example we take a sample from the Student t distribution:

Here, we set the required shape parameter of the t distribution, which in statistics corresponds to the degrees of freedom, to 10. Using size=1000 means that our sample consists of 1000 independently drawn (pseudo) random numbers. Since we did not specify the keyword arguments loc and scale, those are set to their default values zero and one.

statistiche descrittive
x is a numpy array, and we have direct access to all array methods, e.g.

How do the some sample properties compare to their theoretical counterparts?

Note: stats.describe uses the unbiased estimator for the variance, while np.var is the biased estimator.

For our sample the sample statistics differ a by a small amount from their theoretical counterparts.

test t e KS
We can use the t-test to test whether the mean of our sample differs in a statistically significant way from the theoretical expectation.

The pvalue is 0.7, this means that with an alpha error of, for example, 10%, we cannot reject the hypothesis that the sample mean is equal to zero, the expectation of the standard t-distribution.

As an exercise, we can calculate our t-test also directly without using the provided function, which should give us the same answer, and so it does:

The Kolmogorov-Smirnov test can be used to test the hypothesis that the sample comes from the standard t-distribution

Again the pvalue is high enough that we cannot reject the hypothesis that the random sample really is distributed according to the t-distribution. In real applications, we don’t know what the underlying distribution is. If we perform the Kolmogorov-Smirnov test of our sample against the standard normal distribution, then we also cannot reject the hypothesis that our sample was generated by the normal distribution given that in this example the pvalue is almost 40%.

However, the standard normal distribution has a variance of 1, while our sample has a variance of 1.29. If we standardize our sample and test it against the normal distribution, then the p-value is again large enough that we cannot reject the hypothesis that the sample came form the normal distribution.

Note: The Kolmogorov-Smirnov test assumes that we test against a distribution with given parameters, since in the last case we estimated mean and variance, this assumption is violated, and the distribution of the test statistic on which the p-value is based, is not correct.

OK, confesso: sono niubbassay 😊

code della distribuzione
Finally, we can check the upper tail of the distribution. We can use the percent point function ppf, which is the inverse of the cdf function, to obtain the critical values, or, more directly, we can use the inverse of the survival function

In all three cases, our sample has more weight in the top tail than the underlying distribution. We can briefly check a larger sample to see if we get a closer match. In this case the empirical frequency is quite close to the theoretical probability, but if we repeat this several times the fluctuations are still pretty large.

We can also compare it with the tail of the normal distribution, which has less weight in the tails:

The chisquare test can be used to test, whether for a finite number of bins, the observed frequencies differ significantly from the probabilities of the hypothesized distribution.

We see that the standard normal distribution is clearly rejected while the standard t-distribution cannot be rejected. Since the variance of our sample differs from both standard distribution, we can again redo the test taking the estimate for scale and location into account.

The fit method of the distributions can be used to estimate the parameters of the distribution, and the test is repeated using probabilities of the estimated distribution.

Taking account of the estimated parameters, we can still reject the hypothesis that our sample came from a normal distribution (at the 5% level), but again, with a p-value of 0.95, we cannot reject the t distribution.

test speciali per distribuzioni normali
Since the normal distribution is the most common distribution in statistics, there are several additional functions available to test whether a sample could have been drawn from a normal distribution

First we can test if skew and kurtosis of our sample differ significantly from those of a normal distribution:

These two tests are combined in the normality test

In all three tests the p-values are very low and we can reject the hypothesis that the our sample has skew and kurtosis of the normal distribution.

Since skew and kurtosis of our sample are based on central moments, we get exactly the same results if we test the standardized sample:

Because normality is rejected so strongly, we can check whether the normaltest gives reasonable results for other cases:

When testing for normality of a small sample of t-distributed observations and a large sample of normal distributed observation, then in neither case can we reject the null hypothesis that the sample comes from a normal distribution. In the first case this is because the test is not powerful enough to distinguish a t and a normally distributed random variable in a small sample.

Già detto che sono niubbassay 😊 per queste cose qui?

:mrgreen: