SciPy – 49 – multidimensional image processing – 1

Continuing from here, copying from here.

Image processing and analysis are generally seen as operations on two-dimensional arrays of values. There are, however, a number of fields where images of higher dimensionality must be analyzed. Good examples are medical imaging and biological imaging. numpy is very well suited to this type of application due to its inherent multidimensional nature. The scipy.ndimage package provides a number of general image processing and analysis functions that are designed to operate with arrays of arbitrary dimensionality. The package currently includes functions for linear and non-linear filtering, binary morphology, B-spline interpolation, and object measurements.

Properties shared by all functions
All functions share some common properties. Notably, all functions allow the specification of an output array with the output argument. With this argument you can specify an array that will be changed in-place with the result of the operation. In this case the result is not returned. Usually, using the output argument is more efficient, since an existing array is used to store the result.

The type of arrays returned depends on the type of operation, but it is in most cases equal to the type of the input. If, however, the output argument is used, the type of the result is equal to the type of the specified output argument. If no output argument is given, it is still possible to specify what the type of the result should be. This is done by simply assigning the desired numpy type object to the output argument. For example:
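
The snippet is omitted in the source; a minimal sketch of what it looks like (adapted from the ndimage docs; the exact numeric output depends on the SciPy version):

import numpy as np
from scipy import ndimage

a = np.arange(10)
ndimage.correlate(a, [1, 2.5])                     # result keeps the input's integer type
ndimage.correlate(a, [1, 2.5], output=np.float64)  # result is computed and returned as float64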

Filter functions
The functions described in this section all perform some type of spatial filtering of the input array: the elements in the output are some function of the values in the neighborhood of the corresponding input element. We refer to this neighborhood of elements as the filter kernel, which is often rectangular in shape but may also have an arbitrary footprint. Many of the functions described below allow you to define the footprint of the kernel by passing a mask through the footprint parameter. For example, a cross-shaped kernel can be defined as follows:
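
Something along these lines; median_filter is my choice here, but any filter function that accepts a footprint works the same way (a is assumed to be some 2-d array):

footprint = np.array([[0, 1, 0],
                      [1, 1, 1],
                      [0, 1, 0]])   # cross-shaped kernel
ndimage.median_filter(a, footprint=footprint)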

Usually the origin of the kernel is at the center calculated by dividing the dimensions of the kernel shape by two. For instance, the origin of a one-dimensional kernel of length three is at the second element. Take for example the correlation of a one-dimensional array with a filter of length 3 consisting of ones:
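
A sketch of the docs' example; the single 1 is spread over the three neighboring positions:

a = np.array([0, 0, 0, 1, 0, 0, 0])
ndimage.correlate1d(a, [1, 1, 1])
# -> [0, 0, 1, 1, 1, 0, 0]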

Sometimes it is convenient to choose a different origin for the kernel. For this reason most functions support the origin parameter which gives the origin of the filter relative to its center. For example:
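
Presumably something like:

ndimage.correlate1d(a, [1, 1, 1], origin=-1)
# -> [0, 1, 1, 1, 0, 0, 0]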

The effect is a shift of the result towards the left. This feature will not be needed very often, but it may be useful especially for filters that have an even size. A good example is the calculation of backward and forward differences:
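
A sketch with a length-2 kernel, following the docs:

a = np.array([0, 0, 1, 1, 1, 0, 0])
ndimage.correlate1d(a, [-1, 1])              # backward difference
ndimage.correlate1d(a, [-1, 1], origin=-1)   # forward difference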

We could also have calculated the forward difference as follows:
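
That is, by padding the kernel with a zero instead of shifting the origin:

ndimage.correlate1d(a, [0, -1, 1])   # forward difference with an odd-sized kernel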

However, using the origin parameter instead of a larger kernel is more efficient. For multidimensional kernels origin can be a number, in which case the origin is assumed to be equal along all axes, or a sequence giving the origin along each axis.

Since the output elements are a function of elements in the neighborhood of the input elements, the borders of the array need to be dealt with appropriately by providing the values outside the borders. This is done by assuming that the arrays are extended beyond their boundaries according to certain boundary conditions. In the functions described below, the boundary conditions can be selected using the mode parameter, which must be a string with the name of the boundary condition. The following boundary conditions are currently supported:

  • "nearest" Use the value at the boundary [1 2 3]->[1 1 2 3 3]
  • "wrap" Periodically replicate the array [1 2 3]->[3 1 2 3 1]
  • "reflect" Reflect the array at the boundary [1 2 3]->[1 1 2 3 3]
  • "constant" Use a constant value, default is 0.0 [1 2 3]->[0 1 2 3 0]

The "constant" mode is special since it needs an additional parameter to specify the constant value that should be used.
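
That parameter is cval; for instance:

ndimage.correlate1d(np.array([1, 2, 3]), [1, 1, 1], mode='constant', cval=0.0)
# the input is extended as [0, 1, 2, 3, 0], so the result is [3, 6, 5]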

Note: The easiest way to implement such boundary conditions would be to copy the data to a larger array and extend the data at the borders according to the boundary conditions. For large arrays and large filter kernels, this would be very memory consuming, and the functions described below therefore use a different approach that does not require allocating large temporary buffers.

Hmm… I suspect this filter business is going to be long; break time 😊

:mrgreen:

cit. & loll – 56

Still too hot, the thunderstorm wasn't enough; I'll order another one as soon as I finish this post 😊

Debugging according to Brian Kernighan
::: CodeWisdom

Trees, 1973
::: Programming Research Laboratory

I need this sign for my office
::: wallingf

To keep large problems well structured
::: CodeWisdom

The Good Parts
::: CodeWisdom

Twitter heading for collapse and Instagram on the rise
::: alecerio

I didn’t just want to write unit tests
::: DLion92

There are more and more days
::: Donearm

It’s funny because it’s true!
::: Freeyourmindkid

My Advice To Anyone Starting A Business Is To Remember That Someday I Will Crush You
The Onion, eh! 😉
::: csertoglu

C vs. metro
::: codebutler

I’m crying 😂
::: denysdovhan

Programming isn’t about what you know
::: CodeWisdom

Your honor, the code I wrote is not the code that was executed in the commission of these crimes
::: GonzoHacker

Di Maio, you're the one with the brains… 😜
yep! self-quote.
::: _juhan

this week’s New Scientist
::: pollollups

Crap, did North Carolina just pass a law against debugging?
::: Nick_Craver

—How many Googlers does it take to change a light bulb?
::: Pinboard

Foxes are just cat software running on dog hardware
::: FluffSociety

The greatest enemy of knowledge is not ignorance
::: ValaAfshar

No code is faster
::: CodeWisdom

Any sufficiently advanced bug
::: CodeWisdom

🎵 When you spout
::: seldo

The purpose of software engineering is to control complexity
::: CodeWisdom

This thread should have a thousand retweets
::: williampietri

the two hardest problems in computer science are dealing with obnoxious men and…
::: sailorhg

Fuuuuuuuck
::: peterseibel

WE SUPPORT DIVERSITY OF VIEWPOINTS HERE
::: PHP_CEO

What are we going to do when the robots learn to check the “I am not a robot” box?
::: lizardbill

Programming can be fun, so can cryptography
::: CodeWisdom

Documentation is a love letter
::: CodeWisdom

My “smart TV” is supposed to be programmable, but
::: lizardbill

SICP – chap. 2 – painters – 61

Continuing from here, copying from here.

A painter is represented as a procedure that, given a frame as argument, draws a particular image shifted and scaled to fit the frame. That is to say, if p is a painter and f is a frame, then we produce p’s image in f by calling p with f as argument.

The details of how primitive painters are implemented depend on the particular characteristics of the graphics system and the type of image to be drawn. For instance, suppose we have a procedure draw-line that draws a line on the screen between two specified points. Then we can create painters for line drawings, such as the wave painter in Figure 2.10, from lists of line segments as follows (segments->painter uses the representation for line segments described in Exercise 2.48 below [coming soon]. It also uses the for-each procedure described in Exercise 2.23 [here]):

(define (segments->painter segment-list)
  (lambda (frame)
    (for-each
     (lambda (segment)
       (draw-line
        ((frame-coord-map frame) 
         (start-segment segment))
        ((frame-coord-map frame) 
         (end-segment segment))))
     segment-list)))

Figure 2.10: Images produced by the wave painter, with respect to four different frames. The frames, shown with dotted lines, are not part of the images.

The segments are given using coordinates with respect to the unit square. For each segment in the list, the painter transforms the segment endpoints with the frame coordinate map and draws a line between the transformed points.

Representing painters as procedures erects a powerful abstraction barrier in the picture language. We can create and intermix all sorts of primitive painters, based on a variety of graphics capabilities. The details of their implementation do not matter. Any procedure can serve as a painter, provided that it takes a frame as argument and draws something scaled to fit the frame.

:mrgreen:

Julia – 32 – types – 1

Continuing from here, copying from here.

Types
Type systems have traditionally fallen into two quite different camps: static type systems, where every program expression must have a type computable before the execution of the program, and dynamic type systems, where nothing is known about types until run time, when the actual values manipulated by the program are available. Object orientation allows some flexibility in statically typed languages by letting code be written without the precise types of values being known at compile time. The ability to write code that can operate on different types is called polymorphism. All code in classic dynamically typed languages is polymorphic: only by explicitly checking types, or when objects fail to support operations at run-time, are the types of any values ever restricted.

Julia’s type system is dynamic, but gains some of the advantages of static type systems by making it possible to indicate that certain values are of specific types. This can be of great assistance in generating efficient code, but even more significantly, it allows method dispatch on the types of function arguments to be deeply integrated with the language. Method dispatch is explored in detail in Methods [coming soon], but is rooted in the type system presented here.

The default behavior in Julia when types are omitted is to allow values to be of any type. Thus, one can write many useful Julia programs without ever explicitly using types. When additional expressiveness is needed, however, it is easy to gradually introduce explicit type annotations into previously “untyped” code. Doing so will typically increase both the performance and robustness of these systems, and perhaps somewhat counterintuitively, often significantly simplify them.

Describing Julia in the lingo of type systems, it is: dynamic, nominative and parametric. Generic types can be parameterized, and the hierarchical relationships between types are explicitly declared, rather than implied by compatible structure. One particularly distinctive feature of Julia’s type system is that concrete types may not subtype each other: all concrete types are final and may only have abstract types as their supertypes. While this might at first seem unduly restrictive, it has many beneficial consequences with surprisingly few drawbacks. It turns out that being able to inherit behavior is much more important than being able to inherit structure, and inheriting both causes significant difficulties in traditional object-oriented languages. Other high-level aspects of Julia’s type system that should be mentioned up front are:

  • There is no division between object and non-object values: all values in Julia are true objects having a type that belongs to a single, fully connected type graph, all nodes of which are equally first-class as types.
  • There is no meaningful concept of a “compile-time type”: the only type a value has is its actual type when the program is running. This is called a “run-time type” in object-oriented languages where the combination of static compilation with polymorphism makes this distinction significant.
  • Only values, not variables, have types – variables are simply names bound to values.
  • Both abstract and concrete types can be parameterized by other types. They can also be parameterized by symbols, by values of any type for which isbits() returns true (essentially, things like numbers and bools that are stored like C types or structs with no pointers to other objects), and also by tuples thereof. Type parameters may be omitted when they do not need to be referenced or restricted.

Julia’s type system is designed to be powerful and expressive, yet clear, intuitive and unobtrusive. Many Julia programmers may never feel the need to write code that explicitly uses types. Some kinds of programming, however, become clearer, simpler, faster and more robust with declared types.

Type declarations
The :: operator can be used to attach type annotations to expressions and variables in programs. There are two primary reasons to do this:

  • As an assertion to help confirm that your program works the way you expect,
  • To provide extra type information to the compiler, which can then improve performance in some cases

When appended to an expression computing a value, the :: operator is read as “is an instance of”. It can be used anywhere to assert that the value of the expression on the left is an instance of the type on the right. When the type on the right is concrete, the value on the left must have that type as its implementation – recall that all concrete types are final, so no implementation is a subtype of any other. When the type is abstract, it suffices for the value to be implemented by a concrete type that is a subtype of the abstract type. If the type assertion is not true, an exception is thrown, otherwise, the left-hand value is returned:

This allows a type assertion to be attached to any expression in-place.

When appended to a variable on the left-hand side of an assignment, or as part of a local declaration, the :: operator means something a bit different: it declares the variable to always have the specified type, like a type declaration in a statically-typed language such as C. Every value assigned to the variable will be converted to the declared type using convert():

This feature is useful for avoiding performance “gotchas” that could occur if one of the assignments to a variable changed its type unexpectedly.

This “declaration” behavior only occurs in specific contexts:

local x::Int8  # in a local declaration
x::Int8 = 10   # as the left-hand side of an assignment

and applies to the whole current scope, even before the declaration. Currently, type declarations cannot be used in global scope, e.g. in the REPL, since Julia does not yet have constant-type globals.

Declarations can also be attached to function definitions:

Returning from this function behaves just like an assignment to a variable with a declared type: the value is always converted to Float64.

At this point the examination of the various kinds of types begins (lol). After the break 😊

:mrgreen:

SciPy – 48 – statistics – 5

Continuing from here, copying from here.

Comparing two samples
In the following, we are given two samples, which can come either from the same or from different distributions, and we want to test whether these samples have the same statistical properties.

sample means
Test with samples with identical means:

Test with samples with different means (both cases in the sketch below):
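
The omitted snippets presumably look like this (the setup follows the SciPy tutorial; with a fixed seed the p-values would be reproducible):

import numpy as np
from scipy import stats

rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)
stats.ttest_ind(rvs1, rvs2)   # identical means: large p-value, cannot reject H0

rvs3 = stats.norm.rvs(loc=8, scale=10, size=500)
stats.ttest_ind(rvs1, rvs3)   # different means: very small p-value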

Two-sample Kolmogorov-Smirnov test (ks_2samp)
For the example where both samples are drawn from the same distribution, we cannot reject the null hypothesis, since the pvalue is high (see the sketch below).

In the second example, with different location, i.e. means, we can reject the null hypothesis, since the pvalue is below 1%:
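
Reusing rvs1, rvs2 and rvs3 from the sketch above:

stats.ks_2samp(rvs1, rvs2)   # same distribution: high p-value
stats.ks_2samp(rvs1, rvs3)   # shifted mean: p-value below 1%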

Kernel density estimation
A common task in statistics is to estimate the probability density function (PDF) of a random variable from a set of data samples. This task is called density estimation. The most well-known tool to do this is the histogram. A histogram is a useful tool for visualization (mainly because everyone understands it), but doesn’t use the available data very efficiently. Kernel density estimation (KDE) is a more efficient tool for the same task. The gaussian_kde estimator can be used to estimate the PDF of univariate as well as multivariate data. It works best if the data is unimodal.

univariate estimation
We start with a minimal amount of data in order to see how gaussian_kde works, and what the different options for bandwidth selection do. The data sampled from the PDF is shown as blue dashes at the bottom of the figure (this is called a rug plot):
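
A minimal sketch along the lines of the tutorial (the five data points are the tutorial's):

x1 = np.array([-7, -5, 1, 4, 5], dtype=np.float64)
kde1 = stats.gaussian_kde(x1)                          # Scott's rule (default)
kde2 = stats.gaussian_kde(x1, bw_method='silverman')   # Silverman's rule
xs = np.linspace(-10, 10, num=200)
kde1(xs)   # evaluate the two estimated PDFs on a grid
kde2(xs)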

Hmm… there are errors in the code 😡 the labels are missing. The black curve is Scott's rule, the red one Silverman's.

We see that there is very little difference between Scott’s Rule and Silverman’s Rule, and that the bandwidth selection with a limited amount of data is probably a bit too wide. We can define our own bandwidth function to get a less smoothed out result.

We see that if we set bandwidth to be very narrow, the obtained estimate for the probability density function (PDF) is simply the sum of Gaussians around each data point.

We now take a more realistic example, and look at the difference between the two available bandwidth selection rules. Those rules are known to work well for (close to) normal distributions, but even for unimodal distributions that are quite strongly non-normal they work reasonably well. As a non-normal distribution we take a Student’s T distribution with 5 degrees of freedom.

The code is too long; I've collected it in the file sp-1.py

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(12456)
x1 = np.random.normal(size=200)  # random data, normal distribution
xs = np.linspace(x1.min()-1, x1.max()+1, 200)

kde1 = stats.gaussian_kde(x1)
kde2 = stats.gaussian_kde(x1, bw_method='silverman')

fig = plt.figure(figsize=(8, 6))

ax1 = fig.add_subplot(211)
ax1.plot(x1, np.zeros(x1.shape), 'b+', ms=12)  # rug plot
ax1.plot(xs, kde1(xs), 'k-', label="Scott's Rule")
ax1.plot(xs, kde2(xs), 'b-', label="Silverman's Rule")
ax1.plot(xs, stats.norm.pdf(xs), 'r--', label="True PDF")

ax1.set_xlabel('x')
ax1.set_ylabel('Density')
ax1.set_title("Normal (top) and Student's T$_{df=5}$ (bottom) distributions")
ax1.legend(loc=1)

x2 = stats.t.rvs(5, size=200)  # random data, T distribution
xs = np.linspace(x2.min() - 1, x2.max() + 1, 200)

kde3 = stats.gaussian_kde(x2)
kde4 = stats.gaussian_kde(x2, bw_method='silverman')

ax2 = fig.add_subplot(212)
ax2.plot(x2, np.zeros(x2.shape), 'b+', ms=12)  # rug plot
ax2.plot(xs, kde3(xs), 'k-', label="Scott's Rule")
ax2.plot(xs, kde4(xs), 'b-', label="Silverman's Rule")
ax2.plot(xs, stats.t.pdf(xs, 5), 'r--', label="True PDF")

ax2.set_xlabel('x')
ax2.set_ylabel('Density')

plt.savefig('sp361.png')

We now take a look at a bimodal distribution with one wider and one narrower Gaussian feature. We expect that this will be a more difficult density to approximate, due to the different bandwidths required to accurately resolve each feature (sp-2.py).

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from functools import partial

def my_kde_bandwidth(obj, fac=1./5):
    """We use Scott's Rule, multiplied by a constant factor."""
    return np.power(obj.n, -1./(obj.d+4)) * fac

loc1, scale1, size1 = (-2, 1, 175)
loc2, scale2, size2 = (2, 0.2, 50)
x2 = np.concatenate([np.random.normal(loc=loc1, scale=scale1, size=size1),
                     np.random.normal(loc=loc2, scale=scale2, size=size2)])

x_eval = np.linspace(x2.min() - 1, x2.max() + 1, 500)

kde = stats.gaussian_kde(x2)
kde2 = stats.gaussian_kde(x2, bw_method='silverman')
kde3 = stats.gaussian_kde(x2, bw_method=partial(my_kde_bandwidth, fac=0.2))
kde4 = stats.gaussian_kde(x2, bw_method=partial(my_kde_bandwidth, fac=0.5))

pdf = stats.norm.pdf
bimodal_pdf = (pdf(x_eval, loc=loc1, scale=scale1) * float(size1) / x2.size +
               pdf(x_eval, loc=loc2, scale=scale2) * float(size2) / x2.size)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)

ax.plot(x2, np.zeros(x2.shape), 'b+', ms=12)
ax.plot(x_eval, kde(x_eval), 'k-', label="Scott's Rule")
ax.plot(x_eval, kde2(x_eval), 'b-', label="Silverman's Rule")
ax.plot(x_eval, kde3(x_eval), 'g-', label="Scott * 0.2")
ax.plot(x_eval, kde4(x_eval), 'c-', label="Scott * 0.5")
ax.plot(x_eval, bimodal_pdf, 'r--', label="Actual PDF")

ax.set_xlim([x_eval.min(), x_eval.max()])
ax.legend(loc=2)
ax.set_xlabel('x')
ax.set_ylabel('Density')

plt.savefig('sp362.png')

As expected, the KDE is not as close to the true PDF as we would like due to the different characteristic size of the two features of the bimodal distribution. By halving the default bandwidth (Scott * 0.5) we can do somewhat better, while using a factor 5 smaller bandwidth than the default doesn’t smooth enough. What we really need though in this case is a non-uniform (adaptive) bandwidth.

Multivariate estimation
With gaussian_kde we can perform multivariate as well as univariate estimation. We demonstrate the bivariate case. First we generate some random data with a model in which the two variates are correlated (sp-3.py).

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def measure(n):
    """Measurement model, return two coupled measurements."""
    m1 = np.random.normal(size=n)
    m2 = np.random.normal(scale=0.5, size=n)
    return m1+m2, m1-m2

m1, m2 = measure(2000)
xmin = m1.min()
xmax = m1.max()
ymin = m2.min()
ymax = m2.max()

X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([X.ravel(), Y.ravel()])
values = np.vstack([m1, m2])
kernel = stats.gaussian_kde(values)
Z = np.reshape(kernel.evaluate(positions).T, X.shape)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)

ax.imshow(np.rot90(Z), cmap=plt.cm.gist_earth_r,
          extent=[xmin, xmax, ymin, ymax])
ax.plot(m1, m2, 'k.', markersize=2)

ax.set_xlim([xmin, xmax])
ax.set_ylim([ymin, ymax])

plt.savefig('sp363.png')

As already noted, the reference page is under construction and there are still bugs in the code. For this post I relied on Ralf Gommers, here.

:mrgreen:

Julia – 31 – variable scope – 3

Continuing from here, copying from here.

let blocks
Unlike assignments to local variables, let statements allocate new variable bindings each time they run. An assignment modifies an existing value location, and let creates new locations. This difference is usually not important, and is only detectable in the case of variables that outlive their scope via closures. The let syntax accepts a comma-separated series of assignments and variable names:

The assignments are evaluated in order, with each right-hand side evaluated in the scope before the new variable on the left-hand side has been introduced. Therefore it makes sense to write something like let x = x since the two x variables are distinct and have separate storage. Here is an example where the behavior of let is needed:

Here we create and store two closures that return variable i. However, it is always the same variable i, so the two closures behave identically. We can use let to create a new binding for i:

Since the begin construct does not introduce a new scope, it can be useful to use a zero-argument let to just introduce a new scope block without creating any new bindings:

Since let introduces a new scope block, the inner local x is a different variable than the outer local x.

for loops and comprehensions
for loops and comprehensions [coming soon] have the following behavior: any new variables introduced in their body scopes are freshly allocated for each loop iteration. This is in contrast to while loops, which reuse the variables for all iterations. Therefore these constructs are similar to while loops with let blocks inside:

for loops will reuse existing variables for their iteration variable:

However, comprehensions do not do this, and always freshly allocate their iteration variables:

Constants
A common use of variables is giving names to specific, unchanging values. Such variables are only assigned once. This intent can be conveyed to the compiler using the const keyword:

The const declaration is allowed on both global and local variables, but is especially useful for globals. It is difficult for the compiler to optimize code involving global variables, since their values (or even their types) might change at almost any time. If a global variable will not change, adding a const declaration solves this performance problem.

Local constants are quite different. The compiler is able to determine automatically when a local variable is constant, so local constant declarations are not necessary for performance purposes.

Special top-level assignments, such as those performed by the function and struct keywords, are constant by default.

Note that const only affects the variable binding; the variable may be bound to a mutable object (such as an array), and that object may still be modified.

One thing to remember: Julia lets you redefine constants; it only emits a warning:

:mrgreen:

SciPy – 47 – statistics – 4

Continuing from here, copying from here.

Analyzing a sample
First, we create some random variables. We set a seed so that in each run we get identical results to look at. As an example we take a sample from the Student t distribution:
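
A sketch of the omitted setup (the seed value is the tutorial's; any fixed seed will do):

import numpy as np
from scipy import stats

np.random.seed(282629734)
x = stats.t.rvs(10, size=1000)   # df=10, loc=0, scale=1 by default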

Here, we set the required shape parameter of the t distribution, which in statistics corresponds to the degrees of freedom, to 10. Using size=1000 means that our sample consists of 1000 independently drawn (pseudo) random numbers. Since we did not specify the keyword arguments loc and scale, those are set to their default values zero and one.

descriptive statistics
x is a numpy array, and we have direct access to all array methods, e.g.
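
For instance (plain numpy array methods, nothing scipy-specific):

print(x.min(), x.max(), x.mean(), x.var())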

How do some sample properties compare to their theoretical counterparts?
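
Presumably via stats.t.stats and stats.describe (describe returns a tuple of sample statistics that unpacks as below):

m, v, s, k = stats.t.stats(10, moments='mvsk')        # theoretical moments
n, (smin, smax), sm, sv, ss, sk = stats.describe(x)   # sample statistics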

Note: stats.describe uses the unbiased estimator for the variance, while np.var is the biased estimator.

For our sample the sample statistics differ by a small amount from their theoretical counterparts.

t-test and KS test
We can use the t-test to test whether the mean of our sample differs in a statistically significant way from the theoretical expectation.
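
Presumably with ttest_1samp, using the theoretical mean m from above:

print('t-statistic = %6.3f  pvalue = %6.4f' % stats.ttest_1samp(x, m))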

The pvalue is 0.7; this means that, with an alpha error of, for example, 10%, we cannot reject the hypothesis that the sample mean is equal to zero, the expectation of the standard t-distribution.

As an exercise, we can calculate our t-test also directly without using the provided function, which should give us the same answer, and so it does:
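
A sketch of the direct computation, using the sample statistics sm, sv and n from above:

tt = (sm - m) / np.sqrt(sv / float(n))     # t-statistic for the mean
pval = stats.t.sf(np.abs(tt), n - 1) * 2   # two-sided p-value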

The Kolmogorov-Smirnov test can be used to test the hypothesis that the sample comes from the standard t-distribution
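
Presumably via kstest, passing the degrees of freedom through the args tuple:

print('KS-statistic D = %6.3f  pvalue = %6.4f' % stats.kstest(x, 't', (10,)))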

Again the pvalue is high enough that we cannot reject the hypothesis that the random sample really is distributed according to the t-distribution. In real applications, we don’t know what the underlying distribution is. If we perform the Kolmogorov-Smirnov test of our sample against the standard normal distribution, then we also cannot reject the hypothesis that our sample was generated by the normal distribution given that in this example the pvalue is almost 40%.

However, the standard normal distribution has a variance of 1, while our sample has a variance of 1.29. If we standardize our sample and test it against the normal distribution, then the p-value is again large enough that we cannot reject the hypothesis that the sample came from the normal distribution.
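
Something like:

d, pval = stats.kstest((x - x.mean()) / x.std(), 'norm')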

Note: The Kolmogorov-Smirnov test assumes that we test against a distribution with given parameters. Since in the last case we estimated mean and variance, this assumption is violated, and the distribution of the test statistic on which the p-value is based is not correct.

OK, I confess: I'm a total noob 😊

tails of the distribution
Finally, we can check the upper tail of the distribution. We can use the percent point function ppf, which is the inverse of the cdf function, to obtain the critical values, or, more directly, we can use the inverse of the survival function
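
A sketch for the 1%, 5% and 10% tail probabilities of the t distribution with 10 degrees of freedom:

crit01, crit05, crit10 = stats.t.ppf([1 - 0.01, 1 - 0.05, 1 - 0.10], 10)
stats.t.isf([0.01, 0.05, 0.10], 10)   # the same critical values, more directly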

In all three cases, our sample has more weight in the top tail than the underlying distribution. We can briefly check a larger sample to see if we get a closer match. In this case the empirical frequency is quite close to the theoretical probability, but if we repeat this several times the fluctuations are still pretty large.

We can also compare it with the tail of the normal distribution, which has less weight in the tails:

The chisquare test can be used to test whether, for a finite number of bins, the observed frequencies differ significantly from the probabilities of the hypothesized distribution.

We see that the standard normal distribution is clearly rejected, while the standard t-distribution cannot be rejected. Since the variance of our sample differs from both standard distributions, we can again redo the test taking the estimates for scale and location into account.

The fit method of the distributions can be used to estimate the parameters of the distribution, and the test is repeated using probabilities of the estimated distribution.
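
Presumably along these lines:

tdf, tloc, tscale = stats.t.fit(x)   # estimated df, loc and scale for the t
nloc, nscale = stats.norm.fit(x)     # estimated loc and scale for the normal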

Taking account of the estimated parameters, we can still reject the hypothesis that our sample came from a normal distribution (at the 5% level), but again, with a p-value of 0.95, we cannot reject the t distribution.

special tests for normal distributions
Since the normal distribution is the most common distribution in statistics, there are several additional functions available to test whether a sample could have been drawn from a normal distribution

First we can test if skew and kurtosis of our sample differ significantly from those of a normal distribution:
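
Presumably:

print('skewtest     teststat = %6.3f  pvalue = %6.4f' % stats.skewtest(x))
print('kurtosistest teststat = %6.3f  pvalue = %6.4f' % stats.kurtosistest(x))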

These two tests are combined in the normality test
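
That is:

print('normaltest   teststat = %6.3f  pvalue = %6.4f' % stats.normaltest(x))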

In all three tests the p-values are very low and we can reject the hypothesis that our sample has the skew and kurtosis of the normal distribution.

Since skew and kurtosis of our sample are based on central moments, we get exactly the same results if we test the standardized sample:
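
For instance (skew and kurtosis are invariant under standardization):

stats.skewtest((x - x.mean()) / x.std())   # same statistics as on the raw sample
stats.normaltest((x - x.mean()) / x.std())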

Because normality is rejected so strongly, we can check whether the normaltest gives reasonable results for other cases:
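
Presumably:

stats.normaltest(stats.t.rvs(10, size=100))   # a small t-distributed sample
stats.normaltest(stats.norm.rvs(size=1000))   # a large normal sample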

When testing for normality of a small sample of t-distributed observations and a large sample of normal distributed observation, then in neither case can we reject the null hypothesis that the sample comes from a normal distribution. In the first case this is because the test is not powerful enough to distinguish a t and a normally distributed random variable in a small sample.

Have I already mentioned that I'm a total noob 😊 at this stuff?

:mrgreen:

Julia – 30 – variable scope – 2

Continuing from here, copying from here.

Local scope
A new local scope is introduced by most code-blocks; see the table above for a complete list [previous post]. A local scope usually inherits all the variables from its parent scope, both for reading and writing. There are two subtypes of local scopes, hard and soft, with slightly different rules concerning what variables are inherited. Unlike global scopes, local scopes are not namespaces, thus variables in an inner scope cannot be retrieved from the parent scope through some sort of qualified access.

The following rules and examples pertain to both hard and soft local scopes. A newly introduced variable in a local scope does not back-propagate to its parent scope. For example, here the z is not introduced into the top-level scope:

Inside a local scope a variable can be forced to be a local variable using the local keyword:

Inside a local scope a new global variable can be defined using the keyword global:

The location of both the local and global keywords within the scope block is irrelevant. The following is equivalent to the last example (although stylistically worse):

Soft local scope
In a soft local scope, all variables are inherited from its parent scope unless a variable is specifically marked with the keyword local.

Soft local scopes are introduced by for-loops, while-loops, comprehensions, try-catch-finally-blocks, and let-blocks. There are some extra rules for let blocks and for for loops and comprehensions [coming soon].

In the following example the x and y refer always to the same variables as the soft local scope inherits both read and write variables:

Within soft scopes, the global keyword is never necessary, although allowed. The only case when it would change the semantics is (currently) a syntax error:

Hard local scope
Hard local scopes are introduced by function definitions (in all their forms), struct type definition blocks, and macro-definitions.

In a hard local scope, all variables are inherited from its parent scope unless:

  • an assignment would result in a modified global variable, or
  • a variable is specifically marked with the keyword local.

Thus global variables are only inherited for reading but not for writing:

An explicit global is needed to assign to a global variable:

Note that nested functions can behave differently to functions defined in the global scope as they can modify their parent scope’s local variables:

The distinction between inheriting global and local variables for assignment can lead to some slight differences between functions defined in local vs. global scopes. Consider the modification of the last example by moving bar to the global scope:

Note that the above subtlety does not pertain to type and macro definitions, as they can only appear at the global scope. There are special scoping rules concerning the evaluation of default and keyword function arguments, which are described in the Functions section [here].

An assignment introducing a variable used inside a function, type or macro definition need not come before its inner usage:

This behavior may seem slightly odd for a normal variable, but allows for named functions – which are just normal variables holding function objects – to be used before they are defined. This allows functions to be defined in whatever order is intuitive and convenient, rather than forcing bottom up ordering or requiring forward declarations, as long as they are defined by the time they are actually called. As an example, here is an inefficient, mutually recursive way to test if positive integers are even or odd:

Julia provides built-in, efficient functions to test for oddness and evenness called iseven() and isodd() so the above definitions should only be taken as examples.

Hard vs. soft local scope
Blocks which introduce a soft local scope, such as loops, are generally used to manipulate the variables in their parent scope. Thus their default is to fully access all variables in their parent scope.

Conversely, the code inside blocks which introduce a hard local scope (function, type, and macro definitions) can be executed at any place in a program. Remotely changing the state of global variables in other modules should be done with care and thus this is an opt-in feature requiring the global keyword.

The reason to allow modifying local variables of parent scopes in nested functions is to allow constructing closures which have a private state, for instance the state variable in the following example:

See also the closures in the examples in the next two sections [coming soon].

:mrgreen:

SciPy – 46 – statistics – 3

Continuing from here, copying from here.

Building specific distributions
The next examples show how to build your own distributions. Further examples show the usage of the distributions and some statistical tests.

Creating continuous distributions, i.e. subclassing rv_continuous
Making continuous distributions is fairly simple.
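
The omitted example is the tutorial's "deterministic" distribution; a sketch (only _cdf and _stats are specified):

import numpy as np
from scipy import stats

class deterministic_gen(stats.rv_continuous):
    def _cdf(self, x):
        return np.where(x < 0, 0., 1.)
    def _stats(self):
        return 0., 0., 0., 0.

deterministic = deterministic_gen(name='deterministic')
deterministic.cdf(np.arange(-3, 3, 0.5))   # a step function: 0 below zero, 1 above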

Interestingly, the pdf is now computed automatically:
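
That is:

deterministic.pdf(np.arange(-3, 3, 0.5))   # obtained by numerical differentiation of _cdf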

Be aware of the performance issues mentioned in Performance Issues and Cautionary Remarks [previous post]. The computation of unspecified common methods can become very slow, since only general methods are called which, by their very nature, cannot use any specific information about the distribution. Thus, as a cautionary example:
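
A sketch of the cautionary example:

from scipy.integrate import quad
quad(deterministic.pdf, -1e-1, 1e-1)   # practically 0: quad misses the narrow numerical spike at x=0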

But this is not correct: the integral over this pdf should be 1. Let’s make the integration interval smaller:
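
Presumably:

quad(deterministic.pdf, -1e-3, 1e-3)   # close to 1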

This looks better. However, the problem originated from the fact that the pdf is not specified in the class definition of the deterministic distribution.

Subclassing rv_discrete
In the following we use stats.rv_discrete to generate a discrete distribution that has the probabilities of the truncated normal for the intervals centered around the integers.

General Info: From the docstring of rv_discrete, help(stats.rv_discrete),

“You can construct an arbitrary discrete rv where P{X=xk} = pk by passing to the rv_discrete initialization method (through the values= keyword) a tuple of sequences (xk, pk) which describes only those values of X(xk) that occur with nonzero probability (pk).”

Next to this, there are some further requirements for this approach to work:

  • The keyword name is required.
  • The support points of the distribution xk have to be integers.
  • The number of significant digits (decimals) needs to be specified.

In fact, if the last two requirements are not satisfied an exception may be raised or the resulting numbers may be incorrect.

An Example. Let’s do the work. First
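
The construction in the tutorial goes roughly like this (the variable names are the tutorial's):

npoints = 20                                # number of integer support points minus 1
npointsh = npoints // 2
npointsf = float(npoints)
nbound = 4                                  # bounds for the truncated normal
normbound = (1 + 1/npointsf) * nbound       # actual bounds of the truncated normal
grid = np.arange(-npointsh, npointsh+2, 1)  # integer grid
gridlimitsnorm = (grid - 0.5) / npointsh * nbound   # bin limits for the truncnorm
gridlimits = grid - 0.5                     # used later in the analysis
grid = grid[:-1]
probs = np.diff(stats.truncnorm.cdf(gridlimitsnorm, -normbound, normbound))
gridint = grid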

And finally we can subclass rv_discrete:
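
That is:

normdiscrete = stats.rv_discrete(values=(gridint, np.round(probs, decimals=7)),
                                 name='normdiscrete')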

Now that we have defined the distribution, we have access to all common methods of discrete distributions.

Testing the Implementation. Let’s generate a random sample and compare observed frequencies with the probabilities.
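
Presumably along these lines (the seed is the tutorial's):

n_sample = 500
np.random.seed(87655678)                    # fix the seed for replicability
rvs = normdiscrete.rvs(size=n_sample)
f, l = np.histogram(rvs, bins=gridlimits)   # observed frequencies
sfreq = np.vstack([gridint, f, probs * n_sample]).T   # observed vs. expected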

Note: the code for the plots is not at the URLs indicated, but at this one.

Next, we can test, whether our sample was generated by our normdiscrete distribution. This also verifies whether the random numbers are generated correctly.

The chisquare test requires that there are a minimum number of observations in each bin. We combine the tail bins into larger bins so that they contain enough observations.
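
A sketch (five bins on each tail are merged, as in the tutorial):

f2 = np.hstack([f[:5].sum(), f[5:-5], f[-5:].sum()])
p2 = np.hstack([probs[:5].sum(), probs[5:-5], probs[-5:].sum()])
ch2, pval = stats.chisquare(f2, p2 * n_sample)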

The pvalue in this case is high, so we can be quite confident that our random sample was actually generated by the distribution.

:mrgreen:

Unicode – the character and the code – 4

Another installment (the last one, I promise) of the telenovela about Unicode codes, Emoji included.

I owe this post to Edo; as soon as he finishes school he'll become… you'll see 💥

The previous post, the one that was supposed to be definitive, this one, has, according to somebody, the flaw of using a niche language. And we all know that if something can be done in one language, it can most likely be done in almost all the others. So why not use Python? Or C++? Or NodeJS? Or _________.

The Python version he wrote himself; it works, it rocks! So I'm posting it. There's no need to add explanations, it's all perfectly clear 😊

Just one thing: it imports the ast module, used to determine whether the argument passed is a number or a string. The module is for "Abstract Syntax Trees", not among the most commonly used ones; the power of Stack Overflow, which (almost) always suggests the right tip. 💥

#!/usr/bin/env python

import ast, os, sys

def isnum(st):
    try:
        ast.literal_eval(st)
        return True
    except Exception:
        return False

if len(sys.argv) < 2:
    sys.exit()

d = sys.argv[1]
try:
    if isnum(d):
        ans = chr(int(d, 0))
    else:
        n = ord(d)
        ans = '{} {}'.format(n, hex(n))
except Exception:
    ans = ''

cmd = "echo {} | xclip -f -selection clipboard".format(ans)
os.system(cmd)
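
A hypothetical session (the script name char_code.py is my invention; xclip must be installed):

python3 char_code.py 0x1f4a5   # copies 💥 to the clipboard
python3 char_code.py 💥        # copies "128165 0x1f4a5"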

Ah! Quite something, eh! If there's no big news, the telenovela ends here. Apart from possible new languages in the comments.

:mrgreen: