Category Archives: Python

SciPy – 3 – creating arrays

I continue from here, copying here.

You have now seen how to inspect your array and make adjustments to its data type, but you haven’t explicitly seen how to create arrays. You should already know that you can use np.array() to do this, but there are other routines for array creation that you should know about: np.eye() and np.identity().

The np.eye() function allows you to create a square matrix with dimensions equal to the positive integer that you give as an argument to the function. The entries are filled with zeros, except for the matrix diagonal, which is filled with ones. The np.identity() function works the same way and also returns an identity array.

However, note that np.eye() can take an additional argument k that you can specify to pick the index of the diagonal that you want to populate with ones.
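
A quick sketch of mine for both functions:

import numpy as np

print(np.eye(3))        # 3x3: ones on the main diagonal, zeros elsewhere
print(np.eye(3, k=1))   # the diagonal of ones shifted up by one position
print(np.identity(3))   # same result as np.eye(3)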

A note for myself, since sometimes my memory… 😉: here you can find the indexes of the NumPy and SciPy functions, to bookmark right away 😊

Other array creation functions that will most definitely come in handy when you’re working with the matrices for linear algebra are the following:

  • The np.arange() function creates an array with uniformly spaced values between two numbers. You can specify the spacing between the elements.
  • The latter also holds for np.linspace(), but with this function you specify the number of elements that you want in your array.
  • Lastly, the np.logspace() function also creates an array with uniformly spaced values, but this time on a logarithmic scale: the base-10 logarithms of the values are evenly spaced between the two numbers you give. (See the sketch after this list.)
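
A quick sketch of mine for the three functions:

import numpy as np

print(np.arange(0, 10, 2))    # spacing of 2: [0 2 4 6 8]
print(np.linspace(0, 10, 5))  # 5 evenly spaced values: [ 0.   2.5  5.   7.5 10. ]
print(np.logspace(0, 2, 5))   # 5 values whose base-10 logs are evenly spaced
                              # between 0 and 2: roughly [1, 3.16, 10, 31.62, 100]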

Now prof Karlijn says: Now that you have refreshed your memory and you know how to handle the data types of your arrays, it’s time to also tackle the topic of indexing and slicing. Coming up next.

:mrgreen:

SciPy – 2 – essential NumPy objects

I continue from here, copying here.

A quick review of things we have already seen.

An array is, structurally speaking, nothing but pointers. It’s a combination of a memory address, a data type, a shape and strides. It contains information about the raw data, how to locate an element and how to interpret an element.

The memory address and strides are important when you dive deeper into the lower-level details of arrays, while the data type and shape are things that beginners should surely know and understand. Two other attributes that you might want to consider are the data and size, which allow you to gather even more information on your array.
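
The code chunk that the next paragraph refers to isn’t reproduced in this copy; here is a minimal sketch of mine (assuming an integer array called myArray, as in the text) showing those attributes:

import numpy as np

myArray = np.array([[1, 2, 3], [4, 5, 6]])

print(myArray.dtype)     # data type of the elements, e.g. int64
print(myArray.shape)     # (2, 3)
print(myArray.strides)   # bytes to step per dimension, e.g. (24, 8)
print(myArray.data)      # the memory buffer holding the raw data
print(myArray.size)      # 6, the total number of elements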

You’ll see in the results of the code that is included in the code chunk above that the data type of myArray is int64. When you’re intensively working with arrays, you will definitely remember that there are ways to convert your arrays from one data type to another with the astype() method.
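
And a short astype() sketch, again mine:

import numpy as np

myArray = np.array([1, 2, 3])       # dtype is usually int64
myFloats = myArray.astype(float)    # convert to another data type
print(myFloats.dtype)               # float64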

Nevertheless, when you’re using SciPy and NumPy together, you might also find the following type handling NumPy functions very useful, especially when you’re working with complex numbers:

Try to add print() calls to see the results of the code that is given above. Then, you’ll see that complex numbers have a real and an imaginary part to them. The np.real() and np.imag() functions are designed to return these parts to the user, respectively.
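
The original chunk isn’t copied here either; a minimal sketch of mine with a small complex array:

import numpy as np

myComplex = np.array([1 + 2j, 3 - 1j, 0.5 + 0j])

print(np.real(myComplex))   # [1.   3.   0.5]
print(np.imag(myComplex))   # [ 2. -1.   0. ]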

Alternatively, you might also be able to use np.cast to cast an array object to a different data type, such as float in the example above.

The only thing that really stands out in difficulty in the above code chunk is the np.real_if_close() function. When you give it a complex input, such as myArray, you’ll get a real array back if the complex parts are close to zero. This last part, “close to 0”, can be adjusted by yourself with the tol argument that you can pass to the function.
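
A sketch of mine for np.real_if_close() and its tol argument (note that np.cast is deprecated in recent NumPy versions, so astype() is the safer way to cast today):

import numpy as np

myComplex = np.array([1 + 1e-14j, 2 + 0j, 3 + 1e-13j])

# if every imaginary part is "close to 0" a real array comes back;
# tol is measured in multiples of the machine epsilon of the type
print(np.real_if_close(myComplex, tol=1000))   # [1. 2. 3.]
print(np.real_if_close(myComplex, tol=1))      # unchanged, still complex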

OK? 😎 to be continued

:mrgreen:

SciPy – 1 – linear algebra

I’m still stuck at the beginning because, well begun is half done… 😉 and I still have to decide where to copy from 😊; it would take someone like Jake, he explains things well, he rockz 🚀

I could start from the SciPy Reference Guide but it seems a bit too documentation-heavy; maybe the Scipy Tutorial: Vectors and Arrays (Linear Algebra) by Karlijn Willems is better.

Karlijn’s tutorial seems too short to me, and it only covers part of SciPy (judging by the index of the Reference), but it can be a start. So let’s go; we’ll see 😉 In the meantime, the usual mantra.

I continue from here, copying here.

Much of what you need to know to really dive into machine learning is linear algebra, and that is exactly what this tutorial tackles. Today’s post goes over the linear algebra topics that you need to know and understand to improve your intuition for how and when machine learning methods work by looking at the level of vectors and matrices.

By the end of the tutorial, you’ll hopefully feel more confident to take a closer look at an algorithm!

Introduction
With Jake I went through a whole notebook on NumPy, one of the core libraries for scientific computing in Python. This library contains a collection of tools and techniques that can be used to solve mathematical models of problems in Science and Engineering on a computer. But there is also SciPy, a package that gives us better performance; it builds on that powerful data structure, which allows you to efficiently compute arrays and matrices.

Now, SciPy is basically NumPy.

It’s also one of the core packages for scientific computing that provides mathematical algorithms and convenience functions, but it’s built on the NumPy extension of Python. This means that SciPy and NumPy are often used together.

Later on in this tutorial, it will become clear to you how naturally these two libraries work together.

Interacting with NumPy and SciPy
To interact efficiently with both packages, you first need to know some of the basics of this library and its powerful data structure. To work with these arrays, there is a huge number of high-level mathematical functions that operate on these matrices and arrays.

We’ll now see what it takes to use SciPy efficiently. In essence, you have to know about the array structure, how you can handle data types and how you can manipulate the shape of arrays. Ah! there is a cheat sheet both for NumPy and for SciPy.
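
A minimal sketch of the two packages working together (my own example, not from the tutorial): a plain NumPy array passed to scipy.linalg:

import numpy as np
from scipy import linalg

A = np.array([[1., 2.], [3., 4.]])
b = np.array([5., 6.])

print(linalg.det(A))       # determinant of the NumPy array: -2.0
print(linalg.solve(A, b))  # solution of A x = b: [-4.   4.5]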

Break 😊 after all we’re still getting ready for departure 😎

:mrgreen:

SciPy – beginning – 0

This post –and those that will follow– can be considered a continuation of the series about NumPy (and Pandas and Matplotlib and Scikit-Learn). Or maybe not, that is, I don’t know. Really 😯

From the Scipy site (by the way, for Italians it’s pronounced “scipai”) it turns out that the ties with Numpy are very close. So the first thing to dig into will be exactly how the two packages differ, how they integrate, whether I did well to tell the NumPy telenovela (almost a hundred posts), and whether I should start following this brand-new one.
One note: I know the young folks no longer use the word “telenovela”, but I’m of a certain age, a senior as Jakob Nielsen would say, so bear with me.
Ah! yet another note, this one for me: who knows whether I’ll find Jake VanderPlas again, or someone like him, a teacher who rockz 🚀 a lot, who tells you how to do things so clearly that even I can manage it 😉

I don’t know whether you too have what for me is an instinctive reaction (a Pavlovian reflex, as those who know what that means would say (probably, or at least it sounds good)): when you hear about something new you can’t help saying “what does the Wiki say?”, how would we manage without it (self-cit.).

Indeed, the Wiki says:

SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.

SciPy builds on the NumPy array object and is part of the NumPy stack which includes tools like Matplotlib, pandas and SymPy, and an expanding set of scientific computing libraries. This NumPy stack has similar users to other applications such as MATLAB, GNU Octave, and Scilab. The NumPy stack is also sometimes referred to as the SciPy stack.

And then:

In the 1990s, Python was extended to include an array type for numerical computing called Numeric (This package was eventually replaced by Travis Oliphant who wrote NumPy in 2006 as a blending of Numeric and Numarray which had been started in 2001). As of 2000, there was a growing number of extension modules and increasing interest in creating a complete environment for scientific and technical computing. In 2001, Travis Oliphant, Eric Jones, and Pearu Peterson merged code they had written and called the resulting package SciPy. The newly created package provided a standard collection of common numerical operations on top of the Numeric array data structure. Shortly thereafter, Fernando Pérez released IPython, an enhanced interactive shell widely used in the technical computing community, and John Hunter released the first version of Matplotlib, the 2D plotting library for technical computing. Since then the SciPy environment has continued to grow with more packages and tools for technical computing.

But it doesn’t end here; in fact I have another Pavlovian reflex (of a higher order?): Quora. And –talk about coincidences– here it is: What is the difference between NumPy and SciPy?
Two answers; here is the first one, short but dense:

NumPy is part of SciPy, and basically just defines arrays along with some basic numerical functions. The rest of SciPy implements stuff like numerical integration and optimization and machine learning using NumPy’s functionality.

OK, I think I’m onto something sexy, something to do. Coming soon. Without putting it off. Right away. After the break 😉

:mrgreen:

NumPy – 99 – many other resources – 3

GfBo

I continue from here, examining other resources.

Theano
Theano is a Python library that lets you define, optimize, and evaluate mathematical expressions, especially ones with multi-dimensional arrays (numpy.ndarray). Using Theano it is possible to attain speeds rivaling hand-crafted C implementations for problems involving large amounts of data. It can also surpass C on a CPU by many orders of magnitude by taking advantage of recent GPUs.

Installed via Conda but it gives me errors (version issues?). To be put among the things to examine in the future (really, coming soon… maybe 😊).

SciPy
SciPy is a collection of mathematical algorithms and convenience functions built on the Numpy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data. With SciPy an interactive Python session becomes a data-processing and system-prototyping environment rivaling systems such as MATLAB, IDL, Octave, R-Lab, and SciLab.

# minimal SciPy example

import numpy as np

from scipy import linalg, optimize

np.info(optimize.fmin)

It produces this output:

 fmin(func, x0, args=(), xtol=0.0001, ftol=0.0001, maxiter=None, maxfun=None,
      full_output=0, disp=1, retall=0, callback=None, initial_simplex=None)

Minimize a function using the downhill simplex algorithm.

This algorithm only uses function values, not derivatives or second
derivatives.

Parameters
----------
func : callable func(x,*args)
    The objective function to be minimized.
x0 : ndarray
    Initial guess.
args : tuple, optional
    Extra arguments passed to func, i.e. ``f(x,*args)``.
xtol : float, optional
    Absolute error in xopt between iterations that is acceptable for
    convergence.
ftol : number, optional
    Absolute error in func(xopt) between iterations that is acceptable for
    convergence.
maxiter : int, optional
    Maximum number of iterations to perform.
maxfun : number, optional
    Maximum number of function evaluations to make.
full_output : bool, optional
    Set to True if fopt and warnflag outputs are desired.
disp : bool, optional
    Set to True to print convergence messages.
retall : bool, optional
    Set to True to return list of solutions at each iteration.
callback : callable, optional
    Called after each iteration, as callback(xk), where xk is the
    current parameter vector.
initial_simplex : array_like of shape (N + 1, N), optional
    Initial simplex. If given, overrides `x0`.
    ``initial_simplex[j,:]`` should contain the coordinates of
    the j-th vertex of the ``N+1`` vertices in the simplex, where
    ``N`` is the dimension.

Returns
-------
xopt : ndarray
    Parameter that minimizes function.
fopt : float
    Value of function at minimum: ``fopt = func(xopt)``.
iter : int
    Number of iterations performed.
funcalls : int
    Number of function calls made.
warnflag : int
    1 : Maximum number of function evaluations made.
    2 : Maximum number of iterations reached.
allvecs : list
    Solution at each iteration.

See also
--------
minimize: Interface to minimization algorithms for multivariate
    functions. See the 'Nelder-Mead' `method` in particular.

Notes
-----
Uses a Nelder-Mead simplex algorithm to find the minimum of function of
one or more variables.

This algorithm has a long history of successful use in applications.
But it will usually be slower than an algorithm that uses first or
second derivative information. In practice it can have poor
performance in high-dimensional problems and is not robust to
minimizing complicated functions. Additionally, there currently is no
complete theory describing when the algorithm will successfully
converge to the minimum, or how fast it will if it does. Both the ftol and
xtol criteria must be met for convergence.

References
----------
.. [1] Nelder, J.A. and Mead, R. (1965), "A simplex method for function
       minimization", The Computer Journal, 7, pp. 308-313

.. [2] Wright, M.H. (1996), "Direct Search Methods: Once Scorned, Now
       Respectable", in Numerical Analysis 1995, Proceedings of the
       1995 Dundee Biennial Conference in Numerical Analysis, D.F.
       Griffiths and G.A. Watson (Eds.), Addison Wesley Longman,
       Harlow, UK, pp. 191-208.

This too is something to examine in detail soon 😊

The resources available for SciPy are endless, or at least very many. Enough for an “Ok, panic!” even before starting 😯
But I need a moment of reflection, to reorganize my ideas. And there are other things tempting me too… 😯

:mrgreen:

NumPy – 98 – many other resources – 2

I continue from here, examining other resources.

Bokeh
Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.

Everything is there: the User Guide, the Gallery of examples, the Reference Guide and, for those who want to contribute, the Developer Guide.

I haven’t installed it, but from a quick look at the documentation you find out that it can be used offline, outside the Web, in the traditional way that few of us still use 😯
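
A minimal sketch (mine, following the usual quickstart pattern, so take the details with a grain of salt) of an offline plot saved to a standalone HTML file:

from bokeh.plotting import figure, output_file, show

output_file("lines.html")            # a self-contained HTML file, no server needed
p = figure(title="Minimal Bokeh example")
p.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], line_width=2)
show(p)                              # opens the file in the browser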

VisPy
VisPy is a Python library for interactive scientific visualization that is designed to be fast, scalable, and easy to use.

Here too there is everything you need.


Vega & Vega-Lite
Visualization Grammars.
Vega is a declarative format for creating, saving, and sharing visualization designs. With Vega, visualizations are described in JSON, and generate interactive views using either HTML5 Canvas or SVG.

Included in this survey even though here we are no longer in Python land.
There are third-party add-on components:

ggvis is a data visualization package for R that renders web-based visualizations using Vega. It features a syntax similar in spirit to ggplot2.

Vega.jl uses the Julia programming language to generate spec-compliant Vega 2.x visualizations. Vega.jl is integrated with Jupyter Notebook, and provides a high-quality visualization experience for scientific computing.

The MediaWiki Graph extension allows you to embed Vega visualizations on MediaWiki sites, including Wikipedia.

Cedar integrates Vega with the GeoServices from ArcGIS. It adds templated documents for reusable charts that programmatically bind to new data sources.

and many others, including Python, via Altair (previous post).

This too looks OK; the interactive mode in the browser is excellent (the image comes from there). However (something for some to consider) you leave Python, and there are other languages to learn.

:mrgreen:

NumPy – 97 – many other resources – 1

I continue from here, looking for other resources.

I’m only considering the free components, and there are many. Then, of course, if this becomes a serious occupation it will be necessary to dig deeper, evaluating case by case.

I don’t follow a logical order –too demanding– but a chronological one (for me, the order in which I was told about them (even though nobody ever tells me anything 😡 (self-cit.))).

ggplot A package for plotting in Python
Making plots is very repetitive: draw this line, add these colored points, then add these, etc. Instead of re-using the same code over and over, ggplot implements them using a high-level but very expressive API. The result is less time spent creating your charts, and more time interpreting what they mean.

ggplot is not a good fit for people trying to make highly customized data visualizations. While you can make some very intricate, great looking plots, ggplot sacrifices high customization in favor of generally doing “what you’d expect”.

ggplot has a symbiotic relationship with pandas. If you’re planning on using ggplot, it’s best to keep your data in DataFrames. Think of a DataFrame as a tabular data object. For example, let’s look at the diamonds dataset which ships with ggplot.
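
For example, a minimal sketch along the lines of the package’s own examples (I haven’t run it, so consider the details an assumption):

from ggplot import ggplot, aes, geom_point, diamonds

# diamonds is a pandas DataFrame that ships with the package
p = ggplot(aes(x='carat', y='price'), data=diamonds) + geom_point()
print(p)   # printing the plot object renders it with matplotlib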

The scripts are always very short and essential. But –imho– nothing new; indeed things I had already put together with Gnuplot (in ancient times).
Note: watch out for the site’s URLs: several are not up to date.

HoloViews
Stop plotting your data – annotate your data and let it visualize itself.

HoloViews is a Python library that makes analyzing and visualizing scientific or engineering data much simpler, more intuitive, and more easily reproducible. Instead of specifying every step for each plot, HoloViews lets you store your data in an annotated format that is instantly visualizable, with immediate access to both the numeric data and its visualization. Examples of how HoloViews is used in Python scripts as well as in live Jupyter Notebooks may be accessed directly from the holoviews-contrib repository. Here is a quick example of HoloViews in action:

I resisted the temptation to install it and invest some time; but in these cases always remember Hofstadter’s law 😜

Altair
Altair is a declarative statistical visualization library for Python.
Altair is developed by Brian Granger and Jake Vanderplas in close collaboration with the UW Interactive Data Lab.
With Altair, you can spend more time understanding your data and its meaning. Altair’s API is simple, friendly and consistent and built on top of the powerful Vega-Lite JSON specification. This elegant simplicity produces beautiful and effective visualizations with a minimal amount of code.
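
A minimal sketch of mine of the declarative style (assuming a recent Altair; in a Jupyter notebook the chart displays itself):

import altair as alt
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [3, 1, 4, 2]})

# data + mark + encodings, compiled to a Vega-Lite JSON spec
chart = alt.Chart(df).mark_point().encode(x='x', y='y')
print(chart.to_json())   # the underlying Vega-Lite specification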

It looks nice; unfortunately there is this note: Altair’s documentation is currently in a very incomplete form; we are in the process of creating more comprehensive documentation. Stay tuned!
But there are also: Altair’s Documentation Site and Altair’s Tutorial Notebooks.
Vega (standard and Lite) are on the list, coming soon… 😯

Seaborn
Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.

I’ve already used it repeatedly while copying Jake VanderPlas. It really looks inviting, who knows… 😯
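
A minimal sketch (mine) with one of the example datasets bundled with seaborn:

import matplotlib.pyplot as plt
import seaborn as sns

iris = sns.load_dataset("iris")     # small example dataset bundled with seaborn
sns.pairplot(iris, hue="species")   # pairwise scatter plots, colored by species
plt.savefig("iris_pairplot.png")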

:mrgreen:

NumPy – 96 – scikit-learn

I continue from here, following the suggestions.

The scikit-learn site is a gold mine: if you need to use the tools described in the previous posts by Jake VanderPlas, it’s the right place to dive into and dig deeper.
There is an introduction for newbies like me, there is rich documentation, there are examples 😊

I really can’t resist, I have to try a few of them. For example Isotonic Regression:

The isotonic regression finds a non-decreasing approximation of a function while minimizing the mean squared error on the training data. The benefit of such a model is that it does not assume any form for the target function such as linearity. For comparison a linear regression is also presented.

# Author: Nelle Varoquaux <nelle.varoquaux@gmail.com>
#         Alexandre Gramfort <alexandre.gramfort@inria.fr>
# License: BSD

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

from sklearn.linear_model import LinearRegression
from sklearn.isotonic import IsotonicRegression
from sklearn.utils import check_random_state

n = 100
x = np.arange(n)
rs = check_random_state(0)
y = rs.randint(-50, 50, size=(n,)) + 50. * np.log(1 + np.arange(n))

# Fit IsotonicRegression and LinearRegression models

ir = IsotonicRegression()

y_ = ir.fit_transform(x, y)

lr = LinearRegression()
lr.fit(x[:, np.newaxis], y)  # x needs to be 2d for LinearRegression

segments = [[[i, y[i]], [i, y_[i]]] for i in range(n)]
lc = LineCollection(segments, zorder=0)
lc.set_array(np.ones(len(y)))
lc.set_linewidths(0.5 * np.ones(n))

fig = plt.figure()
plt.plot(x, y, 'r.', markersize=12)
plt.plot(x, y_, 'g.-', markersize=12)
plt.plot(x, lr.predict(x[:, np.newaxis]), 'b-')
plt.gca().add_collection(lc)
plt.legend(('Data', 'Isotonic Fit', 'Linear Fit'), loc='lower right')
plt.title('Isotonic regression')

fig.savefig("np895.png")

Uh! something less simple (not that the previous one is elementary, but I can still follow it); here is Plot the decision surfaces of ensembles of trees on the iris dataset

Plot the decision surfaces of forests of randomized trees trained on pairs of features of the iris dataset.

This plot compares the decision surfaces learned by a decision tree classifier (first column), by a random forest classifier (second column), by an extra- trees classifier (third column) and by an AdaBoost classifier (fourth column).

In the first row, the classifiers are built using the sepal width and the sepal length features only, on the second row using the petal length and sepal length only, and on the third row using the petal width and the petal length only.

In descending order of quality, when trained (outside of this example) on all 4 features using 30 estimators and scored using 10 fold cross validation, we see:

    ExtraTreesClassifier()  # 0.95 score
    RandomForestClassifier()  # 0.94 score
    AdaBoost(DecisionTree(max_depth=3))  # 0.94 score
    DecisionTree(max_depth=None)  # 0.94 score

Increasing max_depth for AdaBoost lowers the standard deviation of the scores (but the average score does not improve).

See the console’s output for further details about each model.

In this example you might try to:

  • vary the max_depth for the DecisionTreeClassifier and AdaBoostClassifier, perhaps try max_depth=3 for the DecisionTreeClassifier or max_depth=None for AdaBoostClassifier
  • vary n_estimators

It is worth noting that RandomForests and ExtraTrees can be fitted in parallel on many cores as each tree is built independently of the others. AdaBoost’s samples are built sequentially and so do not use multiple cores.

"""
====================================================================
Plot the decision surfaces of ensembles of trees on the iris dataset
====================================================================

Plot the decision surfaces of forests of randomized trees trained on pairs of
features of the iris dataset.

This plot compares the decision surfaces learned by a decision tree classifier
(first column), by a random forest classifier (second column), by an extra-
trees classifier (third column) and by an AdaBoost classifier (fourth column).

In the first row, the classifiers are built using the sepal width and the sepal
length features only, on the second row using the petal length and sepal length
only, and on the third row using the petal width and the petal length only.

In descending order of quality, when trained (outside of this example) on all
4 features using 30 estimators and scored using 10 fold cross validation, we see::

    ExtraTreesClassifier()  # 0.95 score
    RandomForestClassifier()  # 0.94 score
    AdaBoost(DecisionTree(max_depth=3))  # 0.94 score
    DecisionTree(max_depth=None)  # 0.94 score

Increasing `max_depth` for AdaBoost lowers the standard deviation of the scores (but
the average score does not improve).

See the console's output for further details about each model.

In this example you might try to:

1) vary the ``max_depth`` for the ``DecisionTreeClassifier`` and
   ``AdaBoostClassifier``, perhaps try ``max_depth=3`` for the
   ``DecisionTreeClassifier`` or ``max_depth=None`` for ``AdaBoostClassifier``
2) vary ``n_estimators``

It is worth noting that RandomForests and ExtraTrees can be fitted in parallel
on many cores as each tree is built independently of the others. AdaBoost's
samples are built sequentially and so do not use multiple cores.
"""
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt

from sklearn import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier)
from sklearn.tree import DecisionTreeClassifier

# Parameters
n_classes = 3
n_estimators = 30
plot_colors = "ryb"
cmap = plt.cm.RdYlBu
plot_step = 0.02  # fine step width for decision surface contours
plot_step_coarser = 0.5  # step widths for coarse classifier guesses
RANDOM_SEED = 13  # fix the seed on each iteration

# Load data
iris = load_iris()

plot_idx = 1

models = [DecisionTreeClassifier(max_depth=None),
          RandomForestClassifier(n_estimators=n_estimators),
          ExtraTreesClassifier(n_estimators=n_estimators),
          AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                             n_estimators=n_estimators)]

for pair in ([0, 1], [0, 2], [2, 3]):
    for model in models:
        # We only take the two corresponding features
        X = iris.data[:, pair]
        y = iris.target

        # Shuffle
        idx = np.arange(X.shape[0])
        np.random.seed(RANDOM_SEED)
        np.random.shuffle(idx)
        X = X[idx]
        y = y[idx]

        # Standardize
        mean = X.mean(axis=0)
        std = X.std(axis=0)
        X = (X - mean) / std

        # Train
        clf = clone(model)
        clf = model.fit(X, y)

        scores = clf.score(X, y)
        # Create a title for each column and the console by using str() and
        # slicing away useless parts of the string
        model_title = str(type(model)).split(".")[-1][:-2][:-len("Classifier")]
        model_details = model_title
        if hasattr(model, "estimators_"):
            model_details += " with {} estimators".format(len(model.estimators_))
        print( model_details + " with features", pair, "has a score of", scores )

        plt.subplot(3, 4, plot_idx)
        if plot_idx <= len(models):
            # Add a title at the top of each column
            plt.title(model_title)

        # Now plot the decision boundary using a fine mesh as input to a
        # filled contour plot
        x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
        y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
        xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                             np.arange(y_min, y_max, plot_step))

        # Plot either a single DecisionTreeClassifier or alpha blend the
        # decision surfaces of the ensemble of classifiers
        if isinstance(model, DecisionTreeClassifier):
            Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
            Z = Z.reshape(xx.shape)
            cs = plt.contourf(xx, yy, Z, cmap=cmap)
        else:
            # Choose alpha blend level with respect to the number of estimators
            # that are in use (noting that AdaBoost can use fewer estimators
            # than its maximum if it achieves a good enough fit early on)
            estimator_alpha = 1.0 / len(model.estimators_)
            for tree in model.estimators_:
                Z = tree.predict(np.c_[xx.ravel(), yy.ravel()])
                Z = Z.reshape(xx.shape)
                cs = plt.contourf(xx, yy, Z, alpha=estimator_alpha, cmap=cmap)

        # Build a coarser grid to plot a set of ensemble classifications
        # to show how these are different to what we see in the decision
        # surfaces. These points are regularly space and do not have a black outline
        xx_coarser, yy_coarser = np.meshgrid(np.arange(x_min, 
                     x_max, plot_step_coarser),np.arange(y_min,
                     y_max, plot_step_coarser))
        Z_points_coarser = model.predict(np.c_[xx_coarser.ravel(),
                     yy_coarser.ravel()]).reshape(xx_coarser.shape)
        cs_points = plt.scatter(xx_coarser, yy_coarser, s=15,
                     c=Z_points_coarser, cmap=cmap,edgecolors="none")

        # Plot the training points, these are clustered together and have a
        # black outline
        for i, c in zip(range(n_classes), plot_colors):
            idx = np.where(y == i)
            plt.scatter(X[idx, 0], X[idx, 1], c=c, label=iris.target_names[i],
                        cmap=cmap)

        plot_idx += 1  # move on to the next plot in sequence

plt.suptitle("Classifiers on feature subsets of the Iris dataset")
plt.axis("tight")

plt.savefig("np896.png")

Ahem… I’ll continue as soon as I recover 😊, because 😎

:mrgreen:

NumPy – 95 – other Machine Learning resources

I continue from here, copying here.

OK, I did not continue the point-by-point examination (in practice, copying everything) of Jake VanderPlas’s excellent notebook; part of the code is by now old.
But let’s take a look at his final advice on further reading. And then –though it will take some time– a look at some of these resources.

Machine Learning in Python
To learn more about machine learning in Python, I’d suggest some of the following resources:

  • The Scikit-Learn website: The Scikit-Learn website has an impressive breadth of documentation and examples covering some of the models discussed here, and much, much more. If you want a brief survey of the most important and often-used machine learning algorithms, this website is a good place to start.
  • SciPy, PyCon, and PyData tutorial videos: Scikit-Learn and other machine learning topics are perennial favorites in the tutorial tracks of many Python-focused conference series, in particular the PyCon, SciPy, and PyData conferences. You can find the most recent ones via a simple web search.
  • Introduction to Machine Learning with Python: Written by Andreas C. Mueller and Sarah Guido, this book includes a fuller treatment of the topics in this chapter. If you’re interested in reviewing the fundamentals of Machine Learning and pushing the Scikit-Learn toolkit to its limits, this is a great resource, written by one of the most prolific developers on the Scikit-Learn team.
  • Python Machine Learning: Sebastian Raschka’s book focuses less on Scikit-learn itself, and more on the breadth of machine learning tools available in Python. In particular, there is some very useful discussion on how to scale Python-based machine learning approaches to large and complex datasets.

Machine learning in general
Of course, machine learning is much broader than just the Python world. There are many good resources to take your knowledge further, and here I will highlight a few that I have found useful:

  • Machine Learning: Taught by Andrew Ng (Coursera), this is a very clearly-taught free online course which covers the basics of machine learning from an algorithmic perspective. It assumes undergraduate-level understanding of mathematics and programming, and steps through detailed considerations of some of the most important machine learning algorithms. Homework assignments, which are algorithmically graded, have you actually implement some of these models yourself.
  • Pattern Recognition and Machine Learning: Written by Christopher Bishop, this classic technical text covers the concepts of machine learning discussed in this chapter in detail. If you plan to go further in this subject, you should have this book on your shelf.
  • Machine Learning: a Probabilistic Perspective: Written by Kevin Murphy, this is an excellent graduate-level text that explores nearly all important machine learning algorithms from a ground-up, unified probabilistic perspective.

These resources are more technical than the material presented in this book, but to really understand the fundamentals of these methods requires a deep dive into the mathematics behind them. If you’re up for the challenge and ready to bring your data science to the next level, don’t hesitate to dive-in!

In addition I have a list of yet more stuff, enough for another post, coming soon.

:mrgreen:

NumPy – 94 – introduction to Scikit-Learn – 3


I continue from here, copying here.

The supervised learning example, the classification of Irises, uses methods that are deprecated and no longer present in sklearn. I’m skipping it, postponing it to a more in-depth occasion. I move on to the next point, hoping… 😊

No! the Application: exploring hand-written digits also uses deprecated methods. At this point a pause for study and reflection is needed before continuing the series. Who knows… 😊

I checked –googled, stackoverflowed– but no hints about updates to Jake’s excellent notebook. Sooner or later a new, updated version will come out –maybe.

In the meantime I consider its examination finished.
What remains to be seen are the suggestions, the alternative approaches mentioned here and there –coming soon.
One further note (not entirely mine): yes, Python is OK, the packages we have seen are excellent, but in practice, in current everyday work, these things are not so fundamental after all, and besides they would be new: you would have to replace well-tested tools that people use habitually. Ah! one more thing: everything has to be done for Windows (by now 10, rarely 7). But look, Python, IPython, NumPy & co. are OS-agnostic, at most with very small adjustments for commands or for creating icons to clutter the desktop. And meanwhile (almost) everyone is running off to the Web. And to mobile.

:mrgreen: