NumPy – 93 – Introduction to Scikit-Learn – 2

Continuing from here, copying from here.

Scikit-Learn's Estimator API
The Scikit-Learn API is designed with the following guiding principles in mind, as outlined in the Scikit-Learn API paper:

  • Consistency: All objects share a common interface drawn from a limited set of methods, with consistent documentation.
  • Inspection: All specified parameter values are exposed as public attributes.
  • Limited object hierarchy: Only algorithms are represented by Python classes; datasets are represented in standard formats (NumPy arrays, Pandas DataFrames, SciPy sparse matrices) and parameter names use standard Python strings.
  • Composition: Many machine learning tasks can be expressed as sequences of more fundamental algorithms, and Scikit-Learn makes use of this wherever possible.
  • Sensible defaults: When models require user-specified parameters, the library defines an appropriate default value.

In practice, these principles make Scikit-Learn very easy to use, once the basic principles are understood. Every machine learning algorithm in Scikit-Learn is implemented via the Estimator API, which provides a consistent interface for a wide range of machine learning applications.
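As a small illustration of the "inspection" and "sensible defaults" principles, here is a minimal sketch. The choice of LinearRegression as the example estimator is mine, not part of the list above.

```python
from sklearn.linear_model import LinearRegression

# Sensible defaults: the estimator can be created without any arguments.
default_model = LinearRegression()

# Inspection: hyperparameters are exposed as public attributes...
print(default_model.fit_intercept)

# ...and can also be listed all at once as a dictionary.
print(default_model.get_params())

# Values passed at instantiation are stored under the same names.
custom_model = LinearRegression(fit_intercept=False)
print(custom_model.fit_intercept)
```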

Basics of the API
Most commonly, the steps in using the Scikit-Learn estimator API are as follows (we will step through a handful of detailed examples in the sections that follow).

  • Choose a class of model by importing the appropriate estimator class from Scikit-Learn.
  • Choose model hyperparameters by instantiating this class with desired values.
  • Arrange data into a features matrix and target vector following the discussion above.
  • Fit the model to your data by calling the fit() method of the model instance.
  • Apply the model to new data:
    For supervised learning, often we predict labels for unknown data using the predict() method.
    For unsupervised learning, we often transform or infer properties of the data using the transform() or predict() method.
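The steps above can be sketched compactly for both cases. This is only an illustration under my own assumptions: the random data, the variable names, and the choice of LinearRegression and PCA as example estimators are not from the text.

```python
import numpy as np
from sklearn.decomposition import PCA               # 1. choose a model class
from sklearn.linear_model import LinearRegression   #    (one per example)

# 3. Arrange the data: features matrix [n_samples, n_features] and target vector.
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = X @ np.array([1.0, 2.0, 3.0])

# Supervised case: fit, then predict labels for (here, the first five) samples.
reg = LinearRegression(fit_intercept=True)   # 2. hyperparameters at instantiation
reg.fit(X, y)                                # 4. fit the model to the data
y_pred = reg.predict(X[:5])                  # 5. apply the model

# Unsupervised case: fit, then transform the data (no target vector needed).
pca = PCA(n_components=2)                    # 2. hyperparameters at instantiation
pca.fit(X)                                   # 4. fit the model to the data
X_2d = pca.transform(X)                      # 5. apply the model
```

In both cases the calls follow the same pattern; only the final method (predict() or transform()) changes.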

We will now step through several simple examples of applying supervised and unsupervised learning methods.

Supervised learning example: simple linear regression
As an example of this process, let’s consider a simple linear regression—that is, the common case of fitting a line to (x,y) data. We will use the following simple data for our regression example:
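The data listing did not survive in this copy of the post. Here is a minimal sketch of data consistent with what comes later (a line with slope 2 and intercept -1, plus noise). The seed, the sample size of 50, and the names rng, x and y are assumptions.

```python
import numpy as np

rng = np.random.RandomState(42)   # fixed seed so the data are reproducible (assumed)
x = 10 * rng.rand(50)             # 50 x values, uniform in [0, 10)
y = 2 * x - 1 + rng.randn(50)     # slope 2, intercept -1, plus Gaussian noise
```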

With this data in place, we can use the recipe outlined earlier. Let’s walk through the process:

1. Choose a class of model
In Scikit-Learn, every class of model is represented by a Python class. So, for example, if we would like to compute a simple linear regression model, we can import the linear regression class:
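The import in question is a single line:

```python
from sklearn.linear_model import LinearRegression
```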

Note that other more general linear regression models exist as well; you can read more about them in the sklearn.linear_model module documentation.

2. Choose model hyperparameters
An important point is that a class of model is not the same as an instance of a model.

Once we have decided on our model class, there are still some options open to us. Depending on the model class we are working with, we might need to answer one or more questions like the following:

  • Would we like to fit for the offset (i.e., y-intercept)?
  • Would we like the model to be normalized?
  • Would we like to preprocess our features to add model flexibility?
  • What degree of regularization would we like to use in our model?
  • How many model components would we like to use?

These are examples of the important choices that must be made once the model class is selected. These choices are often represented as hyperparameters, or parameters that must be set before the model is fit to data. In Scikit-Learn, hyperparameters are chosen by passing values at model instantiation. We will explore how you can quantitatively motivate the choice of hyperparameters in Hyperparameters and Model Validation [coming soon].

For our linear regression example, we can instantiate the LinearRegression class and specify that we would like to fit the intercept using the fit_intercept hyperparameter:
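A minimal sketch of that instantiation (the variable name model is my choice):

```python
from sklearn.linear_model import LinearRegression

# Passing the hyperparameter at instantiation only stores it on the object;
# no data are involved yet.
model = LinearRegression(fit_intercept=True)
print(model)
```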

Keep in mind that when the model is instantiated, the only action is the storing of these hyperparameter values. In particular, we have not yet applied the model to any data: the Scikit-Learn API makes very clear the distinction between choice of model and application of model to data.

3. Arrange data into a features matrix and target vector
Previously [previous post] we detailed the Scikit-Learn data representation, which requires a two-dimensional features matrix and a one-dimensional target array. Here our target variable y is already in the correct form (a length-n_samples array), but we need to massage the data x to make it a matrix of size [n_samples, n_features]. In this case, this amounts to a simple reshaping of the one-dimensional array:
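A sketch of the reshaping, using np.newaxis. The sample data are regenerated here so the snippet runs on its own; the seed and the size of 50 are assumptions.

```python
import numpy as np

rng = np.random.RandomState(42)   # assumed sample data
x = 10 * rng.rand(50)             # one-dimensional array, shape (50,)

# Add a second axis so that each sample becomes a row with one feature.
X = x[:, np.newaxis]
print(X.shape)                    # (50, 1)
```

An equivalent alternative is x.reshape(-1, 1).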

4. Fit the model to your data
Now it is time to apply our model to data. This can be done with the fit() method of the model:
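A sketch of the call, with the assumed sample data (seed 42, 50 points, slope 2, intercept -1) regenerated so that the snippet is self-contained:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(42)   # assumed sample data
x = 10 * rng.rand(50)
y = 2 * x - 1 + rng.randn(50)
X = x[:, np.newaxis]              # features matrix, shape (50, 1)

model = LinearRegression(fit_intercept=True)
model.fit(X, y)                   # learns from the data and returns the model itself
```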

This fit() command causes a number of model-dependent internal computations to take place, and the results of these computations are stored in model-specific attributes that the user can explore. In Scikit-Learn, by convention all model parameters that were learned during the fit() process have trailing underscores; for example in this linear model, we have the following:
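For the linear model the learned attributes are coef_ and intercept_. A sketch, again on the assumed sample data (seed 42, 50 points); the exact printed values depend on that data, so none are shown here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(42)   # assumed sample data
x = 10 * rng.rand(50)
y = 2 * x - 1 + rng.randn(50)

model = LinearRegression(fit_intercept=True)
model.fit(x[:, np.newaxis], y)

print(model.coef_)        # array with one slope per feature (here, one)
print(model.intercept_)   # scalar intercept
```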

These two parameters represent the slope and intercept of the simple linear fit to the data. Comparing to the data definition, we see that they are very close to the input slope of 2 and intercept of -1.

One question that frequently comes up regards the uncertainty in such internal model parameters. In general, Scikit-Learn does not provide tools to draw conclusions from internal model parameters themselves: interpreting model parameters is much more a statistical modeling question than a machine learning question. Machine learning rather focuses on what the model predicts. If you would like to dive into the meaning of fit parameters within the model, other tools are available, including the Statsmodels Python package.

5. Predict labels for unknown data
Once the model is trained, the main task of supervised machine learning is to evaluate it based on what it says about new data that was not part of the training set. In Scikit-Learn, this can be done using the predict() method. For the sake of this example, our “new data” will be a grid of x values, and we will ask what y values the model predicts:
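A sketch of such a grid. The range from -1 to 11 and the name xfit are assumptions, picked to extend slightly beyond x values that lie between 0 and 10.

```python
import numpy as np

# np.linspace returns 50 evenly spaced points by default.
xfit = np.linspace(-1, 11)
```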

As before, we need to coerce these x values into a [n_samples, n_features] features matrix, after which we can feed it to the model:
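A sketch of the two steps, reshaping and predicting. The sample data and the grid are regenerated under the same assumptions (seed 42, 50 points, grid from -1 to 11) so the snippet runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(42)   # assumed sample data
x = 10 * rng.rand(50)
y = 2 * x - 1 + rng.randn(50)
model = LinearRegression(fit_intercept=True).fit(x[:, np.newaxis], y)

xfit = np.linspace(-1, 11)        # assumed grid of new x values
Xfit = xfit[:, np.newaxis]        # shape [n_samples, n_features] = (50, 1)
yfit = model.predict(Xfit)        # one predicted y per row of Xfit
```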

Finally, let’s visualize the results by plotting first the raw data, and then this model fit:
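A sketch of the plot with Matplotlib, under the same assumptions as before (seed 42, 50 points, grid from -1 to 11):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(42)   # assumed sample data
x = 10 * rng.rand(50)
y = 2 * x - 1 + rng.randn(50)

model = LinearRegression(fit_intercept=True).fit(x[:, np.newaxis], y)
xfit = np.linspace(-1, 11)        # assumed grid of new x values
yfit = model.predict(xfit[:, np.newaxis])

fig, ax = plt.subplots()
ax.scatter(x, y)                  # raw data as points
ax.plot(xfit, yfit)               # model fit as a line
```

In a script, add plt.show() at the end to display the figure; in a notebook it is usually rendered automatically.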

Typically the efficacy of the model is evaluated by comparing its results to some known baseline, as we will see in the next example.

A break for now, but then we continue 😊
