NumPy – 20 – Aggregations: min, max, and everything in between – 2


Continuing from here, copying from here.

Other aggregation functions
NumPy provides many other aggregation functions, but we won’t discuss them in detail here. Additionally, most aggregates have a NaN-safe counterpart that computes the result while ignoring missing values, which are marked by the special IEEE floating-point NaN value (for a fuller discussion of missing data, see Handling Missing Data [coming soon]). Some of these NaN-safe functions were not added until NumPy 1.8, so they will not be available in older NumPy versions.
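
To make the NaN-safe idea concrete, here is a minimal sketch of my own (not from the book). The array values are made up for illustration; the only point is the difference between an ordinary aggregate and its NaN-safe counterpart.

```python
import numpy as np

# Made-up values for illustration; np.nan marks a missing entry.
values = np.array([1.0, 2.0, np.nan, 4.0])

plain_sum = np.sum(values)      # NaN propagates through the ordinary aggregate
safe_sum = np.nansum(values)    # the missing entry is ignored
safe_mean = np.nanmean(values)  # mean of the three non-missing values

print(plain_sum, safe_sum, safe_mean)
```

The ordinary sum comes back as NaN, while the NaN-safe versions work only on the entries that are present.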

The following table provides a list of useful aggregation functions available in NumPy:

Function Name NaN-safe Version Description
np.sum        np.nansum        Compute sum of elements
np.prod       np.nanprod       Compute product of elements
np.mean       np.nanmean       Compute mean of elements
np.std        np.nanstd        Compute standard deviation
np.var        np.nanvar        Compute variance
np.min        np.nanmin        Find minimum value
np.max        np.nanmax        Find maximum value
np.argmin     np.nanargmin     Find index of minimum value
np.argmax     np.nanargmax     Find index of maximum value
np.median     np.nanmedian     Compute median of elements
np.percentile np.nanpercentile Compute rank-based statistics of elements
np.any        N/A              Evaluate whether any elements are true
np.all        N/A              Evaluate whether all elements are true

We will see these aggregates often throughout the rest of the book.
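
As a quick check of a few entries in the table, here is a small sketch of my own. The array is made up and only serves to show what the index-returning and boolean aggregates give back.

```python
import numpy as np

# Made-up values for illustration only.
x = np.array([3, 1, 4, 1, 5, 9, 2, 6])

idx_min = np.argmin(x)      # index of the first occurrence of the minimum
idx_max = np.argmax(x)      # index of the maximum
has_big = np.any(x > 8)     # is at least one element greater than 8?
all_pos = np.all(x > 0)     # are all elements positive?
med = np.median(x)          # median of the elements
p50 = np.percentile(x, 50)  # the 50th percentile, which equals the median

print(idx_min, idx_max, has_big, all_pos, med, p50)
```

Note that np.argmin and np.argmax return positions in the array, not the values themselves.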

Example: What is the Average Height of US Presidents?
Aggregates available in NumPy can be extremely useful for summarizing a set of values. As a simple example, let’s consider the heights of all US presidents. This data is available in the file president_heights.csv, which is a simple comma-separated list of labels and values:

Note to self: the president_heights.csv file can be found here; it needs to go in the data subdirectory.
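
A minimal sketch of how such a file could be read into a NumPy array with Pandas. To keep it self-contained, it reads from an in-memory string instead of the real file. The rows and the order and name columns are placeholders I made up; only the height(cm) column name matches the one used in the script further down.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for data/president_heights.csv.
# The rows below are placeholders, NOT the real data.
csv_text = """order,name,height(cm)
1,Person A,170
2,Person B,180
3,Person C,190
"""

data = pd.read_csv(io.StringIO(csv_text))
heights = np.array(data['height(cm)'])  # extract one column as a 1-D array

print(heights)
```

With the real file in place, replacing io.StringIO(csv_text) with the path 'data/president_heights.csv' should give the actual heights.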


Now that we have this data array, we can compute a variety of summary statistics:
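
The original output did not survive here, so this is only a sketch of what those statistics could look like in code. The summarize helper is a name I made up, and the sample values are placeholders, not the presidents' heights.

```python
import numpy as np

def summarize(heights):
    """Return basic summary statistics for a 1-D array of heights."""
    heights = np.asarray(heights, dtype=float)
    return {
        "mean": heights.mean(),
        "std": heights.std(),  # population standard deviation (ddof=0)
        "min": heights.min(),
        "max": heights.max(),
    }

# Placeholder values, NOT the real presidential heights.
sample = np.array([170, 180, 190])
stats = summarize(sample)

for name, value in stats.items():
    print(name, value)
```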


Note that in each case, the aggregation operation reduced the entire array to a single summarizing value, which gives us information about the distribution of values. We may also wish to compute quantiles:
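
A sketch of the quantile computation, again with placeholder values rather than the real data. It assumes NumPy's default (linear interpolation) percentile method.

```python
import numpy as np

# Placeholder values, NOT the real presidential heights.
sample = np.array([160, 170, 180, 190, 200])

q25 = np.percentile(sample, 25)  # 25th percentile
med = np.median(sample)          # median (50th percentile)
q75 = np.percentile(sample, 75)  # 75th percentile

print(q25, med, q75)
```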


We see that the median height of US presidents is 182 cm, or just shy of six feet.

Of course, sometimes it’s more useful to see a visual representation of this data, which we can accomplish using tools in Matplotlib (we’ll discuss Matplotlib more fully in Chapter 4 [coming soon]).

I haven’t used Matplotlib yet, so…


there are also some things that could not be found


and in the end


I try to configure Matplotlib following the instructions here.

But there are still problems; I need to find a way to set up the integration for inline plots.
For now it can be done outside IPython; here is the file

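import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn; seaborn.set()  # set plot style

# Load the heights column into a NumPy array
data = pd.read_csv('data/president_heights.csv')
heights = np.array(data['height(cm)'])

# Draw the histogram of the heights
plt.hist(heights)
plt.title('Height Distribution of US Presidents')
plt.xlabel('height (cm)')
plt.ylabel('number')
plt.show()  # needed to display the figure when run outside IPython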
which, when run with the python3 command, produces the desired plot:


