NumPy – 34 – manipolare dati con Pandas

Continuo da qui iniziando a copiare da un capitolo nuovo, qui.

Finora abbiamo visto gli oggetti di ndarray forniti da NumPy. Ma c’è il package Pandas: Pandas is a newer package built on top of NumPy, and provides an efficient implementation of a DataFrame. DataFrames are essentially multidimensional arrays with attached row and column labels, and often with heterogeneous types and/or missing data. As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.

As we saw, NumPy’s ndarray data structure provides essential features for the type of clean, well-organized data typically seen in numerical computing tasks. While it serves this purpose very well, its limitations become clear when we need more flexibility (e.g., attaching labels to data, working with missing data, etc.) and when attempting operations that do not map well to element-wise broadcasting (e.g., groupings, pivots, etc.), each of which is an important piece of analyzing the less structured data available in many forms in the world around us. Pandas, and in particular its Series and DataFrame objects, builds on the NumPy array structure and provides efficient access to these sorts of “data munging” tasks that occupy much of a data scientist’s time.

We will focus on the mechanics of using Series, DataFrame, and related structures effectively. We will use examples drawn from real datasets where appropriate, but these examples are not necessarily the focus.

Installare Pandas
Basta seguire le istruzioni della documentazione.
Once Pandas is installed, you can import it and check the version:

Just as we generally import NumPy under the alias np, we will import Pandas under the alias pd:

Nota sulla documentazione
IPython gives you the ability to quickly explore the contents of a package (by using the tab-completion feature) as well as the documentation of various functions (using the ? character). (Refer back to Help and Documentation in IPython if you need a refresher on this.)
For example, to display all the contents of the pandas namespace, you can type pd.<TAB>:

Questo è solo l’inizio, lunghissimissimo l’elenco.
And to display Pandas’s built-in documentation, you can use this:

More detailed documentation, along with tutorials and other resources, can be found here.

:mrgreen:

Advertisements
Post a comment or leave a trackback: Trackback URL.

Trackbacks

Rispondi

Inserisci i tuoi dati qui sotto o clicca su un'icona per effettuare l'accesso:

Logo WordPress.com

Stai commentando usando il tuo account WordPress.com. Chiudi sessione / Modifica )

Foto Twitter

Stai commentando usando il tuo account Twitter. Chiudi sessione / Modifica )

Foto di Facebook

Stai commentando usando il tuo account Facebook. Chiudi sessione / Modifica )

Google+ photo

Stai commentando usando il tuo account Google+. Chiudi sessione / Modifica )

Connessione a %s...

%d blogger hanno fatto clic su Mi Piace per questo: