NumPy – 61 – lavorare con Series temporali – 2

Continuo da qui, copio qui.

Series Pandas per indicizzare con il tempo
Where the Pandas time series tools really become useful is when you begin to index data by timestamps. For example, we can construct a Series object that has time indexed data:

Now that we have this data in a Series, we can make use of any of the Series indexing patterns we discussed in previous sections, passing values that can be coerced into dates:

There are additional special date-only indexing operations, such as passing a year to obtain a slice of all data from that year:

Later, we will see additional examples of the convenience of dates-as-indices. But first, a closer look at the available time series data structures.

Strutture Pandas per Series di tempo
This section will introduce the fundamental Pandas data structures for working with time series data:

  • For time stamps, Pandas provides the Timestamp type. As mentioned before, it is essentially a replacement for Python’s native datetime, but is based on the more efficient numpy.datetime64 data type. The associated Index structure is DatetimeIndex.
  • For time Periods, Pandas provides the Period type. This encodes a fixed-frequency interval based on numpy.datetime64. The associated index structure is PeriodIndex.
  • For time deltas or durations, Pandas provides the Timedelta type. Timedelta is a more efficient replacement for Python’s native datetime.timedelta type, and is based on numpy.timedelta64. The associated index structure is TimedeltaIndex.

The most fundamental of these date/time objects are the Timestamp and DatetimeIndex objects. While these class objects can be invoked directly, it is more common to use the pd.to_datetime() function, which can parse a wide variety of formats. Passing a single date to pd.to_datetime() yields a Timestamp; passing a series of dates by default yields a DatetimeIndex:

Nota: sono solo io o Jake non ha detto che ha importato datetime?

Any DatetimeIndex can be converted to a PeriodIndex with the to_period() function with the addition of a frequency code; here we’ll use ‘D’ to indicate daily frequency:

A TimedeltaIndex is created, for example, when a date is subtracted from another:

Sequenze regolari: pd.date_range()
To make the creation of regular date sequences more convenient, Pandas offers a few functions for this purpose: pd.date_range() for timestamps, pd.period_range() for periods, and pd.timedelta_range() for time deltas. We’ve seen that Python’s range() and NumPy’s np.arange() turn a startpoint, endpoint, and optional stepsize into a sequence. Similarly, pd.date_range() accepts a start date, an end date, and an optional frequency code to create a regular sequence of dates. By default, the frequency is one day:

Alternatively, the date range can be specified not with a start and endpoint, but with a startpoint and a number of periods:

The spacing can be modified by altering the freq argument, which defaults to D. For example, here we will construct a range of hourly timestamps:

To create regular sequences of Period or Timedelta values, the very similar pd.period_range() and pd.timedelta_range() functions are useful. Here are some monthly periods:

And a sequence of durations increasing by an hour:

All of these require an understanding of Pandas frequency codes, which we’ll summarize in the next section.


Post a comment or leave a trackback: Trackback URL.



Inserisci i tuoi dati qui sotto o clicca su un'icona per effettuare l'accesso:


Stai commentando usando il tuo account Chiudi sessione / Modifica )

Foto Twitter

Stai commentando usando il tuo account Twitter. Chiudi sessione / Modifica )

Foto di Facebook

Stai commentando usando il tuo account Facebook. Chiudi sessione / Modifica )

Google+ photo

Stai commentando usando il tuo account Google+. Chiudi sessione / Modifica )

Connessione a %s...

%d blogger hanno fatto clic su Mi Piace per questo: