Category Archives: Python

You have to be precise

Jake VanderPlas rockz! 🚀 And just recently he gave yet another demonstration of it, with this tweet.

As everyone knows, I take a while to understand things, and then there's the risk that I forget them, so I decided to write the whole story down here, in this post.

First of all, installing Anaconda (again thanks to|because of Jake (and his NumPy course)) changed the commands; here they are:

i0

This means redefining the shebang whenever a script complains; that's evilution for you 👿

OK, I'm ready to repeat Jake's test:

i1

There it is 💥 error! 😡

Twitter forces brevity, sometimes too much. And Jake takes it for granted that his followers are as smart as he is. The commenters too, all of them. They leave me on my own 😯

But, come on, I had already heard this recommendation: import *, not even once; or, to put it in my own words, never import *.
Let me check:

i2

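There you go, just as Jake says: import *. Not even once.
That's because sum gets redefined by a NumPy module, with different arguments: the built-in sum(iterable, start) is replaced by NumPy's sum, whose second argument is the axis.
OK, computer science and the world are safe, and I can think about something else 🎶 🎵 😄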
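Since the tweet and my test are only screenshots, here is a minimal sketch of the trap, assuming a standard CPython with NumPy installed. Rather than star-importing into the real module, it runs `from numpy import *` inside a throwaway namespace so the clash can be inspected safely.

```python
import builtins

import numpy as np

# Built-in sum(iterable, start): the second argument is a starting value.
builtin_result = builtins.sum(range(5), -1)   # -1 + 0 + 1 + 2 + 3 + 4 = 9

# NumPy's sum(a, axis): the second argument is an axis, -1 meaning "last axis".
numpy_result = np.sum(range(5), -1)           # 0 + 1 + 2 + 3 + 4 = 10

# Simulate `from numpy import *` in a fresh namespace and check what `sum` became.
namespace = {}
exec("from numpy import *", namespace)
sum_was_shadowed = namespace["sum"] is np.sum

print(builtin_result, numpy_result, sum_was_shadowed)
```

Same call, same arguments, different answer: that's exactly why a star import that silently replaces a built-in is dangerous.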
:mrgreen:

NumPy – 13 – The basics of NumPy arrays – 2

Continuing from the previous post, still copying from here.

Slicing arrays: accessing subarrays
Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this:

x[start:stop:step]

np77
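The screenshot isn't reproduced as text, so here is a minimal sketch of the same idea, using `x = np.arange(10)` as my own example data. Any field left out falls back to its default: start=0, stop=the size of the dimension, step=1.

```python
import numpy as np

x = np.arange(10)        # [0 1 2 3 4 5 6 7 8 9]

first_five = x[:5]       # start defaults to 0
after_five = x[5:]       # stop defaults to len(x)
middle = x[4:7]          # indices 4, 5, 6 (stop is excluded)
every_other = x[::2]     # step of 2 from the beginning
odd_indices = x[1::2]    # step of 2, starting at index 1

print(first_five, after_five, middle, every_other, odd_indices)
```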

A potentially confusing case is when the step value is negative. In this case, the defaults for start and stop are swapped. This becomes a convenient way to reverse an array:

np78
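A minimal sketch of the negative-step case, again with `np.arange(10)` as example data: with a negative step the slice walks backwards, so the defaults for start and stop effectively swap ends.

```python
import numpy as np

x = np.arange(10)

reversed_x = x[::-1]       # all elements, from the last back to the first
from_five_back = x[5::-2]  # start at index 5 and step back by 2: 5, 3, 1

print(reversed_x, from_five_back)
```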

Multidimensional subarrays
Multi-dimensional slices work in the same way, with multiple slices separated by commas.

np79

Finally, subarray dimensions can even be reversed together:

np80
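A sketch of two-dimensional slicing, including reversing both dimensions at once. The 3×4 array below is example data I picked, not necessarily what the screenshots show; each comma-separated slice applies to one axis.

```python
import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])      # example data: 3 rows x 4 columns

two_rows_three_cols = x2[:2, :3]   # first two rows, first three columns
every_other_col = x2[:3, ::2]      # all three rows, columns 0 and 2
both_reversed = x2[::-1, ::-1]     # rows and columns reversed together

print(two_rows_three_cols, every_other_col, both_reversed, sep="\n\n")
```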

Accessing array rows and columns
One commonly needed routine is accessing single rows or columns of an array. This can be done by combining indexing and slicing, using an empty slice marked by a single colon (:):

np81

In the case of row access, the empty slice can be omitted for a more compact syntax:

np82
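A minimal sketch of row and column access on an example 3×4 array (my own data). Note that the short form only works for rows: `x2[0]` always means "row 0", so a column still needs the explicit `:`.

```python
import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])   # example data

first_column = x2[:, 0]   # every row, column 0
first_row = x2[0, :]      # row 0, every column
first_row_short = x2[0]   # same as x2[0, :]: the trailing empty slice can be dropped

print(first_column, first_row, first_row_short)
```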

Subarrays as no-copy views
One important –and extremely useful– thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies. Consider our two-dimensional array from before:

np83

Now if we modify this subarray, we’ll see that the original array is changed! Observe:

np84

This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
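A minimal sketch of the view behaviour, with example data of my own: writing through a NumPy slice changes the original array, while the same move on a Python list leaves the list alone. `np.shares_memory` is used just to confirm that the two arrays point at the same buffer.

```python
import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

x2_sub = x2[:2, :2]        # 2x2 view of the top-left corner, no data copied
x2_sub[0, 0] = 99          # writing through the view...
shares = np.shares_memory(x2, x2_sub)   # ...touches the same memory as x2

# Python lists behave differently: a slice is a copy.
lst = [12, 5, 2, 4]
lst_slice = lst[:2]
lst_slice[0] = 99          # lst itself is unaffected

print(x2[0, 0], shares, lst[0])
```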

Creating copies of arrays
Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the copy() method:

np85

If we now modify this subarray, the original array is not touched:

np86
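The same experiment with `copy()`, as a sketch on example data: the copy owns its own buffer, so modifying it leaves the original untouched.

```python
import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

x2_sub_copy = x2[:2, :2].copy()   # independent 2x2 array with its own data
x2_sub_copy[0, 0] = 42            # only the copy changes

print(x2_sub_copy)
print(x2)
```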

To be continued 😀
:mrgreen:

NumPy – 12 – The basics of NumPy arrays – 1

Copying from here, continuing from here.

Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas (Chapter 3 [coming soon]) are built around the NumPy array. This section will present several examples of using NumPy array manipulation to access data and subarrays, and to split, reshape, and join the arrays. While the types of operations shown here may seem a bit dry and pedantic, they comprise the building blocks of many other examples used throughout the book. Get to know them well!

We’ll cover a few categories of basic array manipulations here:

  • Attributes of arrays: Determining the size, shape, memory consumption, and data types of arrays
  • Indexing of arrays: Getting and setting the value of individual array elements
  • Slicing of arrays: Getting and setting smaller subarrays within a larger array
  • Reshaping of arrays: Changing the shape of a given array
  • Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many

NumPy array attributes
First let’s discuss some useful array attributes. We’ll start by defining three random arrays, a one-dimensional, two-dimensional, and three-dimensional array. We’ll use NumPy’s random number generator, which we will seed with a set value in order to ensure that the same random arrays are generated each time this code is run:

np68

Each array has attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total size of the array):

np69
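A sketch of that setup as text: three random integer arrays of increasing dimension, with a fixed seed so they come out the same on every run. This uses the legacy `np.random.seed` / `np.random.randint` interface; newer code often prefers `np.random.default_rng`.

```python
import numpy as np

np.random.seed(0)  # fixed seed: the same "random" arrays on every run

x1 = np.random.randint(10, size=6)          # one-dimensional array
x2 = np.random.randint(10, size=(3, 4))     # two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # three-dimensional array

print("x3 ndim: ", x3.ndim)   # number of dimensions
print("x3 shape:", x3.shape)  # size of each dimension
print("x3 size: ", x3.size)   # total number of elements
```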

Another useful attribute is the dtype, the data type of the array (which we discussed previously [previous post]):

np70

Other attributes include itemsize, which lists the size (in bytes) of each array element, and nbytes, which lists the total size (in bytes) of the array:

np71
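A minimal sketch of the three attributes. The default integer dtype of `randint` depends on the platform, so the exact byte counts can differ between machines; the relation `nbytes == itemsize * size` always holds. Converting to an explicit `int16` (my choice, for illustration) makes the sizes predictable.

```python
import numpy as np

x3 = np.random.randint(10, size=(3, 4, 5))

print("dtype:   ", x3.dtype)              # platform-dependent default integer type
print("itemsize:", x3.itemsize, "bytes")  # bytes per element
print("nbytes:  ", x3.nbytes, "bytes")    # bytes for the whole array

x3_int16 = x3.astype(np.int16)            # 2 bytes per element, 60 elements
print("int16 nbytes:", x3_int16.nbytes)
```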

Array indexing: accessing single elements
If you are familiar with Python’s standard list indexing, indexing in NumPy will feel quite familiar. In a one-dimensional array, the ith value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:

np72

To index from the end of the array, you can use negative indices:

np73
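A sketch of one-dimensional indexing, with positive and negative indices, on an example array of my own.

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])   # example data

first = x1[0]         # counting starts at zero
fifth = x1[4]
last = x1[-1]         # negative indices count from the end
second_last = x1[-2]

print(first, fifth, last, second_last)
```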

In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:

np74

Values can also be modified using any of the above index notation:

np75
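A sketch of multi-dimensional indexing with a comma-separated (row, column) tuple, and of assignment with the same notation; the 3×4 array is example data.

```python
import numpy as np

x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])   # example data

top_left = x2[0, 0]       # row 0, column 0
bottom_left = x2[2, 0]    # row 2, column 0
bottom_right = x2[2, -1]  # row 2, last column

x2[0, 0] = 12             # the same index notation works for assignment

print(top_left, bottom_left, bottom_right)
print(x2)
```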

Keep in mind that, unlike Python lists, NumPy arrays have a fixed type. This means, for example, that if you attempt to insert a floating-point value to an integer array, the value will be silently truncated. Don’t be caught unaware by this behavior!

np76
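A sketch of the silent truncation, with example data: the fractional part is simply dropped (2.9 becomes 2, not 3), and the array stays integer. One way to avoid the surprise, shown at the end, is to convert to a float array before assigning.

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])   # integer array
x1[0] = 3.14159                     # stored as 3
x1[1] = 2.9                         # stored as 2: truncated, not rounded
still_integer = np.issubdtype(x1.dtype, np.integer)

# Converting to float first keeps the fractional part
x1_float = x1.astype(float)
x1_float[0] = 3.14159

print(x1, still_integer, x1_float[0])
```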

There's still a long way to go; time for a break 😉

:mrgreen:

NumPy – 11 – Understanding data types in Python

Continuing from here, I'm copying from here.

A theoretical post; I suspect that, to make any sense of it when I come back to it, I'd better copy everything (or at least a good chunk) 😙

Effective data-driven science and computation requires understanding how data is stored and manipulated. This section outlines and contrasts how arrays of data are handled in the Python language itself, and how NumPy improves on this. Understanding this difference is fundamental to understanding much of the material throughout the rest of the book.

Users of Python are often drawn-in by its ease of use, one piece of which is dynamic typing. While a statically-typed language like C or Java requires each variable to be explicitly declared, a dynamically-typed language like Python skips this specification. For example, in C you might specify a particular operation as follows:

/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}

While in Python the equivalent operation could be written this way:

# Python code
result = 0
for i in range(100):
    result += i

Notice the main difference: in C, the data types of each variable are explicitly declared, while in Python the types are dynamically inferred. This means, for example, that we can assign any kind of data to any variable:

# Python code
x = 4
x = "four"

Here we’ve switched the contents of x from an integer to a string. The same thing in C would lead (depending on compiler settings) to a compilation error or other unintended consequences:

/* C code */
int x = 4;
x = "four";  // FAILS

This sort of flexibility is one piece that makes Python and other dynamically-typed languages convenient and easy to use. Understanding how this works is an important piece of learning to analyze data efficiently and effectively with Python. But what this type-flexibility also points to is the fact that Python variables are more than just their value; they also contain extra information about the type of the value. We’ll explore this more in the sections that follow.

A Python integer is more than just an integer
The standard Python implementation is written in C. This means that every Python object is simply a cleverly-disguised C structure, which contains not only its value, but other information as well. For example, when we define an integer in Python, such as x = 10000, x is not just a “raw” integer. It’s actually a pointer to a compound C structure, which contains several values. Looking through the Python 3.4 source code, we find that the integer (long) type definition effectively looks like this (once the C macros are expanded):

struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};

A single integer in Python 3.4 actually contains four pieces:

  • ob_refcnt, a reference count that helps Python silently handle memory allocation and deallocation
  • ob_type, which encodes the type of the variable
  • ob_size, which specifies the size of the following data members
  • ob_digit, which contains the actual integer value that we expect the Python variable to represent.

This means that there is some overhead in storing an integer in Python as compared to an integer in a compiled language like C, as illustrated in the following figure:

np54

Here PyObject_HEAD is the part of the structure containing the reference count, type code, and other pieces mentioned before.

Notice the difference here: a C integer is essentially a label for a position in memory whose bytes encode an integer value. A Python integer is a pointer to a position in memory containing all the Python object information, including the bytes that contain the integer value. This extra information in the Python integer structure is what allows Python to be coded so freely and dynamically. All this additional information in Python types comes at a cost, however, which becomes especially apparent in structures that combine many of these objects.
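The overhead can be measured directly. A minimal sketch: `sys.getsizeof` reports the size of the whole Python integer object, header included (typically 28 bytes for 10000 on a 64-bit CPython, though the exact figure depends on the interpreter), while a raw 64-bit integer is just 8 bytes.

```python
import sys

import numpy as np

x = 10000
python_int_bytes = sys.getsizeof(x)            # full object: refcount, type pointer, size, digits
raw_int64_bytes = np.dtype(np.int64).itemsize  # just the value's bytes

print(python_int_bytes, raw_int64_bytes)
```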

A Python list is more than just a list
Let’s consider now what happens when we use a Python data structure that holds many Python objects. The standard mutable multi-element container in Python is the list. We can create a list of integers as follows:

np55

Because of Python’s dynamic typing, we can even create heterogeneous lists:

np56

But this flexibility comes at a cost: to allow these flexible types, each item in the list must contain its own type info, reference count, and other information –that is, each item is a complete Python object. In the special case that all variables are of the same type, much of this information is redundant: it can be much more efficient to store data in a fixed-type array. The difference between a dynamic-type list and a fixed-type (NumPy-style) array is illustrated in the following figure:

np57

At the implementation level, the array essentially contains a single pointer to one contiguous block of data. The Python list, on the other hand, contains a pointer to a block of pointers, each of which in turn points to a full Python object like the Python integer we saw earlier. Again, the advantage of the list is flexibility: because each list element is a full structure containing both data and type information, the list can be filled with data of any desired type. Fixed-type NumPy-style arrays lack this flexibility, but are much more efficient for storing and manipulating data.
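A rough sketch of the contrast, with example data of my own: each list element is a complete Python object (which is why a list can mix types), while the NumPy array keeps its values in one contiguous, fixed-type buffer with no per-element headers. The byte comparison counts only the element objects, not the list's own array of pointers, so it actually understates the list's cost.

```python
import sys

import numpy as np

L = list(range(10))
L3 = [True, "2", 3.0, 4]                            # heterogeneous list
element_types = [type(item).__name__ for item in L3]

A = np.array(L)                                     # one contiguous, fixed-type buffer
is_contiguous = bool(A.flags["C_CONTIGUOUS"])
array_data_bytes = A.nbytes                         # size * itemsize, nothing else

# Each list element is a full Python object with its own header
list_object_bytes = sum(sys.getsizeof(item) for item in L)

print(element_types, is_contiguous, array_data_bytes, list_object_bytes)
```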

Fixed-type arrays in Python
Python offers several different options for storing data in efficient, fixed-type data buffers. The built-in array module (part of the standard library) can be used to create dense arrays of a uniform type:

np58

Here 'i' is a type code indicating the contents are integers.
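A minimal sketch of the standard-library `array` module: the type code is fixed at creation, so trying to append a float to an `'i'` array raises `TypeError` instead of silently converting it.

```python
import array

A = array.array('i', range(10))   # 'i' = C signed int
as_list = A.tolist()

try:
    A.append(1.5)                 # a float does not fit an 'i' array
except TypeError:
    rejected_float = True
else:
    rejected_float = False

print(A, rejected_float)
```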
Much more useful, however, is the ndarray object of the NumPy package. While Python’s array object provides efficient storage of array-based data, NumPy adds to this efficient operations on that data. We will explore these operations in later sections; here we’ll demonstrate several ways of creating a NumPy array.
We’ll start with the standard NumPy import, under the alias np:

np59

Creating arrays from Python lists
First, we can use np.array to create arrays from Python lists:

np60

Remember that unlike Python lists, NumPy is constrained to arrays that all contain the same type. If types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):

np61
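A sketch of both cases, with example lists: an all-integer list gives an integer array, while a single float in the list upcasts every element to floating point.

```python
import numpy as np

ints = np.array([1, 4, 2, 5, 3])     # all integers: integer array
mixed = np.array([3.14, 4, 2, 3])    # one float: everything becomes float64

print(ints.dtype, mixed.dtype, mixed)
```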

If we want to explicitly set the data type of the resulting array, we can use the dtype keyword:

np62

Finally, unlike Python lists, NumPy arrays can explicitly be multi-dimensional; here’s one way of initializing a multidimensional array using a list of lists:

np63

The inner lists are treated as rows of the resulting two-dimensional array.
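A sketch of the `dtype` keyword and of building a 2-D array from nested sequences; the values are examples. Each inner `range` becomes one row.

```python
import numpy as np

as_float32 = np.array([1, 2, 3, 4], dtype='float32')    # force the element type
grid = np.array([range(i, i + 3) for i in [2, 4, 6]])   # inner sequences become rows

print(as_float32, as_float32.dtype)
print(grid)
```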

Creating arrays from scratch
Especially for larger arrays, it is more efficient to create arrays from scratch using routines built into NumPy. Here are several examples:

np64

and, far from finished 😉

np65

and more

np66

and finally

np67
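The four screenshots aren't reproduced as text; this is a sketch of common creation routines of that kind (the selection is mine, not necessarily the exact set in the images). For the random arrays it uses the newer `np.random.default_rng` generator rather than the legacy functions.

```python
import numpy as np

zeros = np.zeros(10, dtype=int)          # length-10 integer array of zeros
ones = np.ones((3, 5), dtype=float)      # 3x5 float array of ones
filled = np.full((3, 5), 3.14)           # 3x5 array filled with 3.14
evens = np.arange(0, 20, 2)              # like range(): 0, 2, ..., 18
fractions = np.linspace(0, 1, 5)         # 5 evenly spaced values, endpoints included
identity = np.eye(3)                     # 3x3 identity matrix
uninitialized = np.empty(3)              # contents are whatever was in memory

rng = np.random.default_rng(0)           # seeded generator for reproducibility
uniform = rng.random((3, 3))             # uniform values in [0, 1)
normal = rng.normal(0, 1, (3, 3))        # mean 0, standard deviation 1
integers = rng.integers(0, 10, (3, 3))   # integers in [0, 10)

print(evens, fractions)
```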

Standard NumPy data types
NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations. Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

The standard NumPy data types are listed in the following table. Note that when constructing an array, they can be specified using a string:

np.zeros(10, dtype='int16')

Or using the associated NumPy object:

np.zeros(10, dtype=np.int16)
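A quick check that the two spellings produce exactly the same dtype:

```python
import numpy as np

a = np.zeros(10, dtype='int16')    # dtype given as a string
b = np.zeros(10, dtype=np.int16)   # dtype given as a NumPy type object
same_dtype = a.dtype == b.dtype

print(a.dtype, b.dtype, same_dtype)
```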

Data type  Description
bool_      Boolean (True or False) stored as a byte
int_       Default integer type (same as C long; normally either int64 or int32)
intc       Identical to C int (normally int32 or int64)
intp       Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8       Byte (-128 to 127)
int16      Integer (-32768 to 32767)
int32      Integer (-2147483648 to 2147483647)
int64      Integer (-9223372036854775808 to 9223372036854775807)
uint8      Unsigned integer (0 to 255)
uint16     Unsigned integer (0 to 65535)
uint32     Unsigned integer (0 to 4294967295)
uint64     Unsigned integer (0 to 18446744073709551615)
float_     Shorthand for float64.
float16    Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32    Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64    Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_   Shorthand for complex128.
complex64  Complex number, represented by two 32-bit floats
complex128 Complex number, represented by two 64-bit floats
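The ranges and bit layouts in the table don't have to be memorized: `np.iinfo` and `np.finfo` report them. A minimal sketch; note that recent NumPy releases (2.0 and later) removed the `float_` and `complex_` aliases, so the example uses the explicit names `float64` and `complex128`.

```python
import numpy as np

int8_info = np.iinfo(np.int8)
uint16_info = np.iinfo(np.uint16)
float16_info = np.finfo(np.float16)
float32_info = np.finfo(np.float32)
float64_info = np.finfo(np.float64)

print(int8_info.min, int8_info.max)            # integer range
print(uint16_info.max)
print(float16_info.nexp, float16_info.nmant)   # exponent bits, mantissa bits
print(float64_info.nexp, float64_info.nmant)
print(np.dtype(np.complex128).itemsize)        # two 64-bit floats
```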

More advanced type specification is possible, such as specifying big or little endian numbers; for more information, refer to the NumPy documentation. NumPy also supports compound data types, which will be covered [coming soon].

:mrgreen:

NumPy – 10 – Introducing NumPy

Continuing from here; today, finally, here.

This chapter […] outlines techniques for effectively loading, storing, and manipulating in-memory data in Python. The topic is very broad: datasets can come from a wide range of sources and a wide range of formats, including collections of documents, collections of images, collections of sound clips, collections of numerical measurements, or nearly anything else. Despite this apparent heterogeneity, it will help us to think of all data fundamentally as arrays of numbers.

For example, images, particularly digital images, can be thought of as simply two-dimensional arrays of numbers representing pixel brightness across the area. Sound clips can be thought of as one-dimensional arrays of intensity versus time. Text can be converted in various ways into numerical representations, perhaps binary digits representing the frequency of certain words or pairs of words. No matter what the data are, the first step in making them analyzable will be to transform them into arrays of numbers.

For this reason, efficient storage and manipulation of numerical arrays is absolutely fundamental to the process of doing data science. We’ll now take a look at the specialized tools that Python has for handling such numerical arrays: the NumPy package, and the Pandas package [coming soon].

This chapter will cover NumPy in detail. NumPy (short for Numerical Python) provides an efficient interface to store and operate on dense data buffers. In some ways, NumPy arrays are like Python’s built-in list type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size. NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python, so time spent learning to use NumPy effectively will be valuable no matter what aspect of data science interests you.

Install Anaconda, says Jake. Hmmm… who knows, maybe I already have NumPy…

np50

Uh! I need to install Anaconda 😊

…………………………………………………………………….
[imagine a long pause here: I'm installing]
…………………………………………………………………….

OK, done, following the instructions found here: Download Anaconda Now.

It messes with the environment a bit, but for NumPy I'd do that and more 😉 And now

np51

By convention, you’ll find that most people in the SciPy/PyData world will import NumPy using np as an alias:

np52

Throughout this chapter, and indeed the rest of the book, you’ll find that this is the way we will import and use NumPy.

A reminder about documentation
[D]on’t forget that IPython gives you the ability to quickly explore the contents of a package (by using the tab-completion feature), as well as the documentation of various functions using the ? character.

For example, to display all the contents of the numpy namespace, you can type this:

np53

Note: after the dot you press Tab, mind you!
And to display NumPy’s built-in documentation, you can use np?; more detailed information is available here.
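Outside IPython, plain Python offers rough stand-ins for Tab completion and `?`. A minimal sketch: `dir()` lists a namespace's names and `__doc__` holds the docstring that IPython shows (`help(np.linspace)` prints it in a pager). Checking `np.__version__` is also a quick way to see which NumPy you are actually importing, handy after an Anaconda install.

```python
import numpy as np

print(np.__version__)   # which NumPy is being imported

# Rough plain-Python stand-in for typing np.lin<TAB> in IPython
lin_names = [name for name in dir(np) if name.startswith("lin")]

# Rough stand-in for np.linspace? : the docstring IPython would display
doc_lines = (np.linspace.__doc__ or "").strip().splitlines()

print(lin_names)
print(doc_lines[:1])
```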

:mrgreen:

NumPy – 9 – More IPython resources

Continuing with IPython, today from here.

[W]e’ve just scratched the surface of using IPython to enable data science tasks. Much more information is available both in print and on the Web, and here we’ll list some other resources that you may find helpful.

Web resources
The IPython website links to documentation, examples, tutorials, and a variety of other resources.

The nbviewer website shows static renderings of any IPython notebook available on the internet.

A Gallery of Interesting IPython Notebooks: This ever-growing list of notebooks, powered by nbviewer, shows the depth and breadth of numerical analysis you can do with IPython. It includes everything from short examples and tutorials to full-blown courses and books composed in the notebook format!

And by searching around you can find videos and more.
Then there are books; Jake doesn't mention any free ones, but if you want the list, it's there.
And then there's the help, as described here.

😊 The IPython chapter is done; now we really get started with NumPy 😊

:mrgreen:

NumPy – 8 – Profiling and timing

Still on IPython: I'm copying from here, continuing from here.

You write code, probably not in the most efficient way; you know how it is. As Knuth says: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil”.

But once you have your code working, it can be useful to dig into its efficiency a bit.
IPython provides access to a wide array of functionality for this kind of timing and profiling of code. Here we’ll discuss the following IPython magic commands:

  • %time: Time the execution of a single statement
  • %timeit: Time repeated execution of a single statement for more accuracy
  • %prun: Run code with the profiler
  • %lprun: Run code with the line-by-line profiler
  • %memit: Measure the memory use of a single statement
  • %mprun: Run code with the line-by-line memory profiler

The last four commands are not bundled with IPython–you’ll need to get the line_profiler and memory_profiler extensions, which we will discuss in the following sections.

Timing code snippets: %timeit and %time

%timeit
np46
%time
np47
For %time as with %timeit, using the double-percent-sign cell magic syntax allows timing of multiline scripts:

np48

For more information on %time and %timeit, as well as their available options, use the IPython help functionality (i.e., type %time? at the IPython prompt).
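Outside IPython, the standard library covers the same ground. A rough sketch, not what the magics do internally: `timeit.repeat` runs a statement in batches and repeats the batches, and here I keep the fastest batch, as the `timeit` documentation suggests (IPython's own `%timeit` reports a mean and standard deviation instead). For a one-off measurement like `%time`, `time.perf_counter` around the code is enough.

```python
import time
import timeit

# Roughly like %timeit: many runs in batches, batches repeated
loops = 10_000
batch_times = timeit.repeat("sum(range(100))", repeat=5, number=loops)
per_loop = min(batch_times) / loops

# Roughly like %time: time a single execution
start = time.perf_counter()
total = sum(x * x for x in range(100_000))
elapsed = time.perf_counter() - start

print(f"{per_loop * 1e6:.2f} µs per loop, single run took {elapsed:.4f} s")
```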

Profiling full scripts: %prun
Python contains a built-in code profiler (which you can read about in the Python documentation), but IPython offers a much more convenient way to use this profiler, in the form of the magic function %prun.

np49

The result is a table that indicates, in order of total time on each function call, where the execution is spending the most time. In this case, the bulk of execution time is in the list comprehension inside sum_of_lists. From here, we could start thinking about what changes we might make to improve the performance in the algorithm.
For more information on %prun, as well as its available options, use the IPython help functionality (i.e., type %prun? at the IPython prompt).
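`%prun` wraps Python's built-in profiler, which can also be driven directly with `cProfile` and `pstats`. A minimal sketch: the body of `sum_of_lists` below is my assumption of a workload matching the description in the text (a list comprehension inside a loop), not necessarily the exact function in the screenshot.

```python
import cProfile
import io
import pstats

def sum_of_lists(N):
    """Assumed workload: builds five lists with a comprehension and sums them."""
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

profiler = cProfile.Profile()
result = profiler.runcall(sum_of_lists, 100_000)   # profile a single call

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(5)        # top entries by time spent inside each function
report = stream.getvalue()
print(report)
```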

You can profile individual lines with %lprun. It requires installing line_profiler, which I'm not going to do.
Memory use is profiled with %memit and %mprun, after installing memory_profiler.

These are all specialized operations, to be explored further if I ever need them; for now I'm putting them on my to-do list 😉

:mrgreen:

NumPy – 7 – Errors and debugging

Copying from here, continuing from here.

Code development and data analysis always require a bit of trial and error, and IPython contains tools to streamline this process. This section will briefly cover some options for controlling Python’s exception reporting, followed by exploring tools for debugging errors in code.

Controlling exceptions: %xmode
Most of the time when a Python script fails, it will raise an Exception. When the interpreter hits one of these exceptions, information about the cause of the error can be found in the traceback, which can be accessed from within Python. With the %xmode magic function, IPython allows you to control the amount of information printed when the exception is raised. Consider the following code:

np40

Using the %xmode magic function (short for Exception mode), we can change what information is printed.

%xmode takes a single argument, the mode, and there are three possibilities: Plain, Context, and Verbose. The default is Context, and gives output like that just shown before. Plain is more compact and gives less information:

np41

and

np42

This extra information can help narrow in on why the exception is being raised. So why not use the Verbose mode all the time? As code gets complicated, this kind of traceback can get extremely long. Depending on the context, sometimes the brevity of the default Context mode is easier to work with.
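Plain Python's `traceback` module can produce similar levels of detail. A rough sketch with two hypothetical functions, `func1` and `func2`, standing in for the screenshot's code: the full formatted traceback is close to what Context mode shows, the exception line alone is terser than Plain, and the innermost frame's local variables are the kind of detail Verbose adds.

```python
import traceback

def func1(a, b):          # hypothetical example function
    return a / b

def func2(x):             # hypothetical example function
    a = x
    b = x - 1
    return func1(a, b)

try:
    func2(1)
except ZeroDivisionError as exc:
    # Full traceback with the call chain: roughly what Context mode shows
    full_report = traceback.format_exc()
    # Only the exception line: even terser than Plain mode
    one_line = "".join(traceback.format_exception_only(type(exc), exc)).strip()
    # Walk to the innermost frame and grab its locals: the extra detail Verbose adds
    tb = exc.__traceback__
    while tb.tb_next is not None:
        tb = tb.tb_next
    innermost_locals = dict(tb.tb_frame.f_locals)

print(one_line)
print(innermost_locals)
```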

Debugging: when reading tracebacks is not enough
IPython's %debug magic command is perhaps the most convenient interface to debugging. If you call it after hitting an exception, it will automatically open an interactive debugging prompt at the point of the exception. The ipdb prompt lets you explore the current state of the stack, explore the available variables, and even run Python commands!

np43

And this is only the beginning; it lets you go further:

np44

This allows you to quickly find out not only what caused the error, but what function calls led up to the error.
If you’d like the debugger to launch automatically whenever an exception is raised, you can use the %pdb magic function to turn on this automatic behavior:

np45

Finally, if you have a script that you’d like to run from the beginning in interactive mode, you can run it with the command %run -d, and use the next command to step through the lines of code interactively.
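Outside IPython, the standard `pdb` module offers a comparable after-the-fact entry point, `pdb.post_mortem`, which takes a saved traceback. A sketch with hypothetical functions: since a debugger prompt can't run unattended, the call is left as a comment, and the code only extracts the chain of calls that led to the error.

```python
import traceback

def func1(a, b):          # hypothetical example function
    return a / b

def func2(x):             # hypothetical example function
    return func1(x, x - 1)

try:
    func2(1)
except ZeroDivisionError as exc:
    saved_tb = exc.__traceback__

# The chain of calls that led to the error, outermost first
call_chain = [frame.name for frame in traceback.extract_tb(saved_tb)]
print(call_chain)

# In an interactive session, this opens a debugger at the failing frame,
# much like %debug does:
#     import pdb; pdb.post_mortem(saved_tb)
```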

(Partial) list of debugging commands
There are many more available commands for interactive debugging than we’ve listed here; the following table contains a description of some of the more common and useful ones:

Command     Description
list        Show the current location in the file
h(elp)      Show a list of commands, or find help on a specific command
q(uit)      Quit the debugger and the program
c(ontinue)  Quit the debugger, continue in the program
n(ext)      Go to the next step of the program
<enter>     Repeat the previous command
p(rint)     Print variables
s(tep)      Step into a subroutine
r(eturn)    Return out of a subroutine

For more information, use the help command in the debugger, or take a look at ipdb’s online documentation.

I could start reminiscing about my battles with debuggers, but then I'd become boring 😡 Life is beautiful instead, come on 😄

:mrgreen:

NumPy – 6 – IPython and shell commands

Continuing from here, copying from here.

When working interactively with the standard Python interpreter, one of the frustrations is the need to switch between multiple windows to access Python tools and system command-line tools. He means regular users; I, on the other hand, always have at least two terminals open. Have I ever told you about the days when terminals were text-only, no windows, so you used & (and ph, for phantom, on the Pr1me)? Awkward, but good times; I was young. Anyway… IPython bridges this gap, and gives you a syntax for executing shell commands directly from within the IPython terminal. The magic happens with the exclamation point: anything appearing after ! on a line will be executed not by the Python kernel, but by the system command-line.

Quick introduction to the shell
I think I'll skip this one; nothing new for me, quite the opposite… as I was saying 😊

Shell commands in IPython
Shell commands can not only be called from IPython, but can also be made to interact with the IPython namespace. For example, you can save the output of any shell command to a Python list using the assignment operator:

np34

Note that these results are not returned as lists, but as a special shell return type defined in IPython:

np35

It looks like a list, but it has extra functionality, as you can discover in IPython's help.

Communication in the other direction–passing Python variables into the shell–is possible using the {varname} syntax:

np36

The curly braces contain the variable name, which is replaced by the variable’s contents in the shell command.
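In a plain Python script, `subprocess` plays the role of `!`. A rough sketch, assuming a Unix-like system where `echo` is an executable: capturing a command's output as a list of lines is the counterpart of `contents = !cmd`, and passing a Python value as an argument replaces the `{varname}` substitution (with no shell quoting to worry about).

```python
import subprocess

# Rough counterpart of `contents = !echo hello`: output captured as a list of lines
contents = subprocess.run(["echo", "hello"], capture_output=True,
                          text=True, check=True).stdout.splitlines()

# Rough counterpart of `!echo {message}`: the Python value goes in as an argument
message = "hello from python"
echoed = subprocess.run(["echo", message], capture_output=True,
                        text=True, check=True).stdout.strip()

print(contents, echoed)
```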

Shell-related magic commands
You can't use !cd, because the commands are executed in a subshell. But if you really want to, there's %cd:

np37

In fact, by default you can even use this without the % sign:

np38

This is known as an automagic function, and this behavior can be toggled with the %automagic magic function.

Besides %cd, other available shell-like magic functions are %cat, %cp, %env, %ls, %man, %mkdir, %more, %mv, %pwd, %rm, and %rmdir, any of which can be used without the % sign if automagic is on. This makes it so that you can almost treat the IPython prompt as if it’s a normal shell: This access to the shell from within the same terminal window as your Python session means that there is a lot less switching back and forth between interpreter and shell as you write your Python code.
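Why `!cd` has no lasting effect can be shown in plain Python. A minimal sketch: a `cd` run in a child shell changes only that short-lived process, leaving the Python process where it was, while `os.chdir` (the kind of thing `%cd` does for you) changes the working directory of the Python process itself. A temporary directory is used so nothing on disk is touched.

```python
import os
import subprocess
import tempfile

start = os.getcwd()

# Like `!cd /`: the cd happens in a child shell, which then exits
subprocess.run("cd /", shell=True, check=True)
after_subshell = os.getcwd()          # unchanged

# Like `%cd some_dir`: change the working directory of this process
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    moved = os.path.realpath(os.getcwd()) == os.path.realpath(tmp)
    os.chdir(start)                   # go back before the directory is deleted

print(after_subshell == start, moved)
```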

I wonder whether it also works for aliases? No, not quite the way I'd like: it doesn't see the ones I defined myself. And anyway, come on, it's enough to keep another terminal open; I always have at least two, not counting the tilda one 😊

np39

Uh! It works with shell scripts 😄

:mrgreen:

NumPy – 5 – IPython input, output and history

Continuing from here, getting some practice with the IPython REPL and copying from here. I still haven't figured out how to tell it that I want Python 3 and not 2.x, but I don't think that's up to me. Time will fix everything, I hope 😊

We've seen that previous commands can be reached with the arrow keys or with C-p and C-n (from now on I'll use the Emacs abbreviations), but there are other ways.

IPython's In and Out objects
Previous commands are stored in In and their outputs in Out, for example:

np28

Yes, version 2.7; the same goes for Out:

np29

Note that not all operations have outputs: for example, import statements and print statements don’t affect the output. The latter may be surprising, but makes sense if you consider that print is a function that returns None; for brevity, any command that returns None is not added to Out.
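The standard Python REPL applies the same rule to its `_` variable, through `sys.displayhook`. A minimal sketch that calls the hook by hand, the way the REPL does with the value of each expression: a non-None value is echoed and stored in `builtins._`, while None (what `print` returns) is skipped and `_` keeps its previous value.

```python
import builtins
import sys

returned = print("hello")   # print writes to the screen but returns None

sys.displayhook(42)         # echoes 42 and stores it in builtins._
sys.displayhook(returned)   # None: nothing echoed, _ is left untouched
last_output = builtins._

print(returned, last_output)
```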

This can come in handy, for example:

np30

Underscore shortcuts and previous outputs
The native Python REPL only lets you recall the last output with _; IPython goes further: __ and ___ give the second- and third-to-last outputs, and _N is shorthand for Out[N]:

np31

Suppressing output
Sometimes you might wish to suppress the output of a statement (this is perhaps most common with the plotting commands that we’ll explore in Introduction to Matplotlib). Or maybe the command you’re executing produces a result that you’d prefer not to store in your output history, perhaps so that it can be deallocated when other references are removed. The easiest way to suppress the output of a command is to add a semicolon to the end of the line:

np32

It got offended; I didn't mean it. Still friends? 🌷 😊

Related magic commands

np33

Note that what was said above about outputs that aren't stored (print returning None) applies only to Python 3.x, where print is a function.

Other similar magic commands are %rerun (which will re-execute some portion of the command history) and %save (which saves some set of the command history to a file).

OK 😊 but when do we get to the sexy stuff? 😀

:mrgreen: