Racket – Regular expressions – 6

Today, picking up where I left off, I am at [doc]/guide/regexp.html, or rather at [doc]/guide/Looking_Ahead_and_Behind.html, and maybe, who knows, I will finish the chapter 😀

Looking ahead and behind

You can have assertions in your pattern that look ahead or behind to ensure that a subpattern does or does not occur. These “look around” assertions are specified by putting the subpattern checked for in a cluster whose leading characters are: ?= (for positive lookahead), ?! (negative lookahead), ?<= (positive lookbehind), ?<! (negative lookbehind). Note that the subpattern in the assertion does not generate a match in the final result; it merely allows or disallows the rest of the match.

Looking ahead

Positive lookahead with ?= peeks ahead to ensure that its subpattern could match.

[image: re50]

The regexp #rx"grey(?=hound)" matches grey, but only if it is followed by hound. Thus, the first grey in the text string is not matched.
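To make the behaviour concrete, here is a minimal sketch of such a session. The sample sentence is an assumption of mine, chosen so that the first grey is followed by something other than hound.

```racket
#lang racket/base
;; Positive lookahead: match "grey" only when "hound" follows.
;; The sample sentence is an assumption, not taken from the post.
(define text "i left my grey socks at the greyhound")

;; The first "grey" (index 10) is followed by " socks", so it is skipped.
;; The match covers only "grey" inside "greyhound"; "hound" itself is
;; checked by the lookahead but is not part of the result.
(define hit (regexp-match-positions #rx"grey(?=hound)" text))
hit ; => '((28 . 32))
```

Note that the reported span is four characters long: the lookahead constrains the match without consuming anything.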

Negative lookahead with ?! peeks ahead to ensure that its subpattern could not possibly match.

[image: re51]

The regexp #rx"grey(?!hound)" matches grey, but only if it is not followed by hound. Thus the grey just before socks is matched.
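A small sketch of the negative case, again with a sample sentence that is my own assumption (it only needs a greyhound first and a grey followed by socks later):

```racket
#lang racket/base
;; Negative lookahead: match "grey" only when "hound" does NOT follow.
;; The sample sentence is an assumption, not taken from the post.
(define text "the gray greyhound ate the grey socks")

;; "gray" does not match at all; "grey" at index 9 is followed by "hound"
;; and is rejected; the "grey" before " socks" is accepted.
(define hit (regexp-match-positions #rx"grey(?!hound)" text))
hit ; => '((27 . 31))
```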

Looking behind

Positive lookbehind with ?<= checks that its subpattern could match immediately to the left of the current position in the text string.

[image: re52]

The regexp #rx"(?<=grey)hound" matches hound, but only if it is preceded by grey.
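As a sketch, with an assumed sample sentence containing one bare hound and one greyhound:

```racket
#lang racket/base
;; Positive lookbehind: match "hound" only when "grey" is immediately
;; to its left. The sample sentence is an assumption.
(define text "the hound in the picture is not a greyhound")

;; The bare "hound" at index 4 is preceded by "the ", so it is skipped;
;; the match is the "hound" inside "greyhound".
(define hit (regexp-match-positions #rx"(?<=grey)hound" text))
hit ; => '((38 . 43))
```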

Negative lookbehind with ?<! checks that its subpattern could not possibly match immediately to the left.

[image: re53]
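By symmetry with the previous case, a pattern such as #rx"(?<!grey)hound" should match hound only if it is not preceded by grey. A minimal sketch, with a sample sentence that is my assumption:

```racket
#lang racket/base
;; Negative lookbehind: match "hound" only when "grey" is NOT immediately
;; to its left. The sample sentence is an assumption.
(define text "the greyhound in the picture is not a hound")

;; The "hound" inside "greyhound" (index 8) is rejected because "grey"
;; precedes it; the standalone "hound" at the end is accepted.
(define hit (regexp-match-positions #rx"(?<!grey)hound" text))
hit ; => '((38 . 43))
```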

Lookaheads and lookbehinds can be convenient when they are not confusing.

OK, now to wrap up, a complete exercise, here: [doc]/guide/An_Extended_Example.html.

An extended example

Here’s an extended example from Friedl’s Mastering Regular Expressions, page 189, that covers many of the features described in this chapter. The problem is to fashion a regexp that will match any and only IP addresses or dotted quads: four numbers separated by three dots, with each number between 0 and 255.

First, we define a subregexp n0-255 that matches 0 through 255:

[image: re54]

Note that n0-255 lists prefixes as preferred alternates, which is something we cautioned against in Alternation. However, since we intend to anchor this subregexp explicitly to force an overall match, the order of the alternates does not matter.

The first two alternates simply get all single- and double-digit numbers. Since 0-padding is allowed, we need to match both 1 and 01. We need to be careful when getting 3-digit numbers, since numbers above 255 must be excluded. So we fashion alternates to get 000 through 199, then 200 through 249, and finally 250 through 255.
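The five alternates described above can be sketched as a string built with string-append. This is a reconstruction from the description, not necessarily the exact definition shown in the post; the \d class assumes the string is later compiled with pregexp, and the name anchored is hypothetical, used only to check the subregexp in isolation.

```racket
#lang racket/base
;; Sub-regexp for one number between 0 and 255 (leading zeros allowed).
;; Shorter alternates come first; this is safe only because the pattern
;; is anchored below, which forces the engine to try the longer ones.
(define n0-255
  (string-append
   "(?:"
   "\\d|"         ;   0 through   9
   "\\d\\d|"      ;  00 through  99
   "[01]\\d\\d|"  ; 000 through 199
   "2[0-4]\\d|"   ; 200 through 249
   "25[0-5]"      ; 250 through 255
   ")"))

;; Hypothetical helper pattern: the sub-regexp anchored at both ends.
(define anchored (pregexp (string-append "^" n0-255 "$")))

(regexp-match anchored "07")   ; => '("07")
(regexp-match anchored "199")  ; => '("199")
(regexp-match anchored "255")  ; => '("255")
(regexp-match anchored "256")  ; => #f
```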

An IP-address is a string that consists of four n0-255s with three dots separating them.

[image: re55]

Let’s try it out:

[image: re56]

which is fine, except that we also have

[image: re57]
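Putting the pieces together, here is a sketch of how ip-re1 could be assembled and tried out. The test addresses are my own assumptions, picked to show one valid address, one with an out-of-range number, and one made only of zeros.

```racket
#lang racket/base
;; One number between 0 and 255, as described earlier in the text.
(define n0-255 "(?:\\d|\\d\\d|[01]\\d\\d|2[0-4]\\d|25[0-5])")

;; Four n0-255s separated by three dots, anchored at both ends.
(define ip-re1
  (string-append
   "^"      ; nothing before
   n0-255   ; the first number,
   "(?:"    ; then a group made of
   "\\."    ; a literal dot followed by
   n0-255   ; another number,
   ")"
   "{3}"    ; repeated exactly 3 times,
   "$"))    ; with nothing after

(regexp-match (pregexp ip-re1) "1.2.3.4")         ; => '("1.2.3.4")
(regexp-match (pregexp ip-re1) "55.155.255.265")  ; => #f (265 is too big)
(regexp-match (pregexp ip-re1) "0.00.000.00")     ; => '("0.00.000.00")
```

The last result is the unwanted one: the pattern happily accepts an address made only of zeros.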

All-zero sequences are not valid IP addresses! Lookahead to the rescue. Before starting to match ip-re1, we look ahead to ensure we don’t have all zeros. We could use positive lookahead to ensure there is a digit other than zero.

[image: re58]

Or we could use negative lookahead to ensure that what’s ahead isn’t composed of only zeros and dots.

[image: re59]

The regexp ip-re will match all and only valid IP addresses.

[image: re60]
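Both lookahead variants can be sketched side by side. The names ip-re-pos and ip-re-neg are hypothetical, used here only to keep the two definitions apart (the text calls either of them ip-re), and the test strings are my assumptions.

```racket
#lang racket/base
;; Building blocks as described in the text.
(define n0-255 "(?:\\d|\\d\\d|[01]\\d\\d|2[0-4]\\d|25[0-5])")
(define ip-re1 (string-append "^" n0-255 "(?:\\." n0-255 "){3}$"))

;; Variant A: positive lookahead, there must be a non-zero digit ahead.
(define ip-re-pos
  (pregexp (string-append "(?=.*[1-9])" ip-re1)))

;; Variant B: negative lookahead, what is ahead must not consist only of
;; zeros and dots (inside [...] the dot is a literal character).
(define ip-re-neg
  (pregexp (string-append "(?![0.]*$)" ip-re1)))

;; Try both variants on the same three inputs.
(define results
  (for/list ([re (in-list (list ip-re-pos ip-re-neg))])
    (list (regexp-match re "1.2.3.4")
          (regexp-match re "0.0.0.0")
          (regexp-match re "0.00.000.00"))))
results ; => '((("1.2.3.4") #f #f) (("1.2.3.4") #f #f))
```

The two variants agree on these inputs; which one reads better is mostly a matter of taste.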
