Working with unordered structures
Sets
- A collection of unordered and unindexed items
- We use Sets in order to have a collection without duplicates
- The items of a set can't be accessed by referring to an index or a key
- A Set contains only immutable data types, e.g. strings, numbers, or tuples
# create a set of string items
a_set = {"Roma", "Torino", "Bologna"}
# create a set from a list
a_set = set(["Roma", "Torino", "Bologna","Roma"])
# add an item to the set
a_set.add("Palermo")
# remove an item from the set
a_set.remove("Palermo") # raise an error if the item isn't found
a_set.discard("Palermo") # doesn't raise errors if the item isn't found
#ERROR: we can't add a mutable item to the set
a_set.add(["Rimini","Firenze"])
# some operations: union, intersection, difference ... etc
b_set = {"Napoli", "Bari", "Lecce", "Roma"}
new_set = a_set.union(b_set)
#OUTPUT: {'Bari', 'Bologna', 'Lecce', 'Napoli', 'Roma', 'Torino'}
new_set = a_set.intersection(b_set)
#OUTPUT: {'Roma'}
new_set = a_set.difference(b_set)
#OUTPUT: {'Bologna', 'Torino'}
Dictionaries
- A data structure that can store data in the form of key-value pairs
- It provides an efficient and useful way for organizing the data under specific typologies (keys)
- The items of a dictionary are accessed by specifying a key
- The key values should be immutable data types, e.g. strings, numbers, or tuples
- A dictionary can't contain duplicate keys
# create a dictionary
ages_dict = {}
# add a new (key,value) pair
ages_dict["Marco"] = 25
ages_dict["Alessia"] = 22
ages_dict["Giulia"] = 21
#OUTPUT: {'Marco': 25, 'Alessia':22, 'Giulia':21}
# accessing an item
print(ages_dict["Marco"])
# remove an item from the dictionary
del ages_dict["Marco"]
# important methods
ages_dict.items() #returns a sequence of (key,value) pairs
#OUTPUT: [('Alessia',22),('Giulia',21)]
a_dict = {"Pippo":34}
ages_dict.update(a_dict) #updates ages_dict with the (key,value) pairs of a_dict
#OUTPUT: {'Alessia': 22, 'Giulia':21, 'Pippo':34}
Exercises
1st Exercise
The variable lyrics
contains the lyrics (string value) of the song "Lonely Boy" by "The Black Keys". The words are all written in lowercase and the line breaks are represented with ;;
lyrics = "well i’m so above you ;; and it’s plain to see ;; but i came to love you anyway ;; so you pulled my heart out ;; and i don’t mind bleeding ;; any old time you keep me waiting ;; waiting, waiting ;; oh, oh-oh i got a love that keeps me waiting ;; oh, oh-oh i got a love that keeps me waiting ;; i’m a lonely boy ;; i’m a lonely boy ;; oh, oh-oh i got a love that keeps me waiting ;; well your mama kept you but your daddy left you ;; and i should’ve done you just the same ;; but i came to love you ;; am i born to bleed? ;; any old time you keep me waiting ;; waiting, waiting ;; oh, oh-oh i got a love that keeps me waiting ;; oh, oh-oh i got a love that keeps me waiting ;; i’m a lonely boy ;; i’m a lonely boy ;; oh, oh-oh i got a love that keeps me waiting ;; hey! ;; oh, oh-oh i got a love that keeps me waiting ;; oh, oh-oh i got a love that keeps me waiting ;; i’m a lonely boy ;; i’m a lonely boy ;; oh, oh-oh i got a love that keeps me waiting"
1.a) Define the function clean_lyrics()
which takes lyrics
as parameter and returns a set of all the unique words of the song excluding the following words ['a', 'i', 'am', 'to', ';;', 'the', 'you', 'don’t', 'and', 'that', 'i’m', 'it’s']
. Once defined call the function and print its outcome.
Hint: in python the function {string}.split({separator})
splits a string into a list of words using {separator}
as splitter between the words in the string. For instance, "Hi my name is James".split(" ") -> ["Hi", "my", "name", "is", "James"]
def clean_lyrics(txt_lyrics):
lyrics_set = set(txt_lyrics.split(" "))
unwanted_list = ['a', 'i', 'am', 'to', ';;', 'the', 'you', 'don’t', 'and', 'that', 'i’m', 'it’s']
unwanted_set = set(unwanted_list)
clean_set = lyrics_set.difference(unwanted_set)
return clean_set
print(clean_lyrics(lyrics))
1.b) Define the function family_words()
which takes the outcome of clean_lyrics()
as parameter and return the number of words that are also part of the following list ["mama","daddy","sister","brother","boy","girl"]
.
def family_words(l_words):
clean_set = set(l_words)
l_words = ["mama", "daddy", "sister", "brother", "boy", "girl"]
s_words = set(l_words)
common_set = clean_set.intersection(s_words)
return len(common_set)
l_clean_lyrics = family_words(clean_lyrics(lyrics))
print(l_clean_lyrics)
1.c) Define the function count_words()
which takes lyrics
as parameter and returns a dictionary with all the unique words as keys and the occurrences of each word as value. The dictionary must not include the following words ['a', 'i', 'am', 'to', ';;', 'the', 'you', 'don’t', 'and', 'that', 'i’m', 'it’s']
.
def count_words(txt_lyrics):
result_dict = {}
lyrics_l = txt_lyrics.split(" ")
unwated_list = ['a', 'i', 'am', 'to', ';;', 'the', 'you', 'don’t', 'and', 'that', 'i’m', 'it’s']
for w in lyrics_l:
if w not in unwated_list:
if w not in result_dict:
result_dict[w] = 0
result_dict[w] += 1
return result_dict
count_dict = count_words(lyrics)
print(count_dict)
1.d) Andrea wants to create a playlist with some songs by The Black Keys. He wrote down on a paper the list of songs he would like to include. For each song first, he wrote the name of the album and then the title of the song contained in it. The two values are separated using "::". For instance, in el_camino::lonely_boy
the album name is "el_camino" and "lonely_boy" is the title of the song. Here we have the entire playlist of Andrea, the songs are separated by ";;":
playlist_txt = "el camino::lonely boy ;; el camino::little black submarine ;; el camino::gold on the ceiling ;; turn blue::fever ;; turn blue::gotta get away ;; brothers::howlin for you ;; brothers::tighten up ;; turn blue::it is up to you now"
Define the function build_playlist_dict()
which takes playlist_txt
as parameter and creates a dictionary having the album titles as keys each key will have the list of all its corresponding songs.
def build_playlist_dict(a_txt):
result_dict = {}
songs = a_txt.split(" ;; ")
for a_song in songs:
song_parts = a_song.split("::")
album = song_parts[0]
song_name = song_parts[1]
if album not in result_dict:
result_dict[album] = []
result_dict[album].append(song_name)
return result_dict
print(build_playlist_dict(playlist_txt))
2nd Exercise
Giving a collection of pokemon cards (i.e. a list) we want to readapt it following the pokemon-evolution rules, such that if we have two cards in our deck representing the same pokemon we must replace them with one card representing the evolved version of that pokemon. The evolution rules are defined in the following dictionary:
evolution_map = {
"Poliwag": "Poliwhirl",
"Bulbasaur": "Ivysaur",
"Charmander": "Charmeleon",
"Pidgey": "Pidgeotto",
"Psyduck": "Golduck",
"Abra": "Kadabra"
}
For instance:
l_cards = ["Poliwag", "Pidgey", "Abra", "Pidgey", "Charmander", "Bulbasaur", "Charmander", "Psyduck", "Poliwag","Goldeen"]
# The new list should be:
# ['Poliwhirl', 'Charmeleon', 'Bulbasaur', 'Psyduck', 'Pidgeotto', 'Goldeen', 'Abra']
Define the function pokemon_cards()
which takes a list of pokemon cards and returns a new list of pokemon cards which follows the pokemon-evolution rules.
def pokemon_cards(l_pokemons):
card_checked = set()
new_l_pokemons = list()
for pok in l_pokemons:
if pok not in card_checked:
num_occ = 0
for pok_check in l_pokemons:
if pok_check == pok:
num_occ += 1
if num_occ > 1:
pok_evol = evolution_map[pok]
new_l_pokemons.append(pok_evol)
else:
new_l_pokemons.append(pok)
card_checked.add(pok)
return new_l_pokemons