Working with unordered structures

Date: 16/11/2020

Time: 09:30-11:30

Sets

What is it about?

An unordered, unindexed collection of unique items

Why do we need it?

We use sets when we need a collection without duplicates

What else should I know?

The items of a set can't be accessed by referring to an index or a key

A set can contain only hashable (in practice, immutable) values, e.g. strings, numbers, or tuples of such values
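The two rules above can be checked with a minimal sketch. The name mixed_set and the flag variables are made up for illustration.

```python
# Hashable values (strings, numbers, tuples of hashable values) are accepted.
mixed_set = {"Roma", 42, ("Rimini", "Firenze")}

# A frozenset is an immutable set, so it can be an element of another set.
mixed_set.add(frozenset({"Bari", "Lecce"}))

# A list is mutable and unhashable: adding it raises a TypeError.
try:
    mixed_set.add(["Rimini", "Firenze"])
    list_added = True
except TypeError:
    list_added = False

# A tuple that contains a list is not hashable either.
try:
    mixed_set.add(("Rimini", ["Firenze"]))
    nested_added = True
except TypeError:
    nested_added = False

# Sets do not support access by index.
try:
    mixed_set[0]
    indexable = True
except TypeError:
    indexable = False

print(len(mixed_set), list_added, nested_added, indexable)
# prints: 4 False False False
```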

# create a set of string items

a_set = {"Roma", "Torino", "Bologna"}

# create a set from a list

a_set = set(["Roma", "Torino", "Bologna","Roma"])

# add an item to the set

a_set.add("Palermo")

# remove an item from the set

a_set.remove("Palermo") # raises a KeyError if the item isn't found

a_set.discard("Palermo") # doesn't raise an error if the item isn't found

# ERROR: a list is mutable (unhashable), so adding it to the set raises a TypeError

a_set.add(["Rimini","Firenze"])

# some operations: union, intersection, difference, etc.

b_set = {"Napoli", "Bari", "Lecce", "Roma"}

new_set = a_set.union(b_set)

#OUTPUT: {'Bari', 'Bologna', 'Lecce', 'Napoli', 'Roma', 'Torino'} (the order of the items may vary)

new_set = a_set.intersection(b_set)

#OUTPUT: {'Roma'}

new_set = a_set.difference(b_set)

#OUTPUT: {'Bologna', 'Torino'}
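The same operations can also be written with operators, and membership is tested with in. A small sketch that redefines the two sets so it runs on its own. The result names are chosen here for illustration.

```python
a_set = {"Roma", "Torino", "Bologna"}
b_set = {"Napoli", "Bari", "Lecce", "Roma"}

# Operator forms of the methods used above.
union_set = a_set | b_set    # same as a_set.union(b_set)
common_set = a_set & b_set   # same as a_set.intersection(b_set)
only_a_set = a_set - b_set   # same as a_set.difference(b_set)
either_set = a_set ^ b_set   # symmetric difference: items in exactly one set

# Membership test.
has_roma = "Roma" in a_set

# A set has no defined order, so sort it when a stable display is needed.
print(sorted(union_set))
# prints: ['Bari', 'Bologna', 'Lecce', 'Napoli', 'Roma', 'Torino']
```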

Dictionaries

What is it about?

A data structure that can store data in the form of key-value pairs

Why do we need it?

It provides an efficient way to organize data under specific labels (keys) and to look up a value by its key

What else should I know?

The items of a dictionary are accessed by specifying a key

The keys must be hashable (in practice, immutable) data types, e.g. strings, numbers, or tuples

A dictionary can't contain duplicate keys: assigning a value to an existing key overwrites the previous value
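A minimal sketch of the key rules above. All names and values are hypothetical and used only for illustration.

```python
# Assigning to an existing key overwrites the value instead of adding a duplicate.
ages_dict = {"Marco": 25}
ages_dict["Marco"] = 26

# A tuple of hashable values is a valid key.
full_name_ages = {("Marco", "Rossi"): 25}

# A list is unhashable, so using it as a key raises a TypeError.
try:
    bad_dict = {["Marco", "Rossi"]: 25}
    list_key_ok = True
except TypeError:
    list_key_ok = False

# A key repeated in a literal keeps only the last value.
dup_dict = {"Marco": 25, "Marco": 30}

print(ages_dict, list_key_ok, dup_dict)
# prints: {'Marco': 26} False {'Marco': 30}
```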

# create a dictionary

ages_dict = {}

# add a new (key,value) pair

ages_dict["Marco"] = 25

ages_dict["Alessia"] = 22

ages_dict["Giulia"] = 21

#OUTPUT: {'Marco': 25, 'Alessia': 22, 'Giulia': 21}

# accessing an item

print(ages_dict["Marco"])

# remove an item from the dictionary

del ages_dict["Marco"]

# important methods

ages_dict.items() # returns a view of the (key, value) pairs

#OUTPUT: dict_items([('Alessia', 22), ('Giulia', 21)])

a_dict = {"Pippo":34}

ages_dict.update(a_dict) #updates ages_dict with the (key,value) pairs of a_dict

#OUTPUT: {'Alessia': 22, 'Giulia':21, 'Pippo':34}
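A few other standard dictionary methods are often used together with items() and update(). The sketch below starts from the final content of ages_dict shown above. The variable names are chosen here for illustration.

```python
ages_dict = {"Alessia": 22, "Giulia": 21, "Pippo": 34}

# get() returns None (or a given default) instead of raising a KeyError.
marco_age = ages_dict.get("Marco")
marco_age_default = ages_dict.get("Marco", 0)

# "in" tests whether a key is present.
has_giulia = "Giulia" in ages_dict

# items() is typically used to loop over the (key, value) pairs.
for name, age in ages_dict.items():
    print(name, age)

# keys() and values() return views; list() turns them into lists.
names = list(ages_dict.keys())
ages = list(ages_dict.values())

# pop() removes a key and returns its value.
removed_age = ages_dict.pop("Pippo")
```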

Exercises

(check the exercises in the GitHub repository)

1st Exercise

We define the variable lyrics, a string containing the lyrics of the song "Lonely Boy" by The Black Keys. The words are all written in lowercase and the lines are separated by ;;

lyrics = "well i’m so above you ;; and it’s plain to see ;; but i came to love you anyway ;; so you pulled my heart out ;; and i don’t mind bleeding ;; any old time you keep me waiting ;; waiting, waiting ;; oh, oh-oh i got a love that keeps me waiting ;; oh, oh-oh i got a love that keeps me waiting ;; i’m a lonely boy ;; i’m a lonely boy ;; oh, oh-oh i got a love that keeps me waiting ;; well your mama kept you but your daddy left you ;; and i should’ve done you just the same ;; but i came to love you ;; am i born to bleed? ;; any old time you keep me waiting ;; waiting, waiting ;; oh, oh-oh i got a love that keeps me waiting ;; oh, oh-oh i got a love that keeps me waiting ;; i’m a lonely boy ;; i’m a lonely boy ;; oh, oh-oh i got a love that keeps me waiting ;; hey! ;; oh, oh-oh i got a love that keeps me waiting ;; oh, oh-oh i got a love that keeps me waiting ;; i’m a lonely boy ;; i’m a lonely boy ;; oh, oh-oh i got a love that keeps me waiting"

a) We want to print all the unique words in the lyrics of the song "Lonely Boy". We also want to exclude the following words from the final set: ['', 'a', 'i', 'am', 'to', ';;', 'the', 'you', 'don’t', 'and', 'that', 'i’m', 'it’s']. Define a function clean_lyrics() which takes lyrics as a parameter and returns the cleaned set of words described above. Call the function and print the returned set.

Hint:
In Python, {string}.split({separator}) splits a string into a list of substrings, using {separator} as the delimiter between them
Example:
"Hi my name is James".split(" ") returns the list ["Hi", "my", "name", "is", "James"]
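A short sketch of how split() behaves on text shaped like the lyrics. The string sample is made up and contains two consecutive spaces on purpose. Splitting on " " then yields an empty string, which is presumably why '' appears in the list of words to exclude.

```python
sample = "waiting, waiting ;; oh,  oh-oh"  # note the two spaces after "oh,"

# Splitting on a single space keeps punctuation and produces '' for the extra space.
words_by_space = sample.split(" ")
print(words_by_space)
# prints: ['waiting,', 'waiting', ';;', 'oh,', '', 'oh-oh']

# Without an argument, split() uses any run of whitespace and never returns ''.
words_default = sample.split()
print(words_default)
# prints: ['waiting,', 'waiting', ';;', 'oh,', 'oh-oh']

# A longer separator works the same way, e.g. to get the lines of the song.
song_lines = sample.split(" ;; ")
print(song_lines)
# prints: ['waiting, waiting', 'oh,  oh-oh']
```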

Solution:

def clean_lyrics(txt_lyrics):
    # split on single spaces; building a set drops the duplicates
    lyrics_set = set(txt_lyrics.split(" "))
    unwanted_list = ['', 'a', 'i', 'am', 'to', ';;', 'the', 'you', 'don’t', 'and', 'that', 'i’m', 'it’s']
    unwanted_set = set(unwanted_list)
    # keep only the words that are not in the unwanted set
    clean_set = lyrics_set.difference(unwanted_set)
    return clean_set

my_set = clean_lyrics(lyrics)
print(my_set)

b) Define a function common_words() which takes the clean version of lyrics (the result of point (a)) as a parameter. The function should count and return the number of its words that also appear in the following list: ["mama","daddy","sister","brother","boy","girl"].

Solution:

def common_words(clean_set):
    l_words = ["mama","daddy","sister","brother","boy","girl"]
    s_words = set(l_words)
    # the intersection keeps only the words found in both sets
    common_set = clean_set.intersection(s_words)
    return len(common_set)

print(common_words(my_set))

2nd Exercise

We want to further analyse the lyrics of the 1st Exercise, using the same variable lyrics.

a) Define a function count_words() which takes lyrics as a parameter and returns a dictionary that maps each word to the number of times it occurs in the song lyrics. The dictionary should not count or contain the following words: ['', 'a', 'i', 'am', 'to', ';;', 'the', 'you', 'don’t', 'and', 'that', 'i’m', 'it’s'].

Solution:

def count_words(txt_lyrics):
    result_dict = {}
    lyrics_l = txt_lyrics.split(" ")
    unwanted_list = ['', 'a', 'i', 'am', 'to', ';;', 'the', 'you', 'don’t', 'and', 'that', 'i’m', 'it’s']
    for w in lyrics_l:
        if w not in unwanted_list:
            # first occurrence: start the counter for this word
            if w not in result_dict:
                result_dict[w] = 0
            result_dict[w] += 1
    return result_dict

count_dict = count_words(lyrics)
print(count_dict)
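As an alternative to the manual counting above, the standard library class collections.Counter can do the same bookkeeping. This is only a sketch of a different technique, not the exercise solution. The function name count_words_counter, its second parameter, and the short sample text are assumptions made for illustration.

```python
from collections import Counter

def count_words_counter(txt_lyrics, unwanted_words):
    # a set makes the "not in" check fast
    unwanted_set = set(unwanted_words)
    # Counter counts how many times each remaining word occurs
    return Counter(w for w in txt_lyrics.split(" ") if w not in unwanted_set)

# hypothetical short sample, shaped like the lyrics variable
sample_lyrics = "i’m a lonely boy ;; i’m a lonely boy"
sample_unwanted = ['', 'a', 'i', ';;', 'i’m']

sample_counts = count_words_counter(sample_lyrics, sample_unwanted)
print(dict(sample_counts))
# prints: {'lonely': 2, 'boy': 2}
```

A Counter is a subclass of dict, so the result can be used like the dictionary returned by count_words().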

b) Andrea wants to organize his playlist of "The Black Keys" in a clever way. For each song he writes the album name first, followed by the song title, and separates the two values with "::" (e.g. in el_camino::lonely_boy the album name is "el_camino" and the song title is "lonely_boy"). Here is Andrea's entire playlist; the songs are separated by ";;":

playlist_txt = "el_camino::lonely_boy ;; el_camino::little_black_submarine ;; el_camino::gold_on_the_ceiling ;; turn_blue::fever ;; turn_blue::gotta_get_away ;; brothers::howlin_for_you ;; brothers::tighten_up ;; turn_blue::it_is_up_to_you_now"

Define a function build_playlist_dict() which takes playlist_txt as a parameter and returns a dictionary with the album names as keys. The value associated with each key (album) is the list of all the songs of that album.

Solution:

def build_playlist_dict(a_txt):
    result_dict = {}
    songs = a_txt.split(" ;; ")
    for a_song in songs:
        # each entry has the form album::song_title
        song_parts = a_song.split("::")
        album = song_parts[0]
        song_name = song_parts[1]
        # first song of this album: create its list
        if album not in result_dict:
            result_dict[album] = []
        result_dict[album].append(song_name)
    return result_dict

print(build_playlist_dict(playlist_txt))
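The "create the list if the key is missing, then append" pattern can also be written with the dictionary method setdefault(). The sketch below is a variation on the solution, not a replacement for it. The function name and the shortened sample playlist are made up for illustration.

```python
def build_playlist_dict_setdefault(a_txt):
    result_dict = {}
    for a_song in a_txt.split(";;"):
        # strip() removes the spaces around each album::song entry
        album, song_name = a_song.strip().split("::")
        # setdefault() returns the list stored under album, creating it if missing
        result_dict.setdefault(album, []).append(song_name)
    return result_dict

# hypothetical shortened playlist in the same format as playlist_txt
sample_playlist = "el_camino::lonely_boy ;; turn_blue::fever ;; el_camino::gold_on_the_ceiling"

sample_dict = build_playlist_dict_setdefault(sample_playlist)
print(sample_dict)
# prints: {'el_camino': ['lonely_boy', 'gold_on_the_ceiling'], 'turn_blue': ['fever']}
```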
