Working with unordered structures

Date: 16/11/2020
Time: 09:30-11:30

Sets

What is it about?
  • An unordered, unindexed collection of unique items

Why do we need it?
  • We use sets to hold a collection without duplicates

What else should I know?
  • The items of a set can't be accessed by an index or a key
  • A set can contain only hashable items, which in practice means immutable data types, e.g. strings, numbers, or tuples

    # create a set of string items
    a_set = {"Roma", "Torino", "Bologna"}

    # create a set from a list
    a_set = set(["Roma", "Torino", "Bologna","Roma"])

    # add an item to the set
    a_set.add("Palermo")

    # remove an item from the set
    a_set.remove("Palermo") # raises a KeyError if the item isn't found
    a_set.discard("Palermo") # doesn't raise an error if the item isn't found

    # ERROR: we can't add a mutable (unhashable) item to the set
    # a_set.add(["Rimini","Firenze"]) # raises TypeError: unhashable type: 'list'

    # some operations: union, intersection, difference, etc.
    b_set = {"Napoli", "Bari", "Lecce", "Roma"}
    new_set = a_set.union(b_set)
    #OUTPUT (the order of the items in a set may vary): {'Bari', 'Bologna', 'Lecce', 'Napoli', 'Roma', 'Torino'}
    new_set = a_set.intersection(b_set)
    #OUTPUT: {'Roma'}
    new_set = a_set.difference(b_set)
    #OUTPUT: {'Bologna', 'Torino'}
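
The points above can be sketched in a few lines. This is only an illustration: the names cities, unique_cities and pairs are made up here and are not part of the notes.

```python
# Duplicates collapse when a set is built from a list
cities = ["Roma", "Torino", "Bologna", "Roma"]
unique_cities = set(cities)
n_unique = len(unique_cities)            # "Roma" is stored only once

# Membership is tested with the `in` operator
has_roma = "Roma" in unique_cities
has_milano = "Milano" in unique_cities

# Tuples are hashable, so they can be set items; lists are not
pairs = {("Roma", "Lazio"), ("Torino", "Piemonte")}
try:
    pairs.add(["Rimini", "Emilia-Romagna"])
    list_was_added = True
except TypeError:
    list_was_added = False

# remove() raises KeyError on a missing item, discard() stays silent
try:
    unique_cities.remove("Palermo")
    remove_raised = False
except KeyError:
    remove_raised = True
unique_cities.discard("Palermo")

print(n_unique, has_roma, has_milano, list_was_added, remove_raised)
```

Wrapping the failing calls in try/except keeps the snippet runnable from top to bottom.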

Dictionaries

What is it about?
  • A data structure that stores data as key-value pairs

Why do we need it?
  • It provides an efficient and useful way to organize data under specific labels (keys)

What else should I know?
  • The items of a dictionary are accessed by specifying a key
  • Keys must be hashable, which in practice means immutable data types, e.g. strings, numbers, or tuples
  • A dictionary can't contain duplicate keys

    # create a dictionary
    ages_dict = {}

    # add a new (key,value) pair
    ages_dict["Marco"] = 25
    ages_dict["Alessia"] = 22
    ages_dict["Giulia"] = 21
    #OUTPUT: {'Marco': 25, 'Alessia': 22, 'Giulia': 21}

    # access an item by its key
    print(ages_dict["Marco"])
    #OUTPUT: 25

    # remove an item from the dictionary
    del ages_dict["Marco"]

    # important methods
    ages_dict.items() # returns a view of the (key, value) pairs
    #OUTPUT: dict_items([('Alessia', 22), ('Giulia', 21)])
    a_dict = {"Pippo": 34}
    ages_dict.update(a_dict) # updates ages_dict with the (key, value) pairs of a_dict
    #OUTPUT: {'Alessia': 22, 'Giulia': 21, 'Pippo': 34}
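
A few more behaviours of dictionaries, shown as a small sketch. The variable names below (n_keys, marco_age, pairs, and so on) are illustrative only; get() and the `in` operator are standard dictionary features that the notes above do not cover explicitly.

```python
ages_dict = {"Alessia": 22, "Giulia": 21}

# Assigning to an existing key overwrites its value; no duplicate key appears
ages_dict["Alessia"] = 23
n_keys = len(ages_dict)

# Accessing a missing key with [] raises KeyError ...
try:
    ages_dict["Marco"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# ... while get() returns None, or a default value if one is given
marco_age = ages_dict.get("Marco")
marco_or_zero = ages_dict.get("Marco", 0)

# `in` checks whether a key is present
has_giulia = "Giulia" in ages_dict

# items() can be looped over to read each (key, value) pair
pairs = []
for name, age in ages_dict.items():
    pairs.append((name, age))

print(n_keys, lookup_failed, marco_age, marco_or_zero, has_giulia, pairs)
```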

Exercises

    (check the exercises on the GitHub repository)

1st Exercise

    We define the variable lyrics containing the lyrics (a string value) of the song "Lonely Boy" by The Black Keys. The words are all written in lowercase and the lines are separated by ";;".

    lyrics = "well i’m so above you ;; and it’s plain to see ;; but i came to love you anyway ;; so you pulled my heart out ;; and i don’t mind bleeding ;; any old time you keep me waiting ;; waiting, waiting ;; oh, oh-oh i got a love that keeps me waiting ;; oh, oh-oh i got a love that keeps me waiting ;; i’m a lonely boy ;; i’m a lonely boy ;; oh, oh-oh i got a love that keeps me waiting ;; well your mama kept you but your daddy left you ;; and i should’ve done you just the same ;; but i came to love you ;; am i born to bleed? ;; any old time you keep me waiting ;; waiting, waiting ;; oh, oh-oh i got a love that keeps me waiting ;; oh, oh-oh i got a love that keeps me waiting ;; i’m a lonely boy ;; i’m a lonely boy ;; oh, oh-oh i got a love that keeps me waiting ;; hey! ;; oh, oh-oh i got a love that keeps me waiting ;; oh, oh-oh i got a love that keeps me waiting ;; i’m a lonely boy ;; i’m a lonely boy ;; oh, oh-oh i got a love that keeps me waiting"

    a) We want to print all the unique words in the lyrics of the song "Lonely Boy". We also want to exclude the following words from the final set: ['', 'a', 'i', 'am', 'to', ';;', 'the', 'you', 'don’t', 'and', 'that', 'i’m', 'it’s']. Define a function clean_lyrics() which takes lyrics as a parameter and returns a clean set of words (as just described). Call the function and print the returned set.

    Hint:
    In Python, {string}.split({separator}) splits a string into a list of substrings, using {separator} as the delimiter between them
    Example:
    Calling "Hi my name is James".split(" ") returns the list ["Hi", "my", "name", "is", "James"]
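
To see why the list of excluded words contains '' and ';;', here is a small sketch of how split() treats repeated separators and the line markers. The string line is a made-up fragment, not the real lyrics.

```python
# A made-up fragment with a double space between "oh-oh" and "i"
line = "oh, oh-oh  i got ;; a love"

# Splitting on a single space keeps an empty string for the double space,
# and the ";;" marker comes out as a word of its own
tokens = line.split(" ")
print(tokens)

# Calling split() without arguments splits on any run of whitespace,
# so no empty strings are produced
tokens_ws = line.split()
print(tokens_ws)

# Punctuation stays attached to the word: "oh," is not the same as "oh"
has_plain_oh = "oh" in tokens

# Splitting on " ;; " gives the lines instead of the words
parts = "first line ;; second line".split(" ;; ")
print(parts)
```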
    Solution:
    def clean_lyrics(txt_lyrics):
        lyrics_set = set(txt_lyrics.split(" "))
        unwanted_list = ['', 'a', 'i', 'am', 'to', ';;', 'the', 'you', 'don’t', 'and', 'that', 'i’m', 'it’s']
        unwanted_set = set(unwanted_list)
        clean_set = lyrics_set.difference(unwanted_set)
        return clean_set

    my_set = clean_lyrics(lyrics)
    print(my_set)

    b) Define a function common_words() which takes the clean version of lyrics (the result of point (a)) as a parameter. The function should count and return the number of words that also appear in the following list: ["mama","daddy","sister","brother","boy","girl"].

    Solution:
    def common_words(clean_set):
        l_words = ["mama","daddy","sister","brother","boy","girl"]
        s_words = set(l_words)
        common_set = clean_set.intersection(s_words)
        return len(common_set)
        
    print(common_words(my_set))
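
The set methods used in the two solutions also have operator forms. The sketch below uses a small made-up set of words instead of the full lyrics; the names words and family are illustrative.

```python
# Made-up sample standing in for the cleaned lyrics
words = {"lonely", "boy", "mama", "daddy", "love"}
family = {"mama", "daddy", "sister", "brother", "boy", "girl"}

# & is the operator form of intersection()
common = words & family
n_common = len(common)

# - is the operator form of difference()
only_in_words = words - family

# | is the operator form of union()
all_words = words | family

print(n_common, only_in_words, len(all_words))
```

Unlike the methods, which accept any iterable (for example a list), the operators need a set on both sides.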

2nd Exercise

    We want to analyse the lyrics from the 1st Exercise further, using the same variable lyrics.
    a) Define a function count_words() which takes lyrics as a parameter and returns a dictionary that maps each word to the number of times it occurs in the song lyrics. The dictionary should not contain the following words: ['', 'a', 'i', 'am', 'to', ';;', 'the', 'you', 'don’t', 'and', 'that', 'i’m', 'it’s'].

    Solution:
    def count_words(txt_lyrics):
        result_dict = {}
        lyrics_l = txt_lyrics.split(" ")
        unwanted_list = ['', 'a', 'i', 'am', 'to', ';;', 'the', 'you', 'don’t', 'and', 'that', 'i’m', 'it’s']
        for w in lyrics_l:
            if w not in unwanted_list:
                if w not in result_dict:
                    result_dict[w] = 0
                result_dict[w] += 1

        return result_dict

    count_dict = count_words(lyrics)
    print(count_dict)
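
As an alternative to the solution above, the same counting could be done with collections.Counter from the standard library. This is a different technique, not the one used in the solution; the function name count_words_counter and the short sample string are made up for illustration.

```python
from collections import Counter

def count_words_counter(txt_lyrics):
    # A set makes the "not in" check fast; the words are the ones listed in the exercise
    unwanted = {'', 'a', 'i', 'am', 'to', ';;', 'the', 'you',
                'don’t', 'and', 'that', 'i’m', 'it’s'}
    # Counter is a dict subclass that counts how often each item occurs
    return Counter(w for w in txt_lyrics.split(" ") if w not in unwanted)

# Made-up short sample in the same format as the lyrics variable
sample = "i’m a lonely boy ;; i’m a lonely boy ;; waiting, waiting"
counts = count_words_counter(sample)
print(dict(counts))
```

Note that "waiting," and "waiting" are counted separately, because splitting on spaces leaves the punctuation attached to the word.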

    b) Andrea wants to organize his playlist of The Black Keys in a clever way. He writes the name of the album first, followed by the title of the song, separating the two values with "::" (e.g. in el_camino::lonely_boy the album name is "el_camino" and the song title is "lonely_boy"). Here is Andrea's entire playlist; the songs are separated by ";;":

    playlist_txt = "el_camino::lonely_boy ;; el_camino::little_black_submarine ;; el_camino::gold_on_the_ceiling ;; turn_blue::fever ;; turn_blue::gotta_get_away ;; brothers::howlin_for_you ;; brothers::tighten_up ;; turn_blue::it_is_up_to_you_now"

    Define a function build_playlist_dict() which takes playlist_txt as a parameter and returns a dictionary with the album titles as keys, where each key (album) is associated with a list of all its songs.

    Solution:
    playlist_txt = "el_camino::lonely_boy ;; el_camino::little_black_submarine ;; el_camino::gold_on_the_ceiling ;; turn_blue::fever ;; turn_blue::gotta_get_away ;; brothers::howlin_for_you ;; brothers::tighten_up ;; turn_blue::it_is_up_to_you_now"

    def build_playlist_dict(a_txt):
        result_dict = {}
        songs = a_txt.split(" ;; ")
        for a_song in songs:
            song_parts = a_song.split("::")
            album = song_parts[0]
            song_name = song_parts[1]
            if album not in result_dict:
                result_dict[album] = []
            result_dict[album].append(song_name)
        return result_dict

    print(build_playlist_dict(playlist_txt))
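
The "create the list if the key is missing, then append" pattern in the solution can also be written with the dictionary method setdefault(). The sketch below is only a variation on the solution: the function name build_playlist_dict_setdefault and the shortened sample playlist are made up for illustration.

```python
def build_playlist_dict_setdefault(a_txt):
    result_dict = {}
    # Split on ";;" and strip the spaces, so the spacing around ";;" doesn't matter
    for entry in a_txt.split(";;"):
        album, song_name = entry.strip().split("::")
        # setdefault() returns the list stored under album,
        # storing a new empty list first if the key is missing
        result_dict.setdefault(album, []).append(song_name)
    return result_dict

# Shortened sample in the same format as playlist_txt
sample_txt = "el_camino::lonely_boy ;; turn_blue::fever ;; el_camino::gold_on_the_ceiling"
sample_dict = build_playlist_dict_setdefault(sample_txt)
print(sample_dict)
```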