The CTP Book

A book for teaching Computational Thinking and Programming skills to people with a background in the Humanities

View on GitHub

Development - Advanced, exercise 39

Text

The Sørensen–Dice coefficient is a statistic used to gauge the similarity of two samples, that was intended to be applied to discrete data. Given two sets, A and B, it is defined as twice the number of elements common to both sets divided by the sum of the number of elements in each set, as defined in the following formula:

Sørensen–Dice coefficient

Write an algorithm in Python – def sd_coeff(s1, s2) – which takes in input two sets and returns the number defining the Sørensen–Dice coefficient for those sets.

Solution

# Test case for the function
def test_sd_coeff(s1, s2, expected):
    result = sd_coeff(s1, s2)
    print(result)
    if result is not None and (round(result, 2) == round(expected, 2)):
        return True
    else:
        return False


# Code of the function
def sd_coeff(s1, s2):
    count = 0
    for i in s1:
        if i in s2:
            count += 1
    
    den = len(s1) + len(s2)

    return (2 * count) / den
    
            
# Tests
print(test_sd_coeff({1, 2, 3}, {1, 2, 3}, 1.0))
print(test_sd_coeff({1, 2}, {1, 2, 3}, 0.8))

Additional material

The runnable Python file is available online.