Data analysis of the Netflix shows - part 3

Date: 11/12/2020
Time: 12:30-14:30

Data visualization

  • Data visualization applies a graphic representation to the data by mapping the original data to graphic elements (for example: lines, points, bar charts, etc.).
  • If we want to draw a graphic for a dataset, we need to answer these three questions:
    • What is the nature of our data?
    • What aspects do we want to analyse?
    • What are the most suitable graphical elements we can use to present our analysis?
  • Example: Data: a collection of articles. For each article we have its year of publication and its author(s). Analysis: we want to check how many articles "Silvio Peroni" authored in each year. Visualization: we can use a bar chart.
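
The analysis step of the example above can be sketched in plain Python. This is only a minimal illustration: the articles list and the articles_per_year function are hypothetical names with made-up sample values, not part of the course material. The resulting year-to-count mapping is the data a bar chart would display (one bar per year).

```python
from collections import Counter

# Hypothetical sample data: one (publication year, list of authors) pair per article.
articles = [
    (2018, ["Silvio Peroni", "Another Author"]),
    (2018, ["Silvio Peroni"]),
    (2019, ["Another Author"]),
    (2019, ["Silvio Peroni"]),
    (2020, ["Silvio Peroni", "Another Author"]),
    (2020, ["Silvio Peroni"]),
    (2020, ["Silvio Peroni"]),
]


def articles_per_year(articles, author):
    """Count how many articles `author` wrote in each year (sorted by year)."""
    counts = Counter(year for year, authors in articles if author in authors)
    return dict(sorted(counts.items()))


counts = articles_per_year(articles, "Silvio Peroni")
print(counts)
```

The keys of counts would become the x-axis labels of the bar chart and the values would become the bar heights.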

    The matplotlib library

    matplotlib is one of the earliest Python data visualization libraries and it is still widely used for plotting in the Python community. It was designed to closely resemble the plotting interface of MATLAB, a popular proprietary programming language.
    We are especially interested in matplotlib.pyplot, a collection of plotting functions. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc.

    Note: matplotlib is not part of Python's standard library, so we need to install it using pip install matplotlib

    import matplotlib.pyplot as plt

    data = [23, 85, 72, 43, 52]
    labels = ['A', 'B', 'C', 'D', 'E']

    # x-Axis ticks and label
    plt.xticks(range(len(data)), labels)
    plt.xlabel('Class')

    # y-Axis label
    plt.ylabel('Amounts')

    # chart title
    plt.title('I am title')

    # plot a bar chart: one bar per value in data
    plt.bar(range(len(data)), data)
    # display the figure
    plt.show()

    After running the above script, a bar chart with one bar for each class (A to E) should appear.

    Data visualizations on the Netflix shows dataset (see also the GitHub repository)

    To solve the following exercises, we need to use some of the functions we defined in the previous lessons (part-1 and part-2).

    a) Draw a graphic using matplotlib which plots the total number of shows (all types of shows) that Netflix added in each year.

    b) Draw a graphic using matplotlib which plots the average time (in years) it takes Netflix to add a show to its list after its actual release. Plot this value for each year.
    Hint: Take a look at the line charts of matplotlib https://datatofish.com/line-chart-python-matplotlib/.
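
As a starting point for the hint, here is a minimal sketch of a line chart built from per-year averages. It does not use the real dataset or the functions from the previous lessons: the shows list, average_delay_per_year and plot_average_delay are hypothetical names with made-up values, used only to illustrate the idea. matplotlib is imported inside the plotting function so that the averaging step can also be run on its own.

```python
from collections import defaultdict

# Hypothetical sample: one (year added to Netflix, original release year) pair per show.
shows = [(2016, 2010), (2016, 2014), (2017, 2017), (2017, 2015), (2018, 2008)]


def average_delay_per_year(shows):
    """Average number of years between release and addition, grouped by year added."""
    delays = defaultdict(list)
    for year_added, release_year in shows:
        delays[year_added].append(year_added - release_year)
    return {year: sum(values) / len(values) for year, values in sorted(delays.items())}


def plot_average_delay(averages):
    """Draw a line chart with one point per year (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the averaging works without it

    years = list(averages.keys())
    plt.plot(years, list(averages.values()), marker="o")
    plt.xticks(years)  # show only the years that appear in the data
    plt.xlabel("Year added")
    plt.ylabel("Average delay (years)")
    plt.title("Average delay between release and addition")
    plt.show()


averages = average_delay_per_year(shows)
print(averages)
# plot_average_delay(averages)  # uncomment to display the chart
```

In the actual exercise, the list of pairs would come from the Netflix shows dataset rather than from hard-coded values.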