Project

Bibliometrics Engine

The name of the project is Bibliometric Engine

It is a software that takes in input a file in a particular format (CSV), containing citations between scholarly documents, each identified by a digital object identifier (DOI)

The goal of the software is to run particular analysis and extractions on such data

You need a group

The project must be implemented by a group of people

You need to

form groups of at least 3 and at most 4 people
choose a name for the group (yes: a name) - it will be used to publish the ranks of the best-performing projects (more info later)
communicate the name of the group and its members (including their emails) to me by sending an email at silvio.peroni@unibo.it

Final deadline: all groups must be ready by next Wednesday at most (16 December)

Project stub (1/2)

<group_name> import *

class BibliometricEngine(object):
    def __init__(self, citations_file_path):
        self.data = process_citations(citations_file_path)

    def compute_impact_factor(self, dois, year):
        return do_compute_impact_factor(self.data, dois, year)

    def get_co_citations(self, doi1, doi2):
        return do_get_co_citations(self.data, doi1, doi2)

    def get_bibliographic_coupling(self, doi1, doi2):
        return do_get_bibliographic_coupling(self.data, doi1, doi2)

    def get_citation_network(self, start, end):
        return do_get_citation_network(self.data, start, end)
	
    def merge_graphs(self, g1, g2):
        return do_merge_graphs(self.data, g1, g2)

Project stub (2/2)

    def search_by_prefix(self, prefix, is_citing, dump):
        if dump is None:
            return do_search_by_prefix(self.data, prefix, is_citing)
        else:
            return do_search_by_prefix(dump, prefix, is_citing)

    def search(self, query, field, dump):
        if dump is None:
            return do_search(self.data, query, field)
        else:
            return do_search(dump, query, field)
	
    def filter_by_value(self, query, field, dump):
        if dump is None:
            return do_filter_by_value(self.data, query, field)
        else:
            return do_filter_by_value(dump, query, field)

Your output

You have to develop a Python file named as your group, where spaces are substituted by underscores and the whole file is lowercase

E.g.: group Best group ever, file best_group_ever.py

The import statement must specify the module implemented

E.g.: from best_group_ever import *

Each group has to implement the functions that have been highlighted in red in the previous slide

process_citations

def process_citations(citations_file_path)

It takes in input a comma separated value (CSV) file, and return a data structure containing all the data included in the CSV is some form

The data can be preprocessed, changed according to some empirical rule, ordered in a certain way, etc.

These data will be automatically provided in input to the other functions

Example of CSV citation data

citing	cited	creation	timespan
10.2964/jsik_2020_003	10.1007/s11192-019-03217-6	2020-02-29	P0Y5M15D
10.1007/s11192-019-03311-9	10.1007/s11192-019-03217-6	2019-12-04	P0Y2M20D
10.1007/s11192-019-03217-6	10.1007/978-3-030-00668-6_8	2019-09-14	P1Y
10.1007/s11192-019-03217-6	10.1007/978-3-319-11955-7_42	2019-09-14	P5Y
...	...	...	...

citing (string): the DOI of a citing article

cited (string): the DOI of a cited article

creation (string): publication date of the citing article (date format: YYYY-MM-DD, YYYY-MM or YYYY)

timespan (string): the difference between the publication date of the citing article and the publication date of the cited article (duration format): PnYnMnD, PnYnM, or PnY, plus - before P for negative durations