COVID-19 themed GitHub Repository Dataset



Introduction

Since the outbreak of the COVID-19 pandemic, researchers from many disciplines have worked together to fight the crisis. The open source community plays a vital role in coping with the pandemic, which is inherently a collaborative process. Many COVID-19 related datasets, tools, software packages, and deep learning models have been created and shared across research communities. However, COVID-19 themed open source projects have not been systematically studied, and it remains unclear how the open source community helps combat COVID-19 in practice.

To fill this gap, we take a first step toward studying COVID-19 themed repositories on GitHub, one of the most popular collaborative platforms. We collected over 67K COVID-19 themed GitHub repositories created up to July 2020. We release this dataset to support future research on using open source technologies and resources to respond rapidly to this worldwide public health emergency. For more details, please refer to https://covid19-repos.github.io/.

The dataset includes the following files:

  • contributor-activity.csv: This file contains the contribution of each contributor.
  • contributor-country.csv: This file contains the country name of each contributor.
  • daily_confirmed.csv: This file contains the commit logs for two months of top-starred repositories in each category.
  • logs.zip: This file contains the commit logs for two months of top-starred repositories.
  • readme.zip: This file contains the README files of all repositories.
  • repo_link.csv: This file contains each repository's ID and all links included in its README file.
  • repos-baseinfo.csv: This file contains the basic information of each repository.
  • repos-contributors.xlsx: This file contains the basic information of contributors.
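
As a starting point for exploring the files listed above, the following is a minimal sketch that summarizes whichever of them are present in a local directory. It uses only the Python standard library. The function name `summarize_dataset`, the flat directory layout, the UTF-8 encoding, and the assumption that the first row of each CSV file is a header are all hypothetical and not documented by the dataset. The .xlsx file is skipped because reading it needs a third-party library.

```python
import csv
import zipfile
from pathlib import Path

# File names are taken from the list above; everything else is an assumption.
CSV_FILES = [
    "contributor-activity.csv",
    "contributor-country.csv",
    "daily_confirmed.csv",
    "repo_link.csv",
    "repos-baseinfo.csv",
]
ZIP_FILES = ["logs.zip", "readme.zip"]


def summarize_dataset(data_dir):
    """Return {file name: summary} for each listed file found in data_dir.

    Hypothetical helper: assumes all files sit directly in data_dir.
    """
    data_dir = Path(data_dir)
    summary = {}

    for name in CSV_FILES:
        path = data_dir / name
        if not path.exists():
            continue  # skip files that were not downloaded
        # Assumption: UTF-8 text whose first row holds the column names.
        with path.open(newline="", encoding="utf-8", errors="replace") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])
            rows = sum(1 for _ in reader)
        summary[name] = {"columns": header, "rows": rows}

    for name in ZIP_FILES:
        path = data_dir / name
        if not path.exists():
            continue
        # Only count archive members; nothing is extracted.
        with zipfile.ZipFile(path) as archive:
            summary[name] = {"members": len(archive.namelist())}

    return summary
```

Calling `summarize_dataset("path/to/download")` returns a dictionary keyed by file name, which makes it easy to check the actual column names before writing any analysis code against them.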

The dataset can be accessed via the following link: Link