Collaborative Data Science within Government of Canada: Development of R libraries for common tasks with Open Canada data

Articles and reports: 11-522-X202100100028
Description:

Many Government of Canada groups are developing code to process and visualize various kinds of data, often duplicating each other’s efforts, with sub-optimal efficiency and limited code quality review. This paper informally presents a working-level approach to addressing this technical problem. The idea is to collaboratively build a common repository of code and a knowledge base for use by anyone in the public sector to perform many common data science tasks and, in doing so, to help each other master both data science coding skills and industry-standard collaborative practices. The paper explains why R is used as the language of choice for collaborative data science code development. It summarizes R’s advantages and addresses its limitations, establishes a taxonomy of the discussion topics of highest interest to GC data scientists working with R, provides an overview of the collaborative platforms used, and presents the results obtained to date. Even though the code knowledge base is developed mainly in R, it is meant to be valuable also for data scientists coding in Python and other development environments.

Key Words: Collaboration; Data science; Data Engineering; R; Open Government; Open Data; Open Science

Issue Number: 2021001
Author(s): Gorodnichy, Dmitry; Little, Patrick
Main Product: Statistics Canada International Symposium Series: Proceedings
Format: PDF
Release date: October 29, 2021

Related information

Subjects and keywords

Subjects