Numerous choices are made in any data analysis project: what and how much data is collected, which database schema is used to store the data, how the data is transformed, which cutoffs are used during data cleaning to remove low-quality datapoints, which algorithms (with which parameters) are run on the data, etc. These choices can have an immense effect on the final results and conclusions, but, even when provenance is registered, they are often not made explicit and thereby introduce hidden bias. Although it is often argued that advanced models take this bias out of the equation and present an objective view of the world, O'Neil convincingly argues in her book "Weapons of Math Destruction" that "algorithms are opinions embedded in mathematics" and therefore pose the danger of cementing bias into the analysis while simultaneously hiding it. By providing context at different levels of the data science cycle, we can alleviate this bias and thereby improve the explainability of algorithms and analysis results.
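As a minimal sketch of this point, consider how a single cleaning choice can silently shift a result. The quality scores, values, and cutoffs below are invented for illustration: two equally defensible "low quality" thresholds yield different summary statistics from the same raw data.

```python
# Hypothetical data: per-record quality scores and measured values.
quality = [0.2, 0.4, 0.55, 0.6, 0.8, 0.9]
value = [10, 12, 30, 31, 33, 35]

def cleaned_mean(cutoff):
    """Mean of values whose quality score meets the cleaning cutoff."""
    kept = [v for q, v in zip(quality, value) if q >= cutoff]
    return sum(kept) / len(kept)

# Two plausible cutoffs, two different "results" from the same data.
print(cleaned_mean(0.5))  # keeps 4 records -> 32.25
print(cleaned_mean(0.7))  # keeps 2 records -> 34.0
```

Unless the cutoff (and the reasoning behind it) is recorded alongside the result, a reader of the final report cannot tell which of these numbers they are looking at, or why.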
Users often find it difficult to articulate their needs in data science, leading the analyst to perform the analyses the user asks for rather than those they actually need. Both known and unknown biases arise at the data gathering stage, where samples are not always representative of the population under investigation. At the modelling phase, black-box algorithms hide assumptions made by the designer, and hard cut-offs remove the context in which decisions are made. At the dissemination phase, it is important to put any conclusions into perspective.