Open Ethics Data Passport

Problem

Supervised learning remains one of the most widely used approaches in machine learning. It requires data annotated by subject-matter experts to train machine learning algorithms. Part of the bias and algorithmic unfairness in the resulting models is inherited from this historically labeled data. This is a socio-technological phenomenon: the people who label the data, or who decide how inputs map to outputs, carry unconscious biases of their own (we are all human). There is little transparency around who labeled the data, and the impact of the labeler's profile (their expertise, personality, and value hierarchies) on the fairness and accuracy of the resulting machine learning models remains unknown.

Purpose

Bring transparency to the systemic properties of AI models by developing an Open Ethics Data “Passport” (OEDP). The Data Passport aims to depict the origins of training datasets through a standardized way of conveying information about data annotation processes, data labelers’ profiles, and the correct scoping of the labeler’s job.
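As a rough illustration of what such a passport could carry, the sketch below groups the three kinds of information named above (annotation process, labeler profiles, and job scope) into one serializable record. Every class and field name here (`DataPassport`, `AnnotationProcess`, `LabelerProfile`, `task_scope`, and so on) is a hypothetical placeholder chosen for this example, not the OEDP schema; the project scope document is the place to look for the actual format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class LabelerProfile:
    # Hypothetical fields: the real passport may describe labelers differently.
    labeler_id: str           # pseudonymous identifier, not a real name
    expertise: str            # domain in which the labeler is qualified
    years_of_experience: int


@dataclass
class AnnotationProcess:
    # Hypothetical description of how the labels were produced.
    task_scope: str           # what the labeler was (and was not) asked to decide
    label_set: List[str]      # the allowed output labels
    labelers_per_item: int    # how many people labeled each item


@dataclass
class DataPassport:
    dataset_name: str
    process: AnnotationProcess
    labelers: List[LabelerProfile] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses to plain dicts recursively.
        return json.dumps(asdict(self), indent=2)


# Example record with made-up values.
passport = DataPassport(
    dataset_name="example-chest-xray-labels",
    process=AnnotationProcess(
        task_scope="Mark each image as normal or abnormal; no diagnosis.",
        label_set=["normal", "abnormal"],
        labelers_per_item=2,
    ),
    labelers=[
        LabelerProfile(labeler_id="L-001", expertise="radiology", years_of_experience=7),
    ],
)

print(passport.to_json())
```

Keeping the record as plain nested data means it can be published next to a dataset as a JSON file and read without any special tooling.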

The project scope document and the code are available on GitHub.

Get involved