Taxonomy
We want to make sure that everyone is on the same page and there’s no confusion around terminology. We’re building an open-source taxonomy that anyone can use and contribute to. We want to make sure that every term is adequately translated and is linked to other terms using simple markdown. To generate your term files, use this link. You can also join other volunteers in the effort to build a shared language for the AI Ethics domain.
Artificial intelligence
Artificial intelligence (AI) is a system that makes it possible for a computer to learn from experience, adjust to new inputs, and perform tasks commonly associated with human intelligence. Most AI today is properly known as “narrow AI” (or “weak AI”), as it is designed to perform a narrow task (e.g., recognizing objects, classifying images, recommending products, or conducting coaching conversations).
Disclosure
Disclosure (ethics disclosure, or self-disclosure) describes, in a standardized way, how an autonomous system was built and how it is supposed to work. Disclosures contain information about the data collection, data processing, and decision-making practices of a digital product and are voluntarily provided by the product’s vendor (an individual developer or an organization).
Disclosure Validation
A sequence of automated, software-based checks that control validity and security elements in a Disclosure. Disclosure Validation makes it possible to check whether a presented Disclosure was actually issued, and issued by a valid Disclosure Identity Provider.
Disclosure Verification
A procedure to check that the elements of a Disclosure correspond to the actual data processing and data collection practices of the product. Disclosure Verification is similar to a certification procedure: it checks whether what was disclosed is true, and elevates the level of trust in the product and the product’s vendor.
Disclosure Identity Provider
Automated Disclosure processing is enabled by requests both to the Open Ethics Disclosure database, powered by Disclosure Identity Providers (DIPs), and to the product’s Disclosure file, stored in a predefined location on the product’s website, following the OETP specification. A DIP serves as a service point to generate and retrieve disclosures.
Open Ethics Transparency Protocol
Open Ethics Transparency Protocol (OETP) is a protocol for product owners to generate, host, and verify a machine-readable disclosure for their AI solutions in a standardized and explicit way, without compromising IP or security. The scope of the Protocol covers Disclosures of “ethical postures” for systems such as Software as a Service (SaaS) Applications, Software Applications, Software Components, Application Programming Interfaces (API), Automated Decision-Making (ADM) systems, and systems using Artificial Intelligence (AI). OETP aims to bring about a more transparent, predictable, and safe environment for end-users.
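As a rough sketch of the retrieval step, the Python example below fetches a product’s Disclosure file from an assumed well-known location; the file path and field names are illustrative placeholders, not the authoritative OETP specification.

```python
# Minimal sketch of fetching a product's machine-readable Disclosure.
# The path "/openethics.json" and the field names are assumptions for
# illustration; the actual locations and keys are defined by the OETP spec.
import requests

def fetch_disclosure(product_domain: str) -> dict:
    """Retrieve the Disclosure file from its predefined location."""
    url = f"https://{product_domain}/openethics.json"  # assumed location
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()

disclosure = fetch_disclosure("example.com")
print(disclosure.get("data_collection"))  # hypothetical field
```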
Training Data
Training data is an initial set of data items (examples) used to train an AI model to produce results. Typically, training sets make up the majority of the total data available for model development. The allocation ratio between Training, Test, and Validation data is usually around 60%:20%:20%.
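A minimal sketch of such a 60%:20%:20% split using scikit-learn’s train_test_split on toy data (the ratio and random seed are illustrative choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)   # 100 toy examples
y = (X.ravel() > 50).astype(int)    # toy labels

# First carve off 60% for training, then split the remainder evenly.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=42)            # 60% training
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)  # 20% test, 20% validation
```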
Data Labelling
In supervised machine learning, a model is trained using a labeled training data set. The labeling (or tagging, annotation) procedure typically involves a human Labeler who augments each item of unlabeled data with meaningful tags (or labels) that are informative. For example, labels might indicate whether a photo contains a dog or a cat, which words were uttered in an audio recording, what type of action is being performed in a video, what the topic of a news article is, what the overall sentiment of a tweet is, whether the dot in an x-ray is a tumor, etc.
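As a toy illustration, each labeled item simply pairs a raw input with a human-assigned tag (the field names here are illustrative):

```python
# Each unlabeled input is augmented with an informative, human-assigned label.
labeled_items = [
    {"input": "photo_001.jpg", "label": "dog"},
    {"input": "photo_002.jpg", "label": "cat"},
    {"input": "tweet: 'Great service!'", "label": "positive"},
]
```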
Labeler
In Machine Learning, a Labeler is an individual typically involved in the construction of Training Data sets by assigning labels to every data item. Labelers use their human knowledge and experience to assign labels, which can introduce bias into the labels and, therefore, into the predictions of a machine learning model trained on them.
Testing Data
The testing data set is used to provide an unbiased evaluation of a final model fit on the training data set. After a model has been trained using the training data set, you test it by making predictions against the test data set.
Validation Data
A validation data set is a set of data used during model development to tune hyperparameters and select the best model for a given problem, separately from the test set. Validation sets are also known as dev sets.
Open Dataset
Open data is data that can be freely accessed, used, shared, and built on by anyone, anywhere, for any purpose. This is a summary of the full Open Definition.
Limited Access Dataset
A Limited Access Dataset is a set of data that may be disclosed to an outside party without data subject authorization if certain conditions are met. First, the purpose of the disclosure may only be research, security, public health, or health care operations. Second, the organization or person receiving the Limited Access Dataset must sign a data use agreement with the Data Controller.
Proprietary Dataset
In the context of training machine learning models, a Proprietary Dataset is the fuel that can turn commoditized workflow software into a rich, competitively “defensible” engine. A Proprietary Dataset can be created, aggregated, or both. There are different methods to build Proprietary Datasets, such as scraping publicly available but scattered data, exploring partnerships with established entities, crowdsourcing data collection and labeling, or collecting data from the use of the products/services.
Algorithm
In computer science, an algorithm is a finite sequence of well-defined instructions that can be implemented in a computer program, typically to solve a class of problems or to perform a computation. An algorithm is often paired with words specifying the activity for which the set of rules has been designed (e.g., a “sorting algorithm” or a “search algorithm”).
Open Source
Open-source software (OSS) is a type of computer software in which source code is released under a license where the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose. The main principle of Open Source is peer production, with products such as source code, blueprints, and documentation freely available to the public.
Proprietary Source
Proprietary software is any software that is copyrighted and bears limits on use, distribution, and modification that are imposed by its publisher, vendor, or developer. In general, proprietary software doesn’t provide end users or subscribers with access to its source code.
Decision Space
A decision space is the set of all possible decisions that the system can produce. Systems don’t make decisions the way humans do; however, they produce outputs that may directly influence the processes we use and the outcomes we rely upon. For instance, the decision space of a traffic light is the set of states it exhibits, such as Green, Yellow, Red, blinking Yellow, and Not working at all.
Restricted Decision Space
The Decision Space is called “Restricted” when the set of AI model outputs is known and predefined. The traffic light example above has a Restricted Decision Space.
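A minimal sketch of this Restricted Decision Space in Python, enumerating the traffic light’s predefined outputs:

```python
from enum import Enum

class TrafficLightState(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"
    BLINKING_YELLOW = "blinking_yellow"
    NOT_WORKING = "not_working"

# The decision space is known and predefined: every possible output
# of the system is a member of this finite set.
DECISION_SPACE = set(TrafficLightState)
```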
Unrestricted Decision Space
The Decision Space is called “Unrestricted” when the set of AI model outputs is not limited, e.g., when the type of output is not predefined or the output itself is generated by the model, as with free-form text or image generation.
Machine Learning
Machine learning is a subset of methods in Artificial Intelligence (AI) that studies algorithms and statistical models giving systems the ability to learn and improve from experience autonomously, without being explicitly programmed.
Supervised Learning
Supervised learning consists of mapping data to known labels, which together compose a Training Data Set. This mapping is provided by human Subject-matter Experts during the Data Labeling process. Training the model means achieving sufficient prediction accuracy on real-world examples. Such models are evaluated using Test and Validation data sets.
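A minimal supervised learning sketch with scikit-learn, fitting a classifier on labeled training data and evaluating it on a held-out test set:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)   # features and their known labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit the model on labeled examples, then test on unseen examples.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```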
Unsupervised Learning
Unsupervised learning is where the input data is unlabeled and the system tries to learn the structure from that data automatically, without any human guidance. Anomaly detection, such as flagging unusual credit card transactions to prevent fraud, is an example of unsupervised learning.
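A minimal sketch of unsupervised anomaly detection with scikit-learn’s IsolationForest: the model learns the structure of unlabeled toy transaction amounts and flags outliers (the data are illustrative):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(50, 10, size=(200, 1))   # typical transaction amounts
outliers = np.array([[500.0], [750.0]])      # unusual transactions
X = np.vstack([normal, outliers])            # no labels anywhere

detector = IsolationForest(random_state=0).fit(X)
flags = detector.predict(X)                  # -1 = anomaly, 1 = normal
print("flagged as anomalies:", X[flags == -1].ravel())
```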
Semi-supervised Learning
Semi-supervised learning is often a combination of the first two approaches. That is, the system trains on partially labeled input data, usually a lot of unlabeled data and a little bit of labeled data. Facial recognition in photo services from Facebook and Google is a real-world application of this approach.
Reinforcement Learning
Reinforcement learning occurs when a computer system receives data in a specific environment and then learns how to maximize its outcomes for particular criteria. Reinforcement learning differs from supervised learning in not needing labeled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected. Applications range from robotics and personalized recommendations to drug development.
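A minimal tabular Q-learning sketch, where a toy agent learns from rewards alone, without labeled input/output pairs (the states, rewards, and hyperparameters are toy choices):

```python
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))   # value estimates per state/action
alpha, gamma = 0.5, 0.9               # learning rate, discount factor
rng = np.random.default_rng(0)

for _ in range(500):                  # episodes
    s = 0
    while s != n_states - 1:
        a = int(rng.integers(n_actions))  # explore randomly (off-policy)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0  # reward only at the goal
        # Learn from the reward signal; no labeled examples are needed.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))  # "right" earns a higher value than "left" in every state
```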
Robotics
Robotics is the intersection of science, engineering, and technology that produces machines called robots. A robot has three consistent characteristics: it has a mechanical component that allows it to complete tasks in the environment for which it is designed; it requires a source of power, typically electrical; and, to operate, it executes a computer program. Robots can take any form, but some are made to resemble humans in appearance (these are called “androids”).
Transfer Learning
Transfer learning involves reusing a model that was trained while solving one problem and applying it to a different but related problem. For example, a deep learning model trained on millions of images of cats could be “fine-tuned” to detect melanoma in medical imaging.
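A minimal sketch of this fine-tuning pattern with PyTorch/torchvision, reusing an ImageNet-pretrained network for a new two-class task (the task and dataset here are assumed for illustration):

```python
import torch.nn as nn
from torchvision import models

# Reuse a network pre-trained on millions of ImageNet images.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained feature extractor ...
for param in model.parameters():
    param.requires_grad = False

# ... and replace the final layer with a new 2-class head (e.g.,
# melanoma vs. benign); only this head is trained on the new data.
model.fc = nn.Linear(model.fc.in_features, 2)
```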
Subject-matter Expert (SME)
The term Subject-matter Expert (or Domain Expert) is frequently used in software development and refers to expertise in domains other than software itself. Typically, SMEs have developed their expertise in their particular discipline over a long period of time and after a great deal of immersion in the topic.
Bias
In a biased prediction (estimation, decision), the expected value of the result differs from the true underlying value being estimated. Bias is a statistical property and can be viewed as a systematic error introduced into measurement, sampling, or testing by selecting or encouraging one outcome or answer over the others.
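A minimal numerical illustration: the uncorrected sample variance (dividing by n rather than n − 1) systematically underestimates the true variance, so its expected value differs from the true value:

```python
import numpy as np

rng = np.random.default_rng(0)
true_variance = 1.0   # samples are drawn from a standard normal

# Estimate the variance many times from small samples of size 5.
estimates = [np.var(rng.normal(0, 1, size=5), ddof=0) for _ in range(100_000)]

print("mean estimate:", round(float(np.mean(estimates)), 3))  # ~0.8, biased low
print("true variance:", true_variance)                        # 1.0
```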
Human-in-the-loop
Human-in-the-loop (HITL) is a design pattern in AI that leverages both human and machine intelligence to create machine learning models and to bring meaningful automation scenarios into the real world. The human-in-the-loop approach reframes an automation problem as a Human-Computer Interaction (HCI) design problem. With this approach, AI systems are designed to augment or enhance human capacity, serving as tools to be exercised through human interaction.
Ethics
A system of moral values and principles of conduct governing an individual, a group, or an AI system when it acts as an autonomous subject of decision-making.
Open Ethics Vector
Open Ethics Vector (OEV) reflects the priorities for values about how data-driven decisions are made. OEV helps make sure that value sets are aligned between AI systems and their users. Disclosing prioritized value sets for applications and their components, and making them open, can help end-users learn which apps are best for them.
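A purely illustrative sketch of what a prioritized value set could look like; the field names and weights below are hypothetical and do not represent the actual OEV format:

```python
# Hypothetical prioritized values; not the actual OEV schema.
open_ethics_vector = {
    "privacy": 0.9,       # weight given to data minimization
    "transparency": 0.8,  # weight given to explainable decisions
    "fairness": 0.7,      # weight given to equal treatment across groups
}
# Comparing a product's disclosed vector against a user's own priorities
# could indicate how closely the product's values align with the user's.
```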
Open Ethics Label
Open Ethics Label (OEL) is a set of user-facing graphical illustrations and textual descriptions of the digital product. It facilitates (ex-ante) understanding of the values and risks the product carries and translates a machine-readable Disclosure into a set of recognizable icons and marks. Additionally, icons can carry information about successful conformity assessment as a result of audit verification.
Personal Data
Typically, it’s any data that can be used to distinguish or trace a living person (data subject). Definitions of Personal Data may differ depending on legislation.
Data Subject
A data subject is an identifiable living person to whom a particular data item relates. A data subject may be given the ability to inquire about or remove their data according to a particular practice, standard, rule or regulation.
Data Controller
Data Controller is the entity that takes ownership of personal data and determines how and by whom it is handled.
Data Processor
Data Processor is the entity that handles personal data on behalf of the controller. A data processor may have subprocessors.