What is Multiple-instance learning

Multiple-instance learning: A brief overview

Introduction

Multiple-instance learning (MIL) is a form of weakly supervised machine learning used in domains such as computer vision, natural language processing, and drug discovery. In multiple-instance learning, the training data consists of bags, which are collections of instances, and the label of a bag depends on the presence or absence of certain instances in it. In traditional classification every instance has its own label; in multiple-instance learning only the bags are labeled, so the learner has to work with weaker, more ambiguous supervision.

What is multiple-instance learning?

Multiple-instance learning is a type of supervised learning in which the input data consists of bags, which are collections of instances. In contrast to traditional supervised learning, where each instance has a class label, in multiple-instance learning each bag has a class label. Under the standard assumption, a bag is labeled positive if it contains at least one positive instance, and negative otherwise. The instances themselves are not labeled. It follows that every instance in a negative bag must be negative, while a positive bag is only known to contain at least one positive instance, without any indication of which one.
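The labeling rule described above can be written in a few lines. The following is a minimal sketch, not a reference implementation; the function name bag_label and the toy bags are made up for illustration, and the instance labels are shown only to demonstrate the rule (a real MIL training set exposes just the bag labels).

```python
from typing import Sequence


def bag_label(instance_labels: Sequence[int]) -> int:
    """Return 1 if at least one instance is positive, otherwise 0."""
    return int(any(label == 1 for label in instance_labels))


# Hypothetical hidden instance labels for three toy bags.
bags = {
    "bag_a": [0, 0, 1],  # one positive instance -> positive bag
    "bag_b": [0, 0, 0],  # no positive instance  -> negative bag
    "bag_c": [1, 1],     # several positives     -> still just "positive"
}

bag_labels = {name: bag_label(labels) for name, labels in bags.items()}
print(bag_labels)
```

Note that bag_a and bag_c receive the same label even though they contain different numbers of positive instances; this loss of detail is what makes learning from bags harder than learning from labeled instances.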

Applications of multiple-instance learning

Computer vision: In object recognition, each image is a bag of instances (typically image patches or regions rather than single pixels), where the class label of the bag depends on the presence or absence of the object of interest in the image.
Natural language processing: In text classification, each document is a bag of instances (such as sentences, passages, or words), where the class label of the bag depends on the presence or absence of certain keywords or topics in the document.
Drug discovery: In drug activity prediction, each molecule is a bag of instances (the different conformations, or shapes, that the molecule can take), where the class label of the bag depends on whether at least one conformation binds to a target protein.
Sensor networks: In sensor networks, each sensor is a bag of instances (measurements), where the class label of the bag depends on the presence or absence of certain events or phenomena being sensed.
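To make the idea of a bag concrete, here is a small sketch of one way to turn an image into a bag, by cutting it into non-overlapping square patches and treating each patch as one instance. The helper image_to_bag, the patch size, and the toy 4x4 image are assumptions made for illustration; real systems often use overlapping patches, region proposals, or learned features instead.

```python
from typing import List

Image = List[List[int]]


def image_to_bag(image: Image, patch: int) -> List[List[int]]:
    """Split a 2-D grid into non-overlapping patch x patch tiles.

    Each tile is flattened into one instance (a list of pixel values).
    Rows or columns that do not fill a whole tile are ignored.
    """
    height, width = len(image), len(image[0])
    bag = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tile = [
                image[row][col]
                for row in range(top, top + patch)
                for col in range(left, left + patch)
            ]
            bag.append(tile)
    return bag


# Toy image: the "object" (value 9) occupies only the bottom-right corner.
toy_image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

bag = image_to_bag(toy_image, patch=2)
print(len(bag), "instances:", bag)
```

Only one of the four instances covers the object, yet the whole image would carry a positive label; the learner is never told which patch is responsible.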

Types of multiple-instance learning problems

There are different types of multiple-instance learning problems, depending on the availability of information about the instances and bags. The most commonly studied types of multiple-instance learning problems are:

Standard multiple-instance learning: In this case, the instances in a bag are unordered and there is no additional information provided about the bag.
Multi-label multiple-instance learning: In this case, the bags may have multiple class labels, allowing for more complex relationships between the instances and the class labels.
Instance-level structured multiple-instance learning: In this case, the instances in a bag have some predefined structure (e.g., a hierarchy, a graph), which can be used to model the relationships between the instances.
Bag-level structured multiple-instance learning: In this case, there is additional information provided about the bags, such as bag similarity (e.g., bags that are similar in some way are more likely to have the same label) or bag hierarchy (e.g., some bags may be subsets of others).
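As an illustration of the multi-label case, the sketch below assumes one simple extension of the standard rule: a bag carries a label if at least one of its instances carries that label. This is only one possible assumption, and the function name and the example labels are hypothetical.

```python
from typing import List, Set


def multilabel_bag_labels(instance_label_sets: List[Set[str]]) -> Set[str]:
    """Union of the (hidden) instance label sets, under the assumption above."""
    labels: Set[str] = set()
    for instance_labels in instance_label_sets:
        labels |= instance_labels
    return labels


# Hypothetical image regions, each with zero or more hidden labels.
regions = [{"sky"}, {"tree"}, set(), {"tree", "bird"}]

image_labels = multilabel_bag_labels(regions)
print(sorted(image_labels))
```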

Algorithms for multiple-instance learning

Several algorithms have been proposed for multiple-instance learning, ranging from simple heuristics to more complex models.

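EM-DD: This algorithm combines expectation-maximization (EM) with the Diverse Density (DD) objective. It alternates between selecting, in each bag, the instance most likely to be responsible for the bag's label under the current target concept, and re-estimating the target concept to maximize diverse density given the selected instances.
MILES: Multiple-Instance Learning via Embedded instance Selection maps each bag into a feature space defined by the bag's similarity to instances from the training bags, then trains a 1-norm SVM that selects the most informative features and classifies the bags.
MILBoost and MILSVM: These adapt boosting and support vector machines, respectively, to multiple-instance data. MILBoost combines instance-level predictions into a bag-level probability, commonly with a noisy-OR model. The SVM formulations are usually written MI-SVM (which works at the bag level) and mi-SVM (which works at the instance level).
DeepMIML: This is a deep learning model for multi-instance multi-label learning. It uses a neural network (for example, a convolutional network for images) to generate instance representations, followed by a sub-concept layer and pooling to model the relationships between instances and labels.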
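Two building blocks that recur in these algorithms can be sketched briefly. The noisy-OR rule, used in Diverse Density and MILBoost style models, turns instance-level probabilities into a bag-level probability. A MILES-style embedding represents a bag by its best similarity to each of a set of prototype instances. The code below is an illustrative sketch only: the function names, the sigma default, and the toy values are assumptions, and in MILES the prototypes would be the instances from the training bags, with a 1-norm SVM trained on the resulting features.

```python
import math
from typing import List, Sequence

Instance = Sequence[float]


def noisy_or(instance_probs: Sequence[float]) -> float:
    """P(bag is positive) = 1 - product over instances of (1 - p_i)."""
    prob_all_negative = 1.0
    for p in instance_probs:
        prob_all_negative *= 1.0 - p
    return 1.0 - prob_all_negative


def squared_distance(a: Instance, b: Instance) -> float:
    """Squared Euclidean distance between two instances."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def embed_bag(
    bag: Sequence[Instance],
    prototypes: Sequence[Instance],
    sigma: float = 1.0,
) -> List[float]:
    """One feature per prototype: the bag's closest-instance similarity.

    Similarity is exp(-||x - prototype||^2 / sigma^2), maximized over
    the instances x in the bag.
    """
    return [
        max(math.exp(-squared_distance(x, p) / sigma**2) for x in bag)
        for p in prototypes
    ]


# Instance probabilities that are individually weak still add up.
bag_prob = noisy_or([0.1, 0.2, 0.5])
print(round(bag_prob, 4))

# A bag with two instances, embedded against three hypothetical prototypes.
features = embed_bag(
    bag=[[0.0, 0.0], [3.0, 4.0]],
    prototypes=[[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]],
)
print(features)
```

The embedding has a fixed length regardless of how many instances a bag holds, which is what lets an ordinary classifier be trained on bags of different sizes.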

Conclusion

Multiple-instance learning is a type of machine learning in which labels are attached to bags of instances rather than to individual instances. This makes it possible to learn from data where instance-level labels are unavailable or too costly to collect, as in the computer vision, text, drug discovery, and sensor applications described above. Several algorithms have been proposed for multiple-instance learning, from simple heuristic methods to deep learning models. As multiple-instance learning continues to evolve, its applications are likely to expand to new domains and challenges.
