How to use sample weights with TensorFlow datasets?

Written by Aionlinecourse


This article explains how to use sample weights with TensorFlow datasets.

Sample weighting in machine learning comes in two broad forms: implicit and explicit. Implicit weighting changes how often each example is drawn, so the weighting is built into the data the model sees. Explicit weighting leaves the data unchanged and attaches a weight to every example that scales its contribution to the loss; choosing those weights requires knowing something about the data, such as how often each class occurs.
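As a rough illustration of explicit weighting, the sketch below computes one weight per sample from the class counts, so that every class contributes the same total weight. The helper name balanced_sample_weights is hypothetical; the formula n_samples / (n_classes * class_count) is the one scikit-learn uses for class_weight="balanced". The resulting array could serve as the sample_weight array in the Dataset example further down.

```python
import numpy as np

def balanced_sample_weights(labels):
    """Give each sample the weight n_samples / (n_classes * count(its class))."""
    labels = np.asarray(labels)
    # inverse maps every sample to the index of its class in `classes`
    classes, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    class_weight = len(labels) / (len(classes) * counts)
    # Look up each sample's class weight
    return class_weight[inverse]

y = np.array([0, 0, 0, 1])
w = balanced_sample_weights(y)
# w -> [0.667, 0.667, 0.667, 2.0]; each class sums to 2.0
```

The rare class gets a larger weight, so errors on it cost more during training.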

The implicit weighting scheme is also called stratified sampling or proportional sampling. Use it when you want your dataset to contain a certain proportion of samples from each class, for example so that all classes are represented equally.
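Implicit weighting can be sketched in plain NumPy: instead of computing weights, resample the indices so that each class appears a fixed number of times. This assumes the labels fit in memory; balanced_resample_indices is a hypothetical helper, and sampling with replacement (oversampling small classes rather than discarding data) is one choice among several.

```python
import numpy as np

def balanced_resample_indices(labels, per_class, seed=0):
    """Draw `per_class` indices (with replacement) from every class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    picked = [
        # Positions of class c, sampled with replacement
        rng.choice(np.flatnonzero(labels == c), size=per_class, replace=True)
        for c in np.unique(labels)
    ]
    idx = np.concatenate(picked)
    rng.shuffle(idx)  # mix the classes instead of keeping them in blocks
    return idx

y = np.array([0, 0, 0, 0, 1, 1, 2])
idx = balanced_resample_indices(y, per_class=4)
# np.bincount(y[idx]) -> [4, 4, 4]
```

The resampled arrays (for example x_train[idx] and y_train[idx]) can then go into tf.data.Dataset.from_tensor_slices without a third weight element, because the balancing is already in the data.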

Solution:

From the documentation of tf.keras model.fit():

sample_weight

[...] This argument is not supported when x is a dataset, generator, or keras.utils.Sequence instance, instead provide the sample_weights as the third element of x.

What is meant by that? One of the official TensorFlow tutorials demonstrates this for the Dataset case:

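import numpy as np
import tensorflow as tf

# x_train, y_train and get_compiled_model() are defined in the linked tutorial.
# Start every sample at weight 1.0, then double the weight of class 5.
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.0

# Create a Dataset that includes sample weights
# (3rd element in the return tuple).
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train, sample_weight))

# Shuffle and batch the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# fit() unpacks each (inputs, targets, sample_weights) batch on its own.
model = get_compiled_model()
model.fit(train_dataset, epochs=1)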
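If the weights depend only on the label, they can also be attached inside the input pipeline with Dataset.map instead of being precomputed with NumPy. The sketch below applies the same "class 5 counts double" rule to made-up toy data; the function name add_sample_weight and the small model are illustrative assumptions, not part of the tutorial.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data standing in for x_train / y_train.
x = np.random.rand(10, 4).astype("float32")
y = np.array([0, 5, 1, 5, 2, 3, 5, 4, 0, 1], dtype="int64")

def add_sample_weight(features, label):
    # Weight 2.0 for class 5 and 1.0 for everything else.
    weight = tf.where(tf.equal(label, 5), 2.0, 1.0)
    # Returning a 3-tuple makes the weight the "third element" Keras expects.
    return features, label, weight

ds = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .map(add_sample_weight)
    .batch(4)
)

# A throwaway model, only to show that fit() consumes the 3-tuples.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(6, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(ds, epochs=1, verbose=0)
```

Computing the weight in map keeps it next to the label it depends on, which is convenient when the data is streamed rather than held in NumPy arrays.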

See the link for a full-fledged example.
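The quoted docstring applies the same rule to Python generators: each yielded item should be an (inputs, targets, sample_weights) tuple. A minimal sketch, with weighted_batches as a hypothetical helper and toy arrays standing in for real data:

```python
import numpy as np

def weighted_batches(x, y, w, batch_size):
    """Endlessly yield (inputs, targets, sample_weights) tuples, batch by batch."""
    while True:  # restart from the beginning after the last batch
        for start in range(0, len(x), batch_size):
            end = start + batch_size
            yield x[start:end], y[start:end], w[start:end]

x = np.zeros((5, 3), dtype="float32")
y = np.array([0, 5, 1, 5, 2])
w = np.where(y == 5, 2.0, 1.0)

gen = weighted_batches(x, y, w, batch_size=2)
first = next(gen)
# first[2] -> array([1., 2.])
```

With real data this might be passed as model.fit(weighted_batches(x_train, y_train, sample_weight, 64), steps_per_epoch=int(np.ceil(len(x_train) / 64)), epochs=1); because the generator never stops on its own, steps_per_epoch tells Keras where an epoch ends.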


Thank you for reading the article.