K-Means Clustering Quiz Questions

1. How is the optimal number of clusters typically determined in K-means clustering?

Answer: C. By employing an elbow plot or silhouette analysis
Explanation: The optimal number of clusters is typically determined with an elbow plot, which identifies the point where adding more clusters no longer yields a significant reduction in within-cluster variance, or with silhouette analysis, which selects the number of clusters that maximizes the average silhouette score.

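A minimal sketch of this selection process using scikit-learn (the synthetic blob data and the range of candidate k values are illustrative assumptions):

```python
# Sweep candidate cluster counts, recording the inertia (for an elbow plot)
# and the mean silhouette score for each fit.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    score = silhouette_score(X, km.labels_)
    print(f"k={k}  inertia={km.inertia_:.1f}  silhouette={score:.3f}")
```
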
2. What is the primary assumption made by the K-means clustering algorithm?

Answer: A. Clusters have a spherical shape
Explanation: The primary assumption made by the K-means clustering algorithm is that clusters have a spherical shape, as the algorithm minimizes the Euclidean distance between data points and cluster centroids.

3. Which of the following is a limitation of K-means clustering?

Answer: D. All of the above
Explanation: All of the listed options are limitations of K-means clustering: sensitivity to the initial placement of cluster centroids, inability to handle missing data, and inability to handle categorical data.

4. What is the main difference between K-means and K-medoids clustering algorithms?

Answer: A. K-means uses centroids, while K-medoids uses medoids
Explanation: The main difference between the two algorithms is that K-means uses centroids (the mean of the data points in a cluster), while K-medoids uses medoids (actual data points that are most representative of a cluster).

5. In K-means clustering, what is the role of the "inertia" or "within-cluster sum of squares"?

Answer: C. It measures the compactness of clusters
Explanation: In K-means clustering, the "inertia" or "within-cluster sum of squares" measures the compactness of clusters, with lower values indicating tighter clusters.

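A short sketch of this definition, assuming scikit-learn and synthetic data: the within-cluster sum of squares is computed by hand and compared with the fitted model's `inertia_` attribute.

```python
# Inertia is the sum of squared distances from each point to its assigned
# centroid; the manual computation should agree closely with KMeans.inertia_.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

manual = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
print(manual, km.inertia_)  # the two values should match closely
```
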
6. How does K-means++ improve upon the original K-means algorithm?

Answer: B. By using an initialization technique that reduces the sensitivity to the initial placement of cluster centroids
Explanation: K-means++ improves upon the original K-means algorithm by using an initialization technique that reduces the sensitivity to the initial placement of cluster centroids, increasing the likelihood of finding a better clustering solution.

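A sketch comparing the two initialization schemes in scikit-learn (the blob data and seed are illustrative):

```python
# With a single initialization (n_init=1), k-means++ seeding typically reaches
# a lower inertia than purely random starting centroids.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=6, cluster_std=2.0, random_state=1)

for init in ("random", "k-means++"):
    km = KMeans(n_clusters=6, init=init, n_init=1, random_state=1).fit(X)
    print(f"{init:10s} inertia={km.inertia_:.1f}")
```
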
7. Which of the following is NOT an advantage of K-means clustering?

Answer: C. Guaranteed to find the global optimum
Explanation: K-means clustering is not guaranteed to find the global optimum, as it is sensitive to the initial placement of cluster centroids and can converge to a local minimum.

8. How does the K-means clustering algorithm deal with an empty cluster?

Answer: B. It selects a new centroid for the empty cluster from the remaining data points
Explanation: If a cluster becomes empty during K-means clustering, a new centroid is selected for it from the remaining data points, typically the point farthest from its current centroid.

9. Which of the following is a disadvantage of the K-means clustering algorithm?

Answer: D. All of the above
Explanation: All of the listed options are disadvantages of the K-means clustering algorithm: it assumes clusters have a spherical shape, it cannot handle categorical data, and it is sensitive to the initial placement of cluster centroids.

10. What type of data does K-means clustering work best with?

Answer: A. Continuous data
Explanation: K-means clustering works best with continuous data, as it relies on Euclidean distance to measure similarity between data points.

11. What is the time complexity of the K-means clustering algorithm?

Answer: C. O(nkI)
Explanation: The time complexity of the K-means clustering algorithm is O(nkI), where n is the number of data points, k is the number of clusters, and I is the number of iterations. (Each iteration also scales with the number of features d, so the full cost is often written as O(nkId).)

12. In K-means clustering, what is the purpose of the "elbow method"?

Answer: A. To determine the optimal number of clusters
Explanation: The "elbow method" determines the optimal number of clusters by plotting the within-cluster sum of squares against the number of clusters and identifying the "elbow" point beyond which additional clusters yield only marginal improvement.

13. Which of the following is a common application of K-means clustering?

Answer: A. Image segmentation
Explanation: Image segmentation is a common application of K-means clustering, as it involves partitioning an image into regions based on pixel intensities or colors.

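A sketch of K-means color quantization, a simple form of image segmentation, where each pixel is reassigned the color of its nearest cluster centroid (a random array stands in for a real image to keep the example self-contained):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))   # stand-in for an RGB image scaled to [0, 1]
pixels = image.reshape(-1, 3)     # one row per pixel

km = KMeans(n_clusters=8, n_init=5, random_state=0).fit(pixels)
segmented = km.cluster_centers_[km.labels_].reshape(image.shape)
print(segmented.shape, np.unique(km.labels_).size)  # (64, 64, 3) and 8 colors
```
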
14. What happens if the number of specified clusters in K-means clustering is too small?

Answer: B. The resulting clusters will be too broad and may not capture the underlying structure of the data
Explanation: If the number of specified clusters in K-means clustering is too small, the resulting clusters will be too broad and may not capture the underlying structure of the data, leading to suboptimal clustering solutions.

15. What happens if the number of specified clusters in K-means clustering is too large?

Answer: B. The resulting clusters will be too specific and may overfit the data
Explanation: If the number of specified clusters in K-means clustering is too large, the resulting clusters will be too specific and may overfit the data, leading to suboptimal clustering solutions.

16. What is the difference between K-means clustering and hierarchical clustering?

Answer: D. All of the above
Explanation: All of the listed options are differences between K-means clustering and hierarchical clustering: K-means is a partitional clustering method, while hierarchical clustering is a tree-based method; K-means is sensitive to outliers, while hierarchical clustering is robust to outliers; and K-means requires the number of clusters to be specified, while hierarchical clustering does not.

17. In K-means clustering, what does the term "convergence" refer to?

Answer: A. The point at which the centroids stop changing significantly
Explanation: In K-means clustering, "convergence" refers to the point at which the centroids stop changing significantly, indicating that the algorithm has reached a stable clustering solution.

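In scikit-learn, for example, this behavior is controlled by the `tol` and `max_iter` parameters, and the fitted model reports the iterations actually run (a minimal sketch with synthetic data):

```python
# Convergence is declared when centroid movement falls below `tol`
# (or `max_iter` is reached); n_iter_ reports the iterations performed.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=4, random_state=0)
km = KMeans(n_clusters=4, max_iter=300, tol=1e-4, n_init=10, random_state=0).fit(X)
print("converged after", km.n_iter_, "iterations")
```
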
18. Can K-means clustering handle non-convex clusters?

Answer: C. No, K-means clustering assumes convex clusters
Explanation: K-means clustering assumes convex clusters, as it relies on minimizing the Euclidean distance between data points and centroids. It may struggle to handle non-convex clusters without additional preprocessing or modification.

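A sketch of this failure mode on the classic two-moons dataset, where the spherical-cluster assumption splits the moons incorrectly:

```python
# K-means on non-convex data: the adjusted Rand index against the true moon
# labels comes out well below 1.0, reflecting the poor fit.
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=400, noise=0.05, random_state=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("ARI:", adjusted_rand_score(y, labels))
```
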
19. Which of the following is a potential solution for dealing with categorical data in K-means clustering?

Answer: D. All of the above
Explanation: All of the listed options are potential solutions for dealing with categorical data in K-means clustering: using the Gower distance, one-hot encoding categorical variables, or replacing K-means with K-medoids.

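A sketch of the one-hot workaround (the toy DataFrame and column names are invented for illustration; for heavily categorical data, K-modes or Gower distance is often a better fit):

```python
# Encode categorical columns as 0/1 dummies so Euclidean distance is defined.
import pandas as pd
from sklearn.cluster import KMeans

df = pd.DataFrame({
    "colour": ["red", "blue", "red", "green", "blue", "green"],
    "size":   ["S", "L", "M", "S", "M", "L"],
})
X = pd.get_dummies(df)  # one 0/1 column per category level
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```
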
20. In K-means clustering, how are initial centroids typically selected?

Answer: D. Both A and B
Explanation: In K-means clustering, initial centroids are typically selected either randomly from the data points or by using the K-means++ initialization method, which reduces the sensitivity to the initial placement of cluster centroids.

21. How can K-means clustering be used for dimensionality reduction?

Answer: B. By using the cluster centroids as a reduced representation of the data
Explanation: K-means clustering can be used for dimensionality reduction by using the cluster centroids as a reduced representation of the data. Each data point is represented by its nearest centroid, effectively reducing the dimensionality of the dataset.

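A sketch of two variants of this idea, assuming scikit-learn and synthetic 10-dimensional data:

```python
# `transform` maps each point to its distances from the k centroids, turning
# d features into k; alternatively, each point can be replaced outright by
# its nearest centroid (vector quantization).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, n_features=10, centers=5, random_state=0)
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

distances = km.transform(X)                  # shape (300, 5): new features
quantized = km.cluster_centers_[km.labels_]  # each point -> its centroid
print(X.shape, "->", distances.shape)
```
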
22. In K-means clustering, which of the following factors can impact the quality of the clustering solution?

Answer: D. All of the above
Explanation: All of the listed factors can impact the quality of the clustering solution in K-means clustering: the number of clusters, the distance metric used, and the initialization method.

23. How can K-means clustering be used for outlier detection?

Answer: A. By identifying data points that are far from their cluster centroids
Explanation: K-means clustering can be used for outlier detection by identifying data points that are far from their cluster centroids. These data points may be considered outliers, as they do not fit well within their assigned cluster.

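A sketch of this flagging scheme (the 98th-percentile cutoff is an arbitrary illustrative choice):

```python
# Compute each point's distance to its assigned centroid and flag the most
# distant ones as candidate outliers.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

dist_to_own_centroid = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
threshold = np.quantile(dist_to_own_centroid, 0.98)
outliers = np.where(dist_to_own_centroid > threshold)[0]
print(len(outliers), "points flagged as potential outliers")
```
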
24. Which of the following is a limitation of K-means clustering in handling imbalanced datasets?

Answer: A. K-means assumes that all clusters have similar sizes
Explanation: K-means clustering assumes that all clusters have similar sizes, which can be a limitation when handling imbalanced datasets, as the algorithm may not perform well on clusters with significantly different sizes.

25. Which of the following clustering algorithms can be used as an alternative to K-means clustering for handling categorical data?

Answer: C. K-modes
Explanation: K-modes is a clustering algorithm specifically designed to handle categorical data. It is an alternative to K-means clustering and replaces the mean-based centroid calculation with a mode-based calculation.

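A sketch assuming the third-party `kmodes` package (installable via pip); the toy array and parameter choices are illustrative:

```python
# K-modes clusters purely categorical data using mode-based centroids and a
# matching (Hamming-style) dissimilarity instead of Euclidean distance.
import numpy as np
from kmodes.kmodes import KModes

data = np.array([
    ["red", "S"], ["blue", "L"], ["red", "M"],
    ["green", "S"], ["blue", "M"], ["green", "L"],
])
km = KModes(n_clusters=2, init="Huang", n_init=5, random_state=0)
labels = km.fit_predict(data)
print(labels)
```
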
26. How can K-means clustering be extended to handle mixed-type data (both continuous and categorical)?

Answer: D. All of the above
Explanation: K-means clustering can be extended to handle mixed-type data by using the Gower distance, one-hot encoding categorical variables, and standardizing continuous variables. These preprocessing steps can help account for the differences between continuous and categorical data.

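A sketch of one such preprocessing pipeline in scikit-learn (the toy DataFrame and column names are invented; the Gower-distance route with K-medoids is an alternative not shown here):

```python
# Standardize numeric columns and one-hot encode categorical ones, then
# feed the combined feature matrix to K-means.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "income": [35_000, 82_000, 46_000, 120_000, 58_000],
    "age":    [23, 45, 31, 52, 38],
    "city":   ["NY", "SF", "NY", "LA", "SF"],
})
prep = ColumnTransformer([
    ("num", StandardScaler(), ["income", "age"]),
    ("cat", OneHotEncoder(), ["city"]),
])
pipe = make_pipeline(prep, KMeans(n_clusters=2, n_init=10, random_state=0))
labels = pipe.fit_predict(df)
print(labels)
```
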
27. In K-means clustering, what is the purpose of the silhouette score?

Answer: C. To evaluate the quality of the clustering solution
Explanation: The silhouette score is a metric used to evaluate the quality of the clustering solution in K-means clustering. It takes into account both the compactness of clusters and the separation between them, with higher scores indicating better clustering solutions.

28. In K-means clustering, which of the following techniques can be used to address the sensitivity to the initial placement of cluster centroids?

Answer: D. Both B and C
Explanation: Both techniques address the sensitivity to the initial placement of cluster centroids. K-means++ improves the initial placement of the centroids, while running the algorithm multiple times with different initializations increases the likelihood of finding a better solution by keeping the run with the lowest within-cluster sum of squares.

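In scikit-learn both techniques are built in, as this sketch with synthetic data illustrates:

```python
# `n_init` runs K-means several times from different initial centroids and
# keeps the run with the lowest inertia; it combines naturally with
# k-means++ seeding. The best-of-20 run usually matches or beats the single
# random-start run.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=8, random_state=2)
single = KMeans(n_clusters=8, init="random", n_init=1, random_state=2).fit(X)
multi = KMeans(n_clusters=8, init="k-means++", n_init=20, random_state=2).fit(X)
print(f"single run: {single.inertia_:.1f}   best of 20: {multi.inertia_:.1f}")
```
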
29. What is the primary goal of K-means clustering?

Answer: D. Partitioning data into clusters
Explanation: The primary goal of K-means clustering is to partition data into clusters based on similarity, minimizing the within-cluster variance and maximizing the between-cluster variance.

30. Which distance metric is most commonly used in K-means clustering?

Answer: A. Euclidean distance
Explanation: Euclidean distance is the most commonly used distance metric in K-means clustering to measure the similarity between data points.
