Hierarchical Clustering Quiz Questions

1. What are the two main types of hierarchical clustering?

Answer: B. Agglomerative and divisive
Explanation: The two main types of hierarchical clustering are agglomerative (bottom-up) and divisive (top-down).
2. In agglomerative hierarchical clustering, what does the algorithm begin with?

Answer: A. Each data point in a separate cluster
Explanation: In agglomerative hierarchical clustering, the algorithm begins with each data point in a separate cluster and successively merges clusters until a stopping criterion is met.
3. In divisive hierarchical clustering, what does the algorithm begin with?

Answer: B. All data points in one cluster
Explanation: In divisive hierarchical clustering, the algorithm begins with all data points in one cluster and successively splits clusters until a stopping criterion is met.
4. What is a dendrogram?

Answer: A. A diagram that represents the tree structure of hierarchical clustering
Explanation: A dendrogram is a diagram that represents the tree structure of hierarchical clustering, visualizing the relationships between clusters and data points.
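For a concrete picture, here is a minimal sketch of building and plotting a dendrogram with SciPy. The random toy dataset, its size, and the choice of average linkage are illustrative assumptions.

```python
# A minimal sketch: build an agglomerative merge tree and draw its dendrogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))       # 20 toy points with 2 features

Z = linkage(X, method="average")   # agglomerative merge history
dendrogram(Z)                      # leaves are points, heights are merge distances
plt.xlabel("data point index")
plt.ylabel("merge distance")
plt.show()
```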
5. What is the purpose of a linkage function in hierarchical clustering?

Answer: B. To determine the distance between clusters
Explanation: The purpose of a linkage function in hierarchical clustering is to determine the distance between clusters, which is used to guide the merging or splitting of clusters.
6. What is single linkage in hierarchical clustering?

Answer: A. The minimum distance between data points in two clusters
Explanation: Single linkage defines the distance between two clusters as the minimum distance over all pairs of data points, one from each cluster.
7. What is complete linkage in hierarchical clustering?

Answer: B. The maximum distance between data points in two clusters
Explanation: Complete linkage defines the distance between two clusters as the maximum distance over all pairs of data points, one from each cluster.
8. What is average linkage in hierarchical clustering?

Answer: C. The average distance between data points in two clusters
Explanation: Average linkage defines the distance between two clusters as the average distance over all pairs of data points, one from each cluster.
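The three linkage definitions from questions 6-8 can be checked directly on two tiny hand-made clusters; the coordinates below are arbitrary illustrative values.

```python
# single = min, complete = max, average = mean of all pairwise distances
# between the two clusters.
import numpy as np
from scipy.spatial.distance import cdist

cluster_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster_b = np.array([[4.0, 0.0], [6.0, 0.0]])

d = cdist(cluster_a, cluster_b)    # all pairwise distances across clusters

print("single  :", d.min())        # 3.0 (closest pair)
print("complete:", d.max())        # 6.0 (farthest pair)
print("average :", d.mean())       # 4.5 (mean over all pairs)
```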
9. How is the optimal number of clusters determined in hierarchical clustering?

Answer: C. By examining the dendrogram and selecting an appropriate cut-off point
Explanation: In hierarchical clustering, the optimal number of clusters is determined by examining the dendrogram and selecting an appropriate cut-off point, which represents the desired level of granularity in the clustering solution.
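In code, cutting the tree is typically done with SciPy's fcluster. The toy data, the cut height of 5.0, and the target of 3 clusters below are assumptions for illustration.

```python
# Cut the dendrogram either at a distance threshold or at a target cluster count.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))

Z = linkage(X, method="ward")

labels_by_height = fcluster(Z, t=5.0, criterion="distance")  # cut at height 5.0
labels_by_count  = fcluster(Z, t=3,   criterion="maxclust")  # force 3 clusters
print(labels_by_count)
```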
10. What is the main advantage of hierarchical clustering over K-means clustering?

Answer: A. It does not require specifying the number of clusters in advance
Explanation: The main advantage of hierarchical clustering over K-means clustering is that it does not require specifying the number of clusters in advance. The dendrogram allows the user to choose the optimal number of clusters based on the desired level of granularity.
11. Which of the following is a limitation of hierarchical clustering?

Answer: D. All of the above
Explanation: Hierarchical clustering has several limitations, including sensitivity to the choice of linkage function, inability to handle large datasets due to computational complexity, and inability to undo previous steps, as merging or splitting decisions are final.
12. Which of the following is NOT a distance metric used in hierarchical clustering?

Answer: D. Pearson correlation coefficient
Explanation: Although the Pearson correlation coefficient can be used to measure similarity between data points, it is not itself a distance metric. Euclidean distance, Manhattan distance, and cosine distance (1 - cosine similarity) are common choices in hierarchical clustering.
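These metrics are all available through SciPy's pdist, as in the sketch below; "cityblock" is Manhattan distance, and "cosine" is the cosine distance (1 - cosine similarity). The three sample points are arbitrary.

```python
# Pairwise distances under three common metrics.
import numpy as np
from scipy.spatial.distance import pdist

X = np.array([[1.0, 2.0], [2.0, 4.0], [5.0, 1.0]])

print(pdist(X, metric="euclidean"))   # straight-line distance
print(pdist(X, metric="cityblock"))   # Manhattan distance
print(pdist(X, metric="cosine"))      # 1 - cosine similarity
```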
13. In hierarchical clustering, which of the following techniques can be used to handle categorical data?

Answer: A. Gower distance
Explanation: Gower distance is a distance metric specifically designed for mixed-type data, including categorical variables. Standardization applies only to numeric features, and one-hot encoding by itself does not define a distance; it must still be paired with a suitable metric.
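A hand-rolled sketch of the idea behind Gower distance for one numeric and one categorical feature; the feature names and values are hypothetical, and library implementations handle many more cases. Numeric features contribute a range-normalized absolute difference, categorical features contribute 0 (match) or 1 (mismatch), and the Gower distance is the average over features.

```python
import numpy as np

ages   = np.array([25.0, 40.0, 60.0])        # hypothetical numeric feature
colors = np.array(["red", "red", "blue"])    # hypothetical categorical feature
age_range = ages.max() - ages.min()

def gower(i: int, j: int) -> float:
    d_num = abs(ages[i] - ages[j]) / age_range        # normalized to [0, 1]
    d_cat = 0.0 if colors[i] == colors[j] else 1.0    # simple matching
    return (d_num + d_cat) / 2                        # average over the 2 features

print(gower(0, 1))   # ~0.214 (same color, moderate age gap)
print(gower(0, 2))   # 1.0    (different color, maximal age gap)
```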
14. What is Ward's method in hierarchical clustering?

Answer: A. A linkage method that minimizes the total within-cluster variance
Explanation: Ward's method is a linkage method in hierarchical clustering that minimizes the total within-cluster variance, which helps to create more compact and well-separated clusters.
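A minimal sketch of Ward linkage with SciPy on two toy blobs (the data and seed are assumptions); each merge is chosen to minimize the increase in total within-cluster variance.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (10, 2)),   # toy blob around (0, 0)
               rng.normal(5, 0.5, (10, 2))])  # toy blob around (5, 5)

Z = linkage(X, method="ward")
print(Z[-1])   # final merge row: [cluster_i, cluster_j, distance, size]
```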
15. Can hierarchical clustering be used for outlier detection?

Answer: A. Yes, by identifying small clusters or isolated data points in the dendrogram
Explanation: Hierarchical clustering can be used for outlier detection by identifying small clusters or isolated data points in the dendrogram, which may represent unusual or rare observations.
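One way this can look in code: cut the tree, then flag points that land in very small clusters. The injected outlier, the single-linkage choice, the cut height, and the size threshold below are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),
               [[8.0, 8.0]]])                 # one injected outlier

Z = linkage(X, method="single")               # single linkage isolates stragglers
labels = fcluster(Z, t=2.0, criterion="distance")

sizes = np.bincount(labels)                   # cluster sizes, indexed by label
outliers = np.where(sizes[labels] <= 2)[0]    # points in clusters of size <= 2
print(outliers)                               # index 20: the injected point
```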
16. Which of the following is a disadvantage of hierarchical clustering compared to K-means clustering?

Answer: D. Hierarchical clustering is more computationally expensive
Explanation: Hierarchical clustering is more computationally expensive than K-means clustering, especially for large datasets: naive agglomerative implementations need O(n²) memory for the distance matrix and up to O(n³) time for the merging process.
17. How is the cophenetic correlation coefficient used in hierarchical clustering?

Answer: A. To measure the agreement between the original distances between data points and the distances represented in the dendrogram
Explanation: The cophenetic correlation coefficient measures the agreement between the original pairwise distances and the cophenetic distances implied by the dendrogram (the heights at which pairs of points are first merged). A value near 1 indicates that the dendrogram preserves the pairwise distances well; a low value suggests the dendrogram does not accurately represent the original data structure.
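SciPy exposes this directly via cophenet, as in the sketch below; the toy data and the choice of average linkage are assumptions.

```python
# Cophenetic correlation: agreement between original pairwise distances
# and the dendrogram's merge heights.
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 3))

Z = linkage(X, method="average")
c, coph_dists = cophenet(Z, pdist(X))   # c near 1: distances well preserved
print(round(c, 3))
```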
18. Which of the following is NOT a common stopping criterion for hierarchical clustering?

Answer: D. The total within-cluster sum of squares is minimized
Explanation: The total within-cluster sum of squares is not a common stopping criterion for hierarchical clustering. The other three options are commonly used stopping criteria for determining when to stop merging or splitting clusters.
19. What is the primary difference between bottom-up and top-down hierarchical clustering?

Answer: A. Bottom-up starts with each data point in a separate cluster, while top-down starts with all data points in a single cluster
Explanation: The primary difference between bottom-up (agglomerative) and top-down (divisive) hierarchical clustering is that bottom-up starts with each data point in a separate cluster and merges them iteratively, while top-down starts with all data points in a single cluster and splits them iteratively.
20. Which distance metric is more appropriate for high-dimensional data in hierarchical clustering?

Answer: C. Cosine similarity
Explanation: Cosine similarity is more appropriate for high-dimensional data in hierarchical clustering because it is less affected by the curse of dimensionality compared to Euclidean or Manhattan distance, as it measures the angle between data points rather than the absolute distance.
21. Can hierarchical clustering handle missing data?

Answer: B. Yes, by using distance metrics that can handle missing data
Explanation: Hierarchical clustering can handle missing data by using distance metrics that tolerate it, such as Gower distance, or by computing custom distances over only the features observed in both points (pairwise deletion).
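A sketch of one such custom measure: an average absolute difference computed over only the features observed in both points. It assumes the features have already been rescaled to a comparable range.

```python
import numpy as np

def nan_distance(x: np.ndarray, y: np.ndarray) -> float:
    mask = ~np.isnan(x) & ~np.isnan(y)      # features present in both points
    if not mask.any():
        return np.nan                       # no shared features to compare
    return float(np.abs(x[mask] - y[mask]).mean())

a = np.array([0.2, np.nan, 0.9])
b = np.array([0.4, 0.5, np.nan])
print(nan_distance(a, b))                   # 0.2, using only the first feature
```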
22. What is the primary advantage of using hierarchical clustering for time series data?

Answer: D. It can capture the temporal structure of the data
Explanation: The primary advantage of using hierarchical clustering for time series data is that, combined with a time-series-aware distance metric, it can capture the temporal structure of the data and group series with similar behavior over time.
23. How does dynamic time warping (DTW) distance differ from other distance metrics in hierarchical clustering for time series data?

Answer: A. DTW distance is invariant to time shifts and scaling
Explanation: Dynamic time warping (DTW) aligns two series by locally stretching or compressing the time axis, making it robust to time shifts and temporal scaling. This makes it particularly suitable for time series whose temporal alignment is imperfect.
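A minimal DTW implementation using the classic dynamic-programming recurrence; the two toy series are assumptions, chosen so that the second is a time-shifted copy of the first.

```python
import numpy as np

def dtw(s: np.ndarray, t: np.ndarray) -> float:
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)     # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # best of: match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])   # same shape, shifted in time
print(dtw(a, b))                               # 0.0: DTW absorbs the shift
```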
24. How can hierarchical clustering be used for feature selection?

Answer: D. By clustering features and selecting representative features from each cluster
Explanation: Hierarchical clustering can be used for feature selection by clustering features and selecting representative features from each cluster, which helps to reduce redundancy and retain the most informative features.
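A sketch of the idea: cluster the feature columns under correlation distance and keep one representative per group. The synthetic near-duplicate features and the cut threshold are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(4)
base = rng.normal(size=(100, 1))
X = np.hstack([base,
               base + 0.01 * rng.normal(size=(100, 1)),  # near-duplicate feature
               rng.normal(size=(100, 2))])               # two independent features

D = pdist(X.T, metric="correlation")        # distances between feature columns
Z = linkage(D, method="average")
groups = fcluster(Z, t=0.5, criterion="distance")

# keep the first feature found in each group as its representative
keep = [int(np.where(groups == g)[0][0]) for g in np.unique(groups)]
print(groups, keep)   # the two correlated columns share a group
```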
25. What is the primary goal of cluster validation in hierarchical clustering?

Answer: B. To assess the quality and stability of the clustering solution
Explanation: The primary goal of cluster validation in hierarchical clustering is to assess the quality and stability of the clustering solution, which can provide insights into the appropriateness of the chosen linkage function, distance metric, and the number of clusters.
26. What is the silhouette score in hierarchical clustering?

Answer: D. A measure of both the compactness and separation of clusters
Explanation: The silhouette score in hierarchical clustering is a measure of both the compactness (how close data points within a cluster are to each other) and separation (how far apart different clusters are) of clusters. It can be used to assess the quality of a clustering solution.
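A minimal sketch using scikit-learn to compare silhouette scores across candidate cluster counts; the two toy blobs are an assumption, so k = 2 should score highest.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.4, (25, 2)),
               rng.normal(4, 0.4, (25, 2))])

for k in (2, 3, 4):
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```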
27. Can hierarchical clustering be applied to text data?

Answer: A. Yes, by converting text data into numerical representations, such as term frequency-inverse document frequency (TF-IDF) vectors
Explanation: Hierarchical clustering can be applied to text data by converting text data into numerical representations, such as term frequency-inverse document frequency (TF-IDF) vectors, and using appropriate distance metrics, such as cosine similarity.
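A sketch of that pipeline on a four-document toy corpus (an assumption): TF-IDF vectors, cosine distances between them, then average linkage.

```python
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat",
        "a cat and a mat",
        "stock prices rose sharply",
        "markets and stock trading"]

tfidf = TfidfVectorizer().fit_transform(docs).toarray()   # dense TF-IDF vectors
Z = linkage(pdist(tfidf, metric="cosine"), method="average")
print(fcluster(Z, t=2, criterion="maxclust"))             # e.g. [1 1 2 2]
```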
28. What is the main difference between hierarchical clustering and partitional clustering methods?

Answer: B. Hierarchical clustering uses a tree structure to represent the relationships between clusters, while partitional clustering does not
Explanation: The main difference between hierarchical clustering and partitional clustering methods is that hierarchical clustering uses a tree structure (dendrogram) to represent the relationships between clusters, while partitional clustering methods (e.g., K-means) do not.
29. How can hierarchical clustering be used for dimensionality reduction?

Answer: A. By applying the clustering algorithm to the features instead of the data points
Explanation: Hierarchical clustering can be used for dimensionality reduction by applying the clustering algorithm to the features instead of the data points. This results in a tree structure that can be used to identify groups of similar features, allowing for the selection of representative features from each group and reducing the overall dimensionality of the dataset.
30. Which of the following methods is NOT a hierarchical clustering algorithm?

Answer: C. K-means clustering
Explanation: K-means clustering is a partitional clustering algorithm, not a hierarchical one. Agglomerative clustering, divisive clustering, and Ward's method (a linkage criterion used within agglomerative clustering) all belong to the hierarchical family.
