
Explanation:
When building a K-means clustering model, all features (variables) used in the model must be numeric. According to the Microsoft Azure AI Fundamentals (AI-900) study materials and standard machine learning theory, K-means clustering is an unsupervised learning algorithm that groups data points into clusters based on their similarity, specifically by minimizing the distance (typically the squared Euclidean distance) between data points and their assigned cluster centroids.
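For reference, the distance-based objective commonly associated with K-means can be written as below. The notation is an assumption made here for illustration and is not taken from the AI-900 material: C_k is the set of points assigned to cluster k, and mu_k is the centroid (mean) of that cluster.

```latex
J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^{2},
\qquad
\mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x \in C_k} x
```

Both the norm and the mean are only defined for numeric vectors, which is why every feature has to be numeric.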
Because the K-means algorithm depends on distance calculations, it requires numeric data types. The Euclidean distance (or similar measures) can only be computed between numerical values. Therefore, all categorical or text data must first be converted into numeric form through feature engineering techniques such as one-hot encoding, label encoding, or embedding vectors, depending on the nature of the data.
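As a rough illustration of one of the conversions mentioned above, the following is a minimal sketch of one-hot encoding in plain Python. The function name `one_hot_encode` and the sample `colors` column are hypothetical and chosen only for this example; in practice a library encoder would normally be used.

```python
def one_hot_encode(values):
    """Turn a list of category labels into 0/1 vectors, one column per category."""
    # Sort the distinct labels so the column order is stable and predictable.
    categories = sorted(set(values))
    column_of = {category: i for i, category in enumerate(categories)}

    encoded = []
    for value in values:
        row = [0.0] * len(categories)
        row[column_of[value]] = 1.0  # mark the column that matches this label
        encoded.append(row)
    return categories, encoded


# Hypothetical categorical feature that K-means could not use as-is.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

for label, row in zip(colors, encoded):
    print(label, row)
```

Each label becomes a numeric vector, so distances between rows are well defined. Two rows with the same label are at distance zero, and any two rows with different labels are equally far apart.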
Here's how K-means works in summary:
* The algorithm initializes a predefined number of centroids (K).
* Each data point is assigned to the nearest centroid based on numeric distance.
* The centroids are recalculated as the mean of the points in each cluster.
* The assignment and recalculation steps repeat until convergence, that is, until the cluster assignments stop changing.
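The steps above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the implementation used by Azure Machine Learning: the function name `kmeans`, the random choice of starting centroids, the `max_iters` and `seed` parameters, and the sample points are all hypothetical.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Minimal K-means sketch: returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Step 1: pick K distinct data points as the starting centroids (an assumed strategy).
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None

    for _ in range(max_iters):
        # Step 2: assign each point to the nearest centroid by Euclidean distance.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Step 3: recompute each centroid as the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave a centroid unchanged if its cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]

    return centroids, assignments


# Hypothetical numeric data with two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(centroids)
print(assignments)
```

Both `math.dist` and the mean calculation operate on numbers, so passing strings as feature values would raise an error in this sketch.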
If non-numeric data (e.g., text or unencoded Boolean values) were provided, the algorithm would have no valid way to compute distances or cluster means, so training would fail or yield meaningless clusters.
The other options are incorrect:
* Boolean and integer are too narrow. Boolean values must be encoded as 0 and 1 before distances can be computed, and integer is only a subset of numeric that excludes the continuous (decimal) values K-means also accepts.
* Text cannot be processed directly without conversion.
Thus, according to Azure Machine Learning and AI-900 official concepts, all features in a K-means clustering model must be numeric to ensure valid mathematical operations and clustering accuracy.
