Correct answer: B
According to the Microsoft Azure AI Fundamentals (AI-900) official study guide and the Microsoft Learn modules on machine learning concepts, proving the accuracy of a predictive model requires data partitioning: splitting the available data into a training dataset and a testing dataset. This is a foundational concept in supervised machine learning.
When you split the data, typically about 70-80% of the dataset is used to train the model, while the remaining 20-30% is held back for testing (or validation). The held-back portion lets you evaluate performance metrics such as accuracy, precision, recall, and F1-score on data the model has never seen. Splitting does not by itself prevent overfitting, but it exposes it: a model that scores well on the training data and poorly on the held-back data has memorized its inputs rather than learned a pattern. A good score on the held-back data is what demonstrates that the model generalizes to new data.
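As a minimal sketch of the split described above, the following Python example uses only the standard library. The helper names (`train_test_split`, `accuracy`), the toy data, and the 25% test fraction are assumptions made for illustration; they are not taken from the AI-900 material or from Azure Machine Learning.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, rows):
    """Fraction of (feature, label) rows where predict(feature) == label."""
    correct = sum(1 for x, y in rows if predict(x) == y)
    return correct / len(rows)

# Hypothetical toy data: the label is 1 when the measurement is 50 or more.
data = [(x, int(x >= 50)) for x in range(100)]
train, test = train_test_split(data, test_fraction=0.25)

# "Train" a one-parameter threshold model using the training rows only:
# the midpoint between the largest negative and smallest positive example.
max_neg = max(x for x, y in train if y == 0)
min_pos = min(x for x, y in train if y == 1)
threshold = (max_neg + min_pos) / 2

def predict(x):
    return int(x >= threshold)

# Evaluate on rows the model never saw during training.
test_accuracy = accuracy(predict, test)
print(f"train rows: {len(train)}, test rows: {len(test)}")
print(f"accuracy on held-out data: {test_accuracy:.2f}")
```

The important property is that `threshold` is computed from `train` alone, so `test_accuracy` is an estimate of how the model behaves on data it has not seen.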
The AI-900 Microsoft Learn content under "Describe the machine learning process" explains that after cleaning and transforming the data, the next essential step is data splitting to "evaluate model performance objectively." Keeping training and testing data separate is what lets you demonstrate the reliability and accuracy of the model's predictions. This matters most in sensitive domains such as clinical or healthcare analytics, where decisions must be transparent and validated.
* Option A (Train the model by using the clinical data) is incorrect because training alone provides no independent evaluation. Measuring accuracy on the same data used for training gives optimistically biased results.
* Option C (Train the model using automated ML) is incorrect because automated ML is a method for training and tuning, but it doesn't inherently prove accuracy.
* Option D (Validate the model by using the clinical data) is incorrect because validating on the same dataset that was used for training does not show how accurate the model is on unseen data.
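The reasoning against options A and D can be illustrated with a small sketch. It assumes a hypothetical noisy dataset and a deliberately naive model that memorizes its training rows; neither comes from the exam material. Evaluated on its own training data, such a model looks perfect, which is why that score proves nothing.

```python
import random

rng = random.Random(0)

# Hypothetical noisy data: the label follows x >= 100, but about 20% of
# the labels are flipped to imitate measurement noise.
data = []
for x in range(200):
    label = int(x >= 100)
    if rng.random() < 0.2:
        label = 1 - label
    data.append((x, label))

rng.shuffle(data)
train, test = data[:150], data[150:]

# A model that simply memorizes the training rows. For an unseen input it
# copies the label of the nearest training value.
memory = dict(train)

def predict(x):
    nearest = min(memory, key=lambda seen: abs(seen - x))
    return memory[nearest]

def accuracy(rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)

train_acc = accuracy(train)  # 1.0: every training row is looked up exactly
test_acc = accuracy(test)    # measured on rows the model never stored

print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {test_acc:.2f}")
```

The training score is 1.0 by construction, because each training row is found in `memory`, noise included. The held-out score is the only one of the two that says anything about new data, and for a memorizing model on noisy data it is typically lower.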
Therefore, per Microsoft's official AI-900 study content, the correct answer is B: Split the clinical data into two datasets.