
Explanation:

The question describes a process where an AI system generates text that describes an image, for example, "A dog playing with a ball in the park." This process is an example of image classification, a core workload in computer vision that allows a system to recognize and categorize the content of an image.
According to the Microsoft Azure AI Fundamentals (AI-900) official study guide and the Microsoft Learn module "Identify Azure services for computer vision," image classification involves analyzing the pixels of an image and assigning one or more predefined categories or labels to it. In more advanced implementations, image classification models are combined with caption generation algorithms to produce descriptive text. For example, Azure AI Vision can generate captions and tags that describe an image's content, such as "outdoor scene," "a person riding a bicycle," or "a group of people smiling."
Let's review the other options to clarify why they are incorrect:
* Facial detection: Identifies the presence and location of human faces in an image, but does not generate descriptive text.
* Object detection: Identifies and locates multiple objects within an image by drawing bounding boxes, not by describing the overall scene.
* Optical character recognition (OCR): Extracts text from images or scanned documents (for example, reading a street sign), but it doesn't create descriptive language about what's depicted.
Therefore, the correct answer is Image classification, as it aligns with the AI-900 learning objective that describes this task as recognizing and categorizing the main content of an image, often leading to caption generation in modern vision models such as those in Azure AI Vision.
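To make the contrast between the four workloads concrete, the following minimal Python sketch models the kind of output each one produces for the same photo. Every class name, field, and value here is hypothetical and chosen only for illustration; this is not the Azure AI Vision API or its response schema, and the `toy_caption` helper is a fixed string template standing in for a real caption generation model.

```python
from dataclasses import dataclass
from typing import List, Tuple

# NOTE: all types and values below are hypothetical illustrations,
# not the Azure AI Vision API or its response format.

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class ClassificationResult:
    labels: List[str]  # categories assigned to the image as a whole


@dataclass
class ObjectDetectionResult:
    objects: List[Tuple[str, Box]]  # each object has a label and a bounding box


@dataclass
class FaceDetectionResult:
    faces: List[Box]  # locations of faces only, no description


@dataclass
class OcrResult:
    lines: List[str]  # text that is physically printed in the image


def toy_caption(result: ClassificationResult) -> str:
    """Turn whole-image labels into a sentence with a fixed template.

    A stand-in only: real captioning uses a trained language-generating
    model rather than string formatting.
    """
    if not result.labels:
        return "An image with no recognized content."
    return "An image showing " + ", ".join(result.labels) + "."


# Made-up outputs for one photo of a dog playing in a park.
classification = ClassificationResult(labels=["dog", "ball", "park"])
detection = ObjectDetectionResult(
    objects=[("dog", (40, 60, 120, 90)), ("ball", (180, 130, 30, 30))]
)
faces = FaceDetectionResult(faces=[])  # no human faces in the photo
ocr = OcrResult(lines=[])              # no printed text in the photo

print(toy_caption(classification))
```

In this sketch only the whole-image labels feed a sentence about the scene. The object detection result carries coordinates, the face detection result carries face locations, and the OCR result carries only text that already appears in the image, which mirrors why those three options do not fit the scenario in the question.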