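Cloud Storage holds a variety of files that the data science team wants to use in their models. Currently, users have no way to explore, cleanse, and validate the data in Cloud Storage. You are looking for a low-code solution that the data science team can use to quickly cleanse and explore the data in Cloud Storage. What should you do?
Correct answer: C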
Dataprep is a low-code, serverless, fully managed service that lets users visually explore, cleanse, and validate data in Cloud Storage. It also provides data profiling, data quality, data transformation, and data lineage features. Dataprep is integrated with BigQuery, so users can export the prepared data to BigQuery for further analysis or modeling. This makes it a good fit for a data science team that needs to work with the data in Cloud Storage quickly, without writing code or managing infrastructure.

The other options are less suitable for this use case because they require more coding, more infrastructure management, or more data movement. Loading the data into BigQuery, either directly or through Dataflow, would incur additional cost and latency, and may not provide the same level of data exploration and validation as Dataprep. Creating an external table in BigQuery would let users query the data in place in Cloud Storage, but would not provide the same level of data cleansing and transformation as Dataprep.

References:
* Dataprep overview
* Dataprep features
* Dataprep and BigQuery integration
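
For comparison with the low-code approach described in the explanation, the sketch below shows roughly what the explore, cleanse, and validate steps look like when written by hand. It is only an illustration and does not use Dataprep or any Google Cloud API. The sample data, the column names (id, age, country), the validation rules, and the helper functions profile, cleanse, and validate are all hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for a CSV file stored in a Cloud Storage bucket.
RAW_CSV = """id,age,country
1, 34 ,US
2,,JP
3,abc,DE
4,51,  fr
"""


def profile(rows, columns):
    """Explore: count empty (missing) values per column."""
    return {
        col: sum(1 for row in rows if not (row[col] or "").strip())
        for col in columns
    }


def cleanse(rows):
    """Cleanse: trim whitespace, upper-case country codes,
    and drop rows whose age is not a non-negative integer."""
    cleaned = []
    for row in rows:
        age = row["age"].strip()
        if not age.isdigit():
            continue  # skips both missing and non-numeric ages
        cleaned.append(
            {
                "id": row["id"].strip(),
                "age": int(age),
                "country": row["country"].strip().upper(),
            }
        )
    return cleaned


def validate(rows):
    """Validate: assumed rules are age in 0..120 and a two-letter country code."""
    return all(0 <= row["age"] <= 120 and len(row["country"]) == 2 for row in rows)


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
missing = profile(rows, ["id", "age", "country"])
cleaned = cleanse(rows)
is_valid = validate(cleaned)

print(missing)
print(cleaned)
print(is_valid)
```

Each rule here has to be written, tested, and maintained as code, which is the effort a visual, low-code tool is meant to remove for the data science team.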