Automated data profiling and quality scan via Dataplex

Automated data profiling and quality scan via Dataplex

11.269 Lượt nghe
Automated data profiling and quality scan via Dataplex
Data quality is a critical concern within a complex data environment, particularly when dealing with a substantial volume of data distributed across multiple locations. To systematically identify and visualise potential issues, establish periodic scans, and notify the relevant teams at an organisational level on a significant scale, where should one begin? This is precisely where the automated data profiling and data quality scanning capabilities of Dataplex on Google Cloud can prove invaluable. Requiring no infrastructure setup and offering a straightforward method for defining and implementing rules for data profiling and quality checks, it could serve as an excellent foundation for your large-scale data quality framework. 01:16 - Data Profiling vs Data Quality Scan 02:37 - Dataplex auto profiling 08:15 - Dataplex auto data quality scan 10:47 - Profiling hinted quality rules & YAML via CLI 18:36 - Other options to create scans 21:08 - Sensitive data considerations 22:02 - Summary Slide: https://drive.google.com/file/d/13khs0dM-TTsqcoyqWanTSDJHcAHpptqV/view Repo: https://github.com/rocketechgroup/dataplex_auto_dq --- Need to modernise your data stack? I specialise in Google Cloud solutions, including migrating your analytics workloads into BigQuery, optimising performance, and tailoring solutions to fit your business needs. With deep expertise in the Google Cloud ecosystem, I’ll help you unlock the full potential of your data. Curious about my work? Check out https://www.fundamenta.co/my-work to see the impact I’ve made. Let’s chat! Book a call at https://calendly.com/richard-he-fundamenta or email [email protected]. 🚀📊