Amazon Enhances Data Discovery by Integrating Catalogs with SageMaker

Amazon is addressing the challenges enterprises face when teams create data assets outside of centralized data catalogs. This fragmentation complicates data discovery and hinders collaboration. To combat this, Amazon's Business Data Technologies (BDT) team has developed an enterprise data catalog known as Andes, designed for sharing datasets under established policies. However, the emergence of local datasets and non-tabular assets, such as dashboards and metrics, outside of Andes has complicated the discovery process.

The integration of Amazon SageMaker with Andes aims to enhance the overall data discovery experience. SageMaker's support for multimodal catalogs and its integration with enterprise identity management make it a fitting choice for extending Andes' governance model.

Challenges in Data Discovery

Previously, users had to navigate multiple catalogs depending on the asset type, which was inefficient. Teams spent significant time combing through various catalogs to find the right one for their needs, diverting attention from solving business problems.

Key Capabilities for Integration

To streamline the process, the BDT team identified four critical capabilities necessary for effective integration. Three of them are summarized here:

  • Catalog Connectors: These connectors facilitate the ingestion of data assets into SageMaker while ensuring business continuity and governance.
  • Delegated Ownership: As data systems expand, centralized governance teams can delegate permissions for catalog enrichment and metadata management.
  • Integration with Access Tools: Teams can discover and consume data through SageMaker Unified Studio and internal tools.
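The capabilities above can be pictured with a small in-memory sketch. It is illustrative only and does not use any SageMaker or Andes API: the `CatalogAsset` and `UnifiedCatalog` classes, their methods, the `local-bi` catalog name, and the sample records are all hypothetical names invented for this example. It assumes a connector simply normalizes records from a source catalog into one shared record type, that each asset has an owning team allowed to enrich its metadata, and that discovery is a single keyword search across asset types.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogAsset:
    """Hypothetical unified record for one asset, whatever its source catalog."""
    name: str
    asset_type: str      # e.g. "dataset", "dashboard", "metric", "model"
    source_catalog: str  # catalog the asset was ingested from
    owner_team: str      # team delegated to enrich this asset's metadata


@dataclass
class UnifiedCatalog:
    """Toy in-memory stand-in for a unified catalog; not a SageMaker API."""
    assets: dict = field(default_factory=dict)
    descriptions: dict = field(default_factory=dict)

    def ingest(self, source_catalog, records):
        """Connector step: normalize source records into CatalogAsset entries."""
        for rec in records:
            asset = CatalogAsset(rec["name"], rec["type"], source_catalog, rec["owner"])
            self.assets[asset.name] = asset

    def enrich(self, team, name, description):
        """Delegated ownership: only the owning team may edit the metadata."""
        if self.assets[name].owner_team != team:
            raise PermissionError(f"{team} does not own {name}")
        self.descriptions[name] = description

    def search(self, keyword):
        """Single discovery entry point across all asset types."""
        keyword = keyword.lower()
        return sorted(
            a.name for a in self.assets.values()
            if keyword in a.name.lower()
            or keyword in self.descriptions.get(a.name, "").lower()
        )


catalog = UnifiedCatalog()
# Assumed sample records; real catalogs would expose far richer metadata.
catalog.ingest("andes", [{"name": "orders_daily", "type": "dataset", "owner": "retail"}])
catalog.ingest("local-bi", [{"name": "orders_dashboard", "type": "dashboard", "owner": "analytics"}])
catalog.enrich("retail", "orders_daily", "Daily order totals by marketplace")
print(catalog.search("orders"))  # ['orders_daily', 'orders_dashboard']
```

In this sketch a dataset and a dashboard from two different source catalogs come back from one search call, which is the kind of single entry point the integration is meant to provide. The ownership check in `enrich` is one simple way to model delegation; a real deployment would rely on the platform's own permission model.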

Unified Data Catalog

The SageMaker catalog now includes a diverse range of data assets, including datasets, dashboards, metrics, and models, all while adhering to best practices for access and usage. This unified approach simplifies the discovery and sharing of data across teams.

“SageMaker provides a unified catalog that makes discovery and sharing of data assets, metrics, and dashboards across teams straightforward, with direct integration to Andes datasets.” – Gerry Moses, Sr. Principal TPM, Amazon

Benefits of Integration

By merging its existing governance tools with Amazon SageMaker, BDT is laying the groundwork for more efficient data discovery across teams. With datasets, dashboards, metrics, and models discoverable in one place, teams can find the data they need more easily and collaborate more effectively.

Next Steps

To learn more about Amazon SageMaker Unified Studio, visit the AWS console.

This editorial summary reflects AWS and other public reporting on Amazon Enhances Data Discovery by Integrating Catalogs with SageMaker.

Reviewed by WTGuru editorial team.