Vanguard, a leading investment firm, has moved its Financial Advisor Services (FAS) division from a single Amazon Redshift cluster to a multi-warehouse architecture. The move was driven by rapid growth in analytics demand: more ETL jobs, more dashboards, and more user queries, all running against the same cluster.
Understanding Vanguard's Analytics Needs
Vanguard's FAS division oversees a large base of assets and supports financial advisors across the country. The scale and complexity of these operations generate large volumes of data, which the division analyzes for business insights, regulatory compliance, and operational efficiency.
Key Use Cases Addressed
- Operational Efficiency: Enhanced sales tracking and compensation management.
- Data Science: Improved customer segmentation and marketing campaign effectiveness.
- Exploratory Analytics: Facilitated ad-hoc analysis and competitive comparisons.
Initial Architecture Challenges
Initially, Vanguard's analytics relied on a fragmented system that resulted in a 'data swamp' of unstructured data. This setup hindered consistent reporting and decision-making.
Growth in Data Requirements
Over two years, FAS experienced significant growth:
- AWS Glue ETL jobs grew from 20 to over 600.
- Data volume reached 20 TB in Amazon Redshift and 150 TB in Amazon S3.
- Tableau dashboards increased from 20 to over 500.
Transition to Multi-Warehouse Architecture
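To relieve performance bottlenecks on the single cluster, Vanguard adopted a multi-warehouse architecture built on Amazon Redshift data sharing. Data sharing lets separate warehouses query the same live data without copying or moving it. Each workload runs on its own compute, so workloads are isolated from one another and can scale independently.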
Architecture Overview
The new setup includes:
- Producer Cluster: A provisioned Amazon Redshift cluster dedicated to ETL processing.
- Consumer Workgroups: Amazon Redshift Serverless workgroups that scale automatically with demand, each serving a different analytical workload.
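The producer/consumer split depends on Redshift data sharing. The Python helper below is a minimal sketch that assumes one shared schema and placeholder namespace IDs. It assembles the SQL each side would run. The statements follow documented Redshift data sharing syntax (`CREATE DATASHARE`, `ALTER DATASHARE ... ADD SCHEMA`, `GRANT USAGE ON DATASHARE ... TO NAMESPACE`, `CREATE DATABASE ... FROM DATASHARE ... OF NAMESPACE`). The `DatashareConfig` type, the helper functions, and all share, schema, database, and namespace names are hypothetical illustrations, not Vanguard's actual configuration. The sketch only builds the statements; executing them is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatashareConfig:
    """Hypothetical settings for sharing one schema from a producer to consumers."""
    share_name: str
    schema: str
    producer_namespace: str               # namespace ID of the provisioned ETL cluster
    consumer_namespaces: tuple[str, ...]  # namespace IDs of the Serverless workgroups
    consumer_database: str                # database name created on each consumer


def producer_sql(cfg: DatashareConfig) -> list[str]:
    """Statements run on the producer cluster to publish the schema."""
    stmts = [
        f"CREATE DATASHARE {cfg.share_name};",
        f"ALTER DATASHARE {cfg.share_name} ADD SCHEMA {cfg.schema};",
        f"ALTER DATASHARE {cfg.share_name} ADD ALL TABLES IN SCHEMA {cfg.schema};",
    ]
    # One grant per consumer namespace; each consumer gets read access to live data.
    for ns in cfg.consumer_namespaces:
        stmts.append(f"GRANT USAGE ON DATASHARE {cfg.share_name} TO NAMESPACE '{ns}';")
    return stmts


def consumer_sql(cfg: DatashareConfig) -> str:
    """Statement run on each consumer workgroup to mount the share as a database."""
    return (
        f"CREATE DATABASE {cfg.consumer_database} "
        f"FROM DATASHARE {cfg.share_name} OF NAMESPACE '{cfg.producer_namespace}';"
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    cfg = DatashareConfig(
        share_name="fas_share",
        schema="analytics",
        producer_namespace="producer-namespace-id",
        consumer_namespaces=("bi-namespace-id", "ds-namespace-id"),
        consumer_database="fas_shared",
    )
    for stmt in producer_sql(cfg):
        print(stmt)
    print(consumer_sql(cfg))
```

Each consumer workgroup reads the producer's tables in place. As a result, the ETL cluster keeps its compute for loading, while each workgroup scales on its own.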
Results Achieved
The transition led to measurable improvements:
- 100% adherence to ETL SLAs, ensuring timely data availability.
- Higher analyst productivity once restrictive query timeouts were removed.
- Enhanced ability to run complex analytical workloads independently.
Looking Ahead: Data Mesh Architecture
As Vanguard continues to evolve, it is exploring a data mesh architecture to decentralize data ownership and enhance scalability. This approach aims to align data management with business functions and improve operational efficiency.
Key Components of Data Mesh
- Decentralized data ownership with dedicated stewards for each domain.
- Independent data processing pipelines to reduce cross-domain dependencies.
- Self-service analytics capabilities for faster insights.
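The following hypothetical Python sketch makes the ownership principles above concrete. It shows one kind of consistency check a data mesh might run. The check enforces three rules. First, every domain that owns data has a steward. Second, pipelines write only to their own domain's datasets. Third, cross-domain reads target only datasets a domain has chosen to publish. The `Dataset` and `Pipeline` types, the `ownership_violations` function, and the example names are illustrative assumptions, not part of Vanguard's design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Hypothetical catalog entry: which domain owns a dataset and whether it is shared."""
    name: str
    domain: str
    published: bool  # True if the owning domain exposes it to other domains


@dataclass
class Pipeline:
    """Hypothetical pipeline declaration: owning domain plus datasets read and written."""
    name: str
    domain: str
    reads: list[str] = field(default_factory=list)
    writes: list[str] = field(default_factory=list)


def ownership_violations(
    datasets: dict[str, Dataset],
    stewards: dict[str, str],
    pipelines: list[Pipeline],
) -> list[str]:
    """Return human-readable rule violations; an empty list means no rule is broken."""
    problems = []
    # Rule 1: every domain that owns data has a named steward.
    for domain in sorted({ds.domain for ds in datasets.values()}):
        if domain not in stewards:
            problems.append(f"domain {domain} has no steward")
    for p in pipelines:
        # Rule 2: a pipeline writes only to datasets owned by its own domain.
        for name in p.writes:
            owner = datasets[name].domain
            if owner != p.domain:
                problems.append(f"{p.name} writes {name}, owned by {owner}")
        # Rule 3: cross-domain reads must target published datasets.
        for name in p.reads:
            ds = datasets[name]
            if ds.domain != p.domain and not ds.published:
                problems.append(f"{p.name} reads unpublished {name} from {ds.domain}")
    return problems


if __name__ == "__main__":
    datasets = {
        "sales.orders": Dataset("sales.orders", "sales", published=True),
        "sales.staging": Dataset("sales.staging", "sales", published=False),
        "marketing.segments": Dataset("marketing.segments", "marketing", published=True),
    }
    stewards = {"sales": "sales-steward", "marketing": "marketing-steward"}
    pipelines = [
        Pipeline("build_segments", "marketing", reads=["sales.orders"], writes=["marketing.segments"]),
        Pipeline("bad_job", "marketing", reads=["sales.staging"], writes=["sales.orders"]),
    ]
    for problem in ownership_violations(datasets, stewards, pipelines):
        print(problem)
```

In the demo, `build_segments` passes because it reads a published dataset from another domain and writes only to its own. `bad_job` is flagged twice: once for writing into the sales domain and once for reading an unpublished sales dataset.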
Conclusion
Vanguard's journey shows that scaling analytics takes more than adding infrastructure; it requires a deliberate approach to data architecture. The move to a multi-warehouse setup has positioned Vanguard to keep pace with its growing data demands.