Swami Sivasubramanian Shares Insights Into Future-Proofing, Connecting and Democratizing Data
by Ken Pagano, Senior Customer Solutions Architect, Onica by Rackspace Technology
This was my third trip to AWS re:Invent, and this year I arrived with a focus on data analytics and data science, hoping to gain a deeper understanding of the problems our customers most often bring to us. The technical sessions I signed up for did just that, but I was even more impressed by the demand for them. Most, if not all, were filled to capacity. Topics varied greatly, but a few common threads ran through most of the sessions I attended: how data in ETL pipelines needs to be processed quickly before it loses value, how organizations need to solve for scalability when processing large amounts of data, and the democratization of data. These topics surfaced again during Swami Sivasubramanian’s keynote this morning.
Sivasubramanian, Vice President of Data and Machine Learning at AWS, took these themes a step further with colorful analogies drawn from neuroscience and the human brain, and with lessons learned from ancient Indian tribes who grew tree roots into bridges across valleys. What captivated me most were the parallels he drew between how data moves through an organization and how human thought works, and how obstacles within organizations keep data that is naturally stored in silos from flowing through analytical pathways.
Sivasubramanian organized the keynote around three themes, presented as a modern data strategy that any organization can follow. The first was future-proofing the data foundation to remove heavy lifting, the second was weaving connective tissue between silos, and the third was democratizing data across the organization. Each one tied back to the analogies he opened with.
Not too far into the session came the first major announcement: Amazon Athena for Apache Spark, a new Athena capability for interactive analytics on Apache Spark that lets users build Spark applications through a simplified notebook interface in the Athena console or through the Athena APIs. A few minutes later, Amazon DocumentDB Elastic Clusters was also announced, which scales to handle very large volumes of read and write requests with little to no downtime.
The first guest speaker was Rathi Murthy, Expedia Group CTO and President of Expedia Product and Technology. Murthy spoke about the power of data and innovation, and about how her organization uses customer travel behaviors and partner needs as a catalyst to transform not only its own company but the travel industry itself. She shared how her team uses AWS services, such as highly available configurations of EKS, DynamoDB and SageMaker, to make nearly 600 billion AI predictions per year, powered by over 70 petabytes of data. To illustrate the scale, she noted that a single page for one of their brands has 360,000 permutations. She also showed how this innovation enhances the customer experience in the travel-booking business by incorporating recommendations and predictions linked to flight routes, so customers can book their travel with confidence.
We also learned that Amazon SageMaker now supports geospatial ML, with built-in visualization tools and pre-trained neural nets for common use cases. This announcement was followed by a second guest speaker, Kumar Chellapilla, GM, ML/AI Services, AWS, who gave a compelling demonstration of how machine learning and readily available satellite imagery can help forecast natural disasters and manage emergency response times, giving first responders the geospatial data they need to make life-saving decisions.
I was also excited to learn that AWS Machine Learning University now offers educator training, a train-the-trainer program with scholarships meant to help educational institutions keep up with the demand for machine learning. AWS predicts that the growth of AI and machine learning services will create so much demand that it will soon outpace the supply of educators in the discipline. Some could argue that’s already happened, but this announcement underscores how important the machine learning practice has already become in our field.
Shikha Verma, Head of Product, Amazon DataZone, demonstrated how data producers and consumers (such as analysts, scientists and engineers) can be managed within a unified zone to govern and share resources for which assigning permissions or granting access is commonly difficult.
Anna Berg Asberg, Global VP, R&D, AstraZeneca, gave a compelling and heartfelt presentation about how AstraZeneca uses data and AI/machine learning to help protect the lives of patients. She described the massive scale of the company's genome database, which holds 25 petabytes of data across the AWS global network, and explained that their environment uses Step Functions, Lambda and AWS Batch to optimize compute workloads and Amazon S3 for storage. Over 110 billion statistical tests can be run in under 30 hours, producing actionable insights for scientists. Asberg also explained how patient data, tumor tissue data and medical images are pulled together to detect patterns in patients and make predictions for them, and how this area is growing explosively. Her main message was that organizations need to democratize data, and that SageMaker and Service Catalog can be used to stand up MLOps environments in minutes.
Wrapping up, Sivasubramanian also referenced the time lost to manual effort when organizations and teams try to connect data across silos. This effort often requires very complex extract, transform and load (ETL) processes: every time an organization wants to ask a new question of its data, it has to build a different machine learning model and ETL data pipeline for it. That’s why AWS is investing in a zero-ETL future, in which data integration is seamless and organizations don’t have to manually build a new data pipeline for each new query.
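For readers less familiar with the term, here is a minimal sketch of what one hand-built ETL pipeline looks like. It was not shown in the keynote and does not use any AWS service; the sample data, table name and function names are all hypothetical, and it relies only on the Python standard library. The point is simply that each new question tends to need its own extract, transform and load steps like these.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system (illustrative data only).
RAW_BOOKINGS_CSV = """booking_id,city,amount_usd
1,Seattle,120.50
2,seattle ,80.00
3,Austin,
4,Austin,200.25
"""


def extract(raw_csv):
    """Extract: read rows out of the source system's export."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows):
    """Transform: normalize city names and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount_usd"].strip()
        if not amount:
            continue  # skip incomplete records
        city = row["city"].strip().title()
        cleaned.append((int(row["booking_id"]), city, float(amount)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into the analytics store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bookings "
        "(booking_id INTEGER PRIMARY KEY, city TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO bookings VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_csv, conn):
    """Run the three stages in order."""
    load(transform(extract(raw_csv)), conn)


def revenue_by_city(conn):
    """The question this particular pipeline was built to answer."""
    cursor = conn.execute(
        "SELECT city, SUM(amount_usd) FROM bookings GROUP BY city ORDER BY city"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(RAW_BOOKINGS_CSV, connection)
    print(revenue_by_city(connection))
```

Even in this toy version, a new question (say, bookings per month) would mean changing the extract and transform steps and the target table, which is the repeated manual work a zero-ETL approach aims to remove.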
Sivasubramanian’s final thoughts for the session addressed what it takes for an organization to draw meaningful insights from its data. He stated, “it’s individuals who ultimately create these sparks, but it is the responsibility of leaders to empower them with a data-driven culture to help them get there.” Imagine what insights your organization can gain with the proper data analytics foundation in place.