Databricks AI Summit 2023 Databricks Session

The landscape of data science and AI is changing rapidly, with industry leaders like Google leading the charge. Despite the apparent gap in resources and budgets between technology giants and startups, platforms such as Databricks are evolving to bridge it. This report summarizes the key points from a recent conference session on that theme.

Importance of Data and AI

The conference strongly emphasized that the winners in every industry will be data and AI companies. It also acknowledged that many companies still struggle to handle data effectively, often because they lack the necessary resources and infrastructure.

Evolution of Data Handling

Initially, handling structured data through tools like Excel, business intelligence, and data warehousing was straightforward. As unstructured and unsemantic data became more prevalent, there was a need for a more sophisticated platform that could handle data lakes, orchestration, governance, data science, data warehousing, streaming, and business intelligence (BI).

Governance and Silos

The session highlighted the pitfalls of poor data governance, noting that it can lead to flawed engineering. Data silos were identified as drivers of high operational costs, while inconsistent policies and disparate tools can reduce trust in data and inhibit cross-team productivity.

Databricks' Lakehouse

Databricks proposes the Lakehouse, a solution it likens to the 'iPhone of data'. It is designed to unify all data uses in a single layer, providing one copy of the data with centralized governance. This unification is the core of the Lakehouse concept and the source of its advantages in data management.

Open Source and Portability

Open source was not deemed advantageous in and of itself. Rather, the 'open' nature of such solutions means portability and helps organizations avoid lock-in.

Data Explosion

The conference predicted that the amount of data in circulation will continue to grow, amounting to a massive explosion of data.

Cost-effectiveness

According to the session, as data scales up, certain operations such as ETL become more expensive on platforms like Snowflake than on Databricks, making the latter the more cost-effective option.

Real-time Streaming

Over 50% of Databricks' customers use its real-time streaming features for critical risk profiling, highlighting the importance of a platform capable of handling such operations.

AI/ML on Lakehouse

Lakehouse AI/ML aims to unify data and AI under one security and governance model, further simplifying data handling and usage.

Dolly

Databricks has introduced Dolly, which it describes as the first truly open instruction-tuned LLM licensed for commercial use.

Build vs. Buy

The conference concluded with a discussion of the 'build vs. buy' dilemma, presenting a checklist of considerations that includes the availability of engineers, time, and financial resources, as well as the need for a single cloud.

Conclusion

As the importance and volume of data continue to grow, companies must invest in scalable, unified, and cost-effective solutions to stay competitive. Innovations like Databricks' Lakehouse model provide promising avenues to address the unique challenges posed by the modern data landscape.