Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessSam Altman's Sister Amends Lawsuit Accusing OpenAI CEO of Sexual Abuse - GV WireGoogle News: OpenAI‘System failure’ paralyzes Baidu robotaxis in ChinaTechCrunch AIThe Perils of AI-Generated Legal Advice for Dealers and Finance Companies - JD SupraGoogle News: Generative AIDrones Reportedly Being Used to Help Smugglers Cross the U.S.-Mexico BorderInternational Business TimesCrack ML Interviews with Confidence: Anomaly Detection (20 Q&A)Towards AIInspectMind AI (YC W24) Is HiringHacker News TopMicrosoft CFO’s AI Spending Runs Up Against Tech Bubble FearsBloomberg TechnologyWhy Traditional Defenses Can’t Hide AI Traffic Patterns - Security BoulevardGoogle News: Machine LearningHow We Built an EdTech Platform That Scaled to 250K Daily UsersDEV CommunityClaude Code leak puts Anthropic on the other side of the copyright battleBusiness InsiderPrivate equity-backed cardiology practice adding new in-house smart lab powered by AI - cardiovascularbusiness.comGoogle News: AIBuilding Trust in Generative AI Together: Cisco’s Role in the NIST GenAI Program - Cisco BlogsGoogle News: Generative AIBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessSam Altman's Sister Amends Lawsuit Accusing OpenAI CEO of Sexual Abuse - GV WireGoogle News: OpenAI‘System failure’ paralyzes Baidu robotaxis in ChinaTechCrunch AIThe Perils of AI-Generated Legal Advice for Dealers and Finance Companies - JD SupraGoogle News: Generative AIDrones Reportedly Being Used to Help Smugglers Cross the U.S.-Mexico BorderInternational Business TimesCrack ML Interviews with Confidence: Anomaly Detection (20 Q&A)Towards AIInspectMind AI (YC W24) Is HiringHacker News TopMicrosoft CFO’s AI Spending Runs Up Against Tech Bubble FearsBloomberg TechnologyWhy Traditional Defenses Can’t Hide AI Traffic Patterns - Security BoulevardGoogle News: Machine LearningHow We Built an EdTech Platform That Scaled to 250K Daily UsersDEV CommunityClaude Code leak puts Anthropic on the other side of the copyright battleBusiness InsiderPrivate equity-backed cardiology practice adding new in-house smart lab powered by AI - cardiovascularbusiness.comGoogle News: AIBuilding Trust in Generative AI Together: Cisco’s Role in the NIST GenAI Program - Cisco BlogsGoogle News: Generative AI

How do data scientists combine Kedro and Databricks?

kedro.orgApril 14, 20231 min read0 views
Source Quiz

In recent research, we learned that Databricks is the dominant machine-learning platform used by Kedro users. We investigated aspects of our users' data science workflows to prioritise seamless Kedro usage, and we are collaborating with Databricks on our findings.

In recent research, we found that Databricks is the dominant machine-learning platform used by Kedro users.

The purpose of the research was to identify any barriers to using Kedro with Databricks; we are collaborating with the Databricks team to create a prioritized list of opportunities to facilitate integration. For example, Kedro is best used with an IDE, but IDE support on Databricks is still evolving, so we are keen to understand the pain points that Kedro users face when combining it with Databricks.

Our research took qualitative data from 16 interviews, and quantitative data from a poll (140 participants) and a survey (46 participants) across the McKinsey and open-source Kedro user bases. We analysed two user journeys.

How to ensure a Kedro pipeline is available in a Databricks workspace

The first user journey we considered is how a user ensures the latest version of their pipeline codebase is available within the Databricks workspace. The most common workflow is to use Git, but almost a third of the users in our research set said there were a lot of steps to follow. The alternative workflow, which is to use dbx sync to Databricks repos, was used by less than 10% of the users we researched, indicating that awareness of this option is low.

How to ensure the latest version of a Kedro pipeline runs on Databricks

How to run Kedro pipelines using a Databricks cluster

The second user journey is how users run Kedro pipelines using a Databricks cluster. The most popular method, used by over 80% of participants in our research, is to use a Databricks notebook, which serves as an entry point to run Kedro pipelines. We discovered that many users were unaware of the IPython extension that significantly reduces the amount of code required to run Kedro pipelines in Databricks notebooks.

We also found that some users run their Kedro pipelines by packaging them and running the resulting Python package on Databricks. However, Kedro did not support the packaging of configurations until version 18.5, which has caused problems. The final option some users select is to use Databricks Connect, but this is not recommended since it is soon to be sunsetted by Databricks.

How to run a Kedro pipeline on a Databricks cluster.

Watch the discussion about Kedro and Databricks from a recent Kedro update meeting

The output of our research

To make it easier to pair Kedro and Databricks, we are updating Kedro’s documentation to cover the latest Databricks features and tools, particularly the development and deployment workflows for Kedro on Databricks with DBx. The goal is to help Kedro users take advantage of the benefits of working locally in an IDE and still deploy to Databricks with ease.

You can expect this new documentation to be released in the next one to two weeks.

We will also be creating a Kedro Databricks plugin or starter project template to automate the recommended steps in the documentation.

Coming soon…

We have a managed Delta table dataset available in our Kedro datasets repo, which will be available for public consumption soon. We are also planning to support managed MLflow on Databricks.

We have set up a milestone on GitHub so you can check in on our progress and contribute if you want to. To suggest features to us, report bugs, or just see what we’re working on right now, visit the Kedro projects on GitHub. We welcome every contribution, large or small.

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

platformresearchfindings

Knowledge Map

Knowledge Map
TopicsEntitiesSource
How do data…platformresearchfindingskedro.org

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 177 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers