Introducing your new team lead…Kedro

kedro.org · April 5, 2023

This post explains how Kedro can guide an analytics team to follow best practices and avoid technical debt.

In a recent article, I explained that following software principles can help you create a well-ordered analytics project to share, extend and reuse in the future. In this post we’ll review how you can benefit from using Kedro as a toolbox to apply best practices to data science code.

How data science projects fail

As data scientists, we aspire to unlock valuable insights by building well-engineered prototypes that we can take forward into production. In practice, we tend to make poor engineering decisions under tight deadlines, or write code of dubious quality through a lack of expertise. The result is technical debt and prototype code that is difficult to understand, maintain, extend, and fix. Projects that once looked promising fail to move past the experimental stage into production.

“A cycle of quick and exciting research leads to high expectations of great improvement, followed by a long series of delays and disappointments where frustrating integration work fails to recreate those elusive improvements, made all the worse by the feeling of sunk costs and a need to justify the time spent.”

Joe Plattenburg, Data Scientist at Root Insurance

How to write well-engineered data science code

When you start to cut code on a prototype, you may not prioritize maintainability and consistency. Adopting a team culture and way of working to minimize technical debt can make the difference between success and failure.

Some of the most valuable techniques a data scientist can pick up are those that generations of software engineers already use, such as the following guidelines:

Use a standard and logical project structure: It is easier to understand a project, and share it with others, if you follow a standard structure.

Don’t use hardcoded values: Instead, use precisely named constants and put them all into a single configuration file so you can find and update them easily.

Refactor your code: In data science terms, it often makes sense to use a Jupyter notebook for experimentation. But once your experiment is done, it’s time to clean up the code to remove elements that make it unmaintainable, and to remove accidental complexity. Refactor the code into Python functions and packages to form a pipeline that can be routinely tested to ensure repeatable behaviour.

"Testing after each change means that when I make a mistake, I only have a small change to consider in order to spot the error, which makes it far easier to find and fix."

Martin Fowler, Author of Refactoring: Improving the Design of Existing Code

Make code reusable by making it readable: Write your pipelines as a series of small functions that do just one task, with single return paths and a limited number of arguments.
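
The guidelines above can be sketched in a few lines of plain Python. This is only an illustration, not code from a real project: the configuration keys (min_age, test_fraction), the function names, and the sample rows are all made up for the example, and the configuration is held in an inline string where a real project would keep it in a single file.

```python
from __future__ import annotations

import json

# Hypothetical configuration. In a real project this would live in one
# configuration file rather than inline; the keys are invented for this sketch.
CONFIG_TEXT = '{"min_age": 18, "test_fraction": 0.25}'


def load_config(text: str) -> dict:
    """Parse the configuration so every value is named and kept in one place."""
    return json.loads(text)


def filter_adults(rows: list[dict], min_age: int) -> list[dict]:
    """Keep only the rows at or above the configured minimum age."""
    return [row for row in rows if row["age"] >= min_age]


def split_rows(rows: list[dict], test_fraction: float) -> tuple[list[dict], list[dict]]:
    """Split rows into train and test parts; the last rows form the test part."""
    n_test = int(len(rows) * test_fraction)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


def run(rows: list[dict], config: dict) -> tuple[list[dict], list[dict]]:
    """Chain the small functions into one repeatable pipeline."""
    adults = filter_adults(rows, config["min_age"])
    return split_rows(adults, config["test_fraction"])


config = load_config(CONFIG_TEXT)
rows = [{"age": age} for age in (15, 22, 31, 17, 40, 58)]
train, test = run(rows, config)

# A quick check that can be re-run after every change.
assert [row["age"] for row in train] == [22, 31, 40]
assert [row["age"] for row in test] == [58]
```

Each function does one task, takes a small number of arguments, and has a single return path, so each can be tested on its own. Changing a threshold means editing the configuration, not hunting through the code for a magic number.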

Many data scientists say they’ve learned from their colleagues through pair programming, code reviews and in-house mentoring that enables them to build expertise suitable to their roles and requirements.

We see Kedro as the always-available team lead that steers the direction of the analytics project from the outset and encourages use of a well-organized folder structure, software design that supports regular testing, and a culture of writing readable, clean code.

What is Kedro?

Kedro is an open-source Python toolbox that makes it easier for a team to apply software engineering principles to data science code. This reduces the time spent rewriting data science experiments so that they are fit for production.

Kedro was born at QuantumBlack to solve the challenges faced regularly in data science projects and promote teamwork through standardised team workflows. It is now hosted by the LF AI & Data Foundation as an incubating project.

Kedro = Consistent project structure

Kedro is built on the learnings of Cookiecutter Data Science. It helps you to standardise how configuration, source code, tests, documentation, and notebooks are organised with an adaptable project template. If your team needs to build multiple projects that share a similar structure, you can also create your own Cookiecutter project templates with Kedro starters.

Kedro = Maintainable code

Kedro helps you refactor your business logic and data processing into Python modules and packages to form pipelines, so you can keep your notebooks clean and tidy. Kedro-Viz then visualises the pipelines to help you navigate them.
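
To give a feel for what "functions formed into a pipeline" means, here is a toy sketch in plain Python. It deliberately does not use Kedro's own API: the Step tuple, the run_steps helper, and the dataset names are hypothetical, invented for this illustration. The idea it shows is that each step is an ordinary function wired to the next through named inputs and outputs, not through notebook cell order.

```python
from __future__ import annotations

from typing import Callable, NamedTuple


class Step(NamedTuple):
    """One pipeline step: a function plus the names of its inputs and output."""

    func: Callable
    inputs: list[str]
    output: str


def clean(raw: list[float | None]) -> list[float]:
    """Drop missing values."""
    return [value for value in raw if value is not None]


def mean(values: list[float]) -> float:
    """Compute the arithmetic mean."""
    return sum(values) / len(values)


def centre(values: list[float], average: float) -> list[float]:
    """Subtract the mean from every value."""
    return [value - average for value in values]


# The pipeline is data: each step names what it reads and what it writes.
# In this toy version the steps simply run in the order listed.
STEPS = [
    Step(clean, ["raw"], "cleaned"),
    Step(mean, ["cleaned"], "average"),
    Step(centre, ["cleaned", "average"], "centred"),
]


def run_steps(steps: list[Step], catalog: dict) -> dict:
    """Run each step in order, looking up inputs and storing outputs by name."""
    data = dict(catalog)  # copy so the caller's catalog is not mutated
    for step in steps:
        args = [data[name] for name in step.inputs]
        data[step.output] = step.func(*args)
    return data


result = run_steps(STEPS, {"raw": [1.0, None, 2.0, 3.0]})
assert result["centred"] == [-1.0, 0.0, 1.0]
```

Because every function is independent of the others, each one can be unit tested in isolation, and the list of steps doubles as a readable description of the data flow.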

“People started from scratch each time, the same pitfalls were experienced independently, reproducibility was time consuming and only members of the original project team really understood each codebase…

We needed to enforce consistency and software engineering best practices across our own work. Kedro gave us the super-power to move people from project to project and it was game-changing. After working with Kedro once, you can land in another project and know how the codebase is structured, where everything is and most importantly how you can help”.

Joel Schwarzmann, Principal Product Manager, QuantumBlack Labs, blog post

Kedro = Code quality

Kedro makes it easy to avoid common code smells such as hard-coded constants and magic numbers. The configuration library enables your code to be reusable through data, model, and logging configuration. An ever-expanding data catalog supports multiple formats of data access.

Kedro also makes it easy to keep your code quality up to standard, through support for black, isort, and flake8 for code linting and formatting, pytest for testing, and Sphinx for documentation.

Kedro = Standardisation

Kedro integrates with standard data science tools, such as TensorFlow, scikit-learn, or Jupyter notebooks for experimentation, and commonly used routes to deployment such as Databricks.

When you follow established best practice, you have a better chance of success.

Software engineering principles only work if the entire team follows them. A tool like Kedro can guide you just like an experienced technical lead, making it second nature to use established best practices, and supporting a culture and set of processes based upon software engineering.

Look forward to greater collaboration and productivity with Kedro in your team!

Introduction to Kedro: Building Maintainable Data Pipelines
