Chapter 7 ML development life cycle

The Machine Learning Development Life Cycle
Introduction

  • Quote: "There are things known and there are things unknown, and in between are the doors of perception." —Aldous Huxley
  • Scenario: An engineer explains to his boss, David, the need for specialized infrastructure for operationalizing ML models, highlighting common misconceptions between management and technical teams.

Importance of Understanding the ML Development Life Cycle

  • Objective: To bridge the gap between management's perception and the reality of AI implementation.
  • Focus: Understanding the ML development life cycle to make informed decisions and avoid common pitfalls.

Six Phases of the ML Development Life Cycle

Phase 1: Problem Definition and Planning

  • Criticality: The most crucial phase; it can make or break a project.
  • Activities:
    • Defining subproblems
    • Framing AI initiatives
    • Performing feasibility analysis
    • Planning AI deployment
  • Example: Breaking down a hate speech detection initiative into solvable subproblems.

[Key point: Phase 1]
Phase 1: Framing AI Initiatives and Feasibility Analysis
Ideally, this phase should be driven by a business leader or a domain expert in collaboration with a development team, which may consist of AI experts and software or data engineers. It is the most important phase for a successful outcome in business AI.

  • What pain point does the AI solution solve?

  • What metrics are you looking to impact?

  • How will the AI solution integrate into your business systems?

  • Do you have the necessary data to solve the problem?

  • Model deployment planning should start early on, ideally in Phase 1: Problem Definition and Planning. As you’re looking to launch a new initiative, you’ll need to discuss how you envision using the model in your product or workflow with your development team. As part of this, you’ll also specify the constraints under which the model needs to operate.

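[Note: Data scientists are not the right people for this work; domain experts and business leaders are. This topic comes up again later: data scientists tend not to understand company politics, sit far from the business, do not understand the business, and dislike thinking about budgets. They usually care only about algorithm performance and work suited to publishing papers.]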

Such a breakdown makes it clear which problems are well suited for AI instead
of treating the entire initiative as an AI problem. Often, in practice, problems
that are not fully fleshed out are mislabeled as AI and thrown over the fence to
data scientists. Then leaders are shocked when the data scientist comes back
with, “This is a software engineering problem,” or they say, “This is not my
job.” The truth is that there could be some AI problem in there. But the data
scientist either doesn’t know it or doesn’t want to be responsible for all the non-
AI work. I’ve seen this happen at many companies, and it can result in missed
AI opportunities.
Some experienced data scientists will, of course, assist in breaking business
problems down to bring clarity to which subproblems need AI. Sadly, many data
scientists will not have sufficient domain expertise to help refine business
problems. With that in mind, it’s crucial that domain experts and business leaders
fully articulate business problems, break problems down into solvable pieces,
and then identify potential AI opportunities, which we’ll discuss more in Part 4
of this book.

Phase 2: Data Acquisition and Preparation

  • Central Role: Data is the core of AI initiatives.
  • Activities:
    • Exploratory period to determine data availability and quality
    • Gathering, reformatting, and preparing data for model development
  • Continuous Process: Data acquisition and preparation are ongoing throughout the ML life cycle.
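
The exploratory check on data availability and quality can be sketched roughly as follows. This is a minimal illustration, not a prescribed method: the function name `data_quality_report`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter


def data_quality_report(records, required_fields):
    """Summarize missing values and exact duplicates in a list of dict records."""
    total = len(records)

    # Count, per field, how many rows have no usable value.
    missing = Counter()
    for row in records:
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1

    # Count rows that repeat an earlier row on the required fields.
    seen = set()
    duplicates = 0
    for row in records:
        key = tuple(row.get(f) for f in required_fields)
        if key in seen:
            duplicates += 1
        seen.add(key)

    return {
        "rows": total,
        "missing_rate": {
            f: (missing[f] / total if total else 0.0) for f in required_fields
        },
        "duplicate_rows": duplicates,
    }


# Hypothetical sample data for illustration only.
sample = [
    {"text": "great product", "label": "ok"},
    {"text": "", "label": "ok"},
    {"text": "great product", "label": "ok"},
    {"text": "awful", "label": None},
]
report = data_quality_report(sample, ["text", "label"])
print(report)
```

A report like this is typically rerun whenever new data arrives, which fits the point that data preparation continues throughout the life cycle.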

Phase 3: Model Development

  • Process: Training the computer to complete a task using training data.
  • Activities:
    • Model training and evaluation
    • Iterative process of improving model accuracy
  • Special Requirements: May need specialized hardware and expertise.
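
As a toy illustration of the train-and-evaluate loop, the sketch below "trains" the simplest possible model, a single score threshold, by trying candidates and keeping the best one, then checks it on held-out data. The data, the candidate thresholds, and the function names are assumptions made for this example; real model development uses far richer models and tooling.

```python
def accuracy(threshold, examples):
    """Fraction of (score, label) pairs where 'score >= threshold' matches the label."""
    correct = sum((score >= threshold) == label for score, label in examples)
    return correct / len(examples)


def fit_threshold(train, candidates):
    """Try each candidate threshold and keep the one with the best training accuracy."""
    best, best_acc = None, -1.0
    for t in candidates:
        acc = accuracy(t, train)
        if acc > best_acc:
            best, best_acc = t, acc
    return best, best_acc


# Hypothetical labeled data: (model score, true label).
train = [(0.1, False), (0.3, False), (0.4, False),
         (0.6, True), (0.8, True), (0.9, True)]
held_out = [(0.2, False), (0.55, True), (0.45, False), (0.7, True)]

threshold, train_acc = fit_threshold(train, [0.2, 0.5, 0.8])
held_out_acc = accuracy(threshold, held_out)
print(threshold, train_acc, held_out_acc)
```

Each pass through the candidates stands in for one iteration of "change something, retrain, re-evaluate"; the held-out check is what tells you whether the improvement is real.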

Phase 4: Post-Development Testing (PDT)

  • Purpose: Testing the model on real data in a real-life situation.
  • Importance: Reveals performance issues not seen during development.
  • Example: Testing an email spam classifier with actual company emails.
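
One way to picture post-development testing for the spam example is to score the same model on the development set and on a labeled sample of real company emails, then compare. The keyword rule below is only a hypothetical stand-in for a trained classifier, and all emails are invented for illustration.

```python
def is_spam(email_text):
    """Toy stand-in for a trained spam classifier (hypothetical keyword rule)."""
    keywords = ("free", "winner", "prize")
    text = email_text.lower()
    return any(word in text for word in keywords)


def accuracy(model, labeled_emails):
    """Share of (text, is_spam_label) pairs the model classifies correctly."""
    correct = sum(model(text) == label for text, label in labeled_emails)
    return correct / len(labeled_emails)


# Data resembling what the model saw during development.
dev_set = [
    ("You are a WINNER, claim your prize", True),
    ("Free gift inside", True),
    ("Meeting moved to 3pm", False),
    ("Quarterly report attached", False),
]

# A labeled sample of real-world traffic (invented here).
company_emails = [
    ("Free lunch in the cafeteria on Friday", False),
    ("Invoice overdue, wire funds today", True),
    ("Team offsite agenda", False),
    ("Claim your prize now", True),
]

dev_acc = accuracy(is_spam, dev_set)
real_acc = accuracy(is_spam, company_emails)
gap = dev_acc - real_acc
print(f"development: {dev_acc:.2f}, real emails: {real_acc:.2f}, gap: {gap:.2f}")
```

In this sketch the model looks perfect on development data but misclassifies half of the real emails, which is the kind of issue PDT is meant to surface before deployment.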

Phase 5: Model Deployment

  • Objective: Putting the model into production.
  • Activities:
    • Real-time or batch processing of new data
    • Planning deployment early in the project
  • Example: Deploying a voice recognition model in a household fridge.
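
The difference between real-time and batch processing can be sketched as two ways of calling the same model. Here `predict` is a hypothetical placeholder for a trained model, and the batch size is an arbitrary assumption.

```python
def predict(features):
    """Hypothetical stand-in for a trained model: flags records whose features sum above 1.0."""
    return sum(features) > 1.0


def predict_realtime(features):
    """Real-time: score one incoming record as soon as it arrives."""
    return predict(features)


def predict_batch(records, batch_size=2):
    """Batch: score accumulated records in fixed-size chunks, for example on a schedule."""
    results = []
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        results.extend(predict(r) for r in chunk)
    return results


single = predict_realtime([0.7, 0.6])
nightly = predict_batch([[0.1, 0.2], [0.9, 0.9], [0.5, 0.5]])
print(single, nightly)
```

Which path applies is one of the constraints worth settling during early deployment planning, since it shapes latency requirements and the data pipelines needed.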

Phase 6: Monitoring and Feedback

  • Ongoing Process: Monitoring model performance to detect and address degradation.
  • Activities:
    • Collecting user feedback
    • Retraining models and fixing data quality issues
  • Example: Monitoring a recommender system to improve its performance.
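
Detecting degradation can be sketched as tracking accuracy over a rolling window of recent predictions and comparing it to the accuracy measured before launch. The class name `PerformanceMonitor`, the window size, and the tolerance are illustrative assumptions, not values from the source.

```python
from collections import deque


class PerformanceMonitor:
    """Track rolling accuracy and flag when it falls well below a baseline."""

    def __init__(self, baseline_accuracy, window_size=5, tolerance=0.1):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        # Keep only the most recent outcomes; older ones drop off automatically.
        self.outcomes = deque(maxlen=window_size)

    def record(self, prediction, actual):
        """Store whether a prediction matched the outcome observed later."""
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        acc = self.rolling_accuracy()
        # Only judge once the window is full, to avoid alarms on tiny samples.
        if acc is None or len(self.outcomes) < self.outcomes.maxlen:
            return False
        return acc < self.baseline - self.tolerance


monitor = PerformanceMonitor(baseline_accuracy=0.9, window_size=4, tolerance=0.1)
for prediction, actual in [(1, 1), (1, 0), (0, 1), (1, 1)]:
    monitor.record(prediction, actual)

print(monitor.rolling_accuracy(), monitor.degraded())
```

A flag from a check like this would usually prompt a look at data quality first and retraining second, matching the activities listed for this phase.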

Summary

High-Level Understanding: Key to bringing AI from an idea to a working solution.
Key Points for Business Leaders:

  • Phase 1: Most important for successful outcomes.
  • Phase 2: Central for model development and refining business problems.
  • Phase 3: Highly iterative and requires creative problem-solving.
  • Phase 4: Integral for ensuring models work as expected in practice.
  • Phase 5: Deployment planning should start early.
  • Phase 6: Ensures ongoing model performance and understanding of customer behaviors.

Responsibilities

Phase 1: Problem Definition and Planning

Responsible Parties:

Business Leaders/Domain Experts: Define business problems, success metrics, and overall vision.
AI Experts/Data Scientists: Assist in breaking down problems into subproblems and identifying AI opportunities.
Development Team: Collaborate on feasibility analysis and initial planning.

Phase 2: Data Acquisition and Preparation

Responsible Parties:

Data Engineers: Collect, clean, and prepare data for model development.
Data Scientists: Determine data requirements and address data gaps.
Business Analysts: Ensure data aligns with business objectives and needs.

Phase 3: Model Development

Responsible Parties:

Data Scientists/Machine Learning Engineers: Conduct model training, evaluation, and iterative improvements.
AI Experts: Provide specialized knowledge and expertise.
Software Engineers: Assist with integrating models into existing systems.

Phase 4: Post-Development Testing (PDT)

Responsible Parties:

Data Scientists: Conduct testing on real data and evaluate model performance.
Quality Assurance (QA) Engineers: Ensure the model meets quality standards.
Business Leaders/Domain Experts: Provide feedback on model performance in real-world scenarios.

Phase 5: Model Deployment

Responsible Parties:

Software Engineers: Handle the technical aspects of deploying the model into production.
Data Engineers: Ensure data pipelines are in place for real-time or batch processing.
AI Experts: Oversee deployment to ensure it meets business and technical requirements.
Business Leaders: Communicate business constraints and deployment needs early on.

Phase 6: Monitoring and Feedback

Responsible Parties:

Data Scientists: Monitor model performance and address any degradation.
Data Engineers: Maintain data quality and manage data pipelines.
Customer Support/UX Teams: Collect user feedback and monitor user interactions.
Business Analysts: Analyze feedback and performance metrics to inform future improvements.

Business Leaders/Domain Experts

  • Phase 1: Problem Definition and Planning
    • Define business problems and success metrics.
    • Provide overall vision and objectives.
    • Collaborate on feasibility analysis and initial planning.
  • Phase 4: Post-Development Testing (PDT)
    • Provide feedback on model performance in real-world scenarios.
  • Phase 5: Model Deployment
    • Communicate business constraints and deployment needs early on.
  • Phase 6: Monitoring and Feedback
    • Analyze feedback and performance metrics to inform future improvements.

AI Experts/Data Scientists

  • Phase 1: Problem Definition and Planning
    • Assist in breaking down problems into subproblems.
    • Identify AI opportunities.
  • Phase 3: Model Development
    • Conduct model training, evaluation, and iterative improvements.
    • Provide specialized knowledge and expertise.
  • Phase 4: Post-Development Testing (PDT)
    • Conduct testing on real data and evaluate model performance.
  • Phase 5: Model Deployment
    • Oversee deployment to ensure it meets business and technical requirements.
  • Phase 6: Monitoring and Feedback
    • Monitor model performance and address any degradation.

Data Engineers

  • Phase 2: Data Acquisition and Preparation
    • Collect, clean, and prepare data for model development.
    • Address data gaps and ensure data quality.
  • Phase 5: Model Deployment
    • Ensure data pipelines are in place for real-time or batch processing.
  • Phase 6: Monitoring and Feedback
    • Maintain data quality and manage data pipelines.

Software Engineers

  • Phase 3: Model Development
    • Assist with integrating models into existing systems.
  • Phase 5: Model Deployment
    • Handle the technical aspects of deploying the model into production.

Quality Assurance (QA) Engineers

  • Phase 4: Post-Development Testing (PDT)
    • Ensure the model meets quality standards.

Business Analysts

  • Phase 2: Data Acquisition and Preparation
    • Ensure data aligns with business objectives and needs.
  • Phase 6: Monitoring and Feedback
    • Analyze feedback and performance metrics to inform future improvements.

Customer Support/UX Teams

  • Phase 6: Monitoring and Feedback
    • Collect user feedback and monitor user interactions.
