CRISP-ML(Q). The ML Lifecycle Process

The machine learning field is currently grappling with the absence of a standardized process model for developing machine learning solutions. Without one, machine learning and data science projects tend to be carried out in an ad-hoc manner, with unclear guidelines, disorganized workflows, and irreproducible results. To address this challenge and provide guidance to machine learning practitioners, a recent initiative introduced the Cross-Industry Standard Process for Machine Learning Development with Quality Assurance (CRISP-ML(Q)) methodology. This framework aims to streamline the development lifecycle of machine learning applications and to ensure the quality of their results.

Overall, the CRISP-ML(Q) process model describes six phases:

  1. Business and Data Understanding

  2. Data Engineering (Data Preparation)

  3. Machine Learning Model Engineering

  4. Quality Assurance for Machine Learning Applications

  5. Deployment

  6. Monitoring and Maintenance

Figure: The CRISP-ML(Q) lifecycle phases, which build on the CRISP-DM approach (based on [78]).

  1. Business and Data Understanding

    Before diving into the development process, it's essential to define what we aim to achieve with the project, how we'll measure success, and whether applying machine learning is feasible. Once that's clear, we embark on the arduous journey of collecting and ensuring the quality of our data—a process fraught with challenges and time-consuming tasks.

    Confirming the feasibility before setting up the ML project is a best practice in an industrial setting.

    Example:

    • Business Understanding: The school administration wants to predict student performance to provide targeted support and improve academic outcomes.

    • Data Understanding: Student records including demographics, attendance, past exam scores, etc., are collected and analyzed to understand patterns and relationships relevant to academic performance.
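To make the data-understanding step concrete, here is a minimal profiling sketch in Python. The file name student_records.csv and the column names (student_id, attendance_rate, past_exam_score, final_grade) are hypothetical placeholders for the school's actual records, not part of CRISP-ML(Q) itself.

```python
# Minimal data-understanding sketch; file and column names are hypothetical.
import pandas as pd

students = pd.read_csv("student_records.csv")

# Basic profiling: size, types, missing values, and summary statistics.
print(students.shape)
print(students.dtypes)
print(students.isna().sum())
print(students.describe(include="all"))

# How do candidate features vary across the target classes?
print(students.groupby("final_grade")[["attendance_rate", "past_exam_score"]].mean())
```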

  2. Data Engineering (Data Preparation)

    The second phase is fairly straightforward: we prepare the data for the modeling phase. It includes data selection, data cleaning, feature engineering, data augmentation, and normalization.

    1. We start with feature selection, data selection, and dealing with unbalanced classes by over-sampling or under-sampling.

    2. Then we focus on reducing noise and dealing with missing values. For quality assurance purposes, we add data unit tests to catch faulty values (a minimal sketch follows this list).

    3. Depending on the model, we perform feature engineering and data augmentation, for example one-hot encoding and clustering.

    4. Finally, we normalize and scale the data, which mitigates the risk of biased features.
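As a sketch of the data unit tests mentioned in step 2, the snippet below runs a few plausibility checks against the hypothetical student_records.csv schema introduced earlier; the column names and value ranges are assumptions, not prescribed by CRISP-ML(Q).

```python
# Minimal data unit tests; schema, column names, and ranges are assumptions.
import pandas as pd

def test_student_data(df: pd.DataFrame) -> None:
    # Required columns are present.
    required = {"student_id", "attendance_rate", "past_exam_score", "final_grade"}
    missing = required - set(df.columns)
    assert not missing, f"missing columns: {missing}"

    # Values fall inside plausible ranges (NaN values fail the range check).
    assert df["attendance_rate"].between(0.0, 1.0).all(), "attendance_rate outside [0, 1]"
    assert df["past_exam_score"].between(0, 100).all(), "past_exam_score outside [0, 100]"

    # No duplicated student identifiers.
    assert not df["student_id"].duplicated().any(), "duplicate student_id values"

if __name__ == "__main__":
    test_student_data(pd.read_csv("student_records.csv"))
    print("all data checks passed")
```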

To ensure reproducibility, we create data modeling, transformation, and feature engineering pipelines (a minimal sketch follows the example below).

Example:

  • Data is cleaned, processed, and transformed to prepare it for analysis.

  • Missing values are handled (e.g., imputation), categorical variables are encoded, and features are scaled if necessary.
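Here is a minimal sketch of such a reproducible preprocessing pipeline, built with scikit-learn's Pipeline and ColumnTransformer; the numeric and categorical column names are hypothetical.

```python
# Minimal preprocessing-pipeline sketch; column names are hypothetical.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ["attendance_rate", "past_exam_score"]
categorical_features = ["gender", "school_program"]

numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # handle missing values
    ("scale", StandardScaler()),                    # normalize/scale features
])

categorical_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),  # one-hot encoding
])

preprocess = ColumnTransformer([
    ("numeric", numeric_pipeline, numeric_features),
    ("categorical", categorical_pipeline, categorical_features),
])
# Reusing the same `preprocess` object for training and inference keeps the
# data preparation step consistent and reproducible.
```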

  3. Machine Learning Model Engineering

    In the modeling phase, we focus on creating machine learning models tailored to our business needs. We consider various factors like performance, fairness, and scalability. This phase involves selecting, customizing, and training models based on our problem. We ensure reproducibility by documenting all aspects of model training, including algorithms, data, and settings. Iteration is key, as we may revisit goals and data to refine our models. We package everything into a pipeline for easy and repeatable model training.

    Example:

    • Various machine learning algorithms such as logistic regression, decision trees, and neural networks are trained using the prepared data.

    • Hyperparameter tuning and model selection are performed to optimize model performance.
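A minimal training-and-tuning sketch under the same assumptions as the earlier snippets: it reuses the hypothetical students DataFrame and preprocess transformer, treats final_grade as a categorical target, and compares two of the algorithm families mentioned above with a small grid search.

```python
# Minimal model-engineering sketch; data, target, and grids are assumptions.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

X = students.drop(columns=["final_grade"])
y = students["final_grade"]                       # assumed categorical labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000),
                            {"model__C": [0.1, 1.0, 10.0]}),
    "decision_tree": (DecisionTreeClassifier(random_state=42),
                      {"model__max_depth": [3, 5, 10]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Preprocessing and model live in one pipeline, so training stays repeatable.
    pipeline = Pipeline([("preprocess", preprocess), ("model", estimator)])
    search = GridSearchCV(pipeline, grid, cv=5, scoring="f1_macro")
    search.fit(X_train, y_train)
    results[name] = search
    print(name, round(search.best_score_, 3), search.best_params_)

# Keep the best pipeline for the evaluation and deployment sketches below.
best_model = max(results.values(), key=lambda s: s.best_score_).best_estimator_
```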

  4. Quality Assurance for Machine Learning Applications

    After training the model, we evaluate its performance using a separate test dataset. We also check how well it handles noisy or incorrect data to ensure robustness. It's important to create a model that can be easily understood to build trust and meet regulatory standards. Deciding when to deploy the model can be automatic based on predefined criteria or manual, involving both domain and ML experts. Just like in the modeling phase, we document all evaluation outcomes for transparency.

    Example:

    • Model performance is evaluated using metrics like accuracy, precision, recall, and F1-score.

    • Cross-validation techniques are applied to ensure the model's generalization ability.

    • Bias and fairness checks are conducted to ensure the model does not discriminate against any particular group.
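Continuing the same hypothetical example, here is a minimal evaluation sketch: it computes the usual classification metrics on the held-out test set, cross-validates the chosen pipeline, and runs a naive group-wise fairness check on an assumed gender column.

```python
# Minimal evaluation sketch; best_model, the train/test splits, and the
# sensitive attribute 'gender' carry over from the earlier hypothetical sketches.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import cross_val_score

y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")

# Cross-validation as a check on generalization ability.
cv_scores = cross_val_score(best_model, X_train, y_train, cv=5, scoring="f1_macro")
print(f"cv f1_macro: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Naive fairness check: compare accuracy across groups of a sensitive attribute.
for group, frame in X_test.groupby("gender"):
    group_accuracy = accuracy_score(y_test.loc[frame.index], best_model.predict(frame))
    print(f"accuracy for gender={group}: {group_accuracy:.3f}")
```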

  5. Deployment

    Model deployment is the phase in which the trained ML model is integrated into existing software. After evaluation, it goes live. Deployment methods vary based on the use case and real-time requirements. Tasks include hardware definition, evaluation in production, user testing, fallback plans, and gradual rollout strategies such as canary or blue/green deployment (a minimal serving sketch follows the example below).

    Example:

    • The best-performing machine learning model is deployed into the school's student management system or a dedicated application.

    • Integration with existing systems is carried out to automate predictions based on new student data.
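One common way to integrate a trained model into existing software is to serve it behind a small HTTP API. The sketch below uses FastAPI; the model file name, field names, and endpoint path are hypothetical, and the school's real integration may look quite different.

```python
# Minimal serving sketch with FastAPI; names and paths are hypothetical.
# Run with: uvicorn serve:app --reload   (assuming this file is serve.py)
import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# The pipeline is assumed to have been saved earlier with joblib.dump(...).
model = joblib.load("student_performance_model.joblib")

class StudentFeatures(BaseModel):
    attendance_rate: float
    past_exam_score: float
    gender: str
    school_program: str

@app.post("/predict")
def predict(features: StudentFeatures):
    frame = pd.DataFrame([features.dict()])
    prediction = model.predict(frame)[0]
    return {"predicted_performance": str(prediction)}
```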

  6. Monitoring and Maintenance

    Once the ML model is in production, it needs constant monitoring and maintenance. We watch for drops in performance due to "model staleness" as well as hardware and software issues. Continuous monitoring decides when the model should be retrained automatically. Maintenance involves updating data, hardware, and software to keep the model effective for the business. In essence, it is about ongoing integration, training, and deployment of the ML model.
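As a sketch of what "continuous monitoring decides when to retrain" can mean in practice, the snippet below compares recent production accuracy against a baseline and flags the model as stale when the drop exceeds a threshold; the baseline value, threshold, file name, and retraining hook are all assumptions.

```python
# Minimal monitoring sketch; thresholds, file names, and hooks are hypothetical.
import pandas as pd
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.85      # accuracy measured at deployment time (assumed)
MAX_RELATIVE_DROP = 0.10      # tolerate up to a 10% relative drop (assumed)

def check_for_staleness(model, recent: pd.DataFrame) -> bool:
    """Return True if the model looks stale and should be retrained."""
    current = accuracy_score(recent["true_label"],
                             model.predict(recent.drop(columns=["true_label"])))
    drop = (BASELINE_ACCURACY - current) / BASELINE_ACCURACY
    print(f"current accuracy={current:.3f}, relative drop={drop:.2%}")
    return drop > MAX_RELATIVE_DROP

# Example wiring (hypothetical file and retraining hook):
# if check_for_staleness(model, pd.read_csv("recent_labeled_predictions.csv")):
#     trigger_retraining_pipeline()   # e.g. a CI/CD job that reruns training
```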

Conclusion:

In this article, we explored the CRISP-ML(Q) process model for ML development, with its focus on risk assessment and quality assurance. The process spans defining business goals, collecting and cleaning data, building and validating models, and deploying them. Continuous monitoring and maintenance are crucial for success and involve tracking data, software, and hardware metrics to decide when to retrain the model or upgrade the system.