Managing and Monitoring Deep Learning/Machine Learning Experiments with MLflow

What is MLflow?

MLflow is an open-source platform for managing the complete machine learning lifecycle, including training, evaluation, and deployment of models. During the development of complex models, experiments are repeatedly conducted with changing hyperparameters, data versions, source code, and model architectures. Without proper tracking, it becomes difficult to reproduce results or improve model performance.

MLflow solves these issues with the following four key components:

  • MLflow Tracking: Stores and compares metadata such as parameters, metrics, models, and logs
  • MLflow Projects: Defines code and execution environments for reproducibility
  • MLflow Models: A universal format to save and deploy models trained with various frameworks
  • MLflow Model Registry: Supports model versioning, approval, and stage transitions like Production and Staging
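
As a quick orientation, the sketch below shows how three of these components surface in the Python API; the run ID and registered model name are hypothetical placeholders.

import mlflow

# MLflow Tracking: log parameters and metrics inside a run
with mlflow.start_run():
    mlflow.log_param("lr", 0.01)
    mlflow.log_metric("loss", 0.42)

# MLflow Models: load a previously logged model back as a generic pyfunc
# (assumes a run that logged a model under the artifact path "model")
# model = mlflow.pyfunc.load_model("runs:/<run_id>/model")

# MLflow Model Registry: register that logged model under a versioned name
# mlflow.register_model("runs:/<run_id>/model", "MyModel")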

Why is MLflow important in DL/ML?

Deep learning and machine learning projects involve numerous experiments, each with different hyperparameters, model architectures, and data preprocessing methods. Without systematic management, the following problems may arise:

  • Difficulties reproducing high-performing experiments
  • Challenges sharing or verifying experiments among collaborators
  • Uncertainty about which model version was deployed

MLflow provides the following benefits to address these challenges:

  • Reproducibility: Saves parameters, code, and environment to recreate results consistently
  • Organized experiment management: Compare and analyze hundreds of results at a glance
  • Deployment integration: Manage model versions for serving and easy rollback
  • Collaboration support: Share and review experiments with team members easily

MLflow Tracking Example

The example below shows how to apply MLflow Tracking to a simple classification model written in PyTorch.


import mlflow
import mlflow.pytorch
import torch
import torch.nn as nn
import torch.optim as optim

class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Dummy data: 100 samples with 10 features, binary labels
X = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)

    for epoch in range(5):
        optimizer.zero_grad()
        outputs = model(X)
        loss = criterion(outputs, y)
        loss.backward()
        optimizer.step()

        mlflow.log_metric("loss", loss.item(), step=epoch)

    mlflow.pytorch.log_model(model, "model")
    

The code above logs the learning rate as a parameter, records the loss at each epoch, and saves the trained model as an artifact in MLflow. The MLflow UI then lets you visualize and compare runs across experiments.
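
Runs can also be compared programmatically. The sketch below, assuming the default local tracking store and the experiment used above, pulls the logged runs into a pandas DataFrame and sorts them by loss.

import mlflow

# Fetch the runs of the active experiment as a pandas DataFrame
runs = mlflow.search_runs()

# Metric and parameter columns follow the "metrics.<name>" / "params.<name>" pattern
print(runs[["run_id", "params.learning_rate", "metrics.loss"]].sort_values("metrics.loss"))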

Using with PyTorch Lightning

MLflow integrates well with PyTorch Lightning. Below is an example using LightningModule and MLFlowLogger to track experiments.


import torch
import torch.nn as nn
import mlflow
import mlflow.pytorch
import pytorch_lightning as pl
from pytorch_lightning.loggers import MLFlowLogger
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import accuracy_score

# Define Lightning model
class LitModel(pl.LightningModule):
    def __init__(self, input_dim=10, output_dim=2, lr=0.01, batch_size=16, max_epochs=5):
        super().__init__()
        self.save_hyperparameters()  # Log hyperparameters
        self.model = nn.Linear(input_dim, output_dim)
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self.forward(x)
        loss = self.criterion(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy_score(y.cpu(), preds.cpu())
        self.log("train_loss", loss, on_epoch=True)
        self.log("train_acc", acc, on_epoch=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

# Create dummy data
X = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))
dataloader = DataLoader(TensorDataset(X, y), batch_size=16)

# Start MLflow experiment
with mlflow.start_run() as run:
    # Hyperparams to log
    hparams = {
        "input_dim": 10,
        "output_dim": 2,
        "lr": 0.01,
        "batch_size": 16,
        "max_epochs": 5
    }
    mlflow.log_params(hparams)

    # Set logger with run ID to bind logs correctly
    mlflow_logger = MLFlowLogger(experiment_name="lightning_exp", run_id=run.info.run_id)

    # Train model
    model = LitModel(**hparams)
    trainer = pl.Trainer(max_epochs=hparams["max_epochs"], logger=mlflow_logger)
    trainer.fit(model, dataloader)

    # Evaluate on training set to log final accuracy
    model.eval()
    with torch.no_grad():
        all_preds = []
        all_labels = []
        for xb, yb in dataloader:
            preds = torch.argmax(model(xb), dim=1)
            all_preds.append(preds)
            all_labels.append(yb)
        all_preds = torch.cat(all_preds)
        all_labels = torch.cat(all_labels)
        final_acc = accuracy_score(all_labels.numpy(), all_preds.numpy())
        mlflow.log_metric("final_train_accuracy", final_acc)

    # Input example and model signature
    input_example = torch.randn(1, 10)
    signature = mlflow.models.infer_signature(input_example.numpy(), model(input_example).detach().numpy())

    # Save model with metadata
    mlflow.pytorch.log_model(
        pytorch_model=model.model,  # Only the inner torch model
        artifact_path="model",
        input_example=input_example.numpy(),
        signature=signature
    )  
    

When PyTorch Lightning is the training backend, calling mlflow.pytorch.autolog() logs hyperparameters, metrics, and the trained model to MLflow automatically, with no explicit log_param or log_metric calls. With autologging enabled, the code above can be simplified as follows.


import torch
import torch.nn as nn
import mlflow
import mlflow.pytorch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import accuracy_score

mlflow.pytorch.autolog()
# Create a new MLflow Experiment
mlflow.set_experiment("lion_cheetah2")


# Define Lightning model
class LitModel(pl.LightningModule):
    def __init__(self, input_dim=10, output_dim=2, lr=0.01, batch_size=16, max_epochs=5):
        super().__init__()
        self.save_hyperparameters()  # Log hyperparameters
        self.model = nn.Linear(input_dim, output_dim)
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self.forward(x)
        loss = self.criterion(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy_score(y.cpu(), preds.cpu())
        self.log("train_loss", loss, on_epoch=True)
        self.log("train_acc", acc, on_epoch=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

# Create dummy data
X = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))
dataloader = DataLoader(TensorDataset(X, y), batch_size=16)

# Start MLflow experiment
with mlflow.start_run() as run:
    # Hyperparams to log
    hparams = {
        "input_dim": 10,
        "output_dim": 2,
        "lr": 0.01,
        "batch_size": 16,
        "max_epochs": 5
    }

    # Train model
    model = LitModel(**hparams)
    trainer = pl.Trainer(max_epochs=hparams["max_epochs"])
    trainer.fit(model, dataloader)

    # Evaluate on training set to log final accuracy
    model.eval()
    with torch.no_grad():
        all_preds = []
        all_labels = []
        for xb, yb in dataloader:
            preds = torch.argmax(model(xb), dim=1)
            all_preds.append(preds)
            all_labels.append(yb)
        all_preds = torch.cat(all_preds)
        all_labels = torch.cat(all_labels)
        final_acc = accuracy_score(all_labels.numpy(), all_preds.numpy())
        # Autolog does not capture this custom post-training metric, so log it explicitly
        mlflow.log_metric("final_train_accuracy", final_acc)

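With autologging enabled, the hyperparameters captured by save_hyperparameters() and the metrics reported through self.log() reach MLflow without any manual calls; only the custom post-training accuracy above is logged explicitly.
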
Monitoring Experiments with MLflow UI

MLflow provides a web-based UI to visually explore and compare experiment logs. Launch it with the following command:

mlflow ui --port 5000

Visit http://localhost:5000 in your browser to use features such as:

  • Inspect parameters, metrics, and model artifacts
  • Compare experiments using graphs
  • Download and reuse models
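
If your runs are logged to a tracking server rather than the default local ./mlruns store, point the client at it before logging; the host and port below are placeholders matching the command above.

import mlflow

# Direct all subsequent logging to the tracking server
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("my_experiment")

with mlflow.start_run():
    mlflow.log_metric("loss", 0.1)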

[Figure: MLflow result view in the UI]

Using MLflow Model Registry

Systematic model versioning is essential for deploying or rolling back models in production. MLflow Model Registry supports this process. Here's an example of registering and transitioning a model version:


from mlflow.tracking import MlflowClient

client = MlflowClient()

# ID of the run that logged the model (placeholder)
run_id = "<your_run_id>"
model_uri = f"runs:/{run_id}/model"

# Create the registered model once; skip this if "MyModel" already exists
client.create_registered_model("MyModel")

# Register the logged model as a new version under that name
model_version = client.create_model_version(
    name="MyModel",
    source=model_uri,
    run_id=run_id
)

# Promote the new version to the Production stage
client.transition_model_version_stage(
    name="MyModel",
    version=model_version.version,
    stage="Production"
)
  

This enables easy tracking and movement of models across stages such as Staging, Production, and Archived.
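
Once a version is in a stage, downstream code can load it by stage name instead of by run ID. A minimal sketch, assuming the registered model above:

import mlflow.pyfunc

# Load whichever version currently holds the Production stage
model = mlflow.pyfunc.load_model("models:/MyModel/Production")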

Conclusion

MLflow is a powerful tool for managing deep learning and machine learning experiments in a transparent and organized way. It integrates training, comparison, storage, and deployment into one workflow, enhancing reproducibility and collaboration. Widely adopted as a key MLOps component, MLflow is especially useful in automating the pipeline from model development to serving.
