🤖 Machine Learning

Quick references for Machine Learning

📚 Fundamentals
Learning Types

Supervised Learning

Learn from labeled data to predict outcomes

Use when: You have labeled input-output pairs

Examples:

  • Classification (spam detection)
  • Regression (price prediction)

Algorithms:

  • Linear Regression
  • Decision Trees
  • Neural Networks

Unsupervised Learning

Discover patterns in unlabeled data

Use when: Exploring the structure of unlabeled data

Examples:

  • Customer segmentation
  • Anomaly detection

Algorithms:

  • K-means Clustering
  • DBSCAN
  • Autoencoders

Reinforcement Learning

Learn through trial-and-error with rewards

Use when: Sequential decision-making is needed

Examples:

  • Game AI (AlphaGo)
  • Robotic control

Algorithms:

  • Q-Learning
  • Deep Q-Networks (DQN)
  • Multi-Agent RL
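
A minimal tabular Q-Learning sketch; the toy 5-state chain environment below is illustrative, not from any particular library:

import numpy as np

# Toy 1-D chain: 5 states, actions 0=left / 1=right, reward 1 for reaching the last state
n_states, n_actions = 5, 2

def step(state, action):
    next_state = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.2  # learning rate, discount factor, exploration rate

for episode in range(500):
    state = 0
    for t in range(100):                      # cap episode length
        if np.random.rand() < epsilon:        # explore
            action = np.random.randint(n_actions)
        else:                                 # exploit (random tie-break)
            action = np.random.choice(np.flatnonzero(Q[state] == Q[state].max()))
        next_state, reward, done = step(state, action)
        # Core Q-learning update: move Q(s, a) toward reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
        if done:
            break

print(Q)  # learned action values; "right" (action 1) should dominate near the goal
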
Train/Validation/Test Split

Why Split Data?

Prevent overfitting and get honest performance estimates.

Typical Split

  • Training (60-80%): Learn patterns
  • Validation (10-20%): Tune hyperparameters
  • Test (10-20%): Final performance evaluation

Quick Code

from sklearn.model_selection import train_test_split

# Split into train and temp
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Split temp into validation and test
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42
)
Rule: Never train on test data!
Bias-Variance Tradeoff

The Balance

Total Error = Bias² + Variance + Irreducible Error

High Bias (Underfitting)

  • Model too simple
  • Poor performance on training AND test data
  • Fix: Add features, increase model complexity, reduce regularization

High Variance (Overfitting)

  • Model too complex
  • Great on training, poor on test data
  • Fix: Get more data, reduce features, increase regularization, use ensemble methods

Sweet Spot

Balance both to minimize total error. Use cross-validation to find it!
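
One way to find the sweet spot: sweep a complexity parameter with cross-validation. A sketch using scikit-learn's validation_curve (sweeping a decision tree's max_depth is just an illustrative choice; X and y are assumed to be defined as in the other snippets):

from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

depths = range(1, 21)
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=42), X, y,
    param_name='max_depth', param_range=depths, cv=5
)

# Low depth: both scores low (high bias). High depth: train high, validation drops (high variance).
for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={d:2d}  train={tr:.3f}  val={va:.3f}")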

Goal: Generalize well to new data
Overfitting/Underfitting

How to Detect

Underfitting:

  • Training accuracy is low (<80%)
  • Validation accuracy similar to training
  • Learning curves plateau early

Overfitting:

  • Training accuracy very high (>95%)
  • Large gap between training and validation accuracy
  • Validation loss increases while training loss decreases

Solutions

For Underfitting: More features, complex model, less regularization

For Overfitting: More data, dropout, early stopping, regularization (L1/L2)
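
A sketch for plotting learning curves with scikit-learn (model, X, y assumed defined): two low, flat curves suggest underfitting; a persistent train/validation gap suggests overfitting.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve

sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5)
)

plt.plot(sizes, train_scores.mean(axis=1), label='train')
plt.plot(sizes, val_scores.mean(axis=1), label='validation')
plt.xlabel('Training set size'); plt.ylabel('Score'); plt.legend(); plt.show()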

Monitor both train and validation metrics
🔧 Key Algorithms
Linear/Logistic Regression

Linear Regression

Predicts continuous values: y = mx + b

  • When: Linear relationship between features and target
  • Assumptions: Linearity, independence, homoscedasticity, normality
  • Pros: Fast, interpretable, works with small data
  • Cons: Assumes linearity, sensitive to outliers
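
A matching quick snippet for linear regression (same pattern as the logistic example below):

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(model.coef_, model.intercept_)  # learned slope(s) m and intercept b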

Logistic Regression

Binary classification using sigmoid function

  • When: Binary outcomes (yes/no, 0/1)
  • Output: Probability between 0 and 1
  • Pros: Probabilistic output, fast, interpretable
  • Cons: Linear decision boundary
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
Decision Trees/Random Forests

Decision Trees

Tree structure of if-else decisions

  • Pros: Easy to interpret, handles non-linear relationships, no scaling needed
  • Cons: Prone to overfitting, unstable (small changes → different tree)

Random Forests

Ensemble of many decision trees (bagging)

  • How: Build multiple trees on random subsets, average predictions
  • Pros: Reduces overfitting, handles missing values, feature importance
  • Cons: Less interpretable, slower than single tree
  • Best for: Tabular data, when you need robust performance
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, max_depth=10)
rf.fit(X_train, y_train)

# Feature importance
importances = rf.feature_importances_
Great baseline model for tabular data
Support Vector Machines

Core Concept

Find the hyperplane that maximizes margin between classes

The Kernel Trick

Transform data to higher dimensions without computing coordinates

  • Linear: For linearly separable data
  • RBF (Radial Basis Function): Most common, handles non-linear
  • Polynomial: For polynomial relationships

When to Use

  • High-dimensional spaces (text, images)
  • Clear margin of separation
  • Small to medium datasets

Pros & Cons

Pros: Effective in high dimensions, memory efficient

Cons: Slow on large datasets, requires feature scaling

from sklearn.svm import SVC

svm = SVC(kernel='rbf', C=1.0, gamma='scale')
svm.fit(X_train, y_train)
Best for: Text classification, image recognition
Neural Networks

Architecture Components

  • Input Layer: Receives features
  • Hidden Layers: Learn representations (deep = many layers)
  • Output Layer: Produces predictions
  • Activation Functions: ReLU (hidden), Sigmoid/Softmax (output)

Key Concepts

  • Backpropagation: Update weights using gradient descent
  • Learning Rate: How big each update step is (0.001-0.01 typical)
  • Epochs: Full passes through training data
  • Batch Size: Samples processed before updating weights

When to Use

Complex patterns, images, text, audio, large datasets

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(10,)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X_train, y_train, epochs=50, batch_size=32)
Deep learning powerhouse
k-NN, k-Means, Naive Bayes

k-Nearest Neighbors (k-NN)

Classify based on k closest training examples

  • Pros: Simple, no training phase, works for multi-class
  • Cons: Slow prediction, sensitive to scale and irrelevant features
  • Tip: Always scale features, try k=3,5,7

k-Means Clustering

Partition data into k clusters (unsupervised)

  • How: Assign points to nearest centroid, update centroids, repeat
  • Use for: Customer segmentation, data compression
  • Choosing k: Elbow method (plot within-cluster sum of squares)
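
A sketch of the elbow method using KMeans' inertia_ attribute (X assumed defined):

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

inertias = []
ks = range(1, 11)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)   # within-cluster sum of squares

plt.plot(ks, inertias, marker='o')  # look for the "elbow" where the curve flattens
plt.xlabel('k'); plt.ylabel('Inertia'); plt.show()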

Naive Bayes

Probabilistic classifier using Bayes' theorem

  • Assumption: Features are independent (rarely true but works anyway)
  • Best for: Text classification (spam detection, sentiment)
  • Pros: Fast, works with small data, handles high dimensions
# k-NN
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5)

# k-Means
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)

# Naive Bayes
from sklearn.naive_bayes import GaussianNB
nb = GaussianNB()
Deep Learning Frameworks

TensorFlow/Keras

Google's production-ready deep learning framework

  • Best for: Production deployment, mobile (TensorFlow Lite), research
  • Pros: Industry standard, excellent documentation, TensorBoard visualization
  • Keras: High-level API for TensorFlow (easy to use)
  • Use when: Need production deployment, mobile apps, or serving at scale
import tensorflow as tf
from tensorflow import keras

# Sequential API (simple)
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    callbacks=[keras.callbacks.EarlyStopping(patience=5)]
)

# Functional API (complex architectures)
inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(64, activation='relu')(inputs)
x = keras.layers.Dense(32, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

PyTorch

Facebook's research-focused deep learning framework

  • Best for: Research, experimentation, dynamic models
  • Pros: Pythonic, dynamic computation graphs, easier debugging
  • Popular in: Academic research, NLP (Hugging Face), computer vision
  • Use when: Need flexibility, research, or custom architectures
import torch
import torch.nn as nn
import torch.optim as optim

# Define model
class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(10, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 1)
        self.dropout = nn.Dropout(0.2)
        
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)
        x = torch.relu(self.fc2(x))
        x = torch.sigmoid(self.fc3(x))
        return x

model = NeuralNet()
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop (X_train and y_train must be float tensors; y shaped [N, 1] for BCELoss)
for epoch in range(50):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()

TensorFlow vs PyTorch

Aspect | TensorFlow | PyTorch
Ease of Use | Keras makes it easy | More Pythonic, intuitive
Learning Curve | Moderate | Easier for Python devs
Deployment | Excellent (TF Serving, Lite) | Good (TorchServe)
Research | Good | Dominant in academia
Debugging | Harder (static graphs) | Easier (dynamic graphs)
Community | Large, industry-focused | Large, research-focused

Common Use Cases

  • Computer Vision: Both (PyTorch slightly preferred)
  • NLP: PyTorch (Hugging Face Transformers)
  • Production/Mobile: TensorFlow
  • Research Papers: PyTorch
  • Time Series: Both

Key Libraries

  • TensorFlow: Keras, TensorBoard, TF Data, TF Lite
  • PyTorch: torchvision, torchtext, Lightning (wrapper)
  • Both: ONNX (model interchange format)
Start with Keras for simplicity, PyTorch for research
📊 Model Evaluation
Classification Metrics

Accuracy

Correct predictions / Total predictions

  • When: Balanced classes
  • Misleading when: Imbalanced data (e.g., 95% class A, 5% class B)

Precision

True Positives / (True Positives + False Positives)

  • Question: Of predicted positives, how many are correct?
  • Use when: False positives are costly (spam filter)

Recall (Sensitivity)

True Positives / (True Positives + False Negatives)

  • Question: Of actual positives, how many did we catch?
  • Use when: False negatives are costly (disease detection)

F1-Score

Harmonic mean of precision and recall: 2 × (Precision × Recall) / (Precision + Recall)

  • Use when: Balance between precision and recall matters

ROC-AUC

Area Under the Receiver Operating Characteristic curve

  • Plots True Positive Rate vs False Positive Rate
  • AUC = 1.0: Perfect classifier
  • AUC = 0.5: Random guessing
  • Use when: Comparing models across thresholds
from sklearn.metrics import classification_report, roc_auc_score

print(classification_report(y_test, y_pred))
y_pred_proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
auc = roc_auc_score(y_test, y_pred_proba)
Choose metric based on business impact
Regression Metrics

Mean Squared Error (MSE)

Average of squared differences: Σ(actual - predicted)² / n

  • Penalizes large errors heavily
  • Same units as target variable squared

Root Mean Squared Error (RMSE)

Square root of MSE: √MSE

  • Same units as target variable
  • Most common regression metric
  • More interpretable than MSE

Mean Absolute Error (MAE)

Average of absolute differences: Σ|actual - predicted| / n

  • Less sensitive to outliers than MSE/RMSE
  • Same units as target variable
  • More robust metric

R² (Coefficient of Determination)

Proportion of variance explained: 1 - (SS_res / SS_tot)

  • R² = 1.0: Perfect predictions
  • R² = 0.0: As good as predicting mean
  • Can be negative for bad models
  • Scale-independent (compare across datasets)
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
RMSE for magnitude, R² for model quality
Cross-Validation

Why Cross-Validation?

Get more reliable performance estimates using all data for both training and validation

k-Fold Cross-Validation

  • Split data into k folds (typically k=5 or 10)
  • Train on k-1 folds, validate on remaining fold
  • Repeat k times, average results
  • Pros: Every sample used for both training and validation

Stratified k-Fold

  • Maintains class distribution in each fold
  • Use for: Imbalanced classification problems

Leave-One-Out (LOO)

  • k = n (number of samples)
  • Use for: Very small datasets
  • Con: Computationally expensive

Time Series Split

  • Respects temporal ordering
  • Critical for: Sequential data (stocks, sales)
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Simple k-fold
scores = cross_val_score(model, X, y, cv=5)
print(f"Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")

# Stratified k-fold
skf = StratifiedKFold(n_splits=5, shuffle=True)
scores = cross_val_score(model, X, y, cv=skf)
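
For sequential data, a time-series split sketch (assumes rows of X and y are in chronological order):

from sklearn.model_selection import TimeSeriesSplit

# Each fold trains on the past and validates on the future; no shuffling
tscv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(model, X, y, cv=tscv)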
Always use CV for model selection
Confusion Matrix

The Matrix

                Predicted
                 Pos    Neg
Actual  Pos     TP     FN
        Neg     FP     TN

Understanding Each Cell

  • True Positive (TP): Correctly predicted positive
  • True Negative (TN): Correctly predicted negative
  • False Positive (FP): Incorrectly predicted positive (Type I error)
  • False Negative (FN): Incorrectly predicted negative (Type II error)

What to Look For

  • High FP? Model too aggressive (raise the decision threshold)
  • High FN? Model too conservative (lower the decision threshold)
  • Imbalanced diagonal? Class imbalance or poor model

Quick Code

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.show()
Always visualize your confusion matrix
🔄 Data Preprocessing
Feature Scaling

Why Scale?

Distance- and gradient-based algorithms (k-NN, SVM, Neural Networks) are sensitive to feature magnitude

Normalization (Min-Max Scaling)

Scale to [0, 1]: (x - min) / (max - min)

  • Use when: Bounded range needed, distribution not Gaussian
  • Sensitive to: Outliers

Standardization (Z-score)

Scale to mean=0, std=1: (x - mean) / std

  • Use when: Features roughly Gaussian, algorithm assumes this
  • Better for: Algorithms with no bounded range assumption
  • More robust to: Outliers (compared to normalization)

Robust Scaling

Use median and IQR: (x - median) / IQR

  • Use when: Heavy outliers present

When NOT to Scale

  • Tree-based models (Random Forest, XGBoost)
  • Already on same scale
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Standardization
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # Use same params!

# Normalization
normalizer = MinMaxScaler()
X_train_norm = normalizer.fit_transform(X_train)
⚠️ Fit only on training data!
Handling Missing Data

Detection

df.isnull().sum()  # Count missing per column
df.isnull().sum() / len(df) * 100  # Percentage

Strategy 1: Delete

  • Drop rows: When <5% rows affected
  • Drop columns: When >50% values missing
  • Risk: Lose valuable information

Strategy 2: Imputation

Mean/Median:

  • Use mean for normal distribution
  • Use median for skewed or with outliers

Mode:

  • For categorical variables

Forward/Backward Fill:

  • For time series data

KNN Imputation:

  • Use similar samples to estimate
  • More sophisticated but slower

Strategy 3: Add Indicator

  • Create binary "was_missing" column
  • Preserves information about missingness
from sklearn.impute import SimpleImputer, KNNImputer

# Mean imputation
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)

# KNN imputation
knn_imputer = KNNImputer(n_neighbors=5)
X_imputed = knn_imputer.fit_transform(X)
Understand WHY data is missing
Encoding Categorical Variables

Label Encoding

Convert categories to integers: Red→0, Blue→1, Green→2

  • Use for: Ordinal data (Low, Medium, High)
  • Don't use for: Nominal data (implies ordering)
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['color_encoded'] = le.fit_transform(df['color'])

One-Hot Encoding

Create binary column for each category

  • Use for: Nominal data with few categories (<20)
  • Pros: No artificial ordering
  • Cons: High dimensionality with many categories
import pandas as pd

# Pandas
df_encoded = pd.get_dummies(df, columns=['color'], drop_first=True)

# Scikit-learn
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(drop='first', sparse_output=False)  # use sparse=False on scikit-learn < 1.2
encoded = ohe.fit_transform(df[['color']])

Target Encoding

Replace category with mean of target variable

  • Use for: High cardinality features (zip codes, user IDs)
  • Risk: Can cause overfitting (use smoothing/cross-validation)

Frequency Encoding

Replace with frequency/count of each category

  • Simple and effective for high cardinality
Drop first column to avoid multicollinearity
Feature Engineering Tips

Create New Features

  • Interactions: Feature1 × Feature2 (e.g., income × age)
  • Polynomials: x², x³ for non-linear relationships
  • Ratios: price/sqft, sales/employees
  • Aggregations: sum, mean, std of related features

Time-Based Features

  • Hour, day of week, month, quarter
  • Is weekend? Is holiday?
  • Days since last event
  • Cyclical encoding (sin/cos for hours, months)
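
A sketch of the cyclical encoding mentioned above (assumes an hour column with values 0-23):

import numpy as np

# Encode hour of day on a circle so that hour 23 sits next to hour 0
df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)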

Text Features

  • Length of text
  • Number of words, sentences
  • Presence of keywords
  • Sentiment scores
  • TF-IDF for important terms

Domain-Specific

  • Use domain knowledge to create meaningful features
  • Example (housing): age of house, distance to city center
  • Example (finance): moving averages, volatility

Feature Selection

  • Remove low variance features
  • Remove highly correlated features (>0.95)
  • Use feature importance from tree models
  • Recursive Feature Elimination (RFE)
  • L1 regularization (Lasso)
from sklearn.feature_selection import SelectKBest, f_classif

# Select top k features
selector = SelectKBest(f_classif, k=10)
X_selected = selector.fit_transform(X, y)

# From tree model
feature_imp = pd.DataFrame({
    'feature': X.columns,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)
Domain knowledge > automated methods
💡 Practical Tips
Hyperparameter Tuning

Grid Search

Try every combination of specified parameters

  • Pros: Exhaustive, guaranteed to find best in grid
  • Cons: Exponentially slow with more parameters
  • Use when: Few parameters, small ranges
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

grid_search = GridSearchCV(
    RandomForestClassifier(),
    param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)

grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_

Random Search

Sample random combinations

  • Pros: Faster, explores more space
  • Cons: May miss optimal
  • Use when: Many parameters, large ranges
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    'n_estimators': [100, 200, 300, 400],
    'max_depth': [10, 20, 30, 40, None],
    'min_samples_split': [2, 5, 10, 15]
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_dist,
    n_iter=20,  # Number of random combinations
    cv=5,
    n_jobs=-1
)

random_search.fit(X_train, y_train)

Key Hyperparameters by Algorithm

Random Forest: n_estimators, max_depth, min_samples_split

SVM: C (regularization), kernel, gamma

Neural Networks: learning_rate, batch_size, hidden_layers, neurons

XGBoost: learning_rate, max_depth, n_estimators, subsample

Start with defaults, then tune most important params
Algorithm Selection Guide

By Problem Type

Binary Classification:

  • Logistic Regression (baseline)
  • Random Forest (robust)
  • XGBoost (high performance)
  • Neural Networks (complex patterns)

Multi-class Classification:

  • Random Forest
  • XGBoost
  • Naive Bayes (text)

Regression:

  • Linear Regression (baseline)
  • Random Forest
  • XGBoost
  • Neural Networks

Clustering:

  • K-Means (spherical clusters)
  • DBSCAN (arbitrary shapes, outliers)
  • Hierarchical (dendrograms)

By Data Characteristics

Small Data (<10k samples):

  • Logistic Regression, Naive Bayes
  • Simple models to avoid overfitting

Large Data (>100k samples):

  • Neural Networks, XGBoost
  • Can learn complex patterns

High Dimensional (many features):

  • Regularized models (Lasso, Ridge)
  • Random Forest (handles many features)
  • Feature selection first

Imbalanced Classes:

  • Random Forest with class_weight='balanced'
  • XGBoost with scale_pos_weight
  • SMOTE for oversampling
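
A sketch of two of these options; class weights are plain scikit-learn, while SMOTE assumes the imbalanced-learn package is installed:

from sklearn.ensemble import RandomForestClassifier

# Option 1: penalize mistakes on the minority class more heavily
rf = RandomForestClassifier(class_weight='balanced', random_state=42)
rf.fit(X_train, y_train)

# Option 2: oversample the minority class with SMOTE (requires imbalanced-learn)
from imblearn.over_sampling import SMOTE
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
rf.fit(X_res, y_res)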

Quick Decision Tree

Need interpretability? → Logistic Regression or Decision Tree

Need high accuracy? → XGBoost or Random Forest

Have images/text? → Neural Networks (CNN/RNN)

Limited time? → Start with Random Forest

Always try multiple algorithms
Common Pitfalls & Debugging

Data Leakage

Information from test set leaks into training

  • Example: Scaling before train/test split
  • Fix: Always split first, then preprocess
  • Example: Using future information in time series
  • Fix: Use time-based split
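
A sketch of the leakage-safe pattern: put preprocessing inside a Pipeline so the scaler is re-fit on each training fold only:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# The scaler is fit inside each CV fold, so validation folds never influence preprocessing
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)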

Class Imbalance

One class dominates dataset (e.g., 95% vs 5%)

  • Symptom: High accuracy but poor recall on minority class
  • Solutions:
    • Use stratified sampling
    • Oversample minority class (SMOTE)
    • Undersample majority class
    • Use class weights
    • Change evaluation metric (F1, AUC instead of accuracy)

Poor Performance Checklist

  • ✓ Check for data leakage
  • ✓ Verify train/test split is correct
  • ✓ Look for missing values
  • ✓ Check feature scaling
  • ✓ Examine class distribution
  • ✓ Plot learning curves (more data needed?)
  • ✓ Try different algorithms
  • ✓ Engineer better features

Model Not Learning

  • Neural Networks: Learning rate too high/low, bad initialization
  • All models: Features not informative, need more data

Overfitting Signs

  • Training accuracy >> test accuracy (gap >10%)
  • Performance degrades on new data
  • Model too complex for data size
# Check for data leakage
from sklearn.model_selection import cross_val_score

# If the cross-validation score is much worse than the training score → suspect leakage
train_score = model.score(X_train, y_train)   # model previously fit on X_train, y_train
cv_scores = cross_val_score(model, X, y, cv=5)
print(f"CV: {cv_scores.mean():.3f}, Train: {train_score:.3f}")
⚠️ Always validate on unseen data
Quick Reference Code

Complete ML Pipeline

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# 1. Load data
df = pd.read_csv('data.csv')

# 2. Basic exploration
print(df.info())
print(df.describe())
print(df.isnull().sum())

# 3. Prepare features and target
X = df.drop('target', axis=1)
y = df['target']

# 4. Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 5. Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 6. Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)

# 7. Evaluate
y_pred = model.predict(X_test_scaled)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

# 8. Cross-validation
from sklearn.model_selection import cross_val_score
cv_scores = cross_val_score(model, X_train_scaled, y_train, cv=5)
print(f"CV Score: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")

Pandas Essentials

# Load data
df = pd.read_csv('file.csv')

# Exploration
df.head()
df.shape
df.dtypes
df.describe()
df.isnull().sum()

# Selection
df['column']
df[['col1', 'col2']]
df[df['age'] > 30]

# Missing values
df.dropna()
df.fillna(df.mean(numeric_only=True))  # mean-impute numeric columns only

# Encoding
pd.get_dummies(df, columns=['category'])

# Group by
df.groupby('category')['value'].mean()
Bookmark this for quick reference!
🏗️ MLOps Foundational Practices
📦 Version Control for ML
  • Git: Code versioning (branches, commits, merges)
  • DVC: Data Version Control - track datasets and models
  • MLflow: Experiment tracking, parameter logging
  • Weights & Biases: Visualization and collaboration
  • Best Practice: Version data, code, and models together
🔄 Data Pipeline Management
  • Airflow: Workflow orchestration with DAGs
  • Prefect: Modern workflow automation
  • Data Validation: Great Expectations, Pydantic
  • Feature Engineering: Automated feature stores (Feast)
  • Pipeline: Ingestion → Validation → Transform → Store
🧪 Model Training & Experimentation
  • Experiment Tracking: Log metrics, parameters, artifacts
  • Hyperparameter Tuning: Optuna, Ray Tune, Hyperopt
  • Reproducibility: Fix random seeds, document environment
  • Distributed Training: Ray, Horovod for multi-GPU
  • Checkpointing: Save model states during training
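
A minimal experiment-tracking sketch with MLflow (run name, values, and the fitted model variable are illustrative assumptions):

import mlflow
import mlflow.sklearn

with mlflow.start_run(run_name="rf-baseline"):
    mlflow.log_param("n_estimators", 100)       # hyperparameters used for this run
    mlflow.log_metric("val_accuracy", 0.93)     # illustrative metric value
    mlflow.sklearn.log_model(model, "model")    # persist the trained model with the run
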
🏷️ Model Registry & Versioning
  • Centralized Storage: Single source of truth for models
  • Metadata: Track metrics, parameters, dependencies
  • Lineage: Data → Training → Model connections
  • Stages: Development → Staging → Production
  • Tools: MLflow Registry, Neptune.ai, Weights & Biases
🚀 Model Serving & Deployment
  • REST APIs: FastAPI, Flask for HTTP endpoints
  • Batch Inference: Process large datasets offline
  • Real-time Serving: TensorFlow Serving, TorchServe
  • Edge Deployment: TensorFlow Lite, ONNX Runtime
  • Load Balancing: Handle multiple requests efficiently
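
A minimal REST endpoint sketch with FastAPI (the model path and feature format are assumptions):

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")   # hypothetical path to a trained scikit-learn model

class Features(BaseModel):
    values: list[float]               # one row of input features

@app.post("/predict")
def predict(features: Features):
    prediction = model.predict([features.values])
    return {"prediction": prediction.tolist()}

# Run with: uvicorn app:app --reload   (assuming this file is app.py)
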
📊 Monitoring & Observability
  • Performance Metrics: Accuracy, latency, throughput
  • Data Drift: Monitor input distribution changes
  • Concept Drift: Track output/prediction patterns
  • Alerting: PagerDuty, Opsgenie for anomalies
  • Tools: Evidently AI, WhyLabs, Prometheus + Grafana
🏗️ Infrastructure as Code
  • Terraform: Cloud-agnostic infrastructure provisioning
  • CloudFormation: AWS-specific IaC
  • Pulumi: IaC using programming languages
  • Benefits: Reproducible, versionable, auditable
  • State Management: Track infrastructure changes
🐳 Containerization
  • Docker: Package code, dependencies, models together
  • Dockerfile: Define build steps, base image
  • Multi-stage Builds: Optimize image size
  • Container Registry: Docker Hub, ECR, GCR
  • Benefits: Consistency across dev/staging/prod
🔄 CI/CD Workflow
🌳 Source Control & Branching
  • GitFlow: feature/develop/release/hotfix branches
  • Trunk-Based: Short-lived branches, frequent merges
  • Pull Requests: Code review, approval workflows
  • Branch Protection: Enforce tests, reviews before merge
  • Merge Strategies: Merge commit, squash, rebase
⚙️ Continuous Integration (CI)
  • Automated Testing: Run tests on every commit
  • Linting: flake8, pylint, black for code quality
  • Code Coverage: pytest-cov, coverage.py
  • Build Automation: Compile, package, create artifacts
  • Tools: Jenkins, GitHub Actions, GitLab CI, CircleCI
Automated Testing
  • Unit Tests: Test individual functions (pytest, unittest)
  • Integration Tests: Test component interactions
  • E2E Tests: Test complete workflows (Selenium, Playwright)
  • Model Tests: Validate predictions, data quality
  • Test Pyramid: Many unit, some integration, few E2E
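
A sketch of a simple model test with pytest (file name, model path, and feature count are illustrative):

# test_model.py -- run with: pytest
import joblib
import numpy as np

def test_predict_proba_rows_sum_to_one():
    model = joblib.load("model.joblib")         # hypothetical trained binary classifier
    X_sample = np.random.rand(10, 4)            # feature count must match the model
    proba = model.predict_proba(X_sample)
    assert proba.shape == (10, 2)
    assert np.allclose(proba.sum(axis=1), 1.0)  # each row is a probability distribution
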
📦 Artifact Management
  • Container Registry: Store Docker images (ECR, GCR, ACR)
  • Model Registry: Store trained models with metadata
  • Package Registry: PyPI, npm for dependencies
  • Versioning: Semantic versioning (v1.2.3)
  • Caching: Speed up builds with dependency caching
🚢 Continuous Deployment (CD)
  • Blue-Green: Two identical environments, instant switch
  • Canary Release: Gradual rollout to subset of users
  • Rolling Update: Replace instances incrementally
  • Rollback: Quick revert to previous version
  • Tools: Spinnaker, ArgoCD, Flux for GitOps
🌍 Environment Management
  • Dev: Development, rapid iteration, debugging
  • Staging: Production-like, final testing
  • Production: Live environment serving users
  • Parity: Keep environments identical
  • Secrets: Vault, AWS Secrets Manager, env variables
🔗 Pipeline Orchestration
  • Multi-stage: Build → Test → Deploy stages
  • Dependencies: Stage order, parallel execution
  • Pipeline as Code: YAML (.github/workflows), Jenkinsfile
  • Triggers: On push, PR, schedule, manual
  • Artifacts: Pass outputs between stages
📈 Monitoring & Feedback
  • Pipeline Metrics: Success rate, build time, failure rate
  • Deployment Tracking: DORA metrics (lead time, frequency)
  • Failure Analysis: Root cause, trends over time
  • Notifications: Slack, email, PagerDuty on failures
  • Rollback Triggers: Auto-rollback on errors