Best Practices Library

A curated collection of frameworks, methodologies, and principles for modern software engineering and migration.

AWS Well-Architected Framework

Amazon Web Services•Nov 30, 2015

A set of cloud design principles and checklists for building secure, high-performing, resilient, and efficient workloads on AWS.

cloud-architecture, resilience, cost-optimization

Azure Well-Architected Framework

Microsoft•Mar 15, 2018

Microsoft’s five-pillar guidance (reliability, security, cost optimization, operational excellence, performance efficiency) for designing and operating workloads on Azure.

cloud-architecture, performance-efficiency, governance

Google Cloud Architecture Framework

Google Cloud•Jul 21, 2020

Prescriptive guidance covering reliability, cost, performance, security, and operational excellence for GCP workloads.

cloud-architecture, gcp, design-principles

CNCF Cloud-Native Definition & Principles

Cloud Native Computing Foundation•Jun 4, 2018

The CNCF’s formal definition of cloud-native computing and core principles for microservices, containers, and dynamic orchestration.

cloud-native, microservices, containers

Twelve-Factor App Methodology

Heroku (Salesforce)•Jan 1, 2015

Twelve practical guidelines for building modern, portable, cloud-ready web applications.

application-architecture, stateless, devops

Google Site Reliability Engineering Practices

Google•Mar 23, 2016

Codified principles (error budgets, toil elimination, SLIs/SLOs) for operating large-scale services reliably.

sre, reliability-engineering, service-level-objectives

DORA Four Key Metrics

DevOps Research & Assessment (DORA)•Sep 15, 2018

Research-backed metrics (deployment frequency, lead time, MTTR, change failure rate) for high-performing software teams.

devops, performance-metrics, software-delivery
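
As a rough illustration of how the four metrics can be derived from deployment records, here is a minimal sketch. The `Deployment` record and its field names are assumptions made for this example, not part of DORA's published material, and the function expects at least one deployment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deployments: List[Deployment], window_days: int) -> Dict[str, object]:
    """Compute the four metrics over a non-empty list of deployments."""
    failures = [d for d in deployments if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deployments) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restores) if restores else None,
    }
```

Medians are used here instead of means so that a single slow change does not dominate the result; that is a design choice of this sketch.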

CALMS DevOps Principles

DevOps Enterprise Summit (Gene Kim et al.)•Feb 10, 2015

Framework emphasising Culture, Automation, Lean, Measurement, and Sharing as pillars of DevOps success.

devops, culture, continuous-improvement

GitOps Principles v1

OpenGitOps (CNCF WG)•Jun 9, 2021

Declarative, verifiable, and automated operations, using Git as the single source of truth for infrastructure and applications.

gitops, continuous-delivery, kubernetes

Strangler Fig Modernization Pattern

ThoughtWorks•May 1, 2015

Incrementally replacing a legacy system by routing functionality, piece by piece, to new services while ‘strangling’ the old system until it can be retired.

modernization, legacy-migration, incremental-refactor

Blue-Green Deployment Strategy

Continuous Delivery Community•Aug 20, 2016

Operating two identical production environments to achieve zero-downtime releases and quick rollbacks.

deployment-strategy, zero-downtime, rollback

Canary Releases Best Practice Guide

Spinnaker Community•Apr 5, 2017

Progressively rolling out new software to a small subset of users to minimise risk before full release.

deployment-strategy, progressive-delivery, risk-mitigation
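
One common way to pick the "small subset of users" is deterministic hash bucketing, so a given user consistently sees the same version as the rollout percentage grows. The sketch below assumes that approach; the function name, the salt value, and the 10,000-bucket granularity are illustrative choices, not taken from any specific canary tool.

```python
import hashlib


def in_canary(user_id: str, percent: float, salt: str = "release-1") -> bool:
    """Return True if this user falls inside the canary cohort.

    `percent` is the share of users (0-100) routed to the new version.
    The salt (hypothetical value) lets each release draw a different cohort.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto 10,000 buckets (0.01% steps).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100
```

Because a user's bucket never changes for a given salt, raising `percent` from 5 to 50 only adds users to the cohort; nobody who already had the new version is switched back.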

OWASP Top 10 (2021)

OWASP Foundation•Sep 24, 2021

The ten most critical web application security risks; updated community consensus.

application-security, secure-coding, risk-management

NIST Secure Software Development Framework (SSDF)

NIST•Feb 4, 2022

Guidelines for secure software development practices across the SDLC (SP 800-218).

secure-sdlc, federal-guidance, software-security

Supply-chain Levels for Software Artifacts (SLSA)

OpenSSF•Jun 17, 2021

A framework of incremental, end-to-end integrity guarantees for the software supply chain. The original v0.1 draft defined levels 1-4; SLSA v1.0 defines Build track levels 1-3.

supply-chain-security, sbom, devsecops

CycloneDX SBOM Specification

OWASP Foundation•Sep 1, 2017

Lightweight Bill-of-Materials standard for software components, vulnerabilities, and licenses.

sbom, software-composition, security

Infrastructure-as-Code Security Playbook

HashiCorp & Bridgecrew•Oct 11, 2020

Best practices for securing Terraform, CloudFormation, and ARM templates in CI/CD pipelines.

infrastructure-as-code, security, devsecops

Kubernetes Pod Security Standards

Kubernetes SIG Auth•Nov 16, 2021

Privileged, baseline, and restricted policy levels, ordered from least to most restrictive, for securing pod workloads.

kubernetes-security, policy, containers

CNCF Cloud-Native Security Whitepaper

CNCF Security TAG•Nov 18, 2020

Guidance on building, shipping, and running secure cloud-native applications.

cloud-native, security, containers

OpenTelemetry Instrumentation Guidelines

OpenTelemetry Project•Jun 8, 2022

Best practices for generating consistent traces, metrics, and logs using OpenTelemetry.

observability, tracing, metrics

RED & USE Monitoring Methodologies

Tom Wilkie & Brendan Gregg•May 1, 2018

Two complementary approaches for choosing what to monitor: RED (Rate, Errors, Duration) for request-driven services and USE (Utilisation, Saturation, Errors) for resources.

monitoring, observability, metrics

Google Web Vitals

Google Chrome Team•May 5, 2020

Core Web Vitals metrics (LCP, INP, CLS) for measuring real-world user experience; INP replaced FID as a Core Web Vital in March 2024.

web-performance, user-experience, frontend

Production-Ready Microservices Checklist

Susan Fowler•Jul 12, 2017

A checklist covering operability, reliability, deployability, and observability of microservices.

microservices, operability, architecture

Data Mesh Principles

ThoughtWorks (Zhamak Dehghani)•May 27, 2020

Domain-oriented, self-serve data infrastructure principles promoting product thinking for data.

data-architecture, distributed-ownership, analytics

dbt Style Guide

dbt Labs•Feb 10, 2021

Community conventions for naming, structuring, and documenting dbt transformation projects.

analytics-engineering, sql-modelling, dataops

Google API Design Guide

Google•Apr 19, 2022

Opinionated REST and gRPC design rules: resource-oriented URIs, plural nouns, pagination, errors.

api-design, rest, grpc

Microsoft REST API Guidelines

Microsoft•Jun 30, 2021

Company-wide REST consistency rules covering resource naming, HTTP verbs, versioning, and error responses.

api-design, rest, versioning

Stripe API Versioning Policy

Stripe•Nov 14, 2019

Backwards-compatible evolution strategy and pinned versions for API consumers.

api-versioning, product-management, saas

Semantic Versioning 2.0.0

semver.org•Jun 20, 2017

Consistent MAJOR.MINOR.PATCH versioning rules for APIs and packages.

versioning, package-management, release-management
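
The MAJOR.MINOR.PATCH rules can be sketched in a few lines. This is a simplified illustration that handles only the three numeric components and ignores pre-release and build-metadata suffixes; the `Version`, `parse`, and `bump` names are chosen for this example.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def parse(text: str) -> Version:
    """Parse a plain 'X.Y.Z' string (no pre-release or build metadata)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def bump(version: Version, change: str) -> Version:
    """Return the next version; lower-order components reset to zero."""
    if change == "major":   # incompatible API change
        return Version(version.major + 1, 0, 0)
    if change == "minor":   # backwards-compatible new functionality
        return Version(version.major, version.minor + 1, 0)
    if change == "patch":   # backwards-compatible bug fix
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown change type: {change!r}")
```

Because `Version` is a tuple of integers, comparison is numeric per component, so `parse("1.10.0") > parse("1.9.3")` holds even though a plain string comparison would say otherwise.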

Conventional Commits Spec

Conventional Commits Initiative•Feb 16, 2019

Machine-readable Git commit messages enabling automated changelogs and semantic releases.

git, release-automation, semver
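
To show what "machine-readable" buys in practice, the sketch below parses a commit header of the form `type(scope)!: description` and maps it to a release bump. The regular expression is a simplified approximation of the header grammar, and `release_kind` is a hypothetical helper rather than part of any release tool.

```python
import re
from typing import Optional

# Simplified header pattern: type, optional (scope), optional "!", then ": description".
HEADER = re.compile(
    r"^(?P<type>[a-zA-Z]+)(?:\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<description>.+)$"
)


def release_kind(message: str) -> Optional[str]:
    """Map a commit message to 'major', 'minor', 'patch', or None."""
    lines = message.splitlines()
    if not lines:
        return None
    match = HEADER.match(lines[0])
    if match is None:
        return None  # not a conventional commit header
    breaking_footer = any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:")) for line in lines[1:]
    )
    if match["breaking"] or breaking_footer:
        return "major"
    return {"feat": "minor", "fix": "patch"}.get(match["type"])
```

Other types such as `docs` or `chore` return `None` here, meaning they would not trigger a release on their own.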

Trunk-Based Development Guidelines

Paul Hammant•Jan 10, 2017

Branching strategy promoting short-lived branches, frequent commits to trunk, and feature flags.

ci-cd, branching-strategy, devops

Feature Flag Best Practices

LaunchDarkly•Sep 3, 2018

Operational guidelines for creating, managing, and retiring feature toggles safely.

feature-flags, release-management, progressive-delivery

Shift-Left Testing Manifesto

Testing Community•Apr 22, 2016

Encourages earlier testing (unit, security, performance) in the SDLC to catch defects sooner.

testing-strategy, quality-assurance, devops

Contract-Driven Development with Pact

Pact Foundation•May 11, 2019

Consumer-driven contract testing methodology to ensure microservice compatibility.

testing, microservices, contracts

Chaos Engineering Principles

PrinciplesOfChaos.org•Sep 19, 2017

Run controlled experiments to build confidence in system resilience under turbulent conditions.

resilience-testing, sre, failure-mode

FinOps Cloud Cost Best Practices

FinOps Foundation•Jun 30, 2020

A shared-accountability operating model for cloud spend that iterates through Inform, Optimize, and Operate phases.

cloud-cost, governance, optimization

IBM Garage Methodology

IBM•Apr 1, 2019

End-to-end practices merging agile, DevOps, and design thinking for cloud transformation.

digital-transformation, agile, cloud-native

SAFe Continuous Delivery Pipeline

Scaled Agile Inc.•Oct 2, 2018

Scaled Agile Framework’s model for continuous exploration, integration, deployment, and release on demand.

scaled-agile, ci-cd, value-stream

ISO/IEC 27001:2022 Annex A Controls

ISO/IEC JTC 1/SC 27•Oct 25, 2022

Industry baseline for information-security policies and management controls.

information-security, controls, compliance

NIST AI Risk Management Framework 1.0

NIST•Jan 26, 2023

Guidelines to integrate trustworthiness considerations into the design, development, and deployment of AI systems.

ai-governance, risk-management, responsible-ai

EU AI Act (Political Agreement)

European Parliament & Council•Feb 13, 2024

First comprehensive regulatory framework for trustworthy AI in the European Union.

regulation, ai-governance, compliance

Google Responsible AI Principles

Google•Jun 7, 2018

Seven commitments guiding the ethical development and deployment of AI at Google.

ai-ethics, responsible-ai, policy

Microsoft Responsible AI Standard v2

Microsoft•Jun 21, 2022

Company-wide governance framework translating principles into measurable requirements.

ai-governance, policy, ethics

OpenAI Safety & Alignment Best Practices

OpenAI•Mar 14, 2023

Mitigation strategies (RLHF, red-teaming, tiered access) for large language model deployment.

ai-safety, alignment, llm

Terraform Module Design Patterns

HashiCorp•Aug 12, 2020

Guidelines for writing reusable, versioned, and documented Terraform modules.

iac, terraform, module-best-practices

Helm Chart Best Practices

Helm Maintainers•Oct 5, 2019

Recommendations for structure, naming, versioning, and values of Helm charts.

kubernetes, package-management, helm

Container Image Hardening Guide

CIS & Docker•May 18, 2021

Steps to build minimal, non-root, signed container images with SBOMs.

containers, security, hardening

Zero Trust Architecture Principles (NIST SP 800-207)

NIST•Aug 11, 2020

Conceptual zero-trust model: continuous verification, least privilege, assume breach.

zero-trust, network-security, architecture

Privacy by Design 7 Principles

International Conference of Data Protection and Privacy Commissioners•Jan 15, 2018

Framework embedding privacy into systems engineering from the outset.

privacy, design-principles, gdpr

Continuous Modernization Playbook

Vibgrate (Draft)•May 1, 2024

Iterative roadmap for refactoring, re-platforming, and replacing legacy systems using automation and AI.

legacy-migration, ai-assisted, project-management

OWASP Application Security Verification Standard (ASVS)

OWASP Foundation•Oct 1, 2021

A framework of security requirements that defines testable controls for designing, building, and verifying secure web applications and services.

security, application-security, secure-coding

OWASP Software Assurance Maturity Model (SAMM)

OWASP Foundation•Jan 31, 2020

A maturity model that helps organizations assess and improve their software security program across governance, design, implementation, verification, and operations.

security, devsecops, governance

OWASP API Security Top 10 (2023)

OWASP Foundation•Jun 5, 2023

A ranked list of the most critical security risks specific to APIs, covering broken authorization, authentication, and unsafe resource consumption.

security, api-security, application-security

OWASP Mobile Application Security Verification Standard (MASVS)

OWASP Foundation•Apr 21, 2023

A standard of security requirements for mobile apps, covering storage, cryptography, authentication, network communication, and platform interaction.

security, application-security, mobile-security

CWE Top 25 Most Dangerous Software Weaknesses

MITRE Corporation•Jun 29, 2023

An annually updated list of the most common and impactful software weaknesses, derived from real-world vulnerability data, to guide prevention and prioritization.

security, application-security, secure-coding

NIST Cybersecurity Framework 2.0

National Institute of Standards and Technology•Feb 26, 2024

A voluntary framework of cybersecurity outcomes organized into six functions (Govern, Identify, Protect, Detect, Respond, and Recover) for managing organizational cyber risk.

security, compliance, risk-management

NIST SP 800-53 Security and Privacy Controls

National Institute of Standards and Technology•Sep 23, 2020

A comprehensive catalog of security and privacy controls for information systems, organized into control families with baselines for different risk levels.

security, compliance, risk-management

CIS Critical Security Controls v8

Center for Internet Security•May 18, 2021

A prioritized set of 18 controls (153 safeguards), tiered into three implementation groups, that defends against the most common cyber attacks and is mapped to other major frameworks.

security, compliance, risk-management

Microsoft Security Development Lifecycle (SDL)

Microsoft•Jan 1, 2004

A set of security practices integrated across every phase of software development, from training and design through implementation, verification, and response.

security, devsecops, secure-coding

STRIDE Threat Modeling

Microsoft•Jan 1, 1999

A structured method for finding security threats by category: spoofing, tampering, repudiation, information disclosure, denial of service, and elevation of privilege.

security, threat-modeling, secure-design

OWASP Secure Headers Project

OWASP Foundation•Jan 1, 2020

Guidance and recommended values for HTTP response security headers that harden web applications against common client-side attacks.

security, application-security, web-security

Sigstore Keyless Signing

Open Source Security Foundation (OpenSSF)•Mar 9, 2021

An open standard for signing software artifacts using short-lived certificates tied to identity, removing the burden of managing long-lived private keys.

security, supply-chain-security, signing

in-toto Supply Chain Attestation

Open Source Security Foundation (OpenSSF)•Aug 14, 2019

A framework that secures the software supply chain by cryptographically verifying that each step in the build and release process was performed as intended.

security, supply-chain-security, attestation

Secrets Management Best Practices

OWASP Foundation•Jan 1, 2021

Practices for storing, rotating, and accessing credentials and keys securely, keeping them out of source code and limiting their exposure.

security, identity-auth, secrets-management

Principle of Least Privilege

Saltzer and Schroeder•Jan 1, 1975

A security principle that grants every user, service, and process only the minimum access required to perform its function, and no more.

security, identity-auth, access-control

Continuous Integration Best Practices

Martin Fowler (ThoughtWorks)•May 1, 2006

A development practice where engineers merge code into a shared mainline many times a day, each merge verified by an automated build and test suite.

ci-cd, continuous-integration, automated-testing

Continuous Delivery

Jez Humble and David Farley•Aug 1, 2010

A discipline where software is built so it can be released to production safely at any time, with every change proven release-ready by an automated pipeline.

ci-cd, continuous-delivery, deployment-pipeline

Deployment Pipeline Pattern

Jez Humble and David Farley•Aug 1, 2010

An automated, staged path that takes every code change from commit through build, tests, and successive environments, providing a single auditable route to production.

ci-cd, deployment-pipeline, automation

GitHub Flow

GitHub•Aug 31, 2011

A lightweight, branch-based workflow built around short-lived feature branches, pull requests, and continuous deployment from a single always-deployable main branch.

version-control, github-flow, pull-request

GitFlow Branching Model

Vincent Driessen•Jan 5, 2010

A structured Git branching model using long-lived main and develop branches plus dedicated feature, release, and hotfix branches to coordinate scheduled releases.

version-control, gitflow, branching-strategy

Expand and Contract Database Migration Pattern

Martin Fowler / Pramod Sadalage•Jan 1, 2014

A zero-downtime schema change technique that adds new structures, migrates reads and writes in phases, then removes the old structures once nothing depends on them.

database, expand-and-contract, schema-migration

Immutable Infrastructure

HashiCorp•Jun 23, 2013

An operations model where servers and components are never modified after deployment; changes ship as freshly built, versioned replacements rather than in-place edits.

infrastructure, immutable-infrastructure, infrastructure-as-code

Pipeline as Code

Jenkins (CloudBees)•May 2, 2016

Defining CI/CD pipelines in version-controlled configuration files stored alongside the application, so the delivery process is reviewable, reproducible, and auditable.

ci-cd, pipeline-as-code, infrastructure-as-code

Artifact Repository Management

JFrog•Aug 1, 2010

The practice of storing, versioning, and governing build artifacts and dependencies in a dedicated repository so the same trusted binary is promoted from build to production.

ci-cd, artifact-repository, supply-chain-security

Deployment Rings

Microsoft•Nov 13, 2018

A progressive rollout strategy that releases changes to expanding audience groups, or rings, validating each ring before exposing the next to limit blast radius.

deployment, deployment-rings, progressive-delivery

Dark Launching

Facebook (Meta) Engineering•May 1, 2009

Deploying new functionality to production in a hidden state and exercising it with real traffic before exposing it to users, to validate behavior and capacity safely.

deployment, dark-launching, progressive-delivery

Reproducible Builds

Reproducible Builds Project•Jan 1, 2013

A set of practices ensuring a given source plus build environment always produces bit-for-bit identical binaries, so anyone can independently verify what shipped.

ci-cd, reproducible-builds, supply-chain-security

Pre-Commit Hooks Automation

pre-commit (Anthony Sottile)•Dec 1, 2014

Automating checks such as formatting, linting, and secret scanning that run on every Git commit, catching issues locally before they ever reach the shared repository.

version-control, pre-commit-hooks, automation

Release Train Model

Scaled Agile, Inc.•Jan 1, 2011

A delivery cadence where releases ship on a fixed schedule and any change not ready in time simply catches the next train, decoupling release timing from feature completion.

software-process, release-train, release-management

Configuration as Code

ThoughtWorks•Jan 1, 2016

Managing application and system configuration in version-controlled, machine-readable files instead of manual settings, making configuration reviewable, auditable, and reproducible.

devops, configuration-as-code, infrastructure-as-code

Service Level Objectives (SLOs)

Google SRE•Aug 1, 2018

A target reliability level for a service, expressed as a measurable percentage of good events over a window, used to balance reliability against feature velocity.

observability, slo, reliability

Error Budgets

Google SRE•Apr 1, 2016

The allowed amount of unreliability derived from an SLO (100% minus the target), spent deliberately to balance new features against reliability work.

observability, error-budget, slo
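
The "100% minus the target" arithmetic is easy to make concrete. The sketch below is illustrative only: the function names are made up for this example, and it assumes the SLO is given as a fraction strictly between 0 and 1.

```python
def error_budget_fraction(slo_target: float) -> float:
    """Allowed unreliability: 1 minus the SLO target (e.g. 0.999 -> 0.001)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return 1.0 - slo_target


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Budget expressed as minutes of full outage over the window."""
    return error_budget_fraction(slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Share of the budget still unspent; negative means the SLO is breached."""
    allowed_bad = error_budget_fraction(slo_target) * total_events
    return 1.0 - bad_events / allowed_bad
```

For a 99.9% target over 30 days the budget works out to roughly 43.2 minutes of downtime. With 1,000,000 requests and 250 failures, about 75% of the budget remains.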

The Four Golden Signals

Google SRE•Apr 1, 2016

Google SRE's four core metrics for monitoring a user-facing system: latency, traffic, errors, and saturation.

observability, golden-signals, monitoring

OpenTelemetry Semantic Conventions

OpenTelemetry (CNCF)•Nov 1, 2023

Standardized names and attributes for telemetry (spans, metrics, logs) so observability data is consistent and portable across tools and languages.

observability, opentelemetry, semantic-conventions

Structured Logging

OpenTelemetry (CNCF)•Jan 1, 2021

Emitting logs as machine-parseable key-value records (typically JSON) with consistent fields, so logs can be searched, filtered, and correlated at scale.

observability, structured-logging, logging
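
A minimal sketch of the idea using only Python's standard `logging` and `json` modules. The `JsonFormatter` class, the `build_logger` helper, and the convention of passing extra fields under a `fields` key are assumptions of this example, not a standard API.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge caller-supplied key-value pairs passed via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger
```

A call such as `logger.info("order placed", extra={"fields": {"order_id": "o-123", "latency_ms": 42}})` then emits a single JSON line whose `order_id` and `latency_ms` fields can be filtered on directly.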

Distributed Tracing Best Practices

OpenTelemetry (CNCF)•Feb 1, 2022

Techniques for instrumenting and propagating trace context across services so requests can be followed end-to-end, with sampling and span design that aid debugging.

observability, distributed-tracing, tracing

Prometheus Monitoring Best Practices

Prometheus (CNCF)•Aug 9, 2018

Guidance for naming metrics, controlling label cardinality, and writing alerting rules in Prometheus, the CNCF metrics and alerting system.

observability, prometheus, metrics

Symptom-Based Alerting

Google SRE•Apr 1, 2016

Alerting on user-visible symptoms (errors, latency, SLO burn) rather than internal causes, to reduce noise and page only on things that matter.

observability, alerting, symptom-based

Incident Management Best Practices

PagerDuty•Jan 1, 2021

A structured process for detecting, coordinating, and resolving outages with clear roles, communication, and severity levels to restore service quickly.

observability, incident-management, reliability

Blameless Postmortems

Google SRE•Apr 1, 2016

Post-incident reviews focused on systemic causes and learning rather than individual blame, producing concrete action items to prevent recurrence.

observability, postmortem, blameless

On-Call Best Practices

PagerDuty•Jan 1, 2021

Sustainable on-call practices covering rotation design, escalation, actionable alerts, runbooks, and workload limits to keep services reliable without burning out engineers.

devops, on-call, reliability

Runbook Automation

PagerDuty•Jan 1, 2021

Codifying operational procedures as automated, repeatable workflows so common incident responses and maintenance tasks run reliably with less manual toil.

devops, runbook-automation, automation

Capacity Planning

Google SRE•Apr 1, 2016

Forecasting future demand and provisioning resources ahead of need, combining organic growth, launches, and headroom to avoid both outages and waste.

infrastructure, capacity-planning, reliability

Toil Reduction

Google SRE•Apr 1, 2016

Systematically identifying and eliminating repetitive, manual, automatable operational work so engineers can spend time on durable engineering instead.

devops, toil-reduction, automation

Observability-Driven Development

CNCF•May 1, 2022

Building instrumentation into software as a first-class part of development so engineers can ask new questions of production behavior without shipping new code.

observability, observability-driven-development, telemetry

AWS Well-Architected Sustainability Pillar

Amazon Web Services•Dec 2, 2021

AWS guidance for reducing the environmental impact of cloud workloads by maximizing utilization, right-sizing, and choosing efficient regions, services, and hardware.

cloud-architecture, sustainability, cost-optimization

Cloud Migration 7 Rs Strategy

Amazon Web Services•Jun 1, 2021

A decision framework for choosing how to migrate each application to the cloud across seven options: retire, retain, rehost, relocate, repurchase, replatform, and refactor.

cloud, cloud-migration, migration

Cloud Landing Zone

Amazon Web Services•Jun 1, 2018

A pre-configured, secure, multi-account cloud foundation with baked-in identity, networking, governance, and guardrails so teams can deploy workloads safely at scale.

cloud-architecture, governance, infrastructure-as-code

CQRS (Command Query Responsibility Segregation)

Martin Fowler•Jul 14, 2011

An architecture pattern that separates the model that writes data (commands) from the model that reads it (queries), allowing each side to scale and evolve independently.

microservices, architecture, data-architecture

Event Sourcing

Martin Fowler•Dec 12, 2005

An architecture pattern that stores every change to application state as an immutable sequence of events, making the event log the source of truth instead of current state.

microservices, architecture, data-architecture
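
A toy illustration of treating the event log as the source of truth: current state is never stored, only derived by replaying events. The bank-account domain and the `Deposited` and `Withdrawn` event types are invented for this sketch.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Union


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


Event = Union[Deposited, Withdrawn]


def apply(balance: int, event: Event) -> int:
    """Pure function: previous state plus one event gives the next state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event: {event!r}")


def replay(events: Iterable[Event], initial: int = 0) -> int:
    """Rebuild the current balance by folding over the immutable event log."""
    return reduce(apply, events, initial)
```

Replaying a prefix of the log, for example `replay(log[:2])`, reconstructs the state as it was at that point, which is what makes audit and temporal queries straightforward with this pattern.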

Saga Pattern

Microsoft•Jan 1, 2018

A pattern for managing data consistency across microservices using a sequence of local transactions coordinated by events or a central orchestrator, with compensating actions on failure.

microservices, architecture, resilience
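
The orchestrated variant can be sketched as a loop that runs each local transaction and, on failure, runs the compensations for the steps already completed in reverse order. This in-process version is only an illustration: `run_saga` and the step tuple layout are assumptions of the example, and a real saga would persist its progress and call remote services.

```python
from typing import Callable, List, Tuple

# (name, action, compensation): hypothetical shape for one saga step.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            # Roll back everything that already succeeded, newest first.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated {done_name}")
            return False
        completed.append((name, compensate))
        log.append(f"completed {name}")
    return True
```

Note that compensations are semantic undo operations (for example, releasing reserved stock), not database rollbacks, so each one has to be written to be safe to run after its action has committed.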

Circuit Breaker Pattern

Martin Fowler•Mar 6, 2014

A resilience pattern that stops calls to a failing dependency once errors cross a threshold, preventing cascading failures and giving the dependency time to recover.

microservices, resilience, reliability
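
A minimal, single-threaded sketch of the three states (closed, open, half-open). The class name, default threshold, and timeout values are illustrative assumptions; production libraries add locking, metrics, and per-exception policies.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit is open; call rejected")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open after a failed trial call.
            if self._opened_at is not None or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The clock is injectable so the timeout behaviour can be tested without sleeping.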

Bulkhead Pattern

Microsoft•Jun 23, 2018

A resilience pattern that isolates resources into separate pools so a failure or overload in one part of a system cannot consume the resources others depend on.

microservices, resilience, reliability

Backends for Frontends (BFF)

Sam Newman•Nov 18, 2015

An architecture pattern that gives each frontend client its own tailored backend service, instead of forcing web, mobile, and other clients to share one general-purpose API.

api-design, microservices, architecture

API Gateway Pattern

Microsoft•Sep 19, 2018

An architecture pattern that places a single entry point in front of backend services to handle routing, authentication, rate limiting, and other cross-cutting concerns.

api-design, microservices, architecture

Service Mesh Best Practices

Cloud Native Computing Foundation•Apr 1, 2020

Guidance for using a service mesh to manage service-to-service traffic, security, and observability through sidecar proxies, keeping that logic out of application code.

microservices, cloud-native, observability

Sidecar Pattern

Microsoft•Jun 23, 2018

A design pattern that deploys a helper component alongside the main application in the same unit, adding capabilities like proxying, logging, or config without changing the app.

containers, microservices, architecture

Cell-Based Architecture

Amazon Web Services•Sep 1, 2023

An architecture that partitions a system into independent, self-contained cells, each serving a subset of traffic, to limit blast radius and scale through replication.

cloud-architecture, resilience, reliability

Hexagonal Architecture (Ports and Adapters)

Alistair Cockburn•Jun 4, 2005

An architecture that isolates core application logic behind ports, with adapters connecting external concerns like databases and UIs, so the core stays independent of technology.

backend, architecture, domain-driven-design

Domain-Driven Design (DDD)

Eric Evans•Aug 30, 2003

A software design approach that models complex business domains in code, using a shared language and bounded contexts to align software structure with the business it serves.

backend, architecture, domain-driven-design

Modular Monolith

ThoughtWorks•Apr 29, 2019

An architecture that keeps a single deployable application but enforces strong internal module boundaries, capturing many microservices benefits without distributed-system complexity.

backend, architecture, microservices

Data Governance Framework

DAMA International•Jan 1, 2021

A structured set of roles, policies, and processes that make an organization accountable for the quality, security, and proper use of its data assets.

data-engineering, data-governance, data-quality

Data Quality Management

DAMA International•Jan 1, 2021

The practice of measuring, monitoring, and improving data across dimensions like accuracy, completeness, consistency, timeliness, and validity so it stays fit for use.

data-engineering, data-quality, data-validation

Data Contracts

dbt Labs•Jan 1, 2023

Explicit, version-controlled agreements between data producers and consumers that define schema, semantics, quality, and SLAs to prevent breaking changes.

data-engineering, data-contracts, schema

Medallion Architecture

Databricks•Jan 1, 2021

A layered data design that refines data through Bronze (raw), Silver (cleaned and conformed), and Gold (business-ready) tables to improve quality and reuse.

data-engineering, medallion-architecture, lakehouse

Data Lakehouse Architecture

Databricks•Jan 11, 2021

An architecture that combines the low-cost, open storage of a data lake with the transactions, schema, and performance of a data warehouse using open table formats.

data-engineering, data-lakehouse, data-architecture

ELT vs ETL Best Practices

dbt Labs•Jan 1, 2022

Guidance on when to transform data before loading (ETL) versus loading raw and transforming in the warehouse (ELT), and how to run each pattern well.

data-engineering, elt, etl

Data Lineage

Linux Foundation (OpenLineage)•Jan 1, 2021

The traceable record of data's origin, movement, and transformation across systems, enabling impact analysis, debugging, compliance, and trust.

data-engineering, data-lineage, observability

Reverse ETL

dbt Labs•Jun 1, 2021

The practice of moving modeled data from the warehouse back into operational tools like CRM and marketing platforms so business teams act on it directly.

data-engineering, reverse-etl, data-activation

Feature Store Best Practices

Linux Foundation (Feast)•Jan 1, 2022

A centralized system for defining, storing, and serving machine learning features consistently for training and inference, avoiding skew and duplicated work.

ai-ml, feature-store, mlops

MLOps Principles

Google•Jan 1, 2021

The discipline of applying DevOps and engineering rigor to machine learning so models are built, deployed, monitored, and retrained reliably and reproducibly.

ai-ml, mlops, machine-learning

ML Model Monitoring and Drift Detection

Evidently AI•Jan 1, 2022

Continuously tracking deployed ML models for performance decay, data drift, and concept drift so degradation is caught and corrected before it harms outcomes.

ai-ml, model-monitoring, drift-detection

Data Version Control (DVC)

Iterative (DVC)•Jan 1, 2020

Versioning datasets, models, and ML pipelines alongside code so experiments are reproducible, using Git for metadata and external storage for large files.

ai-ml, data-version-control, dvc

Schema Evolution and Schema Registry

Confluent•Jan 1, 2021

Managing how data schemas change over time with compatibility rules and a central registry so producers and consumers evolve without breaking each other.

data-format, schema-evolution, schema-registry

Data Catalog and Discovery

Linux Foundation (DataHub)•Jan 1, 2021

A searchable inventory of an organization's data assets with metadata, ownership, and lineage so people can find, understand, and trust the data they need.

data-engineering, data-catalog, metadata

Apache Kafka Streaming Best Practices

Confluent•Jan 1, 2021

Design and operational guidance for building reliable, scalable event streaming on Apache Kafka, covering topics, partitions, delivery semantics, and consumers.

data-engineering, kafka, event-streaming

Retrieval-Augmented Generation (RAG) Best Practices

Meta AI (RAG paper authors)•May 22, 2020

RAG grounds a large language model in external documents retrieved at query time, reducing hallucination and letting answers reflect current, private data without retraining the model.

rag, retrieval-augmented-generation, ai-ml

Prompt Engineering Best Practices

OpenAI•Mar 1, 2023

Prompt engineering is the practice of designing clear instructions, examples, and structure so a large language model returns accurate, consistent, and useful output.

prompt-engineering, ai-ml, llm

LLM Evaluation and Evals

OpenAI•May 1, 2023

LLM evaluation measures model and application quality with repeatable tests, scoring accuracy, faithfulness, safety, and cost so teams can ship and improve with evidence.

llm-evaluation, evals, ai-ml

LLM Guardrails

Guardrails AI•Apr 1, 2023

LLM guardrails are programmatic checks on model inputs and outputs that enforce safety, format, topic, and policy rules, blocking or correcting unsafe or off-policy responses.

llm-guardrails, ai-ml, llm

AI Red Teaming

OWASP•Jul 1, 2023

AI red teaming is structured adversarial testing of AI systems to find harmful, biased, or insecure behavior before attackers or real users do, using crafted attacks and probes.

ai-red-teaming, security, llm

Model Context Protocol (MCP)

Anthropic•Nov 25, 2024

The Model Context Protocol is an open standard that lets AI applications connect to external tools and data sources through a uniform client-server interface.

model-context-protocol, mcp, ai-ml

AI Agent Design Patterns

Anthropic•Jun 1, 2024

AI agent design patterns are reusable structures for LLM systems that plan, use tools, and act over multiple steps, covering reflection, tool use, planning, and multi-agent collaboration.

ai-agents, ai-ml, llm

LLM Observability

Cloud Native Computing Foundation (OpenTelemetry)•Aug 1, 2023

LLM observability is the practice of tracing, logging, and measuring LLM applications in production to monitor quality, cost, latency, and safety and to debug failures.

llm-observabilityobservabilityai-ml+3

Vector Database Best Practices

Pinecone•Feb 1, 2023

A vector database stores embeddings and serves fast similarity search for AI features like RAG and semantic search; best practices cover indexing, metadata, and freshness.

vector-databaseai-mlembeddings+3
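
To make similarity search with metadata filtering concrete, here is a brute-force sketch. Real vector databases use approximate indexes rather than a linear scan; the record layout `(id, vector, metadata)` and the `search` signature are assumptions for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query, k=2, where=None):
    """Return ids of the k most similar records whose metadata matches `where`."""
    candidates = [
        (cosine(query, vector), record_id)
        for record_id, vector, metadata in index
        if where is None
        or all(metadata.get(key) == value for key, value in where.items())
    ]
    candidates.sort(reverse=True)  # highest similarity first
    return [record_id for _, record_id in candidates[:k]]


# Hypothetical two-dimensional embeddings with a language tag as metadata.
INDEX = [
    ("a", [1.0, 0.0], {"lang": "en"}),
    ("b", [0.9, 0.1], {"lang": "de"}),
    ("c", [0.0, 1.0], {"lang": "en"}),
]
```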

OWASP Top 10 for LLM Applications (2025)

OWASP•Nov 18, 2024

The OWASP Top 10 for LLM Applications lists the most critical security risks for generative AI systems, including prompt injection, sensitive data disclosure, and supply chain risk.

owasp-llm-top-10securityllm+3

Prompt Injection Defense

OWASP•Sep 1, 2023

Prompt injection defense protects LLM applications from attacks that hide malicious instructions in user input or retrieved content to override the system's intended behavior.

prompt-injectionsecurityllm+3

ISO/IEC 42001 AI Management System

ISO/IEC•Dec 18, 2023

ISO/IEC 42001 is the first international standard for an Artificial Intelligence Management System, giving organizations a certifiable framework to govern AI responsibly.

iso-iec-42001ai-governanceaims+3

AI TRiSM (Trust, Risk and Security Management)

Gartner•Sep 1, 2022

AI TRiSM is a framework for managing the trust, risk, and security of AI systems across explainability, model operations, data protection, and runtime application security.

ai-trismai-governanceai-risk+3

LLM Cost Optimization

OpenAI•Oct 1, 2023

LLM cost optimization reduces the spend of running language model applications through model selection, caching, prompt efficiency, and token-aware design without sacrificing quality.

llm-cost-optimizationai-mlllm+3
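
Of the techniques listed above, caching is the simplest to show in code. The sketch below assumes exact-match caching keyed on model and prompt; `CachedClient` and the injected `call_model` function are hypothetical names, not part of any vendor SDK.

```python
import hashlib


class CachedClient:
    """Wrap a model-calling function so identical requests are only paid for once."""

    def __init__(self, call_model):
        self._call_model = call_model  # hypothetical function: (model, prompt) -> str
        self._cache = {}
        self.billable_calls = 0

    def complete(self, model: str, prompt: str) -> str:
        key = hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()
        if key not in self._cache:
            self.billable_calls += 1
            self._cache[key] = self._call_model(model, prompt)
        return self._cache[key]
```

Exact-match caching only helps when prompts repeat verbatim, so it tends to suit FAQ-style traffic more than free-form chat.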

Fine-Tuning vs RAG Decision Framework

OpenAI•Nov 1, 2023

A decision framework for choosing between fine-tuning, RAG, or both, based on whether the goal is new knowledge, consistent behavior, freshness, or domain adaptation.

fine-tuning-vs-ragai-mlllm+3

Hallucination Mitigation

Anthropic•Jun 1, 2023

Hallucination mitigation reduces confident but false LLM output through grounding, retrieval, citation, verification, and uncertainty handling so answers can be trusted.

hallucination-mitigationai-mlllm+3

Richardson Maturity Model

Leonard Richardson•Jan 1, 2008

A four-level model for grading how fully an HTTP API embraces REST, from RPC-style endpoints up to hypermedia controls (HATEOAS).

api-designresthateoas+2

OpenAPI Specification Best Practices

OpenAPI Initiative (Linux Foundation)•Feb 15, 2021

Guidance for writing accurate, machine-readable OpenAPI documents that describe HTTP APIs and drive docs, client SDKs, mocks, and contract tests.

api-specopenapiswagger+2

GraphQL API Best Practices

GraphQL Foundation•Nov 6, 2018

Practical guidance for designing GraphQL schemas and servers: typed schemas, pagination, error handling, query cost limits, and avoiding the N+1 problem.

api-designgraphqlschema-design+2

gRPC Best Practices

Cloud Native Computing Foundation•Aug 23, 2016

Guidance for building high-performance gRPC services with Protocol Buffers: service design, streaming, deadlines, error codes, and backward-compatible schema evolution.

api-designgrpcprotobuf+2

API-First Design

OpenAPI Initiative (Linux Foundation)•Jan 1, 2020

An approach that treats the API contract as a product designed before implementation, so teams agree on the interface, then build clients and servers in parallel.

api-designapi-firstdesign-first+2

Idempotency Keys

Stripe•Feb 22, 2017

A pattern where clients send a unique key with unsafe requests so they can retry safely: the server recognizes a repeated key and does not apply the same operation twice, preventing duplicate charges or records.

api-designidempotencyreliability+2
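
A minimal server-side sketch of the idempotency-key pattern, assuming an in-memory store. The `PaymentService` class and its fields are hypothetical; a production system would persist keys, expire them, and handle concurrent requests that carry the same key.

```python
class PaymentService:
    """Stores the first response for each idempotency key and replays it on retries."""

    def __init__(self):
        self._responses = {}  # idempotency key -> stored response
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key means a retry: return the original result, do no new work.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges_made += 1
        response = {
            "charge_id": f"ch_{self.charges_made}",
            "amount_cents": amount_cents,
        }
        self._responses[idempotency_key] = response
        return response
```

A client that times out can resend the request with the same key and receive the original `charge_id` instead of creating a second charge.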

API Rate Limiting

IETF•Jan 1, 2021

Controlling how many requests a client can make in a time window to protect API capacity, ensure fair use, and defend against abuse, using algorithms like token bucket.

api-designrate-limitingthrottling+2
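
The token bucket algorithm mentioned above can be sketched in a few lines. This version is single-process and not thread-safe; the `clock` parameter is an assumption added so the behavior can be tested without sleeping.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(
            self.capacity, self._tokens + elapsed * self.refill_per_second
        )
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

An API would typically keep one bucket per client key and respond with HTTP 429 when `allow()` returns `False`.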

API Pagination Best Practices

Google•Apr 25, 2017

Techniques for returning large result sets in pages: offset, cursor (keyset), and page-token pagination, with stable ordering so that pages stay consistent under concurrent writes.

api-designpaginationcursor+2

Webhook Best Practices

Stripe•May 1, 2019

Guidance for sending and receiving reliable webhooks: signature verification, idempotent handlers, retries with backoff, and fast acknowledgement of events.

api-designwebhooksevents+2
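
On the receiving side, signature verification and idempotent handling might look like the following sketch. It assumes an HMAC-SHA256 signature over the raw payload sent as a hex string; real providers differ in header names and signing schemes, and `WebhookReceiver` is a hypothetical class.

```python
import hashlib
import hmac


def sign(secret: bytes, payload: bytes) -> str:
    """Compute the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify(secret: bytes, payload: bytes, signature: str) -> bool:
    """Constant-time comparison avoids leaking how many characters matched."""
    return hmac.compare_digest(sign(secret, payload), signature)


class WebhookReceiver:
    def __init__(self, secret: bytes):
        self._secret = secret
        self._seen_event_ids = set()
        self.processed = []

    def handle(self, event_id: str, payload: bytes, signature: str) -> int:
        """Return an HTTP status code; redelivered events are acknowledged, not reprocessed."""
        if not verify(self._secret, payload, signature):
            return 400
        if event_id not in self._seen_event_ids:
            self._seen_event_ids.add(event_id)
            self.processed.append(event_id)  # real work would usually be queued here
        return 200
```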

OAuth 2.0 and OpenID Connect

OpenID Foundation•Feb 25, 2014

OAuth 2.0 delegates authorization via access tokens; OpenID Connect adds an identity layer for authentication. Together they secure API access and single sign-on.

identity-authoauthoidc+2

AsyncAPI Specification

AsyncAPI Initiative (Linux Foundation)•Jun 22, 2021

A standard, machine-readable format for describing event-driven and message-based APIs across protocols like Kafka, MQTT, and AMQP, analogous to OpenAPI for REST.

messaging-protocolasyncapievent-driven+2

Problem Details for HTTP APIs (RFC 9457)

IETF•Jul 1, 2023

An IETF standard JSON format for machine-readable HTTP error responses, defining fields like type, title, status, detail, and instance for consistent error handling.

api-designerror-handlingrfc-9457+2

API Backward Compatibility

Google•Apr 25, 2017

Evolving an API without breaking existing clients by making only additive changes, versioning breaking changes, and deprecating fields gracefully over time.

api-designbackward-compatibilityversioning+2

JSON:API Specification

JSON:API•May 29, 2015

A convention for building JSON APIs that standardizes resource structure, relationships, pagination, filtering, and sparse fieldsets to reduce bikeshedding and over-fetching.

api-specjson-apirest+2

Progressive Enhancement

Steven Champeon / A List Apart•Jan 1, 2003

A frontend strategy that builds a baseline experience with semantic HTML first, then layers CSS and JavaScript so the site works for every browser and device.

progressive-enhancementfrontendaccessibility+2

WCAG 2.2 Accessibility Compliance

World Wide Web Consortium (W3C)•Oct 5, 2023

The W3C Web Content Accessibility Guidelines 2.2 define testable success criteria across four principles so web content is perceivable, operable, understandable, and robust.

accessibilitywcaga11y+2

Performance Budgets

Google (web.dev)•Jan 1, 2013

A performance budget sets quantitative limits on metrics like page weight, request count, and load timings, enforced in development and CI to stop regressions.

performance-budgetsfrontendweb-performance+2
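
Enforcing a budget in CI comes down to comparing measured values against limits and failing the build on any overrun. The metric names and limits below are hypothetical, and the measurements would come from whatever tooling the team already uses.

```python
# Hypothetical budget: metric name -> maximum allowed value.
BUDGET = {"page_weight_kb": 500, "request_count": 50, "lcp_ms": 2500}


def over_budget(measured: dict, budget: dict) -> dict:
    """Return {metric: (measured, limit)} for every metric that exceeds its limit."""
    return {
        metric: (measured[metric], limit)
        for metric, limit in budget.items()
        if metric in measured and measured[metric] > limit
    }


def ci_exit_code(measured: dict, budget: dict) -> int:
    """Non-zero exit code fails the pipeline when any budget is exceeded."""
    return 1 if over_budget(measured, budget) else 0
```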

Mobile-First Design

Luke Wroblewski•Jan 1, 2011

An approach that designs the smallest-screen experience first, then progressively adds layout and features for larger viewports, prioritizing content and performance.

mobile-firstfrontendresponsive-design+2

Image Optimization Best Practices

Google (web.dev)•Jan 1, 2019

Techniques to reduce image bytes and improve loading using modern formats, responsive sizing, compression, lazy loading, and CDNs without sacrificing visual quality.

image-optimizationfrontendweb-performance+2

Atomic Design

Brad Frost•Jan 1, 2016

A methodology by Brad Frost for building UI from five composable levels: atoms, molecules, organisms, templates, and pages, giving design systems a consistent structure.

atomic-designfrontenddesign-systems+2

Component-Driven Development

Tom Coleman / Chromatic•Nov 1, 2017

A development approach that builds UIs bottom-up from isolated, reusable components, developed and tested independently before assembly into pages and apps.

component-drivenfrontenddesign-systems+2

Design Systems

Nielsen Norman Group•Jan 1, 2017

A design system is a single source of truth combining reusable components, design tokens, patterns, and guidelines that keep products consistent and faster to build.

design-systemsfrontenddesign-tokens+2

Micro-Frontends

Cam Jackson / Thoughtworks•Jun 19, 2019

An architecture that splits a web app into independently developed and deployed frontend pieces owned by separate teams, then composes them into one experience.

micro-frontendsfrontendarchitecture+2

Progressive Web Apps (PWA)

Google (web.dev)•Jan 1, 2015

Web apps that use service workers, a manifest, and HTTPS to deliver installable, offline-capable, app-like experiences from a single codebase across platforms.

pwafrontendservice-worker+2

Content Security Policy (CSP)

World Wide Web Consortium (W3C)•Jun 15, 2018

A W3C security standard, delivered via an HTTP header, that controls which sources a browser may load scripts, styles, and other resources from, mitigating cross-site scripting and data injection attacks.

content-security-policysecuritycsp+2
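
As an illustration, the sketch below assembles a `Content-Security-Policy` header value from a mapping of directives to allowed sources. The helper name and the example CDN host are assumptions; the directive names and the `'self'` keyword are standard CSP syntax.

```python
def build_csp(directives: dict[str, list[str]]) -> str:
    """Join directives as 'name source source; name source'."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


# Hypothetical policy: same-origin by default, scripts also allowed from one CDN.
POLICY = {
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://cdn.example.com"],
    "object-src": ["'none'"],
}

HEADERS = {"Content-Security-Policy": build_csp(POLICY)}
```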

Responsive Web Design

Ethan Marcotte / A List Apart•May 25, 2010

An approach by Ethan Marcotte that uses fluid grids, flexible media, and media queries so one layout adapts seamlessly across screen sizes and devices.

responsive-designfrontendmobile-first+2

Lazy Loading and Code Splitting

Google (web.dev)•Jan 1, 2018

Techniques that defer loading of non-critical code and assets and split bundles by route or component, reducing initial payload and speeding up first load.

lazy-loadingfrontendcode-splitting+2

Frontend Internationalization (i18n)

World Wide Web Consortium (W3C)•Dec 16, 2014

Designing and building UIs so they can adapt to multiple languages, regions, and formats without code changes, separating translatable text from logic.

internationalizationfrontendi18n+2

The Test Pyramid

Martin Fowler•May 1, 2012

A testing strategy that favors many fast unit tests, fewer integration tests, and a small number of slow end-to-end tests.

testingtest-pyramidunit-testing+3

The Testing Trophy

Kent C. Dodds•May 1, 2018

A testing model that weights integration tests most heavily, balancing static analysis, unit, integration, and end-to-end tests by confidence-per-cost.

testingtesting-trophyintegration-testing+3

Test-Driven Development (TDD)

Kent Beck•Nov 8, 2002

A development discipline where you write a failing test first, write minimal code to pass it, then refactor, in short red-green-refactor cycles.

testingtest-driven-developmenttdd+3

Behavior-Driven Development (BDD)

Dan North•Sep 1, 2006

A collaborative practice that expresses requirements as concrete, executable examples in plain language shared by business, development, and testing.

software-processbehavior-driven-developmentbdd+3

Property-Based Testing

QuickCheck (Koen Claessen and John Hughes)•Sep 1, 2000

A technique that asserts general properties of code and lets a framework generate many randomized inputs to find counterexamples and shrink them.

testingproperty-based-testingfuzzing+3
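
The generate-check-shrink loop can be sketched with the standard library alone. This is a simplified stand-in for what frameworks such as QuickCheck do, assuming inputs are short lists of small integers; `find_counterexample` and `shrink` are hypothetical names.

```python
import random


def shrink(prop, xs: list[int]) -> list[int]:
    """Greedily drop one element at a time while the property still fails."""
    changed = True
    while changed:
        changed = False
        for i in range(len(xs)):
            smaller = xs[:i] + xs[i + 1:]
            if not prop(smaller):
                xs = smaller
                changed = True
                break
    return xs


def find_counterexample(prop, trials: int = 200, seed: int = 0):
    """Return a shrunk failing input, or None if every random trial passed."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(0, 8))]
        if not prop(xs):
            return shrink(prop, xs)
    return None


def sort_is_idempotent(xs):
    """A property that holds: sorting twice equals sorting once."""
    return sorted(sorted(xs)) == sorted(xs)


def reverse_is_identity(xs):
    """A deliberately false property: reversing a list leaves it unchanged."""
    return xs[::-1] == xs
```

Here `find_counterexample(sort_is_idempotent)` finds nothing, while the false property is refuted by a short list that the shrinker has reduced from the original random input.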

Mutation Testing

PIT (Pitest)•Jan 1, 2016

A technique that injects small faults (mutants) into code and checks whether tests detect them, measuring how effective the test suite really is.

testingmutation-testingtest-quality+3
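
To show the idea without a tool like PIT, the sketch below writes three mutants of a one-line function by hand and scores two test suites against them. Real tools generate mutants automatically; all names here are hypothetical.

```python
def is_adult(age: int) -> bool:
    """The original code under test."""
    return age >= 18


# Hand-written mutants, each with one small injected fault.
MUTANTS = {
    ">= replaced by >": lambda age: age > 18,
    ">= replaced by <": lambda age: age < 18,
    "18 replaced by 19": lambda age: age >= 19,
}


def weak_suite(fn) -> bool:
    """Passes if fn behaves correctly far from the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    """Adds the boundary case that the weak suite misses."""
    return weak_suite(fn) and fn(18) is True


def mutation_score(suite) -> float:
    """Fraction of mutants that the suite detects (kills)."""
    killed = [name for name, mutant in MUTANTS.items() if not suite(mutant)]
    return len(killed) / len(MUTANTS)
```

Both suites pass on the original function, yet the weak suite kills only one of the three mutants, which is the gap that line coverage alone would not reveal.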

Flaky Test Management

Google Testing Blog•May 27, 2016

A disciplined approach to detecting, quarantining, and fixing nondeterministic tests so the CI signal stays trustworthy and developers can keep merging with confidence.

testingflaky-teststest-reliability+3
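
One simple detection approach is to rerun a test several times against unchanged code and flag mixed outcomes. The sketch below assumes tests signal failure with `AssertionError`; the `classify` function and its labels are hypothetical.

```python
def classify(test_fn, runs: int = 5) -> str:
    """Run the same test repeatedly and label it by the outcomes observed."""
    outcomes = set()
    for _ in range(runs):
        try:
            test_fn()
            outcomes.add("pass")
        except AssertionError:
            outcomes.add("fail")
    if outcomes == {"pass"}:
        return "stable-pass"
    if outcomes == {"fail"}:
        return "stable-fail"
    return "flaky"  # both outcomes seen with no code change: candidate for quarantine
```

A quarantine step could then move tests labeled `flaky` out of the merge-blocking suite until they are fixed.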

Code Coverage Best Practices

Google Testing Blog•Aug 4, 2020

Guidance on using code coverage as a signal of untested code rather than a target, including diff coverage and avoiding coverage gaming.

testingcode-coveragetest-quality+3
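
Diff coverage, mentioned above, is the share of changed lines that tests execute. A minimal sketch, assuming the changed and covered line numbers for a file have already been extracted from the diff and the coverage report:

```python
def diff_coverage(changed_lines: set[int], covered_lines: set[int]) -> float:
    """Fraction of changed lines that were executed by tests."""
    if not changed_lines:
        return 1.0  # nothing changed, so nothing is uncovered
    return len(changed_lines & covered_lines) / len(changed_lines)


def uncovered_changes(changed_lines: set[int], covered_lines: set[int]) -> list[int]:
    """Changed lines with no coverage, sorted for review comments."""
    return sorted(changed_lines - covered_lines)
```

Reporting `uncovered_changes` to the reviewer keeps the focus on untested new code rather than on a repository-wide percentage.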

Code Review Best Practices

Google Engineering Practices•Sep 1, 2019

Guidance for effective, fast, and respectful code review, drawn from Google's engineering practices, to improve code health over time.

quality-managementcode-reviewpull-request+3

Definition of Done

Scrum.org•Nov 1, 2020

A shared, explicit checklist of conditions a work item must meet to be considered complete, ensuring consistent quality across a team.

software-processdefinition-of-doneagile+3

End-to-End Testing Best Practices

Playwright (Microsoft)•Jan 1, 2023

Guidance for writing reliable, maintainable end-to-end tests that exercise critical user journeys without becoming slow and flaky.

testingend-to-end-testinge2e+3

Test Data Management

Thoughtworks•Jan 1, 2021

Practices for provisioning realistic, isolated, and compliant test data so tests are reliable, repeatable, and free of production data exposure.

testingtest-data-managementdata-masking+3

Static Application Security Testing in CI

OWASP•Sep 1, 2021

Integrating SAST tools into the CI pipeline to scan source code for security vulnerabilities automatically on every change.

securitysastci-cd+3

Visual Regression Testing

BackstopJS•Jan 1, 2017

Automated testing that captures screenshots of UI states and compares them against baselines to detect unintended visual changes.

testingvisual-regression-testingui-testing+3

Scrum Framework

Scrum.org / Ken Schwaber and Jeff Sutherland•Nov 1, 2020

Scrum is a lightweight agile framework for delivering products in short, fixed-length iterations called sprints, using empirical inspection and adaptation to manage complex work.

scrumagilesprint+3

Kanban Method

David J. Anderson / Kanban University•Apr 1, 2010

The Kanban Method is an evolutionary approach to managing knowledge work that visualizes flow, limits work in progress, and improves delivery continuously without prescribing fixed iterations.

kanbanflowwork-in-progress+3

Lean Software Development

Mary Poppendieck and Tom Poppendieck•May 8, 2003

Lean Software Development applies Lean manufacturing principles to software, emphasizing eliminating waste, amplifying learning, deferring decisions, and delivering fast to maximize customer value.

leanwaste-eliminationflow+3

Team Topologies

Matthew Skelton and Manuel Pais•Sep 17, 2019

Team Topologies is a model for organizing business and technology teams using four team types and three interaction modes to optimize fast flow and reduce cognitive load.

team-topologiesorganization-designcognitive-load+3

InnerSource

InnerSource Commons Foundation•Jan 1, 2015

InnerSource applies open source development practices inside an organization, letting teams share, contribute to, and reuse internal code through transparent, contribution-friendly repositories.

innersourcecollaborationcode-reuse+3

Architecture Decision Records (ADRs)

Michael Nygard (popularized); ADR community•Nov 15, 2011

An Architecture Decision Record (ADR) is a short, version-controlled document that captures one significant architectural decision, its context, and its consequences for future maintainers.

adrarchitecture-decisionsdocumentation+3

C4 Model for Software Architecture

Simon Brown•Jan 1, 2018

The C4 model is a lean, hierarchical way to diagram software architecture at four levels of abstraction: System Context, Containers, Components, and Code.

c4-modelarchitecture-diagramsdocumentation+3

Documentation as Code

Write the Docs community•Sep 1, 2017

Documentation as Code treats docs like software: stored in version control, written in plain text, reviewed in pull requests, and published automatically through continuous integration.

docs-as-codedocumentationversion-control+3

Keep a Changelog

Olivier Lacan / Keep a Changelog•Jun 1, 2017

Keep a Changelog is a convention for writing human-readable, chronologically ordered changelogs grouped by change type, so users and maintainers can see what changed in each release.

changelogrelease-notesversion-control+3

Platform Engineering

Cloud Native Computing Foundation (CNCF)•Jun 1, 2022

Platform engineering builds and runs internal self-service platforms and paved roads that let product teams ship software faster with lower cognitive load and consistent guardrails.

platform-engineeringself-servicepaved-roads+3

Internal Developer Platform

Cloud Native Computing Foundation (CNCF)•Jan 1, 2021

An Internal Developer Platform (IDP) is the self-service product built by platform teams that gives developers golden paths to provision, build, deploy, and operate software with built-in guardrails.

internal-developer-platformself-servicedeveloper-portal+3

SOC 2 Compliance

American Institute of Certified Public Accountants (AICPA)•Apr 1, 2017

SOC 2 is an AICPA auditing framework that assesses how a service organization protects customer data against five Trust Services Criteria: security, availability, processing integrity, confidentiality, and privacy.

soc-2trust-services-criteriacompliance+3

PCI DSS Compliance

PCI Security Standards Council•Mar 31, 2022

PCI DSS is the global security standard for organizations that handle payment card data, defining requirements to protect cardholder data across networks, systems, and processes.

pci-dsscardholder-datacompliance+3

GDPR Compliance Engineering

European Union•May 25, 2018

GDPR compliance engineering turns the EU General Data Protection Regulation's legal principles into concrete technical controls: lawful processing, data minimization, consent, and data-subject rights.

gdprdata-protectioncompliance+3

Cloud Cost Allocation and Tagging

FinOps Foundation•Jun 1, 2023

Cloud cost allocation and tagging is the FinOps practice of labeling cloud resources with consistent metadata so spend can be attributed accurately to teams, products, and environments.

cost-allocationtaggingfinops+3
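
Once resources carry consistent tags, attribution is a group-by over billing records. The sketch below assumes a hypothetical record shape with `cost_usd` and `tags` fields, and reports untagged spend separately so gaps in tagging stay visible.

```python
from collections import defaultdict


def allocate(resources: list[dict], tag_key: str = "team") -> dict[str, float]:
    """Sum cost per tag value; resources missing the tag land in 'untagged'."""
    totals = defaultdict(float)
    for resource in resources:
        owner = resource.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += resource["cost_usd"]
    return dict(totals)


# Hypothetical billing export.
RESOURCES = [
    {"id": "vm-1", "cost_usd": 10.0, "tags": {"team": "search", "env": "prod"}},
    {"id": "vm-2", "cost_usd": 5.5, "tags": {"team": "search", "env": "dev"}},
    {"id": "db-1", "cost_usd": 20.0, "tags": {"team": "billing", "env": "prod"}},
    {"id": "bucket-1", "cost_usd": 2.0, "tags": {}},
]
```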

Green Software Engineering

Green Software Foundation•May 1, 2022

Green software engineering is the practice of building applications that are carbon-efficient, energy-efficient, and carbon-aware, reducing the environmental impact of software at every layer.

green-softwarecarbon-efficiencyfinops+3