Retrieval Augmented Generation (RAG)
- Snippet from Wikipedia: Retrieval-augmented generation
Retrieval-augmented generation (RAG) is a technique that enables generative artificial intelligence (Gen AI) models to retrieve and incorporate new information. It modifies interactions with a large language model (LLM) so that the model responds to user queries with reference to a specified set of documents, using this information to supplement information from its pre-existing training data. This allows LLMs to use domain-specific and/or updated information. Use cases include providing chatbot access to internal company data or generating responses based on authoritative sources.
RAG improves large language models (LLMs) by incorporating information retrieval before generating responses. Unlike traditional LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources. According to Ars Technica, "RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process to help LLMs stick to the facts." This method helps reduce AI hallucinations, which have led to real-world issues like chatbots inventing policies or lawyers citing nonexistent legal cases.
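The retrieve-then-generate flow described above can be illustrated with a minimal sketch. It assumes a tiny in-memory document set and a simple bag-of-words cosine similarity in place of a real embedding model or search engine; the names `DOCUMENTS`, `retrieve`, and `build_prompt` are hypothetical and chosen only for illustration. The resulting prompt would then be passed to whichever LLM is in use.

```python
import math
import re
from collections import Counter

# Hypothetical document set standing in for "a specified set of documents".
DOCUMENTS = {
    "refund-policy": "Refunds are available within 30 days of purchase with a receipt.",
    "shipping": "Standard shipping takes five business days within the country.",
    "support-hours": "Support is open Monday to Friday from 9am to 5pm.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k (doc_id, text) pairs most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents.items(),
        key=lambda item: cosine(query_vec, Counter(tokenize(item[1]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Place the retrieved passages ahead of the question in the prompt."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


question = "Can I get refunds after purchase?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS))
print(prompt)
```

A production system would typically swap the word-count similarity for vector embeddings and a dedicated index, but the shape stays the same: retrieve first, then generate from the augmented prompt.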
By dynamically retrieving information, RAG enables AI to provide more accurate responses without frequent retraining. According to IBM, "RAG also reduces the need for users to continuously train the model on new data and update its parameters as circumstances evolve. In this way, RAG can lower the computational and financial costs of running LLM-powered chatbots in an enterprise setting."
Beyond efficiency gains, RAG also allows LLMs to include source references in their responses, enabling users to verify information by reviewing cited documents or original sources. This can provide greater transparency, as users can cross-check retrieved content to ensure accuracy and relevance.
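One way source references might be surfaced is to number the retrieved passages in the prompt and then map any `[n]` markers in the model's answer back to the documents they came from. The sketch below assumes that convention; the `Passage` class, the file paths, and the sample answer are all hypothetical, and no particular LLM API is implied.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # hypothetical path or URL of the retrieved document
    text: str


def number_passages(passages):
    """Format passages as '[1] ...', '[2] ...' for inclusion in a prompt."""
    return "\n".join(f"[{i}] {p.text}" for i, p in enumerate(passages, start=1))


def cited_sources(answer, passages):
    """Map '[n]' markers found in the answer back to passage sources.

    Markers that do not correspond to a retrieved passage are ignored,
    and each source is listed once, in order of first citation.
    """
    sources = []
    for marker in re.findall(r"\[(\d+)\]", answer):
        index = int(marker)
        if 1 <= index <= len(passages):
            source = passages[index - 1].source
            if source not in sources:
                sources.append(source)
    return sources


passages = [
    Passage("policies/refunds.txt", "Refunds are available within 30 days."),
    Passage("policies/shipping.txt", "Standard shipping takes five business days."),
]

# Hypothetical model output that cites the first passage.
answer = "You can request a refund within 30 days of purchase [1]."

print(number_passages(passages))
print("Sources:", ", ".join(cited_sources(answer, passages)))
```

Listing the cited documents alongside the answer gives readers a concrete place to cross-check each claim.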
The term "retrieval-augmented generation" (RAG) was first introduced in the 2020 research paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" by Douwe Kiela, Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, and Sebastian Riedel, written at Meta.
Cloud Monk is Retired (for now). Buddha with you. © 2025 and Beginningless Time - Present Moment - Three Times: The Buddhas or Fair Use. Disclaimers