Plain-language definitions of AI terms
31 terms
agentic ai
Agentic AI refers to an artificial intelligence system designed with the capability to autonomously perceive its environment, process information, formulate plans, make decisions, and execute actions to achieve predefined goals. These systems often incorporate components for memory, reasoning, learning, and interaction, allowing them to operate effectively in dynamic and complex environments without constant human intervention.
attention
In neural networks, particularly within sequence-to-sequence models like Transformers, attention is a mechanism that enables the model to selectively focus on and weigh the importance of different parts of the input sequence when generating or processing each element of the output sequence. It computes a context vector as a weighted sum of input features, where the weights are determined by the similarity between the current hidden state (the query) and all input hidden states (the keys), with a softmax function typically applied to normalize these similarity scores into attention weights.
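The weighting scheme above can be sketched for a single query in plain Python. This is a toy scaled dot-product attention; real implementations batch queries, keys, and values as matrix multiplications on accelerators.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(query, keys, values):
    """Return the context vector for one query: a weighted sum of the
    value vectors, weighted by softmax-normalized query/key similarity."""
    d = len(query)
    # Similarity of the query with every key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# The query aligns with the first key, so the context vector leans
# toward the first value vector.
ctx = scaled_dot_product_attention(
    [1.0, 0.0],                    # query
    [[1.0, 0.0], [0.0, 1.0]],      # keys
    [[10.0, 0.0], [0.0, 10.0]])    # values
```

Because the weights sum to one, the context vector is always a convex combination of the value vectors.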
bert
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained deep learning model developed by Google for natural language processing (NLP) tasks. It leverages the Transformer architecture and is notable for its bidirectional training approach, which enables it to understand the context of a word by simultaneously considering the words that precede and follow it. BERT is typically fine-tuned on specific downstream NLP tasks after its initial unsupervised pre-training on large text corpora.
chain-of-thought
Chain-of-Thought (CoT) is a prompting technique employed with large language models (LLMs) that encourages the model to generate a series of intermediate reasoning steps before arriving at a final answer. This explicit decomposition of a complex problem into a sequence of logical steps significantly enhances the model's ability to perform multi-step reasoning, arithmetic, and symbolic manipulation, leading to more accurate and interpretable results compared to direct prompting.
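A direct prompt and a chain-of-thought prompt for the same word problem look like this (illustrative prompt text only; no particular model or API is assumed):

```python
# Direct prompting: the model must jump straight to the answer.
direct_prompt = (
    "Q: A cafe sold 23 coffees at $4 each and 12 teas at $3 each. "
    "What was the total revenue?\nA:"
)

# Chain-of-thought prompting: the prompt demonstrates intermediate steps,
# encouraging the model to reason before answering.
cot_prompt = (
    "Q: A cafe sold 23 coffees at $4 each and 12 teas at $3 each. "
    "What was the total revenue?\n"
    "A: Let's think step by step. "
    "Coffee revenue is 23 * 4 = 92 dollars. "
    "Tea revenue is 12 * 3 = 36 dollars. "
    "Total revenue is 92 + 36 = 128 dollars. "
    "The answer is 128."
)
```

The intermediate arithmetic lines are exactly the "series of intermediate reasoning steps" the definition describes.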
context window
In the domain of Large Language Models (LLMs) and other sequence-to-sequence architectures, the 'context window' (also known as context length or attention window) denotes the maximum number of tokens (e.g., words, subword units, or characters) that the model can simultaneously consider or attend to when processing an input sequence and generating an output. This limit dictates how much historical information or input data the model can 'remember' and utilize to inform its current prediction, directly impacting its ability to understand long-range dependencies and maintain coherence over extended dialogues or documents.
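A minimal sketch of what hitting the context limit means in practice: when a conversation exceeds the window, something must be dropped. The simplest policy, keeping only the most recent tokens, is shown below; real systems may instead summarize old turns or evict by priority.

```python
def fit_to_context(tokens, max_tokens):
    """Keep only the most recent tokens that fit in the context window.

    Everything truncated away is invisible to the model: it cannot
    attend to tokens outside the window.
    """
    if len(tokens) <= max_tokens:
        return tokens
    return tokens[-max_tokens:]

history = ["t%d" % i for i in range(10)]   # 10 tokens of "conversation"
window = fit_to_context(history, 4)        # only the last 4 survive
```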
diffusion
In the context of AI/ML, 'diffusion' refers to a class of generative models, specifically Denoising Diffusion Probabilistic Models (DDPMs) and their variants, that learn to generate data by reversing a gradual noising process. These models operate in two phases: a 'forward diffusion' (or noising) process that progressively adds Gaussian noise to input data until it becomes pure noise, and a 'reverse diffusion' (or denoising) process that learns to iteratively remove this noise, starting from random noise, to reconstruct a clean data sample. The reverse process is typically modeled by a neural network (e.g., U-Net) that predicts the noise added at each step, allowing for the generation of high-quality, diverse samples.
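The forward (noising) process has a convenient closed form: a sample at any step can be drawn directly from the clean data, mixing signal and Gaussian noise according to the cumulative noise schedule. A minimal sketch (`alpha_bar` stands for the cumulative product of the per-step schedule; the reverse denoising network is not shown):

```python
import math
import random

def forward_diffuse(x0, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise,
    with noise drawn from a standard Gaussian."""
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * x + b * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
# Early step (alpha_bar near 1): output is mostly the original signal.
early = forward_diffuse(x0, alpha_bar=0.999, rng=rng)
# Late step (alpha_bar near 0): output is almost pure noise.
late = forward_diffuse(x0, alpha_bar=0.001, rng=rng)
```

Training the reverse process amounts to teaching a network to predict the noise term from `x_t` and the step index.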
embeddings
Embeddings are low-dimensional, continuous vector representations of discrete objects (e.g., words, images, users, or items) in a continuous vector space. These vectors are learned in such a way that objects with similar semantic or functional properties are mapped to nearby points in the embedding space, thereby capturing their relationships and facilitating machine learning tasks by converting sparse, high-dimensional inputs into dense, meaningful representations.
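"Nearby points" in the embedding space is usually measured with cosine similarity. A toy sketch with hand-made 3-dimensional vectors (real embeddings are learned and have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means the same
    direction, 0.0 orthogonal, -1.0 opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors chosen so that related concepts point the same way.
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.15]
banana = [0.1, 0.05, 0.95]

related = cosine_similarity(king, queen)     # high: similar direction
unrelated = cosine_similarity(king, banana)  # low: nearly orthogonal
```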
fine-tuning
Fine-tuning is a transfer learning technique in machine learning where a pre-trained model, which has been previously trained on a large, general dataset (e.g., ImageNet for computer vision or a massive text corpus for NLP), is further trained on a smaller, task-specific dataset. This process involves adjusting the model's weights and biases to adapt its learned features and knowledge to a new, often more specialized, downstream task, typically with a lower learning rate than the initial pre-training phase.
gan
A Generative Adversarial Network (GAN) is a class of artificial intelligence algorithms used in unsupervised machine learning, implemented by a system of two neural networks contesting with each other in a zero-sum game framework. The two main components are a 'generator' network, which learns to create new data instances that resemble the training data, and a 'discriminator' network, which learns to distinguish between real data from the training set and fake data produced by the generator. During training, the generator aims to produce data realistic enough to fool the discriminator, while the discriminator aims to accurately identify fake data, leading to a dynamic equilibrium where both networks improve their performance.
gpt
GPT, an acronym for Generative Pre-trained Transformer, refers to a family of large language models (LLMs) developed by OpenAI. These models are based on the transformer neural network architecture, which utilizes self-attention mechanisms to process input sequences in parallel, enabling highly efficient contextual understanding and generation. GPT models are 'pre-trained' on vast corpora of text data using unsupervised learning to predict the next token, and are 'generative' in their ability to produce coherent and contextually relevant human-like text outputs for a wide range of natural language processing tasks, including text generation, translation, summarization, and question answering.
grounding
In AI and machine learning, "grounding" refers to the process of connecting abstract symbols, concepts, or linguistic expressions within a computational system to their corresponding real-world referents, sensory experiences, or perceptual data. This enables the system to understand the meaning of its internal representations by linking them to observable phenomena or actions, thereby moving beyond purely syntactic manipulation to semantic understanding.
hallucination
In the context of generative AI models, particularly large language models (LLMs) and diffusion models, 'hallucination' refers to the phenomenon where the model generates content (text, images, etc.) that is factually incorrect, nonsensical, or unfaithful to the provided source data or prompts, despite being presented in a confident and coherent manner. This output is not based on real-world facts or the input context but rather is an artifact of the model's internal statistical patterns and learned representations.
inference
In machine learning, inference is the process of applying a trained model to new, unseen input data to generate predictions, classifications, or other outputs. This typically involves feeding raw data through the model's learned parameters and computations to derive a result, such as identifying an object in an image, translating text, or forecasting a value.
instruction tuning
Instruction tuning is a supervised fine-tuning technique applied to pre-trained language models, where the model is trained on a dataset of diverse instructions paired with their corresponding desired outputs. The primary goal is to enhance the model's ability to follow natural language instructions, generalize to unseen tasks, and improve its alignment with human intent and preferences across a wide range of prompts, often leading to better performance in zero-shot and few-shot settings.
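The training data for instruction tuning is simply a collection of (instruction, output) pairs rendered into training strings. A small illustrative example; field names and the prompt template vary between datasets and are assumptions here, not a standard:

```python
# Illustrative (instruction, output) pairs in the style of
# instruction-tuning datasets.
instruction_data = [
    {"instruction": "Translate to French: 'Good morning.'",
     "output": "Bonjour."},
    {"instruction": "Summarize in one sentence: The meeting covered Q3 "
                    "revenue, hiring plans, and the new product launch.",
     "output": "The meeting reviewed Q3 revenue, hiring, and the launch."},
    {"instruction": "List three prime numbers.",
     "output": "2, 3, 5"},
]

def format_example(pair):
    """Render one pair into a single training string using a common
    (but not universal) instruction/response template."""
    return "### Instruction:\n%s\n\n### Response:\n%s" % (
        pair["instruction"], pair["output"])
```

The model is then fine-tuned with ordinary supervised learning on these strings, with the loss typically applied to the response portion.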
llm
A Large Language Model (LLM) is a type of artificial intelligence model, typically based on the transformer architecture, that has been trained on a massive dataset of text and code. LLMs are characterized by their vast number of parameters (often billions or trillions), which enables them to learn complex patterns, generate human-like text, understand natural language, and perform a wide range of natural language processing tasks such as translation, summarization, question answering, and content creation.
lora
LoRA (Low-Rank Adaptation of Large Language Models) is a parameter-efficient fine-tuning (PEFT) technique that adapts large pre-trained models to new tasks by injecting trainable low-rank matrices into the transformer architecture's attention layers. Instead of fine-tuning all parameters of the original model, LoRA freezes the pre-trained weights and optimizes only these much smaller, newly introduced low-rank matrices, dramatically reducing the number of trainable parameters and computational cost while often achieving performance comparable to full fine-tuning.
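The adapted forward pass is y = W x + (alpha / r) * B(A x), where W stays frozen and only the low-rank pair (A, B) trains. A tiny numeric sketch with rank 1; B starts at zero so the adapted model initially matches the frozen one:

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0, rank=1):
    """y = W x + (alpha / rank) * B (A x).

    W (d_out x d_in) is frozen; only A (rank x d_in) and
    B (d_out x rank) are trainable."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    s = alpha / rank
    return [b + s * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen pre-trained weights (identity here)
A = [[0.5, -0.5]]              # rank-1 down-projection
B = [[0.0], [0.0]]             # up-projection, initialized to zero
x = [2.0, 1.0]
```

Here the full weight matrix has 4 parameters while the LoRA pair has rank * (d_in + d_out) = 4 too; at realistic dimensions (e.g. 4096 x 4096 with rank 8) the adapter is hundreds of times smaller.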
moe
Mixture of Experts (MoE) is a machine learning architecture that employs multiple specialized neural networks, referred to as 'experts,' and a 'gating network' or 'router.' The gating network learns to dynamically select or weigh the outputs of these experts based on the input data. Each expert is typically trained to be proficient in a specific region of the input space or a particular sub-task, allowing the overall model to achieve higher capacity and efficiency by conditionally activating only the relevant parts of the network for a given input, rather than processing it through a single, monolithic model.
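The conditional activation can be sketched with scalar "experts" and a precomputed router score per expert. A toy top-k gate; in real MoE layers the experts are feed-forward networks and the gate is itself a learned linear layer:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, top_k=1):
    """Run only the top_k highest-scoring experts and combine their
    outputs with softmax weights; unselected experts are skipped,
    which is what makes MoE conditionally sparse."""
    ranked = sorted(range(len(experts)),
                    key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([gate_scores[i] for i in chosen])
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))

experts = [lambda x: x * 2.0,    # "doubling" expert
           lambda x: x + 100.0]  # "adding" expert
y = moe_forward(3.0, experts, gate_scores=[5.0, -1.0], top_k=1)
```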
multimodal
In artificial intelligence and machine learning, 'multimodal' refers to systems or models capable of processing, integrating, and/or generating information from multiple distinct data modalities. These modalities can include text, images, audio, video, sensor data, and other forms of structured or unstructured input, allowing for a more comprehensive understanding or richer output than single-modality approaches.
prompt engineering
Prompt engineering is the systematic process of designing, refining, and optimizing input queries (prompts) for large language models (LLMs) and other generative AI systems to elicit desired, high-quality, and contextually appropriate outputs. It involves crafting specific instructions, examples, context, and constraints to guide the model's behavior and performance for a particular task or application.
quantization
Quantization, in the context of AI/ML, is the process of converting continuous or high-precision numerical representations (e.g., 32-bit floating-point numbers) used for model weights and activations into a lower-precision format (e.g., 8-bit integers or even binary). This technique aims to reduce the memory footprint, computational cost, and power consumption of machine learning models, particularly for deployment on edge devices or resource-constrained hardware, while striving to maintain acceptable model accuracy.
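A minimal sketch of symmetric linear quantization to signed 8-bit integers: a single scale maps the largest absolute value to 127, and dequantizing multiplies back. Production schemes also use per-channel scales, zero-points for asymmetric ranges, and calibration data.

```python
def quantize_int8(values):
    """Symmetric linear quantization of floats to int8 range."""
    # Guard against an all-zero tensor (scale would be zero).
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [qi * scale for qi in q]

weights = [0.42, -1.27, 0.05, 1.00]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

The round-trip error per value is bounded by half the scale, which is the precision traded away for a 4x smaller memory footprint versus 32-bit floats.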
rag
Retrieval Augmented Generation (RAG) is an architectural pattern in natural language processing (NLP) that enhances the capabilities of large language models (LLMs) by integrating an information retrieval component. Instead of relying solely on the knowledge encoded during its training phase, a RAG model first retrieves relevant information from an external, typically up-to-date, knowledge base or document store based on the user's query. This retrieved context is then provided to the LLM alongside the original query, allowing the model to generate more accurate, factual, and less 'hallucinated' responses, especially for domain-specific or rapidly changing information.
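The retrieve-then-generate pattern can be sketched end to end, with a toy keyword-overlap retriever standing in for the embedding-similarity search a real RAG system would run against a vector index; the final LLM call is omitted:

```python
def retrieve(query_terms, documents, top_k=2):
    """Rank documents by word overlap with the query (a stand-in for
    embedding similarity) and return the top_k matches."""
    query_words = set(t.lower() for t in query_terms)
    def score(doc):
        return len(set(doc.lower().split()) & query_words)
    return sorted(documents, key=score, reverse=True)[:top_k]

def build_prompt(query, retrieved):
    """Prepend the retrieved passages so the model can ground its
    answer in them instead of relying on parametric memory alone."""
    context = "\n".join("- " + d for d in retrieved)
    return "Context:\n%s\n\nQuestion: %s\nAnswer:" % (context, query)

docs = [
    "The Eiffel Tower is 330 metres tall.",
    "Photosynthesis converts light into chemical energy.",
    "The Eiffel Tower was completed in 1889.",
]
query = "How tall is the Eiffel Tower?"
hits = retrieve(query.split(), docs)
prompt = build_prompt(query, hits)
```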
reinforcement learning
Reinforcement Learning (RL) is a paradigm of machine learning where an autonomous agent learns to make optimal decisions by interacting with an environment. The agent performs actions, receives feedback in the form of rewards or penalties, and adjusts its policy (strategy) to maximize the cumulative reward over time, without explicit programming for desired outcomes. Key components include the agent, environment, states, actions, rewards, and a policy.
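One concrete instance of this loop is tabular Q-learning, where the agent keeps a table of action values and nudges them toward observed rewards. A single update step, sketched on a two-state toy world:

```python
def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
    where alpha is the learning rate and gamma discounts future reward."""
    best_next = max(Q[next_state].values()) if Q.get(next_state) else 0.0
    td_target = reward + gamma * best_next
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q

# Tiny world: acting "right" in s0 reaches terminal state s1 and pays 1.0.
Q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {}}
Q = q_update(Q, "s0", "right", reward=1.0, next_state="s1")
```

After one rewarded step, the value of the rewarded action rises while the untried action stays at zero, so the greedy policy already prefers "right".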
rlhf
Reinforcement Learning from Human Feedback (RLHF) is a machine learning technique used to align large language models (LLMs) and other AI models with human preferences and values. It involves training a reward model on a dataset of human comparisons or rankings of model outputs, and then using this reward model to fine-tune the original LLM via a reinforcement learning algorithm, typically Proximal Policy Optimization (PPO). The reward model learns to predict which model outputs humans prefer, and the LLM is subsequently optimized to generate outputs that maximize this predicted reward, thereby improving its helpfulness, harmlessness, and honesty.
speculative decoding
Speculative decoding is an inference optimization technique for large language models (LLMs) that accelerates token generation by leveraging a smaller, faster 'draft model' to predict a sequence of future tokens. These predicted tokens are then simultaneously verified by the larger, more accurate 'main model'. If the draft model's predictions are correct, multiple tokens can be accepted in a single main model forward pass, significantly reducing the number of sequential main model computations required. If a prediction is incorrect, the main model generates the correct token from that point, and the process restarts with the main model's output as the new starting point for the draft model.
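The accept/correct loop can be sketched with toy "models" that are just next-token functions. This is a simplified greedy variant: real implementations verify all k draft positions in a single batched forward pass of the main model and use probabilistic acceptance rather than exact match.

```python
def speculative_step(prefix, draft_next, main_next, k=4):
    """One round of (greedy) speculative decoding.

    draft_next / main_next map a token sequence to its next token,
    standing in for the small draft model and the large main model.
    The draft proposes k tokens; the main model accepts the longest
    agreeing prefix, then supplies one corrected (or bonus) token."""
    proposal, seq = [], list(prefix)
    for _ in range(k):                    # cheap draft-model pass
        t = draft_next(seq)
        proposal.append(t)
        seq.append(t)
    accepted = list(prefix)
    for t in proposal:                    # verification by the main model
        if main_next(accepted) == t:
            accepted.append(t)            # agreement: token accepted for free
        else:
            accepted.append(main_next(accepted))  # correct and stop the round
            break
    else:
        accepted.append(main_next(accepted))      # all accepted: bonus token
    return accepted

# Toy models over word tokens: the draft gets the first two words right.
main_text = ["the", "cat", "sat", "on", "the", "mat"]
draft_text = ["the", "cat", "ran", "up", "a", "tree"]
main_next = lambda seq: main_text[len(seq)]
draft_next = lambda seq: draft_text[len(seq)]
out = speculative_step([], draft_next, main_next, k=4)
```

In this round the sequence grows by three tokens ("the cat sat") at the cost of one main-model verification, rather than one token per main-model pass.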
supervised learning
Supervised learning is a machine learning paradigm where an algorithm learns a mapping function from input variables (features) to an output variable (target) by analyzing a labeled dataset. This dataset consists of pairs of input data and their corresponding correct output labels. The goal is for the model to generalize from these examples to accurately predict the output for new, unseen input data.
tokenization
Tokenization is a fundamental natural language processing (NLP) technique that involves segmenting a continuous sequence of text into smaller, discrete units called tokens. These tokens can be words, subword units (e.g., 'un-', '-able'), characters, or even punctuation marks, depending on the specific tokenization strategy and the downstream task. The process typically precedes numerical representation (e.g., embedding) for machine learning models.
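Subword tokenization can be sketched as greedy longest-match-first segmentation against a fixed vocabulary. This is a WordPiece-style toy (without the '##' continuation markers); real tokenizers learn their vocabularies from data with algorithms like BPE:

```python
def subword_tokenize(word, vocab):
    """Greedily match the longest vocabulary piece at each position,
    falling back to single characters for out-of-vocabulary spans."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])  # longest known piece wins
                i = j
                break
        else:
            tokens.append(word[i])        # unknown character fallback
            i += 1
    return tokens

vocab = {"un", "break", "able", "ing", "walk"}
pieces = subword_tokenize("unbreakable", vocab)
```

The rare word "unbreakable" never needs its own vocabulary entry: it is rebuilt from three reusable pieces.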
transformer
A Transformer is a deep learning architecture introduced in 2017, primarily designed for sequence-to-sequence tasks, most notably in natural language processing (NLP). Its core innovation is the self-attention mechanism, which allows it to weigh the importance of different parts of the input sequence when processing each element, enabling parallel computation across the entire sequence. Unlike recurrent neural networks (RNNs) or convolutional neural networks (CNNs), Transformers do not rely on sequential processing or local receptive fields, making them highly efficient for long-range dependencies and scalable for large datasets.
unsupervised learning
Unsupervised learning is a category of machine learning algorithms that analyze and cluster unlabeled datasets, inferring hidden patterns, structures, or relationships within the data without human intervention or prior knowledge of expected output. Its primary goal is to model the underlying structure or distribution in the data to learn more about the data itself.
vae
A Variational Autoencoder (VAE) is a type of generative neural network that learns a compressed, probabilistic representation (latent space) of input data. Unlike a standard autoencoder which learns a deterministic mapping, a VAE maps inputs to parameters of a probability distribution (mean and variance) within the latent space. It consists of an encoder that transforms input data into these latent distribution parameters, and a decoder that samples from this latent distribution to reconstruct the original data. VAEs are trained to minimize both the reconstruction error and a Kullback-Leibler (KL) divergence term, which regularizes the latent space to approximate a prior distribution (e.g., a standard normal distribution), thereby enabling smooth interpolation and generation of new, similar data points.
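Two pieces of the definition above have compact formulas: sampling from the latent distribution via the reparameterization trick, and the KL regularizer against a standard normal prior (here for a diagonal Gaussian, the usual choice). The encoder and decoder networks themselves are omitted:

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1): the
    'reparameterization trick' that keeps sampling differentiable
    with respect to mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)) for a diagonal Gaussian:
    -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, 0.0], rng)  # one latent sample
```

The KL term is zero exactly when the encoder outputs the prior (mean 0, variance 1) and grows as the posterior drifts away, which is what regularizes the latent space.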
vector database
A vector database is a specialized database system designed to store, manage, and query high-dimensional vector embeddings, which are numerical representations of data (e.g., text, images, audio) derived from machine learning models. It utilizes indexing algorithms, such as Approximate Nearest Neighbor (ANN) search methods (e.g., HNSW, IVFFlat, Annoy), to efficiently perform similarity searches, finding vectors that are 'close' to a given query vector in the embedding space, rather than relying on exact keyword matches or structured queries.
zero-shot
Zero-shot learning (ZSL) is a machine learning paradigm where a model is trained to perform a task on data instances from categories or classes that were not present during its training phase. This is typically achieved by leveraging auxiliary information, such as semantic descriptions (e.g., word embeddings, attributes) of the unseen classes, to generalize from seen classes to unseen ones without any direct examples of the latter.