LLM/Agentic AI Interview Questions - Easy

Easy-level LLM and Agentic AI interview questions covering fundamentals, prompting, and basic agent concepts.

Q1: What is a Large Language Model (LLM)?

Answer:

Definition: Neural network trained on massive text data to predict next tokens, enabling text generation and understanding.

Key Characteristics:

  • Large: Billions of parameters (GPT-3: 175B, GPT-4: estimated 1.7T)
  • Transformer-based: Uses self-attention mechanism
  • Pre-trained: Trained on diverse internet text
  • Few-shot learning: Can perform tasks with minimal examples

How It Works:

  1. Input text → tokenized into pieces
  2. Each token converted to embedding vector
  3. Transformer layers process with attention
  4. Output: probability distribution over next tokens
  5. Sample or pick most likely token
  6. Repeat autoregressively
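
This loop can be sketched directly with the Hugging Face transformers library. The snippet below is a minimal illustration (assuming torch and transformers are installed, with GPT-2 as a small example model), not production decoding code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Steps 1-2: tokenize the input (embedding lookup happens inside the model)
input_ids = tokenizer.encode("The sky is", return_tensors="pt")

# Steps 3-6: run the transformer, take the next-token distribution,
# pick a token, append it, and repeat
for _ in range(10):
    with torch.no_grad():
        logits = model(input_ids).logits          # (1, seq_len, vocab_size)
    next_token_logits = logits[0, -1]             # scores for the next token
    next_token = torch.argmax(next_token_logits)  # greedy pick (or sample instead)
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))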

LangChain Example:

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Initialize LLM (the OpenAI wrapper expects a completions-style model;
# for chat models such as gpt-3.5-turbo, use ChatOpenAI instead)
llm = OpenAI(temperature=0.7, model_name="gpt-3.5-turbo-instruct")

# Create prompt template
template = "The future of {topic} is"
prompt = PromptTemplate(template=template, input_variables=["topic"])

# Create chain
chain = LLMChain(llm=llm, prompt=prompt)

# Generate
result = chain.run(topic="AI")
print(result)

Use Cases: Text generation, summarization, translation, Q&A, code generation


Q2: What is prompt engineering?

Answer:

Definition: Crafting input text (prompts) to get desired outputs from LLMs.

Why It Matters: LLMs are sensitive to how questions are phrased.

Basic Techniques:

1. Zero-Shot

Just ask directly:

Classify sentiment: "I love this product!"

2. Few-Shot

Provide examples:

Classify sentiment:
"Great service!" → Positive
"Terrible experience." → Negative
"It's okay." → Neutral
"I love this product!" →

3. Chain-of-Thought

Ask for step-by-step reasoning:

Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. How many balls does he have?
A: Let's think step by step:
1. Roger starts with 5 balls
2. He buys 2 cans with 3 balls each: 2 × 3 = 6 balls
3. Total: 5 + 6 = 11 balls

4. Role-Playing

Give the model a role:

You are an expert Python developer. Explain list comprehensions to a beginner.

5. Constraints

Specify format/length:

Summarize this article in exactly 3 bullet points.

Best Practices:

  • Be specific and clear
  • Provide context
  • Use examples when possible
  • Iterate and refine
  • Test different phrasings

LangChain Implementation:

from langchain.prompts import (
    FewShotPromptTemplate,
    PromptTemplate,
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate
)
from langchain.llms import OpenAI

# Few-Shot Example
examples = [
    {"input": "What's 2+2?", "output": "4"},
    {"input": "What's 5*3?", "output": "15"}
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}"
)

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="You are a helpful math tutor.",
    suffix="Input: {input}\nOutput:",
    input_variables=["input"]
)

llm = OpenAI(temperature=0)
result = llm(few_shot_prompt.format(input="What's 7+8?"))
print(result)  # Output: 15

# Role-Playing with System Message
system_template = "You are an expert {role}. Explain {topic} to a beginner."
system_prompt = SystemMessagePromptTemplate.from_template(system_template)

human_template = "{question}"
human_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_prompt, human_prompt])

from langchain.chat_models import ChatOpenAI
chat = ChatOpenAI()

result = chat(chat_prompt.format_messages(
    role="Python developer",
    topic="list comprehensions",
    question="How do list comprehensions work?"
))

Q3: What is temperature in LLM generation?

Answer:

Definition: Parameter controlling randomness of token selection.

How It Works:

Before sampling, divide logits by temperature: $$ p_i = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}} $$

where $z_i$ are the logits for the candidate tokens and $T$ is the temperature.
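
To see what the formula does, apply it to a few made-up logit values and compare the resulting distributions (a minimal PyTorch sketch; the numbers are arbitrary):

import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.5, 0.1])  # illustrative logits for 4 candidate tokens

for T in [0.1, 0.7, 1.5]:
    probs = F.softmax(logits / T, dim=-1)
    print(f"T={T}:", [round(p, 3) for p in probs.tolist()])
# Low T sharpens the distribution toward the top token;
# high T flattens it, giving unlikely tokens more probability.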

Effects:

Temperature = 0: Greedy (always pick most likely)

1"The sky is" → "blue" (deterministic)

Temperature = 0.7 (a common default): Balanced creativity

1"The sky is" → "blue" or "clear" or "bright"

Temperature = 1.5: Very creative/random

1"The sky is" → "purple" or "singing" or "infinite"

Implementation:

import torch
import torch.nn.functional as F

def sample_with_temperature(logits, temperature=1.0):
    """Sample next token with temperature"""
    # Apply temperature
    logits = logits / temperature
    
    # Convert to probabilities
    probs = F.softmax(logits, dim=-1)
    
    # Sample
    next_token = torch.multinomial(probs, num_samples=1)
    
    return next_token

# Example
logits = torch.tensor([2.0, 1.0, 0.5, 0.1])

print("Temperature 0.1 (focused):")
for _ in range(5):
    token = sample_with_temperature(logits, temperature=0.1)
    print(token.item())

print("\nTemperature 2.0 (creative):")
for _ in range(5):
    token = sample_with_temperature(logits, temperature=2.0)
    print(token.item())

When to Use:

  • Low (0.1-0.5): Factual tasks, code generation, translation
  • Medium (0.7-1.0): General chat, creative writing
  • High (1.5-2.0): Brainstorming, poetry, experimental

Q4: What is an AI agent?

Answer:

Definition: System that perceives environment, makes decisions, and takes actions to achieve goals.

LLM-Based Agent: Uses LLM as reasoning engine to decide actions.

Core Components:

  1. Perception: Observe environment (user input, tool outputs)
  2. Reasoning: LLM decides what to do
  3. Action: Execute tools/functions
  4. Memory: Remember past interactions

Simple Agent Loop:

while not done:
    observation = get_observation()
    action = llm.decide_action(observation, memory)
    result = execute_action(action)
    memory.add(observation, action, result)
    done = check_if_goal_achieved()

Example Implementation:

class SimpleAgent:
    def __init__(self, llm, tools):
        self.llm = llm
        self.tools = tools
        self.memory = []
    
    def run(self, task, max_steps=10):
        for step in range(max_steps):
            # Create prompt with task and available tools
            prompt = self._create_prompt(task)
            
            # LLM decides next action
            response = self.llm.generate(prompt)
            action, args = self._parse_action(response)
            
            # Execute action
            if action == "FINISH":
                return args["answer"]
            
            result = self.tools[action](**args)
            
            # Update memory
            self.memory.append({
                "action": action,
                "args": args,
                "result": result
            })
        
        return "Max steps reached"
    
    def _create_prompt(self, task):
        prompt = f"Task: {task}\n\n"
        prompt += "Available tools:\n"
        for name, tool in self.tools.items():
            prompt += f"- {name}: {tool.__doc__}\n"
        
        if self.memory:
            prompt += "\nPrevious actions:\n"
            for mem in self.memory:
                prompt += f"{mem['action']}({mem['args']}) → {mem['result']}\n"
        
        prompt += "\nWhat should I do next? (respond with action and args)"
        return prompt
    
    def _parse_action(self, response):
        # Parse LLM response to extract action and arguments
        # Simplified - real implementation would be more robust
        lines = response.strip().split('\n')
        action = lines[0].split(':')[1].strip()
        args = eval(lines[1].split(':')[1].strip())
        return action, args

# Usage
def search_web(query):
    """Search the web for information"""
    return f"Search results for: {query}"

def calculate(expression):
    """Calculate mathematical expression"""
    return eval(expression)  # fine for a demo; avoid eval on untrusted input

tools = {
    "search": search_web,
    "calculate": calculate,
    "FINISH": lambda answer: answer
}

# my_llm stands in for any LLM wrapper that exposes .generate(prompt)
agent = SimpleAgent(llm=my_llm, tools=tools)
result = agent.run("What is 15% of 240?")

Types of Agents:

  • ReAct: Reasoning + Acting (think, then act)
  • Tool-using: Can call external functions
  • Conversational: Maintains dialogue context
  • Multi-agent: Multiple agents collaborate

Q5: What is Retrieval-Augmented Generation (RAG)?

Answer:

Definition: Enhance LLM responses by retrieving relevant information from external knowledge base.

Problem RAG Solves:

  • LLMs have knowledge cutoff date
  • Can't access private/proprietary data
  • May hallucinate facts

How RAG Works:

  1. Index: Embed documents into vector database
  2. Retrieve: Find relevant docs for query
  3. Augment: Add retrieved docs to prompt
  4. Generate: LLM answers using retrieved context

LangChain Implementation:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.document_loaders import TextLoader

# Load documents
loader = TextLoader('documents.txt')
documents = loader.load()

# Split into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Create embeddings and vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(texts, embeddings)

# Create RAG chain
llm = OpenAI(temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True
)

# Query
query = "Who created Python?"
result = qa_chain({"query": query})

print(f"Answer: {result['result']}")
print("\nSources:")
for doc in result['source_documents']:
    print(f"- {doc.page_content[:100]}...")

# Alternative: Custom RAG with more control
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Custom prompt template
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:
{context}

Question: {question}

Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])

# Retrieval + Generation
def rag_query(question, top_k=3):
    # Retrieve
    docs = vectorstore.similarity_search(question, k=top_k)
    context = "\n\n".join([doc.page_content for doc in docs])
    
    # Generate
    chain = LLMChain(llm=llm, prompt=prompt)
    answer = chain.run(context=context, question=question)
    
    return answer, docs

answer, sources = rag_query("What is Python known for?")
print(answer)

Key Components:

  • Embedding Model: Convert text to vectors (OpenAI, Cohere, HuggingFace)
  • Vector Database: Store and search embeddings (FAISS, Pinecone, Weaviate, Chroma)
  • Retrieval Strategy: Similarity search, MMR, threshold filtering (see the MMR sketch below)
  • Prompt Template: How to format context + query
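
As an example of changing the retrieval strategy, the vectorstore built in the LangChain example above can be wrapped in an MMR (maximal marginal relevance) retriever; the parameter values here are illustrative:

# MMR trades off relevance against diversity among the retrieved chunks
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 20}  # consider 20 candidates, keep 3 diverse ones
)
docs = mmr_retriever.get_relevant_documents("What is Python known for?")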

Benefits:

  • Up-to-date information
  • Access to private data
  • Reduced hallucinations
  • Citable sources

Q6: What is few-shot learning in LLMs?

Answer:

Definition: LLM learns task from just a few examples in the prompt (no fine-tuning).

How It Works: LLM recognizes pattern from examples and applies to new input.

Example:

def few_shot_classification(text, examples):
    """Classify text using few-shot learning"""
    prompt = "Classify the sentiment:\n\n"
    
    # Add examples
    for example_text, label in examples:
        prompt += f'Text: "{example_text}"\nSentiment: {label}\n\n'
    
    # Add query
    prompt += f'Text: "{text}"\nSentiment:'
    
    return llm.generate(prompt)

# Usage
examples = [
    ("I love this product!", "Positive"),
    ("Terrible service.", "Negative"),
    ("It's okay, nothing special.", "Neutral")
]

result = few_shot_classification("This is amazing!", examples)
# Output: "Positive"

Variants:

Zero-Shot: No examples

Translate to French: "Hello"

One-Shot: One example

Translate to French:
English: "Goodbye" → French: "Au revoir"
English: "Hello" → French:

Few-Shot: Multiple examples (typically 3-10)

Why It Works:

  • LLM learned patterns during pre-training
  • Examples activate relevant knowledge
  • In-context learning (no weight updates)

Best Practices:

  • Use diverse, representative examples
  • Order matters (put similar examples last)
  • More examples = better (but limited by context window)
  • Balance classes in classification tasks

Q7: What is the context window in LLMs?

Answer:

Definition: Maximum number of tokens LLM can process at once (input + output).

Examples:

  • GPT-3.5: 4,096 tokens (~3,000 words)
  • GPT-4: 8,192 or 32,768 tokens
  • Claude 2: 100,000 tokens
  • GPT-4 Turbo: 128,000 tokens

Why It Matters:

  • Limits how much context you can provide
  • Affects RAG (how many documents to include)
  • Determines conversation history length

Token Counting:

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

text = "Hello, how are you today?"
tokens = tokenizer.encode(text)

print(f"Text: {text}")
print(f"Tokens: {tokens}")
print(f"Token count: {len(tokens)}")

# Approximate: 1 token ≈ 0.75 words (English)
# So 1,000 tokens ≈ 750 words

Handling Long Texts:

  1. Chunking: Split into smaller pieces

def chunk_text(text, max_tokens=1000):
    tokens = tokenizer.encode(text)
    chunks = []
    
    for i in range(0, len(tokens), max_tokens):
        chunk_tokens = tokens[i:i+max_tokens]
        chunk_str = tokenizer.decode(chunk_tokens)
        chunks.append(chunk_str)
    
    return chunks

  2. Summarization: Summarize long context

def summarize_long_text(text, max_tokens=1000):
    chunks = chunk_text(text, max_tokens)
    summaries = []
    
    for chunk in chunks:
        summary = llm.generate(f"Summarize: {chunk}")
        summaries.append(summary)
    
    # Combine summaries
    combined = " ".join(summaries)
    
    # Final summary if the combined text is still too long
    if len(tokenizer.encode(combined)) > max_tokens:
        return llm.generate(f"Summarize: {combined}")
    
    return combined

  3. Sliding Window: Process with overlap (see the sketch below)
  4. Hierarchical: Summarize sections, then combine
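
A minimal sliding-window sketch, reusing the tokenizer from the token-counting example above (window size and stride are illustrative):

def sliding_window_chunks(text, window_tokens=1000, stride=800):
    """Overlapping chunks: each window starts `stride` tokens after the previous one,
    so consecutive windows share window_tokens - stride tokens of context."""
    tokens = tokenizer.encode(text)
    chunks = []
    for start in range(0, len(tokens), stride):
        window = tokens[start:start + window_tokens]
        chunks.append(tokenizer.decode(window))
        if start + window_tokens >= len(tokens):
            break
    return chunks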

Q8: What are embeddings in NLP?

Answer:

Definition: Dense vector representations of text that capture semantic meaning.

Key Property: Similar meanings → similar vectors

Example:

1"king" → [0.2, 0.5, -0.1, ...]
2"queen" → [0.3, 0.4, -0.2, ...]
3"car" → [-0.5, 0.1, 0.8, ...]

"king" and "queen" are closer than "king" and "car"

How to Get Embeddings:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Get embeddings
texts = ["I love programming", "Coding is fun", "I hate bugs"]
embeddings = model.encode(texts)

print(f"Shape: {embeddings.shape}")  # (3, 384)

# Calculate similarity
from sklearn.metrics.pairwise import cosine_similarity

similarities = cosine_similarity(embeddings)
print(similarities)
# [[1.0, 0.8, 0.3],   # "love programming" similar to "coding is fun"
#  [0.8, 1.0, 0.2],
#  [0.3, 0.2, 1.0]]

Use Cases:

  • Semantic search: Find similar documents (see the sketch below)
  • RAG: Retrieve relevant context
  • Clustering: Group similar texts
  • Classification: Use as features
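
A minimal semantic-search sketch built on the same model (the corpus and query below are made up for illustration):

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('all-MiniLM-L6-v2')

corpus = [
    "Python was created by Guido van Rossum",
    "The Eiffel Tower is in Paris",
    "Pandas is a data analysis library"
]
corpus_embeddings = model.encode(corpus)

query_embedding = model.encode(["Who invented Python?"])

# Rank documents by cosine similarity to the query
scores = cosine_similarity(query_embedding, corpus_embeddings)[0]
for idx in np.argsort(scores)[::-1][:2]:
    print(f"{scores[idx]:.2f}  {corpus[idx]}")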

Popular Models:

  • sentence-transformers (SBERT)
  • OpenAI embeddings (text-embedding-ada-002)
  • Cohere embeddings
  • Google Universal Sentence Encoder

Q9: What is fine-tuning vs. prompting?

Answer:

Prompting (In-Context Learning)

What: Provide examples/instructions in prompt, no model changes

Pros:

  • No training needed
  • Instant
  • Flexible (change anytime)
  • No data labeling

Cons:

  • Limited by context window
  • Less consistent
  • Higher inference cost (longer prompts)

Example:

1prompt = """You are a customer service bot. Be polite and helpful.
2
3User: I want a refund!
4Bot: I understand your frustration. Let me help you with that refund.
5
6User: This product is broken!
7Bot:"""
8
9response = llm.generate(prompt)

Fine-Tuning

What: Update model weights on task-specific data

Pros:

  • Better performance
  • More consistent
  • Shorter prompts (lower cost)
  • Can learn new knowledge

Cons:

  • Requires labeled data
  • Training time/cost
  • Less flexible
  • May forget general knowledge

Example:

from transformers import Trainer, TrainingArguments

# Prepare dataset (simplified sketch: in practice each example must be
# tokenized and formatted as model inputs before being passed to Trainer)
train_dataset = [
    {"input": "User: I want a refund!", "output": "I understand..."},
    {"input": "User: This is broken!", "output": "I'm sorry..."},
    # ... more examples
]

# Fine-tune
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-5
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
)

trainer.train()

When to Use Each

Use Prompting:

  • Quick prototyping
  • Few examples available
  • Task changes frequently
  • General-purpose use

Use Fine-Tuning:

  • Have lots of labeled data (1000+)
  • Need consistent behavior
  • Specific domain/style
  • High-volume inference (cost savings)

Q10: What is chain-of-thought prompting?

Answer:

Definition: Ask LLM to show step-by-step reasoning before answering.

Why It Works: Breaking down complex problems improves accuracy.

Basic Example:

Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. 
   How many balls does he have?

Without CoT:
A: 11

With CoT:
A: Let's think step by step:
1. Roger starts with 5 balls
2. He buys 2 cans
3. Each can has 3 balls
4. So he buys 2 × 3 = 6 balls
5. Total: 5 + 6 = 11 balls

Answer: 11

Implementation:

def chain_of_thought(question):
    prompt = f"""{question}

Let's solve this step by step:
1."""
    
    return llm.generate(prompt)

# Usage
question = "If a train travels 60 mph for 2.5 hours, how far does it go?"
answer = chain_of_thought(question)

Variants:

Zero-Shot CoT: Just add "Let's think step by step"

1prompt = f"{question}\n\nLet's think step by step:"

Few-Shot CoT: Provide examples with reasoning

1prompt = """Q: If 3 apples cost $6, how much do 5 apples cost?
2A: Let's think:
31. 3 apples = $6
42. 1 apple = $6 / 3 = $2
53. 5 apples = 5 × $2 = $10
6
7Q: {new_question}
8A: Let's think:"""

Self-Consistency: Generate multiple reasoning paths, pick most common answer
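
A minimal self-consistency sketch, assuming an llm.generate wrapper that accepts a temperature argument and a hypothetical extract_final_answer helper that parses the final answer out of the reasoning text:

from collections import Counter

def self_consistency(question, n_samples=5):
    """Sample several reasoning paths and majority-vote on the final answer."""
    answers = []
    for _ in range(n_samples):
        # Temperature > 0 so each sample follows a different reasoning path
        reasoning = llm.generate(f"{question}\n\nLet's think step by step:", temperature=0.8)
        answers.append(extract_final_answer(reasoning))  # hypothetical parsing helper
    
    # Most common final answer wins
    return Counter(answers).most_common(1)[0][0]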

Benefits:

  • Better accuracy on math/logic problems
  • Interpretable (can see reasoning)
  • Catches mistakes in reasoning

When to Use:

  • Math problems
  • Logic puzzles
  • Multi-step reasoning
  • Complex questions

Summary

Key LLM/Agent concepts:

  • LLMs: Large models that predict next tokens
  • Prompting: Craft inputs to guide outputs
  • Temperature: Control randomness
  • Agents: LLMs that take actions
  • RAG: Retrieve context for better answers
  • Embeddings: Vector representations of text
  • Context Window: Token limit
  • Fine-tuning vs. Prompting: Training vs. in-context learning
  • Chain-of-Thought: Step-by-step reasoning
