LLM/Agentic AI Interview Questions - Easy
Easy-level LLM and Agentic AI interview questions covering fundamentals, prompting, and basic agent concepts.
Q1: What is a Large Language Model (LLM)?
Answer:
Definition: Neural network trained on massive text data to predict next tokens, enabling text generation and understanding.
Key Characteristics:
- Large: Billions of parameters (GPT-3: 175B, GPT-4: estimated 1.7T)
- Transformer-based: Uses self-attention mechanism
- Pre-trained: Trained on diverse internet text
- Few-shot learning: Can perform tasks with minimal examples
How It Works:
- Input text → tokenized into pieces
- Each token converted to embedding vector
- Transformer layers process with attention
- Output: probability distribution over next tokens
- Sample or pick most likely token
- Repeat autoregressively
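At a code level, this loop looks roughly like the following minimal sketch using the Hugging Face transformers library (GPT-2 is just an illustrative model choice; any causal LM works the same way):
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative model; the loop is the same for any causal LM
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer.encode("The future of AI is", return_tensors="pt")

for _ in range(20):
    logits = model(input_ids).logits               # (1, seq_len, vocab_size)
    next_token_logits = logits[:, -1, :]           # distribution over the next token
    probs = torch.softmax(next_token_logits, dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)   # sample (or argmax for greedy)
    input_ids = torch.cat([input_ids, next_token], dim=-1)  # append and repeat

print(tokenizer.decode(input_ids[0]))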
LangChain Example:
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Initialize LLM (the completion-style OpenAI wrapper needs an instruct model,
# not the chat-only "gpt-3.5-turbo")
llm = OpenAI(temperature=0.7, model_name="gpt-3.5-turbo-instruct")

# Create prompt template
template = "The future of {topic} is"
prompt = PromptTemplate(template=template, input_variables=["topic"])

# Create chain
chain = LLMChain(llm=llm, prompt=prompt)

# Generate
result = chain.run(topic="AI")
print(result)
Use Cases: Text generation, summarization, translation, Q&A, code generation
Q2: What is prompt engineering?
Answer:
Definition: Crafting input text (prompts) to get desired outputs from LLMs.
Why It Matters: LLMs are sensitive to how questions are phrased.
Basic Techniques:
1. Zero-Shot
Just ask directly:
Classify sentiment: "I love this product!"
2. Few-Shot
Provide examples:
Classify sentiment:
"Great service!" → Positive
"Terrible experience." → Negative
"It's okay." → Neutral
"I love this product!" →
3. Chain-of-Thought
Ask for step-by-step reasoning:
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. How many balls does he have?
A: Let's think step by step:
1. Roger starts with 5 balls
2. He buys 2 cans with 3 balls each: 2 × 3 = 6 balls
3. Total: 5 + 6 = 11 balls
4. Role-Playing
Give the model a role:
You are an expert Python developer. Explain list comprehensions to a beginner.
5. Constraints
Specify format/length:
Summarize this article in exactly 3 bullet points.
Best Practices:
- Be specific and clear
- Provide context
- Use examples when possible
- Iterate and refine
- Test different phrasings
LangChain Implementation:
from langchain.prompts import (
    FewShotPromptTemplate,
    PromptTemplate,
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate
)
from langchain.llms import OpenAI

# Few-Shot Example
examples = [
    {"input": "What's 2+2?", "output": "4"},
    {"input": "What's 5*3?", "output": "15"}
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}"
)

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="You are a helpful math tutor.",
    suffix="Input: {input}\nOutput:",
    input_variables=["input"]
)

llm = OpenAI(temperature=0)
result = llm(few_shot_prompt.format(input="What's 7+8?"))
print(result)  # Output: 15

# Role-Playing with System Message
system_template = "You are an expert {role}. Explain {topic} to a beginner."
system_prompt = SystemMessagePromptTemplate.from_template(system_template)

human_template = "{question}"
human_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_prompt, human_prompt])

from langchain.chat_models import ChatOpenAI
chat = ChatOpenAI()

result = chat(chat_prompt.format_messages(
    role="Python developer",
    topic="list comprehensions",
    question="How do list comprehensions work?"
))
print(result.content)
Q3: What is temperature in LLM generation?
Answer:
Definition: Parameter controlling randomness of token selection.
How It Works:
Before sampling, divide logits by temperature: $$ p_i = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}} $$
where $T$ is temperature.
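A quick numeric sketch of the formula (the logits here are illustrative, not from a real model):
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.5, 0.1])  # illustrative logits for 4 candidate tokens

for T in [0.1, 1.0, 2.0]:
    probs = F.softmax(logits / T, dim=-1)
    print(f"T={T}: {probs.tolist()}")

# T=0.1 → almost all probability mass on the top token (near-greedy)
# T=2.0 → the distribution flattens out (more random choices)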
Effects:
Temperature = 0: Greedy (always pick most likely)
"The sky is" → "blue" (deterministic)
Temperature = 0.7 (default): Balanced creativity
"The sky is" → "blue" or "clear" or "bright"
Temperature = 1.5: Very creative/random
"The sky is" → "purple" or "singing" or "infinite"
Implementation:
import torch
import torch.nn.functional as F

def sample_with_temperature(logits, temperature=1.0):
    """Sample next token with temperature"""
    # Apply temperature
    logits = logits / temperature

    # Convert to probabilities
    probs = F.softmax(logits, dim=-1)

    # Sample
    next_token = torch.multinomial(probs, num_samples=1)

    return next_token

# Example
logits = torch.tensor([2.0, 1.0, 0.5, 0.1])

print("Temperature 0.1 (focused):")
for _ in range(5):
    token = sample_with_temperature(logits, temperature=0.1)
    print(token.item())

print("\nTemperature 2.0 (creative):")
for _ in range(5):
    token = sample_with_temperature(logits, temperature=2.0)
    print(token.item())
When to Use:
- Low (0.1-0.5): Factual tasks, code generation, translation
- Medium (0.7-1.0): General chat, creative writing
- High (1.5-2.0): Brainstorming, poetry, experimental
Q4: What is an AI agent?
Answer:
Definition: System that perceives environment, makes decisions, and takes actions to achieve goals.
LLM-Based Agent: Uses LLM as reasoning engine to decide actions.
Core Components:
- Perception: Observe environment (user input, tool outputs)
- Reasoning: LLM decides what to do
- Action: Execute tools/functions
- Memory: Remember past interactions
Simple Agent Loop:
while not done:
    observation = get_observation()
    action = llm.decide_action(observation, memory)
    result = execute_action(action)
    memory.add(observation, action, result)
    done = check_if_goal_achieved()
Example Implementation:
class SimpleAgent:
    def __init__(self, llm, tools):
        self.llm = llm
        self.tools = tools
        self.memory = []

    def run(self, task, max_steps=10):
        for step in range(max_steps):
            # Create prompt with task and available tools
            prompt = self._create_prompt(task)

            # LLM decides next action
            response = self.llm.generate(prompt)
            action, args = self._parse_action(response)

            # Execute action
            if action == "FINISH":
                return args["answer"]

            result = self.tools[action](**args)

            # Update memory
            self.memory.append({
                "action": action,
                "args": args,
                "result": result
            })

        return "Max steps reached"

    def _create_prompt(self, task):
        prompt = f"Task: {task}\n\n"
        prompt += "Available tools:\n"
        for name, tool in self.tools.items():
            prompt += f"- {name}: {tool.__doc__}\n"

        if self.memory:
            prompt += "\nPrevious actions:\n"
            for mem in self.memory:
                prompt += f"{mem['action']}({mem['args']}) → {mem['result']}\n"

        prompt += "\nWhat should I do next? (respond with action and args)"
        return prompt

    def _parse_action(self, response):
        # Parse LLM response to extract action and arguments
        # Simplified - real implementation would be more robust
        lines = response.strip().split('\n')
        action = lines[0].split(':')[1].strip()
        args = eval(lines[1].split(':')[1].strip())
        return action, args

# Usage
def search_web(query):
    """Search the web for information"""
    return f"Search results for: {query}"

def calculate(expression):
    """Calculate mathematical expression"""
    return eval(expression)

tools = {
    "search": search_web,
    "calculate": calculate,
    "FINISH": lambda answer: answer
}

agent = SimpleAgent(llm=my_llm, tools=tools)
result = agent.run("What is 15% of 240?")
Types of Agents:
- ReAct: Reasoning + Acting (think, then act; see the prompt sketch after this list)
- Tool-using: Can call external functions
- Conversational: Maintains dialogue context
- Multi-agent: Multiple agents collaborate
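A minimal sketch of what a ReAct-style prompt trace looks like (the exact wording is illustrative; frameworks like LangChain generate a similar Thought/Action/Observation format):
Question: What is the population of France divided by 2?
Thought: I need the population of France first.
Action: search("population of France")
Observation: about 68 million
Thought: Now I divide that by 2.
Action: calculate("68000000 / 2")
Observation: 34000000
Thought: I have enough information to answer.
Final Answer: About 34 million.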
Q5: What is Retrieval-Augmented Generation (RAG)?
Answer:
Definition: Enhance LLM responses by retrieving relevant information from an external knowledge base.
Problem RAG Solves:
- LLMs have knowledge cutoff date
- Can't access private/proprietary data
- May hallucinate facts
How RAG Works:
- Index: Embed documents into vector database
- Retrieve: Find relevant docs for query
- Augment: Add retrieved docs to prompt
- Generate: LLM answers using retrieved context
LangChain Implementation:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.document_loaders import TextLoader

# Load documents
loader = TextLoader('documents.txt')
documents = loader.load()

# Split into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Create embeddings and vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(texts, embeddings)

# Create RAG chain
llm = OpenAI(temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True
)

# Query
query = "Who created Python?"
result = qa_chain({"query": query})

print(f"Answer: {result['result']}")
print("\nSources:")
for doc in result['source_documents']:
    print(f"- {doc.page_content[:100]}...")

# Alternative: Custom RAG with more control
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Custom prompt template
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:
{context}

Question: {question}

Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])

# Retrieval + Generation
def rag_query(question, top_k=3):
    # Retrieve
    docs = vectorstore.similarity_search(question, k=top_k)
    context = "\n\n".join([doc.page_content for doc in docs])

    # Generate
    chain = LLMChain(llm=llm, prompt=prompt)
    answer = chain.run(context=context, question=question)

    return answer, docs

answer, sources = rag_query("What is Python known for?")
print(answer)
Key Components:
- Embedding Model: Convert text to vectors (OpenAI, Cohere, HuggingFace)
- Vector Database: Store and search embeddings (FAISS, Pinecone, Weaviate, Chroma)
- Retrieval Strategy: Similarity search, MMR, threshold filtering (see the MMR sketch after this list)
- Prompt Template: How to format context + query
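For example, switching the retriever above from plain similarity search to MMR (Maximal Marginal Relevance), which trades relevance against diversity, is a small change in the classic LangChain API used in this section (the k and fetch_k values are illustrative):
# MMR retriever: fetch a larger candidate pool, then keep diverse results
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 20}
)
docs = mmr_retriever.get_relevant_documents("Who created Python?")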
Benefits:
- Up-to-date information
- Access to private data
- Reduced hallucinations
- Citable sources
Q6: What is few-shot learning in LLMs?
Answer:
Definition: LLM learns task from just a few examples in the prompt (no fine-tuning).
How It Works: LLM recognizes pattern from examples and applies to new input.
Example:
def few_shot_classification(text, examples):
    """Classify text using few-shot learning"""
    prompt = "Classify the sentiment:\n\n"

    # Add examples
    for example_text, label in examples:
        prompt += f'Text: "{example_text}"\nSentiment: {label}\n\n'

    # Add query
    prompt += f'Text: "{text}"\nSentiment:'

    return llm.generate(prompt)

# Usage
examples = [
    ("I love this product!", "Positive"),
    ("Terrible service.", "Negative"),
    ("It's okay, nothing special.", "Neutral")
]

result = few_shot_classification("This is amazing!", examples)
# Output: "Positive"
Variants:
Zero-Shot: No examples
Translate to French: "Hello"
One-Shot: One example
Translate to French:
English: "Goodbye" → French: "Au revoir"
English: "Hello" → French:
Few-Shot: Multiple examples (typically 3-10)
Why It Works:
- LLM learned patterns during pre-training
- Examples activate relevant knowledge
- In-context learning (no weight updates)
Best Practices:
- Use diverse, representative examples
- Order matters (put similar examples last)
- More examples = better (but limited by the context window; see the example-selector sketch after this list)
- Balance classes in classification tasks
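When you have more examples than fit in the prompt, one option in the classic LangChain API used in this section is an example selector that picks the most relevant examples per query. A minimal sketch, assuming the sentiment examples above (the vector store and k value are illustrative):
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

examples = [
    {"text": "I love this product!", "label": "Positive"},
    {"text": "Terrible service.", "label": "Negative"},
    {"text": "It's okay, nothing special.", "label": "Neutral"},
    # ... many more examples than would fit in the prompt
]

# Pick the 3 examples most similar to the incoming input
example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples, OpenAIEmbeddings(), FAISS, k=3
)

example_prompt = PromptTemplate(
    input_variables=["text", "label"],
    template='Text: "{text}"\nSentiment: {label}'
)

few_shot_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    suffix='Text: "{input}"\nSentiment:',
    input_variables=["input"]
)

print(few_shot_prompt.format(input="This is amazing!"))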
Q7: What is the context window in LLMs?
Answer:
Definition: Maximum number of tokens LLM can process at once (input + output).
Examples:
- GPT-3.5: 4,096 tokens (~3,000 words)
- GPT-4: 8,192 or 32,768 tokens
- Claude 2: 100,000 tokens
- GPT-4 Turbo: 128,000 tokens
Why It Matters:
- Limits how much context you can provide
- Affects RAG (how many documents to include)
- Determines conversation history length
Token Counting:
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

text = "Hello, how are you today?"
tokens = tokenizer.encode(text)

print(f"Text: {text}")
print(f"Tokens: {tokens}")
print(f"Token count: {len(tokens)}")

# Approximate: 1 token ≈ 0.75 words (English)
# So 1,000 tokens ≈ 750 words
Handling Long Texts:
- Chunking: Split into smaller pieces
def chunk_text(text, max_tokens=1000):
    tokens = tokenizer.encode(text)
    chunks = []

    for i in range(0, len(tokens), max_tokens):
        chunk_tokens = tokens[i:i+max_tokens]
        chunk_str = tokenizer.decode(chunk_tokens)  # avoid shadowing the function name
        chunks.append(chunk_str)

    return chunks
- Summarization: Summarize long context
def summarize_long_text(text, max_tokens=1000):
    chunks = chunk_text(text, max_tokens)
    summaries = []

    for chunk in chunks:
        summary = llm.generate(f"Summarize: {chunk}")
        summaries.append(summary)

    # Combine summaries
    combined = " ".join(summaries)

    # Final summary if still too long
    if len(tokenizer.encode(combined)) > max_tokens:
        return llm.generate(f"Summarize: {combined}")

    return combined
- Sliding Window: Process with overlap (see the sketch after this list)
- Hierarchical: Summarize sections, then combine
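A minimal sketch of sliding-window chunking, reusing the tokenizer from the token-counting example above (window and overlap sizes are illustrative):
def sliding_window_chunks(text, window_size=1000, overlap=200):
    """Split text into overlapping token windows so context isn't lost at chunk boundaries"""
    tokens = tokenizer.encode(text)
    chunks = []
    step = window_size - overlap

    for start in range(0, len(tokens), step):
        window = tokens[start:start + window_size]
        chunks.append(tokenizer.decode(window))
        if start + window_size >= len(tokens):
            break

    return chunks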
Q8: What are embeddings in NLP?
Answer:
Definition: Dense vector representations of text that capture semantic meaning.
Key Property: Similar meanings → similar vectors
Example:
1"king" → [0.2, 0.5, -0.1, ...]
2"queen" → [0.3, 0.4, -0.2, ...]
3"car" → [-0.5, 0.1, 0.8, ...]
"king" and "queen" are closer than "king" and "car"
How to Get Embeddings:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Get embeddings
texts = ["I love programming", "Coding is fun", "I hate bugs"]
embeddings = model.encode(texts)

print(f"Shape: {embeddings.shape}")  # (3, 384)

# Calculate similarity
from sklearn.metrics.pairwise import cosine_similarity

similarities = cosine_similarity(embeddings)
print(similarities)
# [[1.0, 0.8, 0.3],   # "love programming" similar to "coding is fun"
#  [0.8, 1.0, 0.2],
#  [0.3, 0.2, 1.0]]
Use Cases:
- Semantic search: Find similar documents (see the example after this list)
- RAG: Retrieve relevant context
- Clustering: Group similar texts
- Classification: Use as features
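As a concrete semantic-search example, the same sentence-transformers library can rank documents against a query (the corpus and query below are illustrative):
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

corpus = [
    "Python was created by Guido van Rossum.",
    "The Eiffel Tower is in Paris.",
    "Transformers use self-attention."
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query = "Who invented Python?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank documents by cosine similarity to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], hit['score'])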
Popular Models:
- sentence-transformers (SBERT)
- OpenAI embeddings (text-embedding-ada-002)
- Cohere embeddings
- Google Universal Sentence Encoder
Q9: What is fine-tuning vs. prompting?
Answer:
Prompting (In-Context Learning)
What: Provide examples/instructions in prompt, no model changes
Pros:
- No training needed
- Instant
- Flexible (change anytime)
- No data labeling
Cons:
- Limited by context window
- Less consistent
- Higher inference cost (longer prompts)
Example:
1prompt = """You are a customer service bot. Be polite and helpful.
2
3User: I want a refund!
4Bot: I understand your frustration. Let me help you with that refund.
5
6User: This product is broken!
7Bot:"""
8
9response = llm.generate(prompt)
Fine-Tuning
What: Update model weights on task-specific data
Pros:
- Better performance
- More consistent
- Shorter prompts (lower cost)
- Can learn new knowledge
Cons:
- Requires labeled data
- Training time/cost
- Less flexible
- May forget general knowledge
Example:
from transformers import Trainer, TrainingArguments

# Prepare dataset (simplified: in practice these examples must be
# tokenized into a Dataset the Trainer can consume)
train_dataset = [
    {"input": "User: I want a refund!", "output": "I understand..."},
    {"input": "User: This is broken!", "output": "I'm sorry..."},
    # ... more examples
]

# Fine-tune
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-5
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
)

trainer.train()
When to Use Each
Use Prompting:
- Quick prototyping
- Few examples available
- Task changes frequently
- General-purpose use
Use Fine-Tuning:
- Have lots of labeled data (1000+)
- Need consistent behavior
- Specific domain/style
- High-volume inference (cost savings)
Q10: What is chain-of-thought prompting?
Answer:
Definition: Ask LLM to show step-by-step reasoning before answering.
Why It Works: Breaking down complex problems improves accuracy.
Basic Example:
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each.
   How many balls does he have?

Without CoT:
A: 11

With CoT:
A: Let's think step by step:
1. Roger starts with 5 balls
2. He buys 2 cans
3. Each can has 3 balls
4. So he buys 2 × 3 = 6 balls
5. Total: 5 + 6 = 11 balls

Answer: 11
Implementation:
def chain_of_thought(question):
    prompt = f"""{question}

Let's solve this step by step:
1."""

    return llm.generate(prompt)

# Usage
question = "If a train travels 60 mph for 2.5 hours, how far does it go?"
answer = chain_of_thought(question)
Variants:
Zero-Shot CoT: Just add "Let's think step by step"
prompt = f"{question}\n\nLet's think step by step:"
Few-Shot CoT: Provide examples with reasoning
1prompt = """Q: If 3 apples cost $6, how much do 5 apples cost?
2A: Let's think:
31. 3 apples = $6
42. 1 apple = $6 / 3 = $2
53. 5 apples = 5 × $2 = $10
6
7Q: {new_question}
8A: Let's think:"""
Self-Consistency: Generate multiple reasoning paths, pick most common answer
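A minimal sketch of self-consistency, assuming the same llm.generate(prompt) interface used elsewhere in this page plus a temperature keyword (that keyword and the answer-extraction step are simplifying assumptions):
from collections import Counter

def self_consistency(question, n_samples=5):
    """Sample several reasoning paths and return the most common final answer"""
    answers = []
    for _ in range(n_samples):
        # Temperature > 0 so each sampled path can reason differently
        response = llm.generate(f"{question}\n\nLet's think step by step:", temperature=0.8)
        # Simplified: assume the final line of the response contains the answer
        answers.append(response.strip().split("\n")[-1])

    return Counter(answers).most_common(1)[0][0]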
Benefits:
- Better accuracy on math/logic problems
- Interpretable (can see reasoning)
- Catches mistakes in reasoning
When to Use:
- Math problems
- Logic puzzles
- Multi-step reasoning
- Complex questions
Summary
Key LLM/Agent concepts:
- LLMs: Large models that predict next tokens
- Prompting: Craft inputs to guide outputs
- Temperature: Control randomness
- Agents: LLMs that take actions
- RAG: Retrieve context for better answers
- Embeddings: Vector representations of text
- Context Window: Token limit
- Fine-tuning vs. Prompting: Training vs. in-context learning
- Chain-of-Thought: Step-by-step reasoning