LLM/Agentic AI Interview Questions - Easy

Easy-level LLM and Agentic AI interview questions covering fundamentals, prompting, and basic agent concepts.

Q1: What is a Large Language Model (LLM)?

Answer:

Definition: Neural network trained on massive text data to predict next tokens, enabling text generation and understanding.

Key Characteristics:

  • Large: Billions of parameters (GPT-3: 175B, GPT-4: estimated 1.7T)
  • Transformer-based: Uses self-attention mechanism
  • Pre-trained: Trained on diverse internet text
  • Few-shot learning: Can perform tasks with minimal examples

How It Works:

  1. Input text → tokenized into pieces
  2. Each token converted to embedding vector
  3. Transformer layers process with attention
  4. Output: probability distribution over next tokens
  5. Sample or pick most likely token
  6. Repeat autoregressively
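
This loop can be sketched directly with the Hugging Face transformers library. The snippet below is a minimal illustration (assuming torch and transformers are installed, with GPT-2 as a small example model), not production decoding code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Steps 1-2: tokenize the input (embedding lookup happens inside the model)
input_ids = tokenizer.encode("The sky is", return_tensors="pt")

# Steps 3-6: run the transformer, take the next-token distribution,
# pick a token, append it, and repeat
for _ in range(10):
    with torch.no_grad():
        logits = model(input_ids).logits          # (1, seq_len, vocab_size)
    next_token_logits = logits[0, -1]             # scores for the next token
    next_token = torch.argmax(next_token_logits)  # greedy pick (or sample instead)
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))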

LangChain Example:

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Initialize LLM (the OpenAI wrapper expects a completions-style model;
# for chat models such as gpt-3.5-turbo, use ChatOpenAI instead)
llm = OpenAI(temperature=0.7, model_name="gpt-3.5-turbo-instruct")

# Create prompt template
template = "The future of {topic} is"
prompt = PromptTemplate(template=template, input_variables=["topic"])

# Create chain
chain = LLMChain(llm=llm, prompt=prompt)

# Generate
result = chain.run(topic="AI")
print(result)

Use Cases: Text generation, summarization, translation, Q&A, code generation


Q2: What is prompt engineering?

Answer:

Definition: Crafting input text (prompts) to get desired outputs from LLMs.

Why It Matters: LLMs are sensitive to how questions are phrased.

Basic Techniques:

1. Zero-Shot

Just ask directly:

Classify sentiment: "I love this product!"

2. Few-Shot

Provide examples:

Classify sentiment:
"Great service!" → Positive
"Terrible experience." → Negative
"It's okay." → Neutral
"I love this product!" →

3. Chain-of-Thought

Ask for step-by-step reasoning:

Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. How many balls does he have?
A: Let's think step by step:
1. Roger starts with 5 balls
2. He buys 2 cans with 3 balls each: 2 × 3 = 6 balls
3. Total: 5 + 6 = 11 balls

4. Role-Playing

Give the model a role:

You are an expert Python developer. Explain list comprehensions to a beginner.

5. Constraints

Specify format/length:

Summarize this article in exactly 3 bullet points.

Best Practices:

  • Be specific and clear
  • Provide context
  • Use examples when possible
  • Iterate and refine
  • Test different phrasings

LangChain Implementation:

from langchain.prompts import (
    FewShotPromptTemplate,
    PromptTemplate,
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate
)
from langchain.llms import OpenAI

# Few-Shot Example
examples = [
    {"input": "What's 2+2?", "output": "4"},
    {"input": "What's 5*3?", "output": "15"}
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}"
)

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="You are a helpful math tutor.",
    suffix="Input: {input}\nOutput:",
    input_variables=["input"]
)

llm = OpenAI(temperature=0)
result = llm(few_shot_prompt.format(input="What's 7+8?"))
print(result)  # Output: 15

# Role-Playing with System Message
system_template = "You are an expert {role}. Explain {topic} to a beginner."
system_prompt = SystemMessagePromptTemplate.from_template(system_template)

human_template = "{question}"
human_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_prompt, human_prompt])

from langchain.chat_models import ChatOpenAI
chat = ChatOpenAI()

result = chat(chat_prompt.format_messages(
    role="Python developer",
    topic="list comprehensions",
    question="How do list comprehensions work?"
))

Q3: What is temperature in LLM generation?

Answer:

Definition: Parameter controlling randomness of token selection.

How It Works:

Before sampling, divide logits by temperature: $$ p_i = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}} $$

where $z_i$ are the logits for the candidate tokens and $T$ is the temperature.
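
To see what the formula does, apply it to a few made-up logit values and compare the resulting distributions (a minimal PyTorch sketch; the numbers are arbitrary):

import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.5, 0.1])  # illustrative logits for 4 candidate tokens

for T in [0.1, 0.7, 1.5]:
    probs = F.softmax(logits / T, dim=-1)
    print(f"T={T}:", [round(p, 3) for p in probs.tolist()])
# Low T sharpens the distribution toward the top token;
# high T flattens it, giving unlikely tokens more probability.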

Effects:

Temperature = 0: Greedy (always pick most likely)

1"The sky is" → "blue" (deterministic)

Temperature = 0.7 (a common default): Balanced creativity

1"The sky is" → "blue" or "clear" or "bright"

Temperature = 1.5: Very creative/random

1"The sky is" → "purple" or "singing" or "infinite"

Implementation:

import torch
import torch.nn.functional as F

def sample_with_temperature(logits, temperature=1.0):
    """Sample next token with temperature"""
    # Apply temperature
    logits = logits / temperature
    
    # Convert to probabilities
    probs = F.softmax(logits, dim=-1)
    
    # Sample
    next_token = torch.multinomial(probs, num_samples=1)
    
    return next_token

# Example
logits = torch.tensor([2.0, 1.0, 0.5, 0.1])

print("Temperature 0.1 (focused):")
for _ in range(5):
    token = sample_with_temperature(logits, temperature=0.1)
    print(token.item())

print("\nTemperature 2.0 (creative):")
for _ in range(5):
    token = sample_with_temperature(logits, temperature=2.0)
    print(token.item())

When to Use:

  • Low (0.1-0.5): Factual tasks, code generation, translation
  • Medium (0.7-1.0): General chat, creative writing
  • High (1.5-2.0): Brainstorming, poetry, experimental

Q4: What is an AI agent?

Answer:

Definition: System that perceives environment, makes decisions, and takes actions to achieve goals.

LLM-Based Agent: Uses LLM as reasoning engine to decide actions.

Core Components:

  1. Perception: Observe environment (user input, tool outputs)
  2. Reasoning: LLM decides what to do
  3. Action: Execute tools/functions
  4. Memory: Remember past interactions

Simple Agent Loop:

while not done:
    observation = get_observation()
    action = llm.decide_action(observation, memory)
    result = execute_action(action)
    memory.add(observation, action, result)
    done = check_if_goal_achieved()

Example Implementation:

class SimpleAgent:
    def __init__(self, llm, tools):
        self.llm = llm
        self.tools = tools
        self.memory = []
    
    def run(self, task, max_steps=10):
        for step in range(max_steps):
            # Create prompt with task and available tools
            prompt = self._create_prompt(task)
            
            # LLM decides next action
            response = self.llm.generate(prompt)
            action, args = self._parse_action(response)
            
            # Execute action
            if action == "FINISH":
                return args["answer"]
            
            result = self.tools[action](**args)
            
            # Update memory
            self.memory.append({
                "action": action,
                "args": args,
                "result": result
            })
        
        return "Max steps reached"
    
    def _create_prompt(self, task):
        prompt = f"Task: {task}\n\n"
        prompt += "Available tools:\n"
        for name, tool in self.tools.items():
            prompt += f"- {name}: {tool.__doc__}\n"
        
        if self.memory:
            prompt += "\nPrevious actions:\n"
            for mem in self.memory:
                prompt += f"{mem['action']}({mem['args']}) → {mem['result']}\n"
        
        prompt += "\nWhat should I do next? (respond with action and args)"
        return prompt
    
    def _parse_action(self, response):
        # Parse LLM response to extract action and arguments
        # Simplified - real implementation would be more robust
        lines = response.strip().split('\n')
        action = lines[0].split(':')[1].strip()
        args = eval(lines[1].split(':')[1].strip())
        return action, args

# Usage
def search_web(query):
    """Search the web for information"""
    return f"Search results for: {query}"

def calculate(expression):
    """Calculate mathematical expression"""
    return eval(expression)  # fine for a demo; avoid eval on untrusted input

tools = {
    "search": search_web,
    "calculate": calculate,
    "FINISH": lambda answer: answer
}

# my_llm stands in for any LLM wrapper that exposes .generate(prompt)
agent = SimpleAgent(llm=my_llm, tools=tools)
result = agent.run("What is 15% of 240?")

Types of Agents:

  • ReAct: Reasoning + Acting (think, then act)
  • Tool-using: Can call external functions
  • Conversational: Maintains dialogue context
  • Multi-agent: Multiple agents collaborate

Q5: What is Retrieval-Augmented Generation (RAG)?

Answer:

Definition: Enhance LLM responses by retrieving relevant information from external knowledge base.

Problem RAG Solves:

  • LLMs have knowledge cutoff date
  • Can't access private/proprietary data
  • May hallucinate facts

How RAG Works:

  1. Index: Embed documents into vector database
  2. Retrieve: Find relevant docs for query
  3. Augment: Add retrieved docs to prompt
  4. Generate: LLM answers using retrieved context

LangChain Implementation:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.document_loaders import TextLoader

# Load documents
loader = TextLoader('documents.txt')
documents = loader.load()

# Split into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Create embeddings and vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(texts, embeddings)

# Create RAG chain
llm = OpenAI(temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True
)

# Query
query = "Who created Python?"
result = qa_chain({"query": query})

print(f"Answer: {result['result']}")
print("\nSources:")
for doc in result['source_documents']:
    print(f"- {doc.page_content[:100]}...")

# Alternative: Custom RAG with more control
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Custom prompt template
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:
{context}

Question: {question}

Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])

# Retrieval + Generation
def rag_query(question, top_k=3):
    # Retrieve
    docs = vectorstore.similarity_search(question, k=top_k)
    context = "\n\n".join([doc.page_content for doc in docs])
    
    # Generate
    chain = LLMChain(llm=llm, prompt=prompt)
    answer = chain.run(context=context, question=question)
    
    return answer, docs

answer, sources = rag_query("What is Python known for?")
print(answer)

Key Components:

  • Embedding Model: Convert text to vectors (OpenAI, Cohere, HuggingFace)
  • Vector Database: Store and search embeddings (FAISS, Pinecone, Weaviate, Chroma)
  • Retrieval Strategy: Similarity search, MMR, threshold filtering (see the MMR sketch below)
  • Prompt Template: How to format context + query
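
As an example of changing the retrieval strategy, the vectorstore built in the LangChain example above can be wrapped in an MMR (maximal marginal relevance) retriever; the parameter values here are illustrative:

# MMR trades off relevance against diversity among the retrieved chunks
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 20}  # consider 20 candidates, keep 3 diverse ones
)
docs = mmr_retriever.get_relevant_documents("What is Python known for?")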

Benefits:

  • Up-to-date information
  • Access to private data
  • Reduced hallucinations
  • Citable sources

Q6: What is few-shot learning in LLMs?

Answer:

Definition: LLM learns task from just a few examples in the prompt (no fine-tuning).

How It Works: LLM recognizes pattern from examples and applies to new input.

Example:

def few_shot_classification(text, examples):
    """Classify text using few-shot learning"""
    prompt = "Classify the sentiment:\n\n"
    
    # Add examples
    for example_text, label in examples:
        prompt += f'Text: "{example_text}"\nSentiment: {label}\n\n'
    
    # Add query
    prompt += f'Text: "{text}"\nSentiment:'
    
    return llm.generate(prompt)

# Usage
examples = [
    ("I love this product!", "Positive"),
    ("Terrible service.", "Negative"),
    ("It's okay, nothing special.", "Neutral")
]

result = few_shot_classification("This is amazing!", examples)
# Output: "Positive"

Variants:

Zero-Shot: No examples

Translate to French: "Hello"

One-Shot: One example

Translate to French:
English: "Goodbye" → French: "Au revoir"
English: "Hello" → French:

Few-Shot: Multiple examples (typically 3-10)

Why It Works:

  • LLM learned patterns during pre-training
  • Examples activate relevant knowledge
  • In-context learning (no weight updates)

Best Practices:

  • Use diverse, representative examples
  • Order matters (put similar examples last)
  • More examples = better (but limited by context window)
  • Balance classes in classification tasks

Q7: What is the context window in LLMs?

Answer:

Definition: Maximum number of tokens LLM can process at once (input + output).

Examples:

  • GPT-3.5: 4,096 tokens (~3,000 words)
  • GPT-4: 8,192 or 32,768 tokens
  • Claude 2: 100,000 tokens
  • GPT-4 Turbo: 128,000 tokens

Why It Matters:

  • Limits how much context you can provide
  • Affects RAG (how many documents to include)
  • Determines conversation history length

Token Counting:

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

text = "Hello, how are you today?"
tokens = tokenizer.encode(text)

print(f"Text: {text}")
print(f"Tokens: {tokens}")
print(f"Token count: {len(tokens)}")

# Approximate: 1 token ≈ 0.75 words (English)
# So 1,000 tokens ≈ 750 words

Handling Long Texts:

  1. Chunking: Split into smaller pieces

def chunk_text(text, max_tokens=1000):
    tokens = tokenizer.encode(text)
    chunks = []
    
    for i in range(0, len(tokens), max_tokens):
        chunk_tokens = tokens[i:i+max_tokens]
        chunk_str = tokenizer.decode(chunk_tokens)
        chunks.append(chunk_str)
    
    return chunks

  2. Summarization: Summarize long context

def summarize_long_text(text, max_tokens=1000):
    chunks = chunk_text(text, max_tokens)
    summaries = []
    
    for chunk in chunks:
        summary = llm.generate(f"Summarize: {chunk}")
        summaries.append(summary)
    
    # Combine summaries
    combined = " ".join(summaries)
    
    # Final summary if the combined text is still too long
    if len(tokenizer.encode(combined)) > max_tokens:
        return llm.generate(f"Summarize: {combined}")
    
    return combined

  3. Sliding Window: Process with overlap (see the sketch below)
  4. Hierarchical: Summarize sections, then combine
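
A minimal sliding-window sketch, reusing the tokenizer from the token-counting example above (window size and stride are illustrative):

def sliding_window_chunks(text, window_tokens=1000, stride=800):
    """Overlapping chunks: each window starts `stride` tokens after the previous one,
    so consecutive windows share window_tokens - stride tokens of context."""
    tokens = tokenizer.encode(text)
    chunks = []
    for start in range(0, len(tokens), stride):
        window = tokens[start:start + window_tokens]
        chunks.append(tokenizer.decode(window))
        if start + window_tokens >= len(tokens):
            break
    return chunks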

Q8: What are embeddings in NLP?

Answer:

Definition: Dense vector representations of text that capture semantic meaning.

Key Property: Similar meanings → similar vectors

Example:

1"king" → [0.2, 0.5, -0.1, ...]
2"queen" → [0.3, 0.4, -0.2, ...]
3"car" → [-0.5, 0.1, 0.8, ...]

"king" and "queen" are closer than "king" and "car"

How to Get Embeddings:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Get embeddings
texts = ["I love programming", "Coding is fun", "I hate bugs"]
embeddings = model.encode(texts)

print(f"Shape: {embeddings.shape}")  # (3, 384)

# Calculate similarity
from sklearn.metrics.pairwise import cosine_similarity

similarities = cosine_similarity(embeddings)
print(similarities)
# [[1.0, 0.8, 0.3],   # "love programming" similar to "coding is fun"
#  [0.8, 1.0, 0.2],
#  [0.3, 0.2, 1.0]]

Use Cases:

  • Semantic search: Find similar documents (see the sketch below)
  • RAG: Retrieve relevant context
  • Clustering: Group similar texts
  • Classification: Use as features
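
A minimal semantic-search sketch built on the same model (the corpus and query below are made up for illustration):

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('all-MiniLM-L6-v2')

corpus = [
    "Python was created by Guido van Rossum",
    "The Eiffel Tower is in Paris",
    "Pandas is a data analysis library"
]
corpus_embeddings = model.encode(corpus)

query_embedding = model.encode(["Who invented Python?"])

# Rank documents by cosine similarity to the query
scores = cosine_similarity(query_embedding, corpus_embeddings)[0]
for idx in np.argsort(scores)[::-1][:2]:
    print(f"{scores[idx]:.2f}  {corpus[idx]}")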

Popular Models:

  • sentence-transformers (SBERT)
  • OpenAI embeddings (text-embedding-ada-002)
  • Cohere embeddings
  • Google Universal Sentence Encoder

Q9: What is fine-tuning vs. prompting?

Answer:

Prompting (In-Context Learning)

What: Provide examples/instructions in prompt, no model changes

Pros:

  • No training needed
  • Instant
  • Flexible (change anytime)
  • No data labeling

Cons:

  • Limited by context window
  • Less consistent
  • Higher inference cost (longer prompts)

Example:

1prompt = """You are a customer service bot. Be polite and helpful.
2
3User: I want a refund!
4Bot: I understand your frustration. Let me help you with that refund.
5
6User: This product is broken!
7Bot:"""
8
9response = llm.generate(prompt)

Fine-Tuning

What: Update model weights on task-specific data

Pros:

  • Better performance
  • More consistent
  • Shorter prompts (lower cost)
  • Can learn new knowledge

Cons:

  • Requires labeled data
  • Training time/cost
  • Less flexible
  • May forget general knowledge

Example:

from transformers import Trainer, TrainingArguments

# Prepare dataset (simplified sketch: in practice each example must be
# tokenized and formatted as model inputs before being passed to Trainer)
train_dataset = [
    {"input": "User: I want a refund!", "output": "I understand..."},
    {"input": "User: This is broken!", "output": "I'm sorry..."},
    # ... more examples
]

# Fine-tune
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-5
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
)

trainer.train()

When to Use Each

Use Prompting:

  • Quick prototyping
  • Few examples available
  • Task changes frequently
  • General-purpose use

Use Fine-Tuning:

  • Have lots of labeled data (1000+)
  • Need consistent behavior
  • Specific domain/style
  • High-volume inference (cost savings)

Q10: What is chain-of-thought prompting?

Answer:

Definition: Ask LLM to show step-by-step reasoning before answering.

Why It Works: Breaking down complex problems improves accuracy.

Basic Example:

Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. 
   How many balls does he have?

Without CoT:
A: 11

With CoT:
A: Let's think step by step:
1. Roger starts with 5 balls
2. He buys 2 cans
3. Each can has 3 balls
4. So he buys 2 × 3 = 6 balls
5. Total: 5 + 6 = 11 balls

Answer: 11

Implementation:

def chain_of_thought(question):
    prompt = f"""{question}

Let's solve this step by step:
1."""
    
    return llm.generate(prompt)

# Usage
question = "If a train travels 60 mph for 2.5 hours, how far does it go?"
answer = chain_of_thought(question)

Variants:

Zero-Shot CoT: Just add "Let's think step by step"

1prompt = f"{question}\n\nLet's think step by step:"

Few-Shot CoT: Provide examples with reasoning

1prompt = """Q: If 3 apples cost $6, how much do 5 apples cost?
2A: Let's think:
31. 3 apples = $6
42. 1 apple = $6 / 3 = $2
53. 5 apples = 5 × $2 = $10
6
7Q: {new_question}
8A: Let's think:"""

Self-Consistency: Generate multiple reasoning paths, pick most common answer
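
A minimal self-consistency sketch, assuming an llm.generate wrapper that accepts a temperature argument and a hypothetical extract_final_answer helper that parses the final answer out of the reasoning text:

from collections import Counter

def self_consistency(question, n_samples=5):
    """Sample several reasoning paths and majority-vote on the final answer."""
    answers = []
    for _ in range(n_samples):
        # Temperature > 0 so each sample follows a different reasoning path
        reasoning = llm.generate(f"{question}\n\nLet's think step by step:", temperature=0.8)
        answers.append(extract_final_answer(reasoning))  # hypothetical parsing helper
    
    # Most common final answer wins
    return Counter(answers).most_common(1)[0][0]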

Benefits:

  • Better accuracy on math/logic problems
  • Interpretable (can see reasoning)
  • Catches mistakes in reasoning

When to Use:

  • Math problems
  • Logic puzzles
  • Multi-step reasoning
  • Complex questions

Summary

Key LLM/Agent concepts:

  • LLMs: Large models that predict next tokens
  • Prompting: Craft inputs to guide outputs
  • Temperature: Control randomness
  • Agents: LLMs that take actions
  • RAG: Retrieve context for better answers
  • Embeddings: Vector representations of text
  • Context Window: Token limit
  • Fine-tuning vs. Prompting: Training vs. in-context learning
  • Chain-of-Thought: Step-by-step reasoning
