Research Interview Questions - Hard

Hard-level research interview questions covering advanced methodologies and complex analysis.

Q1: Explain Bayesian vs. Frequentist approaches to statistics.

Answer:

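Frequentist methods treat the parameter θ as fixed but unknown and define probability as a long-run frequency over repeated samples. Bayesian methods treat θ as a random variable and update a prior belief about it with the observed data D.

Bayes' Theorem: $$P(\theta|D) = \frac{P(D|\theta) \times P(\theta)}{P(D)}$$

Here P(θ|D) is the posterior, P(D|θ) the likelihood, P(θ) the prior, and P(D) the marginal likelihood of the data.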

When to Use:

  • Frequentist: Large samples, no prior knowledge
  • Bayesian: Small samples, incorporate prior knowledge, sequential updating
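To make the contrast concrete, here is a minimal sketch of a conjugate Beta-Binomial update next to the frequentist point estimate. The data (7 successes in 10 trials), the Beta(2, 2) prior, and the function names are illustrative assumptions, not part of the original answer.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical small sample: 7 successes in 10 trials.
successes, failures = 7, 3

# Frequentist point estimate (maximum likelihood): the sample proportion.
mle = successes / (successes + failures)

# Bayesian estimate with an assumed Beta(2, 2) prior.
post_a, post_b = beta_binomial_update(2, 2, successes, failures)
posterior_mean = beta_mean(post_a, post_b)  # 9 / 14

# Sequential updating: feeding the same data in two batches
# gives the same posterior as a single update.
a, b = 2, 2
for batch_successes, batch_failures in [(3, 2), (4, 1)]:
    a, b = beta_binomial_update(a, b, batch_successes, batch_failures)

print(f"MLE: {mle:.3f}, posterior mean: {posterior_mean:.3f}")
print("sequential posterior:", (a, b))
```

With only 10 observations the prior pulls the estimate toward 0.5; as the sample grows, the posterior mean and the MLE converge.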

Q2: Design a randomized controlled trial with complex interventions.

Answer:

Cluster Randomized Trial

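Intraclass Correlation (ICC): Share of total outcome variance that lies between clusters; higher values mean members of the same cluster are more alike

  • Requires a larger sample size than individual randomization
  • Design effect = 1 + (m - 1) × ICC, where m is the (average) cluster size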
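The design effect formula can be turned into a quick sample-size adjustment. This is a sketch under assumed inputs: 400 participants needed under individual randomization, clusters of 20, and an ICC of 0.05 are hypothetical values.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters instead of individuals."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(n_individual, cluster_size, icc):
    """Inflate an individually randomized sample size for clustering.

    Returns (total participants, number of clusters).
    """
    inflated = n_individual * design_effect(cluster_size, icc)
    n_total = math.ceil(inflated - 1e-9)  # small guard against float noise
    n_clusters = math.ceil(n_total / cluster_size)
    return n_total, n_clusters


deff = design_effect(cluster_size=20, icc=0.05)
n_total, n_clusters = clustered_sample_size(400, cluster_size=20, icc=0.05)
print(f"design effect: {deff:.2f}")
print(f"participants: {n_total}, clusters: {n_clusters}")
```

Even a small ICC nearly doubles the required sample here, because the inflation scales with cluster size.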

Stepped-Wedge Design

Advantages: Ethical (every cluster eventually receives the treatment); each cluster contributes both control and treated periods, so time trends can be adjusted for

Disadvantages: Complex analysis, longer duration
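A stepped-wedge rollout can be written down as a cluster-by-period matrix. The sketch below assumes an all-control baseline period and an equal number of clusters crossing over at each step; the cluster labels and the `stepped_wedge_schedule` helper are hypothetical.

```python
import random


def stepped_wedge_schedule(cluster_ids, n_steps, seed=0):
    """Map each cluster to a 0/1 row over n_steps + 1 periods.

    Period 0 is an all-control baseline. At each later period another
    randomly chosen group of clusters crosses over to the intervention.
    """
    if len(cluster_ids) % n_steps != 0:
        raise ValueError("number of clusters must be divisible by n_steps")
    order = list(cluster_ids)
    random.Random(seed).shuffle(order)  # randomize the crossover order
    per_step = len(order) // n_steps
    schedule = {}
    for position, cluster in enumerate(order):
        crossover = position // per_step + 1  # first treated period
        schedule[cluster] = [int(t >= crossover) for t in range(n_steps + 1)]
    return schedule


schedule = stepped_wedge_schedule(["A", "B", "C", "D", "E", "F"], n_steps=3)
for cluster in sorted(schedule):
    print(cluster, schedule[cluster])
```

Because treatment exposure increases with calendar time by construction, the analysis model typically includes period effects alongside the treatment indicator.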


Q3: Explain structural equation modeling (SEM).

Answer:

Path Diagram

Fit Indices:

  • CFI (Comparative Fit Index): > 0.95 good
  • RMSEA (Root Mean Square Error of Approximation): < 0.06 good
  • SRMR (Standardized Root Mean Square Residual): < 0.08 good
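CFI and RMSEA can both be computed from the chi-square statistics that SEM software reports. The sketch below uses the commonly cited formulas; the chi-square values and sample size are made up, and some packages use N rather than N - 1 in the RMSEA denominator. SRMR needs the observed and implied correlation matrices, so it is not shown.

```python
import math


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def rmsea(chi2_model, df_model, n):
    """Root Mean Square Error of Approximation (N - 1 convention)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


# Hypothetical output from a fitted model with N = 300.
fit_cfi = cfi(chi2_model=85.0, df_model=40, chi2_base=900.0, df_base=55)
fit_rmsea = rmsea(chi2_model=85.0, df_model=40, n=300)

print(f"CFI = {fit_cfi:.3f} (good if > 0.95)")
print(f"RMSEA = {fit_rmsea:.3f} (good if < 0.06)")
```

In this made-up case both indices land just outside the cutoffs listed above, the kind of borderline result where the indices should be read together rather than individually.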

Q4: How do you handle multiple testing problems?

Answer:

Family-Wise Error Rate (FWER)

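Bonferroni Correction: $$\alpha_{adjusted} = \frac{\alpha}{n}$$

Here n is the number of hypothesis tests. Testing each hypothesis at the adjusted level keeps the probability of at least one false positive at or below α.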
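As a small illustration of the correction, the sketch below applies it to five hypothetical p-values; the `bonferroni` helper name is an assumption.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the adjusted threshold and a reject/keep flag per test."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]


# Hypothetical p-values from five tests.
p_values = [0.001, 0.008, 0.02, 0.045, 0.30]
threshold, rejected = bonferroni(p_values)

print(f"per-test threshold: {threshold:.3f}")
print("rejected:", rejected)
```

Three of the five p-values are below 0.05 unadjusted, but only two survive the correction.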

False Discovery Rate (FDR)

Benjamini-Hochberg Procedure:

Less conservative than Bonferroni; controls the expected proportion of false discoveries among the rejected hypotheses
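A minimal sketch of the Benjamini-Hochberg step-up procedure: sort the m p-values, find the largest rank k with p_(k) ≤ (k / m) × q, and reject the k smallest. The p-values and the target FDR level q = 0.05 are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a reject/keep flag per test, controlling FDR at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k whose sorted p-value is under its step-up threshold.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank

    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from five tests.
p_values = [0.001, 0.008, 0.02, 0.045, 0.30]
bh_rejected = benjamini_hochberg(p_values)

# For comparison: a Bonferroni threshold of q / m on the same p-values.
bonf_rejected = [p <= 0.05 / len(p_values) for p in p_values]

print("BH rejects:        ", sum(bh_rejected))
print("Bonferroni rejects:", sum(bonf_rejected))
```

On this input the step-up rule rejects one more hypothesis than the Bonferroni threshold does.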


Q5: Explain time series analysis and forecasting.

Answer:

Decomposition

ARIMA Models

ARIMA(p, d, q): p autoregressive terms, d rounds of differencing, q moving-average terms

Model Selection:

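  • ACF/PACF plots: Identify p (where the PACF cuts off) and q (where the ACF cuts off)
  • AIC/BIC: Compare candidate models (lower is better)
  • Stationarity tests (e.g., ADF, KPSS): Determine d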
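Two of the building blocks behind these steps, differencing and the sample autocorrelation function, are short enough to sketch by hand. The toy series (a linear trend plus an alternating term) is an assumption chosen so the effect of differencing is easy to see; in practice a library would handle the full ARIMA fit.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing (the 'I' in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Toy series: linear trend plus an alternating +1 / -1 term.
series = [2 * t + (1 if t % 2 else -1) for t in range(20)]

diffed = difference(series, d=1)  # trend removed after one difference
print("raw ACF:   ", [round(r, 2) for r in acf(series, 3)])
print("diffed ACF:", [round(r, 2) for r in acf(diffed, 3)])
```

The raw series shows slowly decaying autocorrelation, a typical sign of non-stationarity, while the differenced series has no trend left and its ACF reflects only the alternating pattern.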

Q6: Design a mixed-methods research study.

Answer:

Convergent Parallel Design

Collect quantitative and qualitative data simultaneously, analyze each separately, then compare and integrate the results

Explanatory Sequential Design

Collect and analyze quantitative data first, then use qualitative data to explain the quantitative findings


Q7: Implement machine learning for causal inference.

Answer:

Double/Debiased Machine Learning

Advantages:

  • Flexible modeling of confounders
  • Reduces bias from model misspecification
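  • Valid inference (standard errors and confidence intervals) on the treatment effect, thanks to orthogonalization and cross-fitting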
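The core of the partialling-out version of double ML fits in a few lines: predict the treatment and the outcome from the confounders on held-out folds, then regress the outcome residuals on the treatment residuals. The sketch below is a deliberately simplified assumption: it uses one confounder, simulated data with a true effect of 2.0, and a plain least-squares line as a stand-in for the flexible ML learner.

```python
import random


def ols(x, y):
    """Intercept and slope of y = a + b*x (stand-in for any ML learner)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope


def double_ml(x, d, y, n_folds=2):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(x) + e."""
    n = len(y)
    res_d, res_y = [0.0] * n, [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        # Nuisance models are fit on the other folds only.
        a_d, b_d = ols([x[i] for i in train], [d[i] for i in train])
        a_y, b_y = ols([x[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, n_folds):  # held-out observations
            res_d[i] = d[i] - (a_d + b_d * x[i])
            res_y[i] = y[i] - (a_y + b_y * x[i])
    # Final stage: regress outcome residuals on treatment residuals.
    return (sum(rd * ry for rd, ry in zip(res_d, res_y))
            / sum(rd * rd for rd in res_d))


# Simulated data: x confounds both treatment d and outcome y.
rng = random.Random(0)
n, true_theta = 2000, 2.0
x = [rng.gauss(0, 1) for _ in range(n)]
d = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [true_theta * di + 1.5 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

theta_hat = double_ml(x, d, y)
_, naive = ols(d, y)  # ignores the confounder, so it is biased upward
print(f"naive slope: {naive:.2f}, double ML estimate: {theta_hat:.2f}")
```

Swapping `ols` for a random forest or gradient boosting model is what lets the approach handle many confounders and nonlinear relationships.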

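Causal Forests

An adaptation of random forests that estimates heterogeneous (conditional average) treatment effects across subgroups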


Q8: Explain survival analysis and competing risks.

Answer:

Kaplan-Meier Curve

Censoring: The event time is only partly observed, for example because a participant is lost to follow-up or the study ends before the event occurs
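The Kaplan-Meier estimate multiplies, at each observed event time, the fraction of at-risk participants who survive it. A minimal sketch follows; the six follow-up times and event flags are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    curve, surv = [], 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical follow-up times (months) and event indicators.
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]

km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

Censored participants never trigger a drop in the curve, but they do count in the at-risk denominator until their censoring time.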

Competing Risks

Example: Studying death from disease

  • Event of interest: Death from disease
  • Competing risk: Death from other causes

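Cumulative Incidence Function (CIF): Probability of experiencing the event of interest by time t, accounting for the fact that a competing event removes the participant from risk. Treating competing events as censored and using 1 - Kaplan-Meier overestimates this probability.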
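A nonparametric CIF can be sketched as a running sum: at each event time, the overall (all-cause) survival just before that time multiplied by the cause-specific hazard. The data below are hypothetical, with cause 1 standing for death from disease and cause 2 for death from other causes.

```python
def cumulative_incidence(times, causes, cause):
    """Return [(t, CIF(t))] for one cause.

    causes[i] is 0 if censored, otherwise the event type (1, 2, ...).
    """
    surv, cif, curve = 1.0, 0.0, []
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        d_cause = sum(1 for u, c in zip(times, causes) if u == t and c == cause)
        d_any = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        cif += surv * d_cause / at_risk  # surv is all-cause survival before t
        surv *= 1 - d_any / at_risk
        curve.append((t, cif))
    return curve


# Hypothetical data: 1 = death from disease, 2 = other death, 0 = censored.
times = [2, 3, 4, 5, 6, 8]
causes = [1, 2, 0, 1, 2, 1]

cif_disease = cumulative_incidence(times, causes, cause=1)
cif_other = cumulative_incidence(times, causes, cause=2)
print("disease:", [(t, round(p, 3)) for t, p in cif_disease])
print("other:  ", [(t, round(p, 3)) for t, p in cif_other])
```

The cause-specific CIFs plus the all-cause survival always sum to 1, which is the property that 1 - Kaplan-Meier per cause does not have.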


Q9: Design and analyze network experiments.

Answer:

Network Structure

Graph Cluster Randomization

Analysis Considerations:

  • Direct effects vs. spillover effects
  • Network autocorrelation
  • Exposure mapping (who affects whom)
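Graph cluster randomization assigns treatment to whole graph clusters so that most of a user's neighbors share their condition. The sketch below assumes the clusters have already been computed (for example by a community detection step, not shown); the six-node graph, the cluster labels, and the helper names are hypothetical.

```python
import random


def graph_cluster_randomize(clusters, seed=0):
    """Randomize treatment at the cluster level; return node -> 0/1."""
    rng = random.Random(seed)
    cluster_ids = sorted(set(clusters.values()))
    rng.shuffle(cluster_ids)
    treated = set(cluster_ids[: len(cluster_ids) // 2])
    return {node: int(c in treated) for node, c in clusters.items()}


def exposure(graph, assignment):
    """Simple exposure mapping: fraction of each node's neighbors treated."""
    return {
        node: (sum(assignment[v] for v in nbrs) / len(nbrs) if nbrs else 0.0)
        for node, nbrs in graph.items()
    }


# Hypothetical graph: two triangles joined by the edge c-d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

assignment = graph_cluster_randomize(clusters)
exposures = exposure(graph, assignment)
for node in sorted(graph):
    print(node, "treated" if assignment[node] else "control",
          f"neighbor exposure = {exposures[node]:.2f}")
```

Nodes inside a cluster see neighbors in the same condition, while boundary nodes such as c and d have mixed exposure; those are where spillover can contaminate the direct-effect estimate.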

Q10: Implement Bayesian hierarchical models.

Answer:

Model Structure

Advantages:

  • Partial pooling: Borrow strength across groups
  • Shrinkage: Pull extreme estimates toward mean
  • Uncertainty quantification: Full posterior distributions
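Partial pooling and shrinkage have a closed form in the simplest normal-normal case. The sketch below assumes the within-group standard deviation `sigma`, the between-group standard deviation `tau`, and the population mean `mu` are known; a full hierarchical model would estimate them. All numbers are hypothetical.

```python
def partial_pool(group_mean, n, sigma, mu, tau):
    """Posterior mean of a group effect under a normal-normal model.

    A precision-weighted average of the group's own mean and the
    population mean mu.
    """
    precision_data = n / sigma ** 2
    precision_prior = 1 / tau ** 2
    weight = precision_data / (precision_data + precision_prior)
    return weight * group_mean + (1 - weight) * mu


# Two groups with the same observed mean but different sample sizes.
groups = {"small": (9.0, 2), "large": (9.0, 50)}  # (observed mean, n)
mu, sigma, tau = 5.0, 4.0, 2.0

shrunk = {
    name: partial_pool(mean, n, sigma, mu, tau)
    for name, (mean, n) in groups.items()
}
for name, estimate in shrunk.items():
    print(f"{name}: observed 9.00 -> pooled {estimate:.2f}")
```

The small group is pulled most of the way toward the population mean, while the large group barely moves: groups with little data borrow more strength from the rest.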

MCMC Sampling

Diagnostics:

  • Trace plots: Visual convergence check
  • R-hat: < 1.01 indicates convergence
  • Effective sample size: > 1000 recommended
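To show what R-hat measures, here is a sketch of the basic Gelman-Rubin statistic, which compares between-chain and within-chain variance. Modern samplers report a more robust split, rank-normalized variant, so this is a simplified assumption. The "chains" are simulated draws rather than output from a real sampler.

```python
import math
import random
import statistics


def r_hat(chains):
    """Basic (non-split) Gelman-Rubin statistic for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


rng = random.Random(42)

# Four simulated chains exploring the same distribution.
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]

# Four simulated chains stuck in two different regions.
stuck = [[rng.gauss(mu, 1) for _ in range(1000)] for mu in (0, 0, 3, 3)]

r_mixed = r_hat(mixed)
r_stuck = r_hat(stuck)
print(f"well-mixed chains: R-hat = {r_mixed:.3f}")
print(f"stuck chains:      R-hat = {r_stuck:.3f}")
```

When chains disagree about where the posterior mass is, the between-chain variance dominates and R-hat rises well above 1.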

Summary

Hard research topics:

  • Bayesian vs. Frequentist: Different statistical philosophies
  • Complex RCTs: Cluster and stepped-wedge designs
  • SEM: Latent variables and structural relationships
  • Multiple Testing: FWER and FDR control
  • Time Series: ARIMA, decomposition, forecasting
  • Mixed Methods: Integrating qual and quant
  • ML for Causality: Double ML, causal forests
  • Survival Analysis: Competing risks, censoring
  • Network Experiments: Spillover effects
  • Hierarchical Bayesian: Partial pooling, MCMC

These advanced methods enable tackling complex research questions with rigor.
