Scalable Trustworthy AI

Creating scalable and trustworthy AI with human guidance

Overview

AI is no longer a research curiosity. It is reshaping how we live and work. To fully exploit its benefits, we must address critical gaps in trustworthiness.

Current foundation models such as LLMs have critical trustworthiness problems: they hallucinate false information, struggle with continual learning, resist knowledge editing (making GDPR compliance impractical), leak private information embedded in their parameters, and require prohibitive compute for training and personalisation. These issues block the widespread adoption of AI and the productivity revolution it promises.

Our approach: Knowledge-Intelligence Separation. Just as the separation of code and data in the 1960s enabled the modern software industry, we believe this separation is the key to unlocking AI’s full potential. When knowledge is stored in interpretable, editable external modules while intelligence (reasoning, generalisation) remains in the model, we enable faster customisation, training data attribution by design, and knowledge editing and unlearning.
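The idea can be illustrated with a minimal toy sketch (our own illustrative example, not the lab's implementation): knowledge lives in an editable external store, while the "model" only reasons over whatever the store returns. Editing or unlearning a fact then amounts to mutating the store, with no retraining of the model.

```python
class KnowledgeStore:
    """External, interpretable, editable knowledge module (hypothetical toy)."""

    def __init__(self):
        self.facts = {}  # subject -> (fact, source), keeping sources for attribution

    def add(self, subject, fact, source):
        self.facts[subject] = (fact, source)

    def edit(self, subject, new_fact, source):
        self.facts[subject] = (new_fact, source)   # knowledge editing in place

    def unlearn(self, subject):
        self.facts.pop(subject, None)              # e.g. honouring a GDPR deletion request

    def retrieve(self, subject):
        return self.facts.get(subject)


class ReasoningModel:
    """Stands in for the 'intelligence' part: it holds no facts itself."""

    def answer(self, subject, store):
        hit = store.retrieve(subject)
        if hit is None:
            return "I don't know."                 # abstain instead of hallucinating
        fact, source = hit
        return f"{fact} (source: {source})"        # training data attribution by design


store = KnowledgeStore()
store.add("capital:France", "The capital of France is Paris.", "doc-17")
model = ReasoningModel()
print(model.answer("capital:France", store))

store.unlearn("capital:France")
print(model.answer("capital:France", store))       # the fact is gone after unlearning
```

Because the model never memorises facts in its parameters, each trustworthiness operation listed above reduces to a cheap store update rather than retraining.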

Our work spans a range of interconnected areas.

We are not alone in this effort; many research labs worldwide contribute to Trustworthy AI. What sets our group apart is a focus on working solutions that are widely applicable and can be deployed at scale, hence our name, Scalable Trustworthy AI. For impact at scale, we commit ourselves to a set of shared principles.

For prospective students: You might be interested in our internal curriculum and guidelines for a PhD program: Principles for a PhD Program.

Members

Seong Joon Oh

Associate Professor

Elisa Nguyen

PhD Student

Arnas Uselis

PhD Student

Sohyung Kim

PhD Student

Stefano Woerner

PhD Student

Yejin Kim

Research Intern

Yunjae Won

Collaborating PhD Student

Hoyeon Chang

Collaborating PhD Student

Ankit Sonthalia

PhD Student

Bryan Truong

PhD Student

Lennart Bramlage

Collaborating PhD Student

Jihyeok Jung

MSc Student

Seokwon Jung

MSc Student

Alumni

Elif Akata

PhD Student

Michael Kirchhof

Collaborating PhD Student

Evgenii Kortukov

MSc Student

Johannes Bertram

Research Assistant

Bora Kargi

MSc Student

Philipp Davydov

MSc Student

Luca Füger

MSc Student

Fabian Morelli

MSc Student

Publications

MASEval: Extending Multi-Agent Evaluation from Models to Systems

arXiv 2026

Half-Truths Break Similarity-Based Retrieval

arXiv 2026

Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models

arXiv 2026

Dynamics Reveals Structure: Challenging the Linear Propagation Assumption

arXiv 2026

CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally

ICLR 2026

DISCO: Diversifying Sample Condensation for Efficient Model Evaluation

ICLR 2026

Dr.LLM: Dynamic Layer Routing for LLMs

ICLR 2026

Enhancing Multi-Image Understanding through Delimiter Token Scaling

ICLR 2026

SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?

ICLR 2026

Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models

arXiv 2026

LLM generation novelty through the lens of semantic similarity

arXiv 2025

Diffusion Classifiers Understand Compositionality, but Conditions Apply

NeurIPS D&B 2025

On the Rankability of Visual Embeddings

NeurIPS 2025

OVS Meets Continual Learning: Towards Sustainable Open-Vocabulary Segmentation

NeurIPS 2025

Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers

EMNLP 2025

C-SEO Bench: Does Conversational SEO Work?

NeurIPS D&B 2025

Does Data Scaling Lead to Visual Compositional Generalization?

ICML 2025

Do Deep Neural Network Solutions Form a Star Domain?

ICLR 2025

Intermediate Layer Classifiers for OOD Generalization

ICLR 2025

Decoupled Finetuning for Domain Generalizable Semantic Segmentation

ICLR 2025

Are We Done with Object-Centric Learning?

SCSL @ ICLR 2025

DiCoTTA: Domain-invariant Learning for Continual Test-time Adaptation

arXiv 2025

Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles

SCSL @ ICLR 2025

Playing repeated games with Large Language Models

Nature Human Behaviour 2025

Benchmarking Uncertainty Disentanglement: Specialized Uncertainties for Specialized Tasks

NeurIPS D&B (Spotlight) 2024

Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models

NAACL Findings 2025

Studying Large Language Model Behaviors Under Realistic Knowledge Conflicts

CoLM 2024

Towards User-Focused Research in Training Data Attribution for Human-Centered Explainable AI

arXiv 2024

Scalable Ensemble Diversification for OOD Generalization and Detection

arXiv 2024

Calibrating Large Language Models Using Their Generations Only

ACL 2024

Pretrained Visual Uncertainties

arXiv 2024

TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification

ACL Findings 2024

A Bayesian Perspective On Training Data Attribution

NeurIPS 2023

Exploring Practitioner Perspectives On Training Data Attribution Explanations

NeurIPS XAI in Action Workshop 2023

ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets

NeurIPS 2023

URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates

NeurIPS D&B 2023

Neglected Free Lunch -- Learning Image Classifiers Using Annotation Byproducts

ICCV 2023

Scratching Visual Transformer's Back with Uniform Attention

ICCV 2023

Probabilistic Contrastive Learning Recovers the Correct Aleatoric Uncertainty of Ambiguous Inputs

ICML 2023

URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates

UAI-EAI Best Student Paper 2023

ProPILE: Probing Privacy Leakage in Large Language Models

NeurIPS (Spotlight) 2023

ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO

ECCV 2022

Dataset Condensation via Efficient Synthetic-Data Parameterization

ICML 2022

Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data

CVPR 2022

Which Shortcut Cues Will DNNs Choose? A Study from the Parameter-Space Perspective

ICLR 2022

Openings

PhD

Expectations

We expect PhD students to run their own first-author projects, with possible collaborations with both senior and junior members inside and outside the lab.

Application process

PhD application process

  1. Email stai.there@gmail.com with your CV and research statement attached
  2. Coffee chat with Seong Joon to figure out initial fit
  3. Half-day interview
    • Job talk: Present your prior work to the entire lab (30 minutes + discussion)
    • 1-on-1 interviews: Meet individually with 2 lab members (we will connect you via email to arrange times)
    • Interview with Seong Joon: Discuss research directions
  4. Offer
  5. Apply to the grad school with Seong Joon Oh’s supervision intent via KAIST Graduate Admissions

Timeline

Steps 1-4 must be completed at least one week before the group offer announcement dates below. Please reach out well in advance.

Group offer announcement (step 4)

  • International spring: 20 August
  • International autumn (early track): 20 November
  • International autumn (regular track): 20 February
  • Domestic spring: 20 June
  • Domestic autumn: 20 March

MSc

Expectations

We expect MSc students to run their own first-author projects, with possible collaborations with both senior and junior members inside and outside the lab.

Application process

MSc application process

  1. Email stai.there@gmail.com with your CV and research statement attached
  2. Coffee chat with Seong Joon to figure out initial fit
  3. Interview: 30 min + 30 min with Seong Joon
    • First half: Present your prior work (aim for 10 minutes, leaving 20 minutes for discussion)
    • Second half: Discuss future research ideas at the intersection of your expertise and our vision
  4. Offer
  5. Apply to the grad school with Seong Joon Oh’s supervision intent via KAIST Graduate Admissions

Timeline

Steps 1-4 must be completed at least one week before the group offer announcement dates below. Please reach out well in advance.

Group offer announcement (step 4)

  • International spring: 20 August
  • International autumn (early track): 20 November
  • International autumn (regular track): 20 February
  • Domestic spring: 20 June
  • Domestic autumn: 20 March

Internship

Expectations

We expect interns to participate in a predefined research agenda as a co-author, working closely with their PhD student host.

Supervision

Your day-to-day supervisor is the PhD student you apply to work with. They define the research direction, set milestones, and provide regular feedback. Joon is available for broader guidance but does not manage the internship on a daily basis. Choose your PhD student host carefully: your internship experience depends largely on this match.

Application process

Internship application process

  1. Send an email to the relevant PhD student (cc: stai.there@gmail.com) with your CV and research statement attached
  2. Coffee chat with the PhD student
  3. Interview: 30 min + 30 min with the PhD student
    • First half: Present your prior work (aim for 10 minutes, leaving 20 minutes for discussion)
    • Second half: Discuss future research ideas at the intersection of your expertise and our vision
  4. Offer