Trustworthy Machine Learning

Spring Semester 2026, KAIST AI

Why can AI solve Math Olympiad problems yet fail to manage your calendar? General AI is not trustworthy in private settings because three communication channels are broken. Human → AI: underspecification (it doesn't know what I want). AI → Human: unexplainability and overconfidence (I don't know why it did that, or whether it's guessing). Environment: hostility (privacy leaks and security attacks). This course covers the theoretical and technical background for these key topics in Trustworthy Machine Learning (TML). We critically review classical and contemporary research papers and provide hands-on practicals.

1. Goal

  1. Students will be able to critically read, assess, and discuss research work in Trustworthy Machine Learning (TML).
  2. Students will gain the technical background to implement basic TML techniques in a deep learning framework.
  3. Students will be ready to conduct their own research in TML and make contributions to the research community.

2. Prerequisites

  • Familiarity with Python and PyTorch coding.
  • A passing grade in the Deep Learning course (or equivalent).
  • Basic knowledge of machine learning concepts.
  • Basic maths: multivariate calculus, linear algebra, probability, statistics, and optimisation.

3. TML Book

Previous course materials are available as a book: https://trustworthyml.io/ (also on arXiv).

The book will be useful for the course. However, because the course materials are updated yearly to track the latest research, the book does not cover the newest topics.

4. Schedule

#   | Date   | Content                                                        | Project
L1  | Mar 06 | Orientation (short session) · Video                            | —
L2  | Mar 13 | I. Human → AI: ML Foundations & Generalisation Primer · Video  | —
L3  | Mar 20 | Underspecification & Cues · Video                              | Team formation due (23:59)
L4  | Mar 27 | LLM Communication & Modularity · Video                         | —
L5  | Apr 03 | II. AI → Human: Explanation & XAI · Video                      | —
L6  | Apr 10 | Attribution Methods · Video                                    | —
L7  | Apr 17 | Training Data Attribution · Video                              | —
L8  | Apr 24 | Proposal Presentations (Midterm Week) · Video                  | Proposal report due (23:59); Proposal presentation
L9  | May 08 | Uncertainty I (Aleatoric)                                      | —
L10 | May 15 | Uncertainty II (Epistemic)                                     | —
L11 | May 22 | Uncertainty III (LLMs)                                         | —
L12 | May 29 | III. Privacy & Security: Privacy & Data Protection             | —
L13 | Jun 05 | Security & Adversarial Robustness                              | —
L14 | Jun 12 | IV. Final Presentations (Part 1)                               | Final report due (Jun 11, 23:59)
L15 | Jun 19 | Final Presentations (Part 2)                                   | Peer eval due (23:59)

5. Grading

Component             | Weight
Proposal presentation | 15%
Proposal report       | 15%
Final presentation    | 35%
Final report          | 35%
Total                 | 100%

Late submissions are not accepted. A missed deadline counts as a zero.

All criteria use a 5-point scale: 1 (Poor) / 2 (Below expectations) / 3 (Meets expectations) / 4 (Good) / 5 (Excellent). Criteria within each rubric are equally weighted.

Proposal report (15%)

Due L8 (Apr 24, 23:59). 1-2 pages, ICML 2026 format.

Criterion | Description | 1 (Poor) | 5 (Excellent)
Problem definition | Is the research question stated clearly? Does the report specify what problem the project addresses and why it matters? | Problem is vague or absent. | Reader immediately understands the gap and the question.
Related work awareness | Does the team cite key references and position their project relative to existing work? A full literature review is not expected, but awareness of the landscape is. | No references or awareness of prior work. | Clear positioning with relevant citations.
Proposed approach | Is there a concrete plan? This includes model choice, data, metrics, and experimental design. | Approach is missing or hand-wavy. | Detailed, actionable plan.
Writing quality | Is the report well-structured, concise, and free of major errors? AI slop (generic, low-effort AI-generated text) will be penalised. | Disorganised or incomprehensible. | Polished writing.

Proposal presentation (15%)

L8 (Apr 24). 5-minute lightning talk per team.

Criterion | Description | 1 (Poor) | 5 (Excellent)
Clarity of problem and motivation | Can the audience understand what the project is about and why it matters within the first two minutes? | Audience is lost. | Immediately clear.
Plan communication | Does the team convey a credible plan? This includes approach, data, and expected outcomes. | No plan is communicated. | Plan is convincing and concrete.
Slide quality | Are slides readable, well-designed, and not overloaded? | Unreadable walls of text or irrelevant graphics. | Clean, effective visuals.
Time management | Does the team stay within the 5-minute limit and pace themselves well? | Severely over or under time. | Smooth pacing with a natural conclusion.

Final report (35%)

Due Jun 11, 23:59 (the day before L14). 4 pages excluding references, ICML 2026 format.

Criterion | Description | 1 (Poor) | 5 (Excellent)
Problem formulation | Is the research question clearly stated and well-motivated? Compared to the proposal, this should now be refined and precise. | Question remains vague. | Crisp, well-scoped question.
Technical depth | Does the report demonstrate understanding of the methods used? Are technical choices justified? | Superficial or incorrect technical content. | Command of relevant techniques with principled decisions.
Experimental design and results | Are experiments well-designed with appropriate baselines? Are results presented clearly (tables, figures, error bars where applicable)? | Missing or poorly designed experiments. | Rigorous experiments with clear presentation.
Analysis and discussion | Does the team interpret their results, discuss limitations, and reflect on what worked and what did not? | No interpretation. | Thoughtful analysis that goes beyond “method X got accuracy Y”.
Writing quality | Is the report well-organised, clearly written, and properly formatted? Are figures and tables captioned and referenced? AI slop will be penalised. | Disorganised or hard to follow. | Publication-ready writing.

Final presentation (35%)

L14-L15 (Jun 12 and Jun 19). 10 minutes per team + Q&A.

Criterion | Description | 1 (Poor) | 5 (Excellent)
Clarity and structure | Is the presentation logically structured? Does the audience follow the narrative from problem to method to results to takeaway? | Incoherent structure. | Compelling, well-organised talk.
Technical communication | Can the team explain their technical approach at the right level of detail? | Audience cannot understand the method. | Complex ideas made accessible without oversimplification.
Results presentation | Are results communicated effectively? Are key findings highlighted with readable figures and tables? | Results are buried or absent. | Audience clearly sees what was achieved.
Slide quality | Are slides clean, readable, and well-designed? | Unreadable walls of text or irrelevant graphics. | Clean, effective visuals.
Time management | Does the team stay within the 10-minute limit and pace themselves well? | Severely over or under time. | Smooth pacing with a natural conclusion.

6. Projects

  • Team size: 3 students per team.
  • Formation: Use the #team-formation Slack channel to find team members. Teams finalised by L3 (Mar 20, 23:59).
  • Compute: Each student receives a 50 USD Google Cloud Platform voucher for the project. Each team of three therefore has 150 USD credit.
  • Template: Use the ICML 2026 LaTeX template (zip) for both proposal and final reports.
  • Deliverables:
    • Proposal: 1-2 page report + 5 min presentation.
    • Final: 4-page report (ICML format, excluding references) + 10 min presentation.
  • Peer evaluation: A mandatory form at the end of the course. Distribute 100 points among your team members based on contribution; unequal splits will affect individual grades.

Example project topics

Topics are open: students choose their own direction and methods. The following examples illustrate the kind of project that fits the course:

  1. Test-time detection of prompt sensitivity. Prior work has shown that VLMs are sensitive to prompt phrasing. The open question is whether we can detect at test time that a prediction is prompt-sensitive and flag it to the user, without access to ground truth. Propose and evaluate a detection method. Models: CLIP ViT-B/32, LLaVA-7B (both run on a single T4). Datasets: ImageNet, EuroSAT, or other zero-shot classification benchmarks. References: PARC (CVPR 2025, quantifying VLM prompt sensitivity), WaffleCLIP (ICCV 2023, random descriptors match LLM-generated prompts). See Sketch 1 after this list for a starting point.
  2. Surfacing knowledge conflicts in RAG. Recent work resolves parametric-contextual conflicts silently. An open problem is whether the system can instead detect and surface the conflict to the user, letting them decide. Build a conflict-detection pipeline and evaluate its precision. Models: Llama 3.1 8B or Mistral 7B with a FAISS index (A100 or T4 with quantisation). Datasets: Natural Questions or TriviaQA with synthetically altered retrieval passages. References: FaithfulRAG (ACL 2025, fact-level conflict modelling), AdaCAD (NAACL 2025, adaptive decoding for knowledge conflicts), JuICE (ICML 2025, test-time attention intervention). See Sketch 2 after this list for a starting point.
  3. Mechanistic vs data attribution for the same failure. Recent work attributes model failures either to training data (TDA) or to internal mechanisms (layer-wise dynamics, circuit analysis). These perspectives are rarely compared. Pick a failure mode (e.g. hallucination, gender bias) and apply both attribution families to the same cases. Do they agree? Models: ViT-B on ImageNet for vision; Llama 3.1 8B or a smaller LLM for language. Datasets: task-specific failure sets you curate. References: Accountability Attribution (ICML 2025, tracing behaviour to training stages), DDA (EMNLP 2024, influence functions with fitting error correction).
  4. Confidence under distribution shift. Methods like BaseCal and EAGLE improve calibration on in-distribution data. Less is known about how confidence estimates degrade under distribution shift or across multi-turn conversations. Evaluate existing calibration methods on shifted inputs and propose a detection strategy. Models: Llama 3.1 8B or Mistral 7B (sampling-based methods need ~20-50 forward passes per input; budget for this). Datasets: TriviaQA, MMLU with domain-shifted or adversarially perturbed variants. References: IB-EDL (ICLR 2025, information-theoretic evidential calibration), Multicalibration (ICML 2024, group-wise calibration for LLMs). See Sketch 3 after this list for a starting point.
  5. Paper reproduction. Reproduce the key experiments of a published paper covered in the course. Verify the claims, test on a different model or dataset, and report where the results hold and where they break. Choose a paper whose experiments fit the compute budget.
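
The sketches below are starting points only, not prescribed methods. Sketch 1 (for topic 1) flags a zero-shot CLIP prediction as prompt-sensitive when a handful of prompt templates disagree about the predicted class, requiring no ground-truth labels. It assumes the Hugging Face transformers CLIP API; the template set and the simple disagreement rule are illustrative placeholders that your own detection method should replace.

```python
# Sketch 1 (topic 1): flag prompt-sensitive zero-shot predictions, no labels needed.
# Assumes the Hugging Face `transformers` CLIP API; the templates and the
# disagreement rule are illustrative placeholders.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

PROMPT_TEMPLATES = [
    "a photo of a {}.",
    "a blurry photo of a {}.",
    "a drawing of a {}.",
    "a close-up photo of a {}.",
]

@torch.no_grad()
def predictions_across_prompts(image, class_names):
    """Predicted class index under each prompt template (image: a PIL image)."""
    preds = []
    for template in PROMPT_TEMPLATES:
        texts = [template.format(c) for c in class_names]
        inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
        logits = model(**inputs).logits_per_image  # shape (1, num_classes)
        preds.append(logits.argmax(dim=-1).item())
    return preds

def is_prompt_sensitive(image, class_names):
    """Flag the prediction when the templates disagree; surface this to the user."""
    return len(set(predictions_across_prompts(image, class_names))) > 1
```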
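
Sketch 2 (for topic 2) contrasts the model's closed-book (parametric) answer with its context-conditioned answer and reports both when they disagree, instead of resolving the conflict silently. It assumes the transformers text-generation pipeline; the model name, prompts, and string-overlap conflict test are placeholder choices (a real pipeline would use FAISS-retrieved passages and a stronger answer-equivalence check).

```python
# Sketch 2 (topic 2): surface, rather than silently resolve, parametric-contextual
# knowledge conflicts. Model name, prompts, and the string-overlap test are
# illustrative placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.3")

def answer(prompt: str) -> str:
    out = generator(prompt, max_new_tokens=32, return_full_text=False)
    return out[0]["generated_text"].strip()

def surface_conflict(question: str, passage: str) -> dict:
    """Compare the closed-book answer with the context-conditioned answer."""
    parametric = answer(f"Question: {question}\nAnswer briefly:")
    contextual = answer(f"Context: {passage}\nQuestion: {question}\nAnswer briefly:")
    conflict = (parametric.lower() not in contextual.lower()
                and contextual.lower() not in parametric.lower())
    # On conflict, show the user both answers and let them decide.
    return {"parametric": parametric, "contextual": contextual, "conflict": conflict}
```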
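
Sketch 3 (for topic 4) computes expected calibration error (ECE), the usual headline metric when comparing calibration methods on in-distribution versus shifted inputs. Equal-width binning with 10 bins is a conventional choice, not a course requirement.

```python
# Sketch 3 (topic 4): expected calibration error (ECE) with equal-width bins.
# `confidences` and `correctness` are 1-D tensors over evaluation examples.
import torch

def expected_calibration_error(confidences, correctness, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over confidence bins."""
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correctness[mask].float().mean()   # accuracy in this bin
            conf = confidences[mask].mean()          # mean confidence in this bin
            ece += mask.float().mean().item() * (acc - conf).abs().item()
    return ece

# Usage: compare ECE on in-distribution vs distribution-shifted evaluation sets.
# ece_id = expected_calibration_error(conf_id, correct_id)
# ece_shift = expected_calibration_error(conf_shift, correct_shift)
```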

7. Generative AI Policies

Students may use generative AI tools (e.g. LLMs, VLMs, image generators). However, you are solely responsible for all outputs you submit. We will apply heavy penalties for:

  • Hallucinated or factually incorrect outputs.
  • Unsound or fabricated citations.
  • Plagiarised materials.
  • AI slop (low-effort, generic AI-generated content).

Severe cases may be reported to the university for disciplinary action.

You must be prepared to answer clarification questions from the lecturer or tutors at any point. Inability to explain your own work will be treated as evidence of academic misconduct.

We do not tolerate near-identical creative work among class members. AI tends to produce similar outputs across sessions and model families, so diversify your answers, especially for creative work. Suspected copying will be penalised.

8. Communication & Logistics

Language: English

Lecturer: Seong Joon Oh

Tutors: Myungkyu Koo (jameskoo0503@kaist.ac.kr), Gyouk Chu (kyouwook@kaist.ac.kr), Kyuyoung Kim (kykim@kaist.ac.kr), Sangwon Jang (sangwon.jang@kaist.ac.kr)

When: Fridays 13:00-15:30 (1st session 13:00-14:10, break 14:10-14:20, 2nd session 14:20-15:30)

Where: Online (Zoom)

Email: stai.there@gmail.com for submissions, questions, and feedback.

Slack: Email us your name and preferred email address to be added. Use it for questions, announcements, and finding team members.
