nvidia gtc day 1

March 18, 2025


a few notes from the keynote at SAP center, before i got kicked out 20 minutes in for sitting on the stairs, just as jensen started talking about GPUs and hardware.

the AI wave started with AlexNet (2012)

perception AI -> generative AI -> agentic AI -> physical AI

three problems

  • how to solve the data problem?
  • how to solve the training problem?
  • how do you scale?

three scaling laws

pre-training scaling -> post-training scaling -> test-time scaling ("long thinking")

reasoning AI inference compute is >100x one-shot, because it generates far more tokens (rough arithmetic below)
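back-of-envelope on the >100x claim (the token counts here are my illustrative guesses, not numbers from the keynote):

```python
# one-shot answer vs. a long reasoning trace (assumed token counts)
one_shot_tokens = 100        # short direct answer
reasoning_tokens = 10_000    # chain-of-thought "long thinking" trace

print(reasoning_tokens / one_shot_tokens)  # 100.0 -> ~100x more inference compute
```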

how to solve the data problem?

problem prompts -> model -> answer -> verifier -> back to model

post-training with RLVR: >100T tokens
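the loop above is basically RL with verifiable rewards; a toy, self-contained version of it (the arithmetic task and the model stub are my stand-ins, nothing from the keynote):

```python
import random

def verify(problem, answer):
    # verifiable reward: exact check against ground truth
    a, b = problem
    return 1.0 if answer == a + b else 0.0

def model_generate(problem, error_rate=0.3):
    # stand-in "model" that sometimes answers wrong
    a, b = problem
    return a + b + (1 if random.random() < error_rate else 0)

# prompts -> model -> answer -> verifier -> reward fed back to the model
problems = [(random.randint(0, 9), random.randint(0, 9)) for _ in range(100)]
rewards = [verify(p, model_generate(p)) for p in problems]
print(sum(rewards) / len(rewards))  # pass rate, the training signal in RLVR
```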

top 4 US CSPs

  • 2024: 1.3M Hopper GPUs
  • 2025: 3.6M Blackwell GPUs (each chip counts as 2 GPUs)

computing at an inflection point

2028 prediction: $1T+ data center capex

  • a new computing approach
  • growing recognition that the future of software requires capital investment

computers now generate tokens for software instead of just retrieving files

computers are now AI factories; every company will have two factories, one for the product and one for the AI

CUDA-X for every industry

  • cuPyNumeric - NumPy on GPUs (import swap sketched after this list)
  • cuLitho - computational lithography
  • Aerial - 5G radio networks with AI
  • cuOpt - mathematical optimization (planning seats, inventory, plants, drivers and riders)
  • MONAI - medical imaging
  • Earth-2 - weather and climate simulation
  • cuQuantum - quantum computing
  • ...
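the cuPyNumeric pitch is a drop-in NumPy swap; a minimal sketch, assuming the `cupynumeric` module name from NVIDIA's docs:

```python
import cupynumeric as np  # drop-in for: import numpy as np

x = np.random.rand(1024, 1024)
y = x @ x.T               # dense linear algebra dispatched to the GPU(s)
print(y.sum())
```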

and some notes from Yann LeCun's talk

4 things he's excited about

  • understand the physical world
  • have persistent memory
  • reason
  • plan

world models: models of the physical world

  • we all have one; it lets us manipulate thoughts and predict what will happen
  • architecture: different from language architectures
  • text tokens are discrete: predicting one means outputting a probability distribution over ~100k values, which we know how to do
  • we don't know how to do this with video; next-pixel prediction has failed because the model spends all its resources on detail that is impossible to predict
  • what works better: learn a representation of the image/video/natural signal and predict in that space (see the sketch after this list)
    • requires techniques to prevent collapse, where the prediction is constant and the input is ignored
  • AMI: advanced machine intelligence
    • systems that learn abstract mental models of the world, and reason and plan (3-5 years)
    • then scaling them up to human-level AI
  • reasoning with tokens is not the right way
    • JEPA models instead
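a minimal sketch of "predict in representation space, prevent collapse" (toy data and sizes are mine; the stop-gradient + EMA target encoder is one known collapse-prevention technique, not necessarily the exact one from the talk):

```python
import torch
import torch.nn as nn

dim = 32
encoder = nn.Linear(64, dim)            # context encoder
target_encoder = nn.Linear(64, dim)     # slow-moving copy, never backpropped
target_encoder.load_state_dict(encoder.state_dict())
for p in target_encoder.parameters():
    p.requires_grad_(False)
predictor = nn.Linear(dim, dim)
opt = torch.optim.Adam([*encoder.parameters(), *predictor.parameters()], lr=1e-3)

for step in range(100):
    x_context = torch.randn(16, 64)                    # stand-in for visible frames
    x_target = x_context + 0.1 * torch.randn(16, 64)   # stand-in for future frames
    pred = predictor(encoder(x_context))
    with torch.no_grad():                              # stop-gradient on target branch
        tgt = target_encoder(x_target)
    loss = ((pred - tgt) ** 2).mean()                  # predict embeddings, not pixels
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                              # EMA update of the target encoder
        for p, tp in zip(encoder.parameters(), target_encoder.parameters()):
            tp.mul_(0.99).add_(0.01 * p)
```

without the stop-gradient/EMA, the loss can be driven to zero by making both encoders output a constant, which is exactly the collapse noted above.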

it would take a human 400,000 years to read all the text an LLM is trained on; a child takes in a comparable amount of data through vision in just 4 years (back-of-envelope below)
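a rough check on that number (corpus size and reading speed are my assumptions, not LeCun's slides):

```python
corpus_tokens = 2e13          # ~20T tokens, frontier-LLM pretraining scale (assumed)
words = corpus_tokens * 0.75  # rough tokens -> words conversion
words_per_minute = 250        # brisk adult reading speed (assumed)
hours_per_day = 8

years = words / (words_per_minute * 60 * hours_per_day * 365)
print(f"{years:,.0f} years")  # ~340,000, same ballpark as the 400,000 quoted
```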

V-JEPA: a joint embedding predictive architecture for video

  • sliding window of 16 frames, predict the next few frames, measure prediction error (toy sketch below)

  • human babies understand gravity by around 9 months
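a toy version of that sliding-window surprise measure (stand-in encoder/predictor on random features; only the 16-frame window comes from the talk):

```python
import numpy as np

WINDOW = 16
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 8))    # 200 fake "frames" of 8-dim features

def encode(x):
    return x.mean(axis=0)             # stand-in encoder

def predict(context):
    return encode(context)            # stand-in predictor: assume persistence

errors = []
for t in range(len(frames) - WINDOW - 1):
    context = frames[t : t + WINDOW]
    nxt = frames[t + WINDOW : t + WINDOW + 1]
    errors.append(((predict(context) - encode(nxt)) ** 2).mean())

print(int(np.argmax(errors)))  # spikes in prediction error flag "surprising" frames
```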

ResNet: skip connections let NNs backprop all the way through many layers
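the whole trick in one block (a generic residual block, not the exact paper config):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.body(x)  # identity path keeps gradients flowing through depth

y = ResidualBlock(128)(torch.randn(4, 128))  # stacks to arbitrary depth
```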

GPT-style autoregressive training replaced BERT-style masked training

no need to mask data for training
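the objective difference in a few lines (toy token ids; tokenizer and model omitted):

```python
import torch

tokens = torch.tensor([5, 17, 42, 8, 99])

# GPT-style (causal): shift by one, so every position is a training target
causal_inputs, causal_targets = tokens[:-1], tokens[1:]

# BERT-style (masked): corrupt ~15% of positions and predict only those
masked = tokens.clone()
mask_pos = torch.rand(len(tokens)) < 0.15
masked[mask_pos] = 103  # stand-in [MASK] token id
```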

open source distributed training is the future


ate an entire branzino with red wine for dinner with HP at Rollati Ristorante. best dinner i've had all year.