a few notes from the keynote at SAP Center, before i got kicked out 20 minutes in for sitting on the stairs, right as jensen started talking about GPUs and hardware.
the AI wave started with AlexNet (2012)
perception AI -> generative AI -> agentic AI -> physical AI
three problems
- how to solve the data problem?
- how to solve the training problem?
- how do you scale?
three scaling laws
pre-training scaling -> post-training scaling -> test-time scaling "long thinking"
reasoning AI inference uses >100x more tokens/compute than a one-shot answer
how to solve data problem?
problem prompts -> model -> answer -> verifier -> back to model
post-training with RLVR (reinforcement learning with verifiable rewards), >100T tokens
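a minimal sketch of the verifier loop as i understood it (the function names and the reward scheme are my own assumptions, not anything shown on stage): sample answers, score them with a programmatic verifier, keep the verified pairs as new post-training data.

```python
# hypothetical sketch of a verifier-in-the-loop data engine (RLVR-style).
# `model.generate` and `verify` are stand-ins, not a real API.

def verify(problem, answer):
    # verifiable domains: math (check the result), code (run the tests), etc.
    return answer.strip() == problem["expected"]

def generate_post_training_data(model, problems, samples_per_problem=8):
    dataset = []
    for problem in problems:
        for _ in range(samples_per_problem):
            answer = model.generate(problem["prompt"])        # model proposes
            reward = 1.0 if verify(problem, answer) else 0.0  # verifier scores
            dataset.append({"prompt": problem["prompt"],
                            "answer": answer,
                            "reward": reward})
    # high-reward samples become new training tokens; the loop repeats
    return dataset
```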
top 4 US CSPs
- 2024: 1.3M hopper GPUs
- 2025: 3.6M Blackwell GPUs (2 GPU dies per package)
computing at inflection point
2028 prediction $1T+ data center Capex
- a new computing approach
- growing recognition that the future of software requires capital investment
the computer now generates tokens for software, instead of just retrieving files
computers are now AI factories; every company will have two factories, one for the product and another for the AI
CUDA-X for every industry
- cuPyNumeric - numpy on GPUs (quick sketch after this list)
- cuLitho - computational lithography
- Aerial - 5G radio networks with AI
- cuOpt - mathematical optimization (planning seats, inventory, plants, drivers and riders)
- MONAI - medical imaging
- Earth-2 - weather and climate simulation
- cuQuantum - quantum computing
- ...
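the cuPyNumeric pitch, as i understand it, is "change the import line and your numpy code runs on GPUs." a rough sketch, untested by me (module name per the pitch, not something i've verified):

```python
# drop-in numpy replacement: the intended change is just the import line
import cupynumeric as np   # instead of `import numpy as np`

a = np.ones((10_000, 10_000))
b = np.ones((10_000, 10_000))
c = a @ b + 2.0 * a        # same numpy expressions, executed on the GPU(s)
print(float(c.sum()))
```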
and some notes from Yann LeCun's talk
4 things he's excited about
- understand physical world
- persistent memory
- reason
- plan
world models: models of the physical world
- we all have one; it lets us manipulate thoughts and predict what will happen
- architecture: different from language architectures
- text tokens are discrete: predict a probability distribution over ~100k possible tokens, and we know how to do that
- we don't know how to do this with video; predicting the next pixel has failed, because the model spends all its capacity on detail that is impossible to predict
- what works better: learn a representation of the image/video/natural signal and make predictions in that space (see the V-JEPA sketch below)
- this requires techniques to prevent collapse, where the representation becomes constant and ignores the input
- AMI: advanced machine intelligence
- systems that learn abstract mental models of the world and can reason and plan (3-5 years out)
- scaling them up to human-level ai
- reasoning with tokens is not the right way
- JEPA models
it would take a human ~400,000 years to read all the text in the world; a child takes in a comparable amount of data through vision in just 4 years
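rough order-of-magnitude arithmetic behind that claim (every constant below is my own ballpark assumption, not a number from the talk):

```python
# back-of-the-envelope: text a human could read vs. data a child's eyes take in.
# all constants are assumed round numbers, used only for order of magnitude.

text_corpus_tokens = 3e13            # ~30T tokens, roughly a frontier LLM's text corpus
reading_speed = 250                  # tokens per minute, sustained
years_to_read = text_corpus_tokens / reading_speed / 60 / 24 / 365
print(f"~{years_to_read:,.0f} years to read it")         # ~2e5 years, same ballpark as "400,000"

optic_nerve_bandwidth = 1e6          # ~1 MB/s per eye, rough estimate
waking_seconds_4yrs = 4 * 365 * 16 * 3600
visual_bytes = 2 * optic_nerve_bandwidth * waking_seconds_4yrs
print(f"~{visual_bytes:.1e} bytes through vision by age 4")  # on the order of 1e14 bytes
```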
a joint embedding predictive architecture (V-JEPA)
- sliding window of 16 frames, predict the next few frames, measure prediction error
- baby humans at 9 months can understand gravity
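my rough reading of the recipe, as a torch-style sketch (the shapes, module interfaces, and the stop-gradient trick are my assumptions, not the actual V-JEPA code): encode frames into a representation, predict the next frames' representation in that space, and treat the prediction error as a "surprise" signal; stop-gradient on the target branch is one standard trick against collapse.

```python
import torch

def surprise_scores(encoder, predictor, video, window=16, horizon=2):
    """Slide a window over a clip; a spike in latent prediction error ~= a surprising event.
    encoder: frames -> representation; predictor: context repr -> future repr.
    Both are assumed already trained; this only scores a clip of shape (T, C, H, W)."""
    scores = []
    for t in range(video.shape[0] - window - horizon):
        context = video[t : t + window]
        future  = video[t + window : t + window + horizon]
        z_pred = predictor(encoder(context))   # predict in representation space, not pixels
        with torch.no_grad():                  # stop-gradient on the target branch
            z_tgt = encoder(future)            # (one way to help prevent collapse in training)
        scores.append(torch.mean((z_pred - z_tgt) ** 2).item())
    return scores  # high values = the model's world model was violated (e.g. objects defying gravity)
```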
ResNet: skip connections let gradients backprop all the way through very deep networks
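the skip-connection point in a few lines (a generic sketch, not any particular resnet variant): the block outputs x + f(x), so there is always an identity path for the gradient.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    # generic sketch: the identity path in `x + self.f(x)` keeps gradients flowing
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)   # even if f's gradient vanishes, d(out)/dx still includes the identity
```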
GPT-style autoregressive training replaced BERT-style
no need to mask data for training
open source distributed training is the future
ate an entire branzino for dinner with HP, with red wine, at Rollati Ristorante. best dinner i've had all year.