nvidia gtc day 1

March 18, 2025


a few notes from the keynote at SAP center, before i got kicked out 20 minutes in for sitting on the stairs, just as jensen started talking about GPUs and hardware.

the AI wave started with AlexNet (2012)

perception AI -> generative AI -> agentic AI -> physical AI

three problems

  • how to solve the data problem?
  • how to solve the training problem?
  • how do you scale?

three scaling laws

pre-training scaling -> post-training scaling -> test-time scaling ("long thinking")

reasoning AI inference compute is >100x one-shot, because it generates far more tokens (rough arithmetic below)
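back-of-envelope on the >100x claim (the token counts here are my illustrative guesses, not numbers from the keynote):

```python
# one-shot answer vs. a long reasoning trace (assumed token counts)
one_shot_tokens = 100        # short direct answer
reasoning_tokens = 10_000    # chain-of-thought "long thinking" trace

print(reasoning_tokens / one_shot_tokens)  # 100.0 -> ~100x more inference compute
```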

how to solve the data problem?

problem prompts -> model -> answer -> verifier -> back to model

post-training with RLVR: >100T tokens
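the loop above is basically RL with verifiable rewards; a toy, self-contained version of it (the arithmetic task and the model stub are my stand-ins, nothing from the keynote):

```python
import random

def verify(problem, answer):
    # verifiable reward: exact check against ground truth
    a, b = problem
    return 1.0 if answer == a + b else 0.0

def model_generate(problem, error_rate=0.3):
    # stand-in "model" that sometimes answers wrong
    a, b = problem
    return a + b + (1 if random.random() < error_rate else 0)

# prompts -> model -> answer -> verifier -> reward fed back to the model
problems = [(random.randint(0, 9), random.randint(0, 9)) for _ in range(100)]
rewards = [verify(p, model_generate(p)) for p in problems]
print(sum(rewards) / len(rewards))  # pass rate, the training signal in RLVR
```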

top 4 US CSPs

  • 2024: 1.3M Hopper GPUs
  • 2025: 3.6M Blackwell GPUs (each chip counts as 2 GPUs)

computing at an inflection point

2028 prediction: $1T+ data center capex

  • a new computing approach
  • growing recognition that the future of software requires capital investment

computers now generate tokens for software instead of just retrieving files

computers are now AI factories; every company will have two factories, one for the product and one for the AI

CUDA-X for every industry

  • cuPyNumeric - NumPy on GPUs (import swap sketched after this list)
  • cuLitho - computational lithography
  • Aerial - 5G radio networks with AI
  • cuOpt - mathematical optimization (planning seats, inventory, plants, drivers and riders)
  • MONAI - medical imaging
  • Earth-2 - weather and climate simulation
  • cuQuantum - quantum computing
  • ...
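the cuPyNumeric pitch is a drop-in NumPy swap; a minimal sketch, assuming the `cupynumeric` module name from NVIDIA's docs:

```python
import cupynumeric as np  # drop-in for: import numpy as np

x = np.random.rand(1024, 1024)
y = x @ x.T               # dense linear algebra dispatched to the GPU(s)
print(y.sum())
```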

and some notes from Yann LeCun's talk

4 things he's excited about

  • understand the physical world
  • have persistent memory
  • reason
  • plan

world models: models of the physical world

  • we all have one; it lets us manipulate thoughts and predict what will happen
  • architecture: different from language architectures
  • text tokens are discrete: predicting one means outputting a probability distribution over ~100k values, which we know how to do
  • we don't know how to do this with video; next-pixel prediction has failed because the model spends all its resources on detail that is impossible to predict
  • what works better: learn a representation of the image/video/natural signal and predict in that space (see the sketch after this list)
    • requires techniques to prevent collapse, where the prediction is constant and the input is ignored
  • AMI: advanced machine intelligence
    • systems that learn abstract mental models of the world, and reason and plan (3-5 years)
    • then scaling them up to human-level AI
  • reasoning with tokens is not the right way
    • JEPA models instead
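a minimal sketch of "predict in representation space, prevent collapse" (toy data and sizes are mine; the stop-gradient + EMA target encoder is one known collapse-prevention technique, not necessarily the exact one from the talk):

```python
import torch
import torch.nn as nn

dim = 32
encoder = nn.Linear(64, dim)            # context encoder
target_encoder = nn.Linear(64, dim)     # slow-moving copy, never backpropped
target_encoder.load_state_dict(encoder.state_dict())
for p in target_encoder.parameters():
    p.requires_grad_(False)
predictor = nn.Linear(dim, dim)
opt = torch.optim.Adam([*encoder.parameters(), *predictor.parameters()], lr=1e-3)

for step in range(100):
    x_context = torch.randn(16, 64)                    # stand-in for visible frames
    x_target = x_context + 0.1 * torch.randn(16, 64)   # stand-in for future frames
    pred = predictor(encoder(x_context))
    with torch.no_grad():                              # stop-gradient on target branch
        tgt = target_encoder(x_target)
    loss = ((pred - tgt) ** 2).mean()                  # predict embeddings, not pixels
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                              # EMA update of the target encoder
        for p, tp in zip(encoder.parameters(), target_encoder.parameters()):
            tp.mul_(0.99).add_(0.01 * p)
```

without the stop-gradient/EMA, the loss can be driven to zero by making both encoders output a constant, which is exactly the collapse noted above.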

it would take a human 400,000 years to read all the text an LLM is trained on; a child takes in a comparable amount of data through vision in just 4 years (back-of-envelope below)
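a rough check on that number (corpus size and reading speed are my assumptions, not LeCun's slides):

```python
corpus_tokens = 2e13          # ~20T tokens, frontier-LLM pretraining scale (assumed)
words = corpus_tokens * 0.75  # rough tokens -> words conversion
words_per_minute = 250        # brisk adult reading speed (assumed)
hours_per_day = 8

years = words / (words_per_minute * 60 * hours_per_day * 365)
print(f"{years:,.0f} years")  # ~340,000, same ballpark as the 400,000 quoted
```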

V-JEPA: a joint embedding predictive architecture for video

  • sliding window of 16 frames, predict the next few frames, measure prediction error (toy sketch below)

  • human babies understand gravity by around 9 months
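a toy version of that sliding-window surprise measure (stand-in encoder/predictor on random features; only the 16-frame window comes from the talk):

```python
import numpy as np

WINDOW = 16
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 8))    # 200 fake "frames" of 8-dim features

def encode(x):
    return x.mean(axis=0)             # stand-in encoder

def predict(context):
    return encode(context)            # stand-in predictor: assume persistence

errors = []
for t in range(len(frames) - WINDOW - 1):
    context = frames[t : t + WINDOW]
    nxt = frames[t + WINDOW : t + WINDOW + 1]
    errors.append(((predict(context) - encode(nxt)) ** 2).mean())

print(int(np.argmax(errors)))  # spikes in prediction error flag "surprising" frames
```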

ResNet: skip connections let NNs backprop all the way through many layers
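the whole trick in one block (a generic residual block, not the exact paper config):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.body(x)  # identity path keeps gradients flowing through depth

y = ResidualBlock(128)(torch.randn(4, 128))  # stacks to arbitrary depth
```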

GPT-style autoregressive training replaced BERT-style masked training

no need to mask data for training
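the objective difference in a few lines (toy token ids; tokenizer and model omitted):

```python
import torch

tokens = torch.tensor([5, 17, 42, 8, 99])

# GPT-style (causal): shift by one, so every position is a training target
causal_inputs, causal_targets = tokens[:-1], tokens[1:]

# BERT-style (masked): corrupt ~15% of positions and predict only those
masked = tokens.clone()
mask_pos = torch.rand(len(tokens)) < 0.15
masked[mask_pos] = 103  # stand-in [MASK] token id
```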

open source distributed training is the future


ate an entire branzino with red wine for dinner with HP at Rollati Ristorante. best dinner i've had all year.