the ai grind

October 15, 2024


Louise Banks: If you could see your whole life from start to finish, would you change things?
Ian Donnelly: Maybe I'd say what I felt more often. I-I don't know.

– Arrival

did a bunch of ML reading today, was literally walking to the bus stop while reading NLP with transformers, revised edition. learned more about the bigger picture of what led to transformers.

biweekly costco grocery run, bought beef chuck roast, $16 mixed nut butter, salt and pepper, frozen berries, bulk ground meat

continued nvidia's deep learning course while watching fast.ai lectures concurrently. everything is slowly connecting. it's a struggle to balance how much math to dive into against how much code to write, the theory against the application. i want to know just enough to give me intuition, and only dive deeper if necessary, otherwise i want to build stuff. i want to train models to do something cool. i want to start making GPUs go brr.

watched arrival, it's such a good movie. the portrayal of extraterrestrial life forms, the concept of language as the foundation of civilization, the glue that holds people together, the way it plays with time and memory.

watched weights & biases rag in production lectures. idea is to implement rag over my blog posts and have it reference specific posts and answer in under 5 seconds. will call it askBen and it will be my resume project.
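
a rough sketch of what the retrieval half of askBen could look like, assuming sentence-transformers embeddings and plain cosine similarity. the posts list, the prompt, and the missing LLM call are all placeholders, not the real implementation.

```python
# askBen sketch: embed blog posts, retrieve the closest ones, build a prompt.
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, fast embedding model

posts = [  # placeholder; would really load the markdown files from the blog repo
    {"title": "the ai grind", "text": "did a bunch of ML reading today..."},
    {"title": "an older post", "text": "..."},
]
post_vecs = embedder.encode([p["text"] for p in posts], normalize_embeddings=True)

def retrieve(question, k=3):
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = post_vecs @ q               # cosine similarity (vectors are normalized)
    return [posts[i] for i in np.argsort(-scores)[:k]]

def ask_ben(question):
    context = "\n\n".join(f"## {p['title']}\n{p['text']}" for p in retrieve(question))
    # the prompt would go to an LLM call here; keeping the sketch model-agnostic
    return f"answer using only these posts and cite titles:\n{context}\n\nQ: {question}"
```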

worked on papertrail, have email and attachment processing and calendar integration done. hard to figure out what the mvp looks like. also difficult to not fall prey to premature optimization. i have to remind myself that the code that i (AI) write will be replaced in the next iteration, so the goal is to just get things to work.

so many things to learn. so many projects. i want more free days like this to pursue my own curiosity. come thursday, i'll be swamped with assignments and exams again.


fast.ai lectures 3 & 5

  • a derivative = a function that tells you, if you increase the input, does it increase or decrease the output, and by how much
  • what is the mathematical function we are finding parameters for? we can create an infinitely flexible function from rectified linear units (ReLU). just create as many ReLUs as you want, and you can make any squiggly line (see the ReLU sketch after this list). everything from here on is tweaks to make it faster and make it need less data
  • a relu is torch.clip(y, 0.): anything smaller than 0 becomes 0
  • you just need requires_grad_() and backward() to get gradients, then access them via the .grad attribute (see the gradient sketch after this list)
  • reproducibility (setting a manual seed) isn't useful when what you actually want to understand is how much the data varies, and how your model behaves under different variations of it
  • broadcasting benefits: code is more concise, and everything happens in optimized C code (on the GPU, optimized CUDA code)
  • rules of broadcasting: shapes are compared from the trailing axis backwards; dimensions are compatible when they're equal, or one of them is 1 (or missing). see the broadcasting sketch after this list
    • tensor([1.,2.,3.]) * tensor(2.0)
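
a tiny sketch of the "as many ReLUs as you want" idea: a few shifted and scaled ReLUs summed together already trace out a bendy, piecewise-linear curve. the slopes and kink locations here are arbitrary, just for illustration.

```python
import torch

def relu(y):
    return torch.clip(y, 0.)  # same definition as in the notes: negatives become 0

x = torch.linspace(-3, 3, 200)
# each term has its own slope and kink location; summing them gives a squiggly line,
# and adding more terms lets you approximate pretty much any shape
y = 1.0 * relu(x + 2) - 1.5 * relu(x) + 2.0 * relu(x - 1)
```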
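
the derivative and gradient bullets in code: a minimal sketch with one parameter and a made-up quadratic loss, showing requires_grad_(), backward(), and the .grad attribute.

```python
import torch

w = torch.tensor(3.0).requires_grad_()  # track operations on w
loss = (w - 5.0) ** 2                   # toy loss, smallest at w = 5
loss.backward()                         # fills w.grad with d(loss)/dw
print(w.grad)                           # tensor(-4.) -> increasing w would decrease the loss
```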
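
the broadcasting example from the notes, plus a 2d case to show the trailing-axis rule; the shapes are arbitrary.

```python
import torch

print(torch.tensor([1., 2., 3.]) * torch.tensor(2.0))  # tensor([2., 4., 6.])

m = torch.ones(4, 3)             # shape (4, 3)
v = torch.tensor([1., 2., 3.])   # shape (3,) lines up with the last axis of m
print((m * v).shape)             # torch.Size([4, 3]), no explicit loop needed
```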

nvidia intro to deep learning

  • what are tensors?
    • vector = 1d array
    • matrix = 2d array
    • tensor = n-d array
  • ex: the pixels on a screen form a 3d tensor with width, height, and color channel
  • smaller batch sizes, e.g. 32 or 64, are efficient for model training
  • nn.Flatten() expects a batch of data
  • nn.Linear(input_size, neurons), where the number of neurons is what captures the statistical complexity of a dataset
  • next(model.parameters()).device - to check which device model params are on
  • torch.compile(model) for faster performance (all of this rolled into the sketch after this list)
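
a small sketch rolling the last few bullets together: a Flatten -> Linear stack run on a fake batch of images, the device check, and torch.compile. the layer sizes and the batch are made up for illustration.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Flatten(),             # (batch, 1, 28, 28) -> (batch, 784); expects a batch dim
    nn.Linear(28 * 28, 64),   # 64 neurons to soak up the dataset's complexity
    nn.ReLU(),
    nn.Linear(64, 10),
)

x = torch.randn(32, 1, 28, 28)          # batch of 32 fake grayscale images
print(model(x).shape)                   # torch.Size([32, 10])
print(next(model.parameters()).device)  # cpu or cuda:0, wherever the params live

compiled = torch.compile(model)         # optional speedup, same call signature
print(compiled(x).shape)
```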