ray summit day 2

Samsara

what? IoT for operations

mission: improve the safety, efficiency, and sustainability of the operations that power the global economy.

applications

  • video-based safety: dash cams
  • telematics: sensor data from vehicles, real time gps, routing, maintenance
  • workforce apps: compliance and training
  • connected equipment: location tracking, utilization
  • site visibility: remote visibility, proactive alerting

-> ai insights

ex:

  • driver performance
    • score, distance driven, total behaviours, % over speed limit
  • safety inbox
    • harsh events / speeding events
    • review dashcams of drivers that had accidents, coach the driver
  • seatbelt detection
  • drowsiness detection

Recursion

what? biotech company with >50 PB of proprietary biological, chemical, and patient-centric data

in drug discovery, everything is a model (scientific)

each model is a proxy built by scientific experts

data -> database -> data scientists analyze the data

ai-based drug discovery: we model (with ML) all the (scientific) models

cell images are information-rich, inexpensive, and scalable

runs >2.2M experiments per week

the idea is you dose diseased cells with increasing concentrations of a medicine and observe the effects

images are analyzed by masked autoencoders (MAEs)

their paper: Masked Autoencoders are Scalable Learners of Cellular Morphology (code)

why use this? it's not about generating images of cells

masked autoencoding: self-supervised generative AI

reason: the intermediate representation between encoder and decoder carries a lot of useful information that can be used for downstream tasks

  • MAE embeddings capture how images of perturbed cells differ from controls
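
as a concrete illustration of the MAE idea, here's a minimal toy sketch in PyTorch (my own simplification, not Recursion's ViT-based architecture; a real MAE computes the loss only on masked patches):

  import torch
  import torch.nn as nn

  class TinyMAE(nn.Module):
      """toy masked autoencoder: mask most patches, encode the visible ones,
      reconstruct the full input; the pooled encoder output is the embedding."""
      def __init__(self, patch_dim=256, d=128, mask_ratio=0.75):
          super().__init__()
          self.mask_ratio = mask_ratio
          self.encoder = nn.Sequential(nn.Linear(patch_dim, d), nn.ReLU(), nn.Linear(d, d))
          self.decoder = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, patch_dim))
          self.mask_token = nn.Parameter(torch.zeros(1, 1, d))

      def forward(self, patches):  # patches: (batch, n_patches, patch_dim)
          B, N, P = patches.shape
          keep = int(N * (1 - self.mask_ratio))
          idx = torch.rand(B, N).argsort(dim=1)  # random mask per image
          vis_idx = idx[:, :keep]
          visible = torch.gather(patches, 1, vis_idx.unsqueeze(-1).expand(-1, -1, P))
          z = self.encoder(visible)  # encode visible patches only
          # put encoded tokens back in place; masked positions get the mask token
          full = self.mask_token.expand(B, N, -1).clone()
          full.scatter_(1, vis_idx.unsqueeze(-1).expand(-1, -1, z.size(-1)), z)
          recon = self.decoder(full)
          loss = ((recon - patches) ** 2).mean()  # simplified: loss over all patches
          return loss, z.mean(dim=1)  # embedding used for downstream tasks

  loss, emb = TinyMAE()(torch.randn(8, 64, 256))  # 8 images, 64 patches each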

25.7% increase in expressed gene knockouts detected with the new model

with this approach, we discover NEW models of biology

instacart

core customers are families

answering the question "what's for dinner?"

#1 chatbot

  • expected: how to eat healthier, meal plan
  • reality: customer support, edit order, order details

it turned into a customer support chatbot

#2 search

ask instacart

soccer snacks for 20 kids

#3 catalog generation

today instacart has 85k stores, billions of items

they bought one of every item, took photos, extracted attributes from them, and put it all on the site. they've done this for millions of items

catalog augmentation today:

take product information + images -> LLM -> catalog

this is 10x cheaper

#4 internal productivity

Ava, to democratize AI use across the company

  • internal chat platform with prompt library, access to latest models
  • built into slack
  • internal slack room to train employees on genAI

lessons from AI boom

  • invest in AI usage across your company
  • make tooling usable by your whole team

future of instacart: even more personalized down to nutrition labels

kevin weil

  • how is prod management different at openai
    • depends on the culture of the company; twitter was a consensus culture
    • sam is visionary, pushes us to think big, but leaves room to build the right things
    • the technology floor is not fixed at openai; it's doing what computers could never do before.
    • you can't quite see it coming; you can kind of see it through the mist, and you often don't know until the model is baked
    • the way you build a product for an 80% correct thing is different from a 99% correct thing
  • strategy for developer facing roadmap
    • philosophy: more AI = better for the world
    • bring new capabilities, multimodal voice API, distillation
    • more intelligence, cheaper, faster, safer
    • gpt-4o mini is 1% of the cost of gpt-3 when it launched, and it's massively smarter and safer; that's a 99% cost reduction in <2 years
    • the cheaper AI is, the more problems we can solve
  • open source models
    • from mission perspective: getting more AI into hands of people
    • open source is a great strategy
  • competing with cloud providers
    • microsoft: openai
    • google: more direct competitor
    • amazon: anthropic
    • competition makes all of us better, users get better models
    • up to openai to take more risks in the product
  • o1 use cases
    • let model think for 5 hours, days or months for hard research questions
  • move from AI answering questions for you -> AI doing things for you
    • not 5 minute tasks, but 5 hour tasks
    • even mundane, efficiency-type things take reasoning, to take in a complex world and act on it
    • reasoning will be a big part of the future
  • it will be a different way of building product, things will become async
  • build the things that just don't quite work today, and in 6 months, you'll be ahead of everyone else
  • if you're building something and you're afraid of someone's next model launch, you may not be building the right thing
  • if you're building something where you're looking forward to the next model launch, that's the right place to be
  • ai consumer product monetization
    • social media was ads, what about AI? subscriptions?
    • legal: a document that takes 6 hours at $1000/hr can be done in 5 minutes with o1 for $3 of API credits
    • how do we share the value of creation and bring AI to the rest of the world that may not be able to pay?
  • planning chatgpt roadmap analogies
    • think of systems like another human
    • people don't just blurt out a chain of thought or go mute for 60 seconds; they share periodic thoughts while they're thinking. how do we reflect this in the product?
    • the way people write and the way they talk diverge; written and spoken English are practically different languages
    • build neediness in voice mode to get the conversation going
    • PMs shape personality of models
  • richer responses
    • model responses need to be richer, now it's a lot of back and forth of text
    • it needs to be more naturally integrated with voice and video
    • how long will the chat interface hold?
  • left/right leaning, politics
    • on 50/50 topics, the model should not take a stance
    • model spec: the stance we expect our models to follow
    • if the model is not behaving the way you expect, there are two reasons
      • not following spec: fix the bug
      • you disagree with the spec: a debate we can have
  • who writes the specs?
    • we employ writers to get the model's emotions right
    • twitter: "no matter how many smart people you have within your walls, there are way more smart people outside your walls"
    • at openai, the philosophy is iterative deployment: when it comes to safety and societal aspects, getting models out there and progressively exposing them to broader groups of people is how we slowly drive positive change
    • model spec is public so it receives feedback from people around the world
  • future
    • personalized tutoring for kids
    • we are more eval-limited than intelligence-limited
    • applying datasets that are not public
    • making the model really good at specific things

petabyte scale embedding generation

embedding intro

  • vectorized numerical representation of information
    • closeness in vector space = data itself is similar
  • powers search, classification, chatbots, RAG-based
  • involve model inference
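
a tiny illustration of "closeness in vector space = similar data" (toy hand-made vectors, not real model outputs):

  import numpy as np

  def cosine(a: np.ndarray, b: np.ndarray) -> float:
      return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

  # toy 3-d vectors standing in for real embeddings
  cat    = np.array([1.0, 0.9, 0.1])
  kitten = np.array([0.9, 1.0, 0.2])
  car    = np.array([0.1, 0.2, 1.0])

  print(cosine(cat, kitten))  # ~0.99: close in vector space, similar data
  print(cosine(cat, car))     # ~0.27: far apart, dissimilar data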

offline embedding generation

  • 3 characteristics: huge data volume (text, audio, video), predictable workload (optimize for throughput), heterogeneous compute
  • various input data formats
    • raw text
    • semi-structured: JSON, JSONL
    • structured data: parquet, Avro
    • table format: iceberg
    • import ray.data -> ray.data.read_datasource(CustomerDataSource()) (see the pipeline sketch after this list)
  • preprocessing on CPU
    • preprocessing steps - tokenization, image resizing, audio/video decoding
    • filtering - condition-based filtering
    • special step - customized preprocessing logic in python
    • you can scale each step independently by providing more CPU cores
  • partial data preprocessing
    • gpu-based preprocessing (in batches)
      • ex: image transformations and other transformation workloads that run better on GPU
    • embedding model inferences: run embedding model
  • compute resources utilization
    • with ray.data, read data in mini-batches on CPU, do preprocessing, transfer through the object store buffer, move to GPU, and run embedding model inference
    • resource utilization can be low if any step becomes a bottleneck, mainly GPU preprocessing and embedding model inference
    • when a batch finishes, there should be another batch already loaded in memory
  • two jobs
    • #1 pure CPU job, transformation -> intermediate data storage
    • #2 embedding generation job -> vector db
    • why? ray data is not great at handling heterogeneous compute, so split into two
    • intermediate data is formatted in a way for job #2 to consume, so GPU utilization is high
    • problems
      • two jobs to maintain
      • orchestration solution needs to be introduced
      • wasted I/O time
  • takeaways
    • flexibility
      • flexible input types and output sources
      • custom preprocessing logic
      • support different embedding models
      • support different accelerators
    • high performance
      • scale each step independently
      • streamed execution to avoid GPU idle time (key for high throughput)
    • scalability
      • scale to thousands of CPU and GPU nodes
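
pulling the pieces above together, a hedged sketch of an offline pipeline in ray.data (the parquet source, the sentence-transformers stand-in model, and the concurrency/num_gpus values are my own assumptions; the knobs follow the Ray 2.x map_batches API):

  import numpy as np
  import ray

  def clean(batch: dict) -> dict:
      # CPU preprocessing stage: tokenization, filtering, etc. would live here
      batch["text"] = np.array([t.strip().lower() for t in batch["text"]])
      return batch

  class Embedder:
      # one model instance per actor; each actor pins a GPU
      def __init__(self):
          from sentence_transformers import SentenceTransformer
          self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

      def __call__(self, batch: dict) -> dict:
          batch["embedding"] = self.model.encode(list(batch["text"]))
          return batch

  ds = (
      ray.data.read_parquet("s3://bucket/docs/")  # or read_json / read_datasource(...)
      .map_batches(clean)                         # scales out across CPU cores
      .map_batches(Embedder, concurrency=4, num_gpus=1, batch_size=256)
  )
  ds.write_parquet("s3://bucket/embeddings/")     # or push to a vector db

ray data runs this as a streaming pipeline, so the next CPU batch is being prepared while the GPU stage runs; the two-job variant described above just splits the pipeline after clean(), writing intermediate data that job #2 reads.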

online embedding generation with ray

why online?

in a RAG app, we get a user query, run preprocessing steps, and generate an embedding; we use this embedding to perform a similarity search against the embeddings from offline generation, pass the results to the model of choice, and return a response
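
in code terms, the online path looks roughly like this (everything here is a toy stand-in; a real app would call an embedding model, a vector db, and an LLM):

  import numpy as np

  def preprocess(q: str) -> str:
      return q.strip().lower()

  def embed(text: str) -> np.ndarray:
      # toy hash-based "embedding"; stands in for real model inference
      rng = np.random.default_rng(abs(hash(text)) % 2**32)
      v = rng.standard_normal(8)
      return v / np.linalg.norm(v)

  # embeddings that would come from the offline pipeline
  corpus = {doc: embed(doc) for doc in ["pasta recipe", "oil change guide"]}

  def answer(query: str) -> str:
      q_vec = embed(preprocess(query))  # online embedding generation
      # similarity search against the offline embeddings
      best = max(corpus, key=lambda doc: float(q_vec @ corpus[doc]))
      prompt = f"context: {best}\n\nquestion: {query}"
      return prompt  # a real app would pass this to the model of choice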

requirements

  • generate in real time with high availability and low latency
  • input can come in many formats
  • preprocessing involves multiple steps / model inference
  • processing steps involve CPUs and GPUs
  • easy to use, validate, and iterate

real time processing

  • Ray Serve hosts a live endpoint for real-time queries
  • hosts multiple routes with different HTTP methods
  • ability to specify a full suite of configurations
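
a minimal sketch of such an endpoint using Ray Serve's FastAPI ingress (route names and the toy embedding are illustrative, not from the talk):

  from fastapi import FastAPI
  from ray import serve

  api = FastAPI()

  @serve.deployment
  @serve.ingress(api)  # wrap a FastAPI app to expose multiple routes/HTTP methods
  class EmbeddingService:
      @api.post("/embed")
      async def embed(self, payload: dict) -> dict:
          # stand-in: a real deployment would run embedding model inference here
          return {"embedding": [float(ord(c)) for c in payload["text"][:8]]}

      @api.get("/healthz")
      async def health(self) -> dict:
          return {"ok": True}

  serve.run(EmbeddingService.bind())  # hosts the live endpoint on the cluster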

highly available and efficient autoscaling

  • global control service (GCS) fault tolerance for graceful recovery from failures
  • default and custom health checks
  • configurable replica scaling per deployment (see the composition sketch below)

complex logic and heterogeneous cluster

  • support for model composition
  • each of models/deployments can scale independently
  • support for heterogeneous RayCluster
  • ex: text sanitization and translation before embedding a text query
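
a hedged sketch of that example with Ray Serve model composition (the autoscaling numbers and the sentence-transformers stand-in are my own placeholder choices; translation is elided):

  from ray import serve
  from ray.serve.handle import DeploymentHandle

  @serve.deployment(num_replicas=2)  # scales independently of the embedder
  class Sanitizer:
      def sanitize(self, text: str) -> str:
          return text.strip().lower()

  @serve.deployment(ray_actor_options={"num_gpus": 1},
                    autoscaling_config={"min_replicas": 1, "max_replicas": 4})
  class Embedder:
      def __init__(self):
          from sentence_transformers import SentenceTransformer  # stand-in model
          self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

      def embed(self, text: str) -> list:
          return self.model.encode(text).tolist()

  @serve.deployment
  class Ingress:
      def __init__(self, sanitizer: DeploymentHandle, embedder: DeploymentHandle):
          self.sanitizer, self.embedder = sanitizer, embedder

      async def __call__(self, request):
          text = (await request.json())["text"]
          clean = await self.sanitizer.sanitize.remote(text)  # CPU deployment
          return await self.embedder.embed.remote(clean)      # GPU deployment

  app = Ingress.bind(Sanitizer.bind(), Embedder.bind())
  # serve.run(app) exposes the composed pipeline as one endpoint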

rewatched lectures, watched office hour recordings, wrote out notes twice, was at campus from 3:30 p.m. till 10 p.m. i put more effort into this class than the last quiz; hope that reflects in the exam. i don't know why i got so anxious for this test. it's the lingering effects of blanking on the last quiz that are spurring fear-based motivation in me to get all the details right, to fully understand everything without skipping any details or taking any shortcuts. this might be the most i've studied for any subject. i don't know why i'm struggling so much with this material. might be my weak foundation in math.

10/2/2024