meta day 1 (supposedly)

meta practicum delayed so today was more deep learning and building a project to do RAG on my blog posts. lots to think about for the RAG system: how to chunk, how to enhance the query, how to evaluate the response, how to do hybrid search, which embedding model, which vector database, which LLM for the response, how to speed up the query, how to structure the response so that it references blog posts, and most importantly, how to package all this into a backend for my blog to use.
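to make the moving parts concrete for myself, here's a rough sketch of just the retrieval half, assuming sentence-transformers for embeddings and a naive character chunker. the model name, chunk sizes, and function names are placeholders, not decisions.

```python
# rough sketch of the retrieval half of the RAG idea: chunk posts, embed them,
# and pull back the most similar chunks with their source post title
# (sentence-transformers assumed; the chunk sizes and model are placeholders)
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def chunk(text, size=500, overlap=100):
    # naive fixed-size character chunking with overlap
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

def build_index(posts):
    # posts: {title: full_text}; keep (title, chunk) pairs so answers can cite posts
    records = [(title, c) for title, text in posts.items() for c in chunk(text)]
    embeddings = model.encode([c for _, c in records], normalize_embeddings=True)
    return records, embeddings

def retrieve(query, records, embeddings, k=3):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = embeddings @ q                # cosine similarity (embeddings are normalized)
    top = np.argsort(scores)[::-1][:k]
    return [records[i] for i in top]
```

the retrieved (title, chunk) pairs would then go into the prompt of whichever LLM i end up picking, so the answer can reference specific posts.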

watched nocturnal animals and that movie was really dark. it's the kind of movie where you're left just with the pain of the characters. you don't know what to do with it. there's no resolution or takeaway, just bitterness and sorrow. the highway scene is so traumatic.

tried turmeric ginger latte with oat milk. not sure if i liked it, i will stick to normal coffee. going through nvidia's deep learning course, fun to learn about CNNs and data augmentation. the lectures are short and notebooks don't go too deep, but it does serve its purpose for getting started.

been printing an AI paper every day at the library. hoping to absorb everything in this field like a sponge, and start training models with PyTorch, and get into distributed training lingo.

but also at the same time reading and writing more. i don't want to be 100% on just AI, i want my creative side to have the chance to see the light. these past few days have been like the calm before the storm, going to be one heck of a semester again. i hope this time i focus more on understanding the why, stay more curious in class, give 100% in the lectures, build more, put in 200% for meta, and remember to have fun in the process. i'm living in the best city for AI, and i have nothing to complain about. life is full of possibilities now.


a career cold start algorithm i found online

The first step is to find someone on the team and ask for 30 minutes with them.

In that meeting you have a simple agenda:

  • For the first 25 minutes: ask them to tell you everything they think you should know. Take copious notes. Only stop them to ask about things you don't understand. Always stop them to ask about things you don't understand.
  • For the next 3 minutes: ask about the biggest challenges the team has right now.
  • In the final 2 minutes: ask who else you should talk to. Write down every name they give you. Repeat the above process for every name you're given. Don't stop until there are no new names.

the first 25 mins gives you a framework to integrate new information more quickly. it indexes on areas under active discussion, gives you signals about the problems the team is currently facing, and helps you pick up the company's lingo so you can work smoothly with them.

the second gives you a cheat sheet on how to impress teams with early positive impact. some things will take time to fix, and you should internalize those challenges, e.g. "our infra isn't scaling". but there are a surprising number that you can easily help with; focus on these because they are things the team often neglects.

the third one gives you a valuable map of influence in the org. how often names show up, and the context in which they show up, tends to give you a different map of the organization than the one in the org chart.

for all these, the greatest value isn't in the answers – it's in the asking. taking the meetings and listening shows the proper respect for the team. demonstrating mutual respect builds the trust required to make progress.

10/16/2024

the ai grind

Louise Banks: If you could see your whole life from start to finish, would you change things? Ian Donnelly: Maybe I'd say what I felt more often. I-I don't know.

– Arrival

did a bunch of ML reading today, was literally walking to the bus stop while reading NLP with transformers, revised edition. learned more of the bigger picture for what led to transformers.

biweekly costco grocery run, bought beef chuck roast, $16 mixed nut butter, salt and pepper, frozen berries, bulk ground meat.

continued nvidia's deep learning course while watching fast.ai lectures concurrently. everything is slowly connecting together. it's a struggle to balance the amount of math to dive into, and code to write. the theory and the application. i want to know just enough to give me intuition, and only dive deeper if necessary, otherwise i want to build stuff. i want to train models to do something cool. i want to start making GPUs go brr.

watched arrival, it's such a good movie. their portrayal of extraterrestrial life forms. the concept of language as the foundation of civilization, the glue that holds people together. about time and memory.

watched weights & biases rag in production lectures. idea is to implement rag over my blog posts and have it reference specific posts and answer in under 5 seconds. will call it askBen and it will be my resume project.

worked on papertrail, have email and attachment processing and calendar integration done. hard to figure out what the mvp looks like. also difficult to not fall prey to premature optimization. i have to remind myself that the code that i (AI) write will be replaced in the next iteration, so the goal is to just get things to work.

so many things to learn. so many projects. i want more free days like this to pursue my own curiosity. come thursday, and i'm going to be swamped with assignments and exams again.


fast.ai lecture 3 & 5

  • a derivative = a function that tells you, if you increase the input, does it increase or decrease the output, and by how much
  • what is the mathematical function we are finding parameters for? we can create an infinitely flexible function from rectified linear units (ReLU). just create as many ReLUs as you want, and you can build any squiggly line. everything from here is tweaks to make it faster and make it need less data
  • a ReLU is torch.clip(y, 0.): anything smaller than 0 becomes 0
  • you just need requires_grad_() and backward() to get gradients, then access them via the .grad attribute (see the sketch after this list)
  • reproducibility (setting a manual seed) is not useful when what you want to understand is how much the data varies, and how your model behaves under different variations of the data
  • broadcasting benefits: the code is more concise, and it all happens in optimized C code (on GPU, optimized CUDA code)
  • rules of broadcasting: dimensions are compared from the last axis backwards; each pair must be equal, or one of them must be 1 (or missing)
    • tensor([1.,2.,3.]) * tensor(2.0)
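a tiny sketch of these notes in code (the weights and shapes are made up, just to see the pieces run):

```python
import torch

x = torch.linspace(-2, 2, 5)
relu = torch.clip(x, 0.)                 # anything below 0 becomes 0

# a "squiggly line" from a sum of ReLUs: sum_i clip(w_i * x + b_i, 0)
w = torch.tensor([1.0, -2.0, 0.5])
b = torch.tensor([0.0, 1.0, -0.5])
y = torch.clip(x[:, None] * w + b, 0.).sum(dim=1)   # broadcasting: (5,1) * (3,) -> (5,3)

# gradients: requires_grad_() + backward(), then read .grad
p = torch.tensor([3.0]).requires_grad_()
loss = (p ** 2).sum()
loss.backward()
print(p.grad)                            # tensor([6.])

# the broadcasting example from the lecture
print(torch.tensor([1., 2., 3.]) * torch.tensor(2.0))   # tensor([2., 4., 6.])
```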

nvidia intro to deep learning

  • what are tensors?
    • vector = 1d array
    • matrix = 2d array
    • tensor = n-d array
  • ex: pixels on a screen are a 3d tensor with width, height, and color channels
  • smaller batch sizes (e.g. 32 or 64) are efficient for model training (see the sketch after this list)
  • nn.Flatten() expects a batch of data
  • nn.Linear(input_size, neurons), number of neurons is what captures the statistical complexity in a dataset
  • next(model.parameters()).device - to check which device model params are on
  • torch.compile(model) for faster performance
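a minimal sketch tying these together, assuming MNIST-ish input shapes and PyTorch 2.x for torch.compile:

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Flatten(),                  # expects a batch: (N, 1, 28, 28) -> (N, 784)
    nn.Linear(28 * 28, 512),       # number of neurons ~ capacity for the dataset's complexity
    nn.ReLU(),
    nn.Linear(512, 10),
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
print(next(model.parameters()).device)   # check which device the params ended up on

compiled = torch.compile(model)          # faster on recent PyTorch versions
batch = torch.randn(32, 1, 28, 28, device=device)   # batch size 32, as in the notes
out = compiled(batch)                    # (32, 10)
```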

10/15/2024

nvidia nim

“I forgot that every little action of the common day makes or unmakes character,” Oscar Wilde wrote in De Profundis

went down a rabbit hole into nvidia software.

also did a bit of reading into graph neural networks.

main takeaway is you can create a graph structure from text and images, but it's more useful for heterogeneous structures where the number of neighbors for each node in a graph is variable (as opposed to fixed for text and images).

there are three main predictive tasks for GNNs

(1) graph-level

  • ex: molecule as a graph, predict its smell or probability of binding to a receptor
  • analogy for image: classify an entire image
  • analogy for text: label sentiment of an entire sentence

(2) node-level

  • ex: predict identity/role of each node
  • analogy for image: image segmentation
  • analogy for text: part-of-speech tagging

(3) edge-level

  • ex: image scene understanding, given nodes that represent objects in an image, predict which of these nodes share an edge or what the value of the edge is

the challenges of graphs in ML: representing graphs for neural networks

graphs have 4 types of info

  1. nodes
  2. edges
  3. global-context
  4. connectivity (hard part)

the first three are straightforward: we create a node matrix N, where row i stores the features for node i

connectivity is more complicated. first, adjacency matrices are sparse and space-inefficient. second, many adjacency matrices can encode the same information, but there's no guarantee they produce the same result (they are not permutation invariant)

solution: represent the sparse adjacency matrix as an adjacency list

they describe connectivity of edge e_k between nodes n_i and n_j as a tuple (i, j) in the kth entry of the list.
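a small sketch of that representation on a toy 3-node graph (pure PyTorch, nothing library-specific):

```python
import torch

# node feature matrix N: row i stores the features for node i
N = torch.tensor([
    [1.0, 0.0],   # node 0
    [0.0, 1.0],   # node 1
    [1.0, 1.0],   # node 2
])

# adjacency list: the k-th entry (i, j) says edge e_k connects node i to node j
adjacency_list = [(0, 1), (1, 2), (2, 0)]

# per-edge features, aligned with the list: row k describes edge e_k
E = torch.tensor([[0.5], [0.2], [0.9]])

# the same connectivity as a dense adjacency matrix, for comparison:
# mostly zeros, and relabeling the nodes gives a different-looking matrix
# for the same graph (the permutation problem mentioned above)
A = torch.zeros(3, 3)
for i, j in adjacency_list:
    A[i, j] = 1.0
```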

conceptual

applied

found these datasets to work with

10/14/2024

blue angels

chinatown stockton street


For a man's ways are before the eyes of the Lord, and he ponders all his paths – Proverbs 5:21 ESV

methodist church chinese service, they're so hospitable and welcoming. they keep saying "so glad to see young people". played ping pong after service. they had sticky rice and che thai.

space X chopsticks capture

italian heritage festival & parade

saw the blue angels at fisherman's wharf. they fly down to 18 inches apart from each other, and cost ~$120 mil each. the booming sounds they make are terrifying and thrilling at the same time.

went for a quick grocery run before the stores close at 6pm. did a photography walk around chinatown. so impressed by the x100s, the only thing is the pictures that come out of it are dark and require lightroom for post-processing. but for a camera that came out 14 years ago, it's working so well.

so many things happening in the world. i'm only experiencing a tiny slice of it.

felt immensely tired today. is this the tiredness from the past week catching up on me? what is considered rest? am i even capable of rest?

10/13/2024

larkey park walnut creek

A quiet secluded life in the country, with the possibility of being useful to people to whom it is easy to do good, and who are not accustomed to have it done to them; then work which one hopes may be of some use; then rest, nature, books, music, love for one's neighbor — such is my idea of happiness. — Leo Tolstoy

basketball and bbq with IEC friends

talked about giving parents money as a culture in chinese families

napped for 2 hours

woke up feeling confused

final group assignment for regression

staked 0.02 ETH for ETHGlobal hackathon, excited to jump into web3 and see what i can build.

a few good resources

10/12/2024

fall mod 1 done

Omedetou: celebrate the small wins

highlights from fall mod 1

  • sf systems
  • went to google office for ollama meetup
  • visited salesforce park (it's a nice place to stroll after lunch)
  • waterloo house cooling party (saw a nuclear fusor in person + twitter people)
  • went to playspace and presented to 30+ people
  • traded for aerogel
  • looted free stuff from a group house
  • malaysian independence party with singaporeans
  • went to a comedy show in a bar
  • picked up trash in mission
  • bought nectar bed and set it up
  • started papertrail
  • ai devtools hackathon best demo w agentops
  • practicum meet n greet (kevala & meta)
  • meta interview grind
  • got meta with chinatown boys
  • transamerica park ping pong
  • haidilao in fremont with old and new friends
  • kbbq and karaoke with the boys
  • built aha and presented at class
  • first book club – user friendly
  • visited a presbyterian church
  • ray summit
  • cooked japanese beef curry
  • took the hardest exam in my life
  • gym'ed 18 times, currently squat 135lbs (but causes soreness in lower back) and curl 40 lbs.

two lessons from aging is another word for living by rosie spinks that i want to take into the rest of the program

lesson #11: find 25 minutes

If you think you don’t have time for hobbies, writing, fitness, reading, gardening, it’s because you’re looking for hours. Look for 25 minutes, a few times a week. That’s all you need. The first five minutes for faffing about, the next 20 are for doing the thing.

lesson #13: relish life at your busiest

There is a state of being where you are hyper-functioning, pathologically getting things done, thinking that once you tick off every last thing, you will finally get to all the good stuff. It’s possible to spend your entire life in this mode, promising yourself you’ll enjoy it all once you get to the other side. The other option is to accept there is no other side. To realize that the assignment is not to get everything done perfectly, it’s to find ways to relish life even (especially) when you’re the busiest. I will never not be re-learning this lesson, but Oliver Burkeman’s essay on adopting a “deliberate defiance toward the inner taskmaster” has really kept it front and center in my mind this year.

10/11/2024

regression done

presenting aha was fun. i was told i was comfortable in front of everyone, which is a first. seems like i'm getting better at presenting. i never felt nervous, but my heart was beating faster and faster, uncontrollably, to the point that i was losing my breath towards the end. still need to work on a confident voice. i sound like a high school kid that's still not sure about himself, like that kid from yiyi. i do not look or sound ~23.

was so tense and stressed from morning till 3:30. one of the most intense 2 hours and 10 minutes (extra time) of my life. i probably got 20-30% of it wrong, or even more, but i'm glad i got everything done. i was so close to leaving two questions blank, but i powered through, and was able to write down a proof that was sensible.

i realized solving these problems requires persistence. they're never obvious from the start, and they require some thought, so it's easy to give up early on and say "i can't do it". i think i've been wired to give up quickly without being patient enough with a problem. if you just keep chipping away at the problem, you can get somewhere, and with a bit of luck, sound intuition, and solid math, you just might reach the end.

i'm also realizing that math skills are so crucial, and i shouldn't skimp on math knowledge, even though i enjoy building and prototyping a lot more. i should put my stats degree to good use, double down on hard math and stats, and understand the foundational stuff, because it will pay off in the future when i'm solving harder problems that require deep knowledge.


researched Hume, a series B startup that launched a foundational audio-language model, EVI 2, that integrates empathetic AI into any product

how do the models work? they capture patterns from our speech and face:

  • Facial Expression
  • Speech Prosody (the way you say words)
  • Vocal Bursts (sighs, laughs, oohs, ahhs, umms, and shrieks)
  • Emotional Language (the words we say - explicit disclosure of emotion and implicit emotional connotation)

these reveal our preferences – whether we find things interesting or boring, funny or dubious, satisfying or frustrating – and the models learn from these signals.

they call it RLHE - reinforcement learning from human expression.

these models trained with RLHE can serve as better copywriters, call center agents, tutors, etc, even in text-only interfaces.

their goal: a future in which technology draws on an understanding of human emotional expression to better serve human goals

they released the first concrete ethical guidelines for empathetic AI: the Hume Initiative

they also publish their research

they have an ai engineer + writer role that caught my attention

here's the job description

We are looking for an engineer and writer to help copyedit developer materials (API documentation, tutorials) and write broad-reaching technical content (blog posts, white papers). In this role, you will create content that helps developers understand the role of emotional intelligence in AI, imagine the future of AI interfaces, and integrate our API into wide-ranging applications. You will work closely with our developer relations, product, engineering, and research teams on a diverse range of content.

The ideal candidate is a philosophically minded software developer with excellent writing and storytelling skills. They have worked with (but not necessarily written) API documentation, especially for AI tools, have informed opinions about the future of AI, and have a strong interest in science.

currently thinking of building an ios app where you talk to it throughout your day, and it summarizes your thoughts + tags your emotions with hume.ai, something like this tweet

10/10/2024

regression grind

a dump of my plan + notes for studying for my finals, for a class that i should be doing well in but am not, because i'm just not good at math and stats apparently. might be the most i've studied for a class ever in my life

go through

  • notes
  • hw
  • quizzes
  • pq1, q1
  • pq2, q2
  • pfinals

prediction

  • distribution of SSE (sigma_hat)
  • e(sse)
  • show y_hat is independent of the residuals
  • distribution of beta_hat
  • log reg why use logit? issues with linear model
  • explain what hii is?
  • why 0 < hii < 1
  • what is stud(ei)?
  • press

Outliers

  • outlier in x: leverage (hii) > 3p/n
  • outlier in y: discrepancy (studentized residual) > t(n-1-p), 1-a/2
  • both: influence (cook's distance) > 4/n (see the sketch after this list)
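a hedged sketch of computing all three diagnostics with statsmodels on fake data (the cutoffs follow the notes above):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence
from scipy import stats

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(50, 2)))   # n = 50, p = 3 (incl. intercept)
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=50)

results = sm.OLS(y, X).fit()
infl = OLSInfluence(results)
n, p = X.shape

leverage = infl.hat_matrix_diag               # h_ii: outlier in x if > 3p/n
student = infl.resid_studentized_external     # stud(e_i): outlier in y if beyond the t cutoff
cooks_d = infl.cooks_distance[0]              # influence: flag if > 4/n

t_cut = stats.t.ppf(1 - 0.05 / 2, n - 1 - p)
print((leverage > 3 * p / n).sum(),
      (np.abs(student) > t_cut).sum(),
      (cooks_d > 4 / n).sum())
```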

multicollinearity

problems: inflated SE

checks:

  1. coefficients swing / change sign (even though the overall f-test is significant)
  2. correlation matrix
  3. VIF (see the sketch after this section)

solution

  1. drop
  2. feature engineer
  3. regularized regression
  4. dimensionality reduction
  5. partial least squares
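a quick sketch of the VIF check with statsmodels on deliberately collinear fake data (VIF > 10 as a rule of thumb is my addition, not from the notes):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=100)   # deliberately collinear with x1
X = sm.add_constant(np.column_stack([x1, x2]))

vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]  # skip the constant
print(vifs)   # both well above 10 here
```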

heteroskedasticity

  • coefficient estimates are still unbiased

detection

  • residual plot

problems: no longer BLUE -> wrong SE(beta) and CI/PI widths (see the sketch after the list below)

solution

  1. log / square
  2. boxcox
  3. robust SE
  4. WLS
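a sketch of detection plus one of the fixes, using statsmodels' Breusch-Pagan test and HC3 robust standard errors on fake data (Breusch-Pagan isn't in my notes, but it formalizes the residual-plot check):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=200)
y = 2 * x + rng.normal(scale=x, size=200)     # noise grows with x -> heteroskedastic
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols.resid, X)
print(lm_pvalue)                              # small p-value -> evidence of heteroskedasticity

robust = sm.OLS(y, X).fit(cov_type="HC3")     # robust SEs; the coefficients stay the same
print(ols.bse, robust.bse)
```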

if the residuals (e_i) show a non-linear pattern, use nonparametric regression (knn, moving average)

non-normal

  • still BLUE
  • inference no longer valid

detection

  • histogram
  • qq plot
  • test for normality: shapiro

for a normal distribution: skewness = 0 (third moment), kurtosis = 3 (fourth moment)

  • omnibus k2 test (null is normality, so a high p-value means we fail to reject it, i.e. residuals look normal; see the sketch after this list)
  • JB test
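the same tests in scipy, run on fake heavy-tailed residuals; for all three the null hypothesis is normality:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
resid = rng.standard_t(df=3, size=500)    # heavy-tailed residuals for illustration

print(stats.shapiro(resid))               # shapiro-wilk
print(stats.normaltest(resid))            # k2 omnibus test on skewness + kurtosis
print(stats.jarque_bera(resid))           # JB, also built from skewness and kurtosis
```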

problems: unreliable t-tests, wrong CI/PI

false assumption of linearity

  • transform y -> may introduce heteroskedasticity if the errors were homoskedastic
  • transform x -> nice when the only problem is non-linearity
  • transform both

Model selection: underfitting (too few predictors) gives biased coefs + predictions (under/overestimate) and an overestimated sigma2

extra vars: still unbiased, but MSE has fewer degrees of freedom, wider CIs and lower power

overfitting (multicollinearity): inflated SE for coefs, possibly rank deficient

adjusted R^2 = 1 - MSE/MST = 1 - (SSE/(n-p)) / (SST/(n-1)); takes into account the "cost" of losing DF

Mallows Cp

  • identify subset where Cp is near k+1 where k is no of preds
  • this means bias is small
  • if all not near, missing predictor
  • if several are near k+1, choose the model with the smallest Cp (formula below)
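for reference, the version of the formula i'm working from (assuming sigma^2 is estimated by the MSE of the full model, and the subset model has k predictors plus an intercept):

```latex
C_p = \frac{SSE_p}{MSE_{\text{full}}} - \bigl(n - 2(k+1)\bigr), \qquad E[C_p] \approx k+1 \text{ if the subset model is unbiased}
```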

AIC, BIC

  • estimates information lost in a model
  • trade-off goodness in fit vs simplicity, penalized by no. of model params (p)
  • larger penalty term in BIC than AIC : ln(n)p vs 2p

PRESS

  • a modified SSE: for the ith observation it uses the predicted value from a model fit on the data excluding that point (see the sketch below)
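a small sketch of the leave-one-out shortcut for OLS, where the ith deleted residual is e_i / (1 - h_ii), so PRESS doesn't require refitting n times:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(60, 2)))
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=60)

results = sm.OLS(y, X).fit()
h = OLSInfluence(results).hat_matrix_diag
press = np.sum((results.resid / (1 - h)) ** 2)   # sum of squared deleted residuals
print(press)
```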

10/9/2024

data @ grammarly

Because of the routines we follow, we often forget that life is an ongoing adventure...and the sooner we realize that, the quicker we will be able to treat life as art: to bring all our energies to each encounter, to remain flexible enough to notice and admit when what we expected to happen did not happen. We need to remember that we are created creative and can invent new scenarios as frequently as they are needed.

– Maya Angelou

went to a grammarly fireside chat with a few directors in their data org

what you look for when hiring

  • base technical skills: sql, python, data manipulation (least important)
  • is this person going to make our business successful? is grammarly going to grow the user base, make money
  • people who understand business problems, can frame problems and solve them, has applied acumen
  • create experiments that can drive business

recommendations for early data career

  • find right mentors, strong set of leaders and peers, that will teach you more than anything else
  • exposure to multiple aspects of problem solving, not just technical stuff, but cross-collaborate with stakeholders, how to report status, how to link your output with business impact, communication skills
  • look into the tech stack of the company, work on the latest stuff
  • company size, startup vs large company? structure is important

day2day for new hire on data platforms

  • ingestion side, infra to bring data in
  • data governance, compliance
  • data eng: data modeling, transformations
  • analytics & ML: data cleanliness and quality and availability
  • everyone is a strong SWE, systems engineer (infra), data engineer (DE)
  • strong generalist SWE
  • databricks

big vs small company

  • think about impact vs prestige
  • big: prestige (name brand + density of talent)
    • mentoring on best practices, how these huge machines function
  • small: impact
    • have stories that your work directly moved X millions in revenue
  • find a company that can give you impact (bullet points on resume) + prestige

role of data science evolving (skills?)

  • understanding the business context, the space you're working in, the elements you need to solve that
  • less on fundamentals of the models, more on the application
  • foundational statistics and knowledge intuition
  • when things don't go well, really understanding exp design, causality, foundations of statistics will matter the most

AI companies

  • robust system to measure quality of AI, human eval, responsible AI metrics
  • subject matter experts on staff, people that can inform the models, everything that AI does is informed by human choices, who is making the choices?
  • what does the data infra look like? is the company investing in the data? a serious test of a company's investment in AI

most innovative + impactful project

  • orchestration framework for potential accounts to reach out to for sales team (ml, experimentation, llms to produce content)
  • 50b events a day, making informed decisions from this
  • suggestion quality, what is quality?

grammarly in llm era

  • huge advantage in user context across different services
  • lots of potential to innovate, supercharged efforts
  • creates interesting data for feedback mechanism for marketing

some interesting work i encountered on their engineering blog


my migraine was back in the morning and i couldn't do anything at all besides cook lunch. the acetaminophen i took at 11am kicked in 3 hours later, and i could finally function for most of today. pretty cool that i got to meet alumni from cohort 11. when i graduate, i'll be meeting future students too, and i'll be saying the same things and giving them advice.

10/8/2024

migraines

"Nothing is absolute. Everything changes, everything moves, everything revolves, everything flies and goes away." – Frida Kahlo

i had a really bad migraine today again, where i just feel really nauseous and dizzy. i walk slower. i'm perpetually thirsty. i feel sick down to my bones. when i try to focus on something, i can feel my brain compressing. not really sure why this is happening. it's been like this since i readjusted my braces. it's probably that. i was told by my nurse it should be better this time though. hope this won't last for another 16.5 months.

my productivity and overall output have dramatically decreased. this is not ideal, especially during finals week. i'm only 22, about to turn 23. i shouldn't be this tired every day, getting headaches and feeling sick often. there are old people in their 70s that have more energy than me. the elderly in church have more youthful energy. what can i do to feel better? maybe it's sleep. i need to get 8 hours. i've been getting 6 these days. time for bed. nothing is going into my brain today.

10/7/2024
