Challenges of building LLM applications for production by Chip Huyen
- Inconsistency
- how to ensure user experience consistency?
- same input -> different outputs
- small input changes -> big output changes (temperature = 0 won't fix it)
- how to ensure downstream apps can run without breaking?
- no output schema guarantee
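Both failure modes above are usually handled the same way in practice: treat the model as untrusted, validate every response against the schema downstream code expects, and retry on failure. A minimal sketch, assuming the OpenAI Python SDK and pydantic v2 (the model name, schema, and prompt are illustrative):

```python
# Validate-and-retry around a chat completion call.
# Assumptions: OpenAI Python SDK, pydantic v2; names are illustrative.
from openai import OpenAI
from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):      # the shape downstream code expects
    category: str
    urgency: int              # 1 (low) .. 5 (high)

client = OpenAI()

def classify(text: str, max_retries: int = 3) -> Ticket:
    prompt = (
        "Classify this support ticket. Reply with JSON only, "
        'e.g. {"category": "billing", "urgency": 2}.\n\n' + text
    )
    for _ in range(max_retries):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",   # illustrative model name
            messages=[{"role": "user", "content": prompt}],
            temperature=0,         # reduces, but does not eliminate, variance
        )
        try:
            return Ticket.model_validate_json(
                resp.choices[0].message.content or ""
            )
        except ValidationError:
            continue               # malformed output: ask again
    raise RuntimeError("no schema-valid output after retries")
```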
- Hallucination
- Compliance + privacy
- buy: are APIs compliant?
- build: what if your in-house chatbot leaks sensitive info?
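For the build case, one common mitigation (my illustration, not from the talk) is to redact obvious PII before a message leaves your infrastructure, e.g. before it reaches a third-party API or a chat log:

```python
# Regex-based PII scrubbing before logging or external API calls.
# Patterns are illustrative and intentionally simple.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Call me at +1 (415) 555-0123 or mail jane@example.com"))
# -> "Call me at [PHONE] or mail [EMAIL]"
```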
- Context length
- Data drift
- even when provided with new data, models trained on past data fail to generalize to questions asked in the present, e.g. questions whose correct answer changes over time (SituatedQA)
- Forward & Backward compatibility
- same model, new data
- how to make sure prompts still work with newer models?
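One way to answer that question is a prompt regression suite: pin the candidate model version, replay known prompts, and assert invariants before upgrading. A minimal sketch, assuming an OpenAI-style API (model name and test cases are illustrative):

```python
# Replay pinned prompts against a candidate model version and
# fail loudly if any expected invariant breaks.
from openai import OpenAI

client = OpenAI()
CANDIDATE_MODEL = "gpt-4o-2024-08-06"   # version you want to upgrade to

CASES = [
    # (prompt, substring the answer must contain)
    ("Reply with one word: what color is a clear daytime sky?", "blue"),
    ('Return the JSON {"ok": true} and nothing else.', '"ok"'),
]

def run_suite() -> None:
    failures = []
    for prompt, must_contain in CASES:
        resp = client.chat.completions.create(
            model=CANDIDATE_MODEL,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        answer = resp.choices[0].message.content or ""
        if must_contain.lower() not in answer.lower():
            failures.append((prompt, answer))
    if failures:
        raise AssertionError(f"{len(failures)} prompt(s) regressed: {failures}")

if __name__ == "__main__":
    run_suite()
```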
- LLM on the edge
- healthcare devices, autonomous vehicles, drive-thru voice bots, personal AI assistant trained on personal data
- on device inference
- training
- on-device training: bottlenecked by compute + memory + tech available
- if trained on server:
- how to incorporate device's data?
- how to send model's updates to device?
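One option for the update problem is to ship a weight delta instead of a full checkpoint, so devices download only what changed. A minimal sketch, assuming a PyTorch model on both server and device (an illustration, not the approach from the talk):

```python
# Server computes a per-tensor diff after fine-tuning; device applies it
# to its local copy instead of downloading the whole model.
import torch
import torch.nn as nn

def make_model() -> nn.Module:
    return nn.Linear(16, 4)   # stand-in for the on-device model

# --- server side: fine-tune, then diff against the version devices hold
old = make_model()
new = make_model()
new.load_state_dict(old.state_dict())
with torch.no_grad():                     # pretend fine-tuning happened
    new.weight.add_(0.01 * torch.randn_like(new.weight))

delta = {k: new.state_dict()[k] - old.state_dict()[k]
         for k in old.state_dict()}       # small payload to ship

# --- device side: apply the delta in place
device_model = make_model()
device_model.load_state_dict(old.state_dict())
with torch.no_grad():
    for k, d in delta.items():
        device_model.state_dict()[k].add_(d)

assert torch.allclose(device_model.weight, new.weight)
```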
- choose a model size
- 7B param model (depending on sparsity)
- $100 to fine tune
- $25,000 to train from scratch
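The train-from-scratch figure can be sanity-checked with the common ~6 × params × tokens FLOPs heuristic. A back-of-envelope sketch; the token budget, GPU throughput, and price are all my assumptions, so treat the output as order-of-magnitude only:

```python
# Rough training-cost estimate for a 7B-parameter model.
# Assumptions: Chinchilla-style ~20 tokens/param, ~150 TFLOPS sustained
# per A100, $2 per GPU-hour -- all illustrative.
params = 7e9
tokens = 20 * params                  # ~140B tokens
flops = 6 * params * tokens           # ~5.9e21 FLOPs total

sustained_flops_per_gpu = 150e12      # ~50% utilization of an A100
gpu_hours = flops / sustained_flops_per_gpu / 3600
cost = gpu_hours * 2.0                # $/GPU-hour

print(f"~{gpu_hours:,.0f} GPU-hours, ~${cost:,.0f}")
# -> ~10,889 GPU-hours, ~$21,778 -- same ballpark as the $25,000 figure
```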
- LLM for non-english languages
- performance:
- tokenization (latency & cost differ by language)
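This is easy to see directly: the same sentence costs several times more tokens in many non-Latin scripts, which multiplies both per-token pricing and generation latency. A minimal sketch using the tiktoken library (sample sentences are illustrative):

```python
# Compare token counts for roughly the same sentence across languages.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # GPT-4-era tokenizer

samples = {
    "English":  "How do I reset my password?",
    "Spanish":  "¿Cómo restablezco mi contraseña?",
    "Japanese": "パスワードをリセットするにはどうすればよいですか？",
    "Thai":     "ฉันจะรีเซ็ตรหัสผ่านได้อย่างไร",
}

for lang, text in samples.items():
    print(f"{lang:>8}: {len(enc.encode(text)):>3} tokens")
# Non-Latin scripts often need several times more tokens per sentence,
# so the same request is slower and more expensive for those users.
```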
- Efficiency of Chat as an interface
- chat is not efficient but very robust
- Data bottleneck
- the rate of training-dataset growth far outpaces the rate at which new data is generated
- internet is being rapidly populated with AI-generated text
More in Chip Huyen's article, Building LLM applications for production, and in this video series.