Meta teams use a fine-tuned Llama model to fix production issues in seconds
One of Meta's engineering teams shared how they use a fine-tuned small Llama (Llama-2-7B, so not even a very recent model) to identify the root cause of production issues with 42% accuracy.
🤔 Isn't 42% too low?
⚡️ Usually, whenever there's an issue in production, engineers dive into recent code changes to find the offending commit. At Meta's scale (thousands of daily changes), this is like finding a needle in a haystack.
💡 So whenever the LLM's suggestion is right, it cuts incident resolution time from hours to seconds!
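To see why 42% can still pay off, here is a back-of-the-envelope sketch. It assumes (hypothetically) that an engineer first spends a few minutes checking the model's suggestion and, if it is wrong, falls back to the usual manual search. The 120-minute and 5-minute figures are made-up placeholders, not numbers from Meta.

```python
def expected_triage_minutes(accuracy: float, manual_minutes: float, check_minutes: float) -> float:
    """Expected time to find the offending change when the model's suggestion
    is checked first and a manual search follows only if the suggestion is wrong.

    Hypothetical model: only accuracy (0.42) comes from the post.
    """
    # Checking the suggestion always costs check_minutes.
    # With probability (1 - accuracy) the suggestion is wrong and the full manual search still happens.
    return check_minutes + (1 - accuracy) * manual_minutes


baseline = 120.0  # hypothetical manual search time, in minutes
with_llm = expected_triage_minutes(0.42, manual_minutes=baseline, check_minutes=5.0)
print(f"{baseline:.1f} min -> {with_llm:.1f} min on average")
```

Rearranging the formula, the tool beats a pure manual search whenever `check_minutes < accuracy * manual_minutes`, so a cheap check makes even a modest hit rate worthwhile.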
How did they do it?
Two-step approach:
┣ Heuristics (code ownership, directory structure, runtime graphs) reduce thousands of potential changes to a manageable set
┣ Fine-tuned Llama-2-7B ranks the most likely culprits
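The two-step approach above can be sketched roughly as follows. This is an illustration, not Meta's code: the `Change`/`Incident` types, the `limit` and `top_k` values, and the word-overlap `score` are all hypothetical. In the real system the ranking step is the fine-tuned Llama-2-7B, and the runtime-graph heuristic is omitted here for brevity.

```python
from dataclasses import dataclass


@dataclass
class Change:
    change_id: str
    author_team: str
    files: list[str]
    summary: str


@dataclass
class Incident:
    owning_team: str
    affected_dirs: list[str]
    description: str


def heuristic_filter(incident: Incident, changes: list[Change], limit: int = 20) -> list[Change]:
    """Step 1 (hypothetical heuristics): cheaply narrow many changes to a small candidate set."""
    def relevant(c: Change) -> bool:
        owned = c.author_team == incident.owning_team  # code ownership
        nearby = any(f.startswith(d) for f in c.files for d in incident.affected_dirs)  # directory structure
        return owned or nearby
    return [c for c in changes if relevant(c)][:limit]


def score(incident: Incident, change: Change) -> int:
    """Step 2 stand-in: a toy word-overlap score so the sketch runs without a model.
    In Meta's setup, a fine-tuned LLM does this ranking instead."""
    words = set(incident.description.lower().split())
    return len(words & set(change.summary.lower().split()))


def rank_culprits(incident: Incident, changes: list[Change], top_k: int = 5) -> list[Change]:
    candidates = heuristic_filter(incident, changes)
    return sorted(candidates, key=lambda c: score(incident, c), reverse=True)[:top_k]


changes = [
    Change("D1", "payments", ["payments/checkout.py"], "refactor checkout timeout handling"),
    Change("D2", "search", ["search/index.py"], "tune ranking weights"),
    Change("D3", "infra", ["payments/retry.py"], "lower retry limit for checkout calls"),
]
incident = Incident("payments", ["payments/"], "checkout requests timeout after deploy")
for c in rank_culprits(incident, changes):
    print(c.change_id, score(incident, c))
```

The design point is the split itself: the heuristics are cheap and high-recall, so the expensive model only has to look at a handful of candidates per incident.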
Training pipeline:
┣ Continued pre-training on Meta's internal docs and wikis
┣ Supervised fine-tuning on past incident investigations
┣ Training examples mimicked real-world conditions (2-20 candidate changes per incident)
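One way to build fine-tuning examples that mimic the 2-20 candidates constraint is sketched below. It is a guess at the shape of the data, not Meta's pipeline: the prompt wording, the `prompt`/`completion` field names, and the idea of sampling distractors from unrelated changes are all assumptions.

```python
import random


def make_training_example(incident_text: str, root_cause: str, other_changes: list[str],
                          rng: random.Random, min_k: int = 2, max_k: int = 20) -> dict:
    """Hypothetical SFT example: hide the real root-cause change among sampled
    distractors so each prompt lists between min_k and max_k candidates."""
    if not other_changes:
        raise ValueError("need at least one distractor change")
    # Number of candidates shown, capped by how many distractors are available.
    k = rng.randint(min_k, min(max_k, len(other_changes) + 1))
    candidates = rng.sample(other_changes, k - 1) + [root_cause]
    rng.shuffle(candidates)  # the answer's position must not be predictable
    lines = [f"[{i}] {c}" for i, c in enumerate(candidates)]
    prompt = (
        f"Incident: {incident_text}\n"
        "Candidate changes:\n" + "\n".join(lines)
        + "\nWhich change most likely caused the incident?"
    )
    # Target is the index of the true root cause (assumes candidate texts are distinct).
    return {"prompt": prompt, "completion": str(candidates.index(root_cause))}


rng = random.Random(0)
ex = make_training_example(
    "checkout requests time out after deploy",
    "lower retry limit for checkout calls",
    ["tune search ranking weights", "update logging format", "bump UI library version"],
    rng,
)
print(ex["prompt"])
print("target:", ex["completion"])
```

Varying the candidate count during training is meant to keep the model from overfitting to one list length it will not see consistently in production.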
🔮 What's next:
┣ Language models could handle more of the incident response workflow (runbooks, mitigation, post-mortems)
┣ Improvements in model reasoning should boost accuracy further
Read it in full: https://www.tryparity.com/blog/how-meta-uses-llms-to-improve-incident-response