2024-01-21

my AI-powered career exploration app (Wanderer) has been experiencing explosive growth, and my GPT-4 costs were starting to pile up (USD 100+ a day! 💀). here's the playbook I used to lower my AI costs by 99% while also decreasing latency and maintaining quality:

  1. start with the most powerful model for your app's use case. for 95% of companies, this is GPT-4 (not GPT-4-turbo). you want the best quality outputs, as we'll be using them to fine-tune a smaller model.
  2. store your AI requests/responses so they can be easily exported. i personally use @helicone_ai for this and love it: it's an easy swap-in with the OpenAI APIs and it stores all of your AI requests in an exportable table (minimal setup sketch after this list).
  3. once you've collected ~100-500+ request/response pairs, export them and clean the data so that the inputs and outputs are of high quality. if you collect feedback from your users (e.g. thumbs up / thumbs down), you can potentially use it as a quality filter too (see the dataset-prep sketch after this list).
  4. with your clean dataset, use a hosted OSS AI service like Together or Anyscale to fine-tune Mixtral 8x7B. you can also try fine-tuning GPT-3.5-Turbo on OpenAI, but I've personally gotten better results from Mixtral! (fine-tuning sketch after this list)
  5. swap out GPT-4 for your fine-tuned model and enjoy your healthy margins! (swap-in sketch after this list)
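
a minimal sketch of step 2, assuming Helicone's OpenAI proxy integration (point the client's base_url at their gateway and pass your Helicone key in a header; check their docs for the current endpoint):

```python
# step 2: log every GPT-4 request/response through Helicone (assumes their OpenAI proxy integration)
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://oai.helicone.ai/v1",  # route traffic through Helicone's proxy
    default_headers={"Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}"},
)

# a normal GPT-4 call; Helicone records the request/response pair for later export
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a career exploration assistant."},
        {"role": "user", "content": "Suggest three career paths for someone who loves maps."},
    ],
)
print(resp.choices[0].message.content)
```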
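for step 3, one way to turn an export into the JSONL chat format most fine-tuning services expect. the column names (request, response, feedback) and the quality filter are placeholders for whatever your export actually contains:

```python
# step 3: turn exported request/response pairs into a fine-tuning dataset (JSONL, chat format)
import csv
import json

def is_high_quality(row: dict) -> bool:
    # hypothetical quality filter: drop thumbs-down responses and trivially short ones
    return row.get("feedback") != "thumbs_down" and len(row["response"]) > 50

with open("helicone_export.csv") as src, open("train.jsonl", "w") as dst:
    for row in csv.DictReader(src):
        if not is_high_quality(row):
            continue
        example = {
            "messages": [
                {"role": "system", "content": "You are a career exploration assistant."},
                {"role": "user", "content": row["request"]},
                {"role": "assistant", "content": row["response"]},
            ]
        }
        dst.write(json.dumps(example) + "\n")
```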
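for step 4, a sketch of the OpenAI GPT-3.5-Turbo route; the Together/Anyscale flow for Mixtral 8x7B is the same shape (upload the JSONL, start a job, grab the fine-tuned model id) but goes through their own SDKs and dashboards:

```python
# step 4 (OpenAI variant): fine-tune GPT-3.5-Turbo on the cleaned dataset from step 3
from openai import OpenAI

client = OpenAI()

# upload the training file
train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

# kick off the fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-3.5-turbo",
)
print(job.id)  # poll with client.fine_tuning.jobs.retrieve(job.id) until it finishes
```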
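for step 5, the swap itself. Together exposes an OpenAI-compatible endpoint, so mostly the base_url, key, and model name change; the model id below is a placeholder for whatever your fine-tuning job returns:

```python
# step 5: point the same OpenAI client at the fine-tuned model instead of GPT-4
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",  # Together's OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="yourname/Mixtral-8x7B-Instruct-v0.1-ft-wanderer",  # hypothetical fine-tuned model id
    messages=[{"role": "user", "content": "Suggest three career paths for someone who loves maps."}],
)
print(resp.choices[0].message.content)
```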
