HiPo
Data & AI | Financial Services

Training reasoning models for multi-step portfolio optimization

TIFIN

The challenge

Off-the-shelf models struggled with the multi-step reasoning required for portfolio optimization and knowledge-graph intent extraction, and TIFIN needed a repeatable way to keep improving model quality without ballooning compute costs.

Our approach

We built end-to-end AI training and inference pipelines across the Qwen, LLaMA, and GPT model families, with advanced caching and vLLM-based serving for real-time financial insights. We applied Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO) to train reasoning models for multi-step portfolio optimization. GRPO, which estimates advantages by normalizing rewards within each group of sampled completions, reached 93% accuracy on knowledge-graph intent extraction. We also led supervised fine-tuning (SFT) on synthetic data generated by frontier models for conversational multi-agent systems.
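To make the group-normalized advantage idea concrete, here is a minimal sketch in Python. It is an illustration only, not TIFIN's training code: the function name, the reward values, and the choice of population standard deviation are all assumptions, and real GRPO implementations differ in details such as the variance estimator and the epsilon term.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions sampled for the same prompt.

    Each completion's advantage is its reward minus the group mean, divided by
    the group standard deviation. No learned value or reward model is involved.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rule-based rewards for four completions of a single prompt:
# 1.0 if the extracted intent matched the expected label, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_normalized_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Completions that score above their group's average receive a positive advantage and are reinforced, while those below average receive a negative one. Because the baseline comes from the group itself, a programmatic reward like the hypothetical exact-match check above is sufficient.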

Results

  • GRPO achieved 93% knowledge-graph intent extraction accuracy
  • GRPO provided 4x more gradient signal than DPO, with no reward-model overhead
  • Established scalable training cadences with continuous model improvement cycles
