Benchmarks
FinGPT benchmarks measure domain performance (zero-shot sentiment classification on Financial PhraseBank) and cost efficiency (training cost compared with other LLMs) for financial AI applications.
Zero-shot Evaluation on Financial PhraseBank
| Model | Accuracy | F1 |
|---|---|---|
| Instruct-FinGPT | 0.76 | 0.74 |
| ChatGPT 4.0 | 0.64 | 0.51 |
| Llama-7B | 0.60 | 0.40 |
| ChatGLM2-6B | 0.47 | 0.40 |
| BloombergGPT | — | 0.51 |
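The table does not state how free-text answers are mapped to labels or how F1 is averaged. The following is a minimal sketch, assuming that an answer is mapped to the first sentiment keyword it contains and that F1 is macro-averaged over the three Financial PhraseBank classes. Both choices are assumptions, not FinGPT's documented evaluation code. Answers that match no keyword count as errors.

```python
from typing import List, Optional, Sequence

# Financial PhraseBank's three sentiment classes.
LABELS = ("negative", "neutral", "positive")


def parse_label(answer: str) -> Optional[str]:
    """Map a free-text model answer to a label (assumed rule: the first
    entry of LABELS whose keyword occurs in the lowercased answer)."""
    text = answer.lower()
    for label in LABELS:
        if label in text:
            return label
    return None  # unparseable answers are scored as wrong


def accuracy(gold: Sequence[str], pred: Sequence[Optional[str]]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[Optional[str]]) -> float:
    """Unweighted mean of per-class F1 over LABELS."""
    scores: List[float] = []
    for label in LABELS:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data for illustration only (not Financial PhraseBank results).
gold = ["positive", "neutral", "negative", "neutral"]
answers = ["Positive.", "The sentiment is neutral", "positive", "I cannot tell"]
pred = [parse_label(a) for a in answers]
print(accuracy(gold, pred), round(macro_f1(gold, pred), 4))  # → 0.5 0.4444
```

Macro F1 can fall well below accuracy when a class is never predicted correctly, as with "negative" in the toy data. A weighted average would give a different number, so the averaging choice matters when comparing against the table.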
Training Cost Comparison Between LLMs
| Model | Devices | Training time | Cost* |
|---|---|---|---|
| Instruct-FinGPT | 8 × A100 | 2 hours | $65.6 |
| ChatGLM2 | 64 × A100 | 2.5 days | $14,976 |
| BloombergGPT | 512 × A100 | 53 days | $2.67M |
| Llama3 | 2048 × A100 | 21 days | $4.23M |
* Based on $4.1 per GPU per hour (A100)
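The footnote implies that cost = number of GPUs × wall-clock hours × $4.1. This short sketch recomputes each row under that assumption. The Instruct-FinGPT, BloombergGPT, and Llama3 rows reproduce. The ChatGLM2 row does not: 64 GPUs × 60 h = 3,840 GPU-hours, which at $4.1 gives $15,744. The listed $14,976 corresponds to $3.9 per GPU-hour.

```python
RATE_PER_GPU_HOUR = 4.1  # USD per A100 GPU-hour, from the footnote

# (model, A100 count, wall-clock hours, cost as listed in the table)
ROWS = [
    ("Instruct-FinGPT", 8, 2, 65.6),
    ("ChatGLM2", 64, 2.5 * 24, 14_976),
    ("BloombergGPT", 512, 53 * 24, 2.67e6),
    ("Llama3", 2048, 21 * 24, 4.23e6),
]


def training_cost(devices: int, hours: float, rate: float = RATE_PER_GPU_HOUR) -> float:
    """Cost = number of GPUs x wall-clock hours x hourly price per GPU."""
    return devices * hours * rate


for model, devices, hours, listed in ROWS:
    gpu_hours = devices * hours
    cost = training_cost(devices, hours)
    print(f"{model:<16} {gpu_hours:>12,.0f} GPU-h  "
          f"computed ${cost:>14,.2f}  listed ${listed:>14,.2f}")
```

The formula counts only GPU rental time. It does not account for storage, networking, or reruns.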
FinGPT-Benchmark
FinGPT uses instruction tuning to adapt open-source LLMs to financial tasks. This enables cost-effective fine-tuning across sentiment analysis, entity recognition, relation extraction, and other tasks under three paradigms: task-specific, multi-task, and zero-shot.
| Task | Example instruction |
|---|---|
| Sentiment analysis | What is the sentiment of this news? Please choose from {negative / neutral / positive}. |
| Headline classification | Does the news headline talk about price going up? Please choose from {Yes / No}. |
| Named entity recognition | Find all entities in the input text. Answer with format "entity1: type1; entity2: type2". |
| Relation extraction | Extract the word/phrase pair and the corresponding lexical relationship from the input text. |
| Entity type classification | What is the entity type of 'Bank' in the input sentence? Options: person, location, organization. |
| Relation classification | Choose the right relationship between 'Apple Inc' and 'Steve Jobs'. Options: industry, founded by, owner of... |
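Each instruction is paired with an input text and a target answer to form one fine-tuning example. The following is a minimal sketch for the entity-recognition prompt. The prompt template and the `build_sample`, `format_entities`, and `parse_entities` helpers are hypothetical and are not FinGPT's actual format. The answer format `entity1: type1; entity2: type2` comes from the instruction itself, and the entity types come from the entity-type prompt.

```python
from typing import Dict, List, Tuple

# Hypothetical prompt template; FinGPT's actual template may differ.
TEMPLATE = "Instruction: {instruction}\nInput: {input}\nAnswer: "

NER_INSTRUCTION = (
    'Find all entities in the input text. '
    'Answer with format "entity1: type1; entity2: type2".'
)


def build_sample(instruction: str, text: str, answer: str) -> Dict[str, str]:
    """Pack one labelled example into a prompt/completion pair."""
    return {"prompt": TEMPLATE.format(instruction=instruction, input=text),
            "completion": answer}


def format_entities(entities: List[Tuple[str, str]]) -> str:
    """Serialize (entity, type) pairs as 'entity1: type1; entity2: type2'."""
    return "; ".join(f"{name}: {etype}" for name, etype in entities)


def parse_entities(answer: str) -> List[Tuple[str, str]]:
    """Parse a model answer back into (entity, type) pairs, skipping malformed chunks."""
    pairs = []
    for chunk in answer.split(";"):
        name, sep, etype = chunk.rpartition(":")  # split on the last colon
        if sep and name.strip() and etype.strip():
            pairs.append((name.strip(), etype.strip()))
    return pairs


sample = build_sample(
    NER_INSTRUCTION,
    "Steve Jobs co-founded Apple Inc in 1976.",
    format_entities([("Steve Jobs", "person"), ("Apple Inc", "organization")]),
)
print(sample["completion"])  # → Steve Jobs: person; Apple Inc: organization
print(parse_entities(sample["completion"]))
```

Serializing and parsing through the same fixed format means training targets and model outputs can be scored with one exact-match comparison on (entity, type) pairs.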
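| Paradigm | Training setup |
|---|---|
| Task-specific | Each task trains its own model independently. |
| Multi-task | All tasks train a single shared model jointly. |
| Zero-shot | One task is held out; the model trains on the remaining tasks and is tested on the held-out task to measure zero-shot transfer. |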
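The three paradigms differ only in which tasks feed each training run and which task each run is evaluated on. This sketch enumerates the resulting run plans using hypothetical task names. It covers only the split logic; the FinGPT-Benchmark datasets and fine-tuning code are not reproduced here.

```python
from typing import Iterator, List, Tuple

# Hypothetical task names standing in for the benchmark's datasets.
TASKS = ["sentiment", "headline", "ner", "relation_extraction"]

Split = Tuple[str, List[str], List[str]]  # (run name, train tasks, eval tasks)


def task_specific(tasks: List[str]) -> Iterator[Split]:
    """One independent model per task: train and evaluate on the same task."""
    for t in tasks:
        yield (f"single-{t}", [t], [t])


def multi_task(tasks: List[str]) -> Iterator[Split]:
    """One shared model trained jointly on every task, evaluated on each."""
    yield ("multi", list(tasks), list(tasks))


def zero_shot(tasks: List[str]) -> Iterator[Split]:
    """Leave one task out: train on the rest, evaluate on the held-out task."""
    for held_out in tasks:
        train = [t for t in tasks if t != held_out]
        yield (f"zero-shot-{held_out}", train, [held_out])


for paradigm in (task_specific, multi_task, zero_shot):
    for name, train, evaluate in paradigm(TASKS):
        print(f"{name:30} train={train} eval={evaluate}")
```

With N tasks, the task-specific and zero-shot paradigms each need N training runs, while multi-task needs only one.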