Benchmarks

FinGPT benchmarks demonstrate domain performance, cost efficiency, and practical readiness for financial AI applications.

Zero-shot Evaluation on Financial PhraseBank

Model            Accuracy  F1
Instruct-FinGPT  0.76      0.74
ChatGPT 4.0      0.64      0.51
Llama-7B         0.60      0.40
ChatGLM2-6B      0.47      0.40
BloombergGPT     N/A       0.51
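Accuracy and F1 above are standard classification metrics over the three sentiment labels. A minimal sketch of how such an evaluation could be scored, assuming macro-averaged F1; the gold labels and predictions below are illustrative, not Financial PhraseBank data:

```python
LABELS = ["negative", "neutral", "positive"]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for label in labels:
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Illustrative labels only (not the paper's data).
gold = ["positive", "neutral", "negative", "neutral"]
pred = ["positive", "neutral", "neutral", "neutral"]
print(accuracy(gold, pred))  # 0.75
```

Macro averaging weights every class equally, which matters on Financial PhraseBank because neutral examples dominate the label distribution.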

Training Cost Comparison Between LLMs

Model            Devices      Time      Cost
Instruct-FinGPT  8 × A100     2 hours   $65.6
ChatGLM2         64 × A100    2.5 days  $14,976
BloombergGPT     512 × A100   53 days   $2.67M
Llama3           2048 × A100  21 days   $4.23M

* Based on $4.1 per GPU per hour (A100)
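The cost column follows directly from GPU-hours. A minimal sketch of that arithmetic, using the $4.1/GPU-hour rate from the footnote above:

```python
A100_RATE = 4.1  # USD per GPU per hour (from the table footnote)

def training_cost(gpus: int, hours: float, rate: float = A100_RATE) -> float:
    """Total cost = number of GPUs x wall-clock hours x hourly rate."""
    return gpus * hours * rate

print(training_cost(8, 2))           # Instruct-FinGPT: 8 GPUs for 2 hours
print(training_cost(512, 53 * 24))   # BloombergGPT: 512 GPUs for 53 days
print(training_cost(2048, 21 * 24))  # Llama3: 2048 GPUs for 21 days
```

Running this reproduces the Instruct-FinGPT ($65.6), BloombergGPT (~$2.67M), and Llama3 (~$4.23M) figures, which underscores the roughly four-orders-of-magnitude gap between fine-tuning and pretraining from scratch.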

FinGPT training cost: $65.6 vs $2.67M (BloombergGPT) vs $4.23M (Llama3)
Higher Performance: Domain-tuned models outperform general-purpose LLMs on financial tasks.
Lower Training Cost: Fine-tune for ~$65 versus millions for training from scratch.
Better Adaptation: Instruction tuning enables rapid adaptation to new financial datasets.
Production Ready: Strong fit for both research exploration and production deployment.

FinGPT-Benchmark

FinGPT uses instruction tuning to adapt open-source LLMs for financial tasks, enabling cost-effective fine-tuning across sentiment analysis, named entity recognition, and more under task-specific, multi-task, and zero-shot paradigms.

Tasks
SA: Sentiment Analysis
Headline: Headline Analysis
NER: Named Entity Recognition
RE: Relation Extraction
NER (CLS): NER Classification
RE (CLS): RE Classification

Instruction Construction

SA: What is the sentiment of this news? Please choose from {negative / neutral / positive}.

Headline: Does the news headline talk about price going up? Please choose from {Yes / No}.

NER: Find all entities in the input text. Answer with the format "entity1: type1; entity2: type2".

RE: Extract the word/phrase pair and the corresponding lexical relationship from the input text.

NER (CLS): What is the entity type of 'Bank' in the input sentence? Options: person, location, organization.

RE (CLS): Choose the right relationship between 'Apple Inc' and 'Steve Jobs'. Options: industry, founded by, owner of...
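Turning these templates into training samples can be sketched as follows; the instruction/input/output record layout is a common convention for instruction tuning and an assumption here, not FinGPT's exact schema:

```python
# Task templates quoted from the prompts above; the record format is illustrative.
TEMPLATES = {
    "SA": "What is the sentiment of this news? "
          "Please choose from {negative / neutral / positive}.",
    "Headline": "Does the news headline talk about price going up? "
                "Please choose from {Yes / No}.",
    "NER": 'Find all entities in the input text. '
           'Answer with the format "entity1: type1; entity2: type2".',
}

def build_sample(task: str, text: str, answer: str) -> dict:
    """Pair a task instruction with an input text and its gold answer."""
    return {"instruction": TEMPLATES[task], "input": text, "output": answer}

sample = build_sample("SA", "Shares rallied after strong quarterly earnings.", "positive")
print(sample["instruction"])
```

Keeping the instruction separate from the input lets the same financial text be reused across tasks by swapping the template.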

Base Models

Llama3, ChatGLM3, BLOOM, Falcon, MPT, Qwen
Instruction Tuning Paradigm

Step 1: Task-Specific Instruction Tuning

Each task trains its own model independently.

Task 1 → Base Model 1 → Response 1
Task 2 → Base Model 2 → Response 2
Task 3 → Base Model 3 → Response 3

Step 2: Multi-Task Instruction Tuning

All tasks train a single shared model jointly.

Task 1 + Task 2 + Task 3 → Shared Model → Responses 1, 2, 3

Step 3: Zero-Shot Instruction Tuning

Hold out one task, train on the others, and test zero-shot transfer.

Task 1 + Task 2 → Base Model → Responses 1, 2
Task 3 (held out) → Response 3 (zero-shot)
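Step 3's leave-one-task-out protocol can be sketched as a simple split; the task names mirror the diagram and the helper function is hypothetical, not FinGPT's API:

```python
TASKS = ["Task 1", "Task 2", "Task 3"]

def leave_one_out(tasks: list[str], held_out: str) -> tuple[list[str], str]:
    """Train on every task except `held_out`; evaluate on it zero-shot."""
    train = [t for t in tasks if t != held_out]
    return train, held_out

# Rotate the held-out task to measure zero-shot transfer from each split.
for held in TASKS:
    train, test = leave_one_out(TASKS, held)
    print(f"train on {train}, zero-shot test on {test}")
```

Rotating the held-out task over all of them gives one zero-shot transfer score per task, which is how generalization beyond the training mix is typically assessed.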