Benchmarks

FinGPT benchmarks demonstrate domain performance, cost efficiency, and practical readiness for financial AI applications.

Zero-shot Evaluation on Financial PhraseBank

Model            Accuracy  F1
Instruct-FinGPT  0.76      0.74
ChatGPT 4.0      0.64      0.51
Llama-7B         0.60      0.40
ChatGLM2-6B      0.47      0.40
BloombergGPT     N/A       0.51
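Accuracy and F1 above are standard classification metrics over the three sentiment labels. A minimal sketch of how such an evaluation could be scored, assuming macro-averaged F1; the gold labels and predictions below are illustrative, not Financial PhraseBank data:

```python
LABELS = ["negative", "neutral", "positive"]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for label in labels:
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Illustrative labels only (not the paper's data).
gold = ["positive", "neutral", "negative", "neutral"]
pred = ["positive", "neutral", "neutral", "neutral"]
print(accuracy(gold, pred))  # 0.75
```

Macro averaging weights every class equally, which matters on Financial PhraseBank because neutral examples dominate the label distribution.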

Training Cost Comparison Between LLMs

Model            Devices      Time      Cost
Instruct-FinGPT  8 × A100     2 hours   $65.6
ChatGLM2         64 × A100    2.5 days  $14,976
BloombergGPT     512 × A100   53 days   $2.67M
Llama3           2048 × A100  21 days   $4.23M

* Based on $4.1 per GPU per hour (A100)
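The cost column follows directly from GPU-hours. A minimal sketch of that arithmetic, using the $4.1/GPU-hour rate from the footnote above:

```python
A100_RATE = 4.1  # USD per GPU per hour (from the table footnote)

def training_cost(gpus: int, hours: float, rate: float = A100_RATE) -> float:
    """Total cost = number of GPUs x wall-clock hours x hourly rate."""
    return gpus * hours * rate

print(training_cost(8, 2))           # Instruct-FinGPT: 8 GPUs for 2 hours
print(training_cost(512, 53 * 24))   # BloombergGPT: 512 GPUs for 53 days
print(training_cost(2048, 21 * 24))  # Llama3: 2048 GPUs for 21 days
```

Running this reproduces the Instruct-FinGPT ($65.6), BloombergGPT (~$2.67M), and Llama3 (~$4.23M) figures, which underscores the roughly four-orders-of-magnitude gap between fine-tuning and pretraining from scratch.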

FinGPT training cost: $65.6 vs $2.67M (BloombergGPT) vs $4.23M (Llama3)
Higher Performance: Domain-tuned models outperform general-purpose LLMs on financial tasks.
Lower Training Cost: Fine-tune for ~$65 versus millions for training from scratch.
Better Adaptation: Instruction tuning enables rapid adaptation to new financial datasets.
Production Ready: Strong fit for both research exploration and production deployment.

FinGPT-Benchmark

FinGPT uses instruction tuning to adapt open-source LLMs for financial tasks, enabling cost-effective fine-tuning across sentiment analysis, named entity recognition, and more under task-specific, multi-task, and zero-shot paradigms.

Tasks
SA: Sentiment Analysis
Headline: Headline Analysis
NER: Named Entity Recognition
RE: Relation Extraction
NER (CLS): NER Classification
RE (CLS): RE Classification

Instruction Construction

SA: What is the sentiment of this news? Please choose from {negative / neutral / positive}.

Headline: Does the news headline talk about price going up? Please choose from {Yes / No}.

NER: Find all entities in the input text. Answer with the format "entity1: type1; entity2: type2".

RE: Extract the word/phrase pair and the corresponding lexical relationship from the input text.

NER (CLS): What is the entity type of 'Bank' in the input sentence? Options: person, location, organization.

RE (CLS): Choose the right relationship between 'Apple Inc' and 'Steve Jobs'. Options: industry, founded by, owner of...
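Turning these templates into training samples can be sketched as follows; the instruction/input/output record layout is a common convention for instruction tuning and an assumption here, not FinGPT's exact schema:

```python
# Task templates quoted from the prompts above; the record format is illustrative.
TEMPLATES = {
    "SA": "What is the sentiment of this news? "
          "Please choose from {negative / neutral / positive}.",
    "Headline": "Does the news headline talk about price going up? "
                "Please choose from {Yes / No}.",
    "NER": 'Find all entities in the input text. '
           'Answer with the format "entity1: type1; entity2: type2".',
}

def build_sample(task: str, text: str, answer: str) -> dict:
    """Pair a task instruction with an input text and its gold answer."""
    return {"instruction": TEMPLATES[task], "input": text, "output": answer}

sample = build_sample("SA", "Shares rallied after strong quarterly earnings.", "positive")
print(sample["instruction"])
```

Keeping the instruction separate from the input lets the same financial text be reused across tasks by swapping the template.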

Base Models

Llama3, ChatGLM3, BLOOM, Falcon, MPT, Qwen
Instruction Tuning Paradigm

Step 1: Task-Specific Instruction Tuning

Each task trains its own model independently.

Task 1 → Base Model 1 → Response 1
Task 2 → Base Model 2 → Response 2
Task 3 → Base Model 3 → Response 3

Step 2: Multi-Task Instruction Tuning

All tasks train a single shared model jointly.

Task 1 + Task 2 + Task 3 → Shared Model → Responses 1, 2, 3

Step 3: Zero-Shot Instruction Tuning

Hold out one task, train on the others, and test zero-shot transfer.

Task 1 + Task 2 → Base Model → Responses 1, 2
Task 3 (held out) → Response 3 (zero-shot)
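Step 3's leave-one-task-out protocol can be sketched as a simple split; the task names mirror the diagram and the helper function is hypothetical, not FinGPT's API:

```python
TASKS = ["Task 1", "Task 2", "Task 3"]

def leave_one_out(tasks: list[str], held_out: str) -> tuple[list[str], str]:
    """Train on every task except `held_out`; evaluate on it zero-shot."""
    train = [t for t in tasks if t != held_out]
    return train, held_out

# Rotate the held-out task to measure zero-shot transfer from each split.
for held in TASKS:
    train, test = leave_one_out(TASKS, held)
    print(f"train on {train}, zero-shot test on {test}")
```

Rotating the held-out task over all of them gives one zero-shot transfer score per task, which is how generalization beyond the training mix is typically assessed.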