FinGPT

Ecosystem

A connected open-source ecosystem spanning financial language models, reinforcement learning, trading systems, and AI agents.

FinGPT Architecture

A modular stack connecting financial applications, domain tasks, models, data engineering, and data sources.

Applications
Robo-advisor, Sentiment Analysis, Portfolio Optimization, Risk Management, Quantitative Trading, ESG Scoring, Fraud Detection, Credit Scoring, M&A Forecasting
Tasks
Summarization, NER, Information Extraction, Sentiment Analysis, Data Analysis, Numerical Reasoning, Intent Detection
LLMs
APIs: ChatGPT, Claude, Gemini, Mistral
Trainable: Llama3, ChatGLM3, Qwen, Falcon, InternLM
Methods: LoRA / QLoRA, RAG, Chain-of-Thought, RLSP
Data Engineering
Data Cleaning, Tokenization, Vector Embedding, Feature Extraction, Data Augmentation
Data Sources
News: Finnhub, Yahoo Finance, CNBC
Social: Twitter, Reddit, Weibo
Filings: SEC, NYSE, NASDAQ
Datasets: AShare, stocknet-dataset

FinGPT is part of the broader AI4Finance open-source ecosystem, connecting research innovation with deployable financial AI systems.

FinGPT-Benchmark

FinGPT uses instruction tuning to adapt open-source LLMs to financial tasks. This enables cost-effective fine-tuning for sentiment analysis, headline analysis, named entity recognition, and relation extraction under three paradigms: task-specific, multi-task, and zero-shot.
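
As a minimal sketch of what an instruction-tuning record might look like, the snippet below pairs a task instruction with an input text and a target answer. The two instruction strings are taken from the Instruction Construction list in this section. The field names (instruction, input, output), the prompt layout, and the sample sentence are assumptions made for illustration, not the project's actual data format.

```python
# Instruction strings copied from the Instruction Construction list in this section.
INSTRUCTIONS = {
    "SA": "What is the sentiment of this news? Please choose from {negative / neutral / positive}.",
    "Headline": "Does the news headline talk about price going up? Please choose from {Yes / No}.",
}


def build_record(task: str, text: str, answer: str) -> dict:
    """Pair a task instruction with an input text and its target answer.

    The field names are a hypothetical layout chosen for illustration.
    """
    return {"instruction": INSTRUCTIONS[task], "input": text, "output": answer}


def to_prompt(record: dict) -> str:
    """Render a record as the text a model would see before answering."""
    return f"Instruction: {record['instruction']}\nInput: {record['input']}\nAnswer: "


# Made-up example sentence; not drawn from any dataset.
record = build_record("SA", "Company X beats quarterly earnings estimates.", "positive")
print(to_prompt(record) + record["output"])
```

During fine-tuning, the model would be trained to continue the rendered prompt with the output field, so one record format can serve every task in the list.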

Tasks
SA: Sentiment Analysis
Headline: Headline Analysis
NER: Named Entity Recognition
RE: Relation Extraction
NER (CLS): NER Classification
RE (CLS): RE Classification

Instruction Construction
SA: What is the sentiment of this news? Please choose from {negative / neutral / positive}.
Headline: Does the news headline talk about price going up? Please choose from {Yes / No}.
NER: Find all entities in the input text. Answer with format "entity1: type1; entity2: type2".
RE: Extract the word/phrase pair and the corresponding lexical relationship from the input text.
NER (CLS): What is the entity type of 'Bank' in the input sentence? Options: person, location, organization.
RE (CLS): Choose the right relationship between 'Apple Inc' and 'Steve Jobs'. Options: industry, founded by, owner of...

Base Models
Llama3, ChatGLM3, BLOOM, Falcon, MPT, Qwen

Instruction Tuning Paradigm
Step 1: Task-Specific Instruction Tuning

Each task trains its own model independently.

Task 1 -> Base Model 1 -> Response 1
Task 2 -> Base Model 2 -> Response 2
Task 3 -> Base Model 3 -> Response 3

Step 2: Multi-Task Instruction Tuning

All tasks jointly train a single shared model.

Task 1, Task 2, Task 3 -> Shared Model -> Response 1, Response 2, Response 3

Step 3: Zero-Shot Instruction Tuning

Hold out one task, train on the others, and test zero-shot transfer on the held-out task.

Task 1, Task 2, Task 3 (held out) -> Base Model -> Response 1, Response 2, Response 3 (zero-shot)
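
The three paradigms above differ mainly in how training examples are grouped. The sketch below shows one way to produce those groupings from a per-task dataset. The toy dataset and the function names are hypothetical, and the code only builds the splits; it does not train anything.

```python
# Hypothetical toy dataset: task name -> list of training examples.
DATA = {
    "SA": ["sa_1", "sa_2"],
    "NER": ["ner_1", "ner_2"],
    "RE": ["re_1"],
}


def task_specific(data):
    """One training set per task; each would fine-tune its own model."""
    return {task: list(examples) for task, examples in data.items()}


def multi_task(data):
    """One pooled training set; a single shared model sees every task."""
    return [ex for examples in data.values() for ex in examples]


def zero_shot(data, held_out):
    """Train on every task except `held_out`; evaluate only on `held_out`."""
    train = [
        ex
        for task, examples in data.items()
        if task != held_out
        for ex in examples
    ]
    test = list(data[held_out])
    return train, test


train, test = zero_shot(DATA, held_out="RE")
print(len(task_specific(DATA)), "task-specific training sets")
print(len(multi_task(DATA)), "pooled multi-task examples")
print("zero-shot train:", train, "| test:", test)
```

In the zero-shot split, the held-out task contributes no training examples, so any score on it reflects transfer from the other tasks.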

FinNLP — Data Curation

FinGPT's data pipeline covers financial news, social media, filings, and research datasets. It provides feature engineering, data cleaning, and unified data access across 30+ providers.

NLP Data Sources

Financial text data from news, social media, regulatory filings, and research datasets.

News
Yahoo Finance, Reuters, SeekingAlpha, PennyStocks, MarketWatch, CNBC, The Fly, TalkMarkets, Alliance News, GuruFocus, InvestorPlace, TipRanks, Finnhub, Akshare, Eastmoney, Sina, Tushare
Social Media
Twitter, Reddit, Weibo, Xueqiu, StockTwits, Eastmoney, Facebook
Filings
SEC, Juchao
Research Datasets
AShare, CHRNN, FiQA, Stocknet, Trade The Event, FPB

Feature Engineering

Fundamental Features
Financial Ratios, Assets, Liabilities, Sales
Market Features
Open, High, Low, Close, Volume
Analytics Features
News Sentiment
Alternative Features
Social Media, ESG, Google Trends

FinGPT-RAG

A retrieval-augmented generation framework for financial sentiment analysis. Most financial news lacks adequate context, so FinGPT-RAG combines instruction tuning with multi-source knowledge retrieval to fill context gaps and add depth.

By integrating external knowledge retrieval, the LLMs respond more accurately to financial sentiment analysis tasks, achieving performance improvements of 15% to 48% in accuracy and F1 scores.

RAG Pipeline

End-to-end flow from knowledge retrieval to instruction-tuned inference.

Retrieval-Augmented Generation
1. Multi-Source Knowledge Querying
2. Similarity-based Retrieval
Prompt Construction
1. Prompt with Query
2. Retrieved Context (Full Context)
LLM Call
1. Inference
2. Training
Instruction Tuning
1. Supervised Sentiment Analysis Dataset
2. Instruction-Following Data Construction
3. Base Model Selection (Llama2-7B, ChatGLM2-6B, etc.)
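
The similarity-based retrieval step can be illustrated with a small sketch. The page does not say which similarity measure FinGPT-RAG uses, so the bag-of-words cosine similarity below is a stand-in chosen for simplicity. The knowledge snippets and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=1):
    """Return the `top_k` documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical knowledge snippets standing in for multi-source query results.
DOCS = [
    "Energizer rating raised by JPMorgan analysts",
    "Oil prices slip as inventories rise",
    "Central bank holds interest rates steady",
]

print(retrieve("Energizer shakes off JPMorgan bear call", DOCS))
```

A production system would more likely rank candidates with dense vector embeddings, but the ranking logic stays the same: score each candidate against the query and keep the top results as context.
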
Financial Knowledge Sources

Multi-source retrieval from news, research platforms, and social media for richer context.

News Sources
Bloomberg, Reuters, Yahoo Finance, CNBC, MarketWatch
Research Platforms
Goldman Sachs Marquee, Citi Velocity, Seeking Alpha
Social Media
Twitter, Reddit, StockTwits, Weibo
RAG in Action: Sentiment Analysis Example

Without RAG
Input: "$ENR - Energizer shakes off JPMorgan's bear call."
Model: Instruction-tuned LLM
Prediction: Neutral

With RAG
Input: "$ENR - Energizer shakes off JPMorgan's bear call."
Retrieval: Multi-source retrieval
Model: Instruction-tuned LLM
Prediction: Positive

Retrieved Context

"JPMorgan hikes Energizer Holdings (NYSE:ENR) to a Neutral rating from Underweight... We came away encouraged by some of the company's initiatives and believe their focus on innovation and brand investment can lead to relative outperformance going forward... Shares of Energizer are 0.46% premarket to $50.44."

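The difference between the two cases above comes down to what the prompt contains. The sketch below assembles a prompt with and without retrieved context, using the sentiment instruction from the FinGPT-Benchmark section and the Energizer example. The prompt layout (Instruction / Context / Input / Answer) and the function name are assumptions for illustration, and no model is called.

```python
INSTRUCTION = (
    "What is the sentiment of this news? "
    "Please choose from {negative / neutral / positive}."
)


def build_prompt(query, context=None):
    """Assemble the instruction, optional retrieved context, and the query."""
    parts = [f"Instruction: {INSTRUCTION}"]
    if context:
        # Retrieved text is placed before the input so the model reads it first.
        parts.append(f"Context: {context}")
    parts.append(f"Input: {query}")
    parts.append("Answer:")
    return "\n".join(parts)


query = "$ENR - Energizer shakes off JPMorgan's bear call."
context = (
    "JPMorgan hikes Energizer Holdings (NYSE:ENR) "
    "to a Neutral rating from Underweight..."
)

print(build_prompt(query))
print()
print(build_prompt(query, context))
```

With the context line present, the model can see that the rating was raised, which is the information the bare headline leaves out.
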
RAG Performance — Twitter Validation Dataset

Accuracy and F1 scores with and without retrieval-augmented generation.

Model                 Accuracy   F1
ChatGPT 4.0 w/o RAG   0.788      0.652
ChatGPT 4.0 w/ RAG    0.813      0.708
FinGPT w/o RAG        0.863      0.811
FinGPT w/ RAG         0.881      0.842
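
To read the table as per-model gains, the short sketch below computes the absolute change in accuracy and F1 when RAG is added. The scores are copied from the table above. The dictionary layout and function name are illustrative choices, and the calculation covers only these four rows.

```python
# Scores copied from the table above, stored as (accuracy, F1).
SCORES = {
    "ChatGPT 4.0": {"without": (0.788, 0.652), "with": (0.813, 0.708)},
    "FinGPT": {"without": (0.863, 0.811), "with": (0.881, 0.842)},
}


def gains(model):
    """Absolute accuracy and F1 change from adding RAG, rounded to 3 places."""
    acc0, f1_0 = SCORES[model]["without"]
    acc1, f1_1 = SCORES[model]["with"]
    return round(acc1 - acc0, 3), round(f1_1 - f1_0, 3)


for model in SCORES:
    acc_gain, f1_gain = gains(model)
    print(f"{model}: accuracy {acc_gain:+.3f}, F1 {f1_gain:+.3f}")
```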