This tutorial details how to build a GraphRAG (Graph-based Retrieval-Augmented Generation) system for economic data analysis, focusing on combining World Bank data with unstructured reports.
Introduction
In today’s data-driven world, economic analysts are inundated with information in many forms, which makes it challenging to extract valuable insights scattered across structured databases and unstructured documents. While the World Bank’s World Development Indicators (WDI) provide rich quantitative data, the context and explanations for economic trends often lie within IMF reports, OECD analyses, and policy papers. Traditional Retrieval-Augmented Generation (RAG) systems struggle to connect these disparate information sources effectively.
GraphRAG, or Ontology-based RAG (OBR), improves on this approach by using knowledge graphs to create explicit relationships between structured data points and unstructured textual information. This integration enables sophisticated economic analysis that would otherwise be difficult and time-consuming with conventional methods.

Why GraphRAG for Economic Analysis?
Limitations of Traditional RAG
Traditional RAG systems face several challenges when dealing with economic data:
- Relationship Blindness: Questions like “What economic policies contributed to Brazil’s GDP growth in 2020, and how do they compare to Argentina’s approach?” require understanding complex relationships between countries, policies, and indicators that traditional RAG cannot easily traverse.
- Context Fragmentation: Economic indicators in isolation provide limited insight. Understanding why inflation spiked in a particular country requires connecting quantitative data with policy decisions, external shocks, and historical context found in reports.
- Multi-hop Reasoning: Analyzing regional economic patterns or policy spillover effects requires connecting multiple data points and documents that may not be explicitly linked in traditional systems.
Think of it as creating a comprehensive map of economic relationships. Traditional methods are like having separate city maps for different neighborhoods, which is useful individually but lacking the connections between areas. GraphRAG creates the complete metropolitan map, showing how economic indicators in one country relate to policy decisions, how regional trends connect across borders, and how institutional analyses provide context for quantitative patterns.

GraphRAG Advantages
GraphRAG addresses these limitations through several key innovations:
- Creating Explicit Relationships: Connecting countries, indicators, time periods, policies, and events in a structured graph.
- Enabling Complex Queries: Supporting questions that require traversing multiple relationships and data sources.
- Providing Provenance: Offering clear paths from questions to source data and documents.
Now let’s get to the fun part: building a GraphRAG system that incorporates real-world data with economic and policy analysis to answer complex questions.
Tutorial: Building an Economic Analysis GraphRAG System

Let’s build a practical GraphRAG system that combines World Bank WDI data with unstructured economic reports and analyses. If you’d like to copy the demo you can find the gitlab repo here: GraphRAG Tutorial Repo.
Prerequisites and Setup
Before we begin, let’s understand what tools we’ll be using and why:
- Neo4j: Graph database for storing entities and relationships.
- Qdrant: Vector database for semantic search over documents (alternatives: Milvus, Weaviate, Elasticsearch).
- spaCy: Natural language processing for entity extraction.
- LangChain: Framework for local LLM integration and text processing; we will also show the Google Cloud integration to use Gemini.
- World Bank API: Source for structured economic data.
# Install required packages
pip install neo4j pandas requests langchain langchain-ollama python-dotenv spacy transformers sentence-transformers qdrant-client google-genai pdfplumber
# download the spaCy english model pipeline
python -m spacy download en_core_web_sm
Best Practices for File Organization and Environment Configuration
While this tutorial consolidates all code into a single file for simplicity, production applications should follow modular design principles. Splitting your project into separate files improves debugging, testing, and maintainability. These are essential practices for scalable data applications.
For a real-world example of proper file organization, check out the Data Sense GitLab repository, which demonstrates how to structure your project files effectively.
Managing API Keys and Configuration with Environment Variables
This project uses environment variables to securely store sensitive information like API keys and database URLs. We use the python-dotenv library to load configuration data from an environment file, which should contain:
- Neo4j database credentials and connection URLs
- Qdrant vector database API keys and endpoints
- Google API keys and service URLs
Pro tip: Whether you’re using cloud services or local development environments, you can easily switch between configurations by updating the values in your environment file—no code changes required.
Throughout this tutorial, you’ll see these environment variables accessed using the following code pattern:
os.getenv("VARIABLE")
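For reference, an environment file for this tutorial might look like the following. The variable names match those used in the code; the values (and the example Ollama model names) are placeholders you should replace with your own:
NEO4J_URL=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-neo4j-password
QDRANT_ENDPOINT=https://your-qdrant-endpoint
QDRANT_API_KEY=your-qdrant-api-key
GOOGLE_API_KEY=your-google-api-key
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text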
Step 1: Environment Setup
Import Libraries
import pandas as pd
import numpy as np
from neo4j import GraphDatabase
import spacy
from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain.text_splitter import RecursiveCharacterTextSplitter
from google import genai
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import requests
import os
from typing import List, Dict, Tuple
import json
from datetime import datetime
import re
# Load environment variables
from dotenv import load_dotenv
load_dotenv()
Setting Up Gemini API for Your Project
To use Gemini, install the Google Cloud CLI and connect it to your Google Cloud project. Follow the official Google Cloud CLI installation guide for detailed setup instructions.
This tutorial requires the Gemini API for text processing and embeddings. While Vertex AI offers additional embedding models, we’ll use Gemini’s standard embedding model for simplicity and compatibility.
Quick Setup Steps:
- Install Google Cloud CLI
- Configure gcloud with your project credentials
- Enable Gemini API access
- Obtain your API key for authentication
Initialize our Core Components
# spaCy for Natural Language Processing (NLP) for entity extraction from documents
nlp = spacy.load("en_core_web_sm")
# Google client for Google's embedding model and Gemini LLM
google_client = genai.Client(api_key=os.getenv('GOOGLE_API_KEY'))
# Local embeddings and LLM with Ollama
ollama_embeddings = OllamaEmbeddings(model=os.getenv('OLLAMA_EMBEDDINGS_MODEL'), base_url=os.getenv('OLLAMA_BASE_URL'))
ollama_llm = OllamaLLM(model=os.getenv('OLLAMA_MODEL'), base_url=os.getenv('OLLAMA_BASE_URL'), temperature=0.1)  # Set the temperature low for more factual, consistent responses
# Database Connections
# Neo4j for graph storage; stores entities and relationships
neo4j_driver = GraphDatabase.driver(
os.getenv('NEO4J_URL'), auth=(os.getenv("NEO4J_USER"), os.getenv("NEO4J_PASSWORD"))
)
# Qdrant for vector storage; enables semantic search over documents
qdrant_client = QdrantClient(url=os.getenv('QDRANT_ENDPOINT'), port=6333, api_key=os.getenv('QDRANT_API_KEY'))
Why this setup?
- Dual storage approach: Graph database for structured relationships, vector database for semantic similarity.
- Low-temperature LLM: Reduces hallucinations for factual economic analysis.
- Compact embedding model: Balances quality with performance for production use.
Step 2: Collecting Structured Data from World Bank
Building a World Bank Data Collector for Economic Analytics
Now let’s create a data collector for the World Bank’s World Development Indicators (WDI), which will serve as our primary source of structured economic data. The WDI database contains over 1,400 time series indicators covering global development metrics including GDP, population, education, health, and environmental data across 200+ countries.
Understanding the World Bank API
The World Bank API provides access to thousands of economic indicators. Each API call follows this pattern:
- Base URL: https://api.worldbank.org/v2
- Structure: /country/{country_code}/indicator/{indicator_code}
- Format: JSON responses with metadata and data arrays
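As a quick sanity check before writing the collector, here is a minimal sketch of a single call for Brazil's GDP growth. The field names follow the API's documented JSON shape, which the collector below also relies on:
import requests

# Fetch GDP growth (annual %) for Brazil, 2010-2024
url = "https://api.worldbank.org/v2/country/BR/indicator/NY.GDP.MKTP.KD.ZG"
response = requests.get(url, params={"date": "2010:2024", "format": "json", "per_page": 1000})
metadata, observations = response.json()  # the API returns [metadata, data]
print(metadata["total"])  # total number of observations available
first = observations[0]
print(first["country"]["value"], first["date"], first["value"])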
Building the Data Collector
class WorldBankDataCollector:
"""
Collects data from World Bank WDI API.
This class handles API calls, error handling, and data normalization to create clean pandas dataframes for graph
construction.
"""
def __init__(self):
self.base_url = "https://api.worldbank.org/v2"
def get_indicators(self, countries: List[str], indicators: List[str],
start_year: int = 2010, end_year: int = 2024) -> pd.DataFrame:
"""
Get the WDI data for specified countries and indicators.
:param countries: List of countries to collect data for.
:param indicators: List of indicators to collect data for.
:param start_year: Beginning year to collect data for.
:param end_year: Ending year to collect data for.
:return: Pandas DataFrame with columns: country_code, country_name, indicator_code, indicator_name, year, value
"""
all_data = []
# Iterate through countries and indicators to construct API urls
for country in countries:
for indicator in indicators:
url = f"{self.base_url}/country/{country}/indicator/{indicator}"
params = {
'date': f"{start_year}:{end_year}",  # the WDI API expects the 'date' parameter for year ranges
'format': 'json',
'per_page': 1000,
}
try:
response = requests.get(url, params=params)
data = response.json()
# WB WDI API returns [metadata, data]
if len(data) > 1 and data[1]:
for row in data[1]:
# Extract relevant data
all_data.append({
'country_code': row['country']['id'],
'country_name': row['country']['value'],
'indicator_code': row['indicator']['id'],
'indicator_name': row['indicator']['value'],
'year': row['date'],
'value': row['value']
})
except requests.exceptions.RequestException as e:
print(f"Error fetching {indicator} for {country}: {e}")
return pd.DataFrame(all_data)
Selecting Key Economic Indicators for Country Analysis
Next, we’ll define a curated list of economic indicators that provide comprehensive insights into a country’s economic performance. The following key indicators offer a well-rounded view of economic health and development trends:
# Initialize our data collector
wb_collector = WorldBankDataCollector()
# Key economic indicators - these codes represent specific WDI metrics
indicators = [
'NY.GDP.MKTP.KD.ZG', # GDP growth (annual %) - economic growth
'FP.CPI.TOTL.ZG', # Inflation, consumer prices (annual %) - price stability
'SL.UEM.TOTL.ZS', # Unemployment, total (% of total labor force) - labor market
'NE.TRD.GNFS.ZS', # Trade (% of GDP) - economic openness
'GC.DOD.TOTL.GD.ZS' # Central government debt, total (% of GDP) - fiscal health
]
# Focus on Latin American countries for this example
# Using ISO 2-letter country codes
countries = [
'BR', # Brazil
'AR', # Argentina
'CL', # Chile
'CO', # Colombia
'MX', # Mexico
'PE' # Peru
]
We can test the WDI data collector with the following code to make sure it’s working and get a preview of the data.
# Fetch the data from the World Bank API
wb_collector = WorldBankDataCollector()
wdi_data = wb_collector.get_indicators(countries, indicators)
# Clean the data
wdi_data = wdi_data.dropna(subset=['value'])
# Print a preview
print("\nSample data:")
print(wdi_data.head())
print(wdi_data.describe())
Why these indicators?
- GDP Growth: Shows economic expansion/contraction over time.
- Inflation: Indicates monetary policy effectiveness and price stability.
- Unemployment: Reflects labor market health and social conditions.
- Trade: Shows economic integration and competitiveness.
- Government Debt: Indicates fiscal sustainability and policy space.
Data Quality Considerations
When working with World Bank economic indicators, several data quality factors require attention to ensure accurate analysis.
Handling Missing Data – World Bank datasets often contain gaps for specific countries or years. Rather than removing incomplete records, consider these data imputation techniques (sketched in code below):
- Forward fill (ffill) – carries last known value forward
- Backward fill (bfill) – uses next available value to fill gaps
- Rolling averages (rolling) – smooths data using neighboring time periods
While this tutorial removes NaN values for simplicity, production analyses should evaluate which imputation method best fits your dataset’s characteristics.
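As a minimal sketch of these options (assuming a wdi_data DataFrame, produced by the collector above, that still contains gaps):
# Sort so fills move forward/backward in time within each country-indicator series
wdi_data = wdi_data.sort_values(['country_code', 'indicator_code', 'year'])
by_series = wdi_data.groupby(['country_code', 'indicator_code'])['value']
wdi_data['value_ffill'] = by_series.ffill()  # carry the last known value forward
wdi_data['value_bfill'] = by_series.bfill()  # fill gaps with the next available value
wdi_data['value_rolling'] = by_series.transform(
    lambda s: s.rolling(window=3, min_periods=1).mean()  # 3-year rolling average
)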
Data Revision and Versioning – World Bank indicators undergo regular revisions as new information becomes available. Timestamping your data collections ensures reproducibility and tracks when specific values were captured.
Cross-Country Methodology Differences – Economic indicators may use varying calculation methodologies across countries, potentially limiting direct comparability. For this tutorial, we assume consistent methodology across our selected countries, though real-world analyses should account for these methodological differences when drawing conclusions.
Step 3: Processing Unstructured Documents
Integrating Unstructured Economic Data Sources
Now we’ll incorporate unstructured data sources including IMF reports, OECD analyses, and economic research papers. The primary challenge involves breaking these documents into meaningful chunks while preserving contextual relationships. These processed chunks become vector embeddings stored in our database, enabling efficient semantic search and retrieval.
Understanding Document Chunking for RAG Systems
Document chunking plays a critical role in GraphRAG implementation for several key reasons:
Token Limit Management – Large Language Models have strict context windows, requiring documents to be divided into digestible segments that fit within token constraints.
Enhanced Semantic Search – Chunking complete thoughts or concepts rather than arbitrary text blocks improves retrieval accuracy by maintaining semantic coherence within each segment.
Precise Information Retrieval – Smaller, focused chunks enable more targeted searches, allowing the system to surface exactly relevant information rather than entire documents.
Key Chunking Decisions
- Chunk Size (1000 chars): Large enough for context, small enough for precision
- Overlap (200 chars): Prevents important information from being split across chunks
- Separator Priority: Preserves document structure by preferring paragraph breaks
- Metadata Preservation: Each chunk retains source information for provenance
Defining the Document Collection and Chunking Class
Below we define our class to collect and chunk documents from various economic data sources.
class DocumentCollector:
"""
Processes unstructured documents into chunks suitable for graph construction
and vector embedding.
The chunking strategy balances between maintaining semantic coherence
and keeping chunks small enough for effective retrieval.
"""
def __init__(self):
# RecursiveCharacterTextSplitter tries different separators in order
# This preserves document structure better than simple character splitting
self.text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000, # Target chunk size in characters
chunk_overlap=200, # Overlap prevents context loss at boundaries
separators=[ # Try these separators in order:
"\n\n", # Paragraph breaks (preferred)
"\n", # Line breaks
". ", # Sentence endings
"! ", # Exclamations
"? " # Questions
]
)
def process_document(self, text: str, doc_metadata: Dict) -> List[Dict]:
"""
Split document into chunks with preserved metadata.
Args:
text: Raw document text content
doc_metadata: Document information (source, title, date, etc.)
Returns:
List of chunk dictionaries with content and metadata
"""
# Split text into chunks using our strategy
chunks = self.text_splitter.split_text(text)
processed_chunks = []
for i, chunk in enumerate(chunks):
# Create chunk with unique ID and inherited metadata
chunk_data = {
'content': chunk,
'doc_id': doc_metadata.get('doc_id'),
'chunk_id': f"{doc_metadata.get('doc_id')}_chunk_{i}",
'source': doc_metadata.get('source'), # IMF, OECD, etc.
'title': doc_metadata.get('title'),
'date': doc_metadata.get('date', ''),
'doc_type': doc_metadata.get('doc_type', 'report')
}
processed_chunks.append(chunk_data)
return processed_chunks
Sample Economic Documents for Analysis
For this tutorial, we’ll use sample economic documents that represent typical data sources encountered in financial analysis. These documents mirror real-world formats from institutions like the IMF, OECD, and central banks.
The GitLab repository and YouTube tutorial include an OCR function for PDF processing, enabling extraction of text from scanned economic reports and research papers.
# Example documents - in practice, you'd load these from PDFs or web scraping
sample_documents = [
{
'doc_id': 'imf_brazil_2022',
'title': 'IMF Article IV Consultation: Brazil 2022',
'source': 'IMF',
'date': '2022-07-15',
'doc_type': 'country_report',
'content': """
Brazil's economy showed resilience in 2021-2022 despite global challenges. GDP growth reached 4.6% in 2021, supported by fiscal stimulus and commodity prices. However, inflation pressures emerged, reaching 10.1% by end-2021, prompting aggressive monetary tightening by the Central Bank. The fiscal situation remains challenging with government debt at 88% of GDP. Key structural reforms in labor markets and pension systems have supported medium-term growth prospects. External vulnerabilities remain contained with adequate international reserves and flexible exchange rate regime.
"""
},
{
'doc_id': 'oecd_latam_2023',
'title': 'OECD Economic Outlook: Latin America 2023',
'source': 'OECD',
'date': '2023-03-20',
'doc_type': 'regional_analysis',
'content': """
Latin American economies face headwinds from global financial tightening and China's slowdown. Regional growth is projected to slow to 1.3% in 2023. Argentina continues to grapple with high inflation exceeding 100% and currency pressures. Chile's economy contracted due to social unrest impacts and mining sector challenges. Mexico benefits from nearshoring trends and strong US demand. Structural challenges include low productivity growth, income inequality, and climate adaptation needs across the region.
"""
}
]
Processing Documents into Chunks
Using our sample documents, we’ll process them into chunks with the DocumentCollector class and display a preview to understand the chunk structure and segmentation results.
# Process all documents into chunks
doc_collector = DocumentCollector()
all_chunks = []
print("Processing documents into chunks...")
for doc in sample_documents:
# Split each document and add to our collection
chunks = doc_collector.process_document(doc['content'], doc)
all_chunks.extend(chunks)
print(f"Document '{doc['title']}' split into {len(chunks)} chunks")
print(f"\nTotal chunks created: {len(all_chunks)}")
# Preview a chunk to understand the structure
print("\nSample chunk:")
sample_chunk = all_chunks[0]
for key, value in sample_chunk.items():
if key == 'content':
print(f"{key}: {value[:100]}...") # Truncate content for display
else:
print(f"{key}: {value}")
Example Chunk from IMF Report Processing
Below is an example chunk extracted from IMF reports using the pdfplumber library for PDF text extraction, demonstrating the chunk structure from real economic documents.
Sample chunk:
content: opportunity to test innovations. In line with industry trends to incorporate technology and
business model innovations, it is recommended that regulators consider adjusting the testing
and risk monito...
doc_id: 1THAEA2019001-1.pdf
chunk_id: 1THAEA2019001-1.pdf_chunk_164
source: IMF
title: Thailand: Financial System Stability Assessment; IMF Country Report No. 19/308; September 10, 2019
date: D:20191002123429-04'00'
author:
doc_type: Report
Production-Ready Document Processing Enhancements
In a production system, leverage pdfplumber for text extraction from PDF documents, adding an OCR step for scanned files. Alternatively, vision LLMs can handle complex document layouts, and we’ve included this functionality in the repository.
For automated document collection, implement a web scraping class to gather reports directly from institutional websites like the IMF, World Bank, or OECD.
Document Types to Consider
- Country Reports: IMF Article IV consultations and World Bank country studies.
- Regional Analyses: OECD regional outlooks and regional development bank reports.
- Policy Papers: Central bank communications and ministry publications.
- Research Papers: Academic studies and think tank analyses.
- News Articles: Financial Times, The Economist, Reuters, and Bloomberg articles.
Step 4: Designing the Knowledge Graph Schema
Designing the Graph Schema for Economic Analysis
The graph schema forms the foundation of our GraphRAG system, defining entity types and their relationships. A well-designed schema enables complex economic queries while maintaining intuitive navigation. We’ll use Neo4j to build our knowledge graph structure.
Core Entity Types for Economic Analysis
We’ll define our primary entity types based on economic analysis requirements, creating nodes and relationships that capture meaningful connections between economic concepts:
"""
Economic Knowledge Graph Schema
ENTITIES (Nodes):
- Country: Geographic entities (Brazil, Argentina, etc.)
- Indicator: Economic metrics (GDP Growth, Inflation, etc.)
- Year: Time periods for data points
- DataPoint: Specific country-indicator-year-value combinations
- Document: Report chunks from IMF, OECD, etc.
- EconomicConcept: Economic themes (fiscal policy, monetary policy, etc.)
- Event: Economic events (financial crisis, policy reform, etc.)
RELATIONSHIPS (Edges):
- (Country)-[:HAS_DATA_POINT]->(DataPoint)
- (DataPoint)-[:MEASURES]->(Indicator)
- (DataPoint)-[:FOR_YEAR]->(Year)
- (Document)-[:MENTIONS]->(Country)
- (Document)-[:DISCUSSES]->(EconomicConcept)
- (Document)-[:DESCRIBES]->(Event)
"""
Building the Graph Constructor
First, we establish database constraints for our Neo4j graph to ensure data integrity and optimize query performance:
class EconGraphBuilder:
"""
Constructs and populates the economic knowledge graph.
Class handles:
1) setting up graph constraints for data integrity.
2) Creating nodes from structured WDI data.
3) Extracting entities from unstructured documents.
4) Building relationships between all entities.
"""
def __init__(self, neo4j_driver = neo4j_driver):
self.driver = neo4j_driver
self.setup_constraints()
def setup_constraints(self):
"""
Create unique constraints to prevent duplicate entities.
Constraints ensure data integrity and improve query performance by creating indexes on frequently accessed properties.
"""
constraints = [
# Each country has a unique code (BR, AR, etc.)
"CREATE CONSTRAINT country_code IF NOT EXISTS FOR (c:Country) REQUIRE c.code IS UNIQUE",
# Each indicator has a unique code (NY.GDP.MKTP.KD.ZG, etc.)
"CREATE CONSTRAINT indicator_code IF NOT EXISTS FOR (i:Indicator) REQUIRE i.code IS UNIQUE",
# Each year is unique
"CREATE CONSTRAINT year_value IF NOT EXISTS FOR (y:Year) REQUIRE y.value IS UNIQUE",
# Each document chunk has unique ID
"CREATE CONSTRAINT document_id IF NOT EXISTS FOR (d:Document) REQUIRE d.chunk_id IS UNIQUE"
]
with self.driver.session() as session:
for constraint in constraints:
try:
session.run(constraint)
print(f"✓ Created constraint: {constraint.split('(')[1].split(')')[0]}")
except Exception as e:
print(f"⚠ Constraint might already exist: {e}")
Creating Nodes from Structured Data
Next, we’ll implement a helper function to categorize our economic indicators, enabling better organization and navigation within our knowledge graph:
def _categorize_indicator(self, indicator_code: str) -> str:
"""Categorize indicators for better organization"""
if 'GDP' in indicator_code or 'MKTP' in indicator_code:
return 'Growth'
elif 'CPI' in indicator_code or 'INF' in indicator_code:
return 'Inflation'
elif 'UEM' in indicator_code or 'EMP' in indicator_code:
return 'Employment'
elif 'TRD' in indicator_code or 'EXP' in indicator_code or 'IMP' in indicator_code:
return 'Trade'
elif 'DOD' in indicator_code or 'DEBT' in indicator_code:
return 'Fiscal'
else:
return 'Other'
Populating Knowledge Graph Nodes
We’ll now define the function to populate our knowledge graph nodes using the World Bank economic data collected earlier:
def create_structured_nodes(self, wdi_data: pd.DataFrame):
"""
Create nodes and relationships from World Bank WDI data.
This method transforms tabular data into a graph structure:
Country -> DataPoint -> Indicator
|
v
Year
"""
print("Creating structured data nodes...")
with self.driver.session() as session:
# 1) Create country nodes
countries = wdi_data[['country_code', 'country_name']].drop_duplicates()
print(f"Creating {len(countries)} country nodes...")
for _, row in countries.iterrows():
session.run(
"""
MERGE (c:Country {code: $code})
SET c.name = $name
""",
code=row['country_code'],
name=row['country_name']
)
# 2) Create indicator nodes
indicators = wdi_data[['indicator_code', 'indicator_name']].drop_duplicates()
print(f"Creating {len(indicators)} indicator nodes...")
for _, row in indicators.iterrows():
session.run(
"""
MERGE (i:Indicator {code: $code})
SET i.name = $name, i.category = $category
""",
code=row['indicator_code'],
name=row['indicator_name'],
category=self._categorize_indicator(row['indicator_code'])
)
# 3) Create Year Nodes
years = wdi_data['year'].dropna().unique()
print(f"Creating {len(years)} year nodes...")
for year in years:
session.run(
"""
MERGE (y:Year {value: $year})
""",
year=int(year)
)
# 4) Create DataPoint nodes and relationships
# -- filter out null values for clean data
valid_data = wdi_data.dropna(subset=['value'])
print(f"Creating {len(valid_data)} data points with relationships...")
for counter, (_, row) in enumerate(valid_data.iterrows()):
try:
# Create DataPoint with explicit property setting
session.run(
"""
MATCH (c:Country {code: $country_code})
MATCH (i:Indicator {code: $indicator_code})
MATCH (y:Year {value: $year})
MERGE (dp:DataPoint {
country_code: $country_code,
indicator_code: $indicator_code,
year: $year
})
SET dp.value = $value,
dp.last_updated = datetime()
MERGE (c)-[:HAS_DATA_POINT]->(dp)
MERGE (dp)-[:MEASURES]->(i)
MERGE (dp)-[:FOR_YEAR]->(y)
""",
country_code=row['country_code'],
indicator_code=row['indicator_code'],
year=int(row['year']),
value=float(row['value'])
)
except Exception as e:
print(f"Error processing row {i}: {e}")
print(f"Row data: {dict(row)}")
break
# Progress indicator for large datasets
if counter % 100 == 0:
print(f" Processed {counter}/{len(valid_data)} data points...")
# Final progress update
print(f" Processed {len(valid_data)}/{len(valid_data)} data points...")
print("✓ Structured data nodes created successfully")
Why this structure?
- Separation of Concerns: Countries, indicators, and years are separate entities that can be reused.
- Flexible Queries: Can easily find all data for a country and all years for an indicator.
- Data Integrity: Constraints prevent duplicate entities and ensure referential integrity.
- Performance: Unique constraints create indexes for fast lookups.
- Extensibility: Easy to add new indicators or countries without schema changes.
Key Design Decisions
- DataPoint as Central Entity: Represents the many-to-many relationship between countries, indicators, and years.
- Categorical Organization: Indicators are categorized for easier navigation and filtering.
- Timestamp Tracking: last_updated field helps with data freshness tracking.
- Flexible Value Storage: Values stored as floats accommodate various economic metrics.
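Once create_structured_nodes has been run (the full builder run appears at the end of this step), a quick sanity check like the sketch below confirms the node counts:
# Count nodes by label to verify the structured data loaded as expected
with neo4j_driver.session() as session:
    for label in ["Country", "Indicator", "Year", "DataPoint"]:
        count = session.run(f"MATCH (n:{label}) RETURN count(n) AS c").single()["c"]
        print(f"{label}: {count}")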
Creating Document Nodes for Unstructured Data
We still need to incorporate document nodes from our unstructured data sources into the knowledge graph. First, we’ll define a helper function to extract entities from text using spaCy’s natural language processing capabilities to identify relevant economic entities within documents.
def extract_entities_from_text(self, text: str) -> Tuple[List[str], List[str], List[str]]:
"""Extract countries, economic terms, and events from text"""
nlp = spacy.load("en_core_web_sm") # Used for extracting words from documents
doc = nlp(text)
countries = []
economic_terms = []
events = []
# Define some known countries to aid in detection including the countries for our analysis plus other major economies
known_countries = {'Brazil', 'Argentina', 'Chile', 'Colombia', 'Mexico', 'Peru', 'United States', 'United Kingdom', 'China', 'Germany', 'Japan', 'US', 'UK', 'France', 'Australia', 'Russia'}
# Predefined economic terms to look for extraction
economic_keywords = {
'gdp', 'inflation', 'unemployment', 'fiscal', 'monetary', 'debt', 'growth', 'recession', 'stimulus',
'reform', 'trade', 'exports', 'imports', 'deficit', 'surplus', 'policy', 'central bank', 'interest rates'
}
for ent in doc.ents:
if ent.label_ == "GPE": # Geopolitical entities
if ent.text in known_countries:
countries.append(ent.text)
elif ent.label_ in ["ORG", "EVENT"]:
events.append(ent.text)
# Extract economic terms
for token in doc:
if token.text.lower() in economic_keywords:
economic_terms.append(token.text.lower())
return list(set(countries)), list(set(economic_terms)), list(set(events))
Building Document Nodes and Relationships
Now we’ll create the function that generates document nodes for our knowledge graph. This process creates nodes for each text chunk and establishes relationships based on entity mentions including countries, economic concepts, organizations, and events.
The system uses spaCy’s named entity recognition to match labels within each chunk. For economic concepts, we define a curated list of relevant terms that help identify chunks containing specific economic discussions, enabling precise relationship mapping between documents and structured data.
def create_document_nodes(self, chunks: List[Dict]):
"""Create document nodes and extract entities"""
with self.driver.session() as session:
print("Processing document chunks into nodes...")
for i, chunk in enumerate(chunks):
# Create document chunk node
session.run(
"""
MERGE (d:Document {chunk_id: $chunk_id})
SET d.title = $title,
d.content = $content,
d.doc_id = $doc_id
""",
**chunk
)
# Extract and link entities
countries, economic_terms, events = self.extract_entities_from_text(chunk["content"])
# Link countries
for country in countries:
session.run(
"""
MATCH (d:Document {chunk_id: $chunk_id})
MATCH (c:Country {name: $country})
MERGE (d)-[:MENTIONS]->(c)
""",
chunk_id=chunk["chunk_id"],
country=country
)
# Create economic concept nodes
for term in economic_terms:
session.run(
"""
MATCH (d:Document {chunk_id: $chunk_id})
MERGE (ec:EconomicConcept {name: $term})
MERGE (d)-[:DISCUSSES]->(ec)
""",
chunk_id=chunk["chunk_id"],
term=term
)
# Create event nodes
for event in events:
session.run(
"""
MATCH (d:Document {chunk_id: $chunk_id})
MERGE (e:Event {name: $event})
MERGE (d)-[:DESCRIBES]->(e)
""",
chunk_id=chunk["chunk_id"],
event=event
)
# Progress tracking
if i % 100 == 0 or i == len(chunks) - 1: # Periodic progress update (always runs on the last chunk)
entities_found = len(countries) + len(economic_terms) + len(events)
print(f"Processed chunk {i + 1}/{len(chunks)} - Found {entities_found} entities")
print("Document nodes and entity relationships created successfully")
Step 5: Setting up the Vector Store
Congratulations if you’ve made it this far; we’re almost done! If you’re enjoying the tutorial, please like and subscribe to our posts on LinkedIn and YouTube. We appreciate your support!
Setting Up the Vector Store
For vector storage, we’ll use Qdrant vector database to enable semantic search across all document chunks. The system employs cosine similarity to measure text similarity between chunks, allowing for precise retrieval of contextually relevant economic information.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from langchain_ollama import OllamaEmbeddings
from google import genai
from google.genai.errors import ServerError
import os
from typing import List, Dict
from dotenv import load_dotenv
import time
import pickle
from pathlib import Path
from httpx import RemoteProtocolError
load_dotenv()
# Initialize the qdrant client
qdrant_client = QdrantClient(url=os.getenv('QDRANT_ENDPOINT'), port=6333, api_key=os.getenv('QDRANT_API_KEY'))
ollama_embeddings = OllamaEmbeddings(model=f"{os.getenv('OLLAMA_EMBEDDINGS_MODEL')}")
google_embeddings = genai.Client(api_key=os.getenv('GOOGLE_API_KEY'))
class VectorStoreManager:
def __init__(self, qdrant_client = qdrant_client, embeddings = "google"):
self.client = qdrant_client
self.embeddings = embeddings
self.collection_name = "economic_documents"
self.setup_collection()
def setup_collection(self):
"""Initialize Qdrant collection"""
try:
# We set the vector size to 3072 because that's what google embeddings returns, different embeddings models
# may return different vector sizes and this would need to be adjusted to accommodate if using a local model.
self.client.create_collection(
collection_name=self.collection_name,
vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
)
except Exception as e:
print(f"Collection might already exist: {e}")
def embed_documents(self, chunks: List[Dict]):
"""Embed document chunks and store in vector database"""
points = []
for i, chunk in enumerate(chunks):
# Create enriched content for embedding
enriched_content = f"""
Title: {chunk['title']}
Source: {chunk['source']}
Content: {chunk['content']}
"""
processed_embeddings = []
tries = True
num_tries = 0
embedding = ''
# Generate embedding, if the google api fails we want to try a couple more times before moving on to the next chunk.
if self.embeddings == "google":
while tries:
try:
num_tries += 1
embedding = google_embeddings.models.embed_content(
model="gemini-embedding-001",
contents=enriched_content
)
tries = False
except (ServerError, RemoteProtocolError) as e:
if num_tries == 5: # Move to next entry
tries = False
continue
print(f"Issue with Google Server: {e}")
time.sleep(30) # Pause for 30 seconds before trying to create the embeddings again
if not embedding:
continue # Skip this chunk if embedding generation failed after retries
processed_embeddings = embedding.embeddings[0].values
elif self.embeddings == "ollama":
processed_embeddings = ollama_embeddings.embed_query(enriched_content)
else:
print(f"Embeddings model not available for embedding type: {self.embeddings}.")
continue
# Create point for Qdrant
point = PointStruct(
id=i,
vector=processed_embeddings,
payload={
'chunk_id': chunk['chunk_id'],
'content': chunk['content'],
'title': chunk['title'],
'source': chunk['source'],
'date': chunk['date'],
'doc_type': chunk['doc_type']
}
)
points.append(point)
# Progress tracking
if i % 100 == 0 or i == len(chunks) - 1: # Periodic progress update (always runs on the last chunk)
print(f" Processed chunk {i + 1}/{len(chunks)}")
# Cache the embedded points locally for reuse (assumes a db/ directory exists)
with open(Path.cwd() / 'db' / 'points.p', 'wb') as f:
pickle.dump(points, f)
# Chunk the points to upsert into the Qdrant db to avoid issues with payload size
for i in range(0, len(points), 10):
points_to_upsert = points[i:i + 10]
# Upload to Qdrant
self.client.upsert(
collection_name=self.collection_name,
points=points_to_upsert
)
print(f"Embedded and stored {len(points)} document chunks")
return None
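With the class defined, embedding the document chunks from Step 3 is a single call. A sketch (pass embeddings="ollama" instead to stay fully local):
# Embed and store all document chunks in Qdrant
vector_store = VectorStoreManager(qdrant_client, embeddings="google")
vector_store.embed_documents(all_chunks)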
Step 6: GraphRAG Query System
Building the GraphRAG Query Interface
We’re almost finished! The final component is our GraphRAG query class, which retrieves relevant information and data to provide context for LLM responses.
Extracting Entities from User Questions
Before querying the knowledge graph, we must extract appropriate entities from user questions. This process combines NLP techniques with preset keyword matching to identify indicator codes, indicator categories, and economic concepts within queries.
class EconGraphRag:
"""
Combines the knowledge graph with the vector search to conduct the graphRAG
"""
def __init__(self, neo4j_driver = neo4j_driver, qdrant_client = qdrant_client,
provider = "google", llm_type = "google", collection_name = "economic_documents"):
self.driver = neo4j_driver
self.qdrant_client = qdrant_client
self.embeddings = google_client if provider == "google" else ollama_embeddings
self.provider = provider
self.llm = google_client if llm_type == "google" else ollama_llm
self.llm_type = llm_type
self.collection_name = collection_name
def extract_query_entities(self, question: str) -> Dict[str, List[str]]:
"""Extract entities and concepts from user question"""
nlp = spacy.load("en_core_web_sm")
query = nlp(question)
entities = {
'countries': [],
'indicator_codes': [],
'indicator_categories': [],
'years': [],
'concepts': []
}
# Extract entities
for ent in query.ents:
if ent.label_ == "GPE":
entities['countries'].append(ent.text.lower())
elif ent.label_ == "DATE":
# Simple year extraction
year_match = re.search(r'\b(19|20)\d{2}\b', ent.text)
if year_match:
entities['years'].append(year_match.group())
# World Bank WDI Indicator Code Mappings
wdi_indicators = {
# GDP Growth indicators
'gdp growth': 'NY.GDP.MKTP.KD.ZG', # GDP growth (annual %)
'economic growth': 'NY.GDP.MKTP.KD.ZG',
# Inflation indicators
'inflation': 'FP.CPI.TOTL.ZG', # Inflation, consumer prices (annual %)
'inflation rate': 'FP.CPI.TOTL.ZG',
'consumer price index': 'FP.CPI.TOTL.ZG', # Consumer price index (2010 = 100)
'cpi': 'FP.CPI.TOTL.ZG',
'price level': 'FP.CPI.TOTL.ZG',
# Employment indicators
'unemployment': 'SL.UEM.TOTL.ZS', # Unemployment, total (% of total labor force)
'unemployment rate': 'SL.UEM.TOTL.ZS',
'not working': 'SL.UEM.TOTL.ZS',
# Trade indicators
'trade': 'NE.TRD.GNFS.ZS', # Trade (% of GDP)
'trade balance': 'NE.TRD.GNFS.ZS',
'import and exports': 'NE.TRD.GNFS.ZS',
# Fiscal indicators
'government debt': 'GC.DOD.TOTL.GD.ZS', # Central government debt, total (% of GDP)
'debt': 'GC.DOD.TOTL.GD.ZS',
'public debt': 'GC.DOD.TOTL.GD.ZS',
}
# Category mappings for broader searches
indicator_categories = {
'growth': 'Growth',
'economic growth': 'Growth',
'inflation': 'Inflation',
'prices': 'Inflation',
'employment': 'Employment',
'unemployment': 'Employment',
'jobs': 'Employment',
'working': 'Employment',
'trade': 'Trade',
'exports': 'Trade',
'imports': 'Trade',
'trade balance': 'Trade',
'fiscal': 'Fiscal',
'debt': 'Fiscal',
'government': 'Fiscal',
'budget': 'Fiscal'
}
question_lower = question.lower()
# Check for specific indicator code matches
for indicator, code in wdi_indicators.items():
if indicator in question_lower:
entities['indicator_codes'].append(code)
entities['concepts'].append(indicator)
# Check for category matches
for indicator, category in indicator_categories.items():
if indicator in question_lower:
entities['indicator_categories'].append(category)
if indicator not in entities['concepts']:
entities['concepts'].append(indicator)
# Extract country names and codes that might not be caught by NER
common_countries = {
'us': 'united states', 'usa': 'united states', 'america': 'united states',
'uk': 'united kingdom', 'britain': 'united kingdom',
'china': 'china', 'prc': 'china',
'india': 'india',
'germany': 'germany',
'france': 'france',
'japan': 'japan',
'brazil': 'brazil',
'russia': 'russia',
'canada': 'canada',
'australia': 'australia',
'south korea': 'south korea', 'korea': 'south korea',
'mexico': 'mexico',
'italy': 'italy',
'spain': 'spain',
'netherlands': 'netherlands',
'switzerland': 'switzerland',
'sweden': 'sweden',
'norway': 'norway',
'denmark': 'denmark'
}
for country_variant, country_name in common_countries.items():
if country_variant in question_lower and country_name not in entities['countries']:
entities['countries'].append(country_name)
# Remove duplicates while preserving order
for key in entities:
entities[key] = list(dict.fromkeys(entities[key]))
return entities
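As a quick illustration of what the extractor produces, based on the keyword and NER logic above, a question about inflation might yield roughly the following:
graph_rag = EconGraphRag()
entities = graph_rag.extract_query_entities(
    "How does Brazil's inflation in 2021 compare to Mexico's?"
)
print(entities)
# Roughly: countries ['brazil', 'mexico'], indicator_codes ['FP.CPI.TOTL.ZG'],
# indicator_categories ['Inflation'], years ['2021'], concepts ['inflation']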
Querying the Knowledge Graph for Economic Context
Next, we’ll define the function to query our knowledge graph for comprehensive economic information. The system searches for data related to specific countries mentioned in questions, plus relevant indicator codes or categories.
This approach handles both specific and broad economic queries. For example, “How has GDP growth changed over the past 5 years?” will capture relevant indicators and historical data across multiple countries.
The function also searches document nodes to find reports related to mentioned countries or economic concepts. This cross-country knowledge building ensures that events in one country (like the US) that may impact another (like Brazil) are captured and surfaced in responses.
def query_graph(self, entities: Dict[str, List[str]]) -> List[dict]:
"""Query the knowledge graph based on extracted entities, order by the year desc"""
results = []
with self.driver.session() as session:
# Query for country data points
if entities['countries']:
for country in entities['countries']:
query = """
MATCH (c:Country)-[:HAS_DATA_POINT]->(dp:DataPoint)-[:MEASURES]->(i:Indicator)
MATCH (dp)-[:FOR_YEAR]->(y:Year)
WHERE toLower(c.name) CONTAINS $country or toLower(c.code) = $country
"""
params = {'country': country}
# Filter by indicator codes if specified
if entities['indicator_codes']:
query += " AND i.code IN $indicator_codes"
params['indicator_codes'] = entities['indicator_codes']
elif entities['indicator_categories']:
query += " AND i.category IN $indicator_categories"
params['indicator_categories'] = entities['indicator_categories']
if entities['years']:
query += " AND y.value in $years"
params['years'] = [int(y) for y in entities['years']]
query += """
RETURN c.name as country, i.name as indicator, i.code as indicator_code,
i.category as category, y.value as year, dp.value as value, dp.last_updated as last_updated
ORDER BY dp.last_updated, y.value DESC
LIMIT 20
"""
result = session.run(query, params)
results.extend([dict(record) for record in result])
# Query for related documents
if entities['countries'] or entities['concepts']:
doc_query = """
MATCH (d:Document)
WHERE
"""
conditions = []
params = {}
if entities['countries']:
conditions.append("""
EXISTS {
MATCH (d)-[:MENTIONS]->(c:Country)
WHERE ANY(country IN $countries WHERE toLower(c.name) CONTAINS country)
}
""")
params['countries'] = entities['countries']
if entities['concepts']:
conditions.append("""
EXISTS {
MATCH (d)-[:DISCUSSES]->(ec:EconomicConcept)
WHERE ec.name IN $concepts
}
""")
params['concepts'] = entities['concepts']
doc_query += " OR ".join(conditions)
doc_query += """
RETURN d.chunk_id as chunk_id, d.title as title,
d.source as source, d.content as content
LIMIT 10
"""
if conditions:
doc_result = session.run(doc_query, params)
doc_results = [dict(record) for record in doc_result]
results.extend(doc_results)
# Query for indicators without specific countries
if entities['indicator_codes'] or entities['indicator_categories']:
query = """
MATCH (c:Country)-[:HAS_DATA_POINT]->(dp:DataPoint)-[:MEASURES]->(i:Indicator)
MATCH (dp)-[:FOR_YEAR]->(y:Year)
WHERE 1=1
"""
params = {}
# Filter by specific indicator codes
if entities['indicator_codes']:
query += " AND i.code IN $indicator_codes"
params['indicator_codes'] = entities['indicator_codes']
# Filter by categories if no specific codes
elif entities['indicator_categories']:
query += " AND i.category IN $indicator_categories"
params['indicator_categories'] = entities['indicator_categories']
if entities['years']:
query += " AND y.value IN $years"
params['years'] = [int(y) for y in entities['years']]
query += """
RETURN c.name as country, i.name as indicator, i.code as indicator_code,
i.category as category, y.value as year, dp.value as value
ORDER BY y.value DESC, c.name ASC
LIMIT 50
"""
result = session.run(query, params)
results.extend([dict(record) for record in result])
return results
Adding Semantic Search for Comprehensive Retrieval
In addition to searching through our knowledge graph, we’ll conduct a semantic search to ensure we capture all the information and avoid missing any relevant economic context or relationships.
def semantic_search(self, question: str, limit: int = 10) -> List[Dict]:
"""Perform semantic search on document vectors"""
if self.provider == "google":
query_embedding = self.embeddings.models.embed_content(
model="gemini-embedding-001",
contents=question
)
if query_embedding:
processed_embeddings = query_embedding.embeddings[0].values
else:
print("Embeddings not found.")
return None
else:
processed_embeddings = self.embeddings.embed_query(question)
search_results = self.qdrant_client.query_points(
collection_name=self.collection_name,
query=processed_embeddings,
limit=limit
)
return [
{
'content': result.payload['content'],
'title': result.payload['title'],
'source': result.payload['source'],
'score': result.score
}
for result in search_results.points
]
Generating LLM Responses with Complete Context
Finally, we’ll implement functions that integrate all components and enable the LLM to generate comprehensive answers using context from both our knowledge graph and vector database.
def generate_answer(self, question: str, graph_results: List[Dict],
semantic_results: List[Dict]) -> str:
"""Generate final answer using LLM"""
# Prepare context
context_parts = []
# Add graph-derived structured data
if graph_results:
context_parts.append("STRUCTURED DATA FROM KNOWLEDGE GRAPH:")
for result in graph_results[:10]: # Limit context size
if 'value' in result: # Numeric data
context_parts.append(
f"- {result.get('country', 'N/A')}: {result.get('indicator', 'N/A')} "
f"in {result.get('year', 'N/A')} was {result.get('value', 'N/A')}"
)
else: # Document data
context_parts.append(f"- {result.get('title', 'N/A')}: {result.get('content', 'N/A')[:200]}...")
# Add semantic search results
if semantic_results:
context_parts.append("\nRELEVANT DOCUMENT EXCERPTS:")
for result in semantic_results:
context_parts.append(
f"- From '{result['title']}' ({result['source']}): {result['content'][:500]}..."
)
context = "\n".join(context_parts)
prompt = f"""
Based on the following structured economic data and document excerpts,
please provide a comprehensive answer to the question: "{question}"
Available Context:
{context}
Please provide a detailed answer that:
1. Uses specific data points when available
2. Explains relationships between different economic indicators
3. References sources appropriately
4. Acknowledges any limitations in the available data
Answer:
"""
if self.provider == "google":
response = self.llm.models.generate_content(
model="gemini-2.5-flash",
contents=prompt,
)
else:
response = self.llm(prompt)
return response
def answer_question(self, question: str) -> str:
"""Main method to answer economic questions"""
print(f"Processing question: {question}")
# Extract entities from question
entities = self.extract_query_entities(question)
print(f"Extracted entities: {entities}")
# Query knowledge graph
graph_results = self.query_graph(entities)
print(f"Found {len(graph_results)} graph results")
# Perform semantic search
semantic_results = self.semantic_search(question)
print(f"Found {len(semantic_results)} semantic results")
# Generate answer
answer = self.generate_answer(question, graph_results, semantic_results)
return answer
Congratulations! Your GraphRAG System is Complete
You’ve successfully built a GraphRAG system for economic data analysis that combines structured World Bank indicators with unstructured economic documents. This foundation provides a solid base for expansion and customization.
Next Steps and Resources:
- For PDF processing capabilities, watch our detailed YouTube video tutorial
- Access the complete GraphRAG implementation on our GitLab repository with enhanced features
- Explore additional data sources and expand your economic entity types
- Implement automated document collection for real-time updates
Your system now enables sophisticated economic analysis by connecting quantitative data with qualitative insights from reports and research papers.
Testing the Economic GraphRAG
The hard work is complete and it’s time to see your GraphRAG system in action. This step is straightforward: define several test questions covering different economic scenarios and send them to the answer_question method of our EconGraphRag class.
Sample Test Questions:
- Country-specific economic performance queries
- Cross-country comparative analyses
- Historical trend questions spanning multiple years
- Policy impact assessments combining structured and unstructured data
Watch as your system retrieves relevant data from both the knowledge graph and vector database to generate comprehensive, contextually-rich responses.
Sending Test Questions to Answer
# Initialize GraphRAG system
graph_rag = EconGraphRag()
# Test questions
test_questions = [
"What was Brazil's GDP growth in 2021 and what factors contributed to it?",
"How does Argentina's inflation compare to other Latin American countries?",
"What economic challenges are facing Latin America according to recent reports?",
"Which countries had the highest government debt levels and what were the underlying causes?"
]
for question in test_questions:
print(f"\n{'=' * 60}")
print(f"QUESTION: {question}")
print(f"{'=' * 60}")
answer = graph_rag.answer_question(question)
print(f"\nANSWER:\n{answer}")
print(f"{'=' * 60}")
Advanced Features and Extensions
Advanced Features for Extended Analysis
For our more advanced readers, we’ve included additional functions to extend the GraphRAG system’s capabilities. These advanced features are covered in detail in our YouTube video tutorial.
Temporal Analysis for Economic Trends
Analyzing trends over time is fundamental in economic analysis. This function enables trend analysis for specific country-indicator combinations, integrating seamlessly into our EconGraphRag class to provide historical insights and pattern recognition across economic datasets.
def analyze_trends(self, country: str, indicator: str, years: List[int]) -> Dict:
"""Analyze trends over time for specific country-indicator combinations"""
with self.driver.session() as session:
query = """
MATCH (c:Country {name: $country})-[:HAS_DATA_POINT]->(dp:DataPoint)-[:MEASURES]->(i:Indicator {name: $indicator})
MATCH (dp)-[:FOR_YEAR]->(y:Year)
WHERE y.value IN $years
RETURN y.value as year, dp.value as value
ORDER BY y.value
"""
result = session.run(query, {
'country': country,
'indicator': indicator,
'years': years
})
data = [dict(record) for record in result]
# Calculate trend metrics
if len(data) > 1:
values = [d['value'] for d in data]
trend = 'increasing' if values[-1] > values[0] else 'decreasing'
avg_change = (values[-1] - values[0]) / (len(values) - 1)
return {
'data': data,
'trend': trend,
'average_change': avg_change,
'volatility': np.std(values) if len(values) > 2 else 0
}
return {'data': data}
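For example, assuming analyze_trends has been added as a method of EconGraphRag, a call might look like this (the indicator name must match the WDI name stored on the Indicator node):
trend = graph_rag.analyze_trends(
    country="Brazil",
    indicator="GDP growth (annual %)",
    years=list(range(2015, 2023))
)
print(trend.get('trend'), trend.get('average_change'))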
Multi-Country Comparison Analysis
Another essential concept in economic analysis is comparative assessment between countries. Multi-country comparisons provide valuable insights when countries have similar economic structures but show diverging indicator performance, helping identify policy effectiveness and structural differences.
def compare_countries(self, countries: List[str], indicator: str, year: int) -> List[Dict]:
"""Compare multiple countries for a specific indicator and year"""
with self.driver.session() as session:
query = """
MATCH (c:Country)-[:HAS_DATA_POINT]->(dp:DataPoint)-[:MEASURES]->(i:Indicator {name: $indicator})
MATCH (dp)-[:FOR_YEAR]->(y:Year {value: $year})
WHERE c.name IN $countries
RETURN c.name as country, dp.value as value
ORDER BY dp.value DESC
"""
result = session.run(query, {
'countries': countries,
'indicator': indicator,
'year': year
})
return [dict(record) for record in result]
Best Practices and Considerations
1. Data Quality and Validation
- Implement data validation: Check for missing values, outliers, and inconsistencies in WDI data.
- Document provenance: Track data sources and update timestamps.
- Handle data revisions: World Bank data is frequently revised; implement versioning.
2. Graph Schema Evolution
- Start simple: Begin with core entities (Country, Indicator, Year, Document).
- Iterate incrementally: Add new entity types and relationships based on user needs.
- Version control: Maintain schema versions to handle updates gracefully.
3. Performance Optimization
- Index frequently queried properties: Country codes, indicator codes, years.
- Limit context size: Prevent overwhelming the LLM with too much information.
- Cache common queries: Store results for frequently asked questions.
- Batch processing: Process documents and data updates in batches.
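For instance, caching common questions can be as simple as this sketch, keyed on the normalized question text:
# Simple in-memory cache for repeated questions
answer_cache = {}

def cached_answer(rag, question: str) -> str:
    key = question.strip().lower()
    if key not in answer_cache:
        answer_cache[key] = rag.answer_question(question)
    return answer_cache[key]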
4. Evaluation and Monitoring
In production, it is important to implement a simple function that checks the coherence and credibility of the information produced by the LLM. This quality control mechanism not only improves response accuracy but also provides feedback to refine knowledge graph construction, optimizing how nodes and edges are created for better information retrieval and analysis.
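A hedged sketch of such a check, reusing the Gemini client from Step 1 as a reviewer (the prompt and scoring scale are illustrative assumptions, not a fixed API):
def check_answer_quality(question: str, answer: str, context: str) -> str:
    """Ask the LLM to rate how well the answer is grounded in the supplied context."""
    review_prompt = f"""
    Question: {question}
    Context: {context}
    Answer: {answer}
    On a scale of 1 to 5, how well is the answer supported by the context?
    Reply with the number followed by a one-sentence justification.
    """
    review = google_client.models.generate_content(
        model="gemini-2.5-flash",
        contents=review_prompt,
    )
    return review.text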
Conclusion
GraphRAG represents a significant advancement in economic data analysis by seamlessly combining structured quantitative data with unstructured textual information. This tutorial demonstrates how to build a practical system that can answer complex economic questions by leveraging the relationships between data points, countries, indicators, and policy documents.
The key advantages of this approach include:
- Enhanced Context: Quantitative indicators gain meaning through policy explanations and economic analysis
- Relationship Discovery: Uncover connections between countries, policies, and economic outcomes
- Reduced Hallucinations: Ground responses in structured data relationships
- Scalability: Easily add new data sources and entity types
As you implement this system, remember that the quality of your graph schema and entity extraction directly impacts the system’s effectiveness. Start with a focused domain (like Latin American economics) and gradually expand to other regions and economic topics.
The future of economic analysis lies in systems that can seamlessly traverse between “what happened” (quantitative data) and “why it happened” (qualitative explanations), making GraphRAG an essential tool for modern economic research and policy analysis.