Automated ETL Pipeline: PythonPostgreSQLPower BI

End-to-End Crypto Market Pipeline & Dashboard

A self-sustaining data system that captures real-time crypto snapshots to analyze market cap distribution and supply-side inflation.

Tools

Python, PostgreSQL

Viz Library

Power BI

Complexity

API Automation

Focus

Market Structure

The Problem & Logic

The CA Brain: Why these metrics?

Most people look at crypto prices. I wanted to look at Market Structure. By Comparing the Cirulating supply in the market to max supply possible in the market we can analyze the supply health of that coin along with its liquidity in the market. This also helps us make prediction as to how much time it will take to reach its max supply and how valued the coin is in current market.

Merely showing currency age doesn't provide much insights, by adding the market cap condition to it, we can clearly see that the longer the coin is in the market, the more is its market share.

DAX: Currency Age

currency_age = DATEDIFF('cmc_data latest_data'[date_added], TODAY(), YEAR)

The Challenge: Hyper-inflationary Tokens

There were two coins SHIB and PEPE which are highly popular meme coins. They have very high max supply due to which they were distorting the current supply and max supply visualization. Similarly, when I showed market cap by currency bar plot, BTC was distorting the whole plot as it has very high market cap.

To fix this, I used a log scale which doesn't compare the coins in actual values rather makes scales and compares the coins one to another on that scale. This fixed the graphs and other coins were also being able to be showed there instead of being suppressed by more dominating coins.

The Architecture

The Engineering: How it works

I built a modular pipeline using Python and SQLAlchemy for extraction, PostgreSQL for storage, and Windows Task Scheduler for automation.

SQLAlchemy Database Injection

db_url = 'postgresql://postgres:postgres@localhost:5432/postgres' engine = create_engine(db_url) df.to_sql( 'crypto_data', con=engine, if_exists='append', index=False, schema= 'cmc_data' ) print("Data Transfer Successful")

Self-Reading Run Logger

def get_next_run_no(log_file): if not os.path.exists(log_file): return 1 try: with open(log_file, 'r') as f: last_line = f.readlines()[-1] parts = last_line.split() last_no = int(parts[1]) return last_no + 1 except Exception as e: print(f'Error Returning Last Line, {e}') return 1

Automation Layer

I used a simple bat script that activated my .venv and runs main.py and then deactivates itself. I automated its running with task scheduler so that it runs automatically every hour my pc is on.

The Database Layer

The SQL Flex: Moving Logic to Backend

Creating a VIEW for the Latest Data + (Capped vs Uncapped Coins). This proves I can move 'Business Logic' out of the UI and into the Database for better performance.

SQL VIEW Creation

create or replace view latest_data as select *, case when max_supply is null then 'Inflationary (No Cap)' else 'Deflationary (Capped)' end as coin_type from crypto_data where timestamp = (select max(timestamp) from crypto_data);

Power BI Showcase

Interactive Insights Dashboard

Market Dominance (Scatter Plot)

Click to expand details

Supply Health (Gauge)

Click to expand details

Top Gainers/Losers

Click to expand details

Price Trends (ZEC Example)

Click to expand details

Want to see the raw Python code?

The full ETL logic, PostgreSQL schemas, and Power BI files are available on my GitHub.

Explore Repository