(2024-Mar-16) I have been a big fan of Excel for more than half of my life; I believe it's the most widely used spreadsheet application in the world. I'm not shy about acknowledging that I've used it for very small tasks, like taking structured notes during meetings, as well as for creating sophisticated tables to share with other users and for handling larger data as updatable source datasets for database applications. Let's not forget Microsoft's well-established marketing campaign from about 15 years ago promoting Excel as the main analytical platform, with some of those tools still available and some relegated to memory (Power Query, Power Pivot, Power Map, Power View). The good old days!
Examining PostgreSQL database tables (or tables in any other database platform), including their structure, column names, data types, and even a glimpse of sample data, can be accomplished using various data tools. The process only requires that those tools support a connection to your database, plus your perseverance to navigate through different database objects, expand the lists of columns, and run simple data profiling tasks. It's all possible.
But what about having a single, simple view into a database's metadata with the help of an Excel spreadsheet? This would benefit data gurus and information novices alike, especially in a collaborative team format where the information can be viewed, discussed, and commented on.
1) Let's extract the metadata first. Here is my sample PostgreSQL database with 15 tables and very diverse column data types:
Then I execute the following SQL script, which gives me the list of all the tables, their columns, and each column's attributes:
SELECT c.table_catalog,
       c.table_schema,
       c.table_name,
       c.column_name,
       c.ordinal_position,
       c.column_default,
       c.is_nullable,
       c.data_type,
       c.character_maximum_length,
       c.numeric_precision,
       c.datetime_precision
FROM information_schema.columns AS c
JOIN information_schema.tables AS t
  ON t.table_catalog = c.table_catalog
 AND t.table_schema  = c.table_schema
 AND t.table_name    = c.table_name
WHERE t.table_schema NOT IN ('pg_catalog', 'information_schema')
  AND t.table_type <> 'VIEW'
ORDER BY c.table_catalog,
         c.table_schema,
         c.table_name,
         c.ordinal_position;
Then I save the output of this query into a CSV file and open it in Excel:
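If you run the metadata query from psql, one convenient way to produce that CSV is PostgreSQL's COPY command. This is just a minimal sketch, not part of the original workflow: the output path /tmp/pg_metadata.csv is a placeholder, and server-side COPY requires file-write privileges on the database server, so the client-side \copy meta-command in psql is often the easier alternative.

-- Hedged sketch: wrap the metadata query in COPY to export it as CSV.
-- '/tmp/pg_metadata.csv' is a placeholder path; adjust for your environment.
COPY (
    SELECT c.table_catalog, c.table_schema, c.table_name, c.column_name,
           c.ordinal_position, c.column_default, c.is_nullable, c.data_type,
           c.character_maximum_length, c.numeric_precision, c.datetime_precision
    FROM information_schema.columns AS c
    JOIN information_schema.tables AS t
      ON t.table_catalog = c.table_catalog
     AND t.table_schema  = c.table_schema
     AND t.table_name    = c.table_name
    WHERE t.table_schema NOT IN ('pg_catalog', 'information_schema')
      AND t.table_type <> 'VIEW'
    ORDER BY c.table_catalog, c.table_schema, c.table_name, c.ordinal_position
) TO '/tmp/pg_metadata.csv' WITH (FORMAT csv, HEADER);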
2) Next is a sample data row. I'm happy to extract just one arbitrary first record from each table, but you may change the number of returned records if you like.
To start, I need to generate another SQL request from the database's existing metadata, which I do by executing the following SQL script:
SELECT 'select start.* ' AS table_data
UNION ALL
SELECT * FROM (
    SELECT concat(' ,', table_name, '.*')
    FROM information_schema.tables
    WHERE table_type = 'BASE TABLE'
      AND table_schema NOT IN ('pg_catalog', 'information_schema')
    ORDER BY table_name
) data
UNION ALL
SELECT 'from (select null as "Data Value") as start'
UNION ALL
SELECT * FROM (
    SELECT concat('left join (select * from ', table_schema, '.', table_name,
                  ' limit 1) as ', table_name, ' on 1=1')
    FROM information_schema.tables
    WHERE table_type = 'BASE TABLE'
      AND table_schema NOT IN ('pg_catalog', 'information_schema')
    ORDER BY table_name
) tables;
This script constructs another script that lists all the available tables and LEFT JOINs a one-row sample from each of them onto a single anchor row, ready to be executed as a further SELECT statement:
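To make the output concrete, here is roughly what the generated script would look like for a database containing just two hypothetical tables, public.customers and public.orders (a sketch; your table list will be longer and the names will differ):

-- Hypothetical generated output for two example tables.
select start.*
 ,customers.*
 ,orders.*
from (select null as "Data Value") as start
left join (select * from public.customers limit 1) as customers on 1=1
left join (select * from public.orders limit 1) as orders on 1=1

Because every LEFT JOIN attaches at most one row on the always-true condition 1=1, the result is a single wide row that stitches together the first record of every table.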
I copied this generated script and executed it as another query to run the actual extraction of my sample data.
As a result, I only needed a database connection and two SQL script executions (plus one execution of the SQL code derived from the second query). Both of my SQL scripts rely purely on existing database metadata, with no hardcoded table or column references. My mission is accomplished. Thanks for reading this blog post; I hope it saves you some time, as these SQL scripts certainly helped me!