1 Introduction
Recently, I have been dealing with a wide variety of topics in the field of SQL. You can find my blog posts here: Michael Fuchs SQL
I recommend you to read it as a preparation if you are not familiar with SQL yet.
In this post I want to show how to access a SQL database (here SQL Server from Microsoft) using Python and run appropriate queries.
2 Import the Libraries and Preparations
To make SQL queries via Python we need the library pyodbc. This can be installed as follows:
pip install pyodbc
import pandas as pd
import pyodbc
In preparation for this post, I created a database with tables via SQL Server Management Studio (SSMS). How to do that I have described in detail in this post: Create a Database
3 Connection to the DB
A local SQL server can be reached with the following syntax. Of course, the name of the respective database must be adjusted.
cnxn = pyodbc.connect(driver='{SQL Server}', server='(local)', database='MyDB',
trusted_connection='yes')
cursor = cnxn.cursor()
Or this version written a little nicer:
conn_str = (
r'driver={SQL Server};'
r'server=(local);'
r'database=MyDB;'
r'trusted_connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
cursor = cnxn.cursor()
4 Exploration of the respective DB
The first thing I do when I connect to a new database via Python is to see which tables are stored here. I have described here which possibilities there are for this: Data Wrangling / Get an Overview of the Data
query = "SELECT * FROM SYSOBJECTS WHERE xtype = 'U';"
df_existing_tables = pd.read_sql(query, cnxn)
df_existing_tables
Or a bit nicer:
query_mod = '''
SELECT *
FROM SYSOBJECTS
WHERE xtype = 'U'
;
'''
df_existing_tables = pd.read_sql(query_mod, cnxn)
df_existing_tables
As we can see, two tables are stored in the ‘MyDB’ database.
Now I would like to see which columns are stored in the table ‘Customer’:
query = "EXEC sp_columns 'Customer';"
df_existing_columns = pd.read_sql(query, cnxn)
df_existing_columns
5 Loading tables from DB
Of course, now I have the option here to load the stored tables in the database in Python. This goes as follows:
5.1 Complete Dataframe
query = '''
SELECT *
FROM Customer
;
'''
df_Customer = pd.read_sql(query, cnxn)
df_Customer
5.2 Selected Data
Of course, I can also specify my data query:
with a WHERE Statement
query = '''
SELECT *
FROM Customer
WHERE ID_Customer < 3
;
'''
df_Customer_selected = pd.read_sql(query, cnxn)
df_Customer_selected
with a newly generated column
query = '''
SELECT First_Name, Last_Name,
CONCAT(First_Name, ' ', Last_Name) AS Full_Name
FROM Customer
;
'''
df_Customer_modified = pd.read_sql(query, cnxn)
df_Customer_modified
with joined data
query = '''
SELECT t1.ID_Customer,
t1.First_Name,
t1.Last_Name,
t2.Nationality
FROM Customer AS t1
LEFT JOIN Customer_metadata AS t2
ON t1.ID_Customer = t2.ID_Customer
;
'''
df_Customer_joined = pd.read_sql(query, cnxn)
df_Customer_joined
and so on …
I can run all sorts of SQL queries here and load them into Python as I need the data.
6 Data Manipulation in SQL Server using Python
From Python, I can also manipulate the contents of existing tables on a SQL server.
6.1 Insert Values into SQL Server Table
cursor.execute('''
INSERT INTO Customer VALUES ('No', 'Name')
;
''')
cnxn.commit()
query = '''
SELECT *
FROM Customer
;
'''
df_Customer_manipulated = pd.read_sql(query, cnxn)
df_Customer_manipulated
Worked. A new row was added to the existing record.
Let’s take a look at the SQL Server itself:
6.2 Delete Records in SQL Server
Now we will delete this column again.
cursor.execute('''
DELETE FROM Customer
WHERE First_Name = 'No'
;
''')
cnxn.commit()
query = '''
SELECT *
FROM Customer
;
'''
df_Customer_original = pd.read_sql(query, cnxn)
df_Customer_original
Let’s take another look at SQL Server itself:
7 Inserting a Python Dataframe into SQL Server
Now I may want to write a new or edited record to a database. This is also possible from here.
Here is an example data set that I have created in Python:
df = pd.DataFrame({'Name': ['Maria', 'Marc', 'Julia'],
'Age': [32,22,62],
'Height': [162, 184, 170],
'Gender': ['female', 'male', 'female']})
df
In order to write data to a new table on the SQL Server, I need to create it on the server first.
cursor.execute('''
CREATE TABLE NewDF
(Name VARCHAR(100) NOT NULL,
Age INT NOT NULL,
Height INT NOT NULL,
Gender VARCHAR(100) NOT NULL)
;
''')
cnxn.commit()
Let’s take a quick look via SQL Server Object Explorer at the newly created table:
It is now listed as we can see. Let’s try to run a query on this table.
query = '''
SELECT *
FROM NewDF
;
'''
df_NewDF = pd.read_sql(query, cnxn)
df_NewDF
Fits, there are no data stored yet but we want to insert them in the next step:
for index, row in df.iterrows():
cursor.execute("INSERT INTO dbo.NewDF (Name,Age,Height,Gender) values(?,?,?,?)", row.Name, row.Age, row.Height, row.Gender)
cnxn.commit()
or written a bit more clearly:
for index, row in df.iterrows():
cursor.execute('''
INSERT INTO dbo.NewDF
(Name,Age,Height,Gender)
values(?,?,?,?)
''',
row.Name,
row.Age,
row.Height,
row.Gender)
cnxn.commit()
Let’s start the query again:
query = '''
SELECT *
FROM NewDF
;
'''
df_NewDF = pd.read_sql(query, cnxn)
df_NewDF
Worked. The data from our table created in Python is now stored on the SQL server!
8 Conclusion
How to make queries with SQL is part of the basic knowledge if you want to work in the field of Data Science.
How to connect to a SQL Server via Python and execute queries on it I have shown in this post.
Here is the SQL syntax I used to create the database and the tables:
SET LANGUAGE ENGLISH
CREATE DATABASE MyDB;
USE MyDB;
CREATE TABLE Customer
(ID_Customer INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
First_Name VARCHAR(100) NOT NULL,
Last_Name VARCHAR(100) NOT NULL)
;
INSERT INTO Customer VALUES ('Max', 'Steel')
INSERT INTO Customer VALUES ('Jessy', 'Williams')
INSERT INTO Customer VALUES ('Marc', 'Pike')
INSERT INTO Customer VALUES ('Emily', 'Taylor')
SELECT * FROM Customer
CREATE TABLE Customer_metadata
(ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
ID_Customer INT NOT NULL FOREIGN KEY REFERENCES Customer(ID_Customer),
Nationality VARCHAR(100) NOT NULL)
;
INSERT INTO Customer_metadata VALUES (1, 'German')
INSERT INTO Customer_metadata VALUES (2, 'French')
INSERT INTO Customer_metadata VALUES (3, 'English')
INSERT INTO Customer_metadata VALUES (4, 'Spanish')
SELECT * FROM Customer_metadata
SELECT * FROM NewDF