
Selection of columns per data type

1 Introduction

In some situations it is necessary to select all columns of a certain data type, for example when you want to convert all categorical variables into dummy variables before fitting a regression.

This post uses the Bank Data dataset from the UCI Machine Learning Repository. You can download it from my GitHub Repository.

2 Loading the libraries and the data

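import numpy as np
import pandas as pd

# The file is semicolon-separated, so the separator is set explicitly
bank = pd.read_csv("path/to/file/bank.csv", sep=";")

# Data type of every column
bank.dtypes

# Number of rows and columns
bank.shape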
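To get a quick overview of which data types occur and how often, the output of dtypes can be tallied. The following is a minimal sketch; the small DataFrame `toy` and its column values are hypothetical stand-ins for the bank data.

```python
import pandas as pd

# Hypothetical toy frame standing in for the bank data
toy = pd.DataFrame({
    "age": pd.Series([30, 41, 25], dtype="int64"),
    "balance": pd.Series([1787, -12, 300], dtype="int64"),
    "job": pd.Series(["admin", "student", "admin"], dtype=object),
})

# Count how many columns share each data type
dtype_counts = toy.dtypes.value_counts()
print(dtype_counts)
```

The same call on `bank.dtypes` would show how many columns fall into each data type before any selection is made.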

3 Selection of numeric variables

Here only the numeric columns are selected:

# Numeric data types to keep (int8 and unsigned integer types are not in this list)
num_col = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
# Names of all columns whose data type is in the list
numerical_columns = list(bank.select_dtypes(include=num_col).columns)
bank_numeric = bank[numerical_columns]
bank_numeric.shape

bank_numeric.dtypes
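An explicit list of data types only matches the types it names. As an alternative, `select_dtypes` also accepts the generic selector `"number"`, which matches every numeric data type. The sketch below compares both variants on a small hypothetical DataFrame `toy` (not the bank data), built so that two of its columns use types missing from the explicit list.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; int8 and uint16 are not in the explicit list
toy = pd.DataFrame({
    "age": np.array([30, 41, 25], dtype="int8"),
    "balance": np.array([1787.0, -12.5, 300.0], dtype="float32"),
    "duration": np.array([79, 220, 185], dtype="uint16"),
    "job": ["admin", "services", "student"],
})

# Variant 1: explicit list of data types, as used in this post
explicit = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
cols_explicit = list(toy.select_dtypes(include=explicit).columns)

# Variant 2: generic selector that covers all numeric data types
cols_generic = list(toy.select_dtypes(include="number").columns)

print(cols_explicit)
print(cols_generic)
```

Whether the difference matters depends on the data: if every numeric column is already int64 or float64, both variants select the same columns.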

4 Selection of categorical variables

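Here only the categorical columns are selected:

# The categorical variables of this dataset are stored with data type 'object'
obj_col = ['object']
# Names of all columns whose data type is 'object'
object_columns = list(bank.select_dtypes(include=obj_col).columns)
bank_categorical = bank[object_columns]
bank_categorical.shape

bank_categorical.dtypes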
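Selecting on 'object' does not match columns that were converted to the pandas 'category' data type, so it can be useful to include both. The sketch below does this on a hypothetical DataFrame `toy` (not the bank data) and then passes the selected columns to `pd.get_dummies`, the dummy-variable step mentioned in the introduction.

```python
import pandas as pd

# Hypothetical toy frame with one 'object' and one 'category' column
toy = pd.DataFrame({
    "age": [30, 41, 25],
    "job": pd.Series(["admin", "student", "admin"], dtype=object),
    "marital": pd.Series(["married", "single", "married"], dtype="category"),
})

# Select columns stored as 'object' or 'category'
cat_cols = list(toy.select_dtypes(include=["object", "category"]).columns)

# Replace each selected column with one dummy column per distinct value
dummies = pd.get_dummies(toy, columns=cat_cols)

print(cat_cols)
print(dummies.columns.tolist())
```

Columns that are not listed in `columns` (here `age`) are passed through unchanged, so the result can be used directly as a model input table.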

5 Conclusion

This post showed how to select only the numeric or only the categorical columns of a DataFrame based on their data type.