2 min read

String Manipulation. An intuition.

1 Introduction

It happens again and again that in the course of the planned analysis text variables are unfavorably filled and therefore have to be changed. Here are some useful build in methods for string manipulation from Python.

Loading the libraries

import pandas as pd

2 Separate

2.1 via map - function

Map property applies changes to every element of a column

string_manipulation = pd.DataFrame({'Name': ['1.Anton', '2.Susi', '3.Moni', '4.Renate'],
                     'Alter': [32,22,62,44],
                     'Gehalt': [4700, 2400,4500,2500]})
string_manipulation

show_map = string_manipulation.copy()
show_map

Cleanup of the “Name” column

show_map.Name = show_map.Name.map(lambda x: x.split('.')[1])
show_map

Background info how .split works:

x = 'I.am.a.test'
y = x.split('.')
print (y)

z = x.split('.')[1]
print (z)

2.2 via string function

show_str_split = string_manipulation.copy()
show_str_split

new = show_str_split["Name"].str.split(".", n = 1, expand = True) 
new

show_str_split["MA-Nummer"]= new[0] 
show_str_split["MA-Name"]= new[1]
show_str_split

Exclude unnecessary columns

small_show_str_split = show_str_split.drop(columns=['Name', 'MA-Nummer'])
small_show_str_split

New arrangement of columns

clist = list(small_show_str_split.columns)
clist_new = clist[-1:]+clist[:-1]
small_show_str_split = small_show_str_split[clist_new]
small_show_str_split

3 Unite

3.1 two columns

df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})
df

df['period'] = df[['Year','quarter']].apply(lambda x : '{}{}'.format(x[0],x[1]), axis=1)
df

3.2 three and more columns

df = pd.DataFrame([['USA', 'Nevada', 'Las Vegas'], ['Brazil', 'Pernambuco', 'Recife']], columns=['Country', 'State', 'City'],)
df

df['AllTogether'] = df[['Country','State', 'City']].apply(lambda x : '{}, 
                    {} & {}'.format(x[0],x[1],x[2]), axis=1)
df

4 add_prefix

show_prefix2 = small_show_str_split.copy()
show_prefix2

show_prefix2['MA-Name'] = show_prefix2['MA-Name'].apply(lambda x: "{}{}".format('MA: ', x))
show_prefix2

5 add_suffix

show_suffix = show_prefix2.copy()
show_suffix

show_suffix['Betriebszugehörigkeit'] = show_suffix['Betriebszugehörigkeit'].apply(lambda x: "{}{}".format(x, ' Jahre'))
show_suffix

6 Conclusion

This was a small insight into the subject of string manipulation.