1 Introduction
Text variables often arrive in a form that is awkward for the planned analysis and have to be cleaned up first. This post shows some useful built-in methods for string manipulation in Python and pandas.
Loading the libraries
import pandas as pd
2 Separate
2.1 via map - function
The map method applies a function to every element of a column.
# Column names are German: Alter = age, Gehalt = salary
string_manipulation = pd.DataFrame({'Name': ['1.Anton', '2.Susi', '3.Moni', '4.Renate'],
                                    'Alter': [32, 22, 62, 44],
                                    'Gehalt': [4700, 2400, 4500, 2500]})
string_manipulation
show_map = string_manipulation.copy()
show_map
Cleanup of the “Name” column
show_map.Name = show_map.Name.map(lambda x: x.split('.')[1])
show_map
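The lambda above assumes that every name contains a dot. The following is a minimal sketch of a more forgiving variant, using a hypothetical Series (not part of the original data) in which one entry has no numeric prefix. Splitting at most once and taking the last part returns the unchanged string when no dot is present.

```python
import pandas as pd

# Hypothetical input: the last entry has no "number." prefix
names = pd.Series(['1.Anton', '2.Susi', 'Moni'])

# split('.', 1) splits at the first dot only; [-1] takes the last part,
# which is the whole string when there is no dot at all
cleaned = names.map(lambda x: x.split('.', 1)[-1])

print(cleaned.tolist())  # ['Anton', 'Susi', 'Moni']
```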
Background on how .split works:
x = 'I.am.a.test'
y = x.split('.')      # list of all parts: ['I', 'am', 'a', 'test']
print(y)
z = x.split('.')[1]   # second part only: 'am'
print(z)
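The optional second argument of str.split limits the number of splits, which matters when the text after the first separator may itself contain dots. A short sketch with the same test string (the variable names are chosen for illustration):

```python
text = 'I.am.a.test'

parts_all = text.split('.')      # split at every dot
parts_once = text.split('.', 1)  # split at the first dot only

print(parts_all)   # ['I', 'am', 'a', 'test']
print(parts_once)  # ['I', 'am.a.test']
```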
2.2 via string function
show_str_split = string_manipulation.copy()
show_str_split
# Split each name at the first dot into two columns (0 and 1)
new = show_str_split["Name"].str.split(".", n=1, expand=True)
new
show_str_split["MA-Nummer"] = new[0]
show_str_split["MA-Name"] = new[1]
show_str_split
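The split result can also be given its column names directly, which avoids the intermediate integer labels 0 and 1. This is a sketch assuming the same four names. Converting the number part to int is an optional extra that the original does not do.

```python
import pandas as pd

names = pd.Series(['1.Anton', '2.Susi', '3.Moni', '4.Renate'], name='Name')

# expand=True returns a DataFrame with one column per part
parts = names.str.split('.', n=1, expand=True)
parts.columns = ['MA-Nummer', 'MA-Name']

# Optional: store the number part as an integer instead of a string
parts['MA-Nummer'] = parts['MA-Nummer'].astype(int)

print(parts)
```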
Exclude unnecessary columns
small_show_str_split = show_str_split.drop(columns=['Name', 'MA-Nummer'])
small_show_str_split
Rearrange the columns so that the last one comes first
clist = list(small_show_str_split.columns)
clist_new = clist[-1:]+clist[:-1]
small_show_str_split = small_show_str_split[clist_new]
small_show_str_split
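The list slicing above moves the last column to the front. Two other ways to get the same order are sketched below on a small stand-in frame (the data is abbreviated and the variable names are assumptions): listing the columns explicitly, or using pop and insert.

```python
import pandas as pd

df = pd.DataFrame({'Alter': [32, 22],
                   'Gehalt': [4700, 2400],
                   'MA-Name': ['Anton', 'Susi']})

# Option 1: name the desired order explicitly (returns a new DataFrame)
reordered = df[['MA-Name', 'Alter', 'Gehalt']]

# Option 2: remove the column and re-insert it at position 0 (in place)
moved = df.copy()
name_col = moved.pop('MA-Name')
moved.insert(0, 'MA-Name', name_col)

print(list(reordered.columns))
print(list(moved.columns))
```

The explicit list is easier to read for a handful of columns. The pop and insert variant does not require typing out every column name.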
3 Unite
3.1 two columns
df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})
df
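# Access the row values by column label; positional x[0] on a labelled row is deprecated in recent pandas
df['period'] = df[['Year', 'quarter']].apply(lambda x: '{}{}'.format(x['Year'], x['quarter']), axis=1)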
df
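When both columns already hold strings, the same result can be obtained without apply. The sketch below uses the + operator and Series.str.cat. The column name period_sep and the '-' separator are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})

# Element-wise string concatenation
df['period'] = df['Year'] + df['quarter']

# str.cat does the same and accepts a separator
df['period_sep'] = df['Year'].str.cat(df['quarter'], sep='-')

print(df)
```

If a column is numeric, it has to be converted first, for example with astype(str).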
3.2 three and more columns
df = pd.DataFrame([['USA', 'Nevada', 'Las Vegas'], ['Brazil', 'Pernambuco', 'Recife']],
                  columns=['Country', 'State', 'City'])
df
df['AllTogether'] = df[['Country', 'State', 'City']].apply(lambda x: '{}, {} & {}'.format(x['Country'], x['State'], x['City']), axis=1)
df
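For string columns there are two shorter alternatives, sketched here under the assumption that every cell is a non-missing string. The column name joined is hypothetical. Passing a separator's join method to apply works when the same separator is used throughout. Plain concatenation allows mixed separators.

```python
import pandas as pd

df = pd.DataFrame([['USA', 'Nevada', 'Las Vegas'],
                   ['Brazil', 'Pernambuco', 'Recife']],
                  columns=['Country', 'State', 'City'])

cols = ['Country', 'State', 'City']

# Same separator everywhere: join the values of each row
df['joined'] = df[cols].apply(', '.join, axis=1)

# Mixed separators: concatenate the columns element-wise
df['AllTogether'] = df['Country'] + ', ' + df['State'] + ' & ' + df['City']

print(df[['joined', 'AllTogether']])
```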
4 add_prefix
show_prefix2 = small_show_str_split.copy()
show_prefix2
show_prefix2['MA-Name'] = show_prefix2['MA-Name'].apply(lambda x: "{}{}".format('MA: ', x))
show_prefix2
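The code above prefixes the values of a column. The pandas method DataFrame.add_prefix does something different: it prefixes the column labels. The sketch below shows both on a small stand-in frame. The data and the 'col_' prefix are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'MA-Name': ['Anton', 'Susi'], 'Alter': [32, 22]})

# Prefix the values of a string column without apply
df['MA-Name'] = 'MA: ' + df['MA-Name']

# add_prefix renames the column labels and leaves the values untouched
renamed = df.add_prefix('col_')

print(df['MA-Name'].tolist())  # ['MA: Anton', 'MA: Susi']
print(list(renamed.columns))   # ['col_MA-Name', 'col_Alter']
```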
5 add_suffix
show_suffix = show_prefix2.copy()
show_suffix
# Append ' Jahre' (years) to every value in the 'Alter' (age) column
show_suffix['Alter'] = show_suffix['Alter'].apply(lambda x: "{}{}".format(x, ' Jahre'))
show_suffix
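For a numeric column such as Alter, a suffix can also be appended without apply by casting to string first. This is a minimal sketch on a small hypothetical frame. The column name Alter_text is an assumption, used so that the numeric column stays available for calculations.

```python
import pandas as pd

df = pd.DataFrame({'Alter': [32, 22]})

# Cast the integers to strings, then append the unit
df['Alter_text'] = df['Alter'].astype(str) + ' Jahre'

print(df['Alter_text'].tolist())  # ['32 Jahre', '22 Jahre']
```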
6 Conclusion
This post gave a brief look at string manipulation with Python and pandas: separating, uniting, and adding prefixes and suffixes to text columns.