In this short guide, you'll see how to combine multiple columns into a single one in Pandas.

Here you can find the short answers:

(1) String concatenation

df['Magnitude Type'] + ', ' + df['Type']

(2) Using methods agg and join

df[['Date', 'Time']].T.agg(','.join)

(3) Using lambda and join

df[['Date', 'Time']].agg(lambda x: ','.join(x.values), axis=1).T

So let's see several useful examples of how to combine multiple columns into one with Pandas.

Suppose you have data like:

Date        Time      Depth  Magnitude Type  Type        Magnitude
01/02/1965  13:44:18  131.6  MW              Earthquake  6.0
01/04/1965  11:29:49   80.0  MW              Earthquake  5.8
01/05/1965  18:05:58   20.0  MW              Earthquake  6.2
01/08/1965  18:49:43   15.0  MW              Earthquake  5.8
01/09/1965  13:32:50   15.0  MW              Earthquake  5.8
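
If you want to follow along without the original dataset, here is a minimal sketch that reproduces the sample rows above:

import pandas as pd

# minimal sample data matching the table above
df = pd.DataFrame({
    'Date': ['01/02/1965', '01/04/1965', '01/05/1965', '01/08/1965', '01/09/1965'],
    'Time': ['13:44:18', '11:29:49', '18:05:58', '18:49:43', '13:32:50'],
    'Depth': [131.6, 80.0, 20.0, 15.0, 15.0],
    'Magnitude Type': ['MW', 'MW', 'MW', 'MW', 'MW'],
    'Type': ['Earthquake', 'Earthquake', 'Earthquake', 'Earthquake', 'Earthquake'],
    'Magnitude': [6.0, 5.8, 6.2, 5.8, 5.8],
})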

1: Combine multiple columns using string concatenation

Let's start with the simplest example: combining two string columns into a single one, separated by a comma:

df['Magnitude Type'] + ', ' + df['Type']

result will be:

0         MW, Earthquake
1         MW, Earthquake
2         MW, Earthquake
3         MW, Earthquake
4         MW, Earthquake

What if one of the columns is not a string? Then you will get an error like:

TypeError: can only concatenate str (not "float") to str

To avoid this error, you can convert the column to string by using the method .astype(str):

df['Magnitude Type'] + ', ' + df['Magnitude'].astype(str)

result:

0         MW, 6.0
1         MW, 5.8
2         MW, 6.2
3         MW, 5.8
4         MW, 5.8
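
To keep the combined values, you can assign the result to a new column; the name 'Magnitude Info' below is just an illustrative choice:

# store the combined strings in a new column (the name is arbitrary)
df['Magnitude Info'] = df['Magnitude Type'] + ', ' + df['Magnitude'].astype(str)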

2: Combine date and time columns into DateTime column

What if you have separate columns for the date and the time? You can combine them into a single datetime column by using string concatenation and conversion with pd.to_datetime:

pd.to_datetime(df['Date'] + ' ' + df['Time'], errors='ignore')

In case of missing or malformed data, we need to add the parameter errors='ignore' in order to avoid an error like:

ParserError: Unknown string format: 1975-02-23T02:58:41.000Z 1975-02-23T02:58:41.000Z
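
If you know the format of the combined string, a more defensive variant is to pass it explicitly and use errors='coerce', which turns unparsable values into NaT instead of raising. The format below assumes dates like 01/02/1965 13:44:18:

# explicit format; rows that don't parse become NaT instead of raising an error
df['DateTime'] = pd.to_datetime(
    df['Date'] + ' ' + df['Time'],
    format='%m/%d/%Y %H:%M:%S',
    errors='coerce',
)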

3: Combine multiple columns with agg and join

Another option to concatenate multiple columns is by using two Pandas methods:

  • agg
  • join

df[['Date', 'Time']].T.agg(','.join)

result:

0        01/02/1965,13:44:18
1        01/04/1965,11:29:49
2        01/05/1965,18:05:58
3        01/08/1965,18:49:43

This one might be a bit slower than the first one.
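
Note that ','.join works only on string values, so if one of the selected columns is numeric you would hit the same TypeError as before. A small sketch that converts the columns first:

# convert all selected columns to strings before joining
df[['Date', 'Time', 'Magnitude']].astype(str).T.agg(','.join)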

4: Combine multiple columns with lambda and join

You can use a lambda expression in order to concatenate multiple columns. This method has several advantages:

  • you can apply conditions to the input - e.g. filter out values
  • the output can be customized
  • better control over dtypes

To combine the columns Date and Time we can do:

df[['Date', 'Time']].agg(lambda x: ','.join(x.values), axis=1).T
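
For example, the lambda can drop missing values and use a custom separator - a sketch assuming that missing entries are stored as NaN:

# join only the non-missing values of each row, with a custom separator
df[['Date', 'Time']].agg(lambda x: ' | '.join(x.dropna().astype(str)), axis=1)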

In the next section you can find how to use this option to combine columns which have the same name.

5: Combine columns which have the same name

Finally let's combine all columns which have exactly the same name in a Pandas DataFrame.

First, let's create duplicate columns by renaming the Time column to Date:

df.columns = ['Date', 'Date', 'Depth', 'Magnitude Type', 'Type', 'Magnitude']
df

A general solution which concatenates columns with duplicate names can be:

df.groupby(df.columns, axis=1).agg(
    lambda x: x.apply(
        lambda y: ','.join([str(l) for l in y if str(l) != "nan"]),
        axis=1,
    )
)

This will result in:

Date                 Depth  Magnitude  Magnitude Type  Type
01/02/1965,13:44:18  131.6  6.0        MW              Earthquake
01/04/1965,11:29:49   80.0  5.8        MW              Earthquake
01/05/1965,18:05:58   20.0  6.2        MW              Earthquake
01/08/1965,18:49:43   15.0  5.8        MW              Earthquake
01/09/1965,13:32:50   15.0  5.8        MW              Earthquake

How does it work? First, it groups the columns which share the same name:

for i in df.groupby(df.columns, axis=1):
    print(i)

result:

('Date',              Date      Date
0      01/02/1965  13:44:18
1      01/04/1965  11:29:49
2      01/05/1965  18:05:58
3      01/08/1965  18:49:43
4      01/09/1965  13:32:50
...           ...       ...
23407  12/28/2016  08:22:12
23408  12/28/2016  09:13:47
23409  12/28/2016  12:38:51
23410  12/29/2016  22:30:19
23411  12/30/2016  20:08:28

[23412 rows x 2 columns])
('Depth',         Depth
0      131.60
1       80.00
2       20.00
3       15.00
4       15.00
...       ...
23407   12.30
23408    8.80

Then it combines their values:

df.groupby(df.columns, axis=1).apply(lambda x: x.values)

result:

Date              [[01/02/1965, 13:44:18], [01/04/1965, 11:29:49...
Depth             [[131.6], [80.0], [20.0], [15.0], [15.0], [35....
Magnitude         [[6.0], [5.8], [6.2], [5.8], [5.8], [6.7], [5....
Magnitude Type    [[MW], [MW], [MW], [MW], [MW], [MW], [MW], [MW...
Type              [[Earthquake], [Earthquake], [Earthquake], [Ea...

Finally, converting each value with str(l) prevents type errors for non-string values, and the check str(l) != "nan" keeps missing values like NaN out of the combined strings.
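
Note that groupby(..., axis=1) is deprecated in newer Pandas versions. One possible equivalent is to transpose, group the rows (the former column names) by name, join the values per original row and transpose back - a sketch, to be verified against your Pandas version:

# group duplicate column names without groupby(axis=1)
combined = (
    df.T
      .groupby(level=0)
      .agg(lambda col: ','.join(str(v) for v in col if str(v) != 'nan'))
      .T
)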

Resources