Basic usage of split in PySpark

split breaks a string column into multiple parts at a given delimiter and returns them as an array column:

from pyspark.sql import functions as F
# `spark` is assumed to be an existing SparkSession
df_b = spark.createDataFrame([('1', 'ABC-07-DEF')], ['ID', 'col1'])
df_b = df_b.withColumn('post_split', F.split(F.col('col1'), '-'))
df_b.show()
+---+----------+--------------+
| ID|      col1|    post_split|
+---+----------+--------------+
|  1|ABC-07-DEF|[ABC, 07, DEF]|
+---+----------+--------------+

You can then use getItem() to pull individual elements of that array column out into their own columns, as shown below:

# the parentheses let the chained calls span several lines
df_b = (
    df_b.withColumn('split_col1', F.col('post_split').getItem(0))
    .withColumn('split_col2', F.col('post_split').getItem(1))
    .withColumn('split_col3', F.col('post_split').getItem(2))
)
df_b.show()
+---+----------+--------------+----------+----------+----------+
| ID|      col1|    post_split|split_col1|split_col2|split_col3|
+---+----------+--------------+----------+----------+----------+
|  1|ABC-07-DEF|[ABC, 07, DEF]|       ABC|        07|       DEF|
+---+----------+--------------+----------+----------+----------+

 

posted @ 2022-03-09 15:13  干了这瓶老干妈