[Big Data] Big Data Troubleshooting

2022-04-06 18:02  码上起舞

Problem 1

Calling df.show() in pyspark fails with: Method showString([class java.lang.Integer, class java.lang.Integer]) does not exist

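Cause: a Spark version mismatch. Before Spark 2.3, PySpark's show method accepted only two parameters, def show(self, n=20, truncate=True), and passed two arguments on to the JVM-side showString. A Spark 2.3 or later runtime expects three arguments there (row count, truncation width, and a vertical flag), so the two-argument call cannot be resolved.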
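To confirm the mismatch described above, it can help to compare the version of the installed PySpark package with the version reported by the Spark runtime. The following is a minimal sketch; the helper names major_minor and versions_compatible are hypothetical and not part of PySpark. With a live session, the two inputs would be pyspark.__version__ and spark.version.

```python
def major_minor(version):
    """Return (major, minor) as ints from a version string such as '2.3.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def versions_compatible(python_side, jvm_side):
    """True when the PySpark package and the Spark runtime share major.minor."""
    return major_minor(python_side) == major_minor(jvm_side)


# With a live session one would pass the real values instead:
#   import pyspark
#   versions_compatible(pyspark.__version__, spark.version)
print(versions_compatible("2.2.0", "2.3.1"))  # False
print(versions_compatible("2.3.0", "2.3.1"))  # True
```

If the check returns False with the Python side older than 2.3 and the runtime at 2.3 or later, the error above is the expected symptom.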

Solution: either option 1 or option 2 is enough.

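1. Print the result directly in your code by calling the JVM method with three arguments (here raw_data is the DataFrame, and a width of 0 disables truncation): print(raw_data._jdf.showString(20, 0, False))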

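2. Modify the show() method and add one more argument in its implementation (the third argument, False, in the two showString calls):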

def show(self, n=20, truncate=True):
    """Prints the first ``n`` rows to the console.

    :param n: Number of rows to show.
    :param truncate: If set to True, truncate strings longer than 20 chars by default.
        If set to a number greater than one, truncates long strings to length ``truncate``
        and align cells right.

    >>> df
    DataFrame[age: int, name: string]
    >>> df.show()
    +---+-----+
    |age| name|
    +---+-----+
    |  2|Alice|
    |  5|  Bob|
    +---+-----+
    >>> df.show(truncate=3)
    +---+----+
    |age|name|
    +---+----+
    |  2| Ali|
    |  5| Bob|
    +---+----+
    """
    # The third argument (False) is the added one: the "vertical" flag
    # that the Spark 2.3+ showString expects.
    if isinstance(truncate, bool) and truncate:
        print(self._jdf.showString(n, 20, False))
    else:
        print(self._jdf.showString(n, int(truncate), False))