4.1 Big Data Roundup

# Environment & Installation

## HBase installation issues

  https://www.cnblogs.com/shan333/p/15386771.html

## PySpark installation

  https://www.cnblogs.com/rhgaiymm/p/12892710.html

  https://stackoverflow.com/questions/64169977/modulenotfounderror-no-module-named-pyspark

## MapReduce fundamentals

  https://blog.csdn.net/weixin_45366499/article/details/106892489

  https://blog.csdn.net/weixin_43542605/article/details/122288056

  https://blog.csdn.net/Shockang/article/details/117970151

  https://blog.csdn.net/qq_45725767/article/details/120956256

## Spark vs. MapReduce

  https://www.zhihu.com/question/31930662

## MapReduce inverted index

  See https://www.cnblogs.com/zll20153246/p/9334857.html
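
As a rough illustration of the idea behind an inverted index in MapReduce, here is a minimal single-process sketch in plain Python. It is not taken from the linked article and does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `inverted_index`) and the sample file names are hypothetical, chosen only to mirror the map, shuffle, and reduce stages.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (word, doc_id) pair per word occurrence."""
    for word in text.split():
        yield word, doc_id


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    """Reduce: count how often the word appears in each document."""
    counts = defaultdict(int)
    for doc_id in doc_ids:
        counts[doc_id] += 1
    return word, dict(sorted(counts.items()))


def inverted_index(docs):
    """Run the three stages over a {doc_id: text} mapping."""
    pairs = [kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text)]
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


# Hypothetical sample input.
docs = {"a.txt": "hello world hello", "b.txt": "hello spark"}
index = inverted_index(docs)
```

In a real job the shuffle is done by the framework, and a combiner is often used to pre-aggregate counts on the map side.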

## MapReduce table joins

  See https://blog.csdn.net/chuyouyinghe/article/details/78845364
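
One common way to join two tables in MapReduce is a reduce-side join: the mapper tags each row with its source table and emits it under the join key, and the reducer pairs up rows from the two sides. The sketch below simulates that in plain Python. It is an assumption about the general technique, not a reproduction of the linked article; `map_phase`, `reduce_phase`, `reduce_side_join`, and the sample rows are hypothetical.

```python
from collections import defaultdict


def map_phase(table_name, rows, key_index):
    """Map: emit (join_key, (table_tag, row)) for every row."""
    for row in rows:
        yield row[key_index], (table_name, row)


def reduce_phase(tagged_rows):
    """Reduce: for one key, pair every left row with every right row."""
    left = [row for tag, row in tagged_rows if tag == "left"]
    right = [row for tag, row in tagged_rows if tag == "right"]
    return [l + r for l in left for r in right]


def reduce_side_join(left_rows, right_rows, left_key=0, right_key=0):
    """Inner join of two lists of tuples on the given column indexes."""
    groups = defaultdict(list)  # stands in for the framework's shuffle
    for key, tagged in map_phase("left", left_rows, left_key):
        groups[key].append(tagged)
    for key, tagged in map_phase("right", right_rows, right_key):
        groups[key].append(tagged)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_phase(groups[key]))
    return joined


# Hypothetical sample tables: (user_id, name) and (user_id, item).
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "cup")]
result = reduce_side_join(users, orders)
```

Keys that appear on only one side produce no output here, which makes this an inner join; when one table is small, a map-side join that broadcasts it to every mapper is the usual alternative.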

## Introduction to the CAP theorem

  See http://www.ruanyifeng.com/blog/2018/07/cap.html

  https://www.zhihu.com/question/54105974

## Xiamen University course

  See http://dblab.xmu.edu.cn/blog/?s=python

  http://dblab.xmu.edu.cn/post/spark-python/

## Importing pyspark in the Python shell

  See https://stackoverflow.com/questions/23256536/importing-pyspark-in-python-shell

## Spark basic usage

  https://spark.apache.org/docs/latest/quick-start.html

## Wide and narrow dependencies in Spark

  https://blog.csdn.net/houmou/article/details/52531205

## RDDs in Spark

  https://blog.csdn.net/Zsusan7/article/details/121920810

  https://blog.csdn.net/Python_Ai_Road/article/details/111940472

  https://blog.csdn.net/olizxq/article/details/118276930

## DataFrames in Spark

  https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_df.html

  https://blog.csdn.net/ljp7759325/article/details/124135234

## Spark SQL

  https://blog.csdn.net/m0_46917254/article/details/123959257

## Writing out DataFrame data

  https://blog.csdn.net/feizuiku0116/article/details/121527042

## Data skew

  https://zhuanlan.zhihu.com/p/376286414

  https://zhuanlan.zhihu.com/p/449471866

## HiveSQL compilation process

  https://tech.meituan.com/2014/02/12/hive-sql-to-mapreduce.html

## Fixing Spark OOM errors

  https://www.jianshu.com/p/1e3472cb033d

  https://cloud.tencent.com/developer/article/2109043

## Merging small files produced by Spark SQL

  https://blog.csdn.net/Jerry_991/article/details/95773902

## Overwriting specific partitions when writing a table with Spark

  https://huaweicloud.csdn.net/63357acfd3efff3090b58903.html

  https://blog.csdn.net/lovetechlovelife/article/details/114544073

## Hadoop shell commands

  https://blog.csdn.net/m0_52879657/article/details/124633808

## PickleException: expected zero arguments for construction of ClassDict (for numpy.dtype)

  Python's built-in `float` and `np.float64` are different types; see https://stackoverflow.com/questions/53800062/expected-zero-arguments-for-construction-of-classdict-for-numpy-dtype-when-c
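
A possible workaround, sketched under the assumption that the error comes from NumPy scalar values (such as `np.float64`) reaching Spark where built-in Python values are expected: convert them to built-in types first. NumPy scalars expose an `.item()` method that returns the equivalent Python scalar. The helper names `to_builtin` and `rows_to_builtin` are hypothetical, and the block deliberately avoids importing NumPy or PySpark so that it runs on its own.

```python
def to_builtin(value):
    """Return a built-in Python scalar for NumPy scalars; pass other values through.

    Relies on the `.item()` method that NumPy scalars provide. Intended for
    scalars only: a multi-element NumPy array also has `.item()` but raises.
    """
    item = getattr(value, "item", None)
    return item() if callable(item) else value


def rows_to_builtin(rows):
    """Apply to_builtin to every field of every row (rows are tuples or lists)."""
    return [tuple(to_builtin(v) for v in row) for row in rows]


# Hypothetical usage with NumPy and PySpark installed:
#   rows = [("a", np.float64(1.5))]
#   df = spark.createDataFrame(rows_to_builtin(rows), ["name", "score"])
# Inside a UDF, returning float(x) instead of an np.float64 has the same effect.
```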

Posted on 2022-04-07 22:36 by Hiteration