4.1 Big Data Roundup
# Environment & Installation
## Hadoop installation error: HADOOP_HOME and hadoop.home.dir are unset
https://blog.csdn.net/qq_43470725/article/details/136615113
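
On Windows, this error usually means Hadoop cannot locate its home directory, i.e. the folder whose `bin` subfolder contains `winutils.exe`. Below is a minimal sketch of setting it from Python before a Spark session is created. The helper name `configure_hadoop_home` and the path `C:\hadoop` are placeholders chosen for illustration, not taken from the linked post.

```python
import os


def configure_hadoop_home(hadoop_home, environ=None):
    """Point HADOOP_HOME at a local Hadoop folder and put its bin directory on PATH."""
    env = os.environ if environ is None else environ
    env["HADOOP_HOME"] = hadoop_home
    bin_dir = os.path.join(hadoop_home, "bin")
    current = env.get("PATH", "")
    # Prepend bin only once so repeated calls do not keep growing PATH.
    if bin_dir not in current.split(os.pathsep):
        env["PATH"] = bin_dir + (os.pathsep + current if current else "")
    return env


# Hypothetical location; a throwaway dict is used here instead of the real environment.
demo_env = configure_hadoop_home(r"C:\hadoop", environ={})
print(demo_env["HADOOP_HOME"])
```

Calling `configure_hadoop_home(...)` without the `environ` argument modifies the current process environment, so it has to run before Spark starts its JVM.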
## Spark installation error: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0
https://stackoverflow.com/questions/57808230/spark-on-windows-java-lang-unsatisfiedlinkerror-org-apache-hadoop-io-nativeio
## HBase installation issues
https://www.cnblogs.com/shan333/p/15386771.html
## PySpark installation
https://www.cnblogs.com/rhgaiymm/p/12892710.html
https://stackoverflow.com/questions/64169977/modulenotfounderror-no-module-named-pyspark
https://stackoverflow.com/questions/68246173/python-was-not-found-but-can-be-installed-when-using-spark-submit-on-windows
## PySpark error: Unable to load multiple json files with pyspark
https://stackoverflow.com/questions/71376495/unable-to-load-multiple-json-files-with-pyspark
## Commonly used PySpark functions
https://www.jianshu.com/p/2964bf816efc
## MapReduce fundamentals
https://blog.csdn.net/weixin_45366499/article/details/106892489
https://blog.csdn.net/weixin_43542605/article/details/122288056
https://blog.csdn.net/Shockang/article/details/117970151
https://blog.csdn.net/qq_45725767/article/details/120956256
## Spark vs. MapReduce
https://www.zhihu.com/question/31930662
## MapReduce inverted index
See https://www.cnblogs.com/zll20153246/p/9334857.html
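
The idea behind an inverted index can be sketched in plain Python by simulating the map, shuffle, and reduce steps in a single process. This is only an illustration under assumptions: the function names and the sample documents are invented here, and a real job would run on Hadoop rather than in one script.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (word, doc_id) pair per word occurrence."""
    for word in text.lower().split():
        yield word, doc_id


def reduce_phase(word, doc_ids):
    """Reduce: count how often the word appears in each document."""
    counts = defaultdict(int)
    for doc_id in doc_ids:
        counts[doc_id] += 1
    return word, dict(sorted(counts.items()))


def build_inverted_index(documents):
    """Run map, shuffle (group by word), and reduce in a single process."""
    shuffled = defaultdict(list)
    for doc_id, text in documents.items():
        for word, emitted_id in map_phase(doc_id, text):
            shuffled[word].append(emitted_id)
    return dict(reduce_phase(word, ids) for word, ids in sorted(shuffled.items()))


# Made-up sample input: file name -> file content.
docs = {
    "a.txt": "hello spark hello hadoop",
    "b.txt": "hello hive",
}
index = build_inverted_index(docs)
print(index["hello"])  # {'a.txt': 2, 'b.txt': 1}
```

The shuffle step is what the framework does between mappers and reducers: it groups every emitted value under its key so each reducer sees all documents for one word.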
## MapReduce table joins
See https://blog.csdn.net/chuyouyinghe/article/details/78845364
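
One common way to join two tables in MapReduce is a reduce-side join: mappers tag each record with its source table, and the reducer pairs records that share a key. The sketch below simulates that in plain Python. The table names, fields, and sample rows are assumptions made for illustration and may differ from the linked article.

```python
from collections import defaultdict


def tag_records(table_name, rows, key_field):
    """Map: emit (join_key, (table_name, row)) so the reducer can tell sources apart."""
    for row in rows:
        yield row[key_field], (table_name, row)


def join_reducer(tagged_rows):
    """Reduce: pair every order with every customer that shares the key."""
    orders = [row for tag, row in tagged_rows if tag == "orders"]
    customers = [row for tag, row in tagged_rows if tag == "customers"]
    return [{**c, **o} for o in orders for c in customers]


def reduce_side_join(orders, customers, key_field):
    """Inner join of two lists of dicts, simulated as map, shuffle, and reduce."""
    groups = defaultdict(list)
    for key, tagged in tag_records("orders", orders, key_field):
        groups[key].append(tagged)
    for key, tagged in tag_records("customers", customers, key_field):
        groups[key].append(tagged)
    joined = []
    for key in sorted(groups):
        joined.extend(join_reducer(groups[key]))
    return joined


# Made-up sample tables.
customers = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bob"}]
orders = [
    {"cid": 1, "item": "book"},
    {"cid": 1, "item": "pen"},
    {"cid": 3, "item": "lamp"},
]
for row in reduce_side_join(orders, customers, "cid"):
    print(row)
```

Keys that appear in only one table (customer 2 and the order for customer 3 here) produce no output, which makes this an inner join.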
## Introduction to the CAP theorem
See http://www.ruanyifeng.com/blog/2018/07/cap.html
https://www.zhihu.com/question/54105974
## Xiamen University course
See http://dblab.xmu.edu.cn/blog/?s=python
http://dblab.xmu.edu.cn/post/spark-python/
## importing pyspark in python shell
See https://stackoverflow.com/questions/23256536/importing-pyspark-in-python-shell
## Basic Spark usage
https://spark.apache.org/docs/latest/quick-start.html
## Wide and narrow dependencies in Spark
https://blog.csdn.net/houmou/article/details/52531205
## RDDs in Spark
https://blog.csdn.net/Zsusan7/article/details/121920810
https://blog.csdn.net/Python_Ai_Road/article/details/111940472
https://blog.csdn.net/olizxq/article/details/118276930
## DataFrames in Spark
https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_df.html
https://blog.csdn.net/ljp7759325/article/details/124135234
## Spark SQL
https://blog.csdn.net/m0_46917254/article/details/123959257
## Writing out DataFrame data
https://blog.csdn.net/feizuiku0116/article/details/121527042
## Data skew
https://zhuanlan.zhihu.com/p/376286414
https://zhuanlan.zhihu.com/p/449471866
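
A widely used mitigation for skewed aggregations is key salting with two-stage aggregation: a random salt spreads a hot key over several sub-keys, partial results are computed per salted key, and a second pass merges them. The plain-Python simulation below only illustrates the idea; the function name, parameters, and data are assumptions, and it is not necessarily the method described in the linked articles.

```python
import random
from collections import defaultdict


def salted_sum(pairs, num_salts=4, seed=0):
    """Sum values per key in two stages, salting keys to spread hot keys out."""
    rng = random.Random(seed)
    # Stage 1: attach a random salt to each key and pre-aggregate per (salt, key).
    partial = defaultdict(int)
    for key, value in pairs:
        partial[(rng.randrange(num_salts), key)] += value
    # Stage 2: drop the salt and combine the partial sums per original key.
    totals = defaultdict(int)
    for (_salt, key), subtotal in partial.items():
        totals[key] += subtotal
    return dict(totals)


# Made-up skewed input: one hot key and one rare key.
skewed = [("hot", 1)] * 1000 + [("cold", 1)] * 3
print(salted_sum(skewed))
```

In a cluster, stage 1 is where the benefit appears: the 1000 records for the hot key are split across up to `num_salts` tasks instead of landing on one.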
## Parameter tuning for Hive and Spark SQL jobs
https://www.jianshu.com/p/2964bf816efc
## HiveSQL compilation process
https://tech.meituan.com/2014/02/12/hive-sql-to-mapreduce.html
## Fixing Spark OOM errors
https://www.jianshu.com/p/1e3472cb033d
https://cloud.tencent.com/developer/article/2109043
## Merging small files produced by Spark SQL
https://blog.csdn.net/Jerry_991/article/details/95773902
## Overwriting specific partitions when writing a table with Spark
https://huaweicloud.csdn.net/63357acfd3efff3090b58903.html
https://blog.csdn.net/lovetechlovelife/article/details/114544073
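
One way this is commonly done since Spark 2.3 is dynamic partition overwrite. The sketch below assumes an existing `SparkSession` and a DataFrame whose columns line up with a partitioned target table. The helper name `overwrite_matching_partitions` and the table name in the usage note are placeholders, and the linked posts may take a different route.

```python
def overwrite_matching_partitions(spark, df, table_name):
    """Overwrite only the partitions present in df, leaving other partitions intact."""
    # With the default "static" mode, an overwrite without a partition spec
    # clears every partition of the table before writing.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
    # insertInto matches columns by position, so df's column order
    # must match the table's, with partition columns last.
    df.write.insertInto(table_name, overwrite=True)
```

A call would look like `overwrite_matching_partitions(spark, daily_df, "db.events")`, where `daily_df` and `db.events` are hypothetical names.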
## Hadoop shell commands
https://blog.csdn.net/m0_52879657/article/details/124633808
## PickleException: expected zero arguments for construction of ClassDict (for numpy.dtype)
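The built-in float and np.float64 are different types; this error shows up when a NumPy value is passed where Spark expects a plain Python one. See https://stackoverflow.com/questions/53800062/expected-zero-arguments-for-construction-of-classdict-for-numpy-dtype-when-c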
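
A minimal sketch of the usual workaround: convert NumPy values to built-in Python types before they reach Spark. The helper name `to_builtin` is invented here. It relies on NumPy scalars and arrays exposing a `.tolist()` method, so the block itself needs neither NumPy nor Spark.

```python
def to_builtin(value):
    """Recursively convert NumPy-style scalars and arrays into plain Python objects."""
    if isinstance(value, dict):
        return {to_builtin(k): to_builtin(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_builtin(v) for v in value]
    if isinstance(value, tuple):
        return tuple(to_builtin(v) for v in value)
    # NumPy scalars (np.float64, np.int64, ...) and arrays expose .tolist(),
    # which returns built-in float/int values or nested lists of them.
    tolist = getattr(value, "tolist", None)
    if callable(tolist):
        return tolist()
    return value


# Plain Python values pass through unchanged.
print(to_builtin({"score": 1.5, "tags": ("a", "b")}))
```

Inside a UDF that returns a single number, the same idea is often just `return float(result)`.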
## Differences between Elasticsearch (ES) and Hive
https://blog.51cto.com/u_16213453/11792522
Posted on 2022-04-07 22:36 by Hiteration