- Hive - Install Apache Hive and Apache Tez
  Abstract: Download Hive 4.0.1 from https://dlcdn.apache.org/hive/hive-4.0.1/apache-hive-4.0.1-bin.tar.gz . .bashrc: export HIVE_HOME=$sfw/hive-4.0.1 export HIVE…
- Big Data Analytics with Apache Hadoop Study Notes 3
  Abstract: LLAP -- 1. Creating a Table: CREATE TABLE employees (emp_id INT, emp_name STRING, emp_salary DOUBLE) STORED AS ORC TBLPROPERTIES ('transactional'='true')…
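
A minimal sketch of issuing that DDL from Python over PyHive; the HiveServer2 host/port and the pyhive dependency are assumptions, not details from the notes:

```python
# Hedged sketch: connect to HiveServer2 and create the transactional ORC
# table from the abstract. Host/port are assumptions; ACID tables also
# require Hive's transaction manager to be enabled server-side.
from pyhive import hive

conn = hive.connect(host="localhost", port=10000)
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS employees (
        emp_id INT,
        emp_name STRING,
        emp_salary DOUBLE
    )
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true')
""")
cur.close()
conn.close()
```
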
- Hadoop - Setting up a Single Node Cluster in Pseudo-Distributed Mode
  Abstract: References: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html https://hadoop.apache.org/docs/r3.4.1/hadoop-pr…
- Big Data Analytics with Apache Hadoop Study Notes 2
  Abstract: See https://www.cnblogs.com/zhangzhihui/p/18733011 . hdfs dfs -ls <path>: List files and directories in HDFS. zzh@ZZHPC:~$ hdfs dfs -ls / Found 2 items…
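
For scripting, the same `hdfs dfs` commands can be driven from Python with the standard library; a quick sketch, assuming the Hadoop binaries are on PATH:

```python
# Quick sketch: run the same HDFS listing from Python. Assumes the
# `hdfs` binary (Hadoop) is on PATH and the cluster is running.
import subprocess

result = subprocess.run(
    ["hdfs", "dfs", "-ls", "/"],   # equivalent to: hdfs dfs -ls /
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```
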
- Big Data Analytics with Apache Hadoop Study Notes 1
  Abstract: Hadoop is an open-source framework designed for distributed storage and processing of large datasets across clusters of computers. It provides a scalable…
- Neo4j - Cypher Manual Study Notes 7
  Abstract: Subqueries. CALL subqueries: The CALL clause can be used to invoke subqueries that execute operations within a defined scope, thereby optimizing data handling…
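
A hedged sketch of a CALL subquery run through the official neo4j Python driver; the URI, credentials, and the Person/KNOWS schema are illustrative assumptions:

```python
# Hedged sketch: run a CALL subquery via the official neo4j driver.
# URI, credentials, and the Person/KNOWS schema are illustrative.
from neo4j import GraphDatabase

QUERY = """
MATCH (p:Person)
CALL {
    WITH p
    MATCH (p)-[:KNOWS]->(f:Person)
    RETURN count(f) AS friends
}
RETURN p.name AS name, friends
"""

with GraphDatabase.driver("neo4j://localhost:7687",
                          auth=("neo4j", "password")) as driver:
    records, _, _ = driver.execute_query(QUERY)
    for record in records:
        print(record["name"], record["friends"])
```
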
- Neo4j - Cypher Manual Study Notes 6
  Abstract: SKIP: SKIP (and its synonym OFFSET) defines from which row to start including the rows in the output. By using SKIP, the result set will get trimmed from…
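
A small pagination sketch built on parameterized SKIP/LIMIT; connection details are again assumed:

```python
# Pagination sketch with parameterized SKIP/LIMIT; connection details
# are assumptions. Page 0 skips 0 rows, page 1 skips 25, and so on.
from neo4j import GraphDatabase

PAGE_SIZE = 25

with GraphDatabase.driver("neo4j://localhost:7687",
                          auth=("neo4j", "password")) as driver:
    for page in range(3):
        records, _, _ = driver.execute_query(
            "MATCH (n:Person) RETURN n.name AS name "
            "ORDER BY name SKIP $skip LIMIT $limit",
            skip=page * PAGE_SIZE, limit=PAGE_SIZE,
        )
        print(f"page {page}:", [r["name"] for r in records])
```
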
- Neo4j - Cypher Manual Study Notes 5
  Abstract: SHOW FUNCTIONS: Listing the available functions can be done with SHOW FUNCTIONS. Table 1. List functions output has columns Column / Description / Type; the first entry is name: The name of the function…
- Neo4j - Cypher Manual Study Notes 4
  Abstract: MATCH. Find nodes: Find all nodes in a graph: MATCH (n) RETURN n. Find nodes with a specific label: MATCH (movie:Movie) RETURN movie.title. MATCH using node…
- Neo4j - Cypher Manual Study Notes 3
  Abstract: CREATE. Syntax for nodes: You can bind each node to a variable that you can refer to later in the query. Multiple labels are separated by colons. CREATE…
- Neo4j - Install Neo4j Desktop, connect it to a Neo4j Enterprise database, and install the APOC plugin
  Abstract: zzh@ZZHPC:~/Downloads$ ./neo4j-desktop-1.6.1-x86_64.AppImage [08:57:25.594] [info] ● ● ● Starting Neo4j Desktop 1.6.1 @ Linux 6.8.0-52-generic, Intel(…
- Neo4j - Install Neo4j Enterprise Edition
  Abstract: zzh@ZZHPC:~/Downloads/sfw$ sudo dpkg -i neo4j-enterprise_2025.01.0_all.deb (Reading database ... 403875 files and directories currently installed.) Pr…
- Neo4j - Cypher Manual Study Notes 2
  Abstract: Clause composition: The semantics of a whole query is defined by the semantics of its clauses. Each clause has as input the state of the graph and a table…
- Neo4j - Cypher Manual Study Notes 1
  Abstract: Built-in databases in Neo4j: All Neo4j servers contain a built-in database called system, which behaves differently than all other databases. The system…
- Neo4j - Northwind Graph Guide
  Abstract: Load product catalog: Load the product catalog data from external CSV files. Northwind sells food products in a few categories provided by suppliers. Le…
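
A hedged sketch of the guide's first load step driven from Python; the CSV URL follows the pattern Neo4j's built-in browser guide uses, and both it and the connection details should be treated as assumptions:

```python
# Hedged sketch: load the product catalog over HTTP with LOAD CSV.
# URL and credentials are assumptions, not taken from the notes.
from neo4j import GraphDatabase

LOAD_PRODUCTS = """
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/northwind/products.csv' AS row
CREATE (p:Product)
SET p = row, p.unitPrice = toFloat(row.unitPrice)
"""

with GraphDatabase.driver("neo4j://localhost:7687",
                          auth=("neo4j", "password")) as driver:
    driver.execute_query(LOAD_PRODUCTS)
```
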
- Neo4j - Movie Graph Guide
  Abstract: Create nodes and relationships: CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'}) CREATE (Keanu:Person…
- Vector Database - Study Notes 2
  Abstract: zzh@ZZHPC:~$ pip install langchain Collecting langchain Downloading langchain-0.3.18-py3-none-any.whl.metadata (7.8 kB) Collecting langchain-core<1.0.…
- Vector Database - Study Notes 1
  Abstract: zzh@ZZHPC:~$ which pip /home/zzh/venvs/zpy313/bin/pip zzh@ZZHPC:~$ pip install chromadb Collecting chromadb Downloading chromadb-0.6.3-py3-none-any.whl…
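
Once installed, a minimal chromadb round trip looks like this sketch; the collection name and documents are illustrative, and the default embedding function downloads a small model on first use:

```python
# Minimal chromadb sketch: create a collection, add documents, query by
# text. Names and documents are made up for illustration.
import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient for disk
collection = client.create_collection(name="notes")

collection.add(
    ids=["1", "2"],
    documents=["Neo4j stores graphs", "Spark processes big data"],
)

# Embeds the query with the default embedding function and returns the
# nearest documents.
results = collection.query(query_texts=["graph database"], n_results=1)
print(results["documents"])
```
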
- SQLite - Study Notes 3
  Abstract: Analyze and Optimize: sqlite_stat1 is an internal table. It is not in the output of .tables. Suggested Pragmas. Faster inserts, Method 1 (a bit risky): M…
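
A hedged sketch of "faster inserts" pragmas through Python's built-in sqlite3 module; synchronous=OFF trades durability for speed, hence "a bit risky":

```python
# Faster-insert pragmas via the built-in sqlite3 module. synchronous=OFF
# risks corruption on power loss, so only use it for rebuildable data.
import sqlite3

conn = sqlite3.connect("test.db")
conn.execute("PRAGMA journal_mode = WAL")
conn.execute("PRAGMA synchronous = OFF")

conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v TEXT)")
with conn:  # one transaction for the whole batch
    conn.executemany("INSERT INTO t (v) VALUES (?)",
                     [(f"row-{i}",) for i in range(10_000)])
conn.close()
```
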
- SQLite - Study Notes 2
  Abstract: Flexible Typing: Type is at the cell level. SQLite will convert a value to the declared type if it can do so without losing data. 5 data types: Type Affinity Co…
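
A quick demonstration of that cell-level conversion with the built-in sqlite3 module: an integer inserted into a TEXT-declared column is stored as text, which typeof() confirms:

```python
# Demonstrating TEXT affinity: the integer 1 is converted to the text
# '1' on insert; typeof() reports the storage class of each cell.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (x TEXT)")
conn.execute("INSERT INTO demo VALUES (1)")     # stored as the text '1'
conn.execute("INSERT INTO demo VALUES ('abc')")
for value, storage in conn.execute("SELECT x, typeof(x) FROM demo"):
    print(repr(value), storage)   # '1' text, 'abc' text
```
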
- SQLite - Study Notes 1
  Abstract: zzh@ZZHPC:~$ sudo apt install sqlite3 zzh@ZZHPC:~$ sqlite3 SQLite version 3.37.2 2022-01-06 13:25:41 Enter ".help" for usage hints. Connected to a transient in-memory database…
- PySpark - Orchestrating and Scheduling Data Pipelines with Databricks Workflows
  Abstract: In Databricks Community Edition, you cannot use Workflows because it is a premium feature that requires an upgraded subscription. This chapter and the…
- PySpark - Performance Tuning in Delta Lake
  Abstract: from delta import configure_spark_with_delta_pip from pyspark.sql import SparkSession from pyspark.sql.functions import when, rand import timeit build…
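
One representative tuning step, sketched under assumptions (table path, session bootstrap): compacting small files with Delta's OPTIMIZE:

```python
# Hedged sketch: bin-packing compaction of a Delta table's small files.
# The table path is an assumption; the session bootstrap follows the
# standard delta-spark pattern.
from delta import configure_spark_with_delta_pip, DeltaTable
from pyspark.sql import SparkSession

builder = (SparkSession.builder
           .appName("delta-optimize")
           .config("spark.sql.extensions",
                   "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
spark = configure_spark_with_delta_pip(builder).getOrCreate()

table = DeltaTable.forPath(spark, "/tmp/delta/events")  # assumed path
table.optimize().executeCompaction()  # compact small files
```
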
- PySpark - Performance Tuning with Apache Spark
  Abstract: from pyspark.sql import SparkSession # Create a new SparkSession spark = (SparkSession .builder .appName("monitor-spark-ui") .master("spark://ZZHPC:70…
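
The master URL is cut off in the abstract; 7077 is the default port of a Spark standalone master, so a sketch of the session setup under that assumption:

```python
# The truncated master URL presumably ends in 7077, the default port of
# a Spark standalone master; that completion is an assumption.
from pyspark.sql import SparkSession

spark = (SparkSession
         .builder
         .appName("monitor-spark-ui")
         .master("spark://ZZHPC:7077")   # assumed standalone master URL
         .getOrCreate())

# Address of this application's web UI (port 4040 by default).
print(spark.sparkContext.uiWebUrl)
```
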
- PySpark - Processing Streaming Data
  Abstract: from delta import configure_spark_with_delta_pip, DeltaTable from pyspark.sql import SparkSession from pyspark.sql.functions import col, from_json fro…
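
A self-contained sketch of the from_json pattern: parse JSON lines from a socket feed into typed columns. The schema, host, and port are illustrative assumptions:

```python
# Self-contained from_json sketch: JSON lines arrive on a socket and are
# parsed into typed columns. Schema, host, and port are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import (StructType, StructField,
                               StringType, DoubleType)

spark = SparkSession.builder.appName("parse-json-stream").getOrCreate()

schema = StructType([
    StructField("event", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("socket")                 # e.g. fed by `nc -lk 9999`
       .option("host", "localhost")
       .option("port", 9999)
       .load())

parsed = (raw
          .select(from_json(col("value"), schema).alias("data"))
          .select("data.*"))             # -> columns: event, amount

query = parsed.writeStream.outputMode("append").format("console").start()
query.awaitTermination()
```
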
- PySpark - Ingesting Streaming Data
  Abstract: nc -lk 9999 from pyspark.sql import SparkSession from pyspark.sql.functions import explode, split spark = (SparkSession.builder .appName("config-strea…
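
The abstract pairs `nc -lk 9999` with a socket-source stream; here is a complete word-count sketch of that setup, the classic Structured Streaming example:

```python
# Socket word count matching the `nc -lk 9999` feed from the abstract.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("socket-wordcount").getOrCreate()

lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

query = (counts.writeStream
         .outputMode("complete")   # re-emit full counts each trigger
         .format("console")
         .start())
query.awaitTermination()
```
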
- PySpark - Set up a local Spark and Kafka environment
  Abstract: 1. Download Spark 3.4.1 2. Download Java JDK 17 3. Set up a Python 3.11.9 virtual environment .bashrc: sfw=~/Downloads/sfw zpy=~/venvs/zpy311 export JAVA…
- PySpark - Manipulate Data with Delta Lake
  Abstract: from delta import configure_spark_with_delta_pip, DeltaTable from pyspark.sql import SparkSession builder = (SparkSession.builder .appName("create-del…
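
A hedged sketch of the delta-spark bootstrap shown in the abstract, followed by a simple upsert with DeltaTable.merge; the table path and data are illustrative:

```python
# Hedged sketch: standard delta-spark session bootstrap, then an upsert
# via DeltaTable.merge. Path and rows are made up for illustration.
from delta import configure_spark_with_delta_pip, DeltaTable
from pyspark.sql import SparkSession

builder = (SparkSession.builder
           .appName("delta-merge")
           .config("spark.sql.extensions",
                   "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
spark = configure_spark_with_delta_pip(builder).getOrCreate()

target = DeltaTable.forPath(spark, "/tmp/delta/people")  # assumed path
updates = spark.createDataFrame([(1, "Alice")], ["id", "name"])

(target.alias("t")
 .merge(updates.alias("u"), "t.id = u.id")
 .whenMatchedUpdateAll()     # update rows whose id already exists
 .whenNotMatchedInsertAll()  # insert the rest
 .execute())
```
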
- PySpark - Data Transformation and Data Manipulation
  Abstract: # Apply transform function to Numbers column df_transformed = ( df.select("category", "overallMotivation", "year", "laureates", transform(col("laureat…
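
The abstract's transform() call is truncated; a self-contained sketch of the same higher-order pattern on simpler, made-up data:

```python
# Self-contained transform() sketch: apply a lambda to every element of
# an array column. The data here is made up for illustration.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, transform

spark = SparkSession.builder.appName("transform-demo").getOrCreate()
df = spark.createDataFrame([(1, [1, 2, 3])], ["id", "numbers"])

df_transformed = df.select(
    "id",
    transform(col("numbers"), lambda x: x * x).alias("squared"),
)
df_transformed.show()   # id=1 -> squared=[1, 4, 9]
```
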
- Dockerfile - base, spark-base, spark-master, spark-worker, jupyterlab (Spark 3.5.4)
  Abstract: build.sh: #!/bin/bash # # -- Build Apache Spark Standalone Cluster Docker Images # # -- Variables # BUILD_DATE="$(date -u +'%Y-%m-%d')" SPARK_VERSION=…
- PySpark - Data Ingestion and Data Extraction
  Abstract: from pyspark.sql.functions import flatten, collect_list # create a DataFrame with an array of arrays column df = spark.createDataFrame([ (1, [[1, 2],…
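
Completing the idea with made-up data mirroring the truncated snippet: flatten() collapses an array of arrays into one array:

```python
# flatten() collapses an array of arrays into a single array; the data
# mirrors the truncated snippet in the abstract.
from pyspark.sql import SparkSession
from pyspark.sql.functions import flatten

spark = SparkSession.builder.appName("flatten-demo").getOrCreate()
df = spark.createDataFrame(
    [(1, [[1, 2], [3, 4]]), (2, [[5, 6], [7, 8]])],
    ["id", "nested"],
)
df.select("id", flatten("nested").alias("flat")).show()
# id=1 -> flat=[1, 2, 3, 4]
```
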
- Dockerfile - base, spark-base, spark-master, spark-worker, jupyterlab (Spark 3.4.1)
  Abstract: build.sh: #!/bin/bash # # -- Build Apache Spark Standalone Cluster Docker Images # # -- Variables # BUILD_DATE="$(date -u +'%Y-%m-%d')" SPARK_VERSION=…