- Hive - Install Apache Hive and Apache Tez
  Abstract: Download Hive 4.0.1 from https://dlcdn.apache.org/hive/hive-4.0.1/apache-hive-4.0.1-bin.tar.gz . .bashrc: export HIVE_HOME=$sfw/hive-4.0.1 export HIVE…
- Big Data Analytics with Apache Hadoop Study Notes 3
  Abstract: LLAP -- 1. Creating a Table: CREATE TABLE employees (emp_id INT, emp_name STRING, emp_salary DOUBLE) STORED AS ORC TBLPROPERTIES ('transactional'='true')…
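
A minimal sketch of issuing that DDL from Python over PyHive; the HiveServer2 host/port and the pyhive dependency are assumptions, not details from the notes:

```python
# Hedged sketch: connect to HiveServer2 and create the transactional ORC
# table from the abstract. Host/port are assumptions; ACID tables also
# require Hive's transaction manager to be enabled server-side.
from pyhive import hive

conn = hive.connect(host="localhost", port=10000)
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS employees (
        emp_id INT,
        emp_name STRING,
        emp_salary DOUBLE
    )
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true')
""")
cur.close()
conn.close()
```
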
- Hadoop - Setting up a Single Node Cluster in Pseudo-Distributed Mode
  Abstract: References: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html https://hadoop.apache.org/docs/r3.4.1/hadoop-pr…
- Big Data Analytics with Apache Hadoop Study Notes 2
  Abstract: See https://www.cnblogs.com/zhangzhihui/p/18733011 . hdfs dfs -ls <path>: List files and directories in HDFS. zzh@ZZHPC:~$ hdfs dfs -ls / Found 2 items…
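
For scripting, the same `hdfs dfs` commands can be driven from Python with the standard library; a quick sketch, assuming the Hadoop binaries are on PATH:

```python
# Quick sketch: run the same HDFS listing from Python. Assumes the
# `hdfs` binary (Hadoop) is on PATH and the cluster is running.
import subprocess

result = subprocess.run(
    ["hdfs", "dfs", "-ls", "/"],   # equivalent to: hdfs dfs -ls /
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```
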
- Big Data Analytics with Apache Hadoop Study Notes 1
  Abstract: Hadoop is an open-source framework designed for distributed storage and processing of large datasets across clusters of computers. It provides a scalable…
- Neo4j - Cypher Manual Study Notes 7
  Abstract: Subqueries. CALL subqueries: The CALL clause can be used to invoke subqueries that execute operations within a defined scope, thereby optimizing data handling…
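
A hedged sketch of a CALL subquery run through the official neo4j Python driver; the URI, credentials, and the Person/KNOWS schema are illustrative assumptions:

```python
# Hedged sketch: run a CALL subquery via the official neo4j driver.
# URI, credentials, and the Person/KNOWS schema are illustrative.
from neo4j import GraphDatabase

QUERY = """
MATCH (p:Person)
CALL {
    WITH p
    MATCH (p)-[:KNOWS]->(f:Person)
    RETURN count(f) AS friends
}
RETURN p.name AS name, friends
"""

with GraphDatabase.driver("neo4j://localhost:7687",
                          auth=("neo4j", "password")) as driver:
    records, _, _ = driver.execute_query(QUERY)
    for record in records:
        print(record["name"], record["friends"])
```
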
- Neo4j - Cypher Manual Study Notes 6
  Abstract: SKIP: SKIP (and its synonym OFFSET) defines from which row to start including the rows in the output. By using SKIP, the result set will get trimmed from…
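
A small pagination sketch built on parameterized SKIP/LIMIT; connection details are again assumed:

```python
# Pagination sketch with parameterized SKIP/LIMIT; connection details
# are assumptions. Page 0 skips 0 rows, page 1 skips 25, and so on.
from neo4j import GraphDatabase

PAGE_SIZE = 25

with GraphDatabase.driver("neo4j://localhost:7687",
                          auth=("neo4j", "password")) as driver:
    for page in range(3):
        records, _, _ = driver.execute_query(
            "MATCH (n:Person) RETURN n.name AS name "
            "ORDER BY name SKIP $skip LIMIT $limit",
            skip=page * PAGE_SIZE, limit=PAGE_SIZE,
        )
        print(f"page {page}:", [r["name"] for r in records])
```
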
- Neo4j - Cypher Manual Study Notes 5
  Abstract: SHOW FUNCTIONS: Listing the available functions can be done with SHOW FUNCTIONS. Table 1. List functions output has columns Column / Description / Type; the first entry is name: The name of the function…
- Neo4j - Cypher Manual Study Notes 4
  Abstract: MATCH. Find nodes: Find all nodes in a graph: MATCH (n) RETURN n. Find nodes with a specific label: MATCH (movie:Movie) RETURN movie.title. MATCH using node…
- Neo4j - Cypher Manual Study Notes 3
  Abstract: CREATE. Syntax for nodes: You can bind each node to a variable that you can refer to later in the query. Multiple labels are separated by colons. CREATE…
- Neo4j - Install Neo4j Desktop, connect it to a Neo4j Enterprise database, and install the APOC plugin
  Abstract: zzh@ZZHPC:~/Downloads$ ./neo4j-desktop-1.6.1-x86_64.AppImage [08:57:25.594] [info] ● ● ● Starting Neo4j Desktop 1.6.1 @ Linux 6.8.0-52-generic, Intel(…
- Neo4j - Install Neo4j Enterprise Edition
  Abstract: zzh@ZZHPC:~/Downloads/sfw$ sudo dpkg -i neo4j-enterprise_2025.01.0_all.deb (Reading database ... 403875 files and directories currently installed.) Pr…
- Neo4j - Cypher Manual Study Notes 2
  Abstract: Clause composition: The semantics of a whole query is defined by the semantics of its clauses. Each clause has as input the state of the graph and a table…
- Neo4j - Cypher Manual Study Notes 1
  Abstract: Built-in databases in Neo4j: All Neo4j servers contain a built-in database called system, which behaves differently than all other databases. The system…
- Neo4j - Northwind Graph Guide
  Abstract: Load product catalog: Load the product catalog data from external CSV files. Northwind sells food products in a few categories provided by suppliers. Le…
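
A hedged sketch of the guide's first load step driven from Python; the CSV URL follows the pattern Neo4j's built-in browser guide uses, and both it and the connection details should be treated as assumptions:

```python
# Hedged sketch: load the product catalog over HTTP with LOAD CSV.
# URL and credentials are assumptions, not taken from the notes.
from neo4j import GraphDatabase

LOAD_PRODUCTS = """
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/northwind/products.csv' AS row
CREATE (p:Product)
SET p = row, p.unitPrice = toFloat(row.unitPrice)
"""

with GraphDatabase.driver("neo4j://localhost:7687",
                          auth=("neo4j", "password")) as driver:
    driver.execute_query(LOAD_PRODUCTS)
```
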
- Neo4j - Movie Graph Guide
  Abstract: Create nodes and relationships: CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'}) CREATE (Keanu:Person…
- Vector Database - Study Notes 2
  Abstract: zzh@ZZHPC:~$ pip install langchain Collecting langchain Downloading langchain-0.3.18-py3-none-any.whl.metadata (7.8 kB) Collecting langchain-core<1.0.…
- Vector Database - Study Notes 1
  Abstract: zzh@ZZHPC:~$ which pip /home/zzh/venvs/zpy313/bin/pip zzh@ZZHPC:~$ pip install chromadb Collecting chromadb Downloading chromadb-0.6.3-py3-none-any.whl…
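
Once installed, a minimal chromadb round trip looks like this sketch; the collection name and documents are illustrative, and the default embedding function downloads a small model on first use:

```python
# Minimal chromadb sketch: create a collection, add documents, query by
# text. Names and documents are made up for illustration.
import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient for disk
collection = client.create_collection(name="notes")

collection.add(
    ids=["1", "2"],
    documents=["Neo4j stores graphs", "Spark processes big data"],
)

# Embeds the query with the default embedding function and returns the
# nearest documents.
results = collection.query(query_texts=["graph database"], n_results=1)
print(results["documents"])
```
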
- SQLite - Study Notes 3
  Abstract: Analyze and Optimize: sqlite_stat1 is an internal table. It is not in the output of .tables. Suggested Pragmas. Faster inserts, Method 1 (a bit risky): M…
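
A hedged sketch of "faster inserts" pragmas through Python's built-in sqlite3 module; synchronous=OFF trades durability for speed, hence "a bit risky":

```python
# Faster-insert pragmas via the built-in sqlite3 module. synchronous=OFF
# risks corruption on power loss, so only use it for rebuildable data.
import sqlite3

conn = sqlite3.connect("test.db")
conn.execute("PRAGMA journal_mode = WAL")
conn.execute("PRAGMA synchronous = OFF")

conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v TEXT)")
with conn:  # one transaction for the whole batch
    conn.executemany("INSERT INTO t (v) VALUES (?)",
                     [(f"row-{i}",) for i in range(10_000)])
conn.close()
```
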
- SQLite - Study Notes 2
  Abstract: Flexible Typing: Type is at the cell level. SQLite will convert a value to the declared type if it can do so without losing data. 5 data types: Type Affinity Co…
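
A quick demonstration of that cell-level conversion with the built-in sqlite3 module: an integer inserted into a TEXT-declared column is stored as text, which typeof() confirms:

```python
# Demonstrating TEXT affinity: the integer 1 is converted to the text
# '1' on insert; typeof() reports the storage class of each cell.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (x TEXT)")
conn.execute("INSERT INTO demo VALUES (1)")     # stored as the text '1'
conn.execute("INSERT INTO demo VALUES ('abc')")
for value, storage in conn.execute("SELECT x, typeof(x) FROM demo"):
    print(repr(value), storage)   # '1' text, 'abc' text
```
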
- SQLite - Study Notes 1
  Abstract: zzh@ZZHPC:~$ sudo apt install sqlite3 zzh@ZZHPC:~$ sqlite3 SQLite version 3.37.2 2022-01-06 13:25:41 Enter ".help" for usage hints. Connected to a transient in-memory database…
- PySpark - Orchestrating and Scheduling Data Pipelines with Databricks Workflows
  Abstract: In Databricks Community Edition, you cannot use Workflows because it is a premium feature that requires an upgraded subscription. This chapter and the…
- PySpark - Performance Tuning in Delta Lake
  Abstract: from delta import configure_spark_with_delta_pip from pyspark.sql import SparkSession from pyspark.sql.functions import when, rand import timeit build…
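
One representative tuning step, sketched under assumptions (table path, session bootstrap): compacting small files with Delta's OPTIMIZE:

```python
# Hedged sketch: bin-packing compaction of a Delta table's small files.
# The table path is an assumption; the session bootstrap follows the
# standard delta-spark pattern.
from delta import configure_spark_with_delta_pip, DeltaTable
from pyspark.sql import SparkSession

builder = (SparkSession.builder
           .appName("delta-optimize")
           .config("spark.sql.extensions",
                   "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
spark = configure_spark_with_delta_pip(builder).getOrCreate()

table = DeltaTable.forPath(spark, "/tmp/delta/events")  # assumed path
table.optimize().executeCompaction()  # compact small files
```
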
- PySpark - Performance Tuning with Apache Spark
  Abstract: from pyspark.sql import SparkSession # Create a new SparkSession spark = (SparkSession .builder .appName("monitor-spark-ui") .master("spark://ZZHPC:70…
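
The master URL is cut off in the abstract; 7077 is the default port of a Spark standalone master, so a sketch of the session setup under that assumption:

```python
# The truncated master URL presumably ends in 7077, the default port of
# a Spark standalone master; that completion is an assumption.
from pyspark.sql import SparkSession

spark = (SparkSession
         .builder
         .appName("monitor-spark-ui")
         .master("spark://ZZHPC:7077")   # assumed standalone master URL
         .getOrCreate())

# Address of this application's web UI (port 4040 by default).
print(spark.sparkContext.uiWebUrl)
```
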
- PySpark - Processing Streaming Data
  Abstract: from delta import configure_spark_with_delta_pip, DeltaTable from pyspark.sql import SparkSession from pyspark.sql.functions import col, from_json fro…
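
A self-contained sketch of the from_json pattern: parse JSON lines from a socket feed into typed columns. The schema, host, and port are illustrative assumptions:

```python
# Self-contained from_json sketch: JSON lines arrive on a socket and are
# parsed into typed columns. Schema, host, and port are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import (StructType, StructField,
                               StringType, DoubleType)

spark = SparkSession.builder.appName("parse-json-stream").getOrCreate()

schema = StructType([
    StructField("event", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("socket")                 # e.g. fed by `nc -lk 9999`
       .option("host", "localhost")
       .option("port", 9999)
       .load())

parsed = (raw
          .select(from_json(col("value"), schema).alias("data"))
          .select("data.*"))             # -> columns: event, amount

query = parsed.writeStream.outputMode("append").format("console").start()
query.awaitTermination()
```
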
- PySpark - Ingesting Streaming Data
  Abstract: nc -lk 9999 from pyspark.sql import SparkSession from pyspark.sql.functions import explode, split spark = (SparkSession.builder .appName("config-strea…
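
The abstract pairs `nc -lk 9999` with a socket-source stream; here is a complete word-count sketch of that setup, the classic Structured Streaming example:

```python
# Socket word count matching the `nc -lk 9999` feed from the abstract.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("socket-wordcount").getOrCreate()

lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

query = (counts.writeStream
         .outputMode("complete")   # re-emit full counts each trigger
         .format("console")
         .start())
query.awaitTermination()
```
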
- PySpark - Set up a local Spark and Kafka environment
  Abstract: 1. Download Spark 3.4.1 2. Download Java JDK 17 3. Set up a Python 3.11.9 virtual environment .bashrc: sfw=~/Downloads/sfw zpy=~/venvs/zpy311 export JAVA…
- PySpark - Manipulate Data with Delta Lake
  Abstract: from delta import configure_spark_with_delta_pip, DeltaTable from pyspark.sql import SparkSession builder = (SparkSession.builder .appName("create-del…
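
A hedged sketch of the delta-spark bootstrap shown in the abstract, followed by a simple upsert with DeltaTable.merge; the table path and data are illustrative:

```python
# Hedged sketch: standard delta-spark session bootstrap, then an upsert
# via DeltaTable.merge. Path and rows are made up for illustration.
from delta import configure_spark_with_delta_pip, DeltaTable
from pyspark.sql import SparkSession

builder = (SparkSession.builder
           .appName("delta-merge")
           .config("spark.sql.extensions",
                   "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
spark = configure_spark_with_delta_pip(builder).getOrCreate()

target = DeltaTable.forPath(spark, "/tmp/delta/people")  # assumed path
updates = spark.createDataFrame([(1, "Alice")], ["id", "name"])

(target.alias("t")
 .merge(updates.alias("u"), "t.id = u.id")
 .whenMatchedUpdateAll()     # update rows whose id already exists
 .whenNotMatchedInsertAll()  # insert the rest
 .execute())
```
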
- PySpark - Data Transformation and Data Manipulation
  Abstract: # Apply transform function to Numbers column df_transformed = ( df.select("category", "overallMotivation", "year", "laureates", transform(col("laureat…
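
The abstract's transform() call is truncated; a self-contained sketch of the same higher-order pattern on simpler, made-up data:

```python
# Self-contained transform() sketch: apply a lambda to every element of
# an array column. The data here is made up for illustration.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, transform

spark = SparkSession.builder.appName("transform-demo").getOrCreate()
df = spark.createDataFrame([(1, [1, 2, 3])], ["id", "numbers"])

df_transformed = df.select(
    "id",
    transform(col("numbers"), lambda x: x * x).alias("squared"),
)
df_transformed.show()   # id=1 -> squared=[1, 4, 9]
```
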
- Dockerfile - base, spark-base, spark-master, spark-worker, jupyterlab (Spark 3.5.4)
  Abstract: build.sh: #!/bin/bash # # -- Build Apache Spark Standalone Cluster Docker Images # # -- Variables # BUILD_DATE="$(date -u +'%Y-%m-%d')" SPARK_VERSION=…
- PySpark - Data Ingestion and Data Extraction
  Abstract: from pyspark.sql.functions import flatten, collect_list # create a DataFrame with an array of arrays column df = spark.createDataFrame([ (1, [[1, 2],…
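
Completing the idea with made-up data mirroring the truncated snippet: flatten() collapses an array of arrays into one array:

```python
# flatten() collapses an array of arrays into a single array; the data
# mirrors the truncated snippet in the abstract.
from pyspark.sql import SparkSession
from pyspark.sql.functions import flatten

spark = SparkSession.builder.appName("flatten-demo").getOrCreate()
df = spark.createDataFrame(
    [(1, [[1, 2], [3, 4]]), (2, [[5, 6], [7, 8]])],
    ["id", "nested"],
)
df.select("id", flatten("nested").alias("flat")).show()
# id=1 -> flat=[1, 2, 3, 4]
```
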
- Dockerfile - base, spark-base, spark-master, spark-worker, jupyterlab (Spark 3.4.1)
  Abstract: build.sh: #!/bin/bash # # -- Build Apache Spark Standalone Cluster Docker Images # # -- Variables # BUILD_DATE="$(date -u +'%Y-%m-%d')" SPARK_VERSION=…