-
Spark for ML - Study Notes 5
-
Spark for ML - Study Notes 4
Summary: import pandas as pd # Sample data data = { "X1": [1, 2, 3, 4, 5], "X2": [2, 4, 6, 8, 10], # X2 is twice X1, perfectly collinear "X3": [5, 3, 4, 2, 1] } df = pd.DataFrame(data)
-
VSCode - How to stop terminal from inheriting virtual environments?
-
Spark for ML - Study Notes 3
Summary: from pyspark.sql import SparkSession from pyspark.ml.feature import HashingTF, IDF, Tokenizer spark = SparkSession.builder.appName("TF-IDF Example").g
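To make the Tokenizer → HashingTF → IDF pipeline less of a black box, here is a plain-Python sketch of the same arithmetic. It assumes the smoothed IDF formula documented for Spark MLlib, idf(t) = log((m + 1) / (df(t) + 1)) with m the number of documents, and raw term counts as TF. Unlike HashingTF, it keeps terms as strings instead of hashed indices, so there are no hash collisions. `tf_idf` and the sample sentences are illustrative.

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF with raw term counts and Spark-style smoothed IDF."""
    # Like Spark's Tokenizer: lowercase, then split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    m = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count a term once per doc
    idf = {t: math.log((m + 1) / (n + 1)) for t, n in df.items()}
    # Term frequency (raw count) times IDF, per document.
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]

docs = ["Spark is fast", "Spark is scalable", "Pandas is handy"]
scores = tf_idf(docs)
for doc, s in zip(docs, scores):
    print(doc, {t: round(v, 3) for t, v in s.items()})
```

A term that appears in every document ("is") gets idf = log(4/4) = 0, so it carries no weight, which is the point of the IDF factor.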
-
Spark for ML - Study Notes 2
Summary: from pyspark.sql import SparkSession spark = SparkSession.builder.appName("HDFS Read Example").getOrCreate() # Define the HDFS path hdfs_path = "hdfs:
-
Spark for ML - Study Notes 1
-
Singular Value Decomposition (SVD)
Summary: Singular Value Decomposition (SVD) is a powerful mathematical technique used in linear algebra to factorize a matrix into three simpler matrices. It i
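SVD writes a matrix as A = U Σ Vᵀ. For a full decomposition a library routine such as `numpy.linalg.svd` is the practical choice. To show what the largest singular value means, here is a dependency-free sketch that estimates it by power iteration on AᵀA: the iteration converges to the top right singular vector v, and σ₁ = ‖Av‖. It assumes the starting vector is not orthogonal to that singular vector; `top_singular_value` is an illustrative name, not a library function.

```python
import math

def top_singular_value(a, iters=100):
    """Estimate the largest singular value of matrix a (a list of rows)."""
    rows, n = len(a), len(a[0])
    v = [1.0] * n  # starting guess for the top right singular vector
    for _ in range(iters):
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(rows)]     # A v
        atav = [sum(a[i][j] * av[i] for i in range(rows)) for j in range(n)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in atav))
        if norm == 0.0:
            return 0.0, v  # A v = 0: zero matrix or unlucky start vector
        v = [x / norm for x in atav]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in av)), v  # sigma_1 = ||A v||

sigma, v = top_singular_value([[3, 0], [0, 1], [0, 0]])
print(round(sigma, 6), [round(x, 6) for x in v])
```

Convergence speed depends on the gap between σ₁ and σ₂; a real SVD routine also returns all the other singular values and vectors.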
-
Pearson or Spearman correlation
Summary: Pearson vs. Spearman Correlation: When to Use Each? Both Pearson and Spearman correlations measure the relationship between two variables, but they ar
-
Spark
Summary: 1. Download: https://spark.apache.org/downloads.html 2. Install: (base) zzh@ZZHPC:~/Downloads/sfw$ tar -xvzf spark-3.5.4-bin-hadoop3.tgz 3. Set enviro
-
Airflow - Study Notes 6
Summary: 1. First, we will set up the imports that are required for the dashboard view: from __future__ import annotations from typing import TYPE_CHECKING fro
-
Airflow - AirflowTimetableInvalid: Exactly 5, 6 or 7 columns has to be specified for iterator expression.
Summary: from airflow.decorators import ( dag, task, ) from pendulum import datetime @dag( schedule="@none", start_date=datetime(2025, 1, 1), catchup=False, de
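The error in this title comes from cron parsing: a schedule string that Airflow does not recognize as a preset is presumably handed to a cron parser, which rejects anything without 5, 6 or 7 space-separated fields, and `"@none"` is a single field. The sketch below is a hypothetical, simplified approximation of that dispatch (`check_schedule` and its preset table are written for illustration, not taken from Airflow's source). It suggests that `schedule=None`, not the string `"@none"`, is the way to declare a DAG without a schedule.

```python
# Hypothetical approximation of schedule handling; NOT Airflow's implementation.
CRON_PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@quarterly": "0 0 1 */3 *",
    "@yearly": "0 0 1 1 *",
}
SPECIAL_SCHEDULES = {"@once", "@continuous"}

def check_schedule(schedule):
    """Describe how a schedule value would be treated, or raise like a cron parser."""
    if schedule is None:
        return "no schedule: the DAG only runs when triggered"
    if schedule in SPECIAL_SCHEDULES:
        return f"special schedule {schedule}"
    # Any other string is treated as a cron expression (after preset expansion).
    expression = CRON_PRESETS.get(schedule, schedule)
    if len(expression.split()) not in (5, 6, 7):
        raise ValueError("Exactly 5, 6 or 7 columns has to be specified for iterator expression.")
    return f"cron expression {expression!r}"

for value in (None, "@daily", "*/15 * * * *", "@none"):
    try:
        print(repr(value), "->", check_schedule(value))
    except ValueError as exc:
        print(repr(value), "-> error:", exc)
```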
-
Airflow - Study Notes 5
Summary: To create a new connection, select the + to add a new record. from airflow.decorators import ( dag, task, ) from pendulum import datetime @dag( start_
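Besides the + button in the UI, Airflow can also read connections from environment variables named `AIRFLOW_CONN_{CONN_ID}` whose value is a connection URI of the form `conn-type://login:password@host:port/schema`. A minimal sketch of building one, assuming a simple host/port/schema connection; `airflow_conn_env` and the sample values are hypothetical.

```python
from urllib.parse import quote

def airflow_conn_env(conn_id, conn_type, login, password, host, port, schema=""):
    """Return (env var name, connection URI) for an Airflow connection."""
    # Percent-encode each component so characters such as '@', ':' or '/'
    # inside a password cannot break the URI structure.
    uri = (
        f"{conn_type}://{quote(login, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(schema, safe='')}"
    )
    return f"AIRFLOW_CONN_{conn_id.upper()}", uri

name, uri = airflow_conn_env("my_postgres", "postgres", "frank", "p@ss:1", "localhost", 5432, "airflow")
print(f"export {name}='{uri}'")
```

Connections defined this way do not show up in the UI's connection list, so the UI route remains handier for interactive work.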
-
Airflow - Study Notes 4
Summary: To retrieve these images, I frequently make use of the NASA Astronomy Picture of the Day API (https://apod.nasa.gov/apod/astropix.html) to gather a ne
-
Airflow - Study Notes 3
Summary: (.venv) frank@ZZHUBT:~/venvs/my_airflow_project$ airflow config get-value core executor SequentialExecutor
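`airflow config get-value core executor` reads the `executor` option of the `[core]` section. Airflow also lets any option be overridden by an environment variable named `AIRFLOW__{SECTION}__{KEY}` (note the double underscores). The helper below, a hypothetical illustration, derives that name and shows whether it is set in the current shell.

```python
import os

def airflow_env_var(section, key):
    """Env var name that overrides [section] key from airflow.cfg."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"

name = airflow_env_var("core", "executor")
print(name)  # AIRFLOW__CORE__EXECUTOR
print(os.environ.get(name, "<not set: airflow.cfg or the default applies>"))
```

For example, `export AIRFLOW__CORE__EXECUTOR=LocalExecutor` before starting Airflow would replace the `SequentialExecutor` shown above.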
-
Airflow - Study Notes 2
Summary: frank@ZZHUBT:~$ pip install airflowctl frank@ZZHUBT:~$ airflowctl init my_airflow_project --build-start ...... webserver | [2025-01-18 20:46:08 +0800] [1
-
Airflow - Study Notes 1
Summary: Apache Airflow is known within the data engineering community as the go-to open source platform for “developing, scheduling, and monitoring batch-orie
-
PySpark - Study Notes 2
Summary: For the purpose of this book, we use the Docker version of PySpark, running on a single machine. If you have a version of PySpark installed on a distr
-
PySpark - Study Notes 1
Summary: frank@ZZHUBT:~$ docker pull jupyter/pyspark-notebook docker run --name pyspark-notebook -p 8888:8888 -v ~/dkvols/pyspark-notebook/:/home/jovyan/work/
-
DuckDB - Study Notes 11
Summary: (zpy310) frank@ZZHUBT:~$ pip install duckdb (zpy310) frank@ZZHUBT:~$ pip install harlequin ...... Successfully installed MarkupSafe-3.0.2 click-8.1.8
-
DuckDB - Study Notes 10
Summary: import duckdb records = duckdb.read_csv("data/C11/Pedestrian_Counting_System_Monthly_counts_per_hour_may_2009_to_14_dec_2022.csv") records.show(max_wi
-
DuckDB - Study Notes 9
Summary: In this section, we are going to explore some of the DuckDB niceties – little shortcuts and tweaks to make your use of DuckDB easier. D CREATE OR REPL
-
Virtualbox - Virtualized CPU missing features AVX, AVX2 and FMA
Summary: (duckdb_book) frank@ZZHUBT:~$ python Python 3.13.1 (main, Jan 15 2025, 18:12:47) [GCC 11.4.0] on linux Type "help", "copyright", "credits" or "license
-
DuckDB - Study Notes 8
Summary: Requirements: duckdb emoji ibis-framework ibis-framework[duckdb] jupysql jupyter-lsp jupyterlab pandas plotly polars-lts-cpu pyarrow sqlparse pi_relat
-
VSCode - Python formatter doesn't work
Summary: I installed the VSCode extension Black Formatter, but it didn't seem to work. Later, I found that it only works when there are no syntax errors, and it can't forma
-
DuckDB - Study Notes 7
Summary: import duckdb duckdb.sql("SELECT 'duck' AS animal, 'quack!' AS greeting") ┌─────────┬──────────┐ │ animal │ greeting │ │ varchar │ varchar │ ├────────
-
Pip - Installing plotly stuck
Summary: Installing plotly got stuck in a Python venv virtual environment in WSL2 Ubuntu. pip install -i https://pypi.org/simple package_name pip install -i https:/
-
DuckDB - Study Notes 6
Summary: DuckDB’s nested data types: LIST, MAP, and STRUCT. D SELECT [7,8,9] AS list_int; ┌───────────┐ │ list_int │ │ int32[] │ ├───────────┤ │ [7, 8, 9] │ └─
-
DuckDB - Study Notes 5
Summary: D SELECT * FROM duckdb_extensions(); ┌──────────────────┬─────────┬───────────┬──────────────┬───┬───────────────────┬───────────────────┬────────────
-
Virtualbox - NAT Port Forwarding
Summary: Can a VM communicate with the host in NAT mode? No, in NAT mode in VirtualBox, the virtual machine (VM) cannot directly communicate with the host unless
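In NAT mode, inbound access from the host to the guest goes through port-forwarding rules. As a sketch, this builds (but does not run) the `VBoxManage modifyvm ... --natpf1` command that forwards host port 2222 to guest port 22 for SSH. The rule format is `name,protocol,host-ip,host-port,guest-ip,guest-port`, with empty IPs meaning any address. The VM name `ubuntu-vm`, the rule name `guestssh` and the helper `natpf_command` are placeholders. `modifyvm` expects the VM to be powered off; for a running VM, `VBoxManage controlvm` accepts the same `natpf1` rule.

```python
import shlex

def natpf_command(vm_name, rule_name, protocol, host_port, guest_port,
                  host_ip="", guest_ip="", adapter=1):
    """Build the VBoxManage argv that adds a NAT port-forwarding rule."""
    if protocol not in ("tcp", "udp"):
        raise ValueError("protocol must be 'tcp' or 'udp'")
    # Empty host_ip / guest_ip leave the field blank, i.e. any address.
    rule = f"{rule_name},{protocol},{host_ip},{host_port},{guest_ip},{guest_port}"
    return ["VBoxManage", "modifyvm", vm_name, f"--natpf{adapter}", rule]

cmd = natpf_command("ubuntu-vm", "guestssh", "tcp", 2222, 22)
print(shlex.join(cmd))
# With VirtualBox installed: subprocess.run(cmd, check=True)
```

Once the rule is active, `ssh -p 2222 user@127.0.0.1` on the host should reach the guest's SSH server.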
-
Ubuntu - How to update snap store
Summary: frank@ZZHUBT:~$ sudo snap refresh snap-store error: cannot refresh "snap-store": snap "snap-store" has running apps (snap-store), pids: 2979 frank@ZZH
-
Star Schema and Snowflake schema
Summary: The Star Schema and Snowflake Schema are two types of dimensional models used in data warehousing to organize data for analytical queries. Both schema
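A star schema keeps one fact table of measures whose foreign keys point at denormalized dimension tables. The sketch uses Python's built-in `sqlite3` purely as a convenient SQL engine; the table names and rows are made-up illustrations. In a snowflake schema, `category` would move out of `dim_product` into its own `dim_category` table, so the same report would need one more join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables: descriptive attributes, denormalized in a star schema.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Fact table: numeric measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_date VALUES (20250101, 2025, 1), (20250201, 2025, 2);
INSERT INTO fact_sales VALUES (1, 20250101, 1200.0), (2, 20250101, 300.0), (1, 20250201, 800.0);
""")

# Typical star query: join the fact table to its dimensions, then aggregate.
rows = con.execute("""
    SELECT p.category, d.month, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(rows)
```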
-
DuckDB - Study Notes 4
Summary: Block range index (BRIN), adaptive radix tree (ART). To download the necessary dataset for this project, please follow these instructions: 1. Go to https:
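As a rough illustration of the block range idea (not DuckDB's or PostgreSQL's implementation), the sketch below stores only the min and max of each block of rows and skips every block whose range cannot overlap the query. DuckDB applies a similar min/max ("zonemap") index to its row groups automatically, while ART indexes back primary keys, unique constraints and `CREATE INDEX`. The function names and the data are illustrative.

```python
def build_block_ranges(values, block_size):
    """Summarize each block of consecutive rows by (start, min, max)."""
    ranges = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        ranges.append((start, min(block), max(block)))
    return ranges

def range_query(values, ranges, block_size, lo, hi):
    """Return values in [lo, hi], reading only blocks whose range overlaps it."""
    hits, scanned = [], 0
    for start, bmin, bmax in ranges:
        if bmax < lo or bmin > hi:
            continue  # the whole block is skipped without touching its rows
        scanned += 1
        hits.extend(v for v in values[start:start + block_size] if lo <= v <= hi)
    return hits, scanned

# Naturally ordered data (e.g. timestamps) is where block ranges prune best.
values = list(range(0, 1000, 5))
ranges = build_block_ranges(values, block_size=20)
hits, scanned = range_query(values, ranges, 20, 250, 320)
print(len(hits), "matches, scanned", scanned, "of", len(ranges), "blocks")
```

If the same values were shuffled, every block's min/max would span almost the whole domain and nothing could be skipped.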
-
DuckDB - Study Notes 3
Summary: Data Wrangling CREATE OR REPLACE TABLE web_log_text (raw_text VARCHAR); COPY web_log_text FROM 'access.log' (DELIM ''); SELECT regexp_extract(raw_text
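`regexp_extract` pulls a capture group out of each raw log line. A pattern like that is easy to prototype in Python before moving it into SQL. The sketch assumes the log uses the Apache Common Log Format (the actual format of `access.log` is not shown), and the sample line is the standard example from the Apache documentation.

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return the named fields of one log line, or None if it doesn't match."""
    m = CLF.match(line)
    return m.groupdict() if m else None

line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_line(line)
print(record["method"], record["path"], record["status"])
```

Lines that return `None` are the ones worth inspecting before running the SQL version over the whole file.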
-
DuckDB - Study Notes 2
Summary: Parameters The parameters listed below are used in the read_csv function to configure the CSV Rejects Table. Name | Description | Type | Default store_rejects
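The idea behind `store_rejects` is to keep loading good rows and record the bad ones, with their line number and the reason, instead of aborting the whole read. Here is a standard-library analogue of that idea, not DuckDB's implementation; `read_with_rejects` and the sample text are hypothetical, and the only check is the column count.

```python
import csv
import io

def read_with_rejects(text, expected_columns):
    """Split CSV rows into accepted dicts and (line_number, row, error) rejects."""
    accepted, rejects = [], []
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    for row in reader:
        if len(row) != expected_columns:
            # reader.line_num is the number of source lines read so far.
            rejects.append((reader.line_num, row,
                            f"expected {expected_columns} columns, got {len(row)}"))
        else:
            accepted.append(dict(zip(header, row)))
    return accepted, rejects

sample = "id,name\n1,duck\n2,goose,extra\n3,swan\n"
good, bad = read_with_rejects(sample, expected_columns=2)
print(good)
print(bad)
```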
-
VSCode - Change default terminal from Powershell to WSL shell
Summary: To change the default terminal in Visual Studio Code (VSCode) to the WSL (Windows Subsystem for Linux) shell instead of PowerShell, follow these steps
-
VSCode - Can't open in WSL2 Ubuntu Desktop
Summary: frank@ZZHPC:~$ code --version To use Visual Studio Code with the Windows Subsystem for Linux, please install Visual Studio Code in Windows and uninsta
-
DuckDB - Study Notes 1
Summary: https://duckdb.org/docs/installation Windows: PS C:\Users\ZhangZhihui> winget install DuckDB.cli Found DuckDB CLI [DuckDB.cli] Version 1.1.3 This application is licensed to you by its owner.
-
Flameshot - Install on Windows
Summary: PS C:\Users\ZhangZhihui> winget install flameshot The “msstore” source requires that you view the following agreements before use. Terms of Transaction: https://aka.ms/microsoft-store-terms-of-transaction The source requ
-
WSL2 Ubuntu has no software center
Summary: frank@ZZHPC:~$ sudo apt list gnome-software [sudo] password for frank: Listing... Done gnome-software/noble 46.0-1ubuntu2 amd64 frank@ZZHPC:~$ frank@Z
-
WSL2 Ubuntu Desktop favorites bar not showing
Summary: The settings shown by clicking the settings icon: frank@ZZHPC:~$ sudo apt install gnome-shell-extension-dashtodock Reading package lists... Done Buil
-
WSL2 - Install Ubuntu Desktop and XRDP
Summary: Step 1 – Installing Ubuntu Desktop sudo apt update sudo apt upgrade -y sudo apt install ubuntu-desktop -y During the GNOME installation, “waiting for automatic snapd rest
-
WSL2 - Install an X Server and set DISPLAY
Summary: Since WSL2 doesn’t natively support GUI apps, you need an X server to render the GUI on Windows. Install VcXsrv: Download and install VcXsrv from here
-
WSL2 Installation
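Summary: C:\Program Files\WSL ships with the system and is unrelated to whether the WSL feature below is enabled. Do not delete or move it; otherwise the WSL Service system service will not run and the wsl command will not work either. Enable the WSL feature: Install: PS C:\Users\ZhangZhihui> wsl -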
-
VirtualBox - Create a Ubuntu Virtual Machine on Windows
Summary: The Paravirtualization Interface setting in VirtualBox determines how the guest operating system interacts with the underlying host system for perform
-
Adobe Reader - How to move bookmarks panel to the left side?
Summary: With the latest update to Adobe Acrobat, the bookmarks side panel that used to live on the left side of the screen has moved to the right. The only wa
-
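How to skip Microsoft account registration during Windows setup on a new computer
Summary: 1. Disconnect from the network. 2. Press Shift+F10 to open a command prompt window. 3. Type 'oobe\bypassnro' and press Enter; the computer restarts automatically. 4. After the restart, the network selection screen shows an “I don't have an internet connection available” option; click it to continue setup without registering a Microsoft account.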
-
VirtualBox - Permission denied when accessing the shared folder
Summary: zzh@ZZHUBT:~$ cd /zzhwin bash: cd: /zzhwin: Permission denied zzh@ZZHUBT:~$ sudo su root@ZZHUBT:/home/zzh# cd /zzhwin root@ZZHUBT:/zzhwin# ls -al tota