lakeFS: Git-like object storage for files
lakeFS is written in Go and brings Git-like version control (branches, commits, merges) to object storage.
Features
It offers Git-like capabilities for data, makes it easy to build CI/CD for data, and integrates readily with an existing data stack.
Usage
Running with docker-compose
- docker-compose file
version: '3'
services:
  lakefs:
    image: "treeverse/lakefs:${VERSION:-latest}"
    ports:
      - "8000:8000"
    depends_on:
      - "postgres"
    environment:
      - LAKEFS_AUTH_ENCRYPT_SECRET_KEY=${LAKEFS_AUTH_ENCRYPT_SECRET_KEY:-some random secret string}
      - LAKEFS_DATABASE_CONNECTION_STRING=${LAKEFS_DATABASE_CONNECTION_STRING:-postgres://lakefs:lakefs@postgres/postgres?sslmode=disable}
      - LAKEFS_BLOCKSTORE_TYPE=${LAKEFS_BLOCKSTORE_TYPE:-local}
      - LAKEFS_BLOCKSTORE_LOCAL_PATH=${LAKEFS_BLOCKSTORE_LOCAL_PATH:-/home/lakefs}
      - LAKEFS_GATEWAYS_S3_DOMAIN_NAME=${LAKEFS_GATEWAYS_S3_DOMAIN_NAME:-s3.local.lakefs.io:8000}
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID:-}
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_SECRET_KEY=${AWS_SECRET_ACCESS_KEY:-}
      - LAKEFS_LOGGING_LEVEL=${LAKEFS_LOGGING_LEVEL:-INFO}
      - LAKEFS_STATS_ENABLED
      - LAKEFS_BLOCKSTORE_S3_ENDPOINT
      - LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE
      - LAKEFS_COMMITTED_LOCAL_CACHE_DIR=${LAKEFS_COMMITTED_LOCAL_CACHE_DIR:-/home/lakefs/.local_tier}
    entrypoint: ["/app/wait-for", "postgres:5432", "--", "/app/lakefs", "run"]
  postgres:
    image: "postgres:${PG_VERSION:-12}"
    command: "-c log_min_messages=FATAL"
    environment:
      POSTGRES_USER: lakefs
      POSTGRES_PASSWORD: lakefs
- Start
docker-compose up -d
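Once the containers are started, it can be handy to confirm that lakeFS is accepting connections on the published port before moving on. The following is a minimal Python sketch, assuming lakeFS is reachable on port 8000 of the local machine as mapped in the compose file. `wait_for_port` is a hypothetical helper written for this post, not part of lakeFS.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that the port is open;
            # it does not prove that lakeFS has been initialized.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8000, timeout=60)` would return True as soon as the published port accepts a connection, and False if nothing is listening after roughly a minute.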
- Result
On first use, lakeFS must be initialized with the lakefs CLI (run the command inside the container)
lakefs init --user-name dalongdemo
[Figure: reference result]
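- Working with data
Uploading data to lakeFS works the same way as ordinary S3 operations; a separate CLI program (lakectl) is provided only to make Git-like management more convenient. lakeFS can also use S3 as its underlying storage.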
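To illustrate the S3-style access described above: through the lakeFS S3 gateway, the repository name takes the place of the bucket and the object key is prefixed with the branch name. The Python sketch below shows that mapping and an upload via boto3 (a third-party package). The function names, the repository and branch values, and the use of `http://s3.local.lakefs.io:8000` as the endpoint (taken from `LAKEFS_GATEWAYS_S3_DOMAIN_NAME` in the compose file) are assumptions for illustration, not something the original setup prescribes.

```python
from typing import Tuple


def lakefs_s3_location(repository: str, branch: str, path: str) -> Tuple[str, str]:
    """Map a lakeFS (repository, branch, path) to an S3 (bucket, key) pair."""
    if not repository or not branch:
        raise ValueError("repository and branch are required")
    # The bucket is the repository; the key starts with the branch name.
    return repository, f"{branch}/{path.lstrip('/')}"


def upload(local_file: str, repository: str, branch: str, path: str,
           endpoint: str, access_key: str, secret_key: str) -> None:
    """Upload a local file to a lakeFS branch through the S3 gateway."""
    import boto3  # imported lazily so the helper above has no dependencies

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint,  # assumption: e.g. http://s3.local.lakefs.io:8000
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    bucket, key = lakefs_s3_location(repository, branch, path)
    s3.upload_file(local_file, bucket, key)
```

With a hypothetical repository `example-repo` and branch `main`, `lakefs_s3_location("example-repo", "main", "data/a.csv")` yields the bucket `example-repo` and the key `main/data/a.csv`. The uploaded object stays uncommitted on that branch until a commit is made.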
Reference configuration (MinIO as the S3-compatible backing store)
blockstore:
  type: s3
  s3:
    force_path_style: true
    endpoint: http://<minio_endpoint>:9000
    credentials:
      access_key_id: <minio_access_key>
      secret_access_key: <minio_secret_key>
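The same settings can be passed as environment variables, which is how the compose file above configures lakeFS: each configuration key is prefixed with `LAKEFS_`, uppercased, and its nesting levels are joined with underscores. The Python sketch below applies that naming rule to the configuration shown here. `config_to_env` and `EXAMPLE_CONFIG` are hypothetical names, and the placeholder values are kept as-is. Note that the compose file spells the secret variable `..._ACCESS_SECRET_KEY` while this configuration uses `secret_access_key`; which spelling a given lakeFS version accepts should be checked against its configuration reference.

```python
def config_to_env(config: dict, prefix: str = "LAKEFS") -> dict:
    """Flatten a nested lakeFS config dict into environment variable names."""
    env = {}
    for key, value in config.items():
        name = f"{prefix}_{key.upper()}"
        if isinstance(value, dict):
            # Nested sections contribute another underscore-separated level.
            env.update(config_to_env(value, name))
        elif isinstance(value, bool):
            env[name] = str(value).lower()  # render booleans as true/false
        else:
            env[name] = str(value)
    return env


# The MinIO configuration from this section, with placeholders left untouched.
EXAMPLE_CONFIG = {
    "blockstore": {
        "type": "s3",
        "s3": {
            "force_path_style": True,
            "endpoint": "http://<minio_endpoint>:9000",
            "credentials": {
                "access_key_id": "<minio_access_key>",
                "secret_access_key": "<minio_secret_key>",
            },
        },
    }
}

if __name__ == "__main__":
    for name, value in config_to_env(EXAMPLE_CONFIG).items():
        print(f"{name}={value}")
```

The resulting names, such as `LAKEFS_BLOCKSTORE_S3_ENDPOINT` and `LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE`, line up with the variables that the compose file passes through to the lakefs container.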
[Figure: reference architecture]
Notes
There are quite a few tools that manage data in a Git-like way; Dolt (DoltHub) and Nessie are both very good.
References
https://github.com/treeverse/lakeFS
https://lakefs.io/
https://docs.lakefs.io/reference/configuration.html#example-minio
https://docs.lakefs.io/reference/configuration.html#using-environment-variables
https://projectnessie.org/
https://github.com/projectnessie/nessie
https://www.dremio.com/introducing-project-nessie