Data-BigData - Post Category - 郝壹贰叁

[K8S] 01 - What is Kubernetes

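Abstract: Warm-up. I. Basics. Prerequisite: [Docker] 00 - What is Docker? Ref: 马哥 Kubernetes tutorial videos, complete edition (seems to be the better one). Docker --> SWARM (Docker's own native orchestrator), but k8s is a self-contained system that can replace SWARM. Borg (internal system) --> Go-language version

posted @ 2021-01-12 16:35 郝壹贰叁 Views (186) Comments (0) Recommendations (0)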

[K8S] 00 - Why we use KubeFlow?

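Abstract: Why learn k8s? Because Kubeflow requires it. A good tutorial: [尚硅谷] [k8s] the latest and most detailed Kubernetes video tutorial. Preface. History: TensorFlow has supported distributed training since version 0.8, and by now both its high-level and low-level APIs support distributed training well. Meanwhile, Kubernetes and

posted @ 2021-01-12 12:59 郝壹贰叁 Views (318) Comments (0) Recommendations (0)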

[SageMaker] Preparing FSx Input for SageMaker

Abstract: Preparing FSx Input for SageMaker. Download and prepare your training dataset on S3. Follow the steps listed here to create a FSx linked with your S3 b

posted @ 2021-01-11 21:34 郝壹贰叁 Views (86) Comments (0) Recommendations (0)

[SageMaker] Invoked by AWS Lambda

Abstract: Resources: Deploying to TensorFlow Serving Endpoints (of limited reference value). Table of Contents: Deploying from an Estimator; Deploying directly from model artifacts; Making pr

posted @ 2021-01-09 16:05 郝壹贰叁 Views (244) Comments (0) Recommendations (0)

[SageMaker] Built-in algorithms

Abstract: Some resources. Instance types: https://aws.amazon.com/ec2/instance-types/ Elastic Inference: low-cost GPU acceleration. Core steps. I. Built-in docker images: from sagemaker.amazon.amazon_estimator import g

posted @ 2021-01-08 10:06 郝壹贰叁 Views (232) Comments (0) Recommendations (0)

[SageMaker] Custom Docker Containers with SageMaker Debugger

Abstract: Starting a series; this is worth studying and practicing: https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers.html [1] Ref: amazon-sagemaker-examples/advanced_functionality/custom

posted @ 2020-12-17 11:16 郝壹贰叁 Views (138) Comments (0) Recommendations (0)

[SageMaker] Multi-model endpoints

Abstract: Ref: Save on inference costs by using Amazon SageMaker multi-model endpoints. multi_model_bring_your_own multi_model_linear_learner_home_value multi_model_sklearn_home_value multi_model_x

posted @ 2020-12-09 19:44 郝壹贰叁 Views (93) Comments (0) Recommendations (0)

[SageMaker] Computer Vision & Large Scale Training ***

Abstract: SageMaker Fridays Season 2, Episode 6 - Computer vision & large scale training (November 2020). With image data, and training from scratch, the value of distributed ML training becomes apparent. This proj

posted @ 2020-12-06 21:06 郝壹贰叁 Views (166) Comments (0) Recommendations (0)

[SageMaker] Data Science on AWS by SageMaker

Abstract: Chapter 1. Automated Machine Learning. Warm-up example. I. What it is: Amazon SageMaker Autopilot. Amazon SageMaker Autopilot automatically trains and tunes the best machine

posted @ 2020-12-04 17:52 郝壹贰叁 Views (191) Comments (0) Recommendations (0)

[SageMaker] DNN with Amazon SageMaker

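Abstract: How the approach evolved. A course on training image classifiers: https://www.udemy.com/course/practical-aws-sagemaker-6-real-world-case-studies/ I. Keras: a traditional example. Build and train: import tensorflow as tf from tensorf

posted @ 2020-11-16 15:16 郝壹贰叁 Views (214) Comments (0) Recommendations (0)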

[Ray] 00 - Easy Distributed Computing

Abstract: Ref: Easy Distributed Computing with Ray + Python. Ref: https://github.com/ray-project/ray (GitHub home page). Ray provides a simple, universal API for building d

posted @ 2020-11-16 10:51 郝壹贰叁 Views (208) Comments (0) Recommendations (0)

[Hadoop] Zookeeper

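Abstract: Ref: Big-data ZooKeeper in-depth video course. Ref: ZooKeeper internals explained. Purpose: many systems are built on ZooKeeper, so it is worth understanding systematically. Preface. I. What it does. Hive optimization -> MapReduce optimization; MySQL optimization --> SQL statement optimization. What is ZooKeeper? A coordination service for the various components.

posted @ 2019-12-24 18:10 郝壹贰叁 Views (527) Comments (0) Recommendations (0)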

[Golang] What is Golang

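Abstract: I. Implementing a distributed database in Golang. Link: https://www.zhihu.com/question/36947537/answer/69892403 Update: the original question also asked which open-source projects one could join for practice; I don't know much about that, so please see the other answers if needed. 1. Related courses. Ref: Distribut

posted @ 2019-12-12 08:21 郝壹贰叁 Views (162) Comments (0) Recommendations (0)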

[CDH] Process data: integrate Spark with Spring Boot

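Abstract: Spark data processing. I. Spark online computing: raw data arriving from Kafka receives some "basic processing" and is then stored in Redis. Simple statistics over the Kafka stream are written to Redis. III. Spark offline computing: since it is "offline", the data can come from HBase. Simple statistics then surface useful information, such as where to site a "virtual station".

posted @ 2019-12-09 17:41 郝壹贰叁 Views (445) Comments (0) Recommendations (0)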
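The online path in this abstract (count what arrives on the stream, then write the totals to a key-value store) can be sketched in plain Python. This is only an illustration under stated assumptions, not the post's code: a list of lists stands in for Kafka micro-batches, a `dict` stands in for Redis, and the message layout (`key,timestamp`) and the helper names `count_batch` and `write_counts` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_batch(messages: Iterable[str]) -> Counter:
    """Count messages per key, where the key is the first comma-separated field.

    The "key,timestamp" layout is a hypothetical message format.
    """
    return Counter(msg.split(",")[0] for msg in messages if msg)


def write_counts(store: Dict[str, int], counts: Counter) -> None:
    """Add one batch's counts to the running totals (dict stands in for Redis)."""
    for key, n in counts.items():
        store[key] = store.get(key, 0) + n


# Two hypothetical micro-batches, as a streaming job would receive them.
batches = [
    ["s1,2019-12-09T10:00", "s2,2019-12-09T10:00", "s1,2019-12-09T10:01"],
    ["s2,2019-12-09T10:02", "s3,2019-12-09T10:02"],
]

store: Dict[str, int] = {}
for batch in batches:
    write_counts(store, count_batch(batch))

print(store)  # {'s1': 2, 's2': 2, 's3': 1}
```

In a real deployment the loop would be driven by the streaming engine and the dictionary update replaced by an atomic increment in the store, so that concurrent writers do not overwrite each other's totals.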

[CDH] Redis: Remote Dictionary Server

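Abstract: Basic concepts. I. Installation. Redis (Remote Dictionary Server) is a networked, log-structured key-value database written in ANSI C that can run in memory or persist to disk, and it provides APIs for many languages. Other client interfaces: https://redis.io/clients Source code download: http

posted @ 2019-12-08 13:47 郝壹贰叁 Views (422) Comments (0) Recommendations (0)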

[CDH] Acquire data: Flume and Kafka

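Abstract: Flume basic concepts. I. What it is. Ref: http://flume.apache.org/ Data acquisition tools: Flume, Google Refine, Needlebase, ScraperWiki, BloomReach. Open-source logging systems include Facebook's Scribe, Apache's Chukwa, Lin

posted @ 2019-12-07 19:57 郝壹贰叁 Views (353) Comments (0) Recommendations (0)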

[CDH] New project for ML pipeline

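Abstract: Starting the background services: [CDH] Cloudera's Distribution including Apache Hadoop. Only the basic workflow is covered here; the details still require hands-on work with the code. I. Development environment setup. JDK installation. Ref: Two ways to install JDK 8 on Ubuntu. Then, Project Structure --

posted @ 2019-12-01 12:46 郝壹贰叁 Views (170) Comments (0) Recommendations (0)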

[Flink] 01 - Apache Flink: Stateful Computations over Data Streams

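Abstract: Ref: Apache Flink®: Stateful Computations over Data Streams. Ref: https://www.jianshu.com/p/01bb84c19723 One solution is to load data more frequently and so achieve near-real-time updates. Weekly loads can be raised to daily, daily

posted @ 2019-11-20 20:39 郝壹贰叁 Views (222) Comments (0) Recommendations (0)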

[ML] Machine Learning in the Common Infrastructure ecosystem

Abstract: I. CogNet architecture. The figure below shows the role of Kafka. Partial code: Machine Learning in the Common Infrastructure ecosystem. Release doc: http://www.cognet.5g-ppp.eu/wp-content/up

posted @ 2019-11-12 21:34 郝壹贰叁 Views (172) Comments (0) Recommendations (0)

[ML] LIBSVM Data: Classification, Regression, and Multi-label

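Abstract: Dataset download: LIBSVM Data: Classification, Regression, and Multi-label. I. Parameters of machine learning models. The input format that a model requires is, for some models, LabeledPoint. Official example: https://spark.apache.org/docs/2.4.4/mllib-

posted @ 2019-11-09 21:32 郝壹贰叁 Views (515) Comments (0) Recommendations (0)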
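For readers unfamiliar with the LIBSVM text format behind these datasets, each line is `label index:value index:value ...` with 1-based feature indices. The sketch below parses one such line into a label plus a sparse feature map, which is roughly the shape a labeled point carries. It is a minimal illustration in plain Python, not Spark's loader; the function name `parse_libsvm_line` is hypothetical, and shifting indices to 0-based is a choice made here to match how sparse vectors are usually indexed.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one LIBSVM-format line into (label, {feature_index: value}).

    Input indices are 1-based; the returned indices are shifted to 0-based.
    """
    parts = line.strip().split()
    label = float(parts[0])
    features: Dict[int, float] = {}
    for item in parts[1:]:
        index, value = item.split(":", 1)
        features[int(index) - 1] = float(value)
    return label, features


label, features = parse_libsvm_line("1 3:0.5 7:1.25")
print(label, features)  # 1.0 {2: 0.5, 6: 1.25}
```

Only the non-zero features appear on a line, so a sparse representation such as the dictionary above avoids materializing the zeros.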

Machine learning runs deep

We all have two lives. The second one starts when we realize that we only have one. --- Tom Hiddleston
