[GPU] Install H2O.ai

1. Introduction

Homepage: https://www.h2o.ai/products/h2o4gpu/

Installing the GPU version: h2oai/h2o4gpu
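After installation, it helps to confirm that the Python package is actually visible to the interpreter before running any benchmark. A minimal sketch using only the standard library, assuming the package's pip distribution name is `h2o4gpu` (the helper name `installed_version` is hypothetical):

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the wheel is published under the distribution name "h2o4gpu".
    version = installed_version("h2o4gpu")
    print("h2o4gpu not installed" if version is None else "h2o4gpu " + version)
```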


With a GPU, can we beat the results of the experiments in the post linked below?

[ML] LIBSVM Data: Classification, Regression, and Multi-label
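The datasets on the LIBSVM page, including YearPredictionMSD used in the test section, are stored in the LIBSVM/svmlight text format: each line is a target value followed by sparse `index:value` pairs with 1-based indices. A minimal sketch of a parser for one such line (the function name is hypothetical; in practice scikit-learn's `load_svmlight_file` does this for the whole file):

```python
from typing import List, Tuple


def parse_libsvm_line(line: str, n_features: int) -> Tuple[float, List[float]]:
    """Parse 'label index:value index:value ...' into (label, dense feature row).

    Indices are 1-based in the LIBSVM format; indices that do not appear are zeros.
    """
    content = line.split("#", 1)[0]  # drop a trailing comment, if any
    parts = content.split()
    label = float(parts[0])
    row = [0.0] * n_features
    for item in parts[1:]:
        index, value = item.split(":")
        row[int(index) - 1] = float(value)  # shift to 0-based position
    return label, row
```

For example, a made-up line `"2001 1:49.94 3:21.47"` with three features parses to the label `2001.0` and the row `[49.94, 0.0, 21.47]`.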


Solver Classes

Among others, the solver can be used for the following classes of problems:

    • GLM: Lasso, Ridge Regression, Logistic Regression, Elastic Net Regularization
    • KMeans
    • Gradient Boosting Machine (GBM) via XGBoost
    • Singular Value Decomposition (SVD) + Truncated Singular Value Decomposition
    • Principal Component Analysis (PCA)
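For the GLM entries, scikit-learn's `ElasticNet` minimizes the objective below, where n is the number of samples, α is `alpha` and ρ is `l1_ratio`. Setting ρ = 1 gives the Lasso used in the test code; ρ = 0 leaves a ridge-type L2 penalty (scaled differently from scikit-learn's `Ridge` class). That h2o4gpu's scikit-learn-compatible wrappers use exactly this scaling is an assumption worth checking against the h2o4gpu documentation.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```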

Real-time benchmark: https://www.youtube.com/watch?v=LrC3mBNG7WU (the video reports roughly a 20× speed-up).
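A speed-up figure like "20×" depends on the hardware it was measured on. To check it on your own machine, time the same workload on the CPU and on the GPU and take the ratio. A minimal sketch (the helper names `best_time` and `speedup` are hypothetical); taking the fastest of several runs reduces noise from warm-up and caching:

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 3) -> float:
    """Run fn several times and return the fastest wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """How many times faster the GPU run is than the CPU run."""
    return cpu_seconds / gpu_seconds
```

Usage would look like `speedup(best_time(lambda: cpu_model.fit(X, y)), best_time(lambda: gpu_model.fit(X, y)))`, where `cpu_model`, `gpu_model`, `X` and `y` stand for whatever estimators and data are being compared.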


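2. Installation

Notes: when installing or upgrading the NVIDIA driver, first leave the graphical X Window session (the NVIDIA `.run` installer will not run while an X server is active). When installing CUDA, do not install the driver bundled with the CUDA installer, because the driver was already installed in the previous step.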

```
hadoop@unsw-ThinkPad-T490:~/NVIDIA_CUDA-10.1_Samples/bin/x86_64/linux/release$ nvidia-smi
Thu Nov 14 10:59:21 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.31       Driver Version: 440.31       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce MX250       Off  | 00000000:3C:00.0 Off |                  N/A |
| N/A   58C    P0    N/A /  N/A |    390MiB /  2002MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1728      G   /usr/lib/xorg/Xorg                           190MiB |
|    0      1906      G   /usr/bin/gnome-shell                         136MiB |
|    0      2664      G   ...uest-channel-token=12816552660085767439    59MiB |
+-----------------------------------------------------------------------------+
```
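Note that the "CUDA Version" in the nvidia-smi header is the highest CUDA version the installed driver supports, not necessarily the toolkit that is installed (the samples above come from CUDA 10.1). To check the driver from a script rather than by reading the table, a minimal sketch that runs `nvidia-smi` and extracts both versions with a regular expression; the helper names are hypothetical and the regex assumes the default table layout shown above:

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches the header row of the default nvidia-smi table, e.g.
# "| NVIDIA-SMI 440.31  Driver Version: 440.31  CUDA Version: 10.2 |"
HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_smi_header(text: str) -> Optional[Tuple[str, str]]:
    """Return (driver_version, max_supported_cuda_version), or None if not found."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def run_nvidia_smi() -> Optional[str]:
    """Return nvidia-smi's table output, or None if the tool is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    output = run_nvidia_smi()
    if output is None:
        print("nvidia-smi not found: is the NVIDIA driver installed?")
    else:
        print(parse_smi_header(output))
```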


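3. Testing

As the number of iterations grows, the advantage of h2o4gpu starts to show; for prediction, the CPU is already very fast, so there is little to gain there.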
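Why iteration count matters: scikit-learn's `Lasso` is solved by coordinate descent, where each full pass over the features costs on the order of n·p operations, so training time grows with the number of passes until the tolerance stops the solver early. A minimal pure-Python sketch of coordinate descent for the Lasso objective (no intercept, matching `fit_intercept=False`); it illustrates the role of `max_iter` and `tol` and is not h2o4gpu's GPU solver, which may use a different algorithm. The function names are hypothetical:

```python
from typing import List, Tuple


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold (the proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], alpha: float,
             max_iter: int = 1000, tol: float = 1e-6) -> Tuple[List[float], int]:
    """Coordinate-descent Lasso without intercept.

    Minimizes (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 and returns
    (weights, number of full passes actually run).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # r = y - Xw, with w = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never gets a weight
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep r = y - Xw up to date
                w[j] = new_wj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            return w, sweep  # converged before hitting max_iter
    return w, max_iter


if __name__ == "__main__":
    weights, sweeps = lasso_cd([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0], alpha=1.0)
    print(weights, sweeps)
```

Once the solver converges within `tol`, raising `max_iter` further adds no work, which is worth keeping in mind when reading timings across the `max_iter` values in the loop below.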

```python
import time

import numpy as np
from sklearn.datasets import load_svmlight_file
from sklearn.linear_model import Lasso  # CPU baseline, see the commented line in the loop
from sklearn.metrics import mean_squared_error, r2_score

import h2o4gpu

# Optional: cache the parsed files on disk with joblib so repeated runs skip parsing.
# from joblib import Memory
# mem = Memory("./mycache")
# @mem.cache
def get_data(path, n_features=None):
    """Load a LIBSVM/svmlight file and return a dense feature array and the targets."""
    x, y = load_svmlight_file(path, n_features=n_features)
    return x.toarray(), y

print("Loading data.")
train_x, train_y = get_data("/home/hadoop/YearPredictionMSD")
# Force the test matrix to have the same number of columns as the training matrix.
test_x, test_y = get_data("/home/hadoop/YearPredictionMSD.t", n_features=train_x.shape[1])

for max_iter in [100, 500, 1000, 2000, 4000, 8000]:
    print("=" * 80)
    print("Setting up solver, max_iter is {}".format(max_iter))
    # GPU solver from h2o4gpu
    model = h2o4gpu.Lasso(alpha=0.01, fit_intercept=False, max_iter=max_iter)
    # CPU baseline with the same settings:
    # model = Lasso(alpha=0.01, fit_intercept=False, max_iter=max_iter)

    time_start = time.perf_counter()
    model.fit(train_x, train_y)
    print("training took {:.3f} sec".format(time.perf_counter() - time_start))

    time_start = time.perf_counter()
    # predict() may return a column vector; flatten it to 1-D
    y_pred_lasso = np.squeeze(model.predict(test_x))
    print("prediction took {:.3f} sec".format(time.perf_counter() - time_start))

    print(y_pred_lasso.shape, test_y.shape)
    print(y_pred_lasso[:10])
    print(test_y[:10])

    mse = mean_squared_error(test_y, y_pred_lasso)
    print("mse on test data : %f" % mse)
    r2_score_lasso = r2_score(test_y, y_pred_lasso)
    print("r^2 on test data : %f" % r2_score_lasso)
```
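The two numbers printed at the end of each run are the mean squared error and the coefficient of determination R². A minimal pure-Python sketch of both definitions (function names are hypothetical; scikit-learn's versions also handle sample weights and multi-output targets). An R² of 0 means the model does no better than always predicting the mean year of the test set:

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of the squared prediction errors."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - SS_res / SS_tot: the fraction of target variance explained by the predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot
```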


End.

posted @ 郝壹贰叁