Summary:
In this page, I talk about the "hello world" model, linear regression, and train it in 2 different ways. One is the "closed-form"...
Summary:
Linear Systems; Vector Equations; The Matrix Equation; Solution Sets of Linear Systems; Linear Independence; Introduction to Linear Transformations; The Matri...
Summary:
If a tree is not balanced, it is inefficient: in the worst case, searching for a given key is no faster than in a linked list. Self-bal...
Summary:
How to install CDH on CentOS 6.10
Summary:
Why? Look at the following 2 pieces of code implementing a simple socket-based web server. Can you point out the problems (I put them in the com...
Summary:
We are going to explain how joins work in MR, focusing on the reduce-side join and the map-side join. Reduce Side Join: Assuming we have 2 datasets, one...
Summary:
MapReduce Application (Partitioning/Grouping data by a defined key): Assuming we want to group data by the year (2008 to 2016) of their [last access date...
Summary:
Top 10 IDs based on their value: First, we need to set the number of reducers to 1. For each map task, it is not a good idea to output every key/value pair. Instead...
Summary:
In this page, I will explain the following important MR concepts. 1) Job: how the job is initialized and executed. 2) MR components: how they work to process...
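Summary:
Why write in English? Most of the knowledge and technology I acquire comes through English, so writing it down in English, in my own understanding, is relatively easy. Besides, many terms cannot be translated. To keep learning technology and knowledge, you have to get used to not only reading and listening, but also writing and speaking. Unfortunately, since leaving IBM I have had few chances to speak with people, so I can only write. I'll treat it as a way to improve my English. I am going...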
Summary:
HDFS Architecture: The core of Hadoop/distributed systems is storage (HDFS) and the resource manager (YARN) for the computing engines built on it. Master/Slave: The charac...
Summary:
Overview: YARN provides an API not for application developers but for the great developers working on new computing engines. YARN makes it easy and unified...
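Summary:
Example: a Binary Classifier. Suppose we want to predict whether the digit in an image is a 5, as in the code below. X_train is the training set; each instance is a 28*28-pixel image with 784 features in total, and each feature represents the intensity of one pixel (between 0 and 255). y_train_5 is the label, bo...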
Summary:
Definition of thread safety: "A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execut...
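Summary:
A typical performance measure for regression problems is the Root Mean Square Error (RMSE), given by the formula below. Above, we use lowercase italics for scalars (m, y(i)) and function names (h), lowercase bold for vectors (x(i)), and uppercase bold for matrices (X). Another measure is the Mean Absolute Error. Understanding it is also...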
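Summary:
Following "Relational Database Transactions, Part 1: Concepts", let's now talk about isolation levels. Isolation levels exist to solve the problems caused by concurrency; we expect the result of concurrent execution to be the same as that of serial execution (one after another). In fact, serializability is the strongest isolation level and can relieve all the pain that concurrency problems bring. So what else is there to say? It is not hard to...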
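Summary:
While writing the previous article, "A Brief Introduction to Java Concurrency", a question kept flashing through my mind: programs have concurrency problems, so do databases have similar problems? Let's take a look! A transaction groups a set of read and write operations into one logical unit. These operations either all succeed and are committed (commit), or all fail and are aborted (abort, rollback), without leaving behind a mess in an intermediate state...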
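Summary:
At a high level, machine learning can be classified from different angles: whether it is trained under human intervention/supervision (supervised, unsupervised, semisupervised, and Reinforcement Learning); whether it can learn incrementally (online learning, batch learning); whether it works by comparing new data with known data, or by discovering, in the training data, a...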
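Summary:
When I was young and had learned to "use" Servlet, I felt I could build anything, and afterwards I just kept writing so-called business logic. The framework (I don't mean Struts, Spring, etc. here, just Servlet) shields people from a lot of complexity (not to mention what is built on top of Servlet), so it is extremely easy to pick up, and after picking it up one just stays that way...... As requirements change and complex...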