Loading a relational database in parallel with Spark

Method 1: parallel loading partitioned on the integer column ECI, with a parallelism (number of partitions) of 3:

    import java.util.Properties;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    SparkConf sparkConf = new SparkConf();
    sparkConf.setAppName("jdbc").setMaster("local[4]");
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);
    SQLContext sc = new SQLContext(jsc);
    String url = "jdbc:sqlserver://192.168.1.101;DatabaseName=database;user=user;password=123456";
    String tableName = "tb_city";
    Properties connectionProperties = new Properties();
    connectionProperties.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver");
    // Partition on the integer column ECI: the range [125883650, 263780907] is split into 3 strides, one JDBC query per partition.
    DataFrame table = sc.read().jdbc(url, tableName, "ECI", 125883650, 263780907, 3, connectionProperties)
            .select("CityID", "IMSI", "ECI");
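With this overload, lowerBound and upperBound do not filter the data; Spark only uses them together with numPartitions to derive the stride of each partition, so the whole table is still read. A minimal sketch (reusing the table DataFrame built above) to confirm the resulting parallelism:

    // Spark turns the ECI bounds into one WHERE clause per partition, roughly:
    // "ECI < b1 OR ECI IS NULL", "ECI >= b1 AND ECI < b2", "ECI >= b2".
    System.out.println("numPartitions = " + table.javaRDD().partitions().size()); // expected: 3
    table.show(10); // first rows of the parallel load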

Method 2: parallel loading partitioned on the varchar column IMSI, with a parallelism of 3:

    SparkConf sparkConf = new SparkConf();
    sparkConf.setAppName("jdbc").setMaster("local[4]");
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);
    SQLContext sc = new SQLContext(jsc);
    String url = "jdbc:sqlserver://192.168.1.101;DatabaseName=database;user=user;password=123456";
    String tableName = "tb_city";
    Properties connectionProperties = new Properties();
    connectionProperties.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver");
    // Each predicate becomes the WHERE clause of one partition's query,
    // so three predicates yield three partitions.
    String[] predicates = new String[]{
            "IMSI >= '105156335255615' AND IMSI <= '145437785776944'",
            "IMSI >= '145441560321876' AND IMSI <= '145441636521493'",
            "IMSI > '145441636521493' AND IMSI <= '145464988025176'"
    };
    DataFrame table = sc.read().jdbc(url, tableName, predicates, connectionProperties)
            .select("CityID", "IMSI");

Each element of predicates is a filter condition that becomes the WHERE clause of one partition's query, so the three conditions above produce three partitions. The conditions should not overlap (overlapping ranges load the same rows more than once), and any row that matches none of them is simply not read; one way to build such predicates is sketched below.
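As a minimal sketch (the split points below are illustrative, and url, tableName, sc and connectionProperties are the variables defined above), the predicates can be generated from a sorted list of boundary values so that each row falls into exactly one partition:

    // Hypothetical split points for IMSI, sorted ascending.
    String[] bounds = {"105156335255615", "145441560321876", "145464988025176"};
    String[] generated = new String[bounds.length - 1];
    for (int i = 0; i < generated.length; i++) {
        // The first range is inclusive on both ends; later ranges exclude their
        // lower bound so adjacent ranges do not overlap.
        generated[i] = (i == 0 ? "IMSI >= '" : "IMSI > '") + bounds[i]
                + "' AND IMSI <= '" + bounds[i + 1] + "'";
    }
    DataFrame partitioned = sc.read().jdbc(url, tableName, generated, connectionProperties);
    System.out.println("numPartitions = " + partitioned.javaRDD().partitions().size()); // one per predicate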
