Java Spark 简单示例(四)Spark SQL JDBC

字数 217阅读 648

本篇介绍Spark SQL如何连接JDBC数据库(我以本地安装的mysql为例)

Maven 中引入


    mysql
    mysql-connector-java
    6.0.5

代码示例

package com.yzy.spark;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class demo5 {
    private static String appName = "spark.sql.demo";
    private static String master = "local[*]";

    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName(appName)
                .master(master)
                .getOrCreate();

        Dataset df = spark.read()
                .format("jdbc")
                .option("url", "jdbc:mysql://localhost:3306/smvc")
                .option("dbtable", "user")
                .option("user", "root")
                .option("pass" + "word", "123456")
                .load();

        df.printSchema();
    }
}

运行结果:

root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- nicename: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- pwd: string (nullable = true)

遇到的问题
1..option("pass" + "word", "123456")这样写是因为编译器的sonar 检查不过,password 是敏感字段。

2..format("jdbc"),如果忘记加上会报错:ConnectedComponents: error 'Unable to infer schema for Parquet. It must be specified manually.意思是必须手动指定我要连接的是jdbc。

3..option("url", "jdbc:mysql://localhost:3306/smvc")会报错:java.sql.SQLException:The server time zone value '?й???????' is unrecognized or represents more than onetime zone. You must configure either the server or JDBC driver (via theserverTimezone configuration property) to use a more specifc time zone value ifyou want to utilize time zone support. 解决办法参考此文
value 修改为jdbc:mysql://localhost:3306/smvc?useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai 即可

推荐阅读更多精彩内容