1. Create a Maven Project
After the Maven project is created, Maven generates an initial configuration file (pom.xml).
2. Configure Maven
Add the Spark Core dependency to the project. The resulting pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>net.libaoquan</groupId>
    <artifactId>TestSpark</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
    </dependencies>
</project>
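As a side note, the `_2.11` suffix on `spark-core_2.11` pins the artifact to Scala 2.11 binaries, and it must match the Scala version of any other Spark artifacts you add later. A common pattern is to factor both versions into Maven properties so they can be changed in one place (a sketch; the property names here are our own choice, not part of Spark or Maven):

```xml
<properties>
    <!-- Hypothetical property names; any names work as long as they match below. -->
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>2.2.1</spark.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
```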
3. Create a Java Class
Create a Java class and add the following Spark (Java API) code:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class TestSparkJava {
    public static void main(String[] args) {
        String logFile = "D:\\ab.txt";
        // Run Spark locally in-process with a single worker thread.
        SparkConf conf = new SparkConf().setMaster("local").setAppName("TestSpark");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Load the file as an RDD of lines; cache it because it is scanned twice below.
        JavaRDD<String> logData = sc.textFile(logFile).cache();
        // Count the lines containing "0".
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("0"); }
        }).count();
        // Count the lines containing "1".
        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("1"); }
        }).count();
        System.out.println("Lines with 0: " + numAs + ", lines with 1: " + numBs);
        sc.stop();
    }
}
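Conceptually, each `filter(...).count()` pair is a distributed version of filtering a list of lines and counting the survivors; Spark just partitions the file and runs the predicate on each partition. A plain-JDK sketch of the same counting logic, with no Spark dependency (the class and method names here are our own, for illustration only):

```java
import java.util.Arrays;
import java.util.List;

public class LineCountSketch {
    // Mirrors logData.filter(s -> s.contains(token)).count() from the Spark job,
    // but on an in-memory list instead of a distributed RDD.
    static long countLinesContaining(List<String> lines, String token) {
        return lines.stream().filter(s -> s.contains(token)).count();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a0", "b1", "01");
        System.out.println("Lines with 0: " + countLinesContaining(lines, "0")
                + ", lines with 1: " + countLinesContaining(lines, "1"));
        // Prints: Lines with 0: 2, lines with 1: 2
    }
}
```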
Run the project; it prints the two line counts to the console.