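1. Parameter settings (I don't really understand what these two parameters mean; I searched for a long time and found nothing, so if anyone knows, please do tell me!)
-K -- Use a kernel density estimator for numeric attributes rather than a normal distribution.
-D -- Use supervised discretization to convert numeric attributes to nominal ones.
Only one of the two can be used at a time.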
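To make the -K description a little more concrete, here is a minimal plain-Java sketch (it does not use Weka, and it is not Weka's implementation) of the two ways to estimate the density of one numeric attribute within one class: a single fitted normal distribution versus an average of narrow normal kernels centered on each training value. The class name DensitySketch, the sample values, and the bandwidth of 0.3 are all assumptions made up for illustration.

```java
final class DensitySketch {

    /** Density of a normal distribution N(mean, stdDev^2) at x. */
    static double gaussianPdf(double x, double mean, double stdDev) {
        double z = (x - mean) / stdDev;
        return Math.exp(-0.5 * z * z) / (stdDev * Math.sqrt(2.0 * Math.PI));
    }

    /** Normal-distribution estimate: fit one normal to all values, then evaluate it at x. */
    static double normalEstimate(double[] values, double x) {
        double sum = 0.0;
        for (double v : values) sum += v;
        double mean = sum / values.length;
        double squares = 0.0;
        for (double v : values) squares += (v - mean) * (v - mean);
        double stdDev = Math.sqrt(squares / (values.length - 1)); // sample standard deviation
        return gaussianPdf(x, mean, stdDev);
    }

    /** Kernel estimate: average of normal kernels, one centered on each training value. */
    static double kernelEstimate(double[] values, double x, double bandwidth) {
        double sum = 0.0;
        for (double v : values) sum += gaussianPdf(x, v, bandwidth);
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Hypothetical values of one numeric attribute within a single class (two clusters).
        double[] values = {0.8, 1.0, 1.2, 4.8, 5.0, 5.2};
        double bandwidth = 0.3; // assumed for illustration; not the rule Weka uses
        for (double x : new double[] {1.0, 3.0, 5.0}) {
            System.out.printf("x=%.1f normal=%.4f kernel=%.4f%n",
                    x, normalEstimate(values, x), kernelEstimate(values, x, bandwidth));
        }
    }
}
```

On two-cluster data like this, the single normal puts a noticeable density at x=3, where there are no training values at all, while the kernel estimate stays close to zero there. That is the kind of case where -K may help. -D takes a different route and turns the numeric attribute into nominal bins before training.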
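2. Loading the arff file failed with an error at first
It turned out that the file contained attributes with duplicate names; renaming them fixed it.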
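For a large file, finding the repeated names by eye is tedious. The following is a rough plain-Java sketch that scans the header of an arff file and lists attribute names declared more than once. The class name ArffDuplicateCheck is hypothetical, the names are compared case-sensitively (an assumption), and escaped quotes inside quoted names are not handled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ArffDuplicateCheck {

    /** Returns the attribute names that appear more than once in the arff header. */
    static List<String> findDuplicateAttributes(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            String lower = line.toLowerCase();
            if (lower.startsWith("@data")) break;           // header ends here
            if (!lower.startsWith("@attribute")) continue;  // skip comments, @relation, blanks
            String name = attributeName(line.substring("@attribute".length()).trim());
            if (!seen.add(name) && !duplicates.contains(name)) duplicates.add(name);
        }
        return duplicates;
    }

    /** Extracts the name from the text after "@attribute" (quoted or bare token). */
    static String attributeName(String rest) {
        if (rest.isEmpty()) return rest;
        char first = rest.charAt(0);
        if (first == '\'' || first == '"') {
            int end = rest.indexOf(first, 1);
            return end > 0 ? rest.substring(1, end) : rest.substring(1);
        }
        int i = 0;
        while (i < rest.length() && !Character.isWhitespace(rest.charAt(i))) i++;
        return rest.substring(0, i);
    }

    public static void main(String[] args) throws IOException {
        // Pass the arff path as the first argument; otherwise a tiny made-up header is used.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("@relation demo", "@attribute price numeric",
                                "@attribute price numeric", "@attribute class {a,b}", "@data");
        System.out.println("Duplicate attribute names: " + findDuplicateAttributes(lines));
    }
}
```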
3. Running it directly throws an exception because there is not enough memory (the data set is too large)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at weka.core.Instances.sort(Instances.java:1231)
    at weka.core.Instances.sort(Instances.java:1251)
    at weka.classifiers.bayes.NaiveBayes.buildClassifier(NaiveBayes.java:247)
    at pers.NaiveBayesTest.main(NaiveBayesTest.java:37)
Solution: right-click the class you want to run -> Run As -> Run Configurations -> (x)= Arguments, then enter -Xmx512m under VM arguments and it will run.
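To check that the setting actually reached the JVM, a small sketch like the one below can be run with the same run configuration. The class name HeapCheck is made up for illustration; Runtime.getRuntime().maxMemory() is standard Java.

```java
final class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

With -Xmx512m the reported value should be close to 512, though it may come out slightly lower depending on the JVM and garbage collector. Outside Eclipse, the same flag goes directly on the java command line, before the class name.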
4. The code is as follows:
package pers;
import java.io.File;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.converters.ArffLoader;
import weka.core.Instances;
public class NaiveBayesTest
{
    public static void main(String[] args) throws Exception
    {
        // ArffLoader reads a source that is in arff (attribute relation file format) format.
        ArffLoader atf = new ArffLoader();
        File inputFile = new File("Amazon_initial_50_30_10000.arff"); // read the training file
        atf.setFile(inputFile);
        Instances instancesTrain = atf.getDataSet(); // the parsed training data
        // Use the last attribute as the class attribute. Attribute indices start at 0,
        // and numAttributes() returns the total number of attributes.
        instancesTrain.setClassIndex(instancesTrain.numAttributes() - 1);
        inputFile = new File("Amazon_initial_50_30_10000test.arff"); // read the test file
        atf.setFile(inputFile);
        Instances instancesTest = atf.getDataSet(); // the parsed test data
        instancesTest.setClassIndex(instancesTest.numAttributes() - 1); // same class attribute as for the training data
        Classifier m_classifier = new NaiveBayes(); // create a naive Bayes classifier
        String options[] = new String[1]; // array of training options
        //options[0]="-K"; // Use a kernel estimator for numeric attributes rather than a normal distribution.
        //options[0]="-D"; // Use supervised discretization to convert numeric attributes to nominal ones.
        //m_classifier.setOptions(options); // apply the training options
        m_classifier.buildClassifier(instancesTrain); // train
        Evaluation eval = new Evaluation(instancesTrain); // construct the evaluator
        eval.evaluateModel(m_classifier, instancesTest); // evaluate m_classifier on the test data set
        System.out.println(eval.toSummaryString("=== Summary ===\n", false)); // print the summary
        System.out.println(eval.toMatrixString("=== Confusion Matrix ===\n")); // print the confusion matrix
    }
}
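The program ends by printing a summary and a confusion matrix. As a reading aid, here is a plain-Java sketch of how accuracy, recall, and precision follow from such a matrix, assuming rows are the actual classes and columns the predicted ones, which is how Weka lays it out. The class name ConfusionMatrixSketch and the counts are invented for illustration; they are not results from the data set used here.

```java
final class ConfusionMatrixSketch {

    /** Fraction of all instances on the diagonal (correctly classified). */
    static double accuracy(long[][] m) {
        long correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];
            }
        }
        return (double) correct / total;
    }

    /** Of the instances that really belong to class c (row c), the fraction predicted as c. */
    static double recall(long[][] m, int c) {
        long rowSum = 0;
        for (long v : m[c]) rowSum += v;
        return (double) m[c][c] / rowSum;
    }

    /** Of the instances predicted as class c (column c), the fraction that really are c. */
    static double precision(long[][] m, int c) {
        long colSum = 0;
        for (long[] row : m) colSum += row[c];
        return (double) m[c][c] / colSum;
    }

    public static void main(String[] args) {
        // Made-up two-class matrix: rows = actual class, columns = predicted class.
        long[][] matrix = {
            {50, 10},
            {5, 35}
        };
        System.out.println("accuracy  = " + accuracy(matrix));
        System.out.println("recall(0) = " + recall(matrix, 0));
        System.out.println("precision(0) = " + precision(matrix, 0));
    }
}
```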
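I used the same file for the training data and the test data: /redir?resid=DF3DA11703A9CA4C!164&authkey=!ACQWVXfn2asHy7I&ithint=file,.rar
I didn't bother uploading the project files; they are similar to the earlier decision tree ones. Don't forget to add the weka jar.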