Converting .mat, .txt and .csv data to Weka's ARFF format, and converting between MATLAB and Weka
阿新 • Published: 2019-01-23
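function r = CSVtoARFF(data, relation, type)
% CSVtoARFF  Write a numeric matrix (e.g. loaded with csvread) to <type>.arff.
%   data     - numeric matrix; the last column is the class label and
%              -1 in a feature column marks a missing value
%   relation - relation name written to the @relation line
%   type     - base name of the output file
% The attribute declarations and the @data line are copied verbatim from
% ARFFheader.txt, which must exist in the current folder.
[rows, cols] = size(data);
% open the ARFF file for writing
farff = fopen(strcat(type, '.arff'), 'w');
if farff < 0
    error('Unable to open %s.arff for writing', type);
end
% print the relation part of the header (on its own line)
fprintf(farff, '@relation %s\n', relation);
% copy the rest of the header from ARFFheader.txt, first line included
fid = fopen('ARFFheader.txt', 'r');
if fid < 0
    fclose(farff);
    error('Unable to open ARFFheader.txt');
end
tline = fgets(fid);
while ischar(tline)              % fgets returns -1 at end of file
    fprintf(farff, '%s', tline);
    tline = fgets(fid);
end
fclose(fid);
% convert the data
for i = 1 : rows
    % print the attribute values of this data point
    for j = 1 : cols - 1
        if data(i,j) ~= -1       % -1 marks a missing value
            fprintf(farff, '%.16g,', data(i,j));
        else
            fprintf(farff, '?,');
        end
    end
    % print the label of this data point
    fprintf(farff, '%.16g\n', data(i,end));
end
% close the file
fclose(farff);
r = 0;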
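The drawback of this approach is that ARFFheader.txt has to be supplied separately. When there are only a few attributes, the header can be written by hand. With many attributes this becomes tedious, and the header lines can instead be generated with a loop in the program.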
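Below is a minimal sketch of that loop. It assumes that every feature column is numeric, that the last column holds integer class labels, and that the attributes may simply be named att_1, att_2, and so on (these names are hypothetical). The toy matrix is made up for illustration. The script ends the header with @data because CSVtoARFF copies the header verbatim and does not write that line itself. Note that it overwrites any existing ARFFheader.txt in the current folder.

```matlab
% Sketch: generate ARFFheader.txt for CSVtoARFF with a loop.
% Assumptions: numeric features, integer class labels in the last column,
% hypothetical attribute names att_1, att_2, ...
data = [5 3 1; 4 2 2; 6 1 1];          % toy data: two features + label column
nFeatures = size(data, 2) - 1;
labels = unique(data(:, end));          % distinct class labels, sorted

hdrFile = 'ARFFheader.txt';             % the file name CSVtoARFF reads
fid = fopen(hdrFile, 'w');
if fid < 0
    error('Unable to open %s', hdrFile);
end
for k = 1:nFeatures
    fprintf(fid, '@attribute att_%d numeric\n', k);
end
% nominal class attribute listing every label, e.g. "{1,2}"
labelList = sprintf('%d,', labels);
fprintf(fid, '@attribute class {%s}\n', labelList(1:end-1));
% CSVtoARFF copies the header verbatim and does not write @data itself
fprintf(fid, '@data\n');
fclose(fid);

headerText = fileread(hdrFile);
disp(headerText);
```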
function Mat2Arff(input_filename, arff_filename)
%
% Convert a numeric data file to the '.arff' format used by Weka.
%
% Parameters:
%   input_filename -- input file name; only '.mat', '.txt' and '.csv'
%                     files can be converted
%   arff_filename  -- name of the output '.arff' file
% NOTES:
%   The input data must be an M*N numeric matrix:
%   M: number of samples;
%   N: features plus label; columns 1:N-1 are features, column N is the label.

% read the input data
[~, ~, ext] = fileparts(input_filename);
switch lower(ext)
    case '.mat'
        matdata = importdata(input_filename);      % .mat file holding one matrix
    case '.txt'
        matdata = load(input_filename, '-ascii');  % whitespace-delimited numbers
    case '.csv'
        matdata = csvread(input_filename);
    otherwise
        error('Unsupported input file: %s', input_filename);
end
[row, col] = size(matdata);
f = fopen(arff_filename, 'wt');
if f < 0
    error('Unable to open the file %s', arff_filename);
end
fprintf(f, '@relation %s\n', arff_filename);
for i = 1 : col - 1
    fprintf(f, '@attribute att_%d numeric\n', i);
end
% last header line: the class label as a nominal attribute, e.g. {1,2,3}
floatformat = '%.16g';
Y = matdata(:, col);
uY = unique(Y);                                    % distinct label values
labelList = sprintf([floatformat ','], uY);
fprintf(f, '@attribute label {%s}\n\n', labelList(1:end-1));
% write the data section: comma-separated features, then the label
fprintf(f, '@data\n');
for i = 1 : row
    Xi = matdata(i, 1:col-1);
    s = [sprintf([floatformat ','], Xi) sprintf(floatformat, Y(i))];
    fprintf(f, '%s\n', s);
end
fclose(f);
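The row loop at the end of Mat2Arff can also be replaced by a single sprintf call. sprintf keeps reusing its format string until the arguments run out, and it reads a matrix argument in column order. The sketch below assumes the same layout Mat2Arff expects: a numeric matrix with the label in the last column. The matrix is made-up example data.

```matlab
% Sketch: build the whole @data section in one call instead of a row loop.
% Assumes a numeric matrix whose last column is the class label.
matdata = [5.1 3.5 1; 4.9 3.0 2];       % made-up example data
col = size(matdata, 2);
floatformat = '%.16g';
% one row template "v1,v2,...,label\n" with col fields
rowformat = [repmat([floatformat ','], 1, col - 1), floatformat, '\n'];
% sprintf walks its argument in column order, so transpose to go row by row
body = sprintf(rowformat, matdata.');
fprintf('%s', body);
```

Passing the same rowformat to fprintf together with a file handle writes the rows straight to the file.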
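Finally, here is a short introduction to how Weka handles data.
An overview of data mining and an introduction to Weka – Learning data mining and using Weka (Part 1)
Input data and ARFF files – Learning data mining and using Weka (Part 2)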
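To sum up briefly:
An ARFF file in Weka consists of two parts: the header and the data section.
The header contains the relation name, the attributes and the type of each attribute, for example: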
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
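NUMERIC means that the attribute is numeric. The class attribute is nominal: its value is restricted to one of Iris-setosa, Iris-versicolor or Iris-virginica. Attribute types can also be string or date. The data section begins with @data, for example: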
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
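To go the other way, a simple data section like this one can be read back into MATLAB without Weka. The sketch below is not a full ARFF parser. It assumes dense rows, no quoted values, no comment lines after @DATA, and numeric attributes followed by the class in the last column. The inline text is a shortened version of the iris example, with one value replaced by ? to show that str2double turns a missing value into NaN.

```matlab
% Sketch: parse the @DATA section of a small ARFF text into a matrix X and
% a cell array of class labels y. No sparse rows, quoting or comments handled.
arffText = sprintf(['@RELATION iris\n' ...
    '@ATTRIBUTE sepallength NUMERIC\n' ...
    '@ATTRIBUTE sepalwidth NUMERIC\n' ...
    '@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}\n' ...
    '@DATA\n' ...
    '5.1,3.5,Iris-setosa\n' ...
    '4.9,?,Iris-setosa\n']);
txtLines = strtrim(strsplit(arffText, sprintf('\n')));
txtLines = txtLines(~cellfun(@isempty, txtLines));     % drop blank lines
dataStart = find(strcmpi(txtLines, '@data'), 1) + 1;   % keywords are case-insensitive
dataLines = txtLines(dataStart:end);
nRows = numel(dataLines);
nCols = numel(strsplit(dataLines{1}, ',')) - 1;        % numeric columns per row
X = zeros(nRows, nCols);
y = cell(nRows, 1);
for i = 1:nRows
    fields = strsplit(dataLines{i}, ',');
    X(i, :) = str2double(fields(1:end-1));  % '?' (missing) becomes NaN
    y{i} = fields{end};
end
disp(X);
```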
Putting the two parts together, a complete ARFF file looks like this:
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
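Mat2Arff only writes numeric labels, so a file like the one above, with text class names, needs a slightly different writer. The sketch below uses the first three iris rows. The variable names and the output name iris_demo.arff are placeholders. Note that %.16g prints 3.0 as 3, which ARFF reads as the same number.

```matlab
% Sketch: write an iris-style ARFF file from MATLAB variables, with numeric
% features and text class labels. 'iris_demo.arff' is a placeholder name.
X = [5.1 3.5 1.4 0.2; 4.9 3.0 1.4 0.2; 4.7 3.2 1.3 0.2];
y = {'Iris-setosa'; 'Iris-setosa'; 'Iris-setosa'};
attrNames = {'sepallength', 'sepalwidth', 'petallength', 'petalwidth'};
classes = {'Iris-setosa', 'Iris-versicolor', 'Iris-virginica'};

outFile = 'iris_demo.arff';
fid = fopen(outFile, 'w');
if fid < 0
    error('Unable to open %s', outFile);
end
fprintf(fid, '@RELATION iris\n');
for k = 1:numel(attrNames)
    fprintf(fid, '@ATTRIBUTE %s NUMERIC\n', attrNames{k});
end
fprintf(fid, '@ATTRIBUTE class {%s}\n', strjoin(classes, ','));
fprintf(fid, '@DATA\n');
for i = 1:size(X, 1)
    fprintf(fid, '%.16g,', X(i, :));   % features, each followed by a comma
    fprintf(fid, '%s\n', y{i});        % class label ends the row
end
fclose(fid);

arffText = fileread(outFile);
disp(arffText);
```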
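Weka uses its own file format, called ARFF. If you want to convert data between MATLAB and Weka, there is a ready-made package for that.
Don't expect it to work as soon as you download it; you will get errors at these lines: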
if(~wekaPathCheck),wekaOBJ = []; return,end
import weka.core.converters.ArffLoader;
import java.io.File;
The tricky part is that weka.jar has to be added to the list in MATLAB's classpath.txt. Where is classpath.txt? Type this in the MATLAB Command Window:
which classpath.txt
D:\CMWang\MATLABR2014b\toolbox\local\classpath.txt
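Then add one line containing the absolute path of weka.jar to classpath.txt. MATLAB reads this file at startup, so restart MATLAB afterwards. For example:
C:\Program Files\Weka-3-8\weka.jar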