[Recommendation Algorithms] User-Based Collaborative Filtering: A Java Implementation

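I won't spend much time on the basic concepts; anyone who can follow this post presumably knows them already. If you want to learn about recommendation first, these are the two topics to read up on:
1. What a recommendation algorithm is
2. What a neighborhood-based recommendation algorithm is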

I use the MovieLens dataset from GroupLens.
Link: GroupLens

Processing the dataset

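Here I extract UserId + MovieId from the data and treat each pair as implicit feedback. My implementation is not especially good and I may optimize it later; if you have a better idea, feel free to send me a message.
The basic project structure is as follows: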


    /project
        /analyzer --recommendation analysis
            -CollaborativeFileringanalyzer
        /bean --data tuples
            -BasicBean
            -HabitsBean
        /input --input settings
            -ReaderFormat
        /recommender --recommendation
            -UserRecommender

The first step is to read the MovieLens data and convert it into a structured data format. Each MovieLens record has the following layout:

| user id | item id | rating | timestamp |

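Once read, the data forms a table, which could be stored either in a Map or in a two-dimensional array.
With the later transformation in mind, I decided on the two-dimensional array.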
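
As a rough illustration of the two-dimensional layout, the sketch below splits tab-separated lines into a `String[][]` in which row i is record i and the columns follow the order user id, item id, rating, timestamp. The class name `RatingTable`, the method `toTable`, and the sample lines are made up for this example and are not part of the project above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
final class RatingTable {
    // Splits each tab-separated line into its fields; row i of the result is line i.
    static String[][] toTable(List<String> lines) {
        String[][] table = new String[lines.size()][];
        for (int i = 0; i < lines.size(); i++) {
            table[i] = lines.get(i).split("\t");
        }
        return table;
    }

    public static void main(String[] args) {
        // Invented sample lines in the layout: user id, item id, rating, timestamp.
        List<String> lines = Arrays.asList("1\t10\t5\t100", "2\t20\t3\t200");
        String[][] table = toTable(lines);
        // Column 0 is the user id and column 1 is the item id.
        System.out.println(table[0][0] + " -> " + table[0][1]); // prints "1 -> 10"
    }
}
```

For implicit feedback only columns 0 and 1 are needed; the rating and timestamp columns can be ignored.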

BasicBean stores one row of the table; its main member is a List<String> that holds the individual fields of that row.

    import java.util.ArrayList;
    import java.util.List;

    /**
     * A row of a data set; the fields of the row are held in the parameters list.
     *
     * @author wqd
     * 2016/01/18
     */
    public class BasicBean {
        private List<String> parameters;
        private boolean tableHead;

        //Creates an empty row; head says whether the row is a table head
        public BasicBean(boolean head) {
            parameters = new ArrayList<String>();
            this.tableHead = head;
        }

        //Creates a row from the given fields; the row is not a table head
        public BasicBean(String... strings) {
            this(false, strings);
        }

        //Creates a row from the given fields; head says whether the row is a table head
        public BasicBean(boolean head, String... strings) {
            parameters = new ArrayList<String>();
            for (String string : strings) {
                parameters.add(string);
            }
            this.tableHead = head;
        }

        //Appends a field and returns the new number of fields
        public int add(String param) {
            parameters.add(param);
            return this.getSize();
        }

        //Replaces the field at index with a new value.
        //Returns true on success, false if index is out of range.
        public boolean set(int index, String param) {
            if (index < 0 || index >= this.getSize()) {
                return false;
            }
            parameters.set(index, param);
            return true;
        }

        //Returns true if this row is a table head, false otherwise
        public boolean isHead() {
            return tableHead;
        }

        //Prints the fields separated by "\t|", with a line break after every 20 fields
        @Override
        public String toString() {
            StringBuilder str = new StringBuilder(" ");
            int len = 1;
            for (String string : parameters) {
                str.append("\t|" + string);
                if (len++ % 20 == 0) {
                    str.append("\n");
                }
            }
            return str.toString();
        }

        //Get number of fields
        public int getSize() {
            return parameters.size();
        }

        //Get the underlying list of fields
        public List<String> getArray() {
            return this.parameters;
        }

        //Get the ID of the row, which is the first field
        public int getId() {
            return this.getInt(0);
        }

        public String getString(int index) {
            return parameters.get(index);
        }

        public int getInt(int index) {
            return Integer.parseInt(parameters.get(index));
        }

        public boolean getBoolean(int index) {
            return Boolean.parseBoolean(parameters.get(index));
        }

        public float getFloat(int index) {
            return Float.parseFloat(parameters.get(index));
        }
    }

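Once the raw data has been read, processing it directly is still fairly inefficient and carries many redundant fields, because a single user gives feedback on many movies. The rows are therefore converted from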
| user id | item id | rating | timestamp |
=>
| user id | item id 1 | item id 2 | item id 3 | item id 4 …
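
To make the target shape concrete, here is a minimal sketch of the same regrouping done with a `TreeMap` keyed by user id. This is a substitute technique chosen for brevity, not the HabitsBean approach used in this post; the class name `HabitGrouping`, the method `groupByUser`, and the sample lines are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
final class HabitGrouping {
    // Groups "user id \t item id \t rating \t timestamp" lines by user id,
    // keeping only the item ids. TreeMap keeps the user ids in ascending order.
    static Map<Integer, List<String>> groupByUser(List<String> lines) {
        Map<Integer, List<String>> habits = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            int userId = Integer.parseInt(fields[0]);
            // Create the user's list on first sight, then append the item id.
            habits.computeIfAbsent(userId, k -> new ArrayList<>()).add(fields[1]);
        }
        return habits;
    }

    public static void main(String[] args) {
        // Invented sample lines; user 1 has two items, user 2 has one.
        List<String> lines = Arrays.asList(
                "2\t10\t4\t300",
                "1\t10\t5\t100",
                "1\t20\t3\t200");
        System.out.println(groupByUser(lines)); // prints {1=[10, 20], 2=[10]}
    }
}
```

Each map entry corresponds to one converted row: the key is the user id and the list holds item id 1, item id 2, and so on.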

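HabitsBean is used to store this form. The id is pulled out and stored directly in the bean, while the list holds that user's item ids; the reason is that the ID is used very frequently in the later operations.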

public class HabitsBean extends BasicBean {
    private int id ;

    //get the ID
    public int getId() {
        return id;
    }

    //set the ID
    public void setId(int id) {
        this.id = id;
    }

    public HabitsBean() {
        this(-1);
    }

    //an id of -1 (the default) means the id has not been assigned yet
    public HabitsBean(int id) {
        this.id = id;
    }

    //Override Object toString() method
    public String toString() {
        StringBuilder str = new StringBuilder("HabitBean " + this.id + " :");
        str.append(super.toString());
        return str.toString();
    }

}

After the tuples have been read, they are compressed and regrouped into a format that is convenient to process. ReaderFormat handles this; a demo follows:

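import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/**
 * This class reads training and test files. It is suitable
 * for GroupLens and other tab-separated data sets.
 * @author wqd
 *
 */
public class ReaderFormat {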
    List<BasicBean> lists;
    List<HabitsBean> formLists;

    public List<BasicBean> read (String filePath) throws IOException {
        lists = new ArrayList<BasicBean>();
        //try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(
                new FileReader(filePath))) {
            String s;
            while((s = in.readLine()) != null) {
                //each line holds tab-separated fields
                String[] params = s.split("\t");
                lists.add(new BasicBean(params));
            }
        }
        return lists;
    }

    //combine user log like | userID | habitID | ...
    //to userID and | habitID1 | habitID2 | habitID3 | ...
    //sort the userID
    public List<HabitsBean> formateLogUser(String filePath) throws IOException {
        lists = this.read(filePath);
        formLists = new LinkedList<HabitsBean>();
        HabitsBean row = null;
        for (BasicBean basicBean : lists) {
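            if(formLists.isEmpty()) { //assumption: the condition is truncated in the source; an empty-list check is used so the first record starts the list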
                row = new HabitsBean(1);
                row.setId(basicBean.getInt(0));
                row.add(basicBean.getString(1));
                formLists.add(row);
            } else {
                this.addBinarySerch(formLists, basicBean);
            }
        }
        return formLists;
    }

    //binary search over the list, which is kept sorted by user id
    private void addBinarySerch(List<HabitsBean> lists, BasicBean bean) {
        int id = bean.getId();
        int start = 0;
        int end = lists.size() - 1;
        while(start <= end) {
            int pointer = (start + end) >>> 1;
            HabitsBean row = lists.get(pointer);
            if(row.getId() == id) {
                //user already present: append the item id to the existing row
                row.add(bean.getString(1));
                return ;
            } else if(row.getId() > id) {
                end = pointer - 1;
            } else {
                start = pointer + 1;
            }
        }
        //user not found: insert a new row at position start so the list stays sorted
        HabitsBean newBean = new HabitsBean(id);
        newBean.add(bean.getString(1));
        lists.add(start, newBean);
    }


    // test
    public static void main(String[] args) {
        ReaderFormat readerFormat = new ReaderFormat();
        try {
            List<HabitsBean> lists = readerFormat.formateLogUser("E:/WorkSpace/Input/ml-100k/u1.base");
            for (HabitsBean habitsBean : lists) {
                System.out.println(habitsBean.toString());
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

Recommendation algorithm

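The core idea of collaborative filtering is to make recommendations based on the similarity between users.
Let N(u) and N(v) denote the sets of items on which users u and v have given implicit feedback. The similarity w_uv between the two users can be computed with the Jaccard formula

$$w_{uv} = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}$$

or with cosine similarity

$$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$$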
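
The two measures can be sketched over plain Java sets as follows. The code assumes the usual set-based forms for implicit feedback: Jaccard is the size of the intersection divided by the size of the union, and cosine is the size of the intersection divided by the square root of the product of the two set sizes. The class name `UserSimilarity`, its methods, and the sample item ids are hypothetical and not taken from the project above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class UserSimilarity {
    // Counts the items that appear in both sets by scanning the smaller one.
    static int intersectionSize(Set<String> a, Set<String> b) {
        Set<String> small = a.size() <= b.size() ? a : b;
        Set<String> large = (small == a) ? b : a;
        int count = 0;
        for (String item : small) {
            if (large.contains(item)) {
                count++;
            }
        }
        return count;
    }

    // Jaccard: |N(u) ∩ N(v)| / |N(u) ∪ N(v)|; defined as 0 when both sets are empty.
    static double jaccard(Set<String> nu, Set<String> nv) {
        int common = intersectionSize(nu, nv);
        int union = nu.size() + nv.size() - common;
        return union == 0 ? 0.0 : (double) common / union;
    }

    // Cosine: |N(u) ∩ N(v)| / sqrt(|N(u)| * |N(v)|); defined as 0 when either set is empty.
    static double cosine(Set<String> nu, Set<String> nv) {
        if (nu.isEmpty() || nv.isEmpty()) {
            return 0.0;
        }
        return intersectionSize(nu, nv) / Math.sqrt((double) nu.size() * nv.size());
    }

    public static void main(String[] args) {
        // Invented item ids: the two users share items "2" and "3".
        Set<String> nu = new HashSet<>(Arrays.asList("1", "2", "3"));
        Set<String> nv = new HashSet<>(Arrays.asList("2", "3", "4"));
        System.out.println(jaccard(nu, nv)); // 2 shared items out of 4 distinct ones
        System.out.println(cosine(nu, nv));  // 2 / sqrt(3 * 3)
    }
}
```

With the rows produced by ReaderFormat, N(u) would be the item ids held in one HabitsBean, copied into a set before comparison.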