Parsing and Comparing Excel Spreadsheets with POI

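##Preface
For once this post is not about Android; it is about Java. That said, the POI code here can also be dropped into an Android project, since the two share the same roots.

Why this post? My wife is an accountant, and she sometimes asks me to help reconcile accounts: two Excel files, with rows in different order, where I have to find the figures that do not match. With hundreds or even thousands of numbers to check, my eyes would give out, and where would I find the time to happily write code?
As programmers, we should free our eyes for more meaningful work!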

##Development environment
IntelliJ IDEA + Maven

pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.kikt</groupId>
    <artifactId>ExcelDemo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
            <version>3.15-beta2</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.15-beta2</version>
        </dependency>
    </dependencies>
</project>

This pulls in the two artifacts of the POI parsing library: poi and poi-ooxml.

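###Structure
First, how POI models an Excel file:
Workbook -> Sheet -> Row -> Cell
Seen from WPS/Excel, a Workbook is the workbook file, a Sheet is a worksheet, a Row is (as the name says) a row, and a Cell is a single cell.

With that foundation in place, let's move on.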

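##Getting the data

###Getting the sheet
To get at the data, first obtain the workbook, then the Sheet.

Start with the Workbook:

        File file = new File(path);
        FileInputStream is = new FileInputStream(file);
        Workbook workbook = WorkbookFactory.create(is);

Here path is the path to the file.

Let's create a Utils class to hold this kind of repetitive code.
ExcelUtils.java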
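import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class ExcelUtils {
    private ExcelUtils() {
    }

    // Reads the whole workbook into memory, then closes the input stream.
    private static Workbook openWorkbook(String path) throws IOException, InvalidFormatException {
        try (InputStream is = new FileInputStream(new File(path))) {
            return WorkbookFactory.create(is);
        }
    }

    // Looks a sheet up by its 0-based position.
    public static Sheet getSheet(String path, int sheetPosition) throws IOException, InvalidFormatException {
        return openWorkbook(path).getSheetAt(sheetPosition);
    }

    // Looks a sheet up by its name.
    public static Sheet getSheet(String path, String sheetName) throws IOException, InvalidFormatException {
        return openWorkbook(path).getSheet(sheetName);
    }
}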

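The two methods look a sheet up by name and by position respectively.
Positions start at 0. Since a workbook may contain dozens of sheets, I added a lookup by name as well.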

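The declaration of Sheet

public interface Sheet extends Iterable<Row>

Sheet is an interface that extends Iterable, so every implementing class necessarily implements Iterable too.
That means a foreach loop over a Sheet yields its Rows.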

###Getting a Cell

public interface Row extends Iterable<Cell>

Row works the same way: a foreach loop over a Row yields its Cells, so every cell can be visited in turn.
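To make the Iterable point concrete without needing POI on the classpath, here is a minimal sketch. `MiniSheet` is a hypothetical stand-in I made up for illustration (rows are plain lists of strings, not POI cells); it only shows why a type that is Iterable can be used directly in nested foreach loops, which is the same shape as looping over a real Sheet and its Rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a sheet: each row is a list of strings, not POI cells. */
final class MiniSheet implements Iterable<List<String>> {
    private final List<List<String>> rows = new ArrayList<>();

    void addRow(String... cells) {
        rows.add(Arrays.asList(cells));
    }

    // Implementing iterator() is all that foreach needs.
    @Override
    public Iterator<List<String>> iterator() {
        return rows.iterator();
    }

    /** Visits every row and every cell with nested foreach loops and counts the cells. */
    static int countCells(MiniSheet sheet) {
        int count = 0;
        for (List<String> row : sheet) {      // outer loop: rows of the sheet
            for (String cell : row) {         // inner loop: cells of the row
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        MiniSheet sheet = new MiniSheet();
        sheet.addRow("1", "rent", "100.0");
        sheet.addRow("2", "tax");
        System.out.println(countCells(sheet));
    }
}
```

With POI the loop variables would be Row and Cell instead of lists and strings, but the loop structure stays the same.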

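 Cell getCell(int var1);

Row has a method that returns the Cell for a given int index.
The index starts at 0, which raises an issue: Excel labels its columns with letters, not numbers, so a conversion is needed.
Here is a small algorithm for that: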

    // Cache of 26^n values, keyed by n.
    private static Map<Integer, Integer> columnMap = new HashMap<>();

    private static int getColumnLength(int length) {
        Integer columnLength = columnMap.get(length);
        if (columnLength == null) {
            columnLength = (int) Math.pow(26, length);
            columnMap.put(length, columnLength);
        }
        return columnLength;
    }

    /**
     * @param columnLetter the column letters, e.g. "a" or "AA"
     * @return the 0-based column number
     */
    public static int getColumnNumber(String columnLetter) {
        if (columnLetter == null || columnLetter.isEmpty()) {
            throw new IllegalArgumentException("Column letter must not be empty");
        }
        columnLetter = columnLetter.toLowerCase();
        int letterLength = columnLetter.length();
        if (letterLength == 1) {
            // 'a' -> 0, 'b' -> 1, ...
            return columnLetter.charAt(0) - 'a';
        } else {
            // Weight of the first letter, plus the number of the remaining letters.
            int length = getColumnLength(letterLength - 1);
            return (getColumnNumber(columnLetter.charAt(0) + "") + 1) * length
                    + getColumnNumber(columnLetter.substring(1));
        }
    }

This converts column labels such as AA or CA into the corresponding number.
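The same conversion can also be written as a single loop. The sketch below is an alternative formulation, not the code used in this project: `ColumnLetters` and `toIndex` are names invented for the example. It treats the label as a base-26 number whose digits A..Z are worth 1..26 and subtracts one at the end to get a 0-based index. As far as I know, POI also ships `CellReference.convertColStringToIndex` for the same job, which may be the simpler choice in real code.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the original ExcelUtils. */
final class ColumnLetters {
    private ColumnLetters() {
    }

    /** Converts a column label such as "A", "Z" or "AA" into a 0-based index. */
    static int toIndex(String letters) {
        if (letters == null || letters.isEmpty()) {
            throw new IllegalArgumentException("column label must not be empty");
        }
        int result = 0;
        for (char c : letters.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("not a column label: " + letters);
            }
            // A..Z act as the digits 1..26 of a base-26 number.
            result = result * 26 + (c - 'A' + 1);
        }
        return result - 1; // shift from 1-based to 0-based
    }

    public static void main(String[] args) {
        System.out.println(toIndex("A"));  // prints 0
        System.out.println(toIndex("AA")); // prints 26
        System.out.println(toIndex("CA")); // prints 78
    }
}
```

The loop avoids both the recursion and the cache of powers of 26, at the cost of being a little less literal about "first letter times its weight".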
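PS: As an aside, I recommend Sedgewick's "Algorithms". I reread it recently; it covers the fundamentals, but a firm grasp of the fundamentals helps a great deal when working with algorithms.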

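###Preparation before the real coding

Here are screenshots of the two sheets; I have blurred the sensitive parts.



The amounts and similar fields are still visible.
What we need to compare is column F in figure 1 against column H in figure 2.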

        String recordFilePath = "H:\\1.xls";
        Sheet recordSheet = ExcelUtils.getSheet(recordFilePath, 0);
        List<RecordBean> recordBeanList = getRecordList(recordSheet, "a", "f");

        String invoiceFilePath = "2.xls";
        Sheet invoiceSheet = ExcelUtils.getSheet(invoiceFilePath, "外地預交增值稅及附加稅");
        List<InvoiceBean> invoiceBeanList = getInvoiceList(invoiceSheet, "a", "i");

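Here I first get each sheet through the util method, then pass the letters of the columns to parse into the list-building method,
which gives back the corresponding List.

The bean classes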

package excel.bean;

/**
 * Created by kikt on 2017/2/26.
 * A bookkeeping record
 */
public class RecordBean extends NumberBean {
    private int index;
    private double number;

    public int getIndex() {
        return index;
    }

    public void setIndex(int index) {
        this.index = index;
    }

    public double getNumber() {
        return number;
    }

    public void setNumber(double number) {
        this.number = number;
    }

    @Override
    public String toString() {
        return "RecordBean{" +
                "index=" + index +
                ", number=" + number +
                '}';
    }
}

package excel.bean;

/**
 * Created by kikt on 2017/2/26.
 */
public class NumberBean {
    private int numberIndex;

    public int getNumberIndex() {
        return numberIndex;
    }

    public void setNumberIndex(int numberIndex) {
        this.numberIndex = numberIndex;
    }
}

The method that builds the list

    private static List<RecordBean> getRecordList(Sheet recordSheet, String indexLetter, String numberLetter) {
        // Resolve both column numbers once instead of once per row.
        int indexColumn = ExcelUtils.getColumnNumber(indexLetter);
        int numberColumn = ExcelUtils.getColumnNumber(numberLetter);
        List<RecordBean> list = new ArrayList<>();
        for (Row cells : recordSheet) {
            // Only rows whose index cell holds a number are data rows; skip the rest.
            Cell indexCell = cells.getCell(indexColumn);
            if (indexCell == null || indexCell.getCellType() != Cell.CELL_TYPE_NUMERIC) {
                continue;
            }
            // getCell returns null for a cell that was never created; skip instead of throwing.
            Cell numberCell = cells.getCell(numberColumn);
            if (numberCell == null) {
                continue;
            }
            RecordBean bean = new RecordBean();
            bean.setIndex((int) indexCell.getNumericCellValue());
            bean.setNumberIndex(numberColumn);
            bean.setNumber(numberCell.getNumericCellValue());
            list.add(bean);
        }

        return list;
    }

The other one (getInvoiceList) is much the same, so I won't paste it here.

Then a compare method checks one list against the other:

    private static List<InvoiceBean> compareList(List<RecordBean> recordBeanList, List<InvoiceBean> invoiceBeanList) {
        // Note: both input lists are modified; every matched pair is removed from them.
        // Walk the records backwards so that removing by index never skips an entry.
        for (int i = recordBeanList.size() - 1; i >= 0; i--) {
            RecordBean recordBean = recordBeanList.get(i);
            for (int j = 0; j < invoiceBeanList.size(); j++) {
                // Exact double comparison: the two amounts must be exactly equal.
                if (recordBean.getNumber() == invoiceBeanList.get(j).getNumber()) {
                    invoiceBeanList.remove(j);
                    recordBeanList.remove(i);
                    break;
                }
            }
        }

        // Whatever is left in the invoice list found no partner among the records.
        return new ArrayList<>(invoiceBeanList);
    }

Matching entries are removed; whatever remains is what differs.
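One thing to watch with this kind of matching is that amounts are doubles, so two values that look identical in Excel may differ in the last bits when one of them is the result of a calculation. The sketch below shows one possible way around that, under assumptions of my own: amounts have at most two decimal places, and the names `AmountMatcher`, `toCents` and `unmatched` are invented for the example. It rounds each amount to whole cents and counts occurrences in a map, so duplicate amounts are still paired one to one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; works on plain amounts instead of the beans. */
final class AmountMatcher {
    private AmountMatcher() {
    }

    /** Rounds an amount to whole cents so tiny floating-point differences disappear. */
    static long toCents(double amount) {
        return Math.round(amount * 100);
    }

    /** Returns the invoice amounts that have no partner among the record amounts. */
    static List<Double> unmatched(List<Double> records, List<Double> invoices) {
        // How many times each amount (in cents) is still available on the record side.
        Map<Long, Integer> remaining = new HashMap<>();
        for (double record : records) {
            long key = toCents(record);
            Integer count = remaining.get(key);
            remaining.put(key, count == null ? 1 : count + 1);
        }
        List<Double> result = new ArrayList<>();
        for (double invoice : invoices) {
            long key = toCents(invoice);
            Integer count = remaining.get(key);
            if (count == null || count == 0) {
                result.add(invoice);           // no record left with this amount
            } else {
                remaining.put(key, count - 1); // consume one matching record
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Double> records = Arrays.asList(100.10, 0.1 + 0.2, 50.0);
        List<Double> invoices = Arrays.asList(0.3, 100.1, 50.0, 50.0);
        System.out.println(unmatched(records, invoices)); // prints [50.0]
    }
}
```

Unlike the nested loops, this version does not modify its inputs and does one pass over each list, which may matter once the sheets grow to thousands of rows.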

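##Saving the result
Having the comparison result is not enough; the spreadsheet also has to be modified so that the mismatches are marked for manual checking.

###Methods for saving a sheet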

    public static void saveWorkbook(String path, Workbook workbook) throws IOException {
        // try-with-resources closes the stream even if write() fails
        try (FileOutputStream out = new FileOutputStream(new File(path))) {
            workbook.write(out);
        }
    }

    public static void backupSheet(String path, Workbook workbook) throws IOException {
        File file = new File(path);
        // getAbsoluteFile() avoids a null parent when path is relative, e.g. "2.xls"
        File backupDir = new File(file.getAbsoluteFile().getParentFile(), "backup");
        backupDir.mkdirs();
        File newFile = new File(backupDir, file.getName() + "_" + TimeUtils.getTimeString() + ".bak");
        try (FileOutputStream out = new FileOutputStream(newFile)) {
            workbook.write(out);
        }
    }

    public static void saveSheet(Sheet sheet, String path) throws IOException {
        Workbook workbook = sheet.getWorkbook();
        saveWorkbook(path, workbook);
    }

TimeUtils.java

package excel.utils;

import java.text.SimpleDateFormat;
import java.util.Date;

/**
 * Created by kikt on 2017/2/26.
 */
public class TimeUtils {
    // Current time formatted for use in a file name, e.g. 20170226_153000
    public static String getTimeString() {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd_HHmmss");
        return sdf.format(new Date());
    }
}


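The core of saving is the workbook.write(OutputStream) method, wrapped in a thin helper; saveSheet() is another wrapper that takes different arguments. There is also a backup method worth a quick look: in short, it takes the original file name, appends a timestamp and a .bak suffix, and writes the workbook to that file.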
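The naming rule for the backup can be tried out on its own, without POI. The sketch below is only an illustration: `BackupPaths` and `backupFileFor` are names I made up, and the time is passed in as a parameter so the result is easy to check. It builds the path with `new File(parent, child)` instead of a hard-coded "\\" separator, so the same code should behave the same way outside Windows.

```java
import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;

/** Hypothetical helper for illustration; it only computes the backup location. */
final class BackupPaths {
    private BackupPaths() {
    }

    /** Returns <dir of original>/backup/<original name>_<yyyyMMdd_HHmmss>.bak */
    static File backupFileFor(File original, Date now) {
        String stamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(now);
        // getAbsoluteFile() guarantees a parent even for a bare name such as "2.xls".
        File backupDir = new File(original.getAbsoluteFile().getParentFile(), "backup");
        return new File(backupDir, original.getName() + "_" + stamp + ".bak");
    }

    public static void main(String[] args) {
        // Prints the absolute backup path for a relative file name.
        System.out.println(backupFileFor(new File("2.xls"), new Date()));
    }
}
```

The method does not touch the disk; creating the directory and writing the workbook would still be done by the caller.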

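###Changing the style
That covers saving and backing up the file. The style still needs changing, otherwise nobody can tell what the check actually found.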

    private static void setStyle(Sheet invoiceSheet, int index, int numberIndex) {
        for (Row cells : invoiceSheet) {
            Cell cell = cells.getCell(ExcelUtils.getColumnNumber("a"));
            if (cell != null && cell.getCellType() == Cell.CELL_TYPE_NUMERIC) {
                if (index == cell.getNumericCellValue()) {
                    Cell numberCell = cells.getCell(numberIndex);
                    if (numberCell == null) {
                        continue; // nothing to highlight in this row
                    }
                    CellStyle cellStyle = invoiceSheet.getWorkbook().createCellStyle();
                    cellStyle.setFillPattern(HSSFCellStyle.SOLID_FOREGROUND);
                    cellStyle.setFillForegroundColor(HSSFColor.RED.index);
                    numberCell.setCellStyle(cellStyle);
                }
            }
        }
    }

Nothing is wrapped up here; it is just a simple modification.
The core code is:

CellStyle cellStyle = invoiceSheet.getWorkbook().createCellStyle(); // create a new cell style
cellStyle.setFillPattern(HSSFCellStyle.SOLID_FOREGROUND); // fill pattern: solid fill using the foreground color
cellStyle.setFillForegroundColor(HSSFColor.RED.index); // set the foreground color to red
numberCell.setCellStyle(cellStyle); // apply the new style to the cell

That is all for this simple style change; just save the workbook afterwards.

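##Closing words
This post mainly covered parsing and simple modification. Generating files may come up later; I'll write another post when it does.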