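Parsing and comparing Excel spreadsheets with POI
## Preface
This post is about Java rather than Android, though the POI code can also be dropped into an Android project, since the two share the same roots.
Why this post? My wife is an accountant, and she sometimes asks me to help reconcile accounts: two Excel files, with rows in a different order, and I have to find where the numbers do not line up. With hundreds or even thousands of figures to check, my eyes would give out, and I would have no time left to happily write code.
As programmers, we should free up our eyes for more meaningful things!
## Development environment
IntelliJ IDEA + Maven
The pom file: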
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.kikt</groupId>
    <artifactId>ExcelDemo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
            <version>3.15-beta2</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.15-beta2</version>
        </dependency>
    </dependencies>
</project>
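This pulls in the two POI artifacts, poi and poi-ooxml.
### Structure
First, how POI models the structure of an Excel file:
Workbook -> Sheet -> Row -> Cell
Seen from WPS or Excel: a Workbook is the workbook file, a Sheet is a worksheet, a Row is, as the name says, a row, and a Cell is a single cell.
With that in place, let's move on.
## Getting the data
### Getting the Sheet
To get at the data, first obtain the workbook, then the Sheet.
First, the Workbook: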
File file = new File(path);
FileInputStream is = new FileInputStream(file);
Workbook sheets = WorkbookFactory.create(is);
Here path is the path to the file.
Since this is repeated every time, let's put it in a utility class:
ExcelUtils.java
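import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class ExcelUtils {
    private ExcelUtils() {
    }

    // Look up a sheet by its 0-based position in the workbook
    public static Sheet getSheet(String path, int sheetPosition) throws IOException, InvalidFormatException {
        // try-with-resources closes the stream once the workbook has been read
        try (FileInputStream is = new FileInputStream(new File(path))) {
            Workbook workbook = WorkbookFactory.create(is);
            return workbook.getSheetAt(sheetPosition);
        }
    }

    // Look up a sheet by its name
    public static Sheet getSheet(String path, String sheetName) throws IOException, InvalidFormatException {
        try (FileInputStream is = new FileInputStream(new File(path))) {
            Workbook workbook = WorkbookFactory.create(is);
            return workbook.getSheet(sheetName);
        }
    }
}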
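The two methods fetch a sheet by position or by name.
The position starts at 0. The name-based method is there for workbooks that may hold dozens of sheets.
The declaration of Sheet: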
public interface Sheet extends Iterable<Row>
Sheet is an interface that extends Iterable, so every implementation class is iterable as well.
That means a for-each loop over a Sheet yields its Rows.
### Getting a Cell
public interface Row extends Iterable<Cell>
Row works the same way: a for-each loop over a Row yields its Cells, so every cell can be visited.
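To see why extending Iterable is all it takes for the for-each loop to work, here is a small sketch with toy stand-ins. ToyRow and ToySheet are names I made up for illustration; they are not POI classes, and a cell is simplified to a plain String.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class IterableDemo {
    // Toy stand-in for a row: iterating it yields its cells (plain Strings here).
    static class ToyRow implements Iterable<String> {
        private final List<String> cells = new ArrayList<>();

        ToyRow(String... values) {
            for (String value : values) {
                cells.add(value);
            }
        }

        @Override
        public Iterator<String> iterator() {
            return cells.iterator();
        }
    }

    // Toy stand-in for a sheet: iterating it yields its rows.
    static class ToySheet implements Iterable<ToyRow> {
        private final List<ToyRow> rows = new ArrayList<>();

        void addRow(ToyRow row) {
            rows.add(row);
        }

        @Override
        public Iterator<ToyRow> iterator() {
            return rows.iterator();
        }
    }

    // Nested for-each: the outer loop yields rows, the inner loop yields cells.
    static List<String> flatten(ToySheet sheet) {
        List<String> values = new ArrayList<>();
        for (ToyRow row : sheet) {
            for (String cell : row) {
                values.add(cell);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        ToySheet sheet = new ToySheet();
        sheet.addRow(new ToyRow("1", "100.50"));
        sheet.addRow(new ToyRow("2", "87.20"));
        System.out.println(flatten(sheet));
    }
}
```

The real Sheet and Row are used in the same nested-loop shape, only with Row and Cell as the element types.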
Cell getCell(int var1);
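Row has a method that returns the Cell for a given int index.
The index starts at 0. That raises a problem: Excel labels columns with letters rather than numbers, so a conversion is needed.
Here is a small algorithm for it: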
private static Map<Integer, Integer> columnMap = new HashMap<>();

// Returns 26^length, cached in columnMap
private static int getColumnLength(int length) {
    Integer columnLength = columnMap.get(length);
    if (columnLength == null) {
        columnLength = (int) Math.pow(26, length);
        columnMap.put(length, columnLength);
    }
    return columnLength;
}

/**
 * @param columnLetter the column letters, e.g. "A" or "AA"
 * @return the 0-based column index
 */
public static int getColumnNumber(String columnLetter) {
    if (columnLetter == null) {
        throw new RuntimeException("Column letter must not be null");
    }
    columnLetter = columnLetter.toLowerCase();
    int letterLength = columnLetter.length();
    if (letterLength == 1) {
        char letter = columnLetter.charAt(0);
        return letter - 'a';
    } else {
        int length = getColumnLength(letterLength - 1);
        return (getColumnNumber(columnLetter.charAt(0) + "") + 1) * length + getColumnNumber(columnLetter.substring(1));
    }
}
This converts column labels such as AA or CA into the matching number.
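For comparison, the same conversion can probably be written as a single loop, treating the label as a base-26 number whose digits run from A=1 to Z=26. This is only a sketch under that assumption; ColumnLetters and toIndex are hypothetical names, not part of the code above. POI also ships CellReference.convertColStringToIndex, which looks like it covers the same ground.

```java
import java.util.Locale;

class ColumnLetters {
    // Converts an Excel column label ("A", "Z", "AA", ...) to a 0-based index.
    static int toIndex(String letters) {
        if (letters == null || letters.isEmpty()) {
            throw new IllegalArgumentException("column label must not be empty");
        }
        int result = 0;
        for (char c : letters.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("not a column label: " + letters);
            }
            // Shift the digits read so far one place left, then add this digit (A=1 ... Z=26).
            result = result * 26 + (c - 'A' + 1);
        }
        return result - 1; // shift from 1-based to 0-based
    }

    public static void main(String[] args) {
        System.out.println(toIndex("A"));  // 0
        System.out.println(toIndex("AA")); // 26
        System.out.println(toIndex("CA")); // 78
    }
}
```

Unlike the recursive version, this needs no cache and rejects empty or non-letter input up front.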
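PS: as an aside, I recommend Sedgewick's book Algorithms. I reread it recently; it covers the basics, but solid basics help a great deal with algorithms.
### Preparing to write the actual code
Here are screenshots of the two sheets. I have blurred the private details,
but the amounts and similar fields are still visible.
What we want to compare is column F in figure 1 against column H in figure 2.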
String recordFilePath = "H:\\1.xls";
Sheet recordSheet = ExcelUtils.getSheet(recordFilePath, 0);
List<RecordBean> recordBeanList = getRecordList(recordSheet, "a", "f");
String invoiceFilePath = "2.xls";
Sheet invoiceSheet = ExcelUtils.getSheet(invoiceFilePath, "外地預交增值稅及附加稅");
List<InvoiceBean> invoiceBeanList = getInvoiceList(invoiceSheet, "a", "i");
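First I fetch each sheet through the utility methods and pass in the letters of the columns to parse.
That yields the corresponding List for each sheet.
The bean classes: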
RecordBean.java
package excel.bean;

/**
 * Created by kikt on 2017/2/26.
 * A bookkeeping record
 */
public class RecordBean extends NumberBean {
    private int index;
    private double number;

    public int getIndex() {
        return index;
    }

    public void setIndex(int index) {
        this.index = index;
    }

    public double getNumber() {
        return number;
    }

    public void setNumber(double number) {
        this.number = number;
    }

    @Override
    public String toString() {
        return "RecordBean{" +
                "index=" + index +
                ", number=" + number +
                '}';
    }
}
NumberBean.java
package excel.bean;

/**
 * Created by kikt on 2017/2/26.
 */
public class NumberBean {
    private int numberIndex;

    public int getNumberIndex() {
        return numberIndex;
    }

    public void setNumberIndex(int numberIndex) {
        this.numberIndex = numberIndex;
    }
}
The method that builds the list:
private static List<RecordBean> getRecordList(Sheet recordSheet, String indexLetter, String numberLetter) {
    List<RecordBean> list = new ArrayList<>();
    for (Row cells : recordSheet) {
        RecordBean bean = new RecordBean();
        Cell indexCell = cells.getCell(ExcelUtils.getColumnNumber(indexLetter));
        // Rows without a numeric index cell (presumably headers or blanks) are skipped
        if (indexCell == null || indexCell.getCellType() != Cell.CELL_TYPE_NUMERIC) {
            continue;
        }
        double numericCellValue = indexCell.getNumericCellValue();
        bean.setIndex((int) numericCellValue);
        int columnNumber = ExcelUtils.getColumnNumber(numberLetter);
        Cell numberCell = cells.getCell(columnNumber);
        if (numberCell == null) {
            continue; // getCell returns null when the cell was never created
        }
        bean.setNumberIndex(columnNumber);
        bean.setNumber(numberCell.getNumericCellValue());
        list.add(bean);
    }
    return list;
}
The other one is much the same, so I won't paste it here.
Then a compare method checks one list against the other:
private static List<InvoiceBean> compareList(List<RecordBean> recordBeanList, List<InvoiceBean> invoiceBeanList) {
    List<InvoiceBean> unMarkBeanList = new ArrayList<>();
    // Note: both input lists are modified, matched pairs are removed from them
    for (int i = recordBeanList.size() - 1; i >= 0; i--) {
        RecordBean recordBean = recordBeanList.get(i);
        for (int j = 0; j < invoiceBeanList.size(); j++) {
            InvoiceBean invoiceBean = invoiceBeanList.get(j);
            if (recordBean.getNumber() == invoiceBean.getNumber()) {
                invoiceBeanList.remove(invoiceBean);
                recordBeanList.remove(recordBean);
                break;
            }
        }
    }
    // Whatever is left in the invoice list had no matching record
    unMarkBeanList.addAll(invoiceBeanList);
    return unMarkBeanList;
}
Matching entries are removed, so whatever is left is the mismatches.
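One thing to watch in the comparison above: it uses == on double values, which can miss amounts that differ only by floating-point noise. A possible alternative, sketched below, rounds each amount to two decimal places and counts occurrences. The class name AmountMatcher, its methods, and the two-decimal assumption are mine, and it works on plain lists of amounts rather than on the beans.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AmountMatcher {
    // Assumption: amounts are money values that are meaningful to two decimal places.
    static BigDecimal toCents(double amount) {
        return BigDecimal.valueOf(amount).setScale(2, RoundingMode.HALF_UP);
    }

    // Returns the invoice amounts that have no partner among the record amounts.
    // Each record amount can cancel out at most one invoice amount.
    static List<Double> unmatched(List<Double> records, List<Double> invoices) {
        Map<BigDecimal, Integer> remaining = new HashMap<>();
        for (double record : records) {
            BigDecimal key = toCents(record);
            Integer count = remaining.get(key);
            remaining.put(key, count == null ? 1 : count + 1);
        }
        List<Double> result = new ArrayList<>();
        for (double invoice : invoices) {
            BigDecimal key = toCents(invoice);
            Integer count = remaining.get(key);
            if (count != null && count > 0) {
                remaining.put(key, count - 1); // matched: use up one record
            } else {
                result.add(invoice); // no record left for this amount
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Double> records = new ArrayList<>();
        records.add(0.1 + 0.2); // stored as 0.30000000000000004
        records.add(20.0);
        List<Double> invoices = new ArrayList<>();
        invoices.add(0.3);
        invoices.add(20.0);
        invoices.add(99.99);
        System.out.println(unmatched(records, invoices)); // [99.99]
    }
}
```

It leaves the input lists untouched and makes one pass over each list instead of a nested loop.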
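## Saving the results
A comparison result alone is not enough. The spreadsheet also has to be modified so the mismatches are marked for manual checking.
### Methods for saving a sheet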
public static void saveWorkbook(String path, Workbook workbook) throws IOException {
    // try-with-resources closes the stream even if write() throws
    try (FileOutputStream out = new FileOutputStream(new File(path))) {
        workbook.write(out);
    }
}

public static void backupSheet(String path, Workbook workbook) throws IOException {
    File file = new File(path);
    // getAbsoluteFile() avoids a null parent when path is relative, e.g. "2.xls"
    File backupDir = new File(file.getAbsoluteFile().getParentFile(), "backup");
    backupDir.mkdirs();
    // Target: <dir>/backup/<name>_<yyyyMMdd_HHmmss>.bak
    File newFile = new File(backupDir, file.getName() + "_" + TimeUtils.getTimeString() + ".bak");
    try (FileOutputStream out = new FileOutputStream(newFile)) {
        workbook.write(out);
    }
}

public static void saveSheet(Sheet sheet, String path) throws IOException {
    Workbook workbook = sheet.getWorkbook();
    saveWorkbook(path, workbook);
}
TimeUtils.java
package excel.utils;

import java.text.SimpleDateFormat;
import java.util.Date;

/**
 * Created by kikt on 2017/2/26.
 */
public class TimeUtils {
    public static String getTimeString() {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd_HHmmss");
        return sdf.format(new Date());
    }
}
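The core of saving is the workbook.write(OutputStream) method, wrapped lightly here. saveSheet() is another thin wrapper that takes different arguments. There is also a backup method: in short, it changes the file name by appending a timestamp and a .bak suffix, then writes the file out.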
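The naming rule for the backup can be tried out without POI. The sketch below only builds the target path and writes nothing; BackupNames and backupFileFor are hypothetical names, and the timestamp is passed in so the result is predictable.

```java
import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;

class BackupNames {
    // Builds <parent>/backup/<name>_<timestamp>.bak next to the source file.
    static File backupFileFor(File source, String timestamp) {
        // getAbsoluteFile() guards against a null parent for relative paths such as "2.xls"
        File parent = source.getAbsoluteFile().getParentFile();
        File backupDir = new File(parent, "backup");
        return new File(backupDir, source.getName() + "_" + timestamp + ".bak");
    }

    public static void main(String[] args) {
        String timestamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date());
        System.out.println(backupFileFor(new File("2.xls"), timestamp));
    }
}
```

Building the path with File(parent, child) rather than a hard-coded "\\" keeps the same code usable outside Windows.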
### Changing the style
With saving and backup covered, the style still needs changing, otherwise nobody can tell what the check found.
private static void setStyle(Sheet invoiceSheet, int index, int numberIndex) {
    for (Row cells : invoiceSheet) {
        Cell cell = cells.getCell(ExcelUtils.getColumnNumber("a"));
        if (cell != null && cell.getCellType() == Cell.CELL_TYPE_NUMERIC) {
            if (index == cell.getNumericCellValue()) {
                Cell numberCell = cells.getCell(numberIndex);
                if (numberCell == null) {
                    continue; // nothing to highlight in this row
                }
                CellStyle cellStyle = invoiceSheet.getWorkbook().createCellStyle();
                cellStyle.setFillPattern(HSSFCellStyle.SOLID_FOREGROUND);
                cellStyle.setFillForegroundColor(HSSFColor.RED.index);
                numberCell.setCellStyle(cellStyle);
            }
        }
    }
}
There is no wrapper here, just a simple modification.
The core code is:
CellStyle cellStyle = invoiceSheet.getWorkbook().createCellStyle(); // create a new cell style
cellStyle.setFillPattern(HSSFCellStyle.SOLID_FOREGROUND); // fill with a solid foreground color
cellStyle.setFillForegroundColor(HSSFColor.RED.index); // set the foreground color to red
numberCell.setCellStyle(cellStyle); // apply the new style to the cell
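That is all for the simple style change. Just save the workbook afterwards.
## Closing
This post covered parsing and simple modification. Generating files may come up later, and I will write another post when it does.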