
Reading and Writing CSV and TSV Files with supercsv

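First, a brief note on the difference between CSV and TSV files: CSV (comma-separated values) separates the fields of each record with commas, while TSV (tab-separated values) separates them with tab characters. Apart from the delimiter, both formats have the same row-and-column structure.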
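As a rough illustration of that difference, here is a minimal sketch in plain Java (no supercsv involved) that splits a single record with a configurable delimiter. The class and method names are made up for this example, and it assumes records that fit on one line with double-quote quoting; it is not how supercsv itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;

class DelimitedLineDemo {

    /**
     * Splits one single-line record into fields. Fields may be wrapped in
     * double quotes, and "" inside a quoted field stands for a literal quote.
     */
    public static List<String> splitRecord(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // delimiters inside quotes are plain text
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        String csvLine = "1,John,\"Mountain View, CA\"";
        String tsvLine = "1\tJohn\tMountain View, CA";
        // Same parsing code, only the delimiter argument changes.
        System.out.println(splitRecord(csvLine, ','));
        System.out.println(splitRecord(tsvLine, '\t'));
    }
}
```

Both calls yield the same three fields. The only thing that changes between the two formats is the delimiter argument.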

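My project needs to reorganize the data in an existing TSV file into a new TSV file that is more convenient to use (with a few extra columns), which means both reading and writing TSV files. This would be simple enough to implement by hand, but there happens to be a ready-made library, supercsv, so I decided to try it. Official site: http://supercsv.sourceforge.net/index.html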

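The documentation is clear and well organized, and there are plenty of examples online of parsing CSV files with supercsv. Given the difference between TSV and CSV, a single set of code should handle both: just swap the delimiter. supercsv does exactly that. First, the example from the official site: http://supercsv.sourceforge.net/examples_reading.html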

The CSV file to be parsed:

customerNo,firstName,lastName,birthDate,mailingAddress,married,numberOfKids,favouriteQuote,email,loyaltyPoints
1,John,Dunbar,13/06/1945,"1600 Amphitheatre Parkway Mountain View, CA 94043 United States",,,"""May the Force be with you."" - Star Wars",[email protected],0
2,Bob,Down,25/02/1919,"1601 Willow Rd. Menlo Park, CA 94025 United States",Y,0,"""Frankly, my dear, I don't give a damn."" - Gone With The Wind",[email protected],123456
3,Alice,Wunderland,08/08/1985,"One Microsoft Way Redmond, WA 98052-6399 United States",Y,0,"""Play it, Sam. Play ""As Time Goes By."""" - Casablanca",[email protected],2255887799
4,Bill,Jobs,10/07/1973,"2701 San Tomas Expressway Santa Clara, CA 95050 United States",Y,3,"""You've got to ask yourself one question: ""Do I feel lucky?"" Well, do ya, punk?"" - Dirty Harry",[email protected],36

Code that parses it with CsvMapReader:

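// Imports needed by the two methods below. Note that Optional is
// org.supercsv.cellprocessor.Optional, not java.util.Optional.
import java.io.FileReader;
import java.util.Map;

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.ParseBool;
import org.supercsv.cellprocessor.ParseDate;
import org.supercsv.cellprocessor.ParseInt;
import org.supercsv.cellprocessor.constraint.LMinMax;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.constraint.StrRegEx;
import org.supercsv.cellprocessor.constraint.UniqueHashCode;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvMapReader;
import org.supercsv.io.ICsvMapReader;
import org.supercsv.prefs.CsvPreference;

/**
 * An example of reading using CsvMapReader.
 */
private static void readWithCsvMapReader() throws Exception {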
        
        ICsvMapReader mapReader = null;
        try {
                mapReader = new CsvMapReader(new FileReader(CSV_FILENAME), CsvPreference.STANDARD_PREFERENCE);
                
                // the header columns are used as the keys to the Map
                final String[] header = mapReader.getHeader(true);
                final CellProcessor[] processors = getProcessors();
                
                Map<String, Object> customerMap;
                while( (customerMap = mapReader.read(header, processors)) != null ) {
                        System.out.println(String.format("lineNo=%s, rowNo=%s, customerMap=%s", mapReader.getLineNumber(),
                                mapReader.getRowNumber(), customerMap));
                }
                
        }
        finally {
                if( mapReader != null ) {
                        mapReader.close();
                }
        }
}

/**
 * Sets up the processors used for the examples. There are 10 CSV columns, so 10 processors are defined. Empty
 * columns are read as null (hence the NotNull() for mandatory columns).
 * 
 * @return the cell processors
 */
private static CellProcessor[] getProcessors() {
        
        final String emailRegex = "[a-z0-9\\._]+@[a-z0-9\\.]+"; // just an example, not very robust!
        StrRegEx.registerMessage(emailRegex, "must be a valid email address");
        
        final CellProcessor[] processors = new CellProcessor[] { 
                new UniqueHashCode(), // customerNo (must be unique)
                new NotNull(), // firstName
                new NotNull(), // lastName
                new ParseDate("dd/MM/yyyy"), // birthDate
                new NotNull(), // mailingAddress
                new Optional(new ParseBool()), // married
                new Optional(new ParseInt()), // numberOfKids
                new NotNull(), // favouriteQuote
                new StrRegEx(emailRegex), // email
                new LMinMax(0L, LMinMax.MAX_LONG) // loyaltyPoints
        };
        
        return processors;
}

The sample code could hardly be clearer. Only one point needs explaining: the delimiter is set through CsvPreference.STANDARD_PREFERENCE. To parse a TSV file, simply replace it with CsvPreference.TAB_PREFERENCE.
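For the task described at the start (read a TSV file, add a column, write a new TSV file), a minimal sketch might look like the following. It assumes Super CSV 2.x is on the classpath, uses in-memory strings instead of files to stay self-contained, and reuses the firstName and lastName columns from the sample data. The class name and the added fullName column are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Map;

import org.supercsv.io.CsvMapReader;
import org.supercsv.io.CsvMapWriter;
import org.supercsv.io.ICsvMapReader;
import org.supercsv.io.ICsvMapWriter;
import org.supercsv.prefs.CsvPreference;

class TsvAddColumnDemo {

    /** Reads TSV text and returns it with a hypothetical "fullName" column appended. */
    public static String addFullNameColumn(String tsv) throws Exception {
        StringWriter out = new StringWriter();
        ICsvMapReader reader = null;
        ICsvMapWriter writer = null;
        try {
            // TAB_PREFERENCE is the only difference from the CSV example.
            reader = new CsvMapReader(new StringReader(tsv), CsvPreference.TAB_PREFERENCE);
            writer = new CsvMapWriter(out, CsvPreference.TAB_PREFERENCE);

            // The header columns are used as the keys of each row map.
            String[] header = reader.getHeader(true);
            String[] newHeader = Arrays.copyOf(header, header.length + 1);
            newHeader[header.length] = "fullName";
            writer.writeHeader(newHeader);

            Map<String, String> row;
            while ((row = reader.read(header)) != null) {
                row.put("fullName", row.get("firstName") + " " + row.get("lastName"));
                writer.write(row, newHeader);
            }
        } finally {
            if (reader != null) {
                reader.close();
            }
            if (writer != null) {
                writer.close(); // closing also flushes buffered output
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String tsv = "firstName\tlastName\nJohn\tDunbar\nBob\tDown\n";
        System.out.print(addFullNameColumn(tsv));
    }
}
```

With real files, the StringReader and StringWriter would be replaced by a FileReader and FileWriter, as in the official example.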

Here is the relevant source code:

/**
 * Ready to use configuration that should cover 99% of all usages.
 */
public static final CsvPreference STANDARD_PREFERENCE = new CsvPreference.Builder('"', ',', "\r\n").build();

/**
 * Ready to use configuration for Windows Excel exported CSV files.
 */
public static final CsvPreference EXCEL_PREFERENCE = new CsvPreference.Builder('"', ',', "\n").build();

/**
 * Ready to use configuration for north European excel CSV files (columns are separated by ";" instead of ",")
 */
public static final CsvPreference EXCEL_NORTH_EUROPE_PREFERENCE = new CsvPreference.Builder('"', ';', "\n").build();

/**
 * Ready to use configuration for tab-delimited files.
 */
public static final CsvPreference TAB_PREFERENCE = new CsvPreference.Builder('"', '\t', "\n").build();
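Since every built-in preference is just a CsvPreference.Builder call with a quote character, a delimiter, and an end-of-line string, other delimiters can presumably be handled the same way. The sketch below assumes Super CSV 2.x on the classpath. PIPE_PREFERENCE and the class name are hypothetical and are not constants shipped with the library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.supercsv.io.CsvListReader;
import org.supercsv.io.ICsvListReader;
import org.supercsv.prefs.CsvPreference;

class PipePreferenceDemo {

    // Hypothetical preference for '|' separated files, built like the constants above.
    static final CsvPreference PIPE_PREFERENCE =
            new CsvPreference.Builder('"', '|', "\n").build();

    /** Reads every row of pipe-delimited text into a list of column lists. */
    public static List<List<String>> readAll(String text) throws Exception {
        List<List<String>> rows = new ArrayList<List<String>>();
        ICsvListReader reader = null;
        try {
            reader = new CsvListReader(new StringReader(text), PIPE_PREFERENCE);
            List<String> row;
            while ((row = reader.read()) != null) {
                rows.add(new ArrayList<String>(row)); // defensive copy of the row
            }
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // The quoted field keeps its embedded '|' as data.
        System.out.println(readAll("1|John|\"a|b\"\n2|Bob|x\n"));
    }
}
```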