700字范文,内容丰富有趣,生活中的好帮手!
700字范文 > 超大数据量的xlsx格式的excel文件的读取和解析 解决了POI方式的内存溢出和性能问题

超大数据量的xlsx格式的excel文件的读取和解析 解决了POI方式的内存溢出和性能问题

时间:2019-02-14 07:52:55

相关推荐

超大数据量的xlsx格式的excel文件的读取和解析 解决了POI方式的内存溢出和性能问题

在之前的博文《POI读取并解析xlsx格式的excel文件》中,在小数据量的情况下是可以轻松愉快的处理的,但是当excel文件的数据量达到百万级的时候, InputStreamis=files[i].getInputStream(); XSSFWorkbookxssFWorkbook=newXSSFWorkbook(is);

在02处直接就会内存溢出了。无论怎么抢救都没用,看来得要换一条路走走了。

在不停的Google查资料折腾了两天之后,在POI官网成功的找到了解决方案。此处请允许我稍微吐槽一下POI,有瓶颈问题的解决方案却隐藏的很深,只有一个不起眼的小链接,难道是怕大家都知道有点丢脸?

总结一下该方案的主要原理:超大数据量的excel文件通过页面上传后(nginx的默认最大文件字节要调大一些:client_max_body_size xxxm),后台接收到该文件的对象CommonsMultipartFile。首先获取该文件对象的inputStream,然后用OPCPackage来打开该文件流,将excel在内存中转化为xml格式却不会出现内存溢出的问题,根据该xml的标签就可以识别是格式,标题还是内容。然后在内存中通过POI框架中的XSSFSheetXMLHandler类中的sheetContentsHandler接口来根据上述的标签来解析内容。可以将解析到的内容存入list或者map容器中便于后续的业务处理(纯内容数据,几百万的数据量轻松胜任,亲测不会内存溢出)。当然根据业务需要,需要复写sheetContentsHandler接口的startRow,endRow,cell,headerFooter四大方法。

当然了,笔者也亲测了另一种方案:就是用OPCPackage来打开该文件流并且将excel在内存中转化为xml格式之后,一股脑儿的用缓冲流分批的将所有原excel内容数据写入到本地的txt文件,再去分批的readLine该文件中的数据,进行业务处理。该方案的好处是交易导入的文件可以物理的落地,作为后期查验的依据和凭证。当然了,缺点是多了一次物理转储的过程,耗时会延长。如何选择看个人的业务需求咯。

本文中重点讲述第一种方案吧,话不多说,贴代码:

/*** @return* @throws Exception* @author jason.gao* 功能描述:交易导入*/@RequestMapping(value = "/transDetail/upload", method = {RequestMethod.POST, RequestMethod.GET})@RequestGuard(perm = "transaction.import.upload")public ResponseEntity<ResponseEnvelope<RestApiResp>> uploadFile(@RequestParam("file") CommonsMultipartFile[] files, HttpServletRequest req, HttpServletResponse resp) throws IOException {logger.info("uploadFile == >upload button start; fileName:[{}], CommonsMultipartFile[]:[{}]", files[0].getFileItem().getName(), files);long start = System.currentTimeMillis();String result = "完成交易文件的导入!";if (null == files || files.length != 1) {return RestApiResp.getSuccResponseEntity("必须上传一个文件", null);}//重置buffer,在可能会超时的地方输出resp的字节,避免前端ajax请求断开!resp.setBufferSize(1);ServletOutputStream out = resp.getOutputStream();XlsxProcessAbstract xlsxProcess = new XlsxProcessAbstract();long getFileAndDataTime;ProcessTransDetailDataDto data;try {//获取明细行累积的支付/退款的总金额/总笔数等数据的DTOdata = xlsxProcess.processAllSheet(files[0]);logger.info("汇总行的数据:[{}]", data.dtoToString());//获取汇总行和明细行数据(包含标题)List<String> contentList = data.contentList;logger.info("明细行的数据条数为:[{}]", JSON.toJSONString(contentList.size() - 3));getFileAndDataTime = System.currentTimeMillis();logger.info("获取文件并得到数据完成。耗时:[{}]秒", (getFileAndDataTime - start)/1000);//校验汇总行数据正确性checkDetailSummary(contentList, data, out);logger.info("汇总行数据正确性的校验已通过!");//分批调用OSP插入过程String handleResult = doOspHandle(contentList, data, out);if (!handleResult.equals(TransImportJobStatus.Success.getValue())) {result = TransImportJobStatus.getDescByKey(handleResult);logger.error(result);}} catch (CellDataException e) {logger.error("CellDataException: Error:[{}]", e);return RestApiResp.getSuccResponseEntity(e.getMessage(), null);} catch (OspException e) {logger.error("OspException:[{}]", e);return RestApiResp.getSuccResponseEntity(e.getMessage(), null);} catch (IOException e) {logger.error("IOException:[{}]", e);return RestApiResp.getSuccResponseEntity(e.getMessage(), null);} catch (Exception e) {logger.error("未知异常:[{}]", e);return RestApiResp.getSuccResponseEntity("未知异常,请排查日志:" + e.getMessage(), null);}long finishCheckAndInsertTime = System.currentTimeMillis();logger.info("完成数据校验和数据分批插入。耗时:[{}]秒", (finishCheckAndInsertTime - getFileAndDataTime)/1000);logger.info("[{}],整个后台处理过程共耗时:[{}]秒", result, (finishCheckAndInsertTime - start)/1000);return RestApiResp.getSuccResponseEntity(result, HttpStatus.OK);}

上面代码块是整个后台的主流程,注意的是要充分的捕捉异常,将异常信息呈献给前端页面和日志系统,便于生产故障时排查问题。

接下来的四个代码块是对excel中字段的业务处理,属于业务部分,不关心业务的可以忽略这些代码片段

public String doOspHandle (List<String> contentList, ProcessTransDetailDataDto data, ServletOutputStream out) throws CellDataException, OspException, IOException{// 获取当前工作薄的明细行int start = 3;int size = 1000;String importStatus = "";//分批调用OSP接口执行插入while(start < contentList.size()) {importStatus = handleTransImport(contentList, start, size, data);if (!importStatus.equals(TransImportJobStatus.Success.getValue())) {logger.error("从第[{}]到[{}]行的数据,分批调用OSP接口失败", start + 1, start + size + 1);return importStatus;}start += size;out.write(new String(" ").getBytes());out.flush();}//最终状态:交易全部成功if (importStatus.equals(TransImportJobStatus.Success.getValue())){logger.info("调用“交易明细导入”的OSP接口成功!");TransDetailResp confirmResp;OspTransDetailServiceHelper.OspTransDetailServiceClient ospTransDetailServiceClient = new OspTransDetailServiceHelper.OspTransDetailServiceClient();logger.info("调用“确认交易明细成功”的OSP接口的请求参数:商户号=[{}],结算单号=[{}],总条数=[{}]", data.getMerchantId(), data.getSettleOrderNo(), contentList.size()-3);try{confirmResp = ospTransDetailServiceClient.transDetailConfirm(data.getMerchantId(), data.getSettleOrderNo(), contentList.size()-3);} catch (OspException e) {logger.error("调用“确认交易明细成功”的OSP接口的抛出异常![{}]", e);throw e;} finally {out.write(new String("").getBytes());out.flush();}logger.info("调用“确认交易明细成功”的OSP接口的返回参数为:{}", JSON.toJSONString(confirmResp));if (!confirmResp.getResponseCode().equals(MessageEnum.SUCCESS.getValue())) {throw new OspException(TransImpFileExceptEnums.OspTransDetailConfirm.getValue(), TransImpFileExceptEnums.OspTransDetailConfirm.getDesc());}}return importStatus;}

/*** 调用osp接口:执行交易明细的导入* 返回OSP操作完成的状态*/public String handleTransImport(List<String> contentList, int start, int size, ProcessTransDetailDataDto data) throws CellDataException, OspException{//分批的调用osp接口:执行交易明细的导入OspTransDetailServiceHelper.OspTransDetailServiceClient ospTransDetailServiceClient = new OspTransDetailServiceHelper.OspTransDetailServiceClient();TransDetailResp transDetailResp;List<TransDetailImport> transDetailImportList = new ArrayList<>();//组织好一个list数据:读取从start -> start+size行的数据for (int i = start; i < start + size && i < contentList.size(); i++) {TransDetailImport transDetailImport = new TransDetailImport();String[] detailRow = contentList.get(i).split("\\|@\\|");if (detailRow != null || !detailRow.equals("")) {try {transDetailImport.setMerchantId(data.getMerchantId());transDetailImport.setMerchantName(data.getMerchantName());transDetailImport.setMerchantBatchNo(data.getSettleOrderNo()); //商户批次号transDetailImport.setMerchantBatchSerialNo(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[0], i, 0)); //商户批次序号<来源:页面导入模板中明细行的序号>模板必填transDetailImport.setMerchantOrderNo(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[1], i, 1)); //商户订单号:模板必填transDetailImport.setPlatformOrderNo(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[2], detailRow[1])); //平台订单号(支付退款订单号):如果不送默认商户订单号transDetailImport.setMerchantTransDate(detailRow[4].equals("") ? new Date() : new Date(detailRow[4])); //商户交易日期:如果不送默认上送日期transDetailImport.setTransType(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[5], TransTypeEnums.Payment.getValue())); //交易类型:如果不送默认支付transDetailImport.setOriginOrderNo(XssfCellValueCheckHelper.checkAndGetOriginOrderNo(detailRow[3], transDetailImport.getTransType(), i)); //原支付订单号transDetailImport.setCurrency(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[6], "CNY")); //币种:三位货币代码,如果不送默认CNY:人民币transDetailImport.setAmount(XssfCellValueCheckHelper.getAmount(detailRow[7], i)); //交易金额:外部交易上传金额,内部商户订单金额transDetailImport.setCustomerName(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[9], i, 9)); //客户名称:模板必填transDetailImport.setIdType(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[10], i, 10)); //证件类型:模板必填transDetailImport.setCustomerType(XssfCellValueCheckHelper.checkAndGetCustomerType(detailRow, i, 8)); //客户类型:根据证件类型确定transDetailImport.setIdNo(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[11], i, 11));//证件号码:模板必填transDetailImport.setMoneyType(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[12], MoneyTypeEnums.Currency.getValue())); //款项类型:默认:A 预付款项transDetailImport.setIsPayUnderBonded(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[13], IsPayUnderBondedEnums.YES.getValue())); //是否保税货物项下付款:默认:是transDetailImport.setTradingCode(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[14], TradingCodeEnums.GoodsTrade.getValue())); //交易编码:交易编码 默认:122030货物贸易transDetailImport.setRmbAccount(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[15], "")); //人民币账号transDetailImport.setTrmo(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[16], "一般贸易")); //交易附言:默认"一般贸易"transDetailImport.setProductDesc(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[17], i, 17)); //产品描述transDetailImport.setWaybillNum(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[18], i, 18)); //运单号transDetailImport.setProductNum(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[19], i, 19).trim()); //销售数量transDetailImport.setTransFrom("3"); //交易来源:1支付引擎、2门户导入、3运营控制台导入、4外部商户导入} catch (Exception e) {logger.error("组装数据时检测到数据转换异常,调用“交易明细删除”的OSP接口,将未检测到错误的已插入部分数据全部回滚掉,请求参数为:[{}],[{}],错误详情:[{}]", data.getMerchantId(), data.getSettleOrderNo(), e);TransDetailResp delResp = ospTransDetailServiceClient.transDetailDel(data.getMerchantId(), data.getSettleOrderNo());if (!delResp.getResponseCode().equals(MessageEnum.SUCCESS.getValue())) {throw new OspException(TransImpFileExceptEnums.OspTransDetailDelRrror.getValue(), TransImpFileExceptEnums.OspTransDetailDelRrror.getDesc());}return TransImportJobStatus.InsetFailAndRollBack.getValue();}} else {throw new CellDataException(TransImpFileExceptEnums.EmptyHeadLine.getValue(), TransImpFileExceptEnums.EmptyHeadLine.getDesc());}transDetailImportList.add(transDetailImport);}logger.info("开始调用“导入交易明细”的OSP接口,从第[{}]行到第[{}]行的数据", start, (start + size) < contentList.size() ? (start + size) : contentList.size());try{transDetailResp = ospTransDetailServiceClient.transDetailImport(transDetailImportList);} catch (OspException e) {logger.error("调用“导入交易明细”的OSP接口的抛出异常![{}]", e);logger.info("调用“交易明细删除”的OSP接口的请求参数为:[{}],[{}]", data.getMerchantId(), data.getSettleOrderNo());TransDetailResp delResp = ospTransDetailServiceClient.transDetailDel(data.getMerchantId(), data.getSettleOrderNo());if (!delResp.getResponseCode().equals(MessageEnum.SUCCESS.getValue())) {throw new OspException(TransImpFileExceptEnums.OspTransDetailDelRrror.getValue(), TransImpFileExceptEnums.OspTransDetailDelRrror.getDesc());}return TransImportJobStatus.InsetFailAndRollBack.getValue();}logger.info("调用“导入交易明细”的OSP接口的返回参数为:{}", JSON.toJSONString(transDetailResp));if (!transDetailResp.getResponseCode().equals(MessageEnum.SUCCESS.getValue())) {logger.info("调用“交易明细删除”的OSP接口的请求参数为:[{}],[{}]", data.getMerchantId(), data.getSettleOrderNo());TransDetailResp delResp;try{delResp = ospTransDetailServiceClient.transDetailDel(data.getMerchantId(), data.getSettleOrderNo());} catch (OspException e) {logger.error("调用“交易明细删除”的OSP接口的抛出异常![{}]", e);throw e;}logger.info("调用“交易明细删除”的OSP接口的返回参数为:{}", JSON.toJSONString(delResp));if (!delResp.getResponseCode().equals(MessageEnum.SUCCESS.getValue())) {throw new OspException(TransImpFileExceptEnums.OspTransDetailDelRrror.getValue(), TransImpFileExceptEnums.OspTransDetailDelRrror.getDesc());}return TransImportJobStatus.InsetFailAndRollBack.getValue();}return TransImportJobStatus.Success.getValue();}

/*** 校验汇总行所有必填项*/public void checkHeadNotEmpty(ProcessTransDetailDataDto dataDto) throws CellDataException{if (dataDto.getMerchantId()==null || dataDto.getMerchantId().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(1).getDesc());}if (dataDto.getMerchantName()==null || dataDto.getMerchantName().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(2).getDesc());}if (dataDto.getSettleOrderNo()==null || dataDto.getSettleOrderNo().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(3).getDesc());}if (dataDto.getTotalPaymentCount() == null || dataDto.getTotalPaymentCount().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(4).getDesc());}if (dataDto.getTotalPaymentAmount()==null || dataDto.getTotalPaymentAmount().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(5).getDesc());}if (dataDto.getTotalRefundCount()==null || dataDto.getTotalRefundCount().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(6).getDesc());}if (dataDto.getTotalRefundAmount()==null || dataDto.getTotalRefundAmount().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(7).getDesc());}if (dataDto.getNetTotalCount()==null || dataDto.getNetTotalCount().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(8).getDesc());}if (dataDto.getNetTotalAmount()==null || dataDto.getNetTotalAmount().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(9).getDesc());}}

package com.vip.vpal.mgr.controller;import com.vip.vpal.mgr.enums.IdTypeCustomerTypeEnums;import com.vip.vpal.mgr.enums.TransImpFileExceptEnums;import com.vip.vpal.mgr.enums.TransTypeEnums;import com.vip.vpal.mgr.exception.CellDataException;import org.springframework.beans.factory.annotation.Autowired;/*** Created by jason.gao on /8/11 0011.*/public class XssfCellValueCheckHelper {public static String getStringNotEmpty(String cellValue, int row, int col) throws CellDataException {if (cellValue.equals("")) {throw new CellDataException(TransImpFileExceptEnums.EmptyFieldError.getValue(), TransImpFileExceptEnums.EmptyFieldError.setParams(row+1, col +1).getDesc());}return cellValue;}public static String getRealOrDefaultValue(String cellValue, String defVlaue) {if (cellValue.equals("")) {return defVlaue;}return cellValue;}/*** 校验原支付订单号在退款时必填* 返回原支付订单号*/public static String checkAndGetOriginOrderNo(String cellValue, String transType, int row) throws CellDataException {if (transType.equals(TransTypeEnums.Refund.getValue()) && cellValue.equals("")) {throw new CellDataException(TransImpFileExceptEnums.EmptyFieldError.getValue(), TransImpFileExceptEnums.EmptyFieldError.setParams(row +1, 4).getDesc());}return cellValue; //原支付订单号}/*** 校验证件类型和客户类型的关系一致性* 返回客户类型的值*/public static String checkAndGetCustomerType(String[] cellRow, int row, int cell) throws CellDataException {//校验证件类型是否正确String idType = cellRow[10];String customerTypeByDict = IdTypeCustomerTypeEnums.getDesc(idType);if (customerTypeByDict.equals("")) {throw new CellDataException(TransImpFileExceptEnums.DetailDateError.getValue(), TransImpFileExceptEnums.DetailDateError.setParams(row + 1, 11).getDesc());}//校验客户类型是否与证件类型相匹配String cusTomerType = cellRow[8];//客户类型,可不填,默认为证件类型对应的if (cusTomerType.equals("")) {return customerTypeByDict;}if (!customerTypeByDict.equals(cusTomerType)){throw new CellDataException(TransImpFileExceptEnums.DetailDateError.getValue(), TransImpFileExceptEnums.DetailDateError.setParams(row+1, 9).getDesc());}return cusTomerType;}public static long getAmount(String cellValue, int row) throws CellDataException {if (cellValue.equals("")) {throw new CellDataException(TransImpFileExceptEnums.EmptyFieldError.getValue(), TransImpFileExceptEnums.EmptyFieldError.setParams(row+1, 8).getDesc());}return (long) (new Double(cellValue) * 100);}}

上面四个代码块是主流程对excel中字段的业务处理,其中调用OSP接口(一种rpc)的时候要注意,由于是分布式的远程调用,所以不可以使用事物的失败回滚方案,只能手动的捕捉异常并且手动的调用失败的补偿方法。

下面注意了,是超大数据量的excel文件解析的核心代码了:

package com.vip.vpal.mgr.controller;import java.io.IOException;import java.io.InputStream;import javax.xml.parsers.ParserConfigurationException;import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;import org.apache.poi.xssf.eventusermodel.XSSFReader;import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;import org.apache.poi.xssf.model.StylesTable;import org.apache.poi.xssf.usermodel.XSSFComment;import org.apache.poi.openxml4j.opc.OPCPackage;import org.apache.poi.openxml4j.opc.PackageAccess;import org.apache.poi.ss.usermodel.DataFormatter;import org.apache.poi.ss.util.CellAddress;import org.apache.poi.ss.util.CellReference;import org.apache.poi.util.SAXHelper;import org.slf4j.Logger;import org.slf4j.LoggerFactory;import org.springframework.monsMultipartFile;import org.xml.sax.ContentHandler;import org.xml.sax.InputSource;import org.xml.sax.SAXException;import org.xml.sax.XMLReader;import com.vip.vpal.mgr.dto.ProcessTransDetailDataDto;/*** Excle xxls 批量读取大文件操作类* */public class XlsxProcessAbstract {private final Logger logger = LoggerFactory.getLogger(XlsxProcessAbstract.class);//开始读取行数从第0行开始计算private int rowIndex = -1;private final int minColumns = 0;/*** Destination for data*/private final StringBuffer rowStrs = new StringBuffer();ProcessTransDetailDataDto processTransDetailData = new ProcessTransDetailDataDto();/*** 支持遍历同一个excle文件下多个sheet的解析* excel记录行操作方法,以行索引和行元素列表为参数,对一行元素进行操作,元素为String类型* @param filename* @return* @throws Exception*/public ProcessTransDetailDataDto processAllSheet(String filename) throws Exception {OPCPackage pkg = OPCPackage.open(filename, PackageAccess.READ);ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);XSSFReader xssfReader = new XSSFReader(pkg);StylesTable styles = xssfReader.getStylesTable();XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();InputStream stream = null;while (iter.hasNext()) {try {stream = iter.next();parserSheetXml(styles, strings, new SheetToCSV(), stream);} catch (Exception e) {logger.error("parserSheetXml error: ",e);} finally {stream.close();}}return processTransDetailData;}/*** 支持遍历同一个excle文件下多个sheet的解析* excel记录行操作方法,以行索引和行元素列表为参数,对一行元素进行操作,元素为String类型* @param xlsxFile* @return* @throws Exception* @author nevin.zhang*/public ProcessTransDetailDataDto processAllSheet(CommonsMultipartFile xlsxFile) throws Exception {OPCPackage pkg = OPCPackage.open(xlsxFile.getInputStream());ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);XSSFReader xssfReader = new XSSFReader(pkg);StylesTable styles = xssfReader.getStylesTable();XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();InputStream stream = null;while (iter.hasNext()) {try {stream = iter.next();parserSheetXml(styles, strings, new SheetToCSV(), stream);} catch (Exception e) {logger.error("parserSheetXml error: ",e);} finally {stream.close();}}return processTransDetailData;}/*** 解析excel 转换成xml* * @param styles* @param strings* @param sheetHandler* @param sheetInputStream* @throws IOException* @throws SAXException*/public void parserSheetXml(StylesTable styles, ReadOnlySharedStringsTable strings, SheetContentsHandler sheetHandler, InputStream sheetInputStream) throws IOException, SAXException {DataFormatter formatter = new DataFormatter();InputSource sheetSource = new InputSource(sheetInputStream);try {XMLReader sheetParser = SAXHelper.newXMLReader();ContentHandler handler = new XSSFSheetXMLHandler(styles, null, strings, sheetHandler, formatter, false);sheetParser.setContentHandler(handler);sheetParser.parse(sheetSource);} catch (ParserConfigurationException e) {throw new RuntimeException("SAX parser appears to be broken - " + e);}}/*** 读取excel行、列值* * @author nevin.zhang*/private class SheetToCSV implements SheetContentsHandler {private boolean firstCellOfRow = false;private int currentRowNumber = -1;private int currentColNumber = -1;/*** 处理cell中为空值的情况* @param number*/private void processCellBlankCells(int number) {for (int i = 0; i < number; i++) {for (int j = 0; j < minColumns; j++) {rowStrs.append("|@|");}rowStrs.append('\n');}}@Overridepublic void startRow(int rowNum) {processCellBlankCells(rowNum - currentRowNumber - 1);firstCellOfRow = true;currentRowNumber = rowNum;currentColNumber = -1;}@Overridepublic void endRow(int rowNum) {for (int i = currentColNumber; i < minColumns; i++) {rowStrs.append("|@|");}// 从设置的rowIndex的行数开始加入到list,前三行为标题,多个sheet都从第三行开始读取的数据加入到listString endRowStrs=rowStrs.toString();if(currentRowNumber>rowIndex && !rowStrs.toString().equals("|@|")){processTransDetailData.contentList.add(endRowStrs);}if(!rowStrs.toString().equals("|@|")) {processTransDetailData.processTransTotalData(endRowStrs, currentRowNumber);}rowStrs.delete(0, rowStrs.length());// 清空buffer}@Overridepublic void cell(String cellReference, String cellValue, XSSFComment comment) {if (firstCellOfRow) {firstCellOfRow = false;} else {rowStrs.append("|@|");}if (cellReference == null) {cellReference = new CellAddress(currentRowNumber, currentColNumber).formatAsString();}int thisCol = (new CellReference(cellReference)).getCol();int missedCols = thisCol - currentColNumber - 1;for (int i = 0; i < missedCols; i++) {// excel中为空的值设置为“|@|”rowStrs.append("|@|");}currentColNumber = thisCol;rowStrs.append(cellValue);}@Overridepublic void headerFooter(String text, boolean isHeader, String tagName) {}}}

由于导入的excel文件的格式是:第一行为汇总数据的标题,第二行为汇总数据,第三行为明细行的标题,其余行为明细数据,所以需要一个ProcessTransDetailDataDto实体类

来封装一下。

package com.vip.vpal.mgr.dto;import com.vip.vpal.mgr.enums.TransTypeEnums;import mons.lang.StringUtils;import java.math.BigDecimal;import java.util.ArrayList;import java.util.List;/*** 交易导入明细文件值处理*/public class ProcessTransDetailDataDto{private String merchantId;//商户号private String merchantName;//商户名称private String settleOrderNo;//批次号private String totalPaymentCount; // 支付总笔数private String totalPaymentAmount;// 支付总金额private String totalRefundAmount;// 退款总金额private String totalRefundCount; // 支付退款笔数private String netTotalAmount;// 净金额汇总private String netTotalCount;// 净笔数汇总private int currentRowNumber;private int paymentCount = 0;private BigDecimal paymentAmount = BigDecimal.ZERO;private int refundCount = 0;private BigDecimal refundAmount = BigDecimal.ZERO;private int paymentIndex = 0;private int refundIndex = 0;private int readRowTitleIndex = 1; //读取标题汇总行private int readDetailRowIndex = 2;//读取交易明细行public List<String> contentList = new ArrayList<>();public void processTransTotalData(String rowStrs, int currentRowNumber) {String[] cellStrs = rowStrs.split("\\|@\\|");// 读取第二行汇总行if (currentRowNumber == readRowTitleIndex) {this.setMerchantId(cellStrs[0]);this.setMerchantName(cellStrs[1]);this.setSettleOrderNo(cellStrs[2]);this.setTotalPaymentCount(cellStrs[3]);this.setTotalPaymentAmount(cellStrs[4]);this.setTotalRefundCount(cellStrs[5]);this.setTotalRefundAmount(cellStrs[6]);this.setNetTotalCount(cellStrs[7]);this.setNetTotalAmount(cellStrs[8]);}// 读取交易明细行if (currentRowNumber > readDetailRowIndex) {//原支付订单号不为空则为支付交易if (cellStrs[5].toString().equals(TransTypeEnums.Refund.getValue())) {refundIndex++;this.setRefundCount(refundIndex);// 累积退款笔数this.setRefundAmount(this.getRefundAmount().add(stringToBigDecimal(cellStrs[7])));// 累积退款金额} else if (cellStrs[5].toString().equals("") || cellStrs[5].toString().equals(TransTypeEnums.Payment.getValue())){paymentIndex++;this.setPaymentCount(paymentIndex);this.setPaymentAmount(this.getPaymentAmount().add(stringToBigDecimal(cellStrs[7])));//累加支付金额}}}private static BigDecimal stringToBigDecimal(String str) {if (StringUtils.isBlank(str)) {return BigDecimal.ZERO;}BigDecimal bd = new BigDecimal(str);return bd;}public String dtoToString(){StringBuffer sb = new StringBuffer("");sb.append("商户号="+ this.merchantId);sb.append(", 商户名称="+ this.merchantName);sb.append(", 批次号="+ this.settleOrderNo);sb.append(", 支付总笔数="+ this.totalPaymentCount);sb.append(", 支付总金额="+ this.totalPaymentAmount);sb.append(", 退款总金额="+ this.totalRefundAmount);sb.append(", 支付退款笔数="+ this.totalRefundCount);sb.append(", 净金额汇总="+ TotalAmount);sb.append(", 净笔数汇总="+ TotalCount);return sb.toString();}public String getMerchantId() {return merchantId;}public void setMerchantId(String merchantId) {this.merchantId = merchantId;}public String getMerchantName() {return merchantName;}public void setMerchantName(String merchantName) {this.merchantName = merchantName;}public String getSettleOrderNo() {return settleOrderNo;}public void setSettleOrderNo(String settleOrderNo) {this.settleOrderNo = settleOrderNo;}public String getTotalPaymentCount() {return totalPaymentCount;}public void setTotalPaymentCount(String totalPaymentCount) {this.totalPaymentCount = totalPaymentCount;}public String getTotalPaymentAmount() {return totalPaymentAmount;}public void setTotalPaymentAmount(String totalPaymentAmount) {this.totalPaymentAmount = totalPaymentAmount;}public String getTotalRefundAmount() {return totalRefundAmount;}public void setTotalRefundAmount(String totalRefundAmount) {this.totalRefundAmount = totalRefundAmount;}public String getTotalRefundCount() {return totalRefundCount;}public void setTotalRefundCount(String totalRefundCount) {this.totalRefundCount = totalRefundCount;}public String getNetTotalAmount() {return netTotalAmount;}public void setNetTotalAmount(String netTotalAmount) {TotalAmount = netTotalAmount;}public String getNetTotalCount() {return netTotalCount;}public void setNetTotalCount(String netTotalCount) {TotalCount = netTotalCount;}public int getCurrentRowNumber() {return currentRowNumber;}public void setCurrentRowNumber(int currentRowNumber) {this.currentRowNumber = currentRowNumber;}public int getPaymentCount() {return paymentCount;}public void setPaymentCount(int paymentCount) {this.paymentCount = paymentCount;}public BigDecimal getPaymentAmount() {return paymentAmount;}public void setPaymentAmount(BigDecimal paymentAmount) {this.paymentAmount = paymentAmount;}public int getRefundCount() {return refundCount;}public void setRefundCount(int refundCount) {this.refundCount = refundCount;}public BigDecimal getRefundAmount() {return refundAmount;}public void setRefundAmount(BigDecimal refundAmount) {this.refundAmount = refundAmount;}public List<String> getContentList() {return contentList;}public void setContentList(List<String> contentList) {this.contentList = contentList;}}

上述代码考虑到了原excel文件空行和空值的问题,做了处理。在开发的时候踩了一个坑:将空值的处理成“,”导致了其他数据带逗号的时候,数据和列不匹配的问题,所以本文中就将逗号写成了“|@|”,尽量的避免偶然性。

下面贴上依赖的POI的jar包中的最核心源码,供各位参考:

/* ====================================================================Licensed to the Apache Software Foundation (ASF) under one or morecontributor license agreements. See the NOTICE file distributed withthis work for additional information regarding copyright ownership.The ASF licenses this file to You under the Apache License, Version 2.0(the "License"); you may not use this file except in compliance withthe License. You may obtain a copy of the License at/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.==================================================================== */package org.apache.poi.xssf.eventusermodel;import static org.apache.poi.xssf.usermodel.XSSFRelation.NS_SPREADSHEETML;import java.util.LinkedList;import java.util.Queue;import org.apache.poi.ss.usermodel.BuiltinFormats;import org.apache.poi.ss.usermodel.DataFormatter;import org.apache.poi.ss.util.CellAddress;import org.apache.poi.util.POILogFactory;import org.apache.poi.util.POILogger;import org.apache.poi.mentsTable;import org.apache.poi.xssf.model.StylesTable;import org.apache.poi.xssf.usermodel.XSSFCellStyle;import org.apache.poi.xssf.usermodel.XSSFComment;import org.apache.poi.xssf.usermodel.XSSFRichTextString;import org.openxmlformats.schemas.spreadsheetml.x.main.CTComment;import org.xml.sax.Attributes;import org.xml.sax.SAXException;import org.xml.sax.helpers.DefaultHandler;/*** This class handles the processing of a sheet#.xml * sheet part of a XSSF .xlsx file, and generates* row and cell events for it.*/public class XSSFSheetXMLHandler extends DefaultHandler {private static final POILogger logger = POILogFactory.getLogger(XSSFSheetXMLHandler.class);/*** These are the different kinds of cells we support.* We keep track of the current one between* the start and end.*/enum xssfDataType {BOOLEAN,ERROR,FORMULA,INLINE_STRING,SST_STRING,NUMBER,}/*** Table with the styles used for formatting*/private StylesTable stylesTable;/*** Table with cell comments*/private CommentsTable commentsTable;/*** Read only access to the shared strings table, for looking* up (most) string cell's contents*/private ReadOnlySharedStringsTable sharedStringsTable;/*** Where our text is going*/private final SheetContentsHandler output;// Set when V start element is seenprivate boolean vIsOpen;// Set when F start element is seenprivate boolean fIsOpen;// Set when an Inline String "is" is seenprivate boolean isIsOpen;// Set when a header/footer element is seenprivate boolean hfIsOpen;// Set when cell start element is seen;// used when cell close element is seen.private xssfDataType nextDataType;// Used to format numeric cell values.private short formatIndex;private String formatString;private final DataFormatter formatter;private int rowNum;private int nextRowNum;// some sheets do not have rowNums, Excel can read them so we should try to handle them correctly as wellprivate String cellRef;private boolean formulasNotResults;// Gathers characters as they are seen.private StringBuffer value = new StringBuffer();private StringBuffer formula = new StringBuffer();private StringBuffer headerFooter = new StringBuffer();private Queue<CellAddress> commentCellRefs;/*** Accepts objects needed while parsing.** @param styles Table of styles* @param strings Table of shared strings*/public XSSFSheetXMLHandler(StylesTable styles,CommentsTable comments,ReadOnlySharedStringsTable strings,SheetContentsHandler sheetContentsHandler,DataFormatter dataFormatter,boolean formulasNotResults) {this.stylesTable = styles;mentsTable = comments;this.sharedStringsTable = strings;this.output = sheetContentsHandler;this.formulasNotResults = formulasNotResults;this.nextDataType = xssfDataType.NUMBER;this.formatter = dataFormatter;init();}/*** Accepts objects needed while parsing.** @param styles Table of styles* @param strings Table of shared strings*/public XSSFSheetXMLHandler(StylesTable styles,ReadOnlySharedStringsTable strings,SheetContentsHandler sheetContentsHandler,DataFormatter dataFormatter,boolean formulasNotResults) {this(styles, null, strings, sheetContentsHandler, dataFormatter, formulasNotResults);}/*** Accepts objects needed while parsing.** @param styles Table of styles* @param strings Table of shared strings*/public XSSFSheetXMLHandler(StylesTable styles,ReadOnlySharedStringsTable strings,SheetContentsHandler sheetContentsHandler,boolean formulasNotResults) {this(styles, strings, sheetContentsHandler, new DataFormatter(), formulasNotResults);}private void init() {if (commentsTable != null) {commentCellRefs = new LinkedList<CellAddress>();for (CTComment comment : commentsTable.getCTComments().getCommentList().getCommentArray()) {commentCellRefs.add(new CellAddress(comment.getRef()));}} }private boolean isTextTag(String name) {if("v".equals(name)) {// Easy, normal v text tagreturn true;}if("inlineStr".equals(name)) {// Easy inline stringreturn true;}if("t".equals(name) && isIsOpen) {// Inline string <is><t>...</t></is> pairreturn true;}// It isn't a text tagreturn false;}@Override@SuppressWarnings("unused")public void startElement(String uri, String localName, String qName,Attributes attributes) throws SAXException {if (uri != null && ! uri.equals(NS_SPREADSHEETML)) {return;}if (isTextTag(localName)) {vIsOpen = true;// Clear contents cachevalue.setLength(0);} else if ("is".equals(localName)) {// Inline string outer tagisIsOpen = true;} else if ("f".equals(localName)) {// Clear contents cacheformula.setLength(0);// Mark us as being a formula if not alreadyif(nextDataType == xssfDataType.NUMBER) {nextDataType = xssfDataType.FORMULA;}// Decide where to get the formula string fromString type = attributes.getValue("t");if(type != null && type.equals("shared")) {// Is it the one that defines the shared, or uses it?String ref = attributes.getValue("ref");String si = attributes.getValue("si");if(ref != null) {// This one defines it// TODO Save it somewherefIsOpen = true;} else {// This one uses a shared formula// TODO Retrieve the shared formula and tweak it to // match the current cellif(formulasNotResults) {logger.log(POILogger.WARN, "shared formulas not yet supported!");} else {// It's a shared formula, so we can't get at the formula string yet// However, they don't care about the formula string, so that's ok!}}} else {fIsOpen = true;}}else if("oddHeader".equals(localName) || "evenHeader".equals(localName) ||"firstHeader".equals(localName) || "firstFooter".equals(localName) ||"oddFooter".equals(localName) || "evenFooter".equals(localName)) {hfIsOpen = true;// Clear contents cacheheaderFooter.setLength(0);}else if("row".equals(localName)) {String rowNumStr = attributes.getValue("r");if(rowNumStr != null) {rowNum = Integer.parseInt(rowNumStr) - 1;} else {rowNum = nextRowNum;}output.startRow(rowNum);}// c => cellelse if ("c".equals(localName)) {// Set up defaults.this.nextDataType = xssfDataType.NUMBER;this.formatIndex = -1;this.formatString = null;cellRef = attributes.getValue("r");String cellType = attributes.getValue("t");String cellStyleStr = attributes.getValue("s");if ("b".equals(cellType))nextDataType = xssfDataType.BOOLEAN;else if ("e".equals(cellType))nextDataType = xssfDataType.ERROR;else if ("inlineStr".equals(cellType))nextDataType = xssfDataType.INLINE_STRING;else if ("s".equals(cellType))nextDataType = xssfDataType.SST_STRING;else if ("str".equals(cellType))nextDataType = xssfDataType.FORMULA;else {// Number, but almost certainly with a special style or formatXSSFCellStyle style = null;if (stylesTable != null) {if (cellStyleStr != null) {int styleIndex = Integer.parseInt(cellStyleStr);style = stylesTable.getStyleAt(styleIndex);} else if (stylesTable.getNumCellStyles() > 0) {style = stylesTable.getStyleAt(0);}}if (style != null) {this.formatIndex = style.getDataFormat();this.formatString = style.getDataFormatString();if (this.formatString == null)this.formatString = BuiltinFormats.getBuiltinFormat(this.formatIndex);}}}}@Overridepublic void endElement(String uri, String localName, String qName)throws SAXException {if (uri != null && ! uri.equals(NS_SPREADSHEETML)) {return;}String thisStr = null;// v => contents of a cellif (isTextTag(localName)) {vIsOpen = false;// Process the value contents as required, now we have it allswitch (nextDataType) {case BOOLEAN:char first = value.charAt(0);thisStr = first == '0' ? "FALSE" : "TRUE";break;case ERROR:thisStr = "ERROR:" + value.toString();break;case FORMULA:if(formulasNotResults) {thisStr = formula.toString();} else {String fv = value.toString();if (this.formatString != null) {try {// Try to use the value as a formattable numberdouble d = Double.parseDouble(fv);thisStr = formatter.formatRawCellContents(d, this.formatIndex, this.formatString);} catch(NumberFormatException e) {// Formula is a String result not a Numeric onethisStr = fv;}} else {// No formating applied, just do raw value in all casesthisStr = fv;}}break;case INLINE_STRING:// TODO: Can these ever have formatting on them?XSSFRichTextString rtsi = new XSSFRichTextString(value.toString());thisStr = rtsi.toString();break;case SST_STRING:String sstIndex = value.toString();try {int idx = Integer.parseInt(sstIndex);XSSFRichTextString rtss = new XSSFRichTextString(sharedStringsTable.getEntryAt(idx));thisStr = rtss.toString();}catch (NumberFormatException ex) {logger.log(POILogger.ERROR, "Failed to parse SST index '" + sstIndex, ex);}break;case NUMBER:String n = value.toString();if (this.formatString != null && n.length() > 0)thisStr = formatter.formatRawCellContents(Double.parseDouble(n), this.formatIndex, this.formatString);elsethisStr = n;break;default:thisStr = "(TODO: Unexpected type: " + nextDataType + ")";break;}// Do we have a comment for this cell?checkForEmptyCellComments(EmptyCellCommentsCheckType.CELL);XSSFComment comment = commentsTable != null ? commentsTable.findCellComment(new CellAddress(cellRef)) : null;// Outputoutput.cell(cellRef, thisStr, comment);} else if ("f".equals(localName)) {fIsOpen = false;} else if ("is".equals(localName)) {isIsOpen = false;} else if ("row".equals(localName)) {// Handle any "missing" cells which had comments attachedcheckForEmptyCellComments(EmptyCellCommentsCheckType.END_OF_ROW);// Finish up the rowoutput.endRow(rowNum);// some sheets do not have rowNum set in the XML, Excel can read them so we should try to read them as wellnextRowNum = rowNum + 1;} else if ("sheetData".equals(localName)) {// Handle any "missing" cells which had comments attachedcheckForEmptyCellComments(EmptyCellCommentsCheckType.END_OF_SHEET_DATA);}else if("oddHeader".equals(localName) || "evenHeader".equals(localName) ||"firstHeader".equals(localName)) {hfIsOpen = false;output.headerFooter(headerFooter.toString(), true, localName);}else if("oddFooter".equals(localName) || "evenFooter".equals(localName) ||"firstFooter".equals(localName)) {hfIsOpen = false;output.headerFooter(headerFooter.toString(), false, localName);}}/*** Captures characters only if a suitable element is open.* Originally was just "v"; extended for inlineStr also.*/@Overridepublic void characters(char[] ch, int start, int length)throws SAXException {if (vIsOpen) {value.append(ch, start, length);}if (fIsOpen) {formula.append(ch, start, length);}if (hfIsOpen) {headerFooter.append(ch, start, length);}}/*** Do a check for, and output, comments in otherwise empty cells.*/private void checkForEmptyCellComments(EmptyCellCommentsCheckType type) {if (commentCellRefs != null && !commentCellRefs.isEmpty()) {// If we've reached the end of the sheet data, output any// comments we haven't yet already handledif (type == EmptyCellCommentsCheckType.END_OF_SHEET_DATA) {while (!commentCellRefs.isEmpty()) {outputEmptyCellComment(commentCellRefs.remove());}return;}// At the end of a row, handle any comments for "missing" rows before usif (this.cellRef == null) {if (type == EmptyCellCommentsCheckType.END_OF_ROW) {while (!commentCellRefs.isEmpty()) {if (commentCellRefs.peek().getRow() == rowNum) {outputEmptyCellComment(commentCellRefs.remove());} else {return;}}return;} else {throw new IllegalStateException("Cell ref should be null only if there are only empty cells in the row; rowNum: " + rowNum);}}CellAddress nextCommentCellRef;do {CellAddress cellRef = new CellAddress(this.cellRef);CellAddress peekCellRef = commentCellRefs.peek();if (type == EmptyCellCommentsCheckType.CELL && cellRef.equals(peekCellRef)) {// remove the comment cell ref from the list if we're about to handle it alongside the cell contentcommentCellRefs.remove();return;} else {// fill in any gaps if there are empty cells with comment mixed in with non-empty cellsint comparison = pareTo(cellRef);if (comparison > 0 && type == EmptyCellCommentsCheckType.END_OF_ROW && peekCellRef.getRow() <= rowNum) {nextCommentCellRef = commentCellRefs.remove();outputEmptyCellComment(nextCommentCellRef);} else if (comparison < 0 && type == EmptyCellCommentsCheckType.CELL && peekCellRef.getRow() <= rowNum) {nextCommentCellRef = commentCellRefs.remove();outputEmptyCellComment(nextCommentCellRef);} else {nextCommentCellRef = null;}}} while (nextCommentCellRef != null && !commentCellRefs.isEmpty());}}/*** Output an empty-cell comment.*/private void outputEmptyCellComment(CellAddress cellRef) {XSSFComment comment = commentsTable.findCellComment(cellRef);output.cell(cellRef.formatAsString(), null, comment);}private enum EmptyCellCommentsCheckType {CELL,END_OF_ROW,END_OF_SHEET_DATA}/*** You need to implement this to handle the results* of the sheet parsing.*/public interface SheetContentsHandler {/** A row with the (zero based) row number has started */public void startRow(int rowNum);/** A row with the (zero based) row number has ended */public void endRow(int rowNum);/** * A cell, with the given formatted value (may be null), * and possibly a comment (may be null), was encountered */public void cell(String cellReference, String formattedValue, XSSFComment comment);/** A header or footer has been encountered */public void headerFooter(String text, boolean isHeader, String tagName);}}

本文在性能测试时处理100W数据的excel文件的时候,“获取前台导入的文件并得到分类汇总的文件数据和明细的文件数据”只需要25秒,这一点上性能已经很高了,本系统中的性能瓶颈已经完美解决了。

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。