filespilt-demo's Introduction

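Sample Java code for splitting large files

This sample program implements three approaches to splitting a large file:

  • Single-threaded read with multi-threaded write. This approach is implemented with two different thread pools, ThreadPoolExecutor and ForkJoinPool, which correspond to the NORMAL and FORKJOIN execution modes.
  • Multi-threaded read/write using the producer-consumer pattern, which corresponds to the PRODUCERCONSUMER execution mode.
  • Multi-threaded read/write using a Disruptor-based producer-consumer pattern, which corresponds to the DISRUPTOR execution mode.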
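
As a rough illustration of the first approach (one reader, several writers), the sketch below reads a file line by line on the calling thread and hands each full chunk to a fixed thread pool for writing. It is not the project's code: the class name SingleReaderSplitSketch, the part-N.txt naming, and the use of character counts as the size measure are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one reading thread, a pool of writing threads.
class SingleReaderSplitSketch {

    /**
     * Splits source into sub-files holding at most `limit` characters of line
     * content each (a single longer line still gets a file of its own).
     * Returns the number of sub-files written.
     */
    static int split(Path source, Path outDir, int limit, int writerThreads) throws Exception {
        ExecutorService writers = Executors.newFixedThreadPool(writerThreads);
        List<Future<?>> pending = new ArrayList<>();
        int count = 0;
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            List<String> chunk = new ArrayList<>();
            int chunkSize = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                // Close the current chunk before it would grow past the limit.
                if (!chunk.isEmpty() && chunkSize + line.length() > limit) {
                    pending.add(submit(writers, outDir, count++, chunk));
                    chunk = new ArrayList<>();
                    chunkSize = 0;
                }
                chunk.add(line);
                chunkSize += line.length();
            }
            // Write whatever is left after the last line.
            if (!chunk.isEmpty()) {
                pending.add(submit(writers, outDir, count++, chunk));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for each write and surface any failure
            }
        } finally {
            writers.shutdown();
        }
        return count;
    }

    private static Future<?> submit(ExecutorService pool, Path outDir, int index, List<String> lines) {
        Path target = outDir.resolve("part-" + index + ".txt"); // assumed naming scheme
        return pool.submit(() -> {
            Files.write(target, lines, StandardCharsets.UTF_8);
            return null;
        });
    }
}
```

Note that nothing here limits how far the reader can run ahead of the writers, so pending chunks accumulate in memory; a bounded queue is one common way to add back-pressure.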

Package layout

  • com.daoqidlv.filespilt: shared classes and the program entry class
  • com.daoqidlv.filespilt.single.normal: implementation classes for NORMAL mode
  • com.daoqidlv.filespilt.single.forkjoin: implementation classes for FORKJOIN mode
  • com.daoqidlv.filespilt.mutil: implementation classes for PRODUCERCONSUMER mode
  • com.daoqidlv.filespilt.disruptor: implementation classes for DISRUPTOR mode

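About the sample program

Entry class: com.daoqidlv.filespilt.Test.java

Command format: java -jar fileapilt.jar #fileDir #fileName #subFileSizeLimit #mode [#readTaskNum #writeTaskNum #queueSize #bufferSize]

  1. #root_dir: root directory that holds the source file and the generated sub-files (#fileDir in the command format)
  2. #orign_file_name: name of the source file (#fileName in the command format)
  3. #subFileSizeLimit: upper bound on the size of each sub-file; the bound is exclusive
  4. #mode: execution mode. NORMAL uses an ordinary thread pool, FORKJOIN uses ForkJoinPool, PRODUCERCONSUMER uses the producer-consumer pattern, and DISRUPTOR uses the Disruptor-based producer-consumer pattern
  5. #readTaskNum: optional. Number of read tasks. Takes effect only when #mode is PRODUCERCONSUMER or DISRUPTOR. Default: 24
  6. #writeTaskNum: optional. Number of write tasks. Takes effect only when #mode is PRODUCERCONSUMER or DISRUPTOR. Default: 8. When #mode is DISRUPTOR, the value must be an integer multiple of 2
  7. #queueSize: optional. Size of the task queue. Takes effect only when #mode is PRODUCERCONSUMER or DISRUPTOR. Default: 10240. In PRODUCERCONSUMER mode all consumers share one queue; in DISRUPTOR mode each consumer has its own queue
  8. #bufferSize: optional. Capacity of the Disruptor. Takes effect only when #mode is DISRUPTOR. Default: 1024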
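
The positional arguments and defaults listed above could be parsed roughly as follows. This is only a sketch: SplitArgsSketch and its field names are hypothetical and are not taken from the project's Test class, and the size limit is kept as the plain number passed on the command line because its unit is not stated here.

```java
// Hypothetical holder for the command-line arguments described above.
class SplitArgsSketch {
    enum Mode { NORMAL, FORKJOIN, PRODUCERCONSUMER, DISRUPTOR }

    final String fileDir;
    final String fileName;
    final int subFileSizeLimit;
    final Mode mode;
    final int readTaskNum;
    final int writeTaskNum;
    final int queueSize;
    final int bufferSize;

    private SplitArgsSketch(String[] args) {
        if (args.length < 4) {
            throw new IllegalArgumentException(
                "expected: fileDir fileName subFileSizeLimit mode "
                + "[readTaskNum writeTaskNum queueSize bufferSize]");
        }
        fileDir = args[0];
        fileName = args[1];
        subFileSizeLimit = Integer.parseInt(args[2]);
        mode = Mode.valueOf(args[3]); // rejects unknown mode names
        // Optional arguments fall back to the documented defaults.
        readTaskNum = intOrDefault(args, 4, 24);
        writeTaskNum = intOrDefault(args, 5, 8);
        queueSize = intOrDefault(args, 6, 10240);
        bufferSize = intOrDefault(args, 7, 1024);
        if (mode == Mode.DISRUPTOR && writeTaskNum % 2 != 0) {
            throw new IllegalArgumentException(
                "writeTaskNum must be a multiple of 2 in DISRUPTOR mode");
        }
    }

    static SplitArgsSketch parse(String... args) {
        return new SplitArgsSketch(args);
    }

    private static int intOrDefault(String[] args, int index, int fallback) {
        return args.length > index ? Integer.parseInt(args[index]) : fallback;
    }
}
```

For example, SplitArgsSketch.parse("D:\\data", "access.log", "10", "PRODUCERCONSUMER") leaves readTaskNum at 24, writeTaskNum at 8, queueSize at 10240 and bufferSize at 1024.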

Example commands:

  • NORMAL/FORKJOIN mode
    java -jar fileapilt.jar D:\Users\daoqidelv\Desktop\alibaba localhost_access_log.txt 10 FORKJOIN
  • PRODUCERCONSUMER mode
    java -jar fileapilt.jar D:\Users\daoqidelv\Desktop\alibaba localhost_access_log.txt 10 PRODUCERCONSUMER 24 8 10240
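
To make the shared-queue arrangement of PRODUCERCONSUMER mode more concrete, here is a minimal sketch in which several consumers drain one bounded queue. It is an illustration under assumptions rather than the project's implementation: SharedQueueSketch is a made-up name, a single producer (the calling thread) stands in for the read tasks, and the consumers only count lines where real ones would write sub-files.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of consumers sharing one bounded queue.
class SharedQueueSketch {
    // End-of-input marker, compared by identity so that a real line with
    // the same text is never mistaken for it.
    private static final String POISON = new String("<end>");

    /** Sends `lines` through one shared queue; returns how many lines the consumers handled. */
    static int run(List<String> lines, int consumers, int queueSize) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueSize);
        AtomicInteger handled = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                try {
                    for (String s = queue.take(); s != POISON; s = queue.take()) {
                        handled.incrementAndGet(); // a real consumer would append s to a sub-file
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (String line : lines) {
            queue.put(line); // blocks while the queue is full
        }
        for (int i = 0; i < consumers; i++) {
            queue.put(POISON); // one end marker per consumer
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return handled.get();
    }
}
```

Because the queue is bounded, a fast producer blocks in put() once queueSize entries are waiting, which keeps memory use in check when writing is slower than reading.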

Design documents and related discussion

  • Large-file splitting in practice with Java
  • A Disruptor usage example: large-file splitting

filespilt-demo's People

Contributors

daoqidelv

filespilt-demo's Issues

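Two problems found

NormalPoolMaster, the thread-pool implementation, has two small problems when it splits a file.

1. When the cache size is computed and the split condition is finally met, the last line of data is dropped.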

public FileWriteTask spilt(String lineContent) {
    int totalSize = this.fileCacheSize + lineContent.length();
    // Once adding the current line makes the cached content reach or exceed the limit, create a new task
    if (totalSize >= subFileSizeLimit) {
        this.subFileCounter++;
        String subFileName = genSubFileName();
        // Suggested fix: add the current line before copying the cache, so it is not dropped
        this.fileCache.add(lineContent);
        List<String> fileCacheCopy = new ArrayList<>(this.fileCache);
        FileWriteTask fileWriteTask = new FileWriteTask(this.fileDir, subFileName, fileCacheCopy, totalSize);
        // Reset the file cache and its size
        this.fileCache.clear();
        this.fileCacheSize = 0;
        return fileWriteTask;
    } else {
        this.fileCache.add(lineContent);
        this.fileCacheSize += lineContent.length();
        return null;
    }
}

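2. After splitting finishes, the final chunk that is smaller than the cache limit is lost. For example, when a 206 MB file is split into 10 MB pieces, the last 6 MB is lost.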
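
One possible way to handle both points is sketched below, under assumptions. The line that would push the cache to the limit is carried over into the next chunk, so it is neither dropped nor allowed to push a sub-file to the limit, and an explicit flush() returns whatever remains after the last line. LineChunkerSketch and its methods are hypothetical names for illustration; this is not NormalPoolMaster and it works on in-memory lists only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: accumulates lines into chunks without losing any.
class LineChunkerSketch {
    private final int limit;
    private List<String> cache = new ArrayList<>();
    private int cacheSize = 0;

    LineChunkerSketch(int limit) {
        this.limit = limit;
    }

    /**
     * Adds a line. If the line would bring the cache to the limit, the
     * current cache is returned as a finished chunk and the line starts the
     * next one; otherwise null is returned. A single line that is itself at
     * or over the limit ends up in a chunk of its own.
     */
    List<String> offer(String line) {
        List<String> full = null;
        if (!cache.isEmpty() && cacheSize + line.length() >= limit) {
            full = cache;
            cache = new ArrayList<>();
            cacheSize = 0;
        }
        cache.add(line);
        cacheSize += line.length();
        return full;
    }

    /** Returns the remaining lines (possibly empty); call once after the last line. */
    List<String> flush() {
        List<String> rest = cache;
        cache = new ArrayList<>();
        cacheSize = 0;
        return rest;
    }
}
```

A caller would turn every non-null result of offer() into a write task and, once the input is exhausted, do the same for a non-empty flush() result, which is the step that keeps the trailing 6 MB in the example above.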
