Comments (10)

massdosage commented on August 23, 2024

I think the general idea of having a version of uploading a file that lets you know when it has been replicated to meet the replication policy rather than just after the first copy is uploaded is a good one. You could implement it like you suggested or just have a version of the existing upload methods which take one extra optional parameter indicating whether you want a "reliable" or "fully replicated" upload. I guess that's up to the mogilefs-server developers and their API but I can definitely see value in having something like this.

from moji.

hrchu commented on August 23, 2024

There are several possible strategies for uploading a file to multiple destinations for this issue. I think the two basic ones are 1. a trivial single-thread upload and 2. a multi-thread producer-consumer upload.
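The first strategy, a single-thread upload, could be sketched roughly as below. This is not the PoC code from this thread and not part of Moji: the class `SequentialFanOut` and the method `copyToAll` are hypothetical names, and plain `OutputStream`s stand in for the connections to the storage nodes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Moji): copies one source to several destinations on the calling thread. */
final class SequentialFanOut {

  private SequentialFanOut() {
  }

  /**
   * Reads the source once and writes every chunk to each destination in turn.
   *
   * @return the number of bytes read from the source
   */
  static long copyToAll(InputStream source, List<? extends OutputStream> destinations) throws IOException {
    byte[] buffer = new byte[8192];
    long total = 0;
    int read;
    while ((read = source.read(buffer)) != -1) {
      // Same thread, one destination after another: simple, but the slowest destination sets the pace.
      for (OutputStream destination : destinations) {
        destination.write(buffer, 0, read);
      }
      total += read;
    }
    for (OutputStream destination : destinations) {
      destination.flush();
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    byte[] payload = "hello mogile".getBytes(StandardCharsets.UTF_8);
    // In-memory streams stand in for two storage nodes (an assumption for illustration only).
    ByteArrayOutputStream first = new ByteArrayOutputStream();
    ByteArrayOutputStream second = new ByteArrayOutputStream();
    long copied = copyToAll(new ByteArrayInputStream(payload), Arrays.asList(first, second));
    System.out.println(copied + " bytes written to each destination");
  }
}
```

A producer-consumer variant would instead hand each chunk to one writer thread per destination through a queue, at the cost of extra synchronization.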

I wrote a little PoC to benchmark the two strategies above, posted here. I ran the code in an environment with multiple hosts and disks. Since the network is expected to be the bottleneck in that environment, I ran the code over a 1Gbps network and a 1Gbps*2=2Gbps network to see how its performance (latency/bandwidth) varies with the setup. The results are described below. Note that the "original upload" shown below uploads only a single copy and acts as a control group.

  • Time to complete 1GB file upload via 1Gbps network:
    original upload: 10557.4 ms
    single thread upload: 20150.7 ms
    producer-consumer upload: 21179.4 ms
  • Time to complete 1GB file upload via 2Gbps network:
    original upload: 10239.2 ms => 102MBps
    single thread upload: 12777.3 ms => 82MBps
    producer-consumer upload: 13914.9 ms => 75MBps
  • Time to complete 128KB file upload via 1Gbps network:
    original upload: 7.1ms
    single thread upload: 7.5ms
    producer-consumer upload: 6.5ms
  • Time to complete 128KB file upload via 2Gbps network:
    original upload: 7.3ms
    single thread upload: 5.7ms
    producer-consumer upload: 6.2ms
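To read the 1GB numbers as relative overhead, each two-copy time can be divided by the single-copy "original upload" time for the same network. The small sketch below does that with the figures listed above; the class and method names are made up for illustration. The single thread upload comes out at roughly 1.9x the original on the 1Gbps network and roughly 1.25x on the 2Gbps network.

```java
/** Hypothetical helper for reading the benchmark: ratio of a two-copy upload time to the single-copy time. */
final class UploadOverhead {

  private UploadOverhead() {
  }

  /** Returns how many times longer the replicated upload took than the baseline upload. */
  static double overheadRatio(double replicatedMs, double baselineMs) {
    return replicatedMs / baselineMs;
  }

  public static void main(String[] args) {
    // 1GB file, times in milliseconds, copied from the results listed in this comment.
    System.out.printf("1Gbps single thread:     %.2fx%n", overheadRatio(20150.7, 10557.4));
    System.out.printf("1Gbps producer-consumer: %.2fx%n", overheadRatio(21179.4, 10557.4));
    System.out.printf("2Gbps single thread:     %.2fx%n", overheadRatio(12777.3, 10239.2));
    System.out.printf("2Gbps producer-consumer: %.2fx%n", overheadRatio(13914.9, 10239.2));
  }
}
```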

Personally I think the first version will adopt "single thread upload" for its simplicity and acceptable performance overhead. It is designed for users who need durability more than performance.

hrchu commented on August 23, 2024

API design

The original API design lets the storage class be assigned in Moji.getFile(String key, String storageClass) instead of MojiFile.getOutputStream(String storageClass) or elsewhere. I intend to follow the same style of design here. The modified API will add the following new method to Moji:

  /**
   * Creates an abstract representation of a remote MogileFS file for the given key and intends to assign the specified
   * storage class to the file. By default, MogileFS stores just one copy when a file is uploaded. To store the file
   * more durably, set durableWrite to true; the file will then have at least two replicas before the write is
   * acknowledged. Note that durableWrite consumes more bandwidth and may degrade performance.
   *
   * @param key MogileFS file key.
   * @param storageClass The storage class to which a new file will be assigned.
   * @param durableWrite Whether to create at least two replicas when uploading the file. Default is false.
   * @return Representation of the remote file.
   */
  MojiFile getFile(String key, String storageClass, boolean durableWrite);

When a file is uploaded as in the following examples, it will have multiple copies before the upload finishes.

    MojiFile fooFighters = moji.getFile("stacked-actors", "dom1", true);
    fooFighters.copyToFile(new File("foo-fighters.mp3")); 

// or

    OutputStream stream = null;
    try {
      stream = fooFighters.getOutputStream();
      // Do something streamy
      //   stream.write(...);
      stream.flush();
    } finally {
      // Guard against a NullPointerException if getOutputStream() itself failed
      if (stream != null) {
        stream.close();
      }
    }

The file will be lost only in the rare scenario where both disks holding its replicas fail.

p.s. I am posting my progress so that anyone who is interested can join the design discussion here.

teabot commented on August 23, 2024

I have a general API design comment; personally I'd avoid a boolean flag to modify this behaviour as it obscures the intent from anyone reading the code. Instead I'd go for either:

  • Enum parameter such as: WriteType.ONE_COPY, WriteType.DURABLE or similar
  • A separate, appropriately named method to work alongside Moji.getFile(String key, String storageClass) such as: Moji.getDurableFile(String key, String storageClass)

Elliot.

On 5 October 2016 at 09:21, hrchu wrote:

hrchu commented on August 23, 2024

@teabot I agree that a boolean flag's usage is hard to understand without an IDE. I like the enum parameter approach, with values WriteStrategy.DEFAULT and WriteStrategy.DURABLE. I think that avoiding numbers in the enum values keeps users from confusing this with storageClass (which also implies the number of replicas to be retained). Thank you for the suggestion (and for this project!)
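A minimal sketch of how the enum variant might look is below. Only the names WriteStrategy.DEFAULT and WriteStrategy.DURABLE come from this thread; `FileFactory`, `FileHandle` and `minimumReplicasBeforeAck` are hypothetical stand-ins for the relevant parts of Moji and MojiFile, added so the example compiles on its own. The replica counts follow the Javadoc proposed earlier (one copy by default, at least two for a durable write).

```java
/** Write strategies as named in this thread; no numbers in the names, to avoid confusion with storageClass. */
enum WriteStrategy {
  /** Acknowledge the write after the first copy is stored. */
  DEFAULT,
  /** Acknowledge the write only once at least two replicas exist. */
  DURABLE
}

/** Hypothetical stand-in for MojiFile, holding only what this sketch needs. */
final class FileHandle {
  final String key;
  final String storageClass;
  final WriteStrategy writeStrategy;

  FileHandle(String key, String storageClass, WriteStrategy writeStrategy) {
    this.key = key;
    this.storageClass = storageClass;
    this.writeStrategy = writeStrategy;
  }

  /** Replicas required before a write is acknowledged, per the proposal in this thread. */
  int minimumReplicasBeforeAck() {
    return writeStrategy == WriteStrategy.DURABLE ? 2 : 1;
  }
}

/** Hypothetical stand-in for the part of the Moji interface being discussed. */
interface FileFactory {
  FileHandle getFile(String key, String storageClass, WriteStrategy writeStrategy);
}

final class WriteStrategyDemo {
  public static void main(String[] args) {
    FileFactory moji = FileHandle::new; // constructor reference matches getFile's signature
    FileHandle fooFighters = moji.getFile("stacked-actors", "dom1", WriteStrategy.DURABLE);
    System.out.println(fooFighters.key + " needs " + fooFighters.minimumReplicasBeforeAck() + " replicas before ack");
  }
}
```

Compared with `getFile("stacked-actors", "dom1", true)`, the call site states its intent without the reader having to look up what the third argument means.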

massdosage commented on August 23, 2024

+1 for the enum suggestion with the values proposed by @hrchu

hrchu commented on August 23, 2024

I have finished the first version in branch enhance/durableWrite. I think it should not be merged before mogilefs/MogileFS-Server#39 is accepted. Since the mogilefs team is inactive, I am going to use it in my own production environment first.

massdosage commented on August 23, 2024

Sounds good and agree we shouldn't merge it here until it gets supported upstream (or in an upstream fork that is available for end users).

hrchu commented on August 23, 2024

The branch enhance/durableWrite has disappeared, weird.

massdosage commented on August 23, 2024

Hmmm, not sure how that happened. I still have a version of that branch checked out locally, so I can do a diff against master and send you a patch if that would help? A lot has changed in master since then, so it would take quite a bit of work to get it mergeable, but at least it would be a start.
