pomegranate's Introduction

Pomegranate File System Documentation

It is a distributed file system, but not only a file system!

Wiki Page

Introduction

Pomegranate File System (PFS) was originally proposed for large-scale small-file access. It contains many optimizations for small objects:

  • Automatic small-file aggregation based on file system directories
  • Tabular directory model that supports metadata deduplication
  • Automatic migration of file creations within a cluster
  • Metadata store and small-file data store designed for flash devices
  • POSIX and REST interfaces
  • C and Python bindings

Architecture

To exploit fast storage devices such as SSDs and accelerate small-file performance, PFS adopts a 3-tier storage architecture.

The first tier is the memory caching layer, which caches metadata to reduce metadata latency. Metadata latency has a significant impact on small-file I/O latency, so decreasing it can efficiently improve small-file performance.

The second tier is the flash caching layer, which provides durability for metadata and small data. Flash devices have lower I/O latency than disks, which makes them suitable for small data access.

The third tier is the disk store layer, which is designed for longer-term durability of all data. It uses data replication for reliability and deduplication for efficient space consumption.
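
The three tiers can be illustrated with a minimal in-memory sketch. This is not the PFS implementation: the class name `TieredStore`, the `SMALL_DATA_LIMIT` threshold, and the use of plain dictionaries in place of memory, flash, and disk are all assumptions made for illustration.

```python
# Hypothetical threshold below which data also counts as "small data".
SMALL_DATA_LIMIT = 1 << 20


class TieredStore:
    """Toy model of the 3-tier layout; dicts stand in for real devices."""

    def __init__(self):
        self.memory = {}  # tier 1: metadata cache
        self.flash = {}   # tier 2: durable metadata and small data
        self.disk = {}    # tier 3: all data

    def put(self, name, data):
        meta = {"size": len(data)}
        self.memory[name] = meta
        self.flash[("meta", name)] = meta
        if len(data) <= SMALL_DATA_LIMIT:
            self.flash[("data", name)] = data
        self.disk[name] = data

    def stat(self, name):
        # Serve metadata from memory; fall back to flash and refill the cache.
        meta = self.memory.get(name)
        if meta is None:
            meta = self.flash.get(("meta", name))
            if meta is not None:
                self.memory[name] = meta
        return meta

    def read(self, name):
        # Small data is served from flash, everything else from disk.
        data = self.flash.get(("data", name))
        if data is None:
            data = self.disk.get(name)
        return data
```

In this sketch a metadata lookup only touches the flash tier after a memory miss, which mirrors the role the text assigns to the first tier.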

Tabular Directory Model

In many Web 2.0 applications, objects (e.g. photos, videos, docs) are saved in several different forms. For example, on a photo gallery web site, photos uploaded by users are transformed into several resolutions. These different forms, derived from the same (original) object, contain almost the same metadata. Thus, if we save these forms in different files, the distributed file system holds a lot of duplicated metadata. We define this as the N-Form issue.

To overcome the N-Form issue, we propose introducing a more powerful directory model into the traditional file system. In PFS, we use a tabular directory model to keep file system metadata. With one file name, users can save many different object forms in the cells of different columns. File metadata is a special column of the directory table.

By adopting the tabular directory model, the metadata duplication of the N-Form issue can be overcome. Besides this benefit, the new directory model groups file data that has the same property or usage purpose in the same column. Thus, we can do more efficient file placement and aggregation.
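
As a rough illustration of the idea, the sketch below models a directory as a table with one row per file name and one cell per column. The class `TabularDirectory`, its method names, and the `"metadata"` column name are hypothetical and do not reflect the actual PFS API.

```python
class TabularDirectory:
    """Toy directory table: one row per file name, one cell per column."""

    METADATA_COLUMN = "metadata"  # hypothetical name for the special column

    def __init__(self):
        self.rows = {}  # file name -> {column name -> cell content}

    def create(self, name, metadata):
        # The metadata is stored once per file name, not once per form.
        self.rows[name] = {self.METADATA_COLUMN: dict(metadata)}

    def write_cell(self, name, column, data):
        self.rows[name][column] = data

    def read_cell(self, name, column):
        return self.rows[name][column]

    def column(self, column):
        # All cells of one column, e.g. every thumbnail in the directory.
        return {n: r[column] for n, r in self.rows.items() if column in r}
```

For example, a photo stored as `"original"` and `"thumb"` cells under one name shares a single metadata cell, and `column("thumb")` returns the data that would be placed or aggregated together.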

File Aggregation

In Web 2.0 applications, objects are mainly small. For example, social network web pages contain many small photos and short video segments. The typical size of these objects is less than 10 MB. Many traditional distributed file systems are designed for HPC applications and target large-file I/O optimization. Thus, many of these I/O optimizations are not as efficient for small files as they are for large files.

To optimize small-file I/O, we propose file aggregation based on the tabular directory model. Files in the same directory are aggregated automatically. For each directory column, we generate one aggregated large file. File content is cached and then written sequentially to the low-level SSD. File aggregation can thus make maximal use of low-level I/O bandwidth.
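
The aggregation step for a single directory column might look like the following sketch. It assumes an append-only log with an in-memory index of offsets; the class `ColumnAggregator` and the use of `io.BytesIO` in place of a file on SSD are illustrative choices, not the PFS on-disk format.

```python
import io


class ColumnAggregator:
    """Toy aggregator: many small files of one column in one large file."""

    def __init__(self):
        self.buffer = []          # cached (name, data) pairs not yet written
        self.log = io.BytesIO()   # stands in for the aggregated file on SSD
        self.index = {}           # file name -> (offset, length) in the log

    def write(self, name, data):
        # Writes are cached first so they can be flushed in one batch.
        self.buffer.append((name, data))

    def flush(self):
        # Append all cached files back to back: one sequential write stream.
        self.log.seek(0, io.SEEK_END)
        for name, data in self.buffer:
            offset = self.log.tell()
            self.log.write(data)
            self.index[name] = (offset, len(data))
        self.buffer.clear()

    def read(self, name):
        offset, length = self.index[name]
        self.log.seek(offset)
        return self.log.read(length)
```

Reading a small file then costs one index lookup and one read at a known offset, while all writes reach the device sequentially.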

Extendible Metadata Service

Web 2.0 applications have a huge number of objects to store. User-generated objects, such as uploaded photos, videos, and documents, are tremendous in number. Managing these massive object counts in a file system requires an expandable metadata service.

In PFS, we exploit extendible hashing to distribute file metadata across many cache servers. Metadata can migrate from one server to another when there are too many cached file entries. Cache servers can be added or removed at any time with little latency. File metadata is redistributed automatically on server changes.
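
The textbook form of extendible hashing can be sketched as below. This is a generic single-process version, not the PFS code: mapping each bucket to its own server, the `capacity` limit, and the use of `zlib.crc32` as the hash function are assumptions made for illustration.

```python
import zlib


class Bucket:
    def __init__(self, depth, server):
        self.depth = depth    # local depth: hash bits this bucket depends on
        self.server = server  # hypothetical id of the server holding it
        self.entries = {}     # file name -> metadata


class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity  # max entries per bucket before a split
        self.global_depth = 0
        self.directory = [Bucket(0, server=0)]
        self.next_server = 1

    def _hash(self, name):
        return zlib.crc32(name.encode())

    def _bucket(self, name):
        mask = (1 << self.global_depth) - 1
        return self.directory[self._hash(name) & mask]

    def get(self, name):
        return self._bucket(name).entries.get(name)

    def put(self, name, meta):
        while True:
            bucket = self._bucket(name)
            if name in bucket.entries or len(bucket.entries) < self.capacity:
                bucket.entries[name] = meta
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the directory; both halves point at the same buckets.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bit = 1 << bucket.depth
        bucket.depth += 1
        sibling = Bucket(bucket.depth, self.next_server)
        self.next_server += 1
        # Migrate only the entries whose newly examined hash bit is set.
        for name in list(bucket.entries):
            if self._hash(name) & bit:
                sibling.entries[name] = bucket.entries.pop(name)
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = sibling
```

A split moves entries out of one overfull bucket only, so the rest of the metadata stays where it is. That property is what makes incremental redistribution cheap in this scheme.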

Development Cycle

A new OBJECT STORE LAYER for large files is under development.

pomegranate's People

Contributors

macan

pomegranate's Issues

Bug in xnet_simple st_update()

Sometimes the accepted socket fd cannot be inserted into the table. A conflicting entry is already in the table, possibly left over from the last broken connection. Check it!

xTable

How does xtable organize data in the Pomegranate file system? Could you please explain it with a diagram? Thank you very much

MDSL ITB toe lookup failure

The written-back ITB cannot be looked up in the TOE list.
Maybe the ITB counter or the MDSL code is incorrect; this should be rechecked!

itb_dirty assertion failed!

In the itb_dirty future access case, we got an assertion failure.
It is related to mds_read_itb() and cbht_itb_miss(): the itb->h.state can be CLEAN.

xnet_free_msg() bug

In xnet-simple RESEND mode, there is a race between the reply message and the original xnet_msg, and the calling path is hard to trace.
For now, we just disable RESEND mode to work around this bug :(

Crash on gossiping removed directory

[macan@gh01 hvfs]$ *** glibc detected *** /home/macan/hvfs/test/xnet/mds.ut: double free or corruption (!prev): 0x00002aaaac48fce0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3bab271ce2]
/lib64/libc.so.6[0x3bab2738e2]
/lib64/libc.so.6(realloc+0x1d0)[0x3bab275c30]
/home/macan/hvfs/test/xnet/mds.ut(txg_add_rdir+0x7d)[0x44373d]
/home/macan/hvfs/test/xnet/mds.ut(mds_gossip_rdir+0x1d7)[0x45ba37]
/home/macan/hvfs/test/xnet/mds.ut(mds_mds_dispatch+0x127)[0x464667]
/home/macan/hvfs/test/xnet/mds.ut[0x45d5b2]
/lib64/libpthread.so.0[0x3babe06367]
/lib64/libc.so.6(clone+0x6d)[0x3bab2d2f7d]

PFS fuse client failed with mdtest

macan@MACANA ~/hvfs/working $ git bisect start
macan@MACANA ~/hvfs/working $ git bisect bad
macan@MACANA ~/hvfs/working $ git bisect good 1491353
Bisecting: 24 revisions left to test after this (roughly 5 steps)
[19f1fb1] Finish PFS xattr integrated test case. Seems that it works fine :)
macan@MACANA ~/hvfs/working $ git bisect good 19f1fb1
Bisecting: 12 revisions left to test after this (roughly 4 steps)
[15b2e14] Cleanup c2m.c and remove redundant code, and add fuse client latency distrbution. To use fuse client latency distribution, add 'CFLAGS='-DHVFS_FUSE_STAT'' and 'USE_FUSE=1' before make.
macan@MACANA ~/hvfs/working $ git bisect good 15b2e14
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[ba24348] State checking and corrupt message checking in bitmap gossip
macan@MACANA ~/hvfs/working $ git bisect good
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[2a11fef] Merge fuse microbenchmark from master branch
macan@MACANA ~/hvfs/working $ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[97d6bac] Add dbsearch bench, change xattr microbench for fast12 paper.
macan@MACANA ~/hvfs/working $ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[1d0595a] Internal API bug fix for hvfs_stat()

Hash Imbalance in Postmark Test

Using postmark to test tiny-file I/O, we found that disk usage is imbalanced across servers. The largest disk image can be 2 or 3 times the size of the smallest one. A new hash function may be useful for the postmark test.

ITB_JUST_SPLIT Assertion failed

Enable the memory limit and run the unit test.
After running a few TXGs, mds.ut crashed with an ITB reference assertion failure.

It is a conflict between txg_wb() and aur_itb_split(). The usage of itb_put() should be revised.

ITB active entry count is incorrect

Under heavy memory pressure, the active entry counter in the ITB is not correct.
This bug is easy to reproduce: run the API unit test several times and you will hit it!

Source file missing from tree

[root@vs1 Pomegranate]# make
make: *** No rule to make target `/usr/src/Pomegranate/test/xnet/cr.c', needed by `unit_test'. Stop.
