frugalos's Introduction

Frugalos

Frugal Object Storage


Frugalos is a distributed object storage written in Rust.
It is suitable for storing medium-sized BLOBs whose total volume reaches petabyte scale.

Documentation

Installation

You can install frugalos with the following command:

$ cargo install frugalos

Note: The current installation process requires automake, autoconf, and libtool to build liberasurecode internally. Please install them first if they are missing. (See also liberasurecode's prerequisites.)

You can also use the pre-built binaries from the releases page.

Simple Example

// Create a cluster.
$ frugalos create --id example --data-dir example/
Oct 26 13:42:06.244 INFO [START] create: local=Server { id: "example", seqno: 0, host: V4(127.0.0.1), port: 14278 }; data_dir.as_ref()="example/"; , server: example@127.0.0.1:14278, module: frugalos_config::cluster:121
Oct 26 13:42:06.245 INFO Creates data directry: "example/", server: example@127.0.0.1:14278, module: frugalos_config::cluster:113
Oct 26 13:42:06.256 INFO [START] LoadBallot: lump_id=LumpId("03000000000000000000000000000000"); , server: example@127.0.0.1:14278, module: frugalos_raft::storage::ballot:21
...
...

// Start a frugalos process in the background.
$ frugalos start --data-dir example/ &
Oct 26 13:46:16.046 INFO Local server info: Server { id: "example", seqno: 0, host: V4(127.0.0.1), port: 14278 }, module: frugalos_config::service:68
Oct 26 13:46:16.062 INFO [START] LoadBallot: lump_id=LumpId("03000000000000000000000000000000"); , module: frugalos_raft::storage::ballot:21
Oct 26 13:46:16.086 INFO Starts RPC server, server: 127.0.0.1:14278, module: fibers_rpc::rpc_server:221
...
...

// Add a device and a bucket to store objects.
$ DEVICE_JSON='{"file": {"id": "file0", "server": "example", "filepath": "example/file0.lusf"}}'
$ curl -XPUT -d "$DEVICE_JSON" http://localhost:3000/v1/devices/file0
{"file":{"id":"file0","seqno":0,"weight":"auto","server":"example","capacity":19556691462,"filepath":"example/file0.lusf"}}%

$ BUCKET_JSON='{"metadata": {"id": "bucket0", "device": "file0", "tolerable_faults": 1}}'
$ curl -XPUT -d "$BUCKET_JSON" http://localhost:3000/v1/buckets/bucket0
{"metadata":{"id":"bucket0","seqno":0,"device":"file0","segment_count":1,"tolerable_faults":1}}%

// PUT and GET an object.
$ curl -XPUT -d 'your_object_data' http://localhost:3000/v1/buckets/bucket0/objects/your_object_id
$ curl http://localhost:3000/v1/buckets/bucket0/objects/your_object_id
your_object_data

Please see REST API for details and other available APIs.

For Frugalos Developers

Please see Developer's Guide.

frugalos's People

Contributors

brly, dependabot[bot], dw-hkoba, koba-e964, kyos3, shinnya, sile, yoffy, yuezato


frugalos's Issues

`create` and `join` commands sometimes fail due to Monitor target aborted

Error:

Jan 11 15:16:56.587 INFO [FINISH] create, server: [email protected]:14278, module: frugalos_config::cluster:170
thread '<unnamed>' panicked at 'Error: Other (cause; Monitor target aborted)
HISTORY:
  [0] at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/fibers_rpc-0.2.17/src/rpc_server.rs:404
  [1] at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/fibers_rpc-0.2.17/src/rpc_server.rs:281
', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/frugalos_config-0.3.0/src/cluster.rs:153:48

Improve handling of a cleared disk

Ideally, when a disk's contents are cleared, the Raft nodes using that disk should change their IDs and trigger a reconfiguration (concretely, by including cannyls's UUID in the node ID), so that they are recognized as different nodes at the Raft level. In theory, if a node restarts after a disk clear while keeping the same node ID, it can fall into an inconsistent state.
However, this approach is itself a hassle, so whether to actually do it needs further consideration.

Allow users to change `put_content_timeout` dynamically.

I would like to allow users to change put_content_timeout.

Background

frugalos automatically adds 60 seconds to the put_content_timeout specified by a user. This is inconvenient when debugging a repair operation because the timeout also affects the processing delay of the Synchronizer.
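For illustration, here is a minimal sketch of the behavior described above, assuming a fixed 60-second margin is added internally (the constant and function names are hypothetical, not frugalos's actual API):

use std::time::Duration;

// Hypothetical name for the fixed margin frugalos adds internally.
const PUT_CONTENT_TIMEOUT_MARGIN: Duration = Duration::from_secs(60);

// The effective timeout is the user-specified value plus the margin,
// which is why a user cannot currently shorten it below 60 seconds.
fn effective_put_content_timeout(user_timeout: Duration) -> Duration {
    user_timeout + PUT_CONTENT_TIMEOUT_MARGIN
}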

Install Snapshot keeps running forever after a restart

The problem is as stated in the title; the conditions and situation are described first for clarity.

How to reproduce (situation)

  1. Build a cluster of multiple frugalos instances.
  2. PUT enough data that the log gets snapshotted.
  3. Stop one of the frugalos instances in the cluster (hereafter called X).
    4a. Among X's files, delete all data files (lusf files) except cluster.lusf and local.dat.
    4b. In parallel with 4a, keep PUTting data.
  5. Restart X.

Note that after X is restarted in step 5, because the log was snapshotted in step 2 and the data files were deleted in step 4a (so X's log is back to its initial state), X receives a snapshot and runs the process that saves it.

Precondition for the problem

  • The read/write speed of the disk X uses has degraded, so X's snapshot-save processing takes an enormous amount of time:
    impl Future for SaveLogPrefix {
        type Item = ();
        type Error = Error;
        fn poll(&mut self) -> Poll<Self::Item, Self::Error> {
            while let Async::Ready(phase) = self.phase.poll()? {
                let next = match phase {
                    Phase5::A(index) => {
                        let index = index.unwrap_or(Range { start: 0, end: 0 });
                        self.old_prefix_index = index.clone();
                        let prefix = self.prefix.take().expect("Never fails");
                        let future =
                            track!(SaveLogPrefixBytes::new(self.handle.clone(), index, prefix))?;
                        Phase5::B(future)
                    }
                    Phase5::B(prefix_index) => {
                        let future =
                            track!(SaveLogPrefixIndex::new(self.handle.clone(), prefix_index))?;
                        Phase5::C(future)
                    }
                    Phase5::C(()) => {
                        let future = DeleteOldLogPrefixBytes::new(
                            self.handle.clone(),
                            self.old_prefix_index.clone(),
                        );
                        Phase5::D(future)
                    }
                    Phase5::D(()) => {
                        let future = track!(DeleteOldLogEntries::new(
                            self.handle.clone(),
                            self.old_entries.clone()
                        ))?;
                        Phase5::E(future)
                    }
                    Phase5::E(()) => {
                        info!(self.handle.logger, "[FINISH] SaveLogPrefix");
                        let event = Event::LogPrefixUpdated {
                            new_head: self.new_head,
                        };
                        let _ = self.event_tx.send(event);
                        return Ok(Async::Ready(()));
                    }
                };
                self.phase = next;
            }
            Ok(Async::NotReady)
        }
    }

The problem

Because (by the precondition) X's snapshot save takes a long time, the log advances in the meantime due to the PUTs of 4b.
Therefore, by the time the snapshot save finishes, X's log is already stale from the leader's point of view, and yet another snapshot has to be installed after the save completes.

This repeats forever.

Notes

  • Is this, by itself, a problem that leads to a critical situation?
  • Isn't it an unavoidable situation that cannot be worked around?

However, it is known that in a similar situation, combined with yet another factor, the critical problem #54 occurs.

Massive snapshot casting and high memory usage when restarting a frugalos instance

To describe this problem precisely, we start with a way to reproduce it.

How to Reproduce

  1. Build a frugalos cluster.
  2. Put enough data that the frugalos instances are required to take a snapshot.
  3. Stop a frugalos instance X in the cluster and delete all of X's lusf files except cluster.lusf and local.dat.
  4. Restart the frugalos instance X.
  5. Stop another frugalos instance Y that contains the leader raft node of a raft node belonging to X.

Assumption

X takes a long time to finish installing a snapshot, for some reason related to its device:

impl Future for SaveLogPrefix {
    type Item = ();
    type Error = Error;
    fn poll(&mut self) -> Poll<Self::Item, Self::Error> {
        while let Async::Ready(phase) = self.phase.poll()? {
            let next = match phase {
                Phase5::A(index) => {
                    let index = index.unwrap_or(Range { start: 0, end: 0 });
                    self.old_prefix_index = index.clone();
                    let prefix = self.prefix.take().expect("Never fails");
                    let future =
                        track!(SaveLogPrefixBytes::new(self.handle.clone(), index, prefix))?;
                    Phase5::B(future)
                }
                Phase5::B(prefix_index) => {
                    let future =
                        track!(SaveLogPrefixIndex::new(self.handle.clone(), prefix_index))?;
                    Phase5::C(future)
                }
                Phase5::C(()) => {
                    let future = DeleteOldLogPrefixBytes::new(
                        self.handle.clone(),
                        self.old_prefix_index.clone(),
                    );
                    Phase5::D(future)
                }
                Phase5::D(()) => {
                    let future = track!(DeleteOldLogEntries::new(
                        self.handle.clone(),
                        self.old_entries.clone()
                    ))?;
                    Phase5::E(future)
                }
                Phase5::E(()) => {
                    info!(self.handle.logger, "[FINISH] SaveLogPrefix");
                    let event = Event::LogPrefixUpdated {
                        new_head: self.new_head,
                    };
                    let _ = self.event_tx.send(event);
                    return Ok(Async::Ready(()));
                }
            };
            self.phase = next;
        }
        Ok(Async::NotReady)
    }
}

What happens?

After the reproduction steps above, massive snapshot casting from other nodes in the cluster to X's raft nodes occurs.
If a single snapshot is huge, this behavior leads to out-of-memory errors in the frugalos instances that send the snapshots, and eventually destroys the frugalos cluster.

Why do massive snapshot castings occur?

Here we fix a raft node N belonging to the instance X such that

  • for N, there is a leader node in Y (we call this node L).

After step 4, because of step 2, N receives a snapshot and installs it through the following code:
https://github.com/frugalos/frugalos/blob/master/frugalos_raft/src/storage/log_prefix/save.rs#L64-L110

By the assumption, N takes a long time to install the snapshot.
Furthermore, while this snapshot is being installed, because of step 5, N changes its state to Candidate due to a raft-level timeout caused by the absence of its leader L.

After that, N votes for a raft node and changes its state to Follower (more specifically, FollowerIdle):
https://github.com/frugalos/raftlog/blob/2e3cb4647d4ebf888f836bb9b1a209626a5c344b/src/node_state/follower/idle.rs#L13-L15

Now, N is installing a snapshot and is in FollowerIdle.
This state has the following properties:

  • Since N is installing a snapshot, N has not yet changed its log status (therefore, its log tail position is 0).
  • N is in FollowerIdle, and N's log is too old with respect to that of the leader.

So what happens when N receives a heartbeat from the (new) leader?

  1. N replies to the heartbeat with information indicating that N's log is too old.
  2. After the leader receives the reply, it casts a snapshot to N.
  3. N receives the snapshot, but it is still installing the old one. Therefore, N drops the cast.
  4. N receives another heartbeat; go to 1.

As a result, the leader quickly and repeatedly casts a snapshot to N until N finishes its slow snapshot installation.

Math: estimating how much data is sent from a frugalos instance Z

  • n: the size of the set { <x, z> : a raft node z of Z is the leader of a raft node x of X }
  • S: the byte size of a snapshot (sent to X from Z)
  • I: the interval in seconds between heartbeats sent from Z to X

Every I seconds, the frugalos instance Z sends n*S bytes of data.

For example, in our experiment environment, n = 500, S = 50 MB, and I = 5 secs.
(A num_of_cast graph was attached here; each block in the picture represents 10 seconds.)

Every 5 seconds, one frugalos instance tries to send 25 GB of data to X; this eventually exhausts its memory, and finally the frugalos instance stops.
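The arithmetic above can be reproduced with a small sketch (the numbers are the ones from the experiment; the function is ours, for illustration only):

fn snapshot_bytes_per_interval(n: u64, snapshot_bytes: u64) -> u64 {
    // Every I seconds, Z sends one snapshot of S bytes for each of the
    // n leader pairs <x, z>.
    n * snapshot_bytes
}

fn main() {
    let n = 500; // leader pairs <x, z>
    let s = 50 * 1000 * 1000; // S = 50 MB per snapshot
    println!("{} bytes every I = 5 secs", snapshot_bytes_per_interval(n, s)); // 25 GB
}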

When restoring data with erasure coding, the index pointing to a fragment position exceeds the maximum fragment count

Symptom

let missing_index = self

Because the computation of missing_index in the code above (and its subsequent use) does not take the number of fragments into account, restoring the data fails inside openstack/liberasurecode and an invalid memory reference occurs.

Note that no error occurs when putting data, because the minimum of the fragment count and the number of candidates is used.
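A hedged sketch of the fix direction this implies: clamp by the fragment count on the restore path as well, just as the PUT path takes the minimum of the fragment count and the number of candidates (the function below is illustrative, not frugalos's actual code):

// Only the first `fragment_count` candidates can point into the
// fragments array, so never index past that bound.
fn clamped_candidates<T>(candidates: &[T], fragment_count: usize) -> &[T] {
    &candidates[..candidates.len().min(fragment_count)]
}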

How to reproduce

Apply the following patch and run cargo test.

diff --git a/frugalos_segment/src/client/storage.rs b/frugalos_segment/src/client/storage.rs
index 9b2e425..7b84d20 100644
--- a/frugalos_segment/src/client/storage.rs
+++ b/frugalos_segment/src/client/storage.rs
@@ -836,4 +836,43 @@ mod tests {
 
         Ok(())
     }
+
+    #[test]
+    fn it_fails() -> TestResult {
+        // Set the fragment count to 5 (data_fragments = 4, parity_fragments = 1) and build a 6-node cluster.
+        let data_fragments = 5;
+        let mut system = System::new(data_fragments)?;
+        let (node_id, device_id, _) = system.make_node()?;
+        let mut members = Vec::new();
+
+        members.push(ClusterMember {
+            node: node_id,
+            device: device_id,
+        });
+
+        for _ in 0..data_fragments {
+            let (node, device, _) = system.make_node()?;
+            members.push(ClusterMember { node, device });
+        }
+
+        let storage_client = system.boot(members)?;
+
+        // Get version 6. (node_id is not originally responsible for storing this object.)
+        let version = ObjectVersion(6);
+
+        let _ = wait(storage_client.clone().put(
+            version.clone(),
+            vec![0x01],
+            Deadline::Infinity,
+            Span::inactive().handle(),
+        ))?;
+
+        // An invalid memory reference occurs here
+        let _ = wait(storage_client.clone().get_fragment(
+            node_id.clone(),
+            version.clone()
+        ))?;
+
+        Ok(())
+    }
 }

Break the precondition of `handle_committed`

Description

We can break the following precondition of the handle_committed method:

track_assert_eq!(self.next_commit, commit, ErrorKind::InvalidInput);

Reproduce

Use these files: https://gist.github.com/yuezato/9c0af68320935b342d0b152811f58cfc

Why is the precondition broken

In this while-loop:
https://github.com/frugalos/frugalos/blob/master/frugalos_mds/src/node/node.rs#L729-L738
we assume that the two raft events [ Event::SnapshotLoaded, Event::Committed ] arrive in this order.

First, we handle the Event::SnapshotLoaded:

E::SnapshotLoaded { new_head, snapshot } => {
    info!(
        self.logger,
        "New snapshot is loaded: new_head={:?}, bytes={}",
        new_head,
        snapshot.len()
    );
    let logger = self.logger.clone();
    let future = fibers_tasque::DefaultCpuTaskQueue.async_call(move || {
        let machine = track!(codec::decode_machine(&snapshot))?;
        let versions = machine.to_versions();
        info!(logger, "Snapshot decoded: {} bytes", snapshot.len());
        Ok((new_head, machine, versions))
    });
    self.decoding_snapshot = Some(future);
}

without updating self.next_commit.

Immediately after receiving Event::Committed, we reach this line:

track_assert_eq!(self.next_commit, commit, ErrorKind::InvalidInput);

Finally, the precondition is broken.

How to solve this

Once we encounter a SnapshotLoaded event,
we should defer handling the Committed events that follow it while the snapshot is being decoded.

Indeed, in this part (especially line 704)

match track!(self.decoding_snapshot.poll().map_err(Error::from))? {
    Async::NotReady => return Ok(Async::NotReady),
    Async::Ready(None) => {}
    Async::Ready(Some(result)) => {
        let (new_head, machine, versions) = track!(result)?;
        info!(self.logger, "Snapshot decoded: new_head={:?}", new_head);
        let delay = env::var("FRUGALOS_SNAPSHOT_REPAIR_DELAY")
            .ok()
            .and_then(|v| v.parse().ok())
            .unwrap_or(10);
        self.events.reserve_exact(machine.len());
        self.events
            .extend(versions.into_iter().map(|version| Event::Putted {
                version,
                put_content_timeout: Seconds(delay),
            }));
        self.next_commit = new_head.index;
        self.machine = machine;
        self.metrics.objects.set(self.machine.len() as f64);
        self.decoding_snapshot = None;
    }
}

we can correctly update self.next_commit, and this may solve the present issue.
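A minimal self-contained sketch of that ordering (an assumption about the fix, not the actual frugalos_mds code): while a snapshot is being decoded, Committed events are buffered and replayed only after next_commit has been updated from the decoded snapshot head.

enum RaftEvent {
    SnapshotLoaded, // decoding starts asynchronously
    Committed { index: u64 },
}

struct Node {
    next_commit: u64,
    decoding_snapshot: bool,
    deferred: Vec<RaftEvent>,
}

impl Node {
    fn handle_event(&mut self, event: RaftEvent) {
        match event {
            RaftEvent::SnapshotLoaded => self.decoding_snapshot = true,
            RaftEvent::Committed { index } if self.decoding_snapshot => {
                // next_commit is stale until decoding finishes, so defer.
                self.deferred.push(RaftEvent::Committed { index });
            }
            RaftEvent::Committed { index } => {
                // The precondition from the issue now holds.
                assert_eq!(self.next_commit, index);
                self.next_commit = index + 1;
            }
        }
    }

    fn on_snapshot_decoded(&mut self, new_head_index: u64) {
        self.next_commit = new_head_index;
        self.decoding_snapshot = false;
        // Replay the commits that arrived while decoding.
        for event in std::mem::take(&mut self.deferred) {
            self.handle_event(event);
        }
    }
}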

Functionality to support making a cluster

Currently there are no built-in functionalities or external tools that support creating/starting a frugalos cluster. Having such a built-in functionality would be convenient and safe.

I am considering the following idea. Do you have any comments?

Idea: Using a Configuration File

# YAML format
- name: instance_name1
  role: create
  rpc-addr: 192.168.0.1:14000
  http-addr: 192.168.0.1:3000
  files: path1/file1, path2/file2

- name: instance_name2
  role: join
  rpc-addr: 192.168.0.2:14000
  http-addr: 192.168.0.2:3000
  files: pathA/fileA, pathB/fileB, pathC/fileC

- name: instance_name3
  role: join
  rpc-addr: 192.168.0.3:14000
  http-addr: 192.168.0.3:3000
  files: pathX/fileX, pathY/fileY

- bucket_name: bucket
  type: dispersed
  data-fragment: 8
  parity-fragment: 4
  device: { type: virtual, instances: [instance_name1, instance_name2, instance_name3] }

Usage:

on 192.168.0.{1, 2, 3}: frugalos cluster-start config.file

Internally,

  1. instance_name1 gathers the config files from instance_name2 and instance_name3 and then checks that all three configuration files are equal.
  2. instance_name1 first executes frugalos create and then requests instance_name{2, 3} to execute frugalos join against instance_name1.
  3. Finally, instance_name1 makes the bucket on itself.
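As a sketch of how the configuration file above might be read (the crate choice and type names are assumptions, not an existing frugalos API), the heterogeneous entries could be modeled with serde:

use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(rename_all = "kebab-case")]
struct InstanceEntry {
    name: String,
    role: String, // "create" or "join"
    rpc_addr: String,
    http_addr: String,
    files: String, // comma-separated list of paths
}

#[derive(Debug, Deserialize)]
struct BucketEntry {
    bucket_name: String,
    #[serde(rename = "type")]
    kind: String, // e.g. "dispersed"
    #[serde(rename = "data-fragment")]
    data_fragment: u32,
    #[serde(rename = "parity-fragment")]
    parity_fragment: u32,
    // the `device` field is omitted here for brevity
}

// Each list element is either an instance or a bucket definition.
#[derive(Debug, Deserialize)]
#[serde(untagged)]
enum ConfigEntry {
    Instance(InstanceEntry),
    Bucket(BucketEntry),
}

fn parse_config(yaml: &str) -> Result<Vec<ConfigEntry>, serde_yaml::Error> {
    serde_yaml::from_str(yaml)
}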

Consider how to prevent mistakenly deleting all objects via object prefix deletion

The use case of deleting all data is assumed to be almost nonexistent in real operation, so it might be good if losing all data to a program bug could be avoided.

Below is sile's idea:

Personally, I think it would be good to have frugalos take (for example) an allowlist file like the following at startup:

# YAML format for some reason (TOML would probably be better for Rust)
allow_prefix_delete:
  - bucket: "foo"                    # a bucket for which prefix deletion is allowed
    object_prefix: "xxxxx.timeshift."  # delete requests not containing this string as a prefix are rejected

Error in LoadLogPrefix after restarting a frugalos server that stopped abnormally

Where the error occurs

track!(protobuf::decode_log_prefix(&bytes))? at line 57 of:

Phase::B(bytes) => {
    if let Some(bytes) = bytes {
        let prefix = track!(protobuf::decode_log_prefix(&bytes))?;
        info!(
            self.handle.logger,
            "[FINISH] LoadLogPrefix: {}",
            dump!(prefix.tail, prefix.config, bytes.len())
        );
        return Ok(Async::Ready(Some(prefix)));
    } else {
        // The corresponding lump was not found.
        // => Most likely a new `LogPrefix` was installed during the load,
        //    so retry.
        info!(self.handle.logger, "[RETRY] LoadLogPrefix");
        Phase::A(LoadLogPrefixIndex::new(self.handle.clone()))
    }
}

Reproduce

  1. Make a frugalos cluster with multiple frugalos servers.
  2. Kill one server in the cluster (do NOT issue frugalos stop).
  3. Restart the server.

Log

There are three frugalos servers on 127.0.0.1:14278, 127.0.0.1:14279, and 127.0.0.1:14280.
The following log is produced by the server on 127.0.0.1:14279.
For the sake of explanation, I extract the part of the original log that relates to the node 401 on 127.0.0.1:14279.

[frugalos_raft/src/storage/log_prefix/save.rs:47] [START] SaveLogPrefix: prefix.tail=LogPosition { prev_term: Term(13), index: LogIndex(350) }; prefix.config=ClusterConfig { new: {No
deId("[email protected]:14280"), NodeId("[email protected]:14279"), NodeId("[email protected]:14278")}, old: {}, state: Stable }; prefix.snapshot.len()=22;  @ LocalNodeId("401")

[frugalos_raft/src/storage/log_prefix/load.rs:91] [START] LoadLogPrefixIndex: lump_id=LumpId("00000000000401020000000000000000");

[frugalos_raft/src/storage/log_prefix/load.rs:118] [FINISH] LoadLogPrefixIndex: index=Some(23..24);  @ LocalNodeId("401")

[frugalos_raft/src/storage/log_prefix/save.rs:130] [START] SaveLogPrefixBytes: prefix_index=24..25; prefix_bytes.len()=107;  @ LocalNodeId("401")

[frugalos_raft/src/storage/log_prefix/save.rs:165] [PROGRESS] SaveLogPrefixBytes: index=24; lump_id=LumpId("00000000000401030000000000000018"); bytes.len()=107;  @ LocalNodeId("401")

[frugalos_raft/src/storage/log_prefix/save.rs:153] [FINISH] SaveLogPrefixBytes @ LocalNodeId("401")

[frugalos_raft/src/storage/log_prefix/save.rs:201] [START] SaveLogPrefixIndex: index=24..25; bytes.len()=9; lump_id=LumpId("00000000000401020000000000000000");  @ LocalNodeId("401")

[frugalos_raft/src/storage/log_prefix/save.rs:224] [FINISH] SaveLogPrefixIndex @ LocalNodeId("401")

[frugalos_raft/src/storage/log_prefix/delete.rs:50] [PROGRESS] DeleteOldLogPrefixBytes: index=23; lump_id=LumpId("00000000000401030000000000000017");

[frugalos_raft/src/storage/log_prefix/save.rs:99] [FINISH] SaveLogPrefix @ LocalNodeId("401")

Here I killed the frugalos server and the following log is obtained after restarting the server:

[frugalos_raft/src/storage/log_prefix/load.rs:23] [START] LoadLogPrefix @ LocalNodeId("401")

[frugalos_raft/src/storage/log_prefix/load.rs:91] [START] LoadLogPrefixIndex: lump_id=LumpId("00000000000602020000000000000000");

[frugalos_raft/src/storage/log_prefix/load.rs:118] [FINISH] LoadLogPrefixIndex: index=Some(23..24);  @ LocalNodeId("401")

[frugalos_raft/src/storage/log_prefix/load.rs:141] [START] LoadLogPrefixBytes: prefix_index=23..24;  @ LocalNodeId("401")

[frugalos_raft/src/storage/log_prefix/load.rs:178] [FINISH] LoadLogPrefixBytes: bytes.len()=510;  @ LocalNodeId("401")

Jan 29 16:27:16.711 CRIT Node down: Other (cause; assertion failed: `left == right`; assertion failed: `(left == right)` (left: `2147550283`, right: `0`))
HISTORY:
  [0] at frugalos_raft/src/protobuf.rs:85
  [1] at frugalos_raft/src/protobuf.rs:23
  [2] at frugalos_raft/src/storage/log_prefix/load.rs:61
  [3] at frugalos_raft/src/storage/log.rs:86
  [4] at /Users/yuuya_uezato/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/node_state/loader.rs:97
  [5] at /Users/yuuya_uezato/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/node_state/loader.rs:23
  [6] at /Users/yuuya_uezato/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/node_state/mod.rs:113
  [7] at /Users/yuuya_uezato/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/replicated_log.rs:260 -- node=Node { id: NodeId("[email protected]:14279"), role: Follower, ballot: Ballot { term: Term(13), voted_for: NodeId("[email protected]:14278") } }
  [8] at frugalos_mds/src/node/node.rs:730
  [9] at frugalos_segment/src/service.rs:265
  [10] at frugalos_segment/src/service.rs:280
, node: 401, module: frugalos_segment::service:282

Explanation of the log

Let us check the following lines:

[frugalos_raft/src/storage/log_prefix/save.rs:201] [START] SaveLogPrefixIndex: index=24..25; bytes.len()=9; lump_id=LumpId("00000000000401020000000000000000");  @ LocalNodeId("401")

[frugalos_raft/src/storage/log_prefix/save.rs:224] [FINISH] SaveLogPrefixIndex @ LocalNodeId("401")

# after restarting

[frugalos_raft/src/storage/log_prefix/load.rs:91] [START] LoadLogPrefixIndex: lump_id=LumpId("00000000000602020000000000000000");

[frugalos_raft/src/storage/log_prefix/load.rs:118] [FINISH] LoadLogPrefixIndex: index=Some(23..24);  @ LocalNodeId("401")

The frugalos process put the range (24, 25) into 00000000000401020000000000000000;
however, after the restart we get the range (23, 24) from 00000000000401020000000000000000.
This is due to the presence of the journal memory buffer.
Note: LogPrefixIndex is put as an embedded put.
Therefore, after restarting the frugalos server, we unfortunately get the old range (23, 24).

Furthermore, since we issue the following delete:

[frugalos_raft/src/storage/log_prefix/delete.rs:50] [PROGRESS] DeleteOldLogPrefixBytes: index=23; lump_id=LumpId("00000000000401030000000000000017")

This may lead to an unknown value being written to the position in which the lump "00000000000401030000000000000017" had lived.

Finally, after restarting the frugalos server, there is an unknown value in LumpId("00000000000401030000000000000017").

How to fix this issue?

  1. Force a sync of the journal region to disk using this method:
    https://docs.rs/cannyls/0.9.2/cannyls/device/struct.DeviceRequest.html#method.journal_sync
  2. If we fail to decode the bytes here, simply return Ok(Async::Ready(None)).

I think the latter plan is superior in performance to the former one.
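A minimal sketch of the latter plan (the stub types stand in for the real ones; this is an assumption about the fix, not an actual patch): a decode failure is reported as "no prefix found" so that the caller retries, instead of propagating an error that takes the node down.

struct LogPrefix; // stub for the real frugalos_raft type
struct Error;     // stub error type

fn decode_log_prefix(_bytes: &[u8]) -> Result<LogPrefix, Error> {
    unimplemented!() // stands in for protobuf::decode_log_prefix
}

fn load_log_prefix(bytes: Option<&[u8]>) -> Result<Option<LogPrefix>, Error> {
    match bytes {
        Some(bytes) => match decode_log_prefix(bytes) {
            // Decoded successfully: the prefix is valid.
            Ok(prefix) => Ok(Some(prefix)),
            // Stale or partially overwritten bytes (the journal-buffer
            // scenario above): treat them as absent rather than failing.
            Err(_) => Ok(None),
        },
        None => Ok(None), // no lump found; the caller retries
    }
}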

Retrieve and visualize the cluster and bucket configuration

Add a command that outputs the cluster and bucket configuration, so that users can confirm that the cluster and buckets they created are configured as intended.

Structured output, such as HTML, would be even better.

Inconsistent use of frugalos_config::machine::Segment

track!(self.handle_patch_segment(bucket_no, segment_no, &groups[0]))?;

The above code implicitly assumes that there is only one DeviceGroup, but this contradicts the definition of frugalos_config::machine::Segment.

Here are some ideas:

  1. Fix the definition of Segment.
  2. Fix handle_patch_segment to accept multiple DeviceGroups.

Make various parameters loadable from a file

Currently, Frugalos takes the various parameters that affect its internal behavior as command-line arguments.
However, the number of parameters has grown considerably, so writing them all on the command line is cumbersome, and it should also be possible to tell afterwards which parameters a running frugalos instance was started with. For these reasons, we want to be able to load the parameters from a file.

Can't build a Docker image from `docker/hub/Dockerfile`.

I met the following error, caused by the Rust edition, when I built a Docker image from docker/hub/Dockerfile.

$ docker build -t foo/bar docker/hub
(omitted)
   Compiling rustracing v0.1.8
   Compiling thrift_codec v0.1.1
   Compiling sloggers v0.3.1
   Compiling rustracing_jaeger v0.1.9
error: Edition 2018 is unstable and only available for nightly builds of rustc.

error: Could not compile `rustracing_jaeger`.
warning: build failed, waiting for other jobs to finish...
error: failed to compile `frugalos v0.9.0`, intermediate artifacts can be found at `/tmp/cargo-install8fbnMT`

Caused by:
  build failed
The command '/bin/sh -c cargo install frugalos --version $FRUGALOS_VERSION' returned a non-zero code: 101

The error message says that rustracing_jaeger uses Edition 2018, but the version of the Rust compiler used in the Dockerfile is not compatible with Edition 2018.

I've found that this problem can be resolved by using rust:1.31.0-slim as the base image.

Possibility of High CPU utilization

cf. frugalos/raftlog#12

impl Future for DeleteOldLogEntries {
    type Item = ();
    type Error = Error;
    fn poll(&mut self) -> Poll<Self::Item, Self::Error> {
        // Passing the handle across threads is a hassle, so we log inside poll()
        track!(
            self.future
                .poll()
                .map(|result| result
                    .map(|_| info!(self.handle.logger, "[FINISH] DeleteOldLogEntries")))
        )
    }
}

The core part of self.future is the following:

.delete_range(deleted_range)

Run integration tests on travis-ci

It's desirable to execute integration tests automatically when a new PR is proposed or a PR is merged into the master branch.

In this context, I refer to it/testsuites/* as the integration tests.

Inconsistency occurs when a PUT to mds succeeds but the PUT to storage fails

This can happen because the PUT to mds and the PUT to storage are not atomic. For example, it often occurs when the client closes the connection during an HTTP PUT; it appears to be caused by fibers_http_server's Handler running only partway through when the connection is dropped. Note that when the HTTP client disconnects and the object is PUT only to mds, frugalos's log contains nothing indicating that the object state has become inconsistent.

As another case: when the fibers::sync::oneshot::Monitored of a request that has been proposed to the replicated_log and is waiting for commit is dropped, the PUT to mds is treated as failed. Depending on the timing, however, the commit of the dropped proposal (request) may have already completed; in that case the operation is still treated as having failed during mds processing, and the PUT to storage is never performed.

Add host information to `to_rpc_error`

Frugalos RPC servers kindly reply with error messages to the sender that called the RPC when internal errors occur:
https://github.com/frugalos/frugalos/blob/master/frugalos_mds/src/server.rs#L52-L61

However, such error messages do not contain information about the host where the internal error occurred.

I think it is useful to add information about which host raised the error,
and the following function is suitable for this purpose:
https://github.com/frugalos/frugalos/blob/master/frugalos_mds/src/error.rs#L124-L132

Deleting a massive number of objects using object_prefixes leads to some problems.

When we issue the following command against a frugalos cluster that has 50 million objects whose names start with frugalos:

time curl -XDELETE http://192.168.0.1:3000/v1/buckets/bucket0/object_prefixes/frugalos

we get the following result:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="http://192.168.0.1:3000/v1/buckets/bucket0/object_prefixes/frugalos">DELETE&nbsp;http://192.168.0.1:3000/v1/buckets/bucket0/object_prefixes/frugalos</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>

real    1m0.075s
user    0m0.004s
sys     0m0.008s

I hope that the following Grafana image will be of some help.
(Grafana screenshot attached, captured 2019-01-22 20:44:34.)

Verify repair behavior

A long time has passed since repair was last verified, so verify its behavior again.
Also check whether a large volume of repairs interferes with the execution of normal processing.

Publish a new release to GitHub automatically.

https://docs.travis-ci.com/user/deployment/releases

I'm trying the process described in the document.

Plans

Deployment will be executed only when a git tag is pushed to GitHub and the tag starts with a semantic version (for example, 0.9.1).

Built binaries will be compiled under CentOS 7 to support a legacy environment and attached to a GitHub release as an asset.

Travis CI supports only Debian (trusty and precise), so Docker is used on Travis for building the binaries.

Constraints

  • The git tag attached to a released commit MUST satisfy the regular expression [0-9]\.[0-9].* (Travis CI syntax).

Build error

Congratulations on making this OSS!
I was trying to build frugalos.
However, I got an error message.
The version of rustc is 1.31.0-nightly.
With which versions does this work?
Maybe it only works on stable?

   Compiling regex v1.0.5                                                                                                                                                                                          
   Compiling jemalloc-ctl v0.2.0                                                                                                                                                                                   
   Compiling url v1.7.1                                                                                                                                                                                            
   Compiling slog-term v2.4.0                                                                                                                                                                                      
   Compiling slog-kvfilter v0.7.0                                                                                                                                                                                  
   Compiling trackable_derive v0.1.1                                                                                                                                                                               
   Compiling serde_derive v1.0.80                                                                                                                                                                                  
error: failed to run custom build command for `liberasurecode v1.0.2`                                                                                                                                              
process didn't exit successfully: `/home/utam0k/ghq/github.com/frugalos/frugalos/target/debug/build/liberasurecode-ae7bd1a0d8ba835c/build-script-build` (exit code: 101)
--- stdout
cargo:rerun-if-changed=build.rs

--- stderr
+ BUILD_DIR=/home/utam0k/ghq/github.com/frugalos/frugalos/target/debug/build/liberasurecode-df87a8d2fae3fa75/out/build
+ git clone https://github.com/ceph/gf-complete.git
Cloning into 'gf-complete'...
+ cd gf-complete/
+ git checkout a6862d1
Note: checking out 'a6862d1'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at a6862d1 Merge branch 'wip-signed-integer-overflow-cppcheck' into 'master'
+ ./autogen.sh
./autogen.sh: 2: ./autogen.sh: autoreconf: not found
thread 'main' panicked at './install_deps.sh failed: exit-code=Some(127)', /home/utam0k/.cargo/registry/src/github.com-1ecc6299db9ec823/liberasurecode-1.0.2/build.rs:28:17
note: Run with `RUST_BACKTRACE=1` for a backtrace.

warning: build failed, waiting for other jobs to finish...
error: build failed                                       

Want an RPC that GETs only the metadata

Currently in frugalos, a GET passes through the location below, and for storage without the metadata attribute it ends up reading the actual data:
https://github.o-in.dwango.co.jp/frugalfs/frugalos/blob/0c73c3ee095c464b728168de6f7f41057129d238/frugalos_segment/src/client/storage.rs#L82

Of course this behavior is fine for GET, but we want a GET_METADATA-like RPC that returns empty data when the storage is not metadata storage (it does not need to be a REST API).
The use case is limited, but the assumption is that a control flow different from GET's might be handy when running frugalos and investigating its behavior.

Support bucket deletion

The following are needed:

  • Remove the bucket from the cluster state stored by frugalos_config
  • Stop the segment nodes of the bucket (not mandatory)
  • Delete the bucket's data in storage (issuing cannyls's range delete to all devices should suffice)

Accessing the API at just the right timing while a frugalos process is being stopped causes an RPC error

Error message

{"kind":"Other","cause":"client service or server is unavailable","history":[{"module_path":"fibers_rpc::rpc_client","file":"/Users/shinya_yamaoka/.cargo/regi
stry/src/github.com-1ecc6299db9ec823/fibers_rpc-0.2.17/src/rpc_client.rs","line":152,"message":""},{"module_path":"fibers_rpc::client_side_handlers","file":"/
Users/shinya_yamaoka/.cargo/registry/src/github.com-1ecc6299db9ec823/fibers_rpc-0.2.17/src/client_side_handlers.rs","line":37,"message":""},{"module_path":"fr
ugalos_api::client","file":"/Users/shinya_yamaoka/.cargo/git/checkouts/frugalos_api-99a511e2400fd938/26f4432/src/client/mod.rs","line":27,"message":""},{"modu
le_path":"frugalos_api::client::mds","file":"/Users/shinya_yamaoka/.cargo/git/checkouts/frugalos_api-99a511e2400fd938/26f4432/src/client/mds.rs","line":226,"m
essage":"frugalos.mds.object.put"},{"module_path":"frugalos_segment::client::mds","file":"frugalos_segment/src/client/mds.rs","line":324,"message":"node=None"
},{"module_path":"frugalos::client","file":"src/client.rs","line":119,"message":""},{"module_path":"frugalosd::server","file":"frugalosd/src/server.rs","line"
:583,"message":""}]}

How to reproduce

# Set up the environment
cargo run -p frugalosd --bin frugalos -- create --id srv1 --data-dir /tmp/srv1 --addr 127.0.0.1:3201
cargo run -p frugalosd --bin frugalos -- join --id srv2 --data-dir /tmp/srv2 --addr 127.0.0.1:3202 --contact-server=127.0.0.1:3201
cargo run -p frugalosd --bin frugalos -- join --id srv3 --data-dir /tmp/srv3 --addr 127.0.0.1:3203 --contact-server=127.0.0.1:3201

# Start
cargo run -p frugalosd --bin frugalos -- start --data-dir /tmp/srv1 --http-server-bind-addr 127.0.0.1:3101
cargo run -p frugalosd --bin frugalos -- start --data-dir /tmp/srv2 --http-server-bind-addr 127.0.0.1:3102
cargo run -p frugalosd --bin frugalos -- start --data-dir /tmp/srv3 --http-server-bind-addr 127.0.0.1:3103

# Register devices
curl -X PUT -d '{"file": {"id": "dev0", "server": "srv1", "filepath":"/tmp/srv1/devices/dev0.lusf"}}' http://127.0.0.1:3101/v1/devices/dev0
curl -X PUT -d '{"file": {"id": "dev1", "server": "srv1", "filepath":"/tmp/srv1/devices/dev1.lusf"}}' http://127.0.0.1:3101/v1/devices/dev1
curl -X PUT -d '{"file": {"id": "dev2", "server": "srv1", "filepath":"/tmp/srv1/devices/dev2.lusf"}}' http://127.0.0.1:3101/v1/devices/dev2

curl -X PUT -d '{"file": {"id": "dev3", "server": "srv2", "filepath":"/tmp/srv2/devices/dev3.lusf"}}' http://127.0.0.1:3101/v1/devices/dev3
curl -X PUT -d '{"file": {"id": "dev4", "server": "srv2", "filepath":"/tmp/srv2/devices/dev4.lusf"}}' http://127.0.0.1:3101/v1/devices/dev4
curl -X PUT -d '{"file": {"id": "dev5", "server": "srv2", "filepath":"/tmp/srv2/devices/dev5.lusf"}}' http://127.0.0.1:3101/v1/devices/dev5

curl -X PUT -d '{"file": {"id": "dev6", "server": "srv3", "filepath":"/tmp/srv3/devices/dev6.lusf"}}' http://127.0.0.1:3101/v1/devices/dev6
curl -X PUT -d '{"file": {"id": "dev7", "server": "srv3", "filepath":"/tmp/srv3/devices/dev7.lusf"}}' http://127.0.0.1:3101/v1/devices/dev7
curl -X PUT -d '{"file": {"id": "dev8", "server": "srv3", "filepath":"/tmp/srv3/devices/dev8.lusf"}}' http://127.0.0.1:3101/v1/devices/dev8

curl -X PUT -d '{"virtual": {"id": "store01", "children": ["dev0", "dev1", "dev2"]}}' http://127.0.0.1:3101/v1/devices/store01
curl -X PUT -d '{"virtual": {"id": "store02", "children": ["dev3", "dev4", "dev5"]}}' http://127.0.0.1:3101/v1/devices/store02
curl -X PUT -d '{"virtual": {"id": "store03", "children": ["dev6", "dev7", "dev8"]}}' http://127.0.0.1:3101/v1/devices/store03
curl -X PUT -d '{"virtual": {"id": "root", "children": ["store01", "store02", "store03"]}}' http://127.0.0.1:3101/v1/devices/root

# Create a bucket
curl -sf -X PUT -d '{"dispersed": {"id": "vod_chunk", "device":"root", "tolerable_faults": 1, "data_fragment_count": 2}}' http://localhost:3101/v1/buckets/vod_chunk

Then stop one node and immediately run the following command:

$ for vid in `seq 10 100`; do; curl --dump-header - -X PUT http://localhost:3101/v1/buckets/vod_chunk/objects/sm$vid -d test; done

Support dynamic device configuration changes

Make already-registered devices updatable via PUT.
When that changes the mapping between devices and existing segments, also carry out data migration.

Allow beta and nightly build failures on Travis

Problem

With the current Travis configuration, the build fails if there is even one warning (RUSTFLAGS="-D warnings").
On the other hand, Rust's beta and nightly channels frequently introduce warning-level changes.
Keeping up with every such change so that stable and beta/nightly all produce zero warnings is difficult.

Proposed solution

Give stable higher priority than beta/nightly; as a premise, no warnings are tolerated on stable.
For beta/nightly, two options come to mind immediately:

  1. Allow warnings when building on beta/nightly, but still run the tests.
  2. If a warning occurs when building on beta/nightly, do not run the tests.

Validate the erasure coding configuration before using it.

Currently frugalos uses a configuration passed by a user without any validation (although the combination of data_fragments = 1 and parity_fragments = 1 is actually rejected). It is desirable to check the configuration, because an incorrect configuration causes internal errors.
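A hedged sketch of the kind of check suggested here (the function, error text, and rules are illustrative; frugalos does not currently expose such a function):

fn validate_fragments(data_fragments: u32, parity_fragments: u32) -> Result<(), String> {
    if data_fragments == 0 || parity_fragments == 0 {
        return Err("data_fragments and parity_fragments must be positive".to_string());
    }
    if data_fragments == 1 && parity_fragments == 1 {
        // The combination that is already rejected downstream today.
        return Err("data_fragments = 1 with parity_fragments = 1 is not supported".to_string());
    }
    Ok(())
}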

Run a synchronization process at restart

Even now, using Raft's mechanisms, a node that has been disconnected for a while synchronizes its state when it restarts.
For the object index held by a segment this is sufficient, but the synchronization of object data (the fragments produced by erasure coding) does not always work.
Specifically, if the leader node took a Raft snapshot between the node's stop and restart, the Raft log entries before that snapshot are never delivered directly to the restarted node, so the current approach of replaying the log at restart cannot synchronize the data of objects that were added or deleted while the node was down.
To solve this problem, at restart each segment node needs to compare the state of its object index against the state of its local storage, and then perform the necessary object-data additions (repairs) and deletions, as sketched below.
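A minimal sketch of that reconciliation (the names and types are ours, not frugalos's): diff the object index held by the segment against what local storage actually holds, then repair what is missing and delete what is extraneous.

use std::collections::HashSet;

// Returns (versions to repair, versions to delete).
fn reconcile(
    index_versions: &HashSet<u64>,  // what the object index says we should hold
    stored_versions: &HashSet<u64>, // what local storage actually holds
) -> (Vec<u64>, Vec<u64>) {
    let to_repair = index_versions.difference(stored_versions).copied().collect();
    let to_delete = stored_versions.difference(index_versions).copied().collect();
    (to_repair, to_delete)
}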

Unify the internal message format to Protocol Buffers

Currently Protocol Buffers and bincode are mixed, so we would like to unify them.
That said, we are not completely satisfied with Protocol Buffers either, so we are somewhat inclined to adopt a better format if one exists.
If we aim to replace the internal communication with gRPC in the future, Protocol Buffers is the only choice; if not, there may be no need to be particular about it.

Set an upper limit on the number of log entries sent at once

Sending a very large number of entries at once may increase the load on both the sender and the receiver, so we may want to be able to impose a limit, for example at most 100 entries per send, as sketched below.
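For illustration, a sketch of such a cap (the constant is the example value from above; the function is hypothetical):

const MAX_ENTRIES_PER_SEND: usize = 100;

// Take at most MAX_ENTRIES_PER_SEND entries for one send;
// the remainder goes out in later rounds.
fn next_batch<T>(pending: &[T]) -> &[T] {
    &pending[..pending.len().min(MAX_ENTRIES_PER_SEND)]
}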
