Comments (6)
A Container
has two layers: the interface layer composed of its own methods, properties, and attached aggregators, and the backend layer which handles the details of storing the state of the object in a serialized form (ContainerFile
and its subclasses). The reason for this separation is to make it easy to write interface elements like aggregators without having to think too carefully about atomicity of operations to the file, read and write locks (sensitive!), etc. Basically, the interface elements handle the details of how information from the user is obtained and returned, while the containerfile elements handle the details of storage to what amounts to a small database. I think it's important to keep this separation of roles for the sake of maintainability/sanity.
Even for something as simple as Tags
, the interface methods do work that the ContainerFile
doesn't (and shouldn't) have to. One example is Tags.add()
, which can handle tags that are entered as lists of lists cleanly (perhaps rare), then feeds a flattened list of tags to the ContainerFile
. The ContainerFile.add_tags()
method for adding tags is as thin as it can be while fully completing the operation of adding tags nonredundantly, while Tags.add()
includes machinery for making life convenient for a user that has better things to think about than whether their list of strings is flat.
I suppose in short: interface elements implement user convenience, while containerfile elements implement sensitive atomicity of serialization.
I'm not 100% sure on how this would work with all the file decorators, maybe generic Containerfile.read_table and Containerfile.write_table methods which Aggregators could use...
In principle, ContainerFile
contains methods that give complete access to its elements in a safe way; that is, use of them won't result in a ContainerFile
in a damaged or inconsistent state. Also, the schema for the ContainerFile
tables are defined within it, and these have to remain clearly defined, preferably in one place like they are now. Otherwise it becomes difficult to guarantee containers produced with one version of MDS will work with a new version.
from mdsynthesis.
Because many aggregators can't really function without a Container
instance attached, I wouldn't worry about testing them as standalone units. Instead, I would treat them as part of the Container
class they are found associated with. They have to function in that context anyway in actual use. :D
As for the ContainerFile
and its subclasses, those should work completely independently of any interface components. We should eventually build tests for each method of ContainerFile
and its subclasses, even if much of this code is "covered" by testing at the level of the interface (Sim
, Group
).
from mdsynthesis.
I wanted to create a fake container file, which would essentially store everything in memory instead.. that way we could isolate the Aggregators on their own. So you could check that your duplicate management etc is all working, and any errors are occuring as you try and save, but the correct stuff is being sent to be saved?
I understand the line between ContainerFile and Tags, but things like add
are confusing because the args are handled in Tags, and then the duplicates are handled in ContainerFile. Wouldn't it be cleaner to have ContainerFile just serving up appropriate handles for users (Aggs)? Looking through, all the Aggregators ever use is .handle
from mdsynthesis.
I wanted to create a fake container file, which would essentially store everything in memory instead.. that way we could isolate the Aggregators on their own. So you could check that your duplicate management etc is all working, and any errors are occuring as you try and save, but the correct stuff is being sent to be saved?
I think this would be necessary if ContainerFile
and Container
needed to talk to each other (bidirectional codependency), but because containerfiles work just as well without an attached Container
, it's only necessary to write tests for the ContainerFile
and then test Container
with the real ContainerFile
backend. Faking a containerfile would create the need to write tests for the fake: why do that when you can test the production one, which we have to do anyway?
I understand the line between ContainerFile and Tags, but things like add are confusing because the args are handled in Tags, and then the duplicates are handled in ContainerFile.
Here's how I reason the separation: nonredundancy is an inherent feature of a set of tags, because we expect tags to work like a set (we only care about whether something is present). It's therefore containerfile's job to make sure that its stored table of tags is nonredundant. Not including this at the level of the containerfile means that any interface to manipulate tags must also reinvent removal of redundancy, and any accessors to tags must do the same to compensate for lapses on the part of any adders. It's a mess.
As for making things easy for the user by not caring how funky their lists of tags they want to add look, that's not the containerfile's problem. That's a convenience issue that was decided at the interface level, and the flexibility could have been left out. For Tags.add()
, we could make it so that it only takes tags presented as individual arguments or only as a single, unnested list, but that's a choice that the underlying concept of what tags are and how they should be stored isn't concerned with.
Wouldn't it be cleaner to have ContainerFile just serving up appropriate handles for users (Aggs)? Looking through, all the Aggregators ever use is .handle
I don't think it would be cleaner, because the current implementation separates interface from serialization. One could completely rewrite the interface of Tags
without needing to write a bit of pytables code, which is a huge plus. If instead we exposed ContainerFile.add_tags()
as the method for adding tags, changing how flexible this is at the interface level means tampering around the underlying serialization scheme. It also means that what should be a low-level interface for adding tags to the ContainerFile
can itself start collecting bloat.
I think for Tags
this scheme looks like overengineering, but for something like Universes
or Selections
I think it's easier to see how this organizational scheme keeps the structure of a Container
from becoming a mess. An even better example is Members
, which I recently refactored to share an interface with Bundle
, an in-memory version of Members
. Members
uses a ContainerFile
's member table as its backend, while Bundle
uses a _BundleBackend
instance with a structured array instead. That might be of interest to you.
On that note, perhaps I'll change the _containerfile
attribute to _backend
for all Aggregator
classes to reflect the separation between interfaces and the underlying backend.
from mdsynthesis.
Ok if you're happy then I'll leave it. I'll look at another way of testing Tags.
from mdsynthesis.
This is a good discussion to have, since the underlying design determines future possibilities. For something like Tags
, I think it's important to make sure it works at the Container
level (interface), for all Container
subclasses (this is already done in test_containers.py, in lines 62 to 96 at the moment) and also at the storage/backend level, the only backend being ContainerFile
at the moment. Those tests are not yet written, but we will need them.
I changed the _containerfile
references in all containers and aggregators to _backend
in 86e0732. This reflects the agnosticism of Sims/Group/Tags/etc. to the underlying backend they are talking to, which need not even be writing to a file. For example, a backend could be written that stores everything in a MySQL database, a flat-file, in-memory, etc, but a Sim
that uses this backend will have exactly the same interface as one that uses the default pytables HDF store.
from mdsynthesis.
Related Issues (20)
- Failing in adding a Universe creates bad state HOT 2
- failed to store DataFrame with column multi-index HOT 4
- One Sim, one Universe: remove multiple Universes functionality HOT 15
- first release HOT 12
- Setting Sim.universe to universe with chain reader doesn't work HOT 1
- Support auxilliaries in Universe definitions HOT 3
- create sim from existing Treant HOT 4
- handle folders with several simulations HOT 2
- The current release doesn't work with MDAnalysis 0.16.0dev HOT 3
- make a 1.0 release? HOT 11
- Remove Sim treanttype HOT 3
- Move to MDAnalysis org HOT 9
- Removing all tags/categories HOT 2
- python 3 release? HOT 3
- add conversion script from 0.8 to 1.0 HOT 3
- Error when using sim.tags/sim.categories HOT 3
- So much faster! HOT 1
- Is it possible to store in data or somewhere else files or paths to them? HOT 8
- Two different documentations online? HOT 1
- Order of Treants in a Bundle HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from mdsynthesis.