ofiwg / libfabric
Open Fabric Interfaces
Home Page: http://libfabric.org/
License: Other
From the OFIWG F2F, there was a request to allow an application to indicate that a new connection request would be handed off to another process (forked or otherwise). The idea is that the provider could arrange its data structures accordingly, so that the new connection could successfully be migrated to another process.
Allow an app to open interfaces that are associated with a specific object. Move open_if to struct fi_ops.
The verbs provider currently handles CM connect calls synchronously. Convert this to asynchronous operation.
struct fi_info contains a source and destination address, which correspond to an endpoint address. The fi_getinfo call takes a node and service parameter, which represent either the source or destination address. Determine if the src_addr and dest_addr fields are needed. The actual addresses can be retrieved from the endpoint getname/getpeer calls, once an endpoint has been created. With the librdmacm, the addresses were used to identify a local device, but that can be determined through the fi_info::domain_name field.
Allow an application to register memory for access from a specific remote address.
Expand the EQ API to allow an application to register for specific types of events, including fabric and provider specific events -- e.g. remote node available, port up/down, topology change, congestion notification, receive buffer consumed, etc.
Identify EQs as either belonging to a control or data domain. Data EQs are equivalent to CQs -- used to report data transfer completions -- and are optimized for performance, expected to be implemented in HW. Control EQs will be used to report all other events, and will trade off performance for ease of use by the app.
Registering memory exposes a memory region for access by remote processes as soon as the registration completes. The region is open to access by all endpoints associated with the domain. Define a mechanism by which the region is 'closed' for access until it is attached to a specific endpoint.
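A minimal sketch of the proposed semantics, assuming a region starts closed after registration and only accepts remote access arriving on the endpoint it was attached to. The toy_mr structure and its functions are hypothetical, not part of libfabric.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory-region state: registration alone does not grant
 * remote access; the region opens only once attached to an endpoint. */
struct toy_mr {
    void *addr;
    size_t len;
    int attached_ep; /* -1 while the region is closed */
};

static void toy_mr_register(struct toy_mr *mr, void *addr, size_t len)
{
    mr->addr = addr;
    mr->len = len;
    mr->attached_ep = -1; /* closed for access after registration */
}

static void toy_mr_attach(struct toy_mr *mr, int ep_id)
{
    assert(ep_id >= 0);
    mr->attached_ep = ep_id;
}

/* Allow a remote access of len bytes at offset off only if it arrives
 * on the attached endpoint and stays inside the region. */
static bool toy_mr_access_ok(const struct toy_mr *mr, int ep_id,
                             size_t off, size_t len)
{
    if (mr->attached_ep < 0 || mr->attached_ep != ep_id)
        return false;
    return off <= mr->len && len <= mr->len - off;
}
```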
The auth_key and auth_keylen fields of struct fi_info are intended to be used for job authorization. Determine if there's a use case for these fields as defined (since they come from the applications). If not, remove them and determine what mechanism, if any, is needed by libfabric to support job isolation.
Provide a way for an application to read whether an error has occurred for completions that simply increment a counter.
Triggered requests are somewhat defined for generating a data transfer when an event occurs. Investigate whether it makes sense for a triggered request to take other actions, such as inserting an item into an EQ. Review application requirements to see if this makes sense, versus using an existing mechanism, such as selective event generation.
From the OFIWG F2F: a decision was made to delay any extension support to the existing verbs interfaces. Remove the verbs and rdma cm code bases from libfabric, and instead use whatever version may be installed on the current system. This will also help prevent conflicts between the distro, OFED, and/or vendor versions of the libraries and the versions used by libfabric.
Applications need some sort of hint regarding the optimal way to use a provider, in the absence of application usage hints. Document a method by which a provider can indicate the best way to use its hardware. This may be as simple as returning fi_info structures in priority order.
There needs to be a mechanism for applications to enable/disable flow control, along with events defined when flow control or buffer overruns occur.
We can provide stronger type checking in the data transfer APIs by using a typedef for remote addresses, e.g. typedef uint64_t fi_addr_t. The output of AV insert would be of type fi_addr_t, and the data transfer calls (sendto, writeto, etc.) would accept this type. This would force the use of an AV for all unconnected endpoint types. It also guarantees that 64 bits of addressing data are available for the provider to return from AV insert, making it simpler to encode raw address data.
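A minimal sketch of the proposal, assuming a table-style AV where insert hands back an index as fi_addr_t. Only the fi_addr_t typedef comes from the proposal; toy_av, toy_av_insert, and toy_av_resolve are hypothetical. Because a C typedef is only an alias, the gain is mostly in self-documenting signatures and in routing every address through an AV.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The typedef proposed above. */
typedef uint64_t fi_addr_t;

#define TOY_AV_MAX   16
#define TOY_ADDR_LEN 16

/* Hypothetical table-style AV: insert returns an index as fi_addr_t. */
struct toy_av {
    unsigned char raw[TOY_AV_MAX][TOY_ADDR_LEN];
    size_t count;
};

static int toy_av_insert(struct toy_av *av, const void *addr, size_t len,
                         fi_addr_t *out)
{
    assert(av != NULL && addr != NULL && out != NULL);
    if (av->count >= TOY_AV_MAX || len > TOY_ADDR_LEN)
        return -1;
    memset(av->raw[av->count], 0, TOY_ADDR_LEN);
    memcpy(av->raw[av->count], addr, len);
    *out = (fi_addr_t) av->count++;
    return 0;
}

/* What a sendto-style call would do internally: it takes only an
 * fi_addr_t and maps it back to the stored raw address. */
static const unsigned char *toy_av_resolve(const struct toy_av *av,
                                           fi_addr_t dest)
{
    return dest < av->count ? av->raw[dest] : NULL;
}
```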
Verify and document that the output from EP getname may be used with AV insert. This allows for an application to do an all to all exchange of addresses and insert the results into an AV table.
Support the notion of a shared receive queue. Define a receive only endpoint that may be attached to active endpoints, so that data buffers can be shared among multiple connections.
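A minimal sketch of the idea, assuming a fixed-depth FIFO of posted buffers that any attached endpoint consumes from in order. The toy_srq type and its functions are hypothetical, not a proposed libfabric API.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SRQ_DEPTH 8

/* Hypothetical receive-only queue shared by several connected endpoints:
 * whichever connection receives data next consumes the oldest buffer. */
struct toy_srq {
    void *bufs[TOY_SRQ_DEPTH];
    size_t head; /* next buffer to consume */
    size_t tail; /* next free slot */
};

static int toy_srq_post(struct toy_srq *q, void *buf)
{
    assert(q != NULL && buf != NULL);
    if (q->tail - q->head == TOY_SRQ_DEPTH)
        return -1; /* queue full */
    q->bufs[q->tail++ % TOY_SRQ_DEPTH] = buf;
    return 0;
}

/* Called when any attached endpoint receives a message. */
static void *toy_srq_consume(struct toy_srq *q, int ep_id)
{
    (void) ep_id; /* buffers are not tied to a particular connection */
    if (q->head == q->tail)
        return NULL; /* no buffers posted */
    return q->bufs[q->head++ % TOY_SRQ_DEPTH];
}
```

The benefit is that the number of posted buffers scales with total expected traffic rather than with the number of connections.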
An EQ used to report control related events (e.g. CM requests, memory registration, AV insertions, etc.) must indicate the type of event that was read. Either we need a generic event structure for this purpose, or the EQ read must return the event through a separate parameter. Control EQs may need to return a single event per read.
An EQ may be associated with a domain or a fabric object. (The fabric EQ may be modified to be unassociated.) When binding an EP to an EQ, there's no way to know if the EQ was associated with the domain or fabric object. This can result in a provider attempting to dereference a fabric EQ as a domain EQ, resulting in a crash.
Update EQ API to allow an application to insert user defined events onto an EQ. This will be an optional feature for data EQs, but supported on control EQs.
Document the use of raw and packet endpoint types. Define flow steering mechanisms for packet endpoints. The flow steering defined in libibverbs is a reasonable starting point.
Add a call to retrieve one or more addresses stored in an AV. This may be useful for apps for debugging purposes, or for extracting addresses from an AV in order to share them with another process.
Other data formats may be more concise than iovec for referencing multiple buffers. For example, strided operations may be able to point to a buffer, a size, an offset between buffers, and the number of buffers using a single structure, rather than chaining together a large set of SGEs. Incorporate 'expanded iovec' support into the APIs.
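A minimal sketch of a strided descriptor, assuming "offset between buffers" means the distance between consecutive buffer starts. The toy_strided_iov type is hypothetical; struct iovec is the standard POSIX type from sys/uio.h. The expand function shows how such a descriptor could fall back to a plain iovec list for providers without native support.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical compact descriptor for count equally sized buffers spaced
 * stride bytes apart; it replaces count separate SGEs. */
struct toy_strided_iov {
    void *base;    /* first buffer */
    size_t size;   /* bytes per buffer */
    size_t stride; /* distance between buffer starts */
    size_t count;  /* number of buffers */
};

/* Expand into a plain iovec array. Returns the number of entries
 * written, or 0 if the caller's array is too small. */
static size_t toy_strided_expand(const struct toy_strided_iov *s,
                                 struct iovec *iov, size_t max)
{
    assert(s != NULL && iov != NULL);
    if (s->count > max)
        return 0;
    for (size_t i = 0; i < s->count; i++) {
        iov[i].iov_base = (char *) s->base + i * s->stride;
        iov[i].iov_len = s->size;
    }
    return s->count;
}
```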
Add datagram message endpoint support to the socket provider.
Expand the domain attributes to include the max SGL supported by the provider.
Add inject calls to the atomic operations, similar to what's available for RMA.
The endpoint attribute structure should be expanded to expose the size of the underlying queue. Now that the EP attributes exist, we can simplify things for the user and avoid the need to use control interfaces to override the default values. Default values should still be available to the user, with the actual values returned when an endpoint is created.
Add reliable datagram message (RDM) endpoint support to the socket provider.
The struct fi_info returned from fi_getinfo may only be used once. Redefine the API to allow an fi_info to be used multiple times. This will require changes to the verbs provider to handle the 'data' field differently. Note that the data field is also used when establishing a connection request.
fi_sync was intended to allow applications to block until all data transfers of a specific type have completed. It's actually exceptionally hard to implement over all existing hardware. Remove it. Applications can use the primitive EQ events or counters to wait until all necessary operations have completed.
From the OFIWG F2F, data structure versions will be indicated using a version parameter to fi_getinfo. The version parameter will indicate the version of the set of data structures known to the application. libfabric will adjust its behavior accordingly, based on the data structures and fields known to the app. This mechanism will replace the field/mask concept in the current data structure scheme.
From OFIWG F2F, interface structure versioning will be done using a size field within the struct. A query method (or static inline or define) will indicate if a specific interface is available.
Allow a provider to optionally support EQ readfrom. It may not be possible for a provider to implement readfrom efficiently, compared to the app carrying the source address in the message. Also figure out how to write the first sentence without using a split infinitive.
The data flow endpoint attribute is basically defining sessions. Either fully define it or remove it from the API until it can be defined. It may be possible to remove data flow in favor of fully defined sessions.
There are several fi_ep_attr fields which are size_t. Determine which, if any, should be ssize_t, so that a negative value can be used to indicate that the provider can select the maximum or best default value.
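A minimal sketch of how signed size fields could carry "provider chooses" requests, assuming two hypothetical sentinels (-1 for the best default, -2 for the maximum) and made-up provider limits. None of these names are libfabric definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical provider limits, for illustration only. */
#define TOY_DEFAULT_TX_SIZE 256
#define TOY_MAX_TX_SIZE     4096

/* Hypothetical sentinels an application could pass in an ssize_t field. */
#define TOY_SIZE_DEFAULT ((ssize_t) -1)
#define TOY_SIZE_MAX     ((ssize_t) -2)

/* Resolve a requested queue size. Returns 0 if the request can't be met. */
static size_t toy_resolve_size(ssize_t requested)
{
    if (requested == TOY_SIZE_DEFAULT)
        return TOY_DEFAULT_TX_SIZE;
    if (requested == TOY_SIZE_MAX)
        return TOY_MAX_TX_SIZE;
    if (requested < 0 || requested > TOY_MAX_TX_SIZE)
        return 0;
    return (size_t) requested;
}
```

Keeping the sentinels negative leaves the full non-negative range available for explicit requests.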
The threading level in the configure.ac file is no longer relevant. Remove it.
RDM = reliable datagram message. This is based on the socket type of a similar name. Decide whether this is acceptable or if we should rename this to RUM - reliable unconnected message.
Define a mechanism to request and report the maximum size of remote EQ data (i.e. immediate data). Need to decide if the size is reported in bits or bytes.
Define struct fi_fabric_attr, similar to fi_ep_attr and fi_domain_attr.
Endpoints are vaguely defined. Refine the endpoint definition to indicate that an endpoint represents a session level address (using the OSI model). As such, multiple endpoints may share the same transport and network address, if multiple sessions are defined. Expand the API and data structures to handle this.
Eliminate all references in the man pages to close calls other than fi_close. Do not define object specific close calls, such as fi_ep_close.
There's overlap between endpoint attributes and setopt options. Eliminate any unnecessary duplicate options.
In practice, FI_RANGE is difficult to use and harder to implement. Define an alternative method, FI_SYMMETRIC, instead, which allows an app to indicate that addresses on remote systems use the same transport addresses (port numbers), with an equal number of processes placed per node. This will simplify the implementation and the application usage of the interface, plus enable optimal storage of the data.
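A minimal sketch of why a symmetric layout allows compact storage, assuming nodes are numbered consecutively, each runs ppn processes, and process i on a node listens on base_port + i. Under that assumption a rank's node and port are computed rather than stored per process. The toy_sym_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical (node index, port) pair derived from a rank. */
struct toy_sym_addr {
    uint32_t node;
    uint16_t port;
};

/* Symmetric layout: ranks are packed ppn per node, and every node
 * uses the same sequence of ports starting at base_port. */
static struct toy_sym_addr toy_sym_lookup(uint32_t rank, uint32_t ppn,
                                          uint16_t base_port)
{
    assert(ppn > 0);
    struct toy_sym_addr a;
    a.node = rank / ppn;
    a.port = (uint16_t) (base_port + rank % ppn);
    return a;
}
```

With this scheme the AV only needs one network address per node plus the base port, instead of one entry per process.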
The tagged message API uses a tag and a mask to identify messages. However, applications are really using the tag space as a collection of independent fields. For example, 32 bits may represent the message, 16 bits the source address, and 16 bits a group address. Exposing the use of these fields to a provider enables additional optimizations within the provider. For example, a provider could maintain separate queues, greatly improving search times. Define a mechanism by which the app and provider can communicate the number of fields and their sizes.
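A minimal sketch of the 32/16/16 example, assuming the message field occupies the high bits and that bits set in an "ignore" mask are wildcards during matching (an assumed convention here). toy_tag_queue shows how knowing the source field lets a provider bucket messages into separate queues. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bits 63..32 message, 31..16 source, 15..0 group. */
#define TOY_MSG_SHIFT 32
#define TOY_SRC_SHIFT 16
#define TOY_GRP_SHIFT 0

static uint64_t toy_tag_pack(uint32_t msg, uint16_t src, uint16_t grp)
{
    return ((uint64_t) msg << TOY_MSG_SHIFT) |
           ((uint64_t) src << TOY_SRC_SHIFT) |
           ((uint64_t) grp << TOY_GRP_SHIFT);
}

static uint16_t toy_tag_src(uint64_t tag)
{
    return (uint16_t) (tag >> TOY_SRC_SHIFT); /* cast keeps only 16 bits */
}

/* Bits set in ignore are treated as wildcards. */
static bool toy_tag_match(uint64_t recv_tag, uint64_t ignore, uint64_t msg_tag)
{
    return ((recv_tag ^ msg_tag) & ~ignore) == 0;
}

/* Field-aware provider: pick a per-source queue instead of one list. */
static unsigned toy_tag_queue(uint64_t tag, unsigned nqueues)
{
    assert(nqueues > 0);
    return toy_tag_src(tag) % nqueues;
}
```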
There's no clear link between the providers and what's needed to implement the fabric object and its related interfaces. Maybe we need a fabric provider? Or providers need interfaces that allow the framework to implement fabric interface support in a generic fashion.
There are 3 identified cases where an operation can complete on the initiator side of a data transfer. The first is when the data buffer is reusable. The second is when the transfer has been ack'ed by the remote side (FI_REMOTE_ACK). The third is when the remote side has placed the data into a fault domain outside of the fabric hardware (such as memory, NVM, or hard disk) -- FI_REMOTE_COMPLETE. Applications may have use for any of these notification types. See if remote ack and remote complete both need to exist, and if so, define them.
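A minimal sketch of how the three notification points could be expressed as ordered levels, assuming each level implies the ones before it (remote complete implies remote ack, which implies the buffer is reusable). That ordering is an assumption for illustration, and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical completion levels, ordered on the assumption that each
 * stage implies the ones before it. */
enum toy_comp_level {
    TOY_COMP_LOCAL           = 1, /* data buffer is reusable */
    TOY_COMP_REMOTE_ACK      = 2, /* remote side acknowledged the transfer */
    TOY_COMP_REMOTE_COMPLETE = 3, /* data placed in memory, NVM, or disk */
};

/* Report a completion once the transfer has progressed far enough. */
static bool toy_should_report(enum toy_comp_level requested,
                              enum toy_comp_level reached)
{
    assert(requested >= TOY_COMP_LOCAL &&
           requested <= TOY_COMP_REMOTE_COMPLETE);
    return reached >= requested;
}
```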
Add message (MSG) endpoint support to the socket provider.
Enhance the AV API to accept a name and service parameter for insertion, to align with fi_getinfo.
Figure out what this note means: "remove wait from EQ attr", and do it. Next time provide more context to my notes.
The endpoint attribute msg_tag_value is defined as a maximum value. Redefine as a number of bits. This is needed to align with defining the tagged bits as fields, rather than generic bits.
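A minimal sketch of converting between the two encodings, assuming tags are unsigned 64-bit values. A bit count maps directly onto field widths such as the 32/16/16 split used for tagged messages. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Largest tag value representable in the given number of bits. */
static uint64_t toy_bits_to_max(unsigned bits)
{
    assert(bits <= 64);
    return bits == 64 ? UINT64_MAX : ((uint64_t) 1 << bits) - 1;
}

/* Number of bits needed to represent max (position of highest set bit). */
static unsigned toy_max_to_bits(uint64_t max)
{
    unsigned bits = 0;
    while (max) {
        bits++;
        max >>= 1;
    }
    return bits;
}
```

Note the special case for 64 bits: shifting a 64-bit value by 64 is undefined in C, so the all-ones value is returned directly.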