alibaba / GraphAr
An open source, standard data file format for graph data storage and retrieval.
Home Page: https://graphar.apache.org/
License: Apache License 2.0
Is your feature request related to a problem? Please describe.
Implement the Spark Reader to provide functions for reading GraphAr files into Spark DataFrames.
Describe the solution you'd like
The reader should include VertexReader and EdgeReader:
Is your feature request related to a problem? Please describe.
We selected GraphScope as GraphAr's first landing system, to serve as an example of how to use GraphAr.
Implement a Fragment writer in GraphScope with GraphAr to support dumping the in-memory property graph to GraphAr format files.
Describe the solution you'd like
The writer works as follows:
FragmentWriter loads the YAML files as Info objects (GraphInfo, VertexInfo and EdgeInfo) and uses the ArrowChunkWriter API of GraphAr to dump the Arrow tables to GraphAr format files.
Here is a prototype implementation of FragmentWriter
Is your feature request related to a problem? Please describe.
Use GraphScope as our first landing system.
Run mvn test -Dsuites='com.alibaba.graphar.WriterSuite test edge writer with vertex table and edge table'.
The output path of offset chunk 0 is /tmp/edge/person_knows_person/ordered_by_source/offset/part0,
but the getAdjListOffsetFilePath method of EdgeInfo returns /tmp/edge/person_knows_person/ordered_by_source/offset0.
https://github.com/alibaba/GraphAr/blob/0991064e3f5a5844d453d2743bc2b03dc65fdf14/spark/src/main/scala/com/alibaba/graphar/EdgeInfo.scala#L291
cd spark
mvn test -Dsuites='com.alibaba.graphar.WriterSuite test edge writer with vertex table and edge table'
Is your feature request related to a problem? Please describe.
Currently, the GraphAr libraries are only available for C++ and Spark. But many graph processing systems are implemented in other programming languages (for example, Neo4j in Java). We need to provide libraries for more programming languages.
Describe the solution you'd like
Implement library with
Is your feature request related to a problem? Please describe.
Currently, the GraphAr C++ and Spark libraries support only a few basic data types (BOOL, INT32, INT64, FLOAT, DOUBLE, and STRING). To serve more scenarios, more built-in data types need to be added to the GraphAr libraries.
Describe the solution you'd like
Add more common data types to the GraphAr libraries, such as DATE, TIME, BINARY, STRUCT, MAP, ARRAY, and JSON. Since these types are not always supported by the CSV/ORC/Parquet file types and the C++/Spark standard libraries, each case needs careful handling, e.g., performing the necessary type conversions.
Is your feature request related to a problem? Please describe.
We selected GraphScope as GraphAr's first landing system, to serve as an example of how to use GraphAr.
Implement a Fragment builder in GraphScope with GraphAr to support building the in-memory property graph from GraphAr format files.
Describe the solution you'd like
The builder works as follows:
FragmentBuilder loads the YAML files as Info objects (GraphInfo, VertexInfo and EdgeInfo), uses the ArrowChunkReader API of GraphAr to load the chunk files as Arrow tables (vertex table, edge table and offset table), and uses these tables to construct the fragment.
Here is a prototype implementation of FragmentBuilder
Is your feature request related to a problem? Please describe.
We need a widely used property graph to demonstrate the GAR file format. The LDBC dataset seems to be a good choice.
This can be a good first issue for a developer.
Describe the solution you'd like
Generate files in GraphAr format for a property graph, including:
csv, for ease of reading
Additional context
related to issue #37
Is your feature request related to a problem? Please describe.
Use ProtoBuf so that the metadata of the GAR file format behaves exactly the same across different languages.
Is your feature request related to a problem? Please describe.
Currently, the GraphAr Info classes allow users to add custom data types based on the info version (#27). However, the Reader/Writer implementations of our libraries do not support reading or writing data in user-defined types.
Describe the solution you'd like
Extend the GraphAr libraries so that a user-defined parser can be passed to the Reader/Writer to handle the custom data types.
Currently, the images in the README are not shown on alibaba.github.io/GraphAr. Fix the image links in the README.
When I use the gar library in my project with
target_link_libraries(my_example PUBLIC ${GAR_LIBRARIES})
and build my project, I get the error:
-larrow_static not found
It looks like the linked target has inherited the gar library's dependencies.
https://github.com/alibaba/GraphAr/blob/e8edfe38aa776f091dce24f4480fae06827194f4/CMakeLists.txt#L170-L172
The user's project should not inherit GraphAr's dependency interface.
project(MyExample)
find_package(gar REQUIRED)
include_directories(${GAR_INCLUDE_DIRS})
add_executable(my_example my_example.cc)
target_compile_features(my_example PRIVATE cxx_std_17)
target_link_libraries(my_example PRIVATE ${GAR_LIBRARIES})
Is your feature request related to a problem? Please describe.
GraphAr Spark tools are needed as a library that makes it easy to generate, load and transform GAR files with Apache Spark.
Describe the solution you'd like
GraphAr Spark tools consist of the following parts:
Is your feature request related to a problem? Please describe.
Add some good first issues for developers.
In documentation and meta data, the prefix is "./vertex/person/first_name_last_name_gender", but the file path for property chunks is "./vertex/person/firstName_lastName_gender".
documentation: https://alibaba.github.io/GraphAr/user-guide/getting-started.html#property-data
meta data: https://github.com/acezen/gar-test/blob/master/ldbc_sample/csv/person.vertex.yml
file path for the chunks: https://github.com/acezen/gar-test/tree/master/ldbc_sample/csv/vertex/person/firstName_lastName_gender
Below is a high-level road map view for GraphAr to provide a sense of direction of where the project is going. This can change at any point and does not reflect many features and improvements that will also be included as part of the journey along this road map. For more granular detail of what will be included in upcoming releases you can review the project milestones as defined in our Release Process documentation.
Format Spec
C++
Java
Spark
Python
Is your feature request related to a problem? Please describe.
Add a dedicated page to the GraphAr documentation to introduce the Spark tools.
Describe the solution you'd like
The document would include:
Is your feature request related to a problem? Please describe.
To avoid including Arrow's headers in GraphAr headers, we need to remove the arrow/api.h include from graph.h and move the related code to graph.cc.
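One common way to keep a heavy dependency out of a public header is the pimpl idiom: the header only forward-declares an implementation struct, and the dependency is included in the .cc file. The sketch below shows the idea in a single file. The Graph class, its members, and the use of std::vector as a stand-in for Arrow types are all hypothetical; this is not GraphAr's actual code.

```cpp
#include <cstddef>
#include <memory>
#include <vector>  // in a real split, this include would move to the .cc part

// ---- would live in graph.h: no heavy includes are needed here ----
class Graph {
 public:
  Graph();
  ~Graph();  // defined below, where Impl is a complete type
  void AddVertex(long id);
  std::size_t NumVertices() const;

 private:
  struct Impl;                  // forward declaration only
  std::unique_ptr<Impl> impl_;  // owns the hidden state
};

// ---- would live in graph.cc: the heavy header (e.g. arrow/api.h) goes here ----
struct Graph::Impl {
  // Stand-in for state that would otherwise expose Arrow types in the header.
  std::vector<long> vertex_ids;
};

Graph::Graph() : impl_(std::make_unique<Impl>()) {}
Graph::~Graph() = default;

void Graph::AddVertex(long id) { impl_->vertex_ids.push_back(id); }
std::size_t Graph::NumVertices() const { return impl_->vertex_ids.size(); }
```

With this layout, code that includes the header part compiles without seeing the hidden dependency, at the cost of one extra pointer indirection per call.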
Is your feature request related to a problem? Please describe.
Improve the documentation that introduces the GAR file format to make it clearer. Also, reorganize and improve the examples to help users get started with GraphAr.
Is your feature request related to a problem? Please describe.
An important use case of GraphAr is serving out-of-core graph processing scenarios. With the graph data saved as GAR files on disk, GraphAr provides a set of reading interfaces that load part of the graph data into memory when needed to conduct analytics.
Since disk I/O time usually dominates the overall execution time in out-of-core graph processing, it is critically important that the GraphAr C++ library performs efficiently when traversing vertices/edges through high-level graph iterators.
Is your feature request related to a problem? Please describe.
Implement Info classes for the GraphAr Spark tool. The Info classes include GraphInfo, VertexInfo and EdgeInfo, and align with the classes of the C++ SDK.
Describe the solution you'd like
Here is a proposed API for the Info classes:
import scala.beans.BeanProperty

class Property() {
  @BeanProperty var name: String = ""
  @BeanProperty var data_type: String = ""
  @BeanProperty var is_primary: Boolean = false
}
// methods of Property:
// -- getName: String
// -- getData_type: String
// -- getData_type_in_gar: GarType.Value
// -- getIs_primary: Boolean
class PropertyGroup() {
  @BeanProperty var prefix: String = ""
  @BeanProperty var file_type: String = ""
  @BeanProperty var properties = new java.util.ArrayList[Property]()
}
// methods of PropertyGroup:
// -- getPrefix: String
// -- getFile_type: String
// -- getFile_type_in_gar: FileType.Value
// -- getProperties: ArrayList[Property]
class AdjList() {
  @BeanProperty var ordered: Boolean = false
  @BeanProperty var aligned_by: String = "src"
  @BeanProperty var prefix: String = ""
  @BeanProperty var file_type: String = ""
  @BeanProperty var property_groups = new java.util.ArrayList[PropertyGroup]()
}
// methods of AdjList:
// -- getOrdered: Boolean
// -- getAligned_by: String
// -- getPrefix: String
// -- getFile_type: String
// -- getFile_type_in_gar: FileType.Value
// -- getAdjList_type: String
// -- getAdjList_type_in_gar: AdjListType.Value
// -- getProperty_groups: ArrayList[PropertyGroup]
class GraphInfo() {
  @BeanProperty var name: String = ""
  @BeanProperty var prefix: String = ""
  @BeanProperty var vertices = new java.util.ArrayList[String]()
  @BeanProperty var edges = new java.util.ArrayList[String]()
  @BeanProperty var version: String = ""
}
// methods of GraphInfo:
// -- getName: String
// -- getPrefix: String
// -- getVertices: ArrayList[String]
// -- getEdges: ArrayList[String]
// -- getVersion: String
class VertexInfo() {
  @BeanProperty var label: String = ""
  @BeanProperty var chunk_size: Long = 0
  @BeanProperty var prefix: String = ""
  @BeanProperty var property_groups = new java.util.ArrayList[PropertyGroup]()
  @BeanProperty var version: String = ""
}
// methods of VertexInfo:
// -- getLabel: String
// -- getChunk_size: Long
// -- getPrefix: String
// -- getProperty_groups: ArrayList[PropertyGroup]
// -- getVersion: String
// -- containPropertyGroup(property_group: PropertyGroup): Boolean
// -- containProperty(property_name: String): Boolean
// -- getPropertyGroup(property_name: String): PropertyGroup
// -- getPropertyType(property_name: String): GarType.Value
// -- isPrimaryKey(property_name: String): Boolean
// -- getPrimaryKey(): String
// -- isValidated(): Boolean
// -- getVerticesNumFilePath(): String
// -- getFilePath(property_group: PropertyGroup, chunk_index: Long): String
// -- getDirPath(property_group: PropertyGroup): String
class EdgeInfo() {
  @BeanProperty var src_label: String = ""
  @BeanProperty var edge_label: String = ""
  @BeanProperty var dst_label: String = ""
  @BeanProperty var chunk_size: Long = 0
  @BeanProperty var src_chunk_size: Long = 0
  @BeanProperty var dst_chunk_size: Long = 0
  @BeanProperty var directed: Boolean = false
  @BeanProperty var prefix: String = ""
  @BeanProperty var adj_lists = new java.util.ArrayList[AdjList]()
  @BeanProperty var version: String = ""
}
// methods of EdgeInfo:
// -- getSrc_label: String
// -- getEdge_label: String
// -- getDst_label: String
// -- getChunk_size: Long
// -- getSrc_chunk_size: Long
// -- getDst_chunk_size: Long
// -- getDirected: Boolean
// -- getPrefix: String
// -- getAdj_lists: ArrayList[AdjList]
// -- containAdjList(adj_list_type: AdjListType.Value): Boolean
// -- getAdjListPrefix(adj_list_type: AdjListType.Value): String
// -- getAdjListFileType(adj_list_type: AdjListType.Value): FileType.Value
// -- containPropertyGroup(property_group: PropertyGroup, adj_list_type: AdjListType.Value): Boolean
// -- containProperty(property_name: String): Boolean
// -- getPropertyGroups(adj_list_type: AdjListType.Value): java.util.ArrayList[PropertyGroup]
// -- getPropertyType(property_name: String): GarType.Value
// -- getPropertyGroup(property_name: String, adj_list_type: AdjListType.Value): PropertyGroup
// -- isPrimaryKey(property_name: String): Boolean
// -- getPrimaryKey(): String
// -- isValidated(): Boolean
// -- getAdjListOffsetFilePath(chunk_index: Long, adj_list_type: AdjListType.Value): String
// -- getAdjListOffsetDirPath(adj_list_type: AdjListType.Value): String
// -- getAdjListFilePath(vertex_chunk_index: Long, chunk_index: Long, adj_list_type: AdjListType.Value): String
// -- getAdjListDirPath(adj_list_type: AdjListType.Value): String
// -- getPropertyFilePath(property_group: PropertyGroup, adj_list_type: AdjListType.Value, vertex_chunk_index: Long, chunk_index: Long): String
// -- getPropertyDirPath(property_group: PropertyGroup, adj_list_type: AdjListType.Value): String
// -- getVersion: String
Is your feature request related to a problem? Please describe.
The current README of GraphAr is a little clumsy and incomplete. It does not help users and developers understand what GraphAr is.
Is your feature request related to a problem? Please describe.
In the GAR file format, the global index of each vertex is important: it must be continuous and unique.
The original data sources for Spark (e.g., the vertex DataFrame and the edge DataFrame) usually do not contain such a column.
IndexGenerator is a helper object that generates vertex indices for vertex DataFrames and edge DataFrames.
Describe the solution you'd like
Here is an API proposal for IndexGenerator:
import org.apache.spark.sql.DataFrame

object IndexGenerator {
  // helper methods for vertex DataFrame

  // return a DataFrame that contains two columns: vertex index & primary key
  def constructVertexIndexMapping(vertexDf: DataFrame, primaryKey: String): DataFrame = ???

  // add a column that contains the vertex index
  def generateVertexIndexColumn(vertexDf: DataFrame): DataFrame = ???

  // helper methods for edge DataFrame

  // generate index from vertex mapping

  // join the edge table with the vertex index mapping for source column
  def generateSrcIndexForEdgesFromMapping(edgeDf: DataFrame, srcColumnName: String, srcIndexMapping: DataFrame): DataFrame = ???

  // join the edge table with the vertex index mapping for destination column
  def generateDstIndexForEdgesFromMapping(edgeDf: DataFrame, dstColumnName: String, dstIndexMapping: DataFrame): DataFrame = ???

  // join the edge table with the vertex index mapping for source & destination columns
  def generateVertexIndexForEdgesFromMapping(edgeDf: DataFrame, srcColumnName: String, dstColumnName: String, srcIndexMapping: DataFrame, dstIndexMapping: DataFrame): DataFrame = ???

  // generate index by sorting the src/dst column

  // construct vertex index for source column
  def generateSrcIndexForEdges(edgeDf: DataFrame, srcColumnName: String): DataFrame = ???

  // construct vertex index for destination column
  def generateDstIndexForEdges(edgeDf: DataFrame, dstColumnName: String): DataFrame = ???

  // construct vertex index for source & destination columns together
  def generateSrcAndDstIndexUnitedlyForEdges(edgeDf: DataFrame, srcColumnName: String, dstColumnName: String): DataFrame = ???
}
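The mapping-based approach (build a primary key to index mapping, then join the edges against it) can be illustrated outside Spark with plain containers. The following is a minimal C++ sketch of the idea only; it is not the Spark implementation, and the names IndexMap, BuildVertexIndexMapping and IndexEdges are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using IndexMap = std::unordered_map<std::string, std::int64_t>;

// Assign a continuous, unique index (0, 1, 2, ...) to each distinct primary
// key, in order of first appearance.
IndexMap BuildVertexIndexMapping(const std::vector<std::string>& primary_keys) {
  IndexMap mapping;
  for (const auto& key : primary_keys) {
    // emplace inserts only when the key is new, so duplicates keep their index.
    mapping.emplace(key, static_cast<std::int64_t>(mapping.size()));
  }
  return mapping;
}

// Translate (src key, dst key) edges into (src index, dst index) pairs.
// Throws std::out_of_range if an endpoint is missing from its mapping.
std::vector<std::pair<std::int64_t, std::int64_t>> IndexEdges(
    const std::vector<std::pair<std::string, std::string>>& edges,
    const IndexMap& src_mapping, const IndexMap& dst_mapping) {
  std::vector<std::pair<std::int64_t, std::int64_t>> indexed;
  indexed.reserve(edges.size());
  for (const auto& edge : edges) {
    indexed.emplace_back(src_mapping.at(edge.first),
                         dst_mapping.at(edge.second));
  }
  return indexed;
}
```

For a person_knows_person style edge, the same mapping would be passed for both the source and the destination side; edges between two different vertex labels would use two separate mappings.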
Is your feature request related to a problem? Please describe.
The graph data migration between NebulaGraph and GraphAr could be an important application of GraphAr. This can be implemented based on the NebulaGraph Spark connector and the GraphAr Spark library, including reading graph data from NebulaGraph to generate GAR files, and reading from GraphAr to create/update instances in NebulaGraph.
Describe the solution you'd like
Please refer to the integration with Neo4j (#107).
Is your feature request related to a problem? Please describe.
Currently, CSV chunk files generated by the C++/Spark writers do not contain a header row, so the schema information of the data is lost. We should include the header row when generating CSV chunk files.
Describe the solution you'd like
Enable the include_header option of the C++ chunk writer; see https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N5arrow3csv12WriteOptions14include_headerE
Enable the header option of the Spark DataFrame writer; see https://spark.apache.org/docs/latest/sql-data-sources-csv.html
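To show what the header row contributes, the sketch below serializes a small chunk to CSV with and without a header line. It is a standalone illustration that uses neither Arrow nor Spark; WriteCsvChunk and its parameters are hypothetical names, and quoting or escaping of field values is deliberately left out.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Serialize rows to CSV. When include_header is true, the column names are
// written as the first line so that a reader can recover the schema.
// Note: fields are written verbatim, without quoting or escaping.
std::string WriteCsvChunk(const std::vector<std::string>& column_names,
                          const std::vector<std::vector<std::string>>& rows,
                          bool include_header) {
  std::ostringstream out;
  auto write_line = [&out](const std::vector<std::string>& fields) {
    for (std::size_t i = 0; i < fields.size(); ++i) {
      if (i > 0) out << ',';
      out << fields[i];
    }
    out << '\n';
  };
  if (include_header) write_line(column_names);
  for (const auto& row : rows) write_line(row);
  return out.str();
}
```

Without the header, a reader of the chunk sees only the values and has to obtain the column names from the metadata file.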
Use the documentation URL instead of a raw file link to the repo.
Is your feature request related to a problem? Please describe.
Since the C++ library was the first library supported by GraphAr, its code sits directly in the root of the source tree. To make it easy to add libraries for other languages, we need to reorganize the code directory like this:
.
├── cpp (c++ library code)
├── docs
├── examples
├── spark
└── thirdparty
Is your feature request related to a problem? Please describe.
LDBC provides a synthetic graph generator that runs on Spark (https://github.com/ldbc/ldbc_snb_datagen_spark). We can use the GraphAr Spark library to integrate with this graph generator and dump the generated graph data into GraphAr files.
Describe the solution you'd like
Refer to the API reference of the Reader/Writer and the graph-level interface of the GraphAr Spark library. The integration with the Neo4j Spark connector (#107) can also help.
Is your feature request related to a problem? Please describe.
The version attribute of the infos (graph, vertex, edge) is currently only a number. It could also carry the implicit information of which property data types that version supports. As the version grows, the supported data types can be extended. For example:
version 1 -> supports bool, int32, int64, float, double, string
version 2 -> supports bool, int32, int64, float, double, string, date32
Describe the solution you'd like
Use a string instead of a number as the version, similar to a browser's User-Agent string.
Version examples:
gar/v1
gar/v2
gar/v3 (user_define1, user_define2)
# suppose version 3 or higher supports user-defined types.
Add a VersionMeta class that records the data types supported by each version and parses the version string.
If the YAML contains a data type that its version does not support, raise an error to the user.
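The parsing and type check described above could look roughly like the following C++ sketch. ParsedVersion, ParseVersion and IsTypeSupported are hypothetical names, not an existing GraphAr API. The built-in type lists are taken from the version 1 and version 2 examples above, and treating every version from 2 upward as supporting date32 is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct ParsedVersion {
  int number = 0;                       // the N in "gar/vN"
  std::vector<std::string> user_types;  // names from the optional "(...)" part
};

// Remove leading and trailing whitespace.
static std::string Trim(const std::string& s) {
  std::size_t begin = 0, end = s.size();
  while (begin < end && std::isspace(static_cast<unsigned char>(s[begin]))) ++begin;
  while (end > begin && std::isspace(static_cast<unsigned char>(s[end - 1]))) --end;
  return s.substr(begin, end - begin);
}

// Parse strings such as "gar/v1" or "gar/v3 (user_define1, user_define2)".
ParsedVersion ParseVersion(const std::string& text) {
  const std::string prefix = "gar/v";
  if (text.compare(0, prefix.size(), prefix) != 0) {
    throw std::invalid_argument("version must start with gar/v: " + text);
  }
  ParsedVersion result;
  std::size_t pos = prefix.size();
  const std::size_t digits_begin = pos;
  while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) ++pos;
  if (pos == digits_begin) {
    throw std::invalid_argument("missing version number: " + text);
  }
  result.number = std::stoi(text.substr(digits_begin, pos - digits_begin));

  const std::size_t open = text.find('(', pos);
  if (open == std::string::npos) return result;  // no user-defined types
  const std::size_t close = text.find(')', open);
  if (close == std::string::npos) {
    throw std::invalid_argument("unclosed '(' in version: " + text);
  }
  // Split the text between the parentheses on commas.
  std::size_t start = open + 1;
  while (start <= close) {
    std::size_t comma = text.find(',', start);
    if (comma == std::string::npos || comma > close) comma = close;
    const std::string name = Trim(text.substr(start, comma - start));
    if (!name.empty()) result.user_types.push_back(name);
    start = comma + 1;
  }
  return result;
}

// Assumption: versions >= 1 support the version 1 list, versions >= 2 add
// date32, and any other type must be listed as a user-defined type.
bool IsTypeSupported(const ParsedVersion& version, const std::string& type) {
  static const std::vector<std::string> kV1Types = {
      "bool", "int32", "int64", "float", "double", "string"};
  if (std::find(kV1Types.begin(), kV1Types.end(), type) != kV1Types.end()) {
    return version.number >= 1;
  }
  if (type == "date32") return version.number >= 2;
  return std::find(version.user_types.begin(), version.user_types.end(),
                   type) != version.user_types.end();
}
```

A loader could call IsTypeSupported for every property data type found in the YAML and raise an error on the first one that returns false.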
A code of conduct helps establish expectations for the behavior of the project's participants and facilitates healthy, constructive community behavior.
We should add a document to the root of the git repository to direct interested individuals to the CoC.
Is your feature request related to a problem? Please describe.
Optimize the Spark Reader to read multiple chunks in parallel for better performance, while maintaining the relative order of the chunks in the resulting DataFrame.
Is your feature request related to a problem? Please describe.
Implement the writer of the Spark tool to provide functions for generating GraphAr format files from Hive tables.
Describe the solution you'd like
It is better to read the Hive table as a Spark DataFrame and use DataFrame operators to generate the files.
The writer should include VertexWriter and EdgeWriter.
VertexWriter provides functions to generate chunk files of property groups based on the user-defined vertex info.
EdgeWriter provides functions to generate chunk files of adj lists, offsets and property groups based on the user-defined edge info.
Is your feature request related to a problem? Please describe.
GraphAr's chunk files can currently be stored in ORC, Parquet or CSV. We can support more built-in file formats, such as JSON, HDF5 and Avro, to extend the capabilities of GraphAr and satisfy different file format requirements.
Describe the solution you'd like
Support more file types by extending the metadata information and implementing the related reading/writing functions with the help of Arrow or other third-party libraries.
Is your feature request related to a problem? Please describe.
Currently, the low-level writers (VertexPropertyWriter and EdgeChunkWriter) only support writing Arrow tables, so users must construct such tables before writing (e.g., when writing PageRank results saved in a std::vector into GAR files). The high-level writers (VerticesBuilder and EdgesBuilder) require constructing Vertex/Edge objects first, which are internal high-level data structures in GraphAr.
Describe the solution you'd like
We propose to provide more built-in writing methods in the C++ Writer SDK, to support additional data structures besides Arrow tables and GraphAr Vertex/Edge. A possible solution is to use containers from the STL, as the Boost Graph Library does, including:
Is your feature request related to a problem? Please describe.
Add a GitHub Action to simplify the release process of GraphAr, and add a release tutorial that shows maintainers how to cut a version.
Describe the solution you'd like
Simplify the process with a tool such as action-automatic-releases.
Is your feature request related to a problem? Please describe.
GraphAr is already integrated into v6d, which enables loading graphs from GraphAr. We should add some easy-to-use APIs to the GraphScope client so that users can easily archive graphs to GraphAr and load graphs from GraphAr.
Is your feature request related to a problem? Please describe.
Currently, the examples of GraphAr are implemented like unit tests, so they do not show users or beginning developers intuitively how to use GraphAr.
We need to revise the implementation so that they read more like examples and showcases.
Additional context
related to #37
Is your feature request related to a problem? Please describe.
In real use cases, the graph data is usually continuously changing, including adding, deleting, and modifying vertices or edges. As part of incremental management functions, we intend to extend the GraphAr Spark tools to support adding new rows/columns conveniently and efficiently.
Describe the solution you'd like
Support adding new rows/columns to vertex/edge tables and dumping the new data by generating new GAR files or by appending to or rewriting existing GAR files.
Is your feature request related to a problem? Please describe.
Release version v0.1.0
Is your feature request related to a problem? Please describe.
The GraphAr Spark tools can be applied to scenarios where the graph format needs to be transformed. They can also be used when taking GraphAr as a data source to execute SQL queries or do graph processing. We can add some examples to show these use cases.
Describe the solution you'd like
Add examples that utilize the Spark tools to:
as titled.
Is your feature request related to a problem? Please describe.
Based on the graph information file design (example), the AdjList of a graph contains information about the alignment, the edge chunk file type, and the property_groups of the edge. But in the C++ library, AdjList is only an enum type, and the other information is stored in EdgeInfo in maps.
This does not align with the YAML file design. To address the problem, maybe we should add an intermediate structure, AdjListInfo, between PropertyGroup and EdgeInfo to keep track of the adj list information of the graph.
Describe the solution you'd like
AdjListInfo could look like this (just a proposal):
#include <string>
#include <vector>

// FileType and PropertyGroup are the existing GraphAr types.
class AdjListInfo {
 public:
  // Constructor
  AdjListInfo(FileType file_type, std::string prefix);
  // some add methods
  void AddPropertyGroup(const PropertyGroup& pg);
  // some getter methods
  FileType GetFileType() const;

 private:
  FileType file_type_;
  std::string prefix_;
  std::vector<PropertyGroup> property_groups_;
};
Then, use AdjListInfo objects as member variables to update the implementation of EdgeInfo.
Add these:
Is your feature request related to a problem? Please describe.
GraphAr currently supports the CSV, ORC and Parquet file formats, and it is going to support more file types such as JSON, HDF5 and Avro. To make reading, writing and storing the data more efficient, the features of the different file formats should be considered and fully utilized, for example by applying the most appropriate compression and encoding scheme to the data, or by enabling filter pushdown to improve query performance.