joaofelipe / apted Goto Github PK
View Code? Open in Web Editor NEWPython APTED algorithm for the Tree Edit Distance
License: MIT License
Python APTED algorithm for the Tree Edit Distance
License: MIT License
My tree is include dict and list, for example: data1 = {"a": [{"b": []}, {"c": [{"h":[]}]}]}, data2 = {"a": [{"b": [{"m":[]}]}, {"c": [{"h":[{"k":[]}]}]}]}, the edit distance always 1. The code:
from apted import APTED
from apted.helpers import Tree
data1 = {"a": [{"b": []}, {"c": [{"h":[]}]}]}
data2 = {"a": [{"b": [{"m":[]}]}, {"c": [{"h":[{"k":[]}]}]}]}
tree1 = Tree(data1)
tree2 = Tree(data2)
apted = APTED(tree1, tree2)
res = apted.compute_edit_distance()
I am trying to use this library to compare ASTs of python code (specifically, I'm comparing an incorrect solution to a problem with several correct solutions, to pick a closest one).
I created a custom Config to specify how to get children and node names out of an ast node(below). This seems to work, except that I sometimes get negative edit distances; and other times I get edit distances that are clearly too small.
Below is a complete example, including all my logic for dealing with python ASTs. It returns -8 for the edit distance. Please advise.
import ast
from apted import APTED, Config
# for the purpose of tree differencing, we consider both the ast.Node objects and various literals representing values, ids, etc. to be nodes in the AST.
# therefore, we assume that all nodes that are passed to functions in astConfig are either ast.Node objects, or literals of some kind
class AstConfig(Config):
def get_node_name(node):
if hasattr(node, '_fields'):
# this is an ast.Node object - return its type
return type(node).__name__
else:
# this must be a literal, e.g. variable name, const value, etc.
return str(node)
def children(self, node):
if hasattr(node, '_fields'):
# this is a Node object. It has all its children listed in _fields.
# a child could be:
# a) a literal, e.g. the name of the variable
# b) a list of nodes, e.g. list of expressions in the body of a function
# c) a direct child node.
# we combine these into one flat list.
children = []
for childname in node._fields:
child = getattr(node, childname)
if isinstance(child, list):
children.extend(child)
else:
children.append(child)
return children
else:
# this node itself is a literal; it has no children.
return []
def rename(self, node1, node2):
name1 = AstConfig.get_node_name(node1)
name2 = AstConfig.get_node_name(node2)
return 1 if name1 != name2 else 0
# test apted+ast:
p1 = ast.parse('''def helloWorld():
print ('Hello World!')''')
p2 = ast.parse('''def helloWorld():
return 'Hello World!'
''')
apted = APTED(p1, p2, AstConfig())
print(apted.compute_edit_distance())
print(apted.compute_edit_mapping())
The mappings do seem to be reasonable, and picking the smallest edit distance does seem to choose the closest correct solution; so whatever is going on, it at least seems to be consistent across different pairings.
How do you get the "Maximum" possible TED ?
The TED is integer.
I want to get in some way the "maximum" TED between two trees.
The reason is I need a score with value between 0 and 1 i.e. score = distance/max
Is this feasible ?
Hi there
Thanks for your excellent repo!
My question is that if I want to normalize the computed tree edit distance into a certain range (e.g., between 0 and 1), what is the best way to do that? Or more mathematically, what is the upper bound of tree edit distance between arbitrary two trees?
Thanks in advance:)
Does it work with python 2.7 ?
Hello!
Thank you for making this! Do you happen to have a programmatic hello world example?
I want to try the algorithm, but I'm not sure how to go about creating a tree in python
to hand off to your edit distance calculation.
Thank you :)
In the docs you say : class CustomConfig(Config):
class CustomConfig(Config): ...
but my tree is :
class Tree(list): ....
how do I solve that ?
Example :
['root',
[['comam', [['qecer', [['dsvek', [['t', ['t']], ['h', ['h']]]], ['c', ['c']]]], ['e', ['e']]]],
['oflgd', [['yvsro', [['dptbp', [['d', ['d']], ['d', ['d']]]], ['tjbpc', [['c', ['c']], ['g', ['g']]]]]], ['f', ['f']]]],
['fqtgd', [['yvsro', [['dptbp', [['d', ['d']], ['d', ['d']]]], ['tjbpc', [['c', ['c']], ['g', ['g']]]]]], ['t', ['t']]]],
['dablv',
[['yvsro', [['dptbp', [['d', ['d']], ['d', ['d']]]], ['tjbpc', [['c', ['c']], ['g', ['g']]]]]], ['qecer', [['dsvek', [['t', ['t']], ['h', ['h']]]], ['c', ['c']]]]]]]]
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.