Comments (5)
Hi Marco,
It seems that there is a bug in the SubSetTreeKernel.
Since t1=t2, we expect that SSTK(t1,t2) = SSTK(t1,t1) = SSTK(t2,t2) = 5.
I am investigating this bug and will fix it ASAP.
Best,
Simone
from kelp-full.
Hi Marco,
I should have solved the bug (which was actually affecting both STK and SSTK, although with your toy example it was visible only on the SSTK). After additional sanity checks we will release the new version. For the moment, please use the development branch of the latest version of KeLP to get the problem solved.
Cheers,
Simone
Hello @SimoneFilice ,
thank you very much for your reply and solution.
I'm still struggling to understand some results, but I think it's due to something that I still don't get in the definitions of the different tree kernels.
I'm considering the definitions reported on the KeLP website and I'm executing these lines of code:
float lambda = 1f;
float mu = 1f;
int terminal_factor = 1;
float threshold = 0.001f;
TreeRepresentation t1 = new TreeRepresentation();
t1.setDataFromText("(a (b) (c (d)))");
TreeRepresentation t2 = new TreeRepresentation();
t2.setDataFromText("(a (b) (c (d (e))))");
SubSetTreeKernel subSetTreeKernel = new SubSetTreeKernel(lambda, "tree");
SubTreeKernel subTreeKernel = new SubTreeKernel(lambda, "tree");
PartialTreeKernel partialTreeKernel = new PartialTreeKernel(lambda, mu, terminal_factor, "tree");
System.out.println("Similarity t1-t2 SubTreeKernel res = " + subTreeKernel.kernelComputation(t1, t2));
System.out.println("Similarity t1-t2 SubSetTreeKernel res = " + subSetTreeKernel.kernelComputation(t1, t2));
System.out.println("Similarity t1-t2 PartialTreeKernel res = " + partialTreeKernel.kernelComputation(t1, t2));
The output I get is:
Similarity t1-t2 SubTreeKernel res = 3.0
Similarity t1-t2 SubSetTreeKernel res = 4.0
Similarity t1-t2 PartialTreeKernel res = 10.0
I can understand why I get 10 as PartialTreeKernel similarity result, but I can't figure out the other results.
I'm reasoning as follows:
SubTree(s): I calculate SubTrees as "nodes with their complete descendancy":
subTrees for t1
(a (b) (c (d))), (c (d)), (b), (d)
subTrees for t2
(a (b) (c (d (e)))), (c (d (e))), (d (e)), (b), (e)
so I expect to get "1" as similarity value between t1 and t2
SubSetTree(s): I calculate SubSetTrees as "nodes with either all their children or none of them":
subSetTrees for t1
(a (b) (c (d))), (c (d)), (a (b) (c)), (a), (b), (c), (d)
subSetTrees for t2
(a (b) (c (d))), (c (d)), (a (b) (c)), (a), (b), (c), (d), (a (b) (c (d (e)))), (c (d (e))), (d (e)), (e)
so I expect to get "7" as similarity value between t1 and t2
As partialTrees (considered as "nodes with their partial production and descendancy") I expect to find all subSetTrees plus:
additional partialTrees for t1
(a (b)), (a (c)), (a (c (d)))
additional partialTrees for t2
(a (b)), (a (c)), (a (c (d))), (a (c (d (e))))
so I expect to get "10" as similarity value between t1 and t2
Am I wrong?
Thanks for your attention.
Best regards
Marco
Hi Marco,
I'm sorry for the late answer, but solving the issue took a while. Again, please use the development branch until we release a new version of KeLP.
Let's clarify how the Subtree Kernel and SubSet Tree Kernel work in their standard setting. They both search for common productions in trees. This means that individual nodes are not considered valid matching fragments.
So, given the trees t1=(a (b) (c (d))) and t2=(a (b) (c (d (e)))), we have the following kernel computations:
SUB TREE KERNEL
STK(t1,t2) = 0, as no common productions occur.
STK(t1,t1) = 2, and the matching fragments are:
- (a (b) (c (d)))
- (c (d))
SUBSET TREE KERNEL
SSTK(t1,t2) = 3, and the matching fragments are the following:
- (a (b) (c))
- (a (b) (c (d)))
- (c (d))
SSTK(t1,t1) = 3, and the matching fragments are:
- (a (b) (c))
- (a (b) (c (d)))
- (c (d))
Please consider that by default the KeLP implementations of the STK and SSTK also perform a matching between tree leaves. In constituency trees where leaves are words, this corresponds to combining the tree kernel with a linear kernel on a Bag-of-Words representation. To disable this additional analysis you need to use the setIncludeLeaves method:
subTreeKernel.setIncludeLeaves(false);
subSetTreeKernel.setIncludeLeaves(false);
If you don't disable it you will observe the following kernel results:
STK(t1,t2) = 1
STK(t1,t1) = 4
SSTK(t1,t2) = 4
SSTK(t1,t1) = 5
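The counts above can be checked by hand with a small standalone sketch. This is not KeLP code: the class TreeKernelSketch and all of its methods are hypothetical names introduced only for illustration. It assumes lambda = 1, the usual production-matching recursion (a pair of nodes contributes only if both are non-leaves with the same label and the same ordered child labels), and that leaf matching simply adds 1 for every pair of leaves with equal labels. Under those assumptions it reproduces the eight values listed in this reply.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch, not KeLP code: all names here are hypothetical.
class TreeKernelSketch {

    // A tree node: a label plus an ordered list of children.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    // Parses the parenthesized format used in this thread, e.g. "(a (b) (c (d)))".
    static Node parse(String text) {
        return parseNode(text, new int[] {0});
    }

    private static Node parseNode(String s, int[] pos) {
        skipSpaces(s, pos);
        expect(s, pos, '(');
        skipSpaces(s, pos);
        int start = pos[0];
        while (pos[0] < s.length() && "() ".indexOf(s.charAt(pos[0])) < 0) pos[0]++;
        Node node = new Node(s.substring(start, pos[0]));
        skipSpaces(s, pos);
        while (pos[0] < s.length() && s.charAt(pos[0]) == '(') {
            node.children.add(parseNode(s, pos));
            skipSpaces(s, pos);
        }
        expect(s, pos, ')');
        return node;
    }

    private static void skipSpaces(String s, int[] pos) {
        while (pos[0] < s.length() && s.charAt(pos[0]) == ' ') pos[0]++;
    }

    private static void expect(String s, int[] pos, char c) {
        if (pos[0] >= s.length() || s.charAt(pos[0]) != c)
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos[0]);
        pos[0]++;
    }

    private static void collect(Node n, List<Node> out) {
        out.add(n);
        for (Node child : n.children) collect(child, out);
    }

    // Same production: same label and same ordered sequence of child labels.
    static boolean sameProduction(Node a, Node b) {
        if (!a.label.equals(b.label) || a.children.size() != b.children.size()) return false;
        for (int i = 0; i < a.children.size(); i++)
            if (!a.children.get(i).label.equals(b.children.get(i).label)) return false;
        return true;
    }

    // Number of common fragments rooted at the pair (a, b), with lambda = 1.
    // subset = true: each child may be expanded or cut, giving a factor (1 + delta(child pair)).
    // subset = false: every non-leaf child must match down to the leaves.
    static long delta(Node a, Node b, boolean subset) {
        if (a.isLeaf() || b.isLeaf() || !sameProduction(a, b)) return 0;
        long product = 1;
        for (int i = 0; i < a.children.size(); i++) {
            Node ca = a.children.get(i);
            Node cb = b.children.get(i);
            long d = delta(ca, cb, subset);
            if (subset) product *= 1 + d;
            else if (!(ca.isLeaf() && cb.isLeaf())) product *= d;
        }
        return product;
    }

    // Sums delta over all node pairs; optionally adds 1 per pair of equal-labelled leaves.
    static long kernel(Node t1, Node t2, boolean subset, boolean includeLeaves) {
        List<Node> n1 = new ArrayList<>();
        List<Node> n2 = new ArrayList<>();
        collect(t1, n1);
        collect(t2, n2);
        long sum = 0;
        for (Node a : n1)
            for (Node b : n2) {
                sum += delta(a, b, subset);
                if (includeLeaves && a.isLeaf() && b.isLeaf() && a.label.equals(b.label)) sum++;
            }
        return sum;
    }

    public static void main(String[] args) {
        Node t1 = parse("(a (b) (c (d)))");
        Node t2 = parse("(a (b) (c (d (e))))");
        for (boolean leaves : new boolean[] {false, true}) {
            System.out.println("includeLeaves = " + leaves);
            System.out.println("  STK(t1,t2)  = " + kernel(t1, t2, false, leaves));
            System.out.println("  STK(t1,t1)  = " + kernel(t1, t1, false, leaves));
            System.out.println("  SSTK(t1,t2) = " + kernel(t1, t2, true, leaves));
            System.out.println("  SSTK(t1,t1) = " + kernel(t1, t1, true, leaves));
        }
    }
}
```

Note that d is a leaf in t1 but an internal node in t2, so it does not count as a leaf match; the only shared leaf between t1 and t2 is b, which accounts for the difference of 1 between the two settings.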
Thanks for your patience. We really appreciate your questions, which helped us discover a bug we were not aware of.
Best,
Simone
Hello Simone,
thank you for the answer and the explanation, now everything is clearer.
I close the issue since it has been solved.
Thanks again,
regards
Marco