Comments (5)

SimoneFilice commented on May 28, 2024

Hi Marco,

It seems that there is a bug in the SubSetTreeKernel.
Since t1=t2, we expect that SSTK(t1,t2) = SSTK(t1,t1) = SSTK(t2,t2) = 5.
I am investigating this bug and will fix it as soon as possible.

Best,
Simone

from kelp-full.

SimoneFilice commented on May 28, 2024

Hi Marco,
I believe I have fixed the bug (it actually affected both the STK and the SSTK, although with your toy example it was visible only in the SSTK). After some additional sanity checks we will release a new version. For the moment, please use the development branch of the latest version of KeLP, which contains the fix.

Cheers,
Simone

mgravina1 commented on May 28, 2024

Hello @SimoneFilice ,

thank you very much for your reply and solution.

I'm still struggling to understand some results, but I think it's because of something I still don't get in the definitions of the different tree kernels.

I'm using the definitions given on the KeLP website and running the following code:

float lambda = 1f; 
float mu = 1f; 
int terminal_factor = 1;
float threshold = 0.001f;
		
TreeRepresentation t1 = new TreeRepresentation();
t1.setDataFromText("(a (b) (c (d)))");
TreeRepresentation t2 = new TreeRepresentation();
t2.setDataFromText("(a (b) (c (d (e))))");
				
SubSetTreeKernel subSetTreeKernel = new SubSetTreeKernel(lambda, "tree");
SubTreeKernel subTreeKernel = new SubTreeKernel(lambda, "tree");
PartialTreeKernel partialTreeKernel = new PartialTreeKernel(lambda, mu, terminal_factor, "tree");
		
System.out.println("Similarity t1-t2 SubTreeKernel res = " + subTreeKernel.kernelComputation(t1, t2));
System.out.println("Similarity t1-t2 SubSetTreeKernel res = " + subSetTreeKernel.kernelComputation(t1, t2));
System.out.println("Similarity t1-t2 PartialTreeKernel res = " + partialTreeKernel.kernelComputation(t1, t2));

The output I get is:
Similarity t1-t2 SubTreeKernel res = 3.0
Similarity t1-t2 SubSetTreeKernel res = 4.0
Similarity t1-t2 PartialTreeKernel res = 10.0

I can understand why I get 10 as PartialTreeKernel similarity result, but I can't figure out the other results.
I'm reasoning as follows:


SubTree(s): I calculate SubTrees as "nodes with all of their descendants":

subTrees for t1
(a (b) (c (d))), (c (d)), (b), (d)

subTrees for t2
(a (b) (c (d (e)))), (c (d (e))), (d (e)), (b), (e)

so I expect to get "1" as similarity value between t1 and t2


SubSetTree(s): I calculate SubSetTrees as "nodes with either all their children or none of them":

subSetTrees for t1
(a (b) (c (d))), (c (d)), (a (b) (c)), (a), (b), (c), (d)

subSetTrees for t2

(a (b) (c (d))), (c (d)), (a (b) (c)), (a), (b), (c), (d), (a (b) (c (d (e)))), (c (d (e))), (d (e)), (e)

so I expect to get "7" as similarity value between t1 and t2


PartialTree(s): as partialTrees (considered as "nodes with a partial production and their descendants") I expect to find all subSetTrees plus:

additional partialTrees for t1
(a (b)), (a (c)), (a (c (d)))

additional partialTrees for t2
(a (b)), (a (c)), (a (c (d))), (a (c (d (e))))

so I expect to get "10" as similarity value between t1 and t2


Am I wrong?

Thanks for your attention.

Best regards
Marco

SimoneFilice commented on May 28, 2024

Hi Marco,

I'm sorry for the late answer, but solving the issue took a while. Again, please use the development branch until we release a new version of KeLP.
Let's clarify how the SubTree Kernel (STK) and the SubSet Tree Kernel (SSTK) work in their standard setting. Both search for common productions in the trees, which means that individual nodes are not considered valid matching fragments.
So, given the trees t1=(a (b) (c (d))) and t2=(a (b) (c (d (e)))), we have the following kernel computations:

SUB TREE KERNEL

STK(t1,t2) = 0, as no common productions occur.

STK(t1,t1) = 2, and the matching fragments are:

  1. (a (b) (c (d)))
  2. (c (d))

SUBSET TREE KERNEL

SSTK(t1,t2) = 3, and the matching fragments are the following:

  1. (a (b) (c))
  2. (a (b) (c (d)))
  3. (c (d))

SSTK(t1,t1) = 3, and the matching fragments are:

  1. (a (b) (c))
  2. (a (b) (c (d)))
  3. (c (d))

Please note that by default the KeLP implementations of the STK and SSTK also match tree leaves. In constituency trees where the leaves are words, this corresponds to combining the tree kernel with a linear kernel on a Bag-of-Words representation. To disable this additional matching, use the setIncludeLeaves method:
subTreeKernel.setIncludeLeaves(false);
subSetTreeKernel.setIncludeLeaves(false);

If you don't disable it you will observe the following kernel results:
STK(t1,t2) = 1
STK(t1,t1) = 4
SSTK(t1,t2) = 4
SSTK(t1,t1) = 5
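
The counts listed in this reply can be reproduced with a small standalone sketch. It is not KeLP's implementation: it assumes lambda = 1, counts only fragments that contain at least one production, and, when leaf matching is on, adds one for every pair of identically labeled leaves. The class TreeKernelSketch and all of its method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch (not KeLP code): counts matching fragments with lambda = 1.
final class TreeKernelSketch {

    /** A labeled node; a node without children is a leaf. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    /** Parses a well-formed bracketed tree such as "(a (b) (c (d)))". */
    static Node parse(String text) {
        return parseNode(text, new int[] {0});
    }

    private static Node parseNode(String s, int[] pos) {
        pos[0]++; // consume '('
        int start = pos[0];
        while (" ()".indexOf(s.charAt(pos[0])) < 0) pos[0]++; // read the label
        Node node = new Node(s.substring(start, pos[0]));
        while (s.charAt(pos[0]) == ' ') pos[0]++;
        while (s.charAt(pos[0]) == '(') {
            node.children.add(parseNode(s, pos));
            while (s.charAt(pos[0]) == ' ') pos[0]++;
        }
        pos[0]++; // consume ')'
        return node;
    }

    private static void collect(Node n, List<Node> out) {
        out.add(n);
        for (Node c : n.children) collect(c, out);
    }

    /** True if both nodes have the same label and the same sequence of child labels. */
    static boolean sameProduction(Node a, Node b) {
        if (!a.label.equals(b.label) || a.children.size() != b.children.size()) return false;
        for (int i = 0; i < a.children.size(); i++) {
            if (!a.children.get(i).label.equals(b.children.get(i).label)) return false;
        }
        return true;
    }

    /** SSTK: number of common fragments rooted at a and b (each child is either expanded or not). */
    static long deltaSubSet(Node a, Node b) {
        if (a.isLeaf() || b.isLeaf() || !sameProduction(a, b)) return 0;
        long product = 1;
        for (int i = 0; i < a.children.size(); i++) {
            product *= 1 + deltaSubSet(a.children.get(i), b.children.get(i));
        }
        return product;
    }

    /** STK: true if the complete subtrees rooted at a and b are identical down to the leaves. */
    static boolean identical(Node a, Node b) {
        if (!sameProduction(a, b)) return false;
        for (int i = 0; i < a.children.size(); i++) {
            if (!identical(a.children.get(i), b.children.get(i))) return false;
        }
        return true;
    }

    /** Sums the matches over all node pairs; leaf pairs count only when includeLeaves is true. */
    static long kernel(Node t1, Node t2, boolean subSet, boolean includeLeaves) {
        List<Node> n1 = new ArrayList<>();
        List<Node> n2 = new ArrayList<>();
        collect(t1, n1);
        collect(t2, n2);
        long total = 0;
        for (Node a : n1) {
            for (Node b : n2) {
                if (a.isLeaf() && b.isLeaf()) {
                    if (includeLeaves && a.label.equals(b.label)) total++;
                } else if (!a.isLeaf() && !b.isLeaf()) {
                    total += subSet ? deltaSubSet(a, b) : (identical(a, b) ? 1 : 0);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Node t1 = parse("(a (b) (c (d)))");
        Node t2 = parse("(a (b) (c (d (e))))");
        for (boolean leaves : new boolean[] {false, true}) {
            System.out.println("includeLeaves = " + leaves);
            System.out.println("  STK(t1,t2)  = " + kernel(t1, t2, false, leaves));
            System.out.println("  STK(t1,t1)  = " + kernel(t1, t1, false, leaves));
            System.out.println("  SSTK(t1,t2) = " + kernel(t1, t2, true, leaves));
            System.out.println("  SSTK(t1,t1) = " + kernel(t1, t1, true, leaves));
        }
    }
}
```

Under these assumptions the sketch gives 0, 2, 3, 3 without leaf matching and 1, 4, 4, 5 with it, in the order STK(t1,t2), STK(t1,t1), SSTK(t1,t2), SSTK(t1,t1). The only leaf shared by t1 and t2 is (b), which accounts for the difference of one in the t1-t2 results.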

Thanks for your patience. We really appreciate your questions, which led us to discover a bug we were not aware of.

Best,
Simone

mgravina1 commented on May 28, 2024

Hello Simone,

thank you for the answer and the explanation, now everything is clearer.

I'm closing the issue since it has been solved.

Thanks again,

regards
Marco
