The compound-pcfg from harvardnlp

a fast implementation of C-PCFGs

Hi Yoon! Thank you for sharing the amazing code. I re-implemented C-PCFGs on top of Torch-Struct. Compared with the original code, it is faster (~25 min per epoch) and slightly more accurate (~55.7% F1). I am happy to share it, and it may help people explore the potential of C-PCFGs more easily. Here is my implementation.

-Yanpeng

Inquiry on some minor implementation details

Hello, and first of all, thank you for sharing your code, which is well structured and easy to understand.
I have a few questions about the details of your implementation.

In your paper, you wrote:

We employ a curriculum learning strategy (Bengio et al., 2009) where we train only on sentences of length up to 30 in the first epoch, and increase this length limit by 1 each epoch.

However, in the code the maximum sentence length grows as 30 -> 40 -> 41 -> ..., whereas I expected it to grow as 30 -> 31 -> 32 -> ....
This unexpected behavior seems to come from the following line in train.py:

compound-pcfg/train.py, line 130 (commit 43359b8):

    args.max_length = max(args.final_max_length, args.max_length + args.len_incr)

Is there a reason you used max instead of min?
Based on the paper, it seems the max should be a min.
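A quick way to see the difference is to replay the update rule for a few epochs. This is a minimal sketch, not code from the repository: `length_schedule` and its `cap` parameter are hypothetical, the starting values (limit 30, final limit 40, increment 1) are assumed from the 30 -> 40 -> 41 sequence described above, and the update is assumed to run once at the end of each epoch.

```python
def length_schedule(max_length, final_max_length, len_incr, num_epochs, cap=min):
    """Return the sentence-length limit used in each epoch.

    Replays the single update line quoted above; `cap` stands in for the
    built-in max (as in train.py) or min (as the paper suggests).
    """
    limits = []
    for _ in range(num_epochs):
        limits.append(max_length)
        # End-of-epoch update, same shape as the quoted line.
        max_length = cap(final_max_length, max_length + len_incr)
    return limits


print(length_schedule(30, 40, 1, 5, cap=max))  # [30, 40, 41, 42, 43]
print(length_schedule(30, 40, 1, 5, cap=min))  # [30, 31, 32, 33, 34]
```

With `min`, the limit grows by one per epoch and then stays at `final_max_length`, which matches the one-per-epoch increase in the paper with `final_max_length` acting as a ceiling.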

I'm also curious why this model counts the EOS token ('</s>') by explicitly adding 1 to the number of words in each sentence, as below:

compound-pcfg/train.py, line 105 (commit 43359b8):

    num_words += batch_size * (length + 1) # we implicitly generate </s> so we explicitly count it
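One reading of the inline comment: a PCFG assigns probability to a complete sentence, so the model's score already includes the decision to stop at exactly that length, much as a left-to-right language model pays for generating </s>. Counting one extra token per sentence puts the per-word perplexity on the same footing as LM perplexities that include </s>. The sketch below assumes num_words is the denominator of a perplexity of the form exp(total NLL / num_words); `corpus_perplexity` and the NLL values are made up for illustration.

```python
import math


def corpus_perplexity(sentence_nlls, sentence_lengths, count_eos=True):
    """Per-token perplexity from sentence-level negative log-likelihoods.

    sentence_nlls: -log p(sentence) for each sentence, in nats.
    sentence_lengths: words per sentence, not counting </s>.
    """
    total_nll = sum(sentence_nlls)
    # One extra token per sentence for the implicit </s>, as in `length + 1`.
    num_words = sum(n + 1 if count_eos else n for n in sentence_lengths)
    return math.exp(total_nll / num_words)


nlls = [30.0, 45.0]   # made-up per-sentence NLLs (nats)
lengths = [9, 14]     # sentence lengths in words, without </s>
print(corpus_perplexity(nlls, lengths))                   # exp(75 / 25) = exp(3)
print(corpus_perplexity(nlls, lengths, count_eos=False))  # exp(75 / 23), higher
```

Counting </s> lowers the reported number slightly, so perplexities are only comparable across models that use the same convention.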

Thanks again for open-sourcing your code!

How to get PCFG rules and rule probabilities?

Is there a way to get the whole set of PCFG rules and their probabilities while training the model? I found a tensor called rule_scores with shape b x NT x (NT+T) x (NT+T), but I don't see how to read rules and rule probabilities from it, or where the rules from preterminals to terminals are stored.
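Since rule_scores has only NT parents, it can only hold the binary rules A -> B C; the preterminal -> word rules must come from a separate set of scores (presumably the unary/emission scores built in models.py, which I have not verified). Because the compound model conditions rule probabilities on a per-sentence latent vector, the b dimension matters: each sentence gets its own grammar. Below is a minimal pure-Python sketch for listing one sentence's binary rules, assuming (1) rule_scores holds log-probabilities normalized over all (NT+T) x (NT+T) child pairs of each parent, and (2) child indices 0..NT-1 are nonterminals and NT..NT+T-1 are preterminals. `binary_rules` and `symbol_name` are hypothetical helpers; the NT-k/T-k names mirror the labels eval.py prints. With a tensor, you would pass rule_scores[i].tolist() for sentence i.

```python
import math


def symbol_name(index, nt_states):
    """Name a child symbol (assumed ordering: nonterminals, then preterminals)."""
    if index < nt_states:
        return "NT-" + str(index + 1)
    return "T-" + str(index - nt_states + 1)


def binary_rules(rule_log_probs, nt_states, min_prob=0.0):
    """Return (parent, left, right, prob) tuples for one sentence's grammar.

    rule_log_probs: nested lists of shape NT x (NT+T) x (NT+T) holding
    log-probabilities of A -> B C (assumed normalized per parent A).
    """
    rules = []
    for a, parent_row in enumerate(rule_log_probs):
        for b, left_row in enumerate(parent_row):
            for c, log_p in enumerate(left_row):
                prob = math.exp(log_p)
                if prob >= min_prob:
                    rules.append(("NT-" + str(a + 1),
                                  symbol_name(b, nt_states),
                                  symbol_name(c, nt_states),
                                  prob))
    return rules


# Toy grammar with NT = 1 and T = 1, so each parent has 2 x 2 child pairs.
toy = [[[math.log(0.4), math.log(0.1)],
        [math.log(0.3), math.log(0.2)]]]
for parent, left, right, prob in binary_rules(toy, nt_states=1, min_prob=0.25):
    print(parent, "->", left, right, round(prob, 3))
# NT-1 -> NT-1 NT-1 0.4
# NT-1 -> T-1 NT-1 0.3
```

If assumption (1) holds, the probabilities for each parent should sum to 1, which is a cheap sanity check on the ordering and normalization.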

guide papers to understanding

Hi @yoonkim
Could you recommend some introductory papers that would help me understand your paper (not all of its references)? I have tried several times to understand your PCFG paper, but it is not easy.

(I know that studying all the references of this paper is the best way to understand it. However, I'm just a beginner taking CS224N, Stanford's NLP course, and reading all the references of every referenced paper is almost impossible for me.)

If you don't mind, could you also share your experience with OpenNMT? I would like to know what I should study if I want to build speech recognition and NMT systems in the future.

Thank you

No root symbol in MAP parse trees

The root symbol is not included when extracting the MAP parse tree, either in the paper or in the code:

compound-pcfg/eval.py, lines 176 to 189 (commit 1c0078c):

    for i in range(length):
      tag = "T-" + str(int(argmax_tags[i].item())+1)
      pred_tree[i] = "(" + tag + " " + sent_orig[i] + ")"
    for k in np.arange(1, length):
      for s in np.arange(length):
        t = s + k
        if t > length - 1: break
        if binary_matrix[s][t] == 1:
          nt = "NT-" + str(int(label_matrix[s][t])+1)
          span = "(" + nt + " " + pred_tree[s] + " " + pred_tree[t] +  ")"
          pred_tree[s] = span
          pred_tree[t] = span
    pred_tree = pred_tree[0]
    pred_out.write(pred_tree.strip() + "\n")

However, according to the Viterbi algorithm and most gold parse trees in the treebank, there should be a root symbol (although I haven't looked at the Viterbi implementation in PCFG.py). Why is there no root symbol in the MAP trees?

Extra indents in PCFG.py?

Are lines 70-71 of PCFG.py indented one level too deep?

compound-pcfg/PCFG.py, lines 70 to 72 (commit 47a7b16):

      log_Z = self.beta[:, 0, n-1, :self.nt_states] + root_scores
      log_Z = self.logsumexp(log_Z, 1)
    return log_Z

Should it be:

    log_Z = self.beta[:, 0, n-1, :self.nt_states] + root_scores
    log_Z = self.logsumexp(log_Z, 1)
    return log_Z
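Whatever the intended indentation, those two lines compute the sentence log-likelihood at the root: log Z = logsumexp over nonterminals A of (root_scores[A] + beta[0, n-1, A]). Because this needs the inside scores of the full span, it only has to be computed once the chart is complete, which seems consistent with the dedented version above. Here is a minimal pure-Python sketch for a single sentence; `logsumexp` and `log_partition` are hypothetical stand-ins for the batched tensor code, and the argument names mirror the quoted slice.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(beta_full_span, root_scores):
    """log Z = logsumexp_A (root_scores[A] + beta[0, n-1, A]) for one sentence.

    beta_full_span: inside log-scores of each nonterminal A over the whole
        sentence (one batch item of beta[:, 0, n-1, :nt_states]).
    root_scores: log-probabilities of the start rules S -> A.
    """
    return logsumexp([r + b for r, b in zip(root_scores, beta_full_span)])


# Two nonterminals: Z = 0.5 * 0.2 + 0.5 * 0.4 = 0.3
log_z = log_partition([math.log(0.2), math.log(0.4)],
                      [math.log(0.5), math.log(0.5)])
print(math.exp(log_z))
```

Subtracting the maximum before exponentiating keeps the sum finite even when the inside scores are very negative, as they are for long sentences.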
