
attention-analysis's People

Contributors

bramblexu, clarkkev

attention-analysis's Issues

Support for PyTorch pretrained models?

Hi,

Thanks for sharing these excellent resources.

I wanted to ask if there is an easy fix in the model-loading part so that the same attention analysis can also be run on PyTorch pretrained models.

Looking forward to your response.
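For reference, a rough sketch of pulling per-layer attention maps from a PyTorch BERT (assuming the Hugging Face transformers library rather than this repo's TensorFlow loading code; converting the maps into this repo's pickle format would still need the surrounding preprocessing):

# Hypothetical sketch (not part of this repo): extracting attention maps from
# a Hugging Face PyTorch BERT model instead of a TensorFlow checkpoint.
# Assumes a recent transformers version where outputs expose `.attentions`.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
attn = torch.stack(outputs.attentions).squeeze(1)  # (layers, heads, seq, seq)
print(attn.shape)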

Text Corpus on which the Attention was extracted

Is it possible to share the text file (non-tokenized, and in the correct format) that was used to obtain the attention values in unlabeled_attn.pkl?

It would be a big help for comparing different models on exactly the same corpus (since different models use different tokenizers).

--word_level attention doesn't work for extract_attention

I have used preprocess_unlabelled to create my JSON file. Can you provide a preprocessing script that supports word-level attention?

When I use this file and pass the argument --word_level to extract_attention.py, I get the following error:
Converting to word-level attention...
Traceback (most recent call last):
  File "extract_attention.py", line 144, in <module>
    main()
  File "extract_attention.py", line 134, in main
    feature_dicts_with_attn, tokenizer, args.cased)
  File "/home/manasi/Documents/BERT/attention-analysis-master/bpe_utils.py", line 74, in make_attn_word_level
    words_to_tokens = tokenize_and_align(tokenizer, features["words"], cased)
KeyError: 'words'

Please help me fix it.
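For what it's worth, the KeyError indicates the preprocessed examples have no "words" entry, which bpe_utils.make_attn_word_level needs in order to map BPE tokens back to words. A minimal sketch of building examples with such an entry (the repo's actual preprocessing schema may differ):

# Hypothetical sketch: building JSON examples that carry the whitespace-split
# "words" field expected by bpe_utils.make_attn_word_level. The exact schema
# used by the repo's preprocessing scripts may differ.
import json

examples = []
with open("unlabeled.txt") as f:   # assumed input: one sentence per line
    for line in f:
        words = line.strip().split()
        if words:
            examples.append({"words": words})

with open("samples.json", "w") as f:
    json.dump(examples, f)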

Interpretation of figures

Can you please clarify what each number (8, 10) stands for in "Head 8-10" in Figure 5 of your article, i.e., the layer or the head number?
Thank you very much for your clarification.

Typo in instructions

The instructions say:

We include two pre-processing scripts for going from a raw data file to JSON that can be supplied to attention_extractor.py

I think what is meant is the extract_attention.py script in the repo?

version of tf

I am trying this with TF 2.0 and am running into a lot of issues. Can you tell us which version you were using, or maybe add a requirements.txt? Thanks!

train.txt and dev.txt

Dear Kevin, hello! I'm trying to reproduce your work described at https://github.com/clarkkev/attention-analysis, but it's not clear how to obtain the train.txt and dev.txt files used in preprocess_depparse.py. Can you clarify how I can get these files? I would also like to take this opportunity to congratulate you on the experiment. Thank you very much!

How to deal with OOV words

My sample contains the word 'Silsby', but it does not exist in the vocab.
How should I deal with this OOV situation?

python extract_attention.py --preprocessed-data-file samples_10.json --bert-dir data/cased_L-12_H-768_A-12  --max_sequence_length 256 --word_level --cased

Creating examples...
Traceback (most recent call last):
  File "extract_attention.py", line 144, in <module>
    main()
  File "extract_attention.py", line 108, in main
    example = Example(features, tokenizer, args.max_sequence_length)
  File "extract_attention.py", line 29, in __init__
    self.input_ids = tokenizer.convert_tokens_to_ids(self.tokens)
  File "/Users/smap10/Project/attention-analysis-master/bert/tokenization.py", line 182, in convert_tokens_to_ids
    return convert_by_vocab(self.vocab, tokens)
  File "/Users/smap10/Project/attention-analysis-master/bert/tokenization.py", line 143, in convert_by_vocab
    output.append(vocab[item])
KeyError: 'Silsby'
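For reference, the traceback suggests raw words are being looked up in the vocab directly. One hedged sketch of a workaround (assuming the bundled bert/tokenization.py exposes FullTokenizer as in the original BERT release) is to WordPiece-tokenize each word first, so OOV words are split into known subword pieces:

# Rough sketch: WordPiece-tokenize raw words before converting to vocab ids,
# so an out-of-vocabulary word like "Silsby" becomes known subword pieces
# rather than raising a KeyError. Assumes the repo's bundled BERT
# tokenization module provides FullTokenizer as in the original BERT release.
from bert import tokenization

tokenizer = tokenization.FullTokenizer(
    vocab_file="data/cased_L-12_H-768_A-12/vocab.txt", do_lower_case=False)

pieces = tokenizer.tokenize("Silsby")          # subword pieces (vocab-dependent)
ids = tokenizer.convert_tokens_to_ids(pieces)
print(pieces, ids)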

A question about preprocess_depparse.py

In the README, I can see: "Each line in the files should contain a word followed by a space followed by <index_of_head>-<dependency_label> (e.g., 0-root)."

So how can I get the index_of_head, and what does it mean?

Should I first figure out which of the 12*12 heads in BERT handles which syntactic function, and then judge it?
Sorry for my poor English.
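To my understanding, index_of_head is the 1-based position of each word's syntactic head within the sentence (0 meaning the word itself is the root), so it comes from a dependency parse of the corpus, not from BERT's 12*12 attention heads. A hypothetical sketch of writing a dev.txt fragment in this format (labels depend on the parser/treebank used):

# Hypothetical sketch of the preprocess_depparse.py input format for the
# sentence "The dog barked .": each line is "<word> <index_of_head>-<label>",
# where index_of_head is the 1-based position of the word's syntactic head
# and 0 marks the root. Labels depend on the parser/treebank used.
parse = [("The", 2, "det"), ("dog", 3, "nsubj"), ("barked", 0, "root"), (".", 3, "punct")]

with open("dev.txt", "w") as f:
    for word, head, label in parse:
        f.write(f"{word} {head}-{label}\n")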

Bug in adding dummy word_repr for root

The dummy representation should be inserted at index 0 along the concatenation axis; currently it is appended at the last index. Since the attention approximated for ROOT (via the added start/end tokens) is at index 0, the word representation for ROOT should also be at index 0.

By fixing this bug I got around 3% higher UAS.

word_reprs = tf.concat([word_reprs, tf.zeros((n_words, 1, 200))], 1) # dummy for ROOT

Should be replaced with,

word_reprs = tf.concat([tf.zeros((n_words, 1, 200)), word_reprs], 1) # dummy for ROOT
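A minimal sketch (assuming TF 2.x eager execution and word_reprs of shape (n_words, n_words, 200), i.e. one row of candidate-head vectors per word) showing where the zero ROOT slot lands under each variant:

# Minimal sketch: the proposed fix places the all-zero ROOT slot at index 0,
# whereas the current code appends it at the end. Assumes TF 2.x eager mode.
import tensorflow as tf

n_words = 3
word_reprs = tf.ones((n_words, n_words, 200))

appended = tf.concat([word_reprs, tf.zeros((n_words, 1, 200))], 1)   # current code
prepended = tf.concat([tf.zeros((n_words, 1, 200)), word_reprs], 1)  # proposed fix

print(tf.reduce_sum(appended[:, 0]).numpy())   # 600.0 -> index 0 holds a real word
print(tf.reduce_sum(prepended[:, 0]).numpy())  # 0.0   -> index 0 is the ROOT dummy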

Coreference Resolution

Any plans to release the code for the coreference analysis in the paper?

Alternatively, would it be possible to explain the methodology? Mainly how the "head" word is chosen, and what the exact set of antecedents used is.

File missing

First off, I want to say this is a great insight into the BERT model. But there seems to be a file missing from General Analysis: create_wiki_data.py, which as stated can be used to generate one's own wiki data. It would be quite helpful if you could provide it.

An explanation for head_distances.py

Thank you for releasing the code.

An explanation of head_distances.py is missing from the README.
Could you add one?

A small question: in head_distances.py, shouldn't the line utils.write_pickle(js_distances, args.outfile) be outside the for loop for i, doc in enumerate(data):?
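For illustration, a self-contained sketch of the pattern the question is pointing at (the real head_distances.py has its own utils, data loading, and argument handling):

# Hypothetical sketch: accumulate results over the whole loop and write the
# pickle once at the end, rather than rewriting the output file on every
# iteration. Names here are stand-ins, not the repo's actual code.
import pickle

def write_pickle(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)

data = [{"doc_id": i} for i in range(3)]  # stand-in for loaded attention data
js_distances = []

for i, doc in enumerate(data):
    # stand-in for computing Jensen-Shannon distances between heads for `doc`
    js_distances.append({"doc": doc["doc_id"], "distances": None})

write_pickle(js_distances, "head_distances.pkl")  # written once, after the loop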
