
adaptive-knn-mt's People

Contributors

zhaoqianfeng, zhengxxn


adaptive-knn-mt's Issues

Notice for any issues

As I have now become a full-time engineer, I can't reply to issues promptly. I highly recommend using NJUNLP/knn-box instead of this repo going forward; it is clearer and easier to use, and also supports visualization. Thanks for their work!

Data preprocessing steps used in the paper

Thanks a lot for this awesome work, and for releasing the code for the same!

I used your repository and was able to reproduce results for vanilla kNN-MT (K=8) and Adaptive kNN-MT (K=4) on the provided preprocessed data for the IT domain.

I have two queries:

  1. I would like to run your model on other datasets (e.g. WMT'19, as mentioned in section 4.1 of the kNN-MT paper). Could you please share the preprocessing scripts I could use for this?

  2. Could you also confirm that the K mentioned in Table 2 of your paper is actually the max-k that the Meta-k network was trained with?

Thanks a lot!
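For reference, the paper's Meta-k network does not pick a single k; it weighs a fixed candidate set consisting of 0 and the powers of two up to the configured max-k. A small sketch of that candidate set (the helper name is hypothetical, not from the repo):

```python
def metak_candidates(max_k):
    """Candidate neighbor counts the Meta-k network chooses among:
    k = 0 (ignore the datastore) plus powers of two up to max_k."""
    ks = [0]
    k = 1
    while k <= max_k:
        ks.append(k)
        k *= 2
    return ks

# e.g. with max_k = 8 the candidates are [0, 1, 2, 4, 8]
```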

Adaptive kNN-MT fails to converge when trained on top of my own translation model

Hello, and thank you very much for your excellent work!
I tried to use your code to train adaptive kNN-MT on top of a translation model I trained myself, but the loss and perplexity are very large and never converge. Strangely, if I skip the adaptive training and run vanilla kNN-MT decoding directly, there is no problem.
I trained the model with fairseq 0.10.1. Do you have any idea what might cause this?

Is the datastore built using the validation set?

While looking through save_datastore.py, I found this line, which reads rather oddly together with dataset = task.dataset(subset). It seems the datastore is built using the validation set, or am I misinterpreting something here? It should be built from the training data, right?

Since the implementation is based on validate.py, maybe this was overlooked when creating the script. I believe it should use args.train_subset instead.
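A minimal sketch of the suggested substitution, assuming fairseq-style args with train_subset and valid_subset attributes (the helper name is hypothetical, not from the repo):

```python
from argparse import Namespace

def pick_datastore_subset(args):
    # The datastore should cover the training data, so prefer
    # args.train_subset over the validation subset that scripts
    # derived from validate.py default to.
    return getattr(args, "train_subset", None) or args.valid_subset

# Hypothetical usage in a save_datastore.py-like script:
args = Namespace(train_subset="train", valid_subset="valid")
subset = pick_datastore_subset(args)
# dataset = task.dataset(subset)  # would now load the "train" split
```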

Error when building faiss index.

I tried to use data with a dstore_size of 750101549 to build the datastore and the faiss index. Building the datastore works, but building the faiss index raises an error.
Is there any way to solve this problem?
The script is:

#!/bin/bash
work_dir=/data/root/knn
PROJECT_PATH=$work_dir/adaptive-knn-mt-main
SIGNATURE=data
DSTORE_PATH=$xianf_dir/datastore/$SIGNATURE
DSTORE_SIZE=750101549
export PYTHONPATH=$PROJECT_PATH:$PYTHONPATH                                              
   
CUDA_VISIBLE_DEVICES=0 python $PROJECT_PATH/train_datastore_gpu.py \
  --dstore_mmap $DSTORE_PATH \
  --dstore_size $DSTORE_SIZE \
  --dstore-fp16 \
  --faiss_index ${DSTORE_PATH}/knn_index \
  --ncentroids 3072 \
  --probe 32 \
  --dimension 1024
Traceback (most recent call last):
  File "/data/root/knn/adaptive-knn-mt-main/train_datastore_gpu.py", line 113, in <module>
    gpu_index.add_with_ids(to_add.astype(np.float32), np.arange(start, end))
  File "/usr/local/python3/lib/python3.6/site-packages/faiss/__init__.py", line 214, in replacement_add_with_ids
    self.add_with_ids_c(n, swig_ptr(x), swig_ptr(ids))
  File "/usr/local/python3/lib/python3.6/site-packages/faiss/swigfaiss.py", line 5756, in add_with_ids
    return _swigfaiss.GpuIndex_add_with_ids(self, n, x, ids)
RuntimeError: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type IVFLists dev 0 space Device stream 0x2d6cbc0 size 8388608 bytes (cudaMalloc error out of memory [2])
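The traceback shows cudaMalloc running out of device memory during add_with_ids. One common workaround (a sketch under assumptions, not the repo's code) is to feed the keys to the index in smaller chunks so the peak allocation per call stays bounded; the helper name and batch size here are illustrative:

```python
import numpy as np

def add_in_batches(add_with_ids, keys, batch_size=500_000):
    """Feed keys to an index chunk by chunk instead of all at once.

    add_with_ids: a callable with the shape of
    gpu_index.add_with_ids(vectors, ids), e.g. from a faiss GPU index.
    keys: a 2-D array (or np.memmap) of stored key vectors.
    """
    n = keys.shape[0]
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        # Cast each chunk to float32 lazily so a fp16 memmap is never
        # materialized in full, and assign ids matching the row offsets.
        add_with_ids(keys[start:end].astype(np.float32),
                     np.arange(start, end))
```

If even small batches fail, building the index on CPU (skipping the GPU, or using the GPU only to train the quantizer) sidesteps the cudaMalloc limit at the cost of speed.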

The training set of the model

Hi, xin.
I have a question about the training set of the model.
Do you directly use the dev set of the IT domain to train the model and the test set of the IT domain for testing?
That would mean we only need a few in-domain samples to train the meta-network.

Nice Implementation!

Really cool implementation, a lot cleaner and better integrated within fairseq than the implementation provided by the original knn-MT work!

If one wanted to adapt this to the multilingual task as well, do you have an idea of what the forward_and_get_hidden_state_step method would look like and what would need to change in the save_datastore.py script?

meta-k network hyperparameters and training the base model on out-of-domain + in-domain data

Hello,

thanks for the nice work and the clean implementation, it is very easy to use and understand!

I have a couple of (maybe stupid) questions concerning the methodology:

  • In your paper you mention using the valid set to train the meta-k network for about 5k steps. How was the number of steps chosen (given that the valid set is serving as the training set)?
  • Would it be possible to train a base model on a combination of general open-source data + IT + Med + Koran + Law, and then apply the adaptive-knn-mt technique to IT, Med, Koran, and Law using this base model? I could use the in-domain training set to create the datastore and then the valid set to train the meta-k network. This should not cause any data leakage, or am I missing something?

Thanks in advance for the attention,

Z
