Hi! Thank you very much for your questions! I hope the following explanations help clarify our methodology and findings.
-
Why did we train 3 epochs for each step?
In each round of our experiment, we fine-tune the LLaMA model from scratch on a dataset that grows by n_query data points with every iteration. Training for 3 epochs per round follows the Alpaca hyperparameters, as detailed here: https://github.com/tatsu-lab/stanford_alpaca, ensuring consistency across all iterations.
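To make the round structure concrete, here is a minimal sketch of the loop described above. The function names (`select_new_points`, the stubbed-out fine-tuning step) are placeholders, not the repo's actual API, and the selection here is a trivial stand-in for the real diversity-based criterion:

```python
# Toy sketch of the per-round loop. The actual fine-tuning of LLaMA is
# stubbed out; only the dataset-growth bookkeeping is executable here.

def select_new_points(pool, selected, k):
    # Placeholder selection: take the next k unused points.
    # The real method would pick points by a diversity criterion.
    used = set(selected)
    remaining = [p for p in pool if p not in used]
    return remaining[:k]

def run_rounds(pool, n_query=100, n_rounds=3, epochs=3):
    """Grow the training set by n_query points per round; return the
    cumulative dataset size after each round."""
    selected, sizes = [], []
    for _ in range(n_rounds):
        selected += select_new_points(pool, selected, n_query)
        # Here the base model would be re-initialized and fine-tuned
        # from scratch on `selected` for `epochs` epochs, using the
        # Alpaca-style hyperparameters.
        sizes.append(len(selected))
    return sizes

print(run_rounds(list(range(1000)), n_query=100, n_rounds=3))
# -> [100, 200, 300]
```

The key point the sketch captures is that each round restarts training from the base checkpoint rather than continuing from the previous round's model.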
-
Why didn’t we run experiments for more rounds, i.e., on more datapoints?
The primary goal of our research was efficient instruction tuning with reduced training data. We found that beyond a certain point, adding more data to the training set did not significantly improve performance. In our initial trials, we did run more rounds, specifically 20-30, but observed only marginal performance gains, if any.
Figure 2 in our paper likewise shows diminishing returns in performance gain with more rounds. This supports our finding that a significantly smaller subset of data can be just as effective for instruction tuning as using more data, in line with the conclusions of other studies such as 'Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning'.
-
How does the choice of n_query impact the results?
We greatly appreciate your own experiments. The choice of n_query is indeed a crucial factor. In our design, given a fixed overall subset budget, a smaller n_query paired with more iterations (a larger n_round) lets the model improve its performance gradually: each iteration poses a manageable selection task (100 new points), and over many iterations these small additions accumulate into substantial improvements. In contrast, a larger n_query per iteration, such as the 500 used in your experiment, could overwhelm the model's ability to select data points optimally at each step.
Our findings suggest that a lower n_query, such as 100, in conjunction with more iterations is more effective than a higher n_query with fewer iterations. For the extreme case of selecting the entire subset budget in one iteration versus our approach of 100 per round, please refer to Section 4.3 (Dynamic Iteration) in our paper.
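The trade-off above follows from the budget constraint budget = n_query × n_round. A small illustrative helper (the 1000-point budget is a hypothetical number, not a figure from the paper) shows the schedules the two settings produce:

```python
# Under a fixed subset budget, n_query and n_round trade off:
# budget = n_query * n_round. This helper lists the cumulative
# dataset size the model sees at each round.

def schedule(budget, n_query):
    """Cumulative dataset sizes per round for a given n_query."""
    assert budget % n_query == 0, "n_query must divide the budget"
    n_rounds = budget // n_query
    return [n_query * (r + 1) for r in range(n_rounds)]

print(schedule(1000, 100))   # many small, gradual additions
print(schedule(1000, 500))   # fewer, larger jumps
print(schedule(1000, 1000))  # entire budget in one shot (the extreme case)
```

The first schedule gives the selector ten chances to adapt to the current model; the last gives it only one, which is the comparison Section 4.3 examines.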
from diverseevol.