Hi! Thank you very much for your questions! I hope the following explanations help clarify our methodology and findings.
-
Why did we train 3 epochs for each step?
In each round of our experiment, we fine-tune the LLaMA model from scratch on a dataset that grows by n_query data points with every iteration. Training for 3 epochs per round follows the Alpaca hyperparameters, as detailed here: https://github.com/tatsu-lab/stanford_alpaca, ensuring consistency across all iterations.
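To make the round structure concrete, here is a minimal sketch of the loop described above. The function names (`select_new_points`, the stubbed-out fine-tuning step) are placeholders, not the repo's actual API, and the selection here is a trivial stand-in for the real diversity-based criterion:

```python
# Toy sketch of the per-round loop. The actual fine-tuning of LLaMA is
# stubbed out; only the dataset-growth bookkeeping is executable here.

def select_new_points(pool, selected, k):
    # Placeholder selection: take the next k unused points.
    # The real method would pick points by a diversity criterion.
    used = set(selected)
    remaining = [p for p in pool if p not in used]
    return remaining[:k]

def run_rounds(pool, n_query=100, n_rounds=3, epochs=3):
    """Grow the training set by n_query points per round; return the
    cumulative dataset size after each round."""
    selected, sizes = [], []
    for _ in range(n_rounds):
        selected += select_new_points(pool, selected, n_query)
        # Here the base model would be re-initialized and fine-tuned
        # from scratch on `selected` for `epochs` epochs, using the
        # Alpaca-style hyperparameters.
        sizes.append(len(selected))
    return sizes

print(run_rounds(list(range(1000)), n_query=100, n_rounds=3))
# -> [100, 200, 300]
```

The key point the sketch captures is that each round restarts training from the base checkpoint rather than continuing from the previous round's model.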
-
Why didn’t we run experiments for more rounds, i.e., on more datapoints?
The primary goal of our research was efficient instruction tuning with reduced training data. We found that beyond a certain point, adding more data to the training set did not significantly improve performance. In our initial trials, we did run more rounds, specifically 20-30, but observed only marginal performance gains, if any.
Figure 2 in our paper likewise shows diminishing returns in performance gain with more rounds. This supports our finding that a significantly smaller subset of data can be just as effective for instruction tuning as using more data, in line with the conclusions of other studies such as 'Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning'.
-
How does the choice of n_query impact the results?
We greatly appreciate your own experiments. The choice of n_query is indeed a crucial factor. In our design, given a fixed overall subset budget, a smaller n_query paired with more iterations (a larger n_round) lets the model improve its performance gradually: each iteration poses a manageable selection task (100 new points), and over many iterations these small additions accumulate into substantial improvements. In contrast, a larger n_query per iteration, such as the 500 used in your experiment, could overwhelm the model's ability to select data points optimally at each step.
Our findings suggest that a lower n_query, such as 100, in conjunction with more iterations is more effective than a higher n_query with fewer iterations. For the extreme case of selecting the entire subset budget in one iteration versus our approach of 100 per round, please refer to Section 4.3 (Dynamic Iteration) in our paper.
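The trade-off above follows from the budget constraint budget = n_query × n_round. A small illustrative helper (the 1000-point budget is a hypothetical number, not a figure from the paper) shows the schedules the two settings produce:

```python
# Under a fixed subset budget, n_query and n_round trade off:
# budget = n_query * n_round. This helper lists the cumulative
# dataset size the model sees at each round.

def schedule(budget, n_query):
    """Cumulative dataset sizes per round for a given n_query."""
    assert budget % n_query == 0, "n_query must divide the budget"
    n_rounds = budget // n_query
    return [n_query * (r + 1) for r in range(n_rounds)]

print(schedule(1000, 100))   # many small, gradual additions
print(schedule(1000, 500))   # fewer, larger jumps
print(schedule(1000, 1000))  # entire budget in one shot (the extreme case)
```

The first schedule gives the selector ten chances to adapt to the current model; the last gives it only one, which is the comparison Section 4.3 examines.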
from diverseevol.