
danielwusg commented on May 29, 2024

Hi! Thank you very much for your questions! I hope the following explanations help clarify our methodology and findings.

  1. Why did we train 3 epochs for each step?

    In each "round" of our experiment, we fine-tune the LLaMA model from scratch, using a dataset that increases by n_query data points with every iteration. The decision to train for 3 epochs per round adheres to the Alpaca-Style hyperparameters, as detailed here: https://github.com/tatsu-lab/stanford_alpaca, ensuring consistency across all iterations.

  2. Why didn’t we run experiments for more rounds, i.e., on more data points?

    The primary goal of our research was to explore efficient instruction tuning with reduced training data. We found that beyond a certain point, adding more data to the training set did not significantly improve performance. In our initial trials, we did run more steps (20-30), but observed only marginal performance gains, if any.

    Figure 2 in our paper likewise shows diminishing returns in performance gain as the number of steps increases. This observation supports our finding that a significantly smaller subset of data can be just as effective for instruction tuning as using more data, in line with the conclusions of other studies such as 'Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning'.

  3. How does the choice of n_query impact the results?

    We greatly appreciate the experiments you ran. The selection of n_query is indeed a crucial factor. In our design, given a fixed overall subset budget, a smaller n_query paired with a larger number of iterations (n_round) allows the model to improve its performance gradually: each iteration adds a manageable number of new points (100 in our setup), and over multiple iterations these small additions accumulate into substantial improvements. In contrast, a larger n_query per iteration, such as the 500 used in your experiment, may overwhelm the model's ability to select data points optimally at each step.

    Our findings suggest that a lower n_query, such as 100, combined with more iterations is more effective than a higher n_query with fewer iterations; the sketch after this list illustrates the trade-off. For the extreme case of selecting the entire subset budget in one iteration versus our approach of 100 per round, please refer to Section 4.3 (Dynamic Iteration) in our paper.
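To make the procedure described in points 1 and 3 concrete, here is a minimal sketch of the iterative loop under a fixed overall subset budget of n_query × n_round data points. The helper names (`select_new_points`, `finetune_from_scratch`) are hypothetical placeholders rather than the actual DiverseEvol API, and random sampling stands in for the model-guided selection used in the paper.

```python
import random

def select_new_points(pool, selected, model, k):
    # Placeholder for model-guided diverse selection: pick k points from the
    # pool that are not yet in the training subset. The `model` argument is
    # where the previous round's model would guide the choice; random
    # sampling is used here only to keep the sketch self-contained.
    selected_set = set(selected)
    remaining = [x for x in pool if x not in selected_set]
    return random.sample(remaining, min(k, len(remaining)))

def finetune_from_scratch(subset, epochs=3):
    # Placeholder for re-finetuning LLaMA from the original pretrained
    # checkpoint on `subset` for 3 epochs with Alpaca-style hyperparameters.
    return {"trained_on": len(subset), "epochs": epochs}

def run_iterative_tuning(pool, n_query=100, n_round=10):
    # Each round: enlarge the subset by n_query points, then retrain from
    # scratch, so every round trains on the full accumulated subset.
    selected, model = [], None
    for _ in range(n_round):
        selected += select_new_points(pool, selected, model, k=n_query)
        model = finetune_from_scratch(selected, epochs=3)
    return model, selected

# Same overall budget of 1,000 points, split differently: 100 per round over
# 10 rounds (many small, manageable selection steps) versus 500 per round
# over 2 rounds (fewer, coarser selection steps).
_, subset_small_steps = run_iterative_tuning(list(range(52000)), n_query=100, n_round=10)
_, subset_big_steps = run_iterative_tuning(list(range(52000)), n_query=500, n_round=2)
print(len(subset_small_steps), len(subset_big_steps))  # 1000 1000
```

Both runs end with the same amount of training data; the difference is how often the selection step gets to react to an updated model along the way.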

