Comments (1)
Modify your save_train_state_to_file function to include logging and handle large objects:
# Add logging to identify large objects
import logging

def save_train_state_to_file(self, key, value, fout):
    # to_bytes and packer are assumed to be defined in the surrounding checkpoint module
    packed_value = None
    try:
        packed_value = to_bytes(value)
        logging.info(f"Saving key: {key}, size: {len(packed_value)} bytes")
        fout.write(packer.pack((key, packed_value)))
    except ValueError:
        # packed_value is still None if to_bytes itself raised
        size = "unknown" if packed_value is None else len(packed_value)
        logging.error(f"Error saving key: {key}, size: {size} bytes")
        # Handle the error, potentially by splitting the object or taking other actions
        raise
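One caveat with the logging approach: Python's root logger defaults to the WARNING level, so the logging.info size messages are dropped unless logging is configured first. A minimal sketch, assuming nothing else in the training script already sets up logging (the format string and the example key are illustrative, not taken from the project):

```python
import logging

# Assumption: no other part of the process has configured logging.
# force=True (Python 3.8+) replaces any handlers that already exist,
# so the INFO level set here actually takes effect.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    force=True,
)

# Illustrative message in the same shape as the one logged while saving.
logging.info("Saving key: %s, size: %d bytes", "example_key", 1024)
```

Run this once near the start of the script, before the checkpoint is written.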
If value is too large, consider splitting it into smaller chunks; the code that loads the file must then rejoin the parts in order. Here's a simplified example:
import logging
import msgpack

# SOME_THRESHOLD and CHUNK_SIZE are placeholders (in bytes); define them based on your needs
def save_train_state_to_file(self, key, value, fout):
    packed_value = None
    try:
        packed_value = to_bytes(value)
        if len(packed_value) > SOME_THRESHOLD:
            logging.info(f"Splitting large object: {key}")
            chunks = [packed_value[i:i + CHUNK_SIZE] for i in range(0, len(packed_value), CHUNK_SIZE)]
            # Write each chunk as its own record, keyed by its position
            for idx, chunk in enumerate(chunks):
                fout.write(packer.pack((f"{key}_part_{idx}", chunk)))
        else:
            fout.write(packer.pack((key, packed_value)))
    except ValueError:
        # packed_value is still None if to_bytes itself raised
        size = "unknown" if packed_value is None else len(packed_value)
        logging.error(f"Error saving key: {key}, size: {size} bytes")
        raise
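Splitting on write only helps if the loader puts the pieces back together. The sketch below shows the round trip on plain bytes, leaving out msgpack and the file handle. The helper names split_into_records and reassemble are hypothetical, and for simplicity one chunk_size serves as both the threshold and the chunk length. Parts are ordered by their numeric index, since sorting the key strings would place _part_10 before _part_2.

```python
import re

CHUNK_SIZE = 4  # tiny value for illustration only; real chunks would be far larger

# Matches keys written as "<key>_part_<idx>", the naming used in the example above.
_PART_RE = re.compile(r"^(?P<key>.+)_part_(?P<idx>\d+)$")


def split_into_records(key, data, chunk_size=CHUNK_SIZE):
    """Return (key, bytes) records, splitting data that exceeds chunk_size."""
    if len(data) <= chunk_size:
        return [(key, data)]
    return [
        (f"{key}_part_{start // chunk_size}", data[start:start + chunk_size])
        for start in range(0, len(data), chunk_size)
    ]


def reassemble(records):
    """Rebuild {key: bytes} from records, joining parts in numeric order."""
    whole = {}
    parts = {}
    for key, chunk in records:
        match = _PART_RE.match(key)
        if match is None:
            whole[key] = chunk
        else:
            parts.setdefault(match["key"], {})[int(match["idx"])] = chunk
    for key, indexed in parts.items():
        whole[key] = b"".join(indexed[i] for i in sorted(indexed))
    return whole


records = split_into_records("params", b"abcdefghij") + split_into_records("step", b"7")
restored = reassemble(records)
```

A key that already ends in _part_ followed by digits would be mistaken for a chunk, so a real implementation may want an unambiguous marker or a separate field for the chunk index.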
Upgrading msgpack may also help:
pip install --upgrade msgpack