mini-sora / minisora Goto Github PK

View Code? Open in Web Editor NEW

1.1K 17.0 142.0 69.2 MB

MiniSora: A community aims to explore the implementation path and future development direction of Sora.

Home Page: https://github.com/mini-sora/minisora

License: Apache License 2.0

Python 97.86% Jupyter Notebook 1.00% Shell 0.96% CSS 0.18%

diffusion sora video-generation

minisora's Issues

[Add] - ICML 23 Paper AudioLDM for t2a task

Detailed Description

Content Name/Link: AudioLDM: Text-to-Audio Generation with Latent Diffusion Models.

Current Status/Issue: AudioLDM is missing in the section of audio related papers.

Update Details: Add the link of the paper , project, github , and etc.

Additional Information

Reason for Update: AudioLDM is the first TTA system that enables various text-guided audio manipulations (e.g., style transfer) in a zero-shot fashion.

Deadline (if any): ASAP

[Update] - add project and github link for paper FlowZero

FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax

https://flowzero-video.github.io/

https://github.com/aniki-ly/FlowZero

Please check the reasearch works that just have paper links

[Update] - Refering README.md to update README_CN.md

Detailed Description

I updated the README.md for better organization of contents, please follow the README.md to update the README_EN.md.

Additional Information

You need not to add the translated matrials to the English README_EN.md since it is meaningless to add Chinese translation matrials in a English Page.

[Update] - Audio related resources

Detailed Description

Content Name/Link: Stable Audio Paper and GitHub Link

Current Status/Issue: The paper link and GitHub repository link for Stable Audio are currently missing, and the content name "NaturalSpeech" has incorrectly used Chinese brackets instead of English brackets.

Update Details:

Add the correct English paper link for Stable Audio.
Add the correct English GitHub link for Stable Audio.
Replace the Chinese brackets with English brackets in "NaturalSpeech".

Additional Information

Reason for Update: The update is necessary to provide accurate and accessible resources for users interested in Stable Audio. Correcting the brackets ensures consistency and readability for an international audience.

Deadline (if any): ASAP

[Update] translate the notes/README.md in English

Issue description

The current notes/README.md is in Chinese, please refer other pages to translate it in Chinese.

Steps：

cope notes/README.md as notes/README_CN.md
translate the notes/README.md in English
add Chinese ([English](./README.md) | 简体中文 ) and English links(English | [简体中文](./README_CN.md) ) to notes/README_CN.md and notes/README.md

详情

目前的notes/README.md是中文的，请参考其他页面翻译成中文。

步骤：

将 notes/README.md 处理为 notes/README_CN.md
将notes/README.md翻译成英文
1. 在notes/README_CN.md中添加中文链接([English](./README.md) | 简体中文)和英文链接(English | [简体中文](./README_CN.md)) 注释/README.md

[Add] new sora techrxiv preprint

Detailed Description

Content Name/Link: Generate Impressive Videos with Text Instructions: A Review of OpenAI Sora, Stable Diffusion, Lumiere and Comparable

Current Status/Issue: The document lacks a comprehensive review focusing on OpenAI Sora, Stable Diffusion, and Lumiere.

Update Details: The document will be updated with a detailed review of the paper "Generate Impressive Videos with Text Instructions," which examines the architectures and models of the mentioned AI systems. The new paper will also address the challenges and implications of text-to-video AI, including trustworthiness, data transparency, and environmental sustainability.

Additional Information

Reason for Update: This update is necessary to provide the community with a thorough understanding of the current state and future potential of text-to-video AI technologies. It will help researchers, developers, and industry professionals to stay informed about the latest developments and their broader impacts.

Deadline (if any): ASAP

Linking to a wrong page.

It looks like this link is pointing to the wrong page. It points to the paper 'VideoLDM: Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models'.

[Update] - update papers of Audio Related Resource

Detailed Description

Content Name/Link: Make-An-Audio 2, Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners, Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding.

Current Status/Issue: these papers are necessary about Audio Related Resource

Update Details:

Add the correct English paper link for Audio related resources paper.
Add the correct English GitHub link for Audio related resource paper.

Additional Information

Reason for Update: The update is necessary to provide accurate and accessible resources for users interested in Audio Related Resource.

[Add] paper 'Taming Transformers for High-Resolution Image Synthesis' to 'Diffusion Transformer'

Add paper 'Taming Transformers for High-Resolution Image Synthesis' to 'Diffusion Transformer'

CVPR 21 paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Esser_Taming_Transformers_for_High-Resolution_Image_Synthesis_CVPR_2021_paper.pdf

Github: https://github.com/CompVis/taming-transformers

Project: https://compvis.github.io/taming-transformers/

[Update] - update papers of Audio Related Resource

Detailed Description

Content Name/Link: Make-An-Audio, AudioGPT, AudioLM, AudioGen, Audio-Visual LLM for Video Understanding, Macaw-LLM

Current Status/Issue: these papers is necessary about Audio Related Resource

Update Details:

Add the correct English paper link for Audio related resources paper.
Add the correct English GitHub link for Audio related resource paper.
Remove the extra commas for 'Layered Neural Atlases for Consistent Video Editing' in the en/zh ver.

Additional Information

Reason for Update: The update is necessary to provide accurate and accessible resources for users interested in Audio Related Resource. Correcting the commas ensures readability is necessary.

[Add] Prompt Engineering Papers

Detailed Description

Content Name/Link: Prompt Engineering paper.

Current Status/Issue: Empty content

[Update] - Standardize the Labels of arXiv Papers [更新] - 标准化arXiv论文的标签

Detailed Description

Content Name/Link: The labels of arxiv papers

Current Status/Issue: The readme file currently lists arXiv papers with inconsistent types, some are labeled as "Paper" and others are formatted as "Arxiv YY".

Update Details: The update involves standardizing the type designation for all arXiv papers listed in the readme.

Additional Information

Reason for Update: The uniform categorization of paper types will improve the clarity and navigability of the readme file for users. It will make it easier for the community to identify the nature of each paper at a glance, thus enhancing the overall user experience and utility of the resource.

Deadline (if any): It is recommended to complete this formatting update before the next content refresh to ensure all users have access to the most current and accurate information.

详细描述

内容名称/链接：arXiv论文的标签

当前状态/问题：readme文件目前列出了类型不一致的arXiv论文，有些被标记为“Paper”，其他则格式化为“Arxiv YY”。

更新详情：更新涉及标准化readme中列出的所有arXiv论文的类型标识。

附加信息

更新原因：统一的论文类型分类将提高readme文件的清晰度和可导航性，方便用户使用。这将使社区更容易一眼识别每篇论文的性质，从而提升整体用户体验和资源的实用性。

截止日期（如果有）：建议在下次内容更新之前完成此格式更新，以确保所有用户都能获得最新和最准确的信息。

[Add] - file description in docs/README.md

Detailed Description

[Add] - file description in docs/README.md

Now it is empty.

[Add] - ACL 2023 Paper Link to LAVIS project

Detailed Description

Content Name/Link: LAVIS - A Library for Language-Vision Intelligence

Current Status/Issue: The resource is missing the link to the associated paper.

Update Details: The paper titled "LAVIS - A Library for Language-Vision Intelligence" was presented at ACL 2023 and is currently available on the ACL Anthology. The link to the paper is https://aclanthology.org/2023.acl-demo.3.pdf. This link should be added to the resource page to provide direct access to the research for interested users.

Additional Information

Reason for Update: The update is necessary to ensure that users can easily access the full paper, which is a valuable resource for those interested in the field of language-vision intelligence. Providing the link will enhance the resource's utility and allow for better dissemination of the research findings.

Deadline (if any): There is no specific deadline mentioned for this update. However, it is recommended to perform the update as soon as possible to maintain the currency and relevance of the resource.

Add dev-branch to minisora /codes/

copy codes from OpenDiT、SiT、W.A.L.T
check if the added codes could be updated
multi-branches development may be need for keep tracking updating source codes and adding our improvements to replicate Sora: add dev-branch

Update an issue template which is more appropriate for this repo

Is your feature request related to a problem? Please describe.
The current template is more inclined to the template of the development project, and a template more suitable for the style of this project is needed.

Describe the solution you'd like
Type of issue may include:

Request to add / update the code repo, arxiv website, project website, blog, demo of a paper...
Add / Fix features of this repo.
Typographical / link / spelling issues...
Other discussions...

Describe alternatives you've considered
None.

Additional context
We can refer to openmmlab's template. And you can comment below to supplement your suggestions. I may finish this task tonight.

Upload the notes about the presentation of Video Diffussion Paper

[Update] Optimize English expression 优化英文表达

Issue Description

Optimize English expression (in README.md') for English page of README_CN.md's translation for each folder.

Requirements

Please ensure that the expression across all pages is consistent (use the same expression for the same meaning).
Aim to use scientific, concise, and professional vocabulary in your descriptions.
During the optimization process, feel free to modify the expressions in both Chinese and English simultaneously.
This task can be assigned to multiple people. When claiming the task, please specify which page you will optimize and leave a comment in the comment section.

问题描述

优化每个文件夹中 README_CN.md 的英文页面 (README.md) 的表达。

要求

请确保所有页面的表达一致（相同含义使用相同表达方式）。
尽量使用科学、简洁和专业的词汇描述。
在优化过程中，可以同时修改中文和英文的表达方式。
这个任务可以多人领取，请领取时，说明要优化哪个页面，在评论区评论

Sora有关论文复现小组人员招募

添加Sora有关论文复现小组微信二维码, 并在主页添加如下信息

复现论文主要有

DiT with OpenDiT
SiT
W.A.L.T

添加位置位于"近期圆桌讨论"上面

[Correct] - broken link to CONTRIBUTING_EN document in PR template and CONTRIBUTING File Name Update

New Issue Description

Change the CONTRIBUTING.md in Chinese and CONTRIBUTING_EN.md in English --> CONTRIBUTING.md in English and CONTRIBUTING_CN.md in Chinese

Detailed Description

Content Name/Link: CONTRIBUTING_EN.md

Current Status/Issue: The provided link for the CONTRIBUTING_EN document is not accessible or does not lead to the expected file.

Update Details: The issue needs to be resolved by either fixing the broken link or by providing the correct and functional link to the CONTRIBUTING_EN document.

Additional Information

Reason for Update: Ensuring that contributors can access the CONTRIBUTING_EN document is crucial for maintaining a clear and effective contribution process. A broken link can lead to confusion and hinder community engagement.

Deadline (if any): There is no specific deadline, but it would be beneficial to address this issue as soon as possible to minimize disruption to potential contributors.

add corresponding modification to readme_en.md with respect to PR 43

ref #43

README.md index to CONTRIBUTING.md is a error path

Describe the bug
can't jump into CONTRIBUTING.md from the README.md.

[docs] Move CONTRIBUTING_EN.md and CONTRIBUTING.md to docs folder

create a docs folder
move files
update links

Add [ConferenceName Year] to Each Paper in the colomn of `Links` or `链接`

For example：

Diffusion Model
论文	链接
1) [ICCV 23] StableVideo: Text-driven Consistency-aware Diffusion Video Editing	Paper, Github, Project
2) [CVPR 24] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	Paper, Github, Project
3) DDPM: Denoising Diffusion Probabilistic Models	Paper, Github

This table in the above could be changed as the following:

Diffusion Model
论文	链接
1) StableVideo: Text-driven Consistency-aware Diffusion Video Editing	ICCV 23 Paper, Github, Project
2) MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	CVPR 24 Paper, Github, Project
3) DDPM: Denoising Diffusion Probabilistic Models	NeurIPS 20 Paper, Github

Is there a relationship between video encoding standards and video compression in video generation?

I have some questions regarding the relationship between video encoding standards and video compression in video generation. From my understanding, video encoding standards such as H.261, H.262, H.263, H.264, and H.265 are used to compress digital videos, reducing file size or lowering bandwidth requirements. However, I would like to delve deeper into how these encoding standards are related to video compression in the context of video generation.

I would greatly appreciate more detailed information to gain a better understanding of the relationship between video encoding standards and video compression. This knowledge will help me grasp the technical intricacies involved in video generation and processing.

Thank you for your assistance!

Add star history to English version README.md

Add star history to English version README.md.

[Update] translate the `codes/README.md` in English

The current codes/README.md is in Chinese, please refer other pages to translate it in Chinese.

不成熟想法：复现模拟一个小世界的可能性？

1、选定一部动漫(例如侠盗罗宾汉)或者电视剧(例如南来北往)，复现表现的世界。用来验证技术路径。
2、第一有对应小说文本支撑，数据量足够，场景不多比较固定

[Update] - add papers related to `PIXART-Σ`

Description

add papers related to PIXART-Σ

project: https://pixart-alpha.github.io/PixArt-sigma-project/

papers and links

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
https://arxiv.org/pdf/2403.04692.pdf

PIXART-α: FAST TRAINING OF DIFFUSION TRANSFORMER FOR PHOTOREALISTIC TEXT-TO-IMAGE SYNTHESIS
https://arxiv.org/pdf/2310.00426.pdf

PIXART-δ: FAST AND CONTROLLABLE IMAGE GENERATION WITH LATENT CONSISTENCY MODEL
https://arxiv.org/pdf/2401.05252.pdf

Update equations in `./notes/SD3_zh-CN.md`

Some eqations could be interpreted by the Github, please check the reason and update it

[Add] - Improved DDPM to Diffusion model

Detailed Description

Content Name/Link: Improved Denoising Diffusion Probabilistic Models，github

Current Status/Issue: The README does not currently include the recent advancements in denoising diffusion probabilistic models, specifically the paper "Improved Denoising Diffusion Probabilistic Models" which introduces significant improvements to the field.

Update Details: The update will involve adding a new section or subsection within the Diffusion Models part of the README. This will include the title of the paper, and a link to the paper or its repository if available.

Additional Information

Reason for Update: This paper significantly enhances sample generation quality and efficiency through improved denoising diffusion probabilistic models and fosters further research and practical applications in the field by providing open-source code.

Deadline (if any): There is no strict deadline for this update; however, it is recommended to implement the changes as soon as possible to ensure the README remains up-to-date and relevant.

[Update] - hot news and remove unnecessary para in the link of paper

Detailed Description

Content Name/Link: hot news and the link of sora techrxiv preprint

Current Status/Issue: The first Sora survey paper is currently missing from the hot news section. Additionally, the provided link to the Sora TechRxiv preprint includes an unnecessary commit parameter.

Update Details: Add the first sora paper into the list of hot news. And remove the commit para in the link of the preprint paper.

Additional Information

Reason for Update: To ensure that the latest and most relevant content is featured in the hot news section, and to provide a clean and direct link to the Sora TechRxiv preprint for easier access and citation purposes.

Deadline (if any): ASAP

License?

Contributors might not be sure what they're allowed to do in this project.

Can you add a license preferably an open source license so we can be sure of what we are allowed to do?

[Update] - update format of papers in section of `Prompt Engineering`

Detailed Description

add github and project page link for these papers
format modification refering other sections

Additional Information

Deadline (if any): Before Monday

Update PR template so its title has different prefix tags.

Is your feature request related to a problem? Please describe.

The current PR submission title is not standardized, it is best to unify the title format to facilitate the subsequent review and retrieval process.

Describe the solution you'd like
Create multiple PR templates. User should select the appropriate template when submitting, and automatically fill in prefix labels.

Describe alternatives you've considered
None.

Additional context
We can refer to openmmlab's template. And you can comment below to supplement your suggestions. I may finish this task tonight.

Paper link is wrong

The link to the 8th paper leads to the 6th paper

The contributing guide of mini sora community

In order to better contribute to the community, we should establish an open source contribution specification

add new instruction to the contribution method in [CONTRIBUTING.md]

Refer #72 to add new instruction to the contribution method in CONTRIBUTING.md for English and Chinese Version README files.

Update Latte.md with the markdown equation content failed to render

Please refer modification in PR #69 to update Latte.md with the markdown equation content failed to render successful

add the two paper reading sources about Latte to the README_EN.md

please mention that, these two files about Latte in notes folder are written by Chinese.

Such as xx.pdf in Chinese

[Add] - Add a table of contents

Create a table of contents below Related Works for the different levels of headings in a document. such as

Related Works

Diffusion Models
Diffusion Transformer
Baseline Video Generation Models
.....

The link should also be added for English and Chinese README files [Note that, the links for them are different], for example.

In README.md, add https://github.com/mini-sora/minisora#diffusion-models

Diffusion Models

In README_zh-CN.md, add https://github.com/mini-sora/minisora/blob/main/README_zh-CN.md#diffusion-model

Diffusion Models

You can find the link when you check the websit in the front of <h3> tags

[Update] - Move two new survey papers into "最近更新", which should be located before '论文复现小组'

Detailed Description

Content Name/Link: State of the Art on Diffusion Models for Visual Computing

Current Status/Issue: The document may not include the latest advancements in diffusion models for visual computing.

Update Details: The "State of the Art on Diffusion Models for Visual Computing" paper will be incorporated.

Update

I think we could add these two new papers into "最近更新" to extract more attention for our SoraSurvey Team.

What's more, "最近更新" should also be moved to a place near the top part of README.md. For example, before '论文复现小组'

Additional Information

Reason for Update: This paper provides an intuitive starting point to explore video Diffusion model topic for researchers, artists, and practitioners alike.

[Add] - Add VideoMamba

Detailed Description

Content Name/Link: VideoMamba: State Space Model for Efficient Video Understanding

Current Status/Issue: This is a new paper/project/resource that has not been previously included in the repository.

Update Details: The addition of this paper to the repository will provide a new benchmark for video understanding and contribute to the field's advancement. The code and models for VideoMamba are available on GitHub for easy access and further exploration. The repository can be found at: https://github.com/OpenGVLab/VideoMamba

Additional Information

Reason for Update: The inclusion of the VideoMamba paper is crucial as it presents a significant advancement in the field of video understanding. By adding this resource, the community will gain access to a state-of-the-art model that can enhance the efficiency and comprehensiveness of video analysis.

Deadline (if any): There is no specific deadline for this update, but it is recommended to add the paper as soon as possible to keep the repository current and relevant.

add corresponding modification to readme_en.md with respect to PR 45

ref #45

[Update] - Synchronize the README_CN.md with the README.md. 同步中文页面README_CN.md内容到英文页面README.md

Task Announcement

Here, we are announcing tasks that we need assistance with. If you are interested, please let us know and become a part of our project's developer contributors.

The main task is to remove duplicates and synchronize content between the Chinese and English readme pages.
Next, we need to work on the Baseline Video Generation Models. This can be placed before Video Generation as a baseline model.
We need to include the current state-of-the-art papers and typical papers, and move less typical works to the Video Generation section.
Please mention this in the Chinese and English contributor's manual, and include the link to the contributor's manual in the PR template.

Some rules regarding the list format, which include the following points:

First, search to ensure that the literature is not already in the list to avoid duplication.
For typical papers or models, you can add an abbreviation before the paper's name.
For papers with a colon in the title, you can bold the model name before the colon.
For top conference papers and top journals, add the corresponding name in the Paper link, such as CVPR 23, and only bold the CVPR 23 ,-->[CVPR 23 paper], i.e., [**CVPR 23** paper] in markdown.

任务发布

这里发布下需要大家帮忙的任务, 感兴趣的可以提一下, 成为这个项目的开发者贡献者中的一员

主要是中英文readme页面的论文去重和内容同步,
再就是Baseline Video Generation Models, 这个可以放在 Video Generation前面, 作为baseline模型,
把目前soat的论文和典型论文放在里面, 不够典型的工作移动到Video Generation中
在中英文的贡献者手册中提一下, 并把贡献者手册的链接放在PR template中吧

规定下list格式问题, 包括以下要点

先搜索是否文献已经在list中, 不要重复,
典型论文或者模型, 可以在论文名前添加缩写名,
论文中有冒号的, 可以将冒号前的model名粗体
top会议论文和top期刊, 在Paper link中添加对应名字, 如 CVPR 23, 并只对CVPR 23 进行粗体表示 ,-->[CVPR 23 paper], 即用markdown语法表示为 [**CVPR 23** paper]

[Add] Open-Sora Project to Repo

Add Open-Sora Project to Repo

Detailed Description

Content Name/Link: Open-Sora (https://github.com/hpcaitech/Open-Sora)

Current Status/Issue: Open-Sora is not currently listed in the repository.

Update Details: I propose to add Open-Sora to the repository as it is a high-performance open-source project that provides a development pipeline for Sora-like applications, powered by Colossal-AI. The project includes a complete architecture solution from data processing to training and deployment, supports dynamic resolution training, multiple model structures, various video compression methods, and multiple parallel training optimizations.

Additional Information

Reason for Update: Adding Open-Sora to the repository will benefit the community by providing access to a robust and versatile tool for developing and training multimodal AI models. It will also promote the use of Colossal-AI and contribute to the advancement of AI research and development in the field of video processing and multimodal learning.

Deadline (if any): There is no specific deadline for this update, but it would be beneficial to include it in the next repository update cycle.

Create a pull request template and an issue template for the repository

[Update] - update papers of Audio Related Resource

Detailed Description

Content Name/Link: Diffsound, AudioLDM2, TANGO, MusicGen, LauraGPT

Current Status/Issue: these papers is necessary about Audio Related Resource

Update Details:

Add the correct English paper link for Audio related resources paper.
Add the correct English GitHub link for Audio related resource paper.

Additional Information

Reason for Update: The update is necessary to provide accurate and accessible resources for users interested in Audio Related Resource.

add corresponding modification to readme_en.md with respect to PR 42

ref #42

add corresponding modification to readme_en.md with respect to PR 44

ref #44

mini-sora / minisora Goto Github PK

minisora's Issues

Detailed Description

Additional Information

Detailed Description

Additional Information

Detailed Description

Additional Information

Issue description

详情

Detailed Description

Additional Information

Detailed Description

Additional Information

Detailed Description

Additional Information

Detailed Description

Detailed Description

Additional Information

详细描述

附加信息

Detailed Description

Detailed Description

Additional Information

Issue Description

Requirements

问题描述

要求

New Issue Description

Detailed Description

Additional Information

Diffusion Model

Diffusion Model

Description

Detailed Description

Additional Information

Detailed Description

Additional Information

Detailed Description

Additional Information

Related Works

Detailed Description

Update

Additional Information

Detailed Description

Additional Information

Task Announcement

Some rules regarding the list format, which include the following points:

任务发布

规定下list格式问题, 包括以下要点

Add Open-Sora Project to Repo

Detailed Description

Additional Information

Detailed Description

Additional Information

Recommend Projects

Recommend Topics

Recommend Org