
About video fragments #61

Open
sameerKgp opened this issue May 14, 2024 · 11 comments
Comments

@sameerKgp

Hi, thanks for providing the code for your work. In the code, what is the video_fragment? Is it for the breakpoint mode? How are these fragments created? Also, in src/video_fragment you have provided a clip from a different video (GOT) than the Cooking_cake one.

@Espere-1119-Song
Collaborator

video_fragment stores the video clip read by the sliding window; it is created and updated automatically. Also, I couldn't find the GOT video. Can you point out the exact path? We didn't upload Cooking_cake since it is too big to upload to GitHub.

@sameerKgp
Author

Thanks for the reply. I got the Cooking_cake video from the link provided in issue #15. The GOT video is at src/video_fragment/output.mp4.

@HTD1016

HTD1016 commented Jul 9, 2024

I still don't know how to create the video fragment if I use my own video. There are no such functions that I can find in the "Chat" class. Maybe in "global mode" the video fragment is also the original video? Does that mean I need to store the same video at the "video fragment path" as at the "video path"?

@Espere-1119-Song
Collaborator

You just need to choose one video as the initialized video fragment at the beginning; the other video fragments will be created automatically.
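As a rough sketch of that flow (the helper names and file paths here are my own, not from the repository; only the `per_video_length = video_length / n_samples` arithmetic mirrors the `parse_video_fragment` code shown later in this thread): seed the fragment file once with any short clip, then let each sliding-window step overwrite it.

```python
import shutil

def init_fragment(seed_clip_path: str, fragment_video_path: str) -> None:
    # Seed the fragment file with any short clip; the pipeline then
    # overwrites it on every sliding-window step.
    shutil.copyfile(seed_clip_path, fragment_video_path)

def window_bounds(video_length: float, n_samples: int, n_stage: int):
    # Start/end time (in seconds) of the n_stage-th sliding window,
    # mirroring per_video_length = video_length / n_samples.
    per_video_length = video_length / n_samples
    return n_stage * per_video_length, (n_stage + 1) * per_video_length
```

So a 64-second video with `n_samples = 32` yields 2-second windows, and the n_stage-th call slices out exactly one of them.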

@HTD1016

HTD1016 commented Jul 10, 2024

Thanks for the reply. I used the MovieChat package from PyPI (version 0.6.3), and I carefully checked the code in the package.
In /anaconda/envs/MovieChat/lib/python3.9/site-packages/MovieChat/models/chat_model.py:

for i in range(num_frames): 
    print(f"current processed frames: {i+1} / {num_frames}")
    video_fragment = self.parse_video_fragment(video_path=video_path, video_length=video_length, n_stage=i)         
    video_fragment, msg = self.load_video(
        video_path=fragment_video_path,
        n_frms=4, 
        height=224,
        width=224
    )
    video_fragment = self.vis_processor.transform(video_fragment) 
    video_fragment = video_fragment.unsqueeze(0).to(self.device)

where self.parse_video_fragment() is used to create the video fragment, and then self.load_video() reads the video fragment from fragment_video_path. It follows that self.parse_video_fragment() should save the video fragment locally.
Now take a look at the self.parse_video_fragment() function:

def parse_video_fragment(self, video_path, fragment_video_path, video_length, n_stage = 0):
    decord.bridge.set_bridge("torch")
    per_video_length = video_length / self.n_samples
    fragment_video = self.capture_video(video_path, per_video_length, n_stage)
    fragment_video.write_videofile(fragment_video_path)  # This code was added by me, as well as the parameter "fragment_video_path"
    return fragment_video

So I think a line of code is missing here. After I added this line, the code works normally. I also noticed that the author's code repository provides a local version of MovieChat that includes this line.
However, because of the time MoviePy takes to write videos, the inference time of the whole pipeline becomes very long.
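One possible workaround for the write-time cost (my own sketch, not part of MovieChat; the helper names are made up) is to cut each fragment with ffmpeg's stream copy instead of re-encoding through MoviePy's `write_videofile`:

```python
import subprocess

def build_ffmpeg_cut_cmd(video_path, start, end, out_path):
    # ffmpeg command that cuts [start, end) with stream copy (-c copy),
    # i.e. no re-encoding, which is much faster than write_videofile.
    return [
        "ffmpeg", "-y",
        "-ss", str(start), "-to", str(end),
        "-i", video_path,
        "-c", "copy",
        out_path,
    ]

def cut_fragment(video_path, start, end, out_path):
    subprocess.run(build_ffmpeg_cut_cmd(video_path, start, end, out_path),
                   check=True)
```

Note that `-c copy` can only cut at keyframe boundaries, so fragment edges are approximate; if frame-exact windows matter, re-encoding is still needed.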

@Espere-1119-Song
Collaborator

Thank you very much for discovering this issue. We will recheck our code and update the MovieChat package as soon as possible to resolve this problem.

@ywh187

ywh187 commented Sep 2, 2024

for i in range(num_frames):
    print(f"current processed frames: {i+1} / {num_frames}")
    video_fragment = self.parse_video_fragment(video_path=video_path, video_length=video_length, n_stage=i)
    video_fragment, msg = self.load_video(
        video_path=fragment_video_path,
        n_frms=4,
        height=224,
        width=224
    )
    video_fragment = self.vis_processor.transform(video_fragment)
    video_fragment = video_fragment.unsqueeze(0).to(self.device)

I noticed that video_fragment is assigned by the parse_video_fragment call on the third line of this snippet, but then immediately overwritten by load_video on the next line. The first assignment seems redundant, since its return value is never used before being reassigned.

@Espere-1119-Song
Collaborator

I understand what you mean. During implementation, we found that some versions of ffmpeg may not support initializing a blank video fragment, so we used an unrelated video clip for initialization.

@allent4n

@HTD1016 You are just amazing!!!

@oximi123

> video_fragment stores the video clip read by the sliding window, and it will be created and automatically updated. Also I didn't find the GOT video, can u point out the exact path? We didn't upload Cooking_cake since it is too big to upload on Github.

Hi, I have two small questions about these two hyperparameters in run_inference_qa_msvd.py:

    MAX_INT = 8
    N_SAMPLES = 32

According to my understanding, N_SAMPLES specifies how many fragments (or sliding windows) will be created for each video, and MAX_INT specifies how many frames we will use for encoding as LLM input for each fragment/sliding window. Is that correct?

@Espere-1119-Song
Collaborator

Sorry for the confusion. N_SAMPLES specifies how many fragments (or sliding windows) will be created for each video. However, MAX_INT is not utilized in the current implementation. In our code, the number of frames included within each sliding window corresponds to the length of the short-term memory window used for encoding.
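In other words (a toy sketch with hypothetical parameter names, just restating the accounting in the reply above): the per-window frame count comes from the short-term memory length, not from MAX_INT.

```python
def frames_per_window(short_memory_length: int) -> int:
    # Per the maintainer's reply: each sliding window contributes as many
    # frames to the encoder as the short-term memory window holds.
    # MAX_INT plays no role in the current implementation.
    return short_memory_length

def total_encoded_frames(n_samples: int, short_memory_length: int) -> int:
    # N_SAMPLES windows, each contributing frames_per_window frames.
    return n_samples * frames_per_window(short_memory_length)
```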
