inversecooking's Introduction

Inverse Cooking: Recipe Generation from Food Images

Code supporting the paper:

Amaia Salvador, Michal Drozdzal, Xavier Giro-i-Nieto, Adriana Romero. Inverse Cooking: Recipe Generation from Food Images. CVPR 2019

If you find this code useful in your research, please consider citing using the following BibTeX entry:

@InProceedings{Salvador2019inversecooking,
author = {Salvador, Amaia and Drozdzal, Michal and Giro-i-Nieto, Xavier and Romero, Adriana},
title = {Inverse Cooking: Recipe Generation From Food Images},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}

Installation

This code uses Python 3.6 and PyTorch 0.4.1 with CUDA 9.0.

  • Install PyTorch:
$ conda install pytorch=0.4.1 cuda90 -c pytorch
  • Install the remaining dependencies:
$ pip install -r requirements.txt

Pretrained model

  • Download ingredient and instruction vocabularies here and here, respectively.
  • Download the pretrained model here.

Demo

You can use our pretrained model to get recipes for your images.

Download the required files (listed above), place them under the data directory, and try our demo notebook src/demo.ipynb.

Note: the demo will run on a GPU if one is available; otherwise it falls back to the CPU.
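
For reference, the device selection amounts to the usual PyTorch pattern. This is a minimal sketch, not the notebook's exact code; model and image_tensor stand in for objects the notebook creates:

import torch

# Pick a GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hypothetical usage: the model and its inputs must live on the same device.
# model = model.to(device)
# image_tensor = image_tensor.to(device)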

Data

  • Download Recipe1M (registration required)
  • Extract files somewhere (we refer to this path as path_to_dataset).
  • The contents of path_to_dataset should be the following:
det_ingrs.json
layer1.json
layer2.json
images/
images/train
images/val
images/test
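
Before running anything, it can help to sanity-check the layout. A minimal sketch (the scripts below assume exactly these names):

import os

path_to_dataset = '/path/to/recipe1m'  # replace with your extraction path
expected = ['det_ingrs.json', 'layer1.json', 'layer2.json',
            'images/train', 'images/val', 'images/test']
for entry in expected:
    full = os.path.join(path_to_dataset, entry)
    print(('OK      ' if os.path.exists(full) else 'MISSING ') + full)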

Note: all Python commands below must be run from ./src.

Build vocabularies

$ python build_vocab.py --recipe1m_path path_to_dataset

Images to LMDB (Optional, but recommended)

For fast loading during training:

$ python utils/ims2file.py --recipe1m_path path_to_dataset

If you decide not to create this file, use the flag --load_jpeg when training the model.
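
For reference, reading an image back out of an LMDB file generally follows the pattern below. This is a hedged sketch: the actual key scheme and value encoding are defined by utils/ims2file.py, so the key and the image-bytes assumption here are illustrative only.

import io
import lmdb
from PIL import Image

# Open the LMDB file produced by utils/ims2file.py (read-only).
env = lmdb.open('path/to/lmdb_file', readonly=True, lock=False)
with env.begin(write=False) as txn:
    buf = txn.get(b'some_image_id')  # hypothetical key; see ims2file.py
if buf is not None:
    # Assumes values are encoded image bytes; ims2file.py may use a
    # different serialization (e.g. pickled arrays).
    image = Image.open(io.BytesIO(buf)).convert('RGB')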

Training

Create a directory to store checkpoints for all models you train (e.g. ../checkpoints) and point --save_dir to it.

We train our model in two stages:

  1. Ingredient prediction from images
python train.py --model_name im2ingr --batch_size 150 --finetune_after 0 --ingrs_only \
--es_metric iou_sample --loss_weight 0 1000.0 1.0 1.0 \
--learning_rate 1e-4 --scale_learning_rate_cnn 1.0 \
--save_dir ../checkpoints --recipe1m_dir path_to_dataset
  2. Recipe generation from images and ingredients (loading the model from stage 1)
python train.py --model_name model --batch_size 256 --recipe_only --transfer_from im2ingr \
--save_dir ../checkpoints --recipe1m_dir path_to_dataset

Check training progress with TensorBoard from ../checkpoints:

$ tensorboard --logdir='../tb_logs' --port=6006

Evaluation

  • Save generated recipes to disk with python sample.py --model_name model --save_dir ../checkpoints --recipe1m_dir path_to_dataset --greedy --eval_split test.
  • This script will also report ingredient metrics (F1 and IoU), as illustrated in the sketch below.
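
For intuition, these metrics reduce to set overlap between the predicted and ground-truth ingredients. A minimal sketch of the computation (not the repo's exact implementation):

def ingredient_metrics(pred, true):
    # IoU and F1 between two sets of ingredient ids/names.
    pred, true = set(pred), set(true)
    inter = len(pred & true)
    union = len(pred | true)
    iou = inter / union if union else 0.0
    precision = inter / len(pred) if pred else 0.0
    recall = inter / len(true) if true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return iou, f1

# Example: two of three predictions are correct.
print(ingredient_metrics({'salt', 'flour', 'egg'}, {'salt', 'flour', 'milk'}))
# -> (0.5, 0.666...)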

License

inversecooking is released under the MIT license; see LICENSE for details.

inversecooking's People

Contributors

adri-romsor, amaiasalvador


inversecooking's Issues

about training time

Hello, how many GPUs are used and how long does training take?

Thank you very much!

Tensor device mismatch

Getting this traceback from the 10th cell when running the notebook:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-ffb79fff26a4> in <module>
     25         with torch.no_grad():
     26             outputs = model.sample(image_tensor, greedy=greedy[i], 
---> 27                                    temperature=temperature, beam=beam[i], true_ingrs=None)
     28 
     29         ingr_ids = outputs['ingr_ids'].cpu().numpy()

D:\Users\Corentin\Documents\Programming\Projects\InverseCooking\src\model.py in sample(self, img_inputs, greedy, temperature, beam, true_ingrs)
    205                                                                   beam=-1,
    206                                                                   img_features=img_features, first_token_value=0,
--> 207                                                                   replacement=False)
    208 
    209             # mask ingredients after finding eos

D:\Users\Corentin\Documents\Programming\Projects\InverseCooking\src\modules\transformer_decoder.py in sample(self, ingr_features, ingr_mask, greedy, temperature, beam, img_features, first_token_value, replacement, last_token_value)
    355             # forward
    356             outputs, _ = self.forward(ingr_features, ingr_mask, torch.stack(sampled_ids, 1),
--> 357                                       img_features, incremental_state)
    358             outputs = outputs.squeeze(1)
    359             if not replacement:

D:\Users\Corentin\Documents\Programming\Projects\InverseCooking\src\modules\transformer_decoder.py in forward(self, ingr_features, ingr_mask, captions, img_features, incremental_state)
    295 
    296         # embed tokens and positions
--> 297         x = self.embed_scale * self.embed_tokens(captions)
    298 
    299         if self.embed_positions is not None:

E:\Python37\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

E:\Python37\lib\site-packages\torch\nn\modules\sparse.py in forward(self, input)
    115         return F.embedding(
    116             input, self.weight, self.padding_idx, self.max_norm,
--> 117             self.norm_type, self.scale_grad_by_freq, self.sparse)
    118 
    119     def extra_repr(self):

E:\Python37\lib\site-packages\torch\nn\functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1504         # remove once script supports set_grad_enabled
   1505         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1506     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1507 
   1508 

RuntimeError: Expected object of backend CPU but got backend CUDA for argument #3 'index'
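
The error means the embedding weights and the token indices passed to F.embedding live on different devices. A likely fix (an assumption; verify against your local copy of the notebook) is to move the model and its inputs onto the same device before sampling:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# model and image_tensor come from the earlier notebook cells.
model = model.to(device)
image_tensor = image_tensor.to(device)

with torch.no_grad():
    outputs = model.sample(image_tensor, greedy=greedy[i],
                           temperature=temperature, beam=beam[i], true_ingrs=None)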

My Compliments To The Chef

I don't think I've ever opened an issue to give a compliment. Well done. Love this project.

It's not only conceptually brilliant, it's modeled and coded well. And it works. Right away. You've provided all the tools to learn the research, which is the inherent point of research from the get-go. And when we're researching things that work with code and are based on code, the code should bloody well work. So much is lost on so many when theory is not applied. That's why 80% of people hate math: theory is taught too young. I digress; back to research.

Why this is never a criterion for acceptance in a call for papers is beyond me. Corporate research is usually good, which is not taking away from the unique awesomeness of this project. But academic researchers should model their CS research on this. We schmucks out here in the private sector, who deliver products or don't eat, always release our open source ready to go. It should be one-command or one-click deployment. Anyway, thanks for doing it right. And the dual encoders were spot on, along with the seamless GPU/CPU integration.

I should start a company that does nothing but complete research papers with working commercial-grade software, and for free. An Apache for the little guy. One-offs. There has to be funding for that somewhere. In the days of "do no wrong today or yesterday", there has to be an entity with money that agrees. I work for a tiny startup, a Reg A+ fund, that gives away 10% of all earnings to keep people in their homes, because we buy them outright. There is money in giving it away. After coding product after product for 25 years, it would be nice to build out someone else's research for once. Yeah, all ideas are stolen, but never branded that way.

I know, "wrong place for these comments". But we programmers are humans too. We're doing a damnedest to supplant them, but until then we need praise and justification for our work. I'll take it wherever i can. And they are never in my issue queue.

Keep up the good work ladies and gents.

Can't proceed to loss.backward()

Found a bug at line 390 of src/modules/transformer_decoder.py:

logits = torch.stack(logits, 1).data --> logits = torch.stack(logits, 1)

As written, the .data call detaches the tensor, so requires_grad is False and loss.backward() raises an error during training.

requires_grad is False

Hi,

I guess line 390 of src/modules/transformer_decoder.py needs to change:

logits = torch.stack(logits, 1).data --> logits = torch.stack(logits, 1)

This keeps requires_grad True; without it, training cannot run loss.backward().
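
The distinction matters because .data returns a tensor detached from the autograd graph. A tiny sketch of the effect (assuming PyTorch 0.4+):

import torch

x = torch.ones(3, requires_grad=True)
y = (x * 2).data  # detached from the graph: gradients stop here
z = x * 2         # still tracked by autograd

print(y.requires_grad)  # False -> a loss built on y cannot backpropagate to x
print(z.requires_grad)  # True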

Uploading the `args.pkl` used to build the `modelbest.ckpt`

Hi, I would like to use the model checkpoint and fine-tune it on my data. However, I am unable to load the state dictionary, merge_model, because I don't know which args.pkl was used to construct modelbest.ckpt. Could the args.pkl be uploaded as well? Thank you so much!
