Reproducing the Detectron implementation of RetinaNet
License: MIT License
This is most of what runs when you run:
train_net.py --cfg /home/josh/git/detectron/configs/12_2017_baselines/retinanet_R-50-FPN_1x.yaml OUTPUT_DIR /tmp/detectron-output-debug
High level:
main()
train_model()
create_model()
model_builder.create(model_type_func, train=False, gpu_id=0)
DetectionModelHelper()
get_func(func_name)
retinanet(model)
build_generic_retinanet_model(model, add_conv_body_func, freeze_conv_body=False)
_single_gpu_build_func(model)
add_fpn_ResNet50_conv5_body(model)
add_fpn_onto_conv_body(model, conv_body_func, fpn_level_info_func, P2only=False)
add_ResNet50_conv5_body(model)
add_ResNet_convX_body(model, block_counts)
add_fpn(model, fpn_level_info)
get_min_max_levels()
add_topdown_lateral_module(model, fpn_top, fpn_lateral, fpn_bottom, dim_top, dim_lateral)
add_fpn_retinanet_outputs(model, blobs_in, dim_in, spatial_scales)
get_retinanet_bias_init(model)
add_fpn_retinanet_losses(model)
setup_model_for_training(model, weights_file, output_dir)
add_model_training_inputs(model)
combined_roidb_for_training(cfg.TRAIN.DATASETS, cfg.TRAIN.PROPOSAL_FILES)
get_roidb(dataset_name, proposal_file)
JsonDataset(self, name)
_init_keypoints(self)
get_roidb(self,gt=False,proposal_file=None,min_proposal_size=2,proposal_limit=-1,crowd_filter_thresh=0)
_prep_roidb(self,entry)
_add_gt_annotations(self,entry)
_add_class_assignments(roidb)
extend_with_flipped_entries(roidb, dataset)
filter_for_training(roidb)
is_valid(entry)
add_bbox_regression_targets(roidb)
compute_bbox_regression_targets(entry)
_compute_and_log_stats(roidb)
add_training_inputs(model, roidb=None)
RoIDataLoader(self, roidb, num_loaders=4, minibatch_queue_size=64, blobs_queue_capacity=8)
get_minibatch_blob_names(is_training=True)
_shuffle_roidb_inds()
create_threads()
get_minibatch_blob_names(is_training=True)
initialize_gpu_from_weights_file(model, weights_file, gpu_id=0)
load_object(file_name)
broadcast_parameters(model)
dump_proto_files(model, output_dir)
start(self, prefill=False)
TrainStats(self,model)
has_stopped()
get_lr_at_iter(it)
UpdateWorkspaceLr(self, cur_iter, new_lr)
_SetNewLr(self, cur_lr, new_lr)
_CorrectMomentum(self, correction)
RunNet
UpdateIterStats()
LogIterStats(self, cur_iter, lr)
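The _CorrectMomentum step above is easy to miss: whenever the learning rate changes, Detectron rescales the SGD momentum buffers so the effective update magnitude stays consistent. A minimal sketch (correct_momentum is a hypothetical standalone name; the original is a method on the training loop):

```python
def correct_momentum(momentums, correction):
    # When the LR changes from old_lr to new_lr, scale every momentum
    # buffer by correction = new_lr / old_lr, so that momentum * lr
    # keeps the same magnitude across the LR step.
    return [m * correction for m in momentums]
```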
The primary codepath starts a number of threads that load images from disk in minibatches.
The minibatch loader codepath is much smaller, but the individual functions are often more involved and not always immediately clear.
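The overall shape of that loader is worth sketching before diving into the call tree. This is a simplified stand-in, not the real RoIDataLoader: each loader thread builds minibatch blobs and puts them on a bounded queue that the training loop consumes from.

```python
import queue
import threading

def minibatch_loader_thread(roidb, minibatch_queue):
    # Hypothetical sketch: each loader thread repeatedly builds a
    # minibatch (image blobs + targets) from its slice of the roidb
    # and puts it on a bounded queue shared with the training loop.
    for entry in roidb:
        blobs = {"data": entry}  # stand-in for _get_minibatch(entry)
        minibatch_queue.put(blobs)

minibatch_queue = queue.Queue(maxsize=64)   # cf. minibatch_queue_size=64
roidb = list(range(8))                      # stand-in for real roidb entries
threads = [threading.Thread(target=minibatch_loader_thread,
                            args=(roidb[i::4], minibatch_queue))
           for i in range(4)]               # cf. num_loaders=4
for t in threads:
    t.start()
for t in threads:
    t.join()
print(minibatch_queue.qsize())  # 8 minibatches enqueued
```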
minibatch_loader_thread(self)
get_next_minibatch()
_get_next_minibatch_inds()
_get_minibatch(roidb)
get_minibatch_blob_names()
get_retinanet_blob_names(is_training=True)
_get_image_blob(roidb)
❗️prep_im_for_blob(im, pixel_means, target_size, max_size)
❗️im_list_to_blob(ims)
❗️add_retinanet_blobs(blobs, im_scales, roidb, image_width, image_height)
❗️get_field_of_anchors(stride, anchor_sizes, anchor_aspect_ratios, octave=None, aspect=None)
generate_anchors(stride, sizes, aspect_ratios)
❗️_generate_anchors(base_size, scales, aspect_ratios)
_ratio_enum(anchor, ratios)
_scale_enum(anchor, scales)
FieldOfAnchors()
❗️_get_retinanet_blobs(foas, all_anchors, gt_boxes, gt_classes, im_width, im_height)
bbox_overlaps(anchors, gt_boxes)
compute_targets(ex_rois, gt_rois, weights=(1.0, 1.0, 1.0, 1.0))
❗️bbox_transform_inv(boxes, gt_boxes, weights=(1.0, 1.0, 1.0, 1.0))
unmap(data, count, inds, fill=0)
coordinated_put(coordinator, queue, element)
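Of the flagged functions above, bbox_transform_inv is the one most worth internalizing: it encodes each ground-truth box relative to its matched anchor using the standard Faster R-CNN parameterization. A sketch under that assumption (the +1.0 terms follow the legacy integer-pixel convention):

```python
import numpy as np

def bbox_transform_inv(boxes, gt_boxes, weights=(1.0, 1.0, 1.0, 1.0)):
    # Encode gt_boxes relative to matched anchors as (dx, dy, dw, dh).
    # Both inputs are (N, 4) arrays in (x1, y1, x2, y2) form.
    ex_widths = boxes[:, 2] - boxes[:, 0] + 1.0
    ex_heights = boxes[:, 3] - boxes[:, 1] + 1.0
    ex_ctr_x = boxes[:, 0] + 0.5 * ex_widths
    ex_ctr_y = boxes[:, 1] + 0.5 * ex_heights

    gt_widths = gt_boxes[:, 2] - gt_boxes[:, 0] + 1.0
    gt_heights = gt_boxes[:, 3] - gt_boxes[:, 1] + 1.0
    gt_ctr_x = gt_boxes[:, 0] + 0.5 * gt_widths
    gt_ctr_y = gt_boxes[:, 1] + 0.5 * gt_heights

    wx, wy, ww, wh = weights
    dx = wx * (gt_ctr_x - ex_ctr_x) / ex_widths      # center offset, x
    dy = wy * (gt_ctr_y - ex_ctr_y) / ex_heights     # center offset, y
    dw = ww * np.log(gt_widths / ex_widths)          # log width ratio
    dh = wh * np.log(gt_heights / ex_heights)        # log height ratio
    return np.vstack((dx, dy, dw, dh)).transpose()
```

A useful sanity check: encoding a box against itself yields all-zero targets.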
Do we actually need unmap in _get_retinanet_blobs()? It doesn't seem to do anything...
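For reference, unmap in the Faster R-CNN lineage scatters values computed on a subset of anchors back into an array covering the full anchor field. If the subset happens to cover every anchor, it is effectively an identity, which would explain it appearing to do nothing. A sketch of that behavior (not verified against this repo's exact copy):

```python
import numpy as np

def unmap(data, count, inds, fill=0):
    # Scatter `data` (computed for the anchors at positions `inds`)
    # back into an array over all `count` anchors; everything else
    # gets `fill` (e.g. -1 for "ignore" labels).
    if data.ndim == 1:
        ret = np.full((count,), fill, dtype=data.dtype)
        ret[inds] = data
    else:
        ret = np.full((count,) + data.shape[1:], fill, dtype=data.dtype)
        ret[inds, :] = data
    return ret
```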
I'd like to match our learning rate to Detectron's.
In the config they define:
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.00125
GAMMA: 0.1
MAX_ITER: 720000
STEPS: [0, 480000, 640000]
main()
train_model()
get_lr_at_iter(it)
lr_func_steps_with_decay(cur_iter)
get_step_index(cur_iter)
UpdateWorkspaceLr(it)
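Given those config values, the steps_with_decay policy can be sketched as follows (ignoring any warm-up; defaults are the values from the SOLVER block above):

```python
def lr_func_steps_with_decay(cur_iter, base_lr=0.00125, gamma=0.1,
                             steps=(0, 480000, 640000)):
    # steps_with_decay: LR is base_lr * gamma**i, where i is the index
    # of the last entry in `steps` that is <= cur_iter.
    ind = 0
    for i, step in enumerate(steps):
        if cur_iter >= step:
            ind = i
    return base_lr * gamma ** ind
```

So with this config the LR is 0.00125 until iteration 480000, 0.000125 until 640000, and 1.25e-05 until MAX_ITER.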
We're starting to get close but some differences remain. Currently my network occasionally gets exploding gradients near the start of training.
Let's start by taking a look at each model to ensure things look correct.
Since there's so much going on, we'll break it into different pieces and compare those one at a time.
I'm unsure where the information stored in im_info is used. See: #2 (comment). What happens when we delete this key? Do things break?
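For reference, in Faster R-CNN-style pipelines im_info is conventionally a (num_images, 3) blob of (height, width, scale) for each resized image. A sketch of that convention (an assumption about this repo's usage, not verified against it; the shapes and scales below are made up):

```python
import numpy as np

# Hypothetical resized image shapes and resize factors.
im_shapes = [(800, 1216), (800, 1024)]  # (height, width) after resize
im_scales = [1.6, 1.333]                # scale applied to each image

# im_info: one (height, width, scale) row per image in the minibatch.
im_info = np.array(
    [[h, w, s] for (h, w), s in zip(im_shapes, im_scales)],
    dtype=np.float32,
)
print(im_info.shape)  # (2, 3)
```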