multiview_calib's Introduction

Multiple view Camera calibration tool

This tool computes the intrinsic and extrinsic camera parameters of a set of synchronized cameras with overlapping fields of view. The intrinsics estimation is based on OpenCV's camera calibration framework and is run on each camera separately. For the extrinsics estimation, an initial solution (extrinsic parameters) is first computed using a linear approach and then refined using bundle adjustment. The output is the set of camera poses (intrinsic matrix, distortion parameters, rotations and translations) w.r.t. either the first camera or a global reference system.

Prerequisites

  • numpy
  • scipy
  • imageio
  • matplotlib
  • OpenCV

Installation

cd MULTIVIEW_CALIB_MASTER
pip install .

Usage

Intrinsics estimation

Compute intrinsic parameters:

Print the following checkerboard and make sure the squares are 3x3 cm. If they are not, disable any printer scaling setting such as auto-fit: https://markhedleyjones.com/storage/checkerboards/Checkerboard-A4-30mm-8x6.pdf

The inner corners of the checkerboard are the calibration points.

Take a video of the checkerboard. The objective is to acquire a set of images (30-200, roughly a 2 min video) of the checkerboard from different viewpoints, making sure that the distribution of the calibration points covers the whole image, corners included!

Extract the frames:

ffmpeg -i VIDEO -r 0.5 frames/frame_%04d.jpg

Run the following script:

python compute_intrinsics.py --folder_images ./frames -ich 6 -icw 8 -s 30 -t 24 --debug

The script outputs several pieces of information useful for debugging. One of them is the per-keypoint reprojection error, another the monotonicity of the distortion function. If the distortion function is not monotonic, we suggest sampling more precise points near the corners of the image first. If this is not enough, try the Rational Model (-rm) instead; it is a lens model better suited to cameras with wider lenses. To further check that the calibration went well, perform a visual inspection of the undistorted images that have been saved: lines in the images should be straight and the picture should look like a normal photograph. In case of failure, try updating OpenCV or re-taking the video/pictures.
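
For a quick visual check, the undistortion can also be reproduced directly with OpenCV. The following is a minimal sketch, assuming the per-view intrinsics format shown in the Input files section; the file and frame names are placeholders:

import json
import cv2
import numpy as np

# load the intrinsics of one camera (file name and key are placeholders)
with open("intrinsics.json") as f:
    intr = json.load(f)["cam0"]
K = np.array(intr["K"])
dist = np.array(intr["dist"])

img = cv2.imread("frames/frame_0001.jpg")
h, w = img.shape[:2]

# alpha=0 crops invalid pixels, alpha=1 keeps the full field of view
new_K, _ = cv2.getOptimalNewCameraMatrix(K, dist, (w, h), alpha=0)
undistorted = cv2.undistort(img, K, dist, None, new_K)
cv2.imwrite("undistorted_check.jpg", undistorted)  # straight edges should look straight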

Extrinsics estimation

Synchronization:

If the physical landmarks that you want to use to calibrate the cameras are not static, you have to synchronize the cameras. We do this by extracting the frames from each of the videos using the exact same frame rate (the higher the better); then we look for a fast and recognizable event in the videos (like a hand clap) that allows us to remove the time offset, in terms of frame indices, between the sequences. Once the offset is removed you can locate the landmarks in each sequence.

To extract the frames:

ffmpeg -i VIDEO -vf "fps=30" frames/frame_%06d.jpg

It is a good idea to extract the frames at around the native frame rate. Increasing the fps w.r.t. the original fps will not make the synchronization more precise.
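
Once the clap frame has been identified in each sequence, removing the offset amounts to dropping the leading frames of each sequence. A small sketch with made-up frame indices and folder names:

import os
import shutil

# frame index at which the hand clap is visible in each extracted sequence (made-up values)
clap_frame = {"cam0": 132, "cam1": 98, "cam2": 145}
ref = min(clap_frame.values())

for cam, idx in clap_frame.items():
    offset = idx - ref
    src_dir, dst_dir = f"frames_{cam}", f"frames_{cam}_sync"
    os.makedirs(dst_dir, exist_ok=True)
    # drop the first `offset` frames so that frame k shows the same instant in every view
    for k, name in enumerate(sorted(os.listdir(src_dir))[offset:]):
        shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, f"frame_{k:06d}.jpg"))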

Compute relative poses:

To recover the pose of each camera in the rig w.r.t. the first camera, we first compute relative poses between pairs of views and then concatenate them to form a tree. To do so, we have to manually define a minimal set of pairs of views that connects every camera. This is done in the file setup.json.

Note: do not pair cameras that are facing each other! Recovering proper geometry in this specific case is difficult.

The file named landmarks.json contains precise image points for each view that are used to compute fundamental matrices and poses. The file intrinsics.json contains the intrinsic parameters for each view that we have computed previously. The file filenames.json contains the filename of one image per view, which is used for visualisation purposes. Check the section Input files for more details on the file formats.

python compute_relative_poses.py -s setup.json -i intrinsics.json -l landmarks.json -f filenames.json --dump_images 

The result of this operation is a set of relative poses up to scale (each translation is a unit vector).

The following command is an alternative. It computes the final relative pose from view1 to view2 as an average of relative poses computed through N other, different views.

python compute_relative_poses_robust.py -s setup.json -i intrinsics.json -l landmarks.json -m lmeds -n 5 -f filenames.json --dump_images
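
Both commands boil down to estimating the relative pose of a view pair from the essential matrix, which is why the translation is only recovered up to scale. A self-contained synthetic sketch of the idea (not the tool's actual implementation):

import cv2
import numpy as np

# synthetic scene: random 3D points seen by two cameras (all values are made up)
rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(100, 3))
K = np.array([[1800.0, 0, 960], [0, 1800.0, 540], [0, 0, 1]])
R_gt, _ = cv2.Rodrigues(np.array([0.0, 0.3, 0.0]))
t_gt = np.array([[1.0], [0.0], [0.2]])

pts1 = cv2.projectPoints(X, np.zeros(3), np.zeros(3), K, None)[0].reshape(-1, 2)
pts2 = cv2.projectPoints(X, cv2.Rodrigues(R_gt)[0], t_gt, K, None)[0].reshape(-1, 2)

# essential matrix + cheirality check; the recovered translation has unit norm
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
print(np.linalg.norm(t))  # ~1.0: the relative pose is known only up to scale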

Concatenate relative poses:

In this step we concatenate/chain all the relative poses to obtain an approximation of the actual camera poses. The poses are defined w.r.t. the first camera. At every concatenation we rescale the current relative pose to match the scale of the previous ones, so that each camera ends up at roughly the same scale. The file relative_poses.json is the output of the previous step.

python concatenate_relative_poses.py -s setup.json -r relative_poses.json --dump_images 
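
Conceptually, the concatenation is just a composition of rigid transforms along the minimal tree. A small sketch of the idea, assuming the convention x_cam = R @ x_ref + t (the actual convention is defined by the scripts):

import numpy as np

def compose(R_01, t_01, R_12, t_12):
    # pose of cam1 w.r.t. cam0 composed with pose of cam2 w.r.t. cam1
    # gives the pose of cam2 w.r.t. cam0
    R_02 = R_12 @ R_01
    t_02 = R_12 @ t_01 + t_12
    return R_02, t_02

# walking cam0 -> cam1 -> cam2 with made-up relative poses;
# the second translation has already been rescaled to match the first
R_01, t_01 = np.eye(3), np.array([1.0, 0.0, 0.0])
R_12, t_12 = np.eye(3), 2.3 * np.array([0.0, 1.0, 0.0])
R_02, t_02 = compose(R_01, t_01, R_12, t_12)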

Bundle adjustment:

Nonlinear least-squares refinement of intrinsic and extrinsic parameters and 3D points. The camera poses output by this step are still up to scale. The file poses.json is the output of the previous step (Concatenate relative poses).

python bundle_adjustment.py -s setup.json -i intrinsics.json -e poses.json -l landmarks.json -f filenames.json --dump_images -c ba_config.json 
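
Under the hood, the refinement minimizes the reprojection error with a nonlinear least-squares solver. A stripped-down sketch of that idea with scipy (the parameterization and names are illustrative, not the script's actual code):

import numpy as np
import cv2
from scipy.optimize import least_squares

def reprojection_residuals(params, n_cams, n_pts, K_list, dist_list, cam_idx, pt_idx, obs_2d):
    # params = [rvec|tvec per camera (6 * n_cams values), then 3D points (3 * n_pts values)]
    cam_params = params[:6 * n_cams].reshape(n_cams, 6)
    points_3d = params[6 * n_cams:].reshape(n_pts, 3)
    res = []
    for c, p, uv in zip(cam_idx, pt_idx, obs_2d):
        rvec, tvec = cam_params[c, :3], cam_params[c, 3:]
        proj, _ = cv2.projectPoints(points_3d[p:p + 1], rvec, tvec, K_list[c], dist_list[c])
        res.append(proj.ravel() - uv)
    return np.concatenate(res)

# least_squares(reprojection_residuals, x0, args=(...), loss="linear", ftol=1e-8, xtol=1e-8)
# refines poses and 3D points jointly; ba_config.json exposes the corresponding solver options.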

Transformation to the global reference system:

The poses and 3D points computed using the bundle adjustment are all w.r.t. the first camera and up to scale. In order to have the poses in the global/world reference system we have to estimate the rigid transformation between the two reference systems. To do so we perform a rigid alignment of the 3D points computed by the bundle adjustment with their corresponding points in global/world coordinates (at least 4 non-symmetric points). These must be defined in the file landmarks_global.json and have the same IDs as the points defined in landmarks.json. Note that there is no need to specify the global coordinates for all landmarks defined in landmarks.json; a subset is enough. Given these correspondences, the following command finds the best rigid transform in the least-squares sense between the two point sets and then updates the poses computed by the bundle adjustment. The output is the updated poses saved in global_poses.json. NOTE: make sure the points used here are not symmetric, nor close to symmetric, as this implies multiple solutions, which is not handled!

python global_registration.py -s setup.json -ps ba_poses.json -po ba_points.json -l landmarks.json -lg landmarks_global.json -f filenames.json --dump_images  
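
The alignment itself is a standard least-squares similarity (Procrustes) fit. A self-contained sketch of the idea (not the tool's implementation):

import numpy as np

def similarity_alignment(src, dst):
    # find scale s, rotation R and translation t minimizing ||s * R @ x + t - y||^2
    # over corresponding rows x of src and y of dst (Nx3, N >= 4, non-symmetric points)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(dst_c.T @ src_c)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # guard against reflections
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / np.sum(src_c ** 2)
    t = mu_d - s * R @ mu_s
    return s, R, t

# made-up check: recover a known similarity transform
src = np.random.rand(10, 3)
dst = 2.0 * src + np.array([1.0, 2.0, 3.0])
s, R, t = similarity_alignment(src, dst)  # s ~ 2.0, R ~ identity, t ~ [1, 2, 3]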

If the global landmarks are a different set of points than the ones used during the optimization, you can use the following command to compute ba_points.json.

python triangulate_image_points.py -p ba_poses.json -l landmarks.json --dump_images
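
For reference, triangulating an image point seen in two calibrated views can also be done directly with OpenCV. A minimal two-view sketch with made-up poses and already undistorted image points (the script itself handles the general multi-view case):

import cv2
import numpy as np

# projection matrices P = K [R | t] of two calibrated views (made-up poses)
K = np.array([[1800.0, 0, 960], [0, 1800.0, 540], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# matching image points, one per column (shape 2xN)
pts1 = np.array([[960.0], [540.0]])
pts2 = np.array([[60.0], [540.0]])

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # 4xN homogeneous coordinates
X = (X_h[:3] / X_h[3]).T                         # Nx3 Euclidean points, here ~(0, 0, 2)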

Input files

The file setup.json contains the names of the views and a minimal set of view pairs that connects all the cameras together. minimal_tree is a tree with a single connected component: it cannot contain loops and all views must be connected.

{
 "views": [ "cam0", "cam1", "cam2", "cam3"], 
 "minimal_tree": [["cam0","cam1"], ["cam1","cam2"], ["cam3","cam0"]]
}
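
Since all poses are chained from the first view, it can be useful to sanity-check setup.json before running the pipeline. A small sketch that verifies the stated constraints (n_views - 1 pairs, everything connected); this is not code from the tool:

import json

with open("setup.json") as f:
    setup = json.load(f)
views, pairs = setup["views"], setup["minimal_tree"]

# a tree on n views has exactly n - 1 edges and is fully connected (hence no loops)
assert len(pairs) == len(views) - 1, "minimal_tree must contain n_views - 1 pairs"

adjacency = {v: set() for v in views}
for a, b in pairs:
    adjacency[a].add(b)
    adjacency[b].add(a)

# traverse the graph starting from the first view
reached, queue = {views[0]}, [views[0]]
while queue:
    for nxt in adjacency[queue.pop()]:
        if nxt not in reached:
            reached.add(nxt)
            queue.append(nxt)
assert reached == set(views), "some views are not connected to the first camera"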

The file landmarks.json contains the image points used to compute the poses. An image point is the projection of a landmark that exists in physical space. A unique ID is associated with each landmark; if the same landmark is visible in other views, the same ID should be used. If the landmark is a moving object, make sure your cameras are synchronized and that you assign a different ID from frame to frame. Have a look at the examples if this is not clear enough.

{
 "cam0":{"landmarks": [[530.1256, 877.56], [2145.5564, 987.4574], ..., [1023, 126]],
         "ids": [0, 1, ..., 3040]},
 ...
 "cam3":{"landmarks": [[430.1256, 377.56], [2245.5564, 387.4574], ..., [2223, 1726]], 
         "ids": [1, 2, ..., 3040]}         
}
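
If the landmarks come from synchronized detections of a moving object, the file can be generated automatically. A sketch assuming a hypothetical per-view detector detect_marker(); replace it with your own detection code:

import json

def detect_marker(view, frame_idx):
    # placeholder: return the (x, y) pixel position of the marker in this view
    # at this frame, or None if it is not visible
    return None

views = ["cam0", "cam1", "cam2", "cam3"]
n_frames = 3000

landmarks = {v: {"landmarks": [], "ids": []} for v in views}
for frame_idx in range(n_frames):
    for v in views:
        pt = detect_marker(v, frame_idx)
        if pt is not None:
            # the frame index serves as the landmark ID: a different ID at every
            # frame, but the same ID across views for the same time instant
            landmarks[v]["landmarks"].append([float(pt[0]), float(pt[1])])
            landmarks[v]["ids"].append(frame_idx)

with open("landmarks.json", "w") as f:
    json.dump(landmarks, f)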

The file landmarks_global.json contains 3D points defined in the "global" reference system. These define the global location of all, or a subset, of the landmarks in the file landmarks.json. The IDs in this file must therefore align with the IDs in landmarks.json, but it is not required that all landmarks have a global coordinate. The global points can be GPS coordinates in UTM+Altitude format or simply positions w.r.t. any other reference system you want. The global points can be noisy.

{
 "landmarks_global": [[414278.16, 5316285.59, 5], [414278.16, 5316285.59, 5.5], ..., [414278.16, 5316285.59, 5.2]],
 "ids": [0, 1, ..., 3040]
}

The file intrinsics.json contains the intrinsic parameters in the following format:

{
 "cam0": { "K": [[1798.760123221333, 0.0, 1947.1889719803005], 
                  [0.0, 1790.0624403935456, 1091.2910152343356],
                  [ 0.0, 0.0, 1.0]],
            "dist": [-0.22790810,0.0574260,0.00032600,-0.00047905,-0.0068488]},
 ...           
 "cam3": { "K": [[1778.560123221333, 0.0, 1887.1889719803005], 
                  [0.0, 1780.0624403935456, 1081.2910152343356],
                  [ 0.0, 0.0, 1.0]],
            "dist": [-0.2390810,0.0554260,0.00031600,-0.00041905,-0.0062488]}
}

The file filenames.json contains one filename for each view. It is used for visualisation purposes only:

{
 "cam0": "somewhere/filename_cam0.jpg",
 ...           
 "cam3": "somewhere/filename_cam3.jpg",
}

The file ba_config.json contains the configuration for the bundle adjustment. A typical configuration is the following:

{
  "each_training": 1
  "each_visualisation": 1,
  "th_outliers_early": 1000.0,
  "th_outliers": 50,
  "optimize_points": true,
  "optimize_camera_params": true,
  "bounds": true,  
  "bounds_cp": [ 
    0.3, 0.3, 0.3,
    2, 2, 2,
    10, 10, 10, 10,
    0.01, 0.01, 0, 0, 0
  ],
  "bounds_pt": [
    1000,
    1000,
    1000
  ],
  "max_nfev": 200,
  "max_nfev2": 200,
  "ftol": 1e-08,
  "xtol": 1e-08,  
  "loss": "linear",
  "f_scale": 1,
  "output_path": "output/bundle_adjustment/",
}

License

multiview_calib's People

Contributors

chenxinfeng4


multiview_calib's Issues

feature: create a visualization for cameras

I've created a better visualization for camera setups using plotly.

Would you like to merge this feature into your code? Or should I keep it on my own?
[screenshot: plotly 3D visualization of the camera setup]

The code is something like this, and will be polished.

import numpy as np
import pickle
import plotly
from multiview_calib import CalibPredict #simple class for camera model (ba_poses)
import plotly.graph_objects as go
pklfile = '/mnt/liying.cibr.ac.cn_Data_Temp/xxx/2023-06-30_15-40-12carlball.matcalibpkl'
pkldata = pickle.load(open(pklfile, 'rb'))
calibobj = CalibPredict(pkldata) 
camp3d = calibobj.get_cam_pos_p3d()

plotly.offline.init_notebook_mode()



def get_cam_pose_vert(center, rotate):
    rotation_matrix = rotate.T
    scale = 20

    # vertices of a simple camera-frustum wireframe: a small back face (z=0),
    # a middle face (z=3) and a larger front face (z=5), scaled below
    vertices = [
        [-1, -1, 0],   # back face
        [1, -1, 0],
        [1, 1, 0],
        [-1, 1, 0],
        [-1, -1, 3],   # middle face
        [1, -1, 3],
        [1, 1, 3],
        [-1, 1, 3],
        [-5, -3, 5],   # front face
        [5, -3, 5],
        [5, 3, 5],
        [-5, 3, 5],
    ]
    vertices = np.array(vertices) * scale
    edges = [
        (0, 1), (1, 2), (2, 3), (0, 3),                      # back face
        (0+4, 1+4), (1+4, 2+4), (2+4, 3+4), (0+4, 3+4),      # middle face
        (0, 4), (1, 1+4), (2, 2+4), (3, 3+4),                # back to middle
        (0+8, 1+8), (1+8, 2+8), (2+8, 3+8), (0+8, 3+8),      # front face
        (4, 8), (5, 1+8), (6, 2+8), (7, 3+8),                # middle to front
    ]
    x, y, z = zip(*vertices)
    x, y, z = rotation_matrix @ np.array([x, y, z]) + center[:,None]
    # collect the edge coordinates for plotting
    edge_x = []
    edge_y = []
    edge_z = []
    for s, e in edges:
        edge_x += [x[s], x[e], None]  # None breaks the line so each edge is drawn as a separate segment
        edge_y += [y[s], y[e], None]
        edge_z += [z[s], z[e], None]
    
    return edge_x, edge_y, edge_z

nview_direct = calibobj.get_cam_direct_p3d() #X,Y,Z direction for camera skeleton.
nview_direct /= np.linalg.norm(nview_direct, axis=-1, keepdims=True)
cam_pos = calibobj.get_cam_pos_p3d()
edge_x, edge_y, edge_z = [], [], []
for i in range(9):
    edge_x_, edge_y_, edge_z_ = get_cam_pose_vert(camp3d[i], nview_direct[i])
    edge_x.extend(edge_x_)
    edge_y.extend(edge_y_)
    edge_z.extend(edge_z_)
    

trace2 = go.Scatter3d(
    x=edge_x,
    y=edge_y,
    z=edge_z,
    mode='lines',
    line=dict(color='blue', width=4)  # line style
)

edge_x, edge_y, edge_z = [], [], []
for i in range(9):
    x,y,z=camp3d[i]
    edge_x.extend([x, x, None])
    edge_y.extend([y, y, None])
    edge_z.extend([z, 0, None])
    
trace3 = go.Scatter3d(
    x=edge_x,
    y=edge_y,
    z=edge_z,
    mode='lines',
    line=dict(color='black', width=2)  # line style
)
    

#3. ball trace
ball_3d = np.squeeze(pkldata['keypoints_xyz_ba'])[32*30:41*30:3]
trace4 = go.Scatter3d(
    x=ball_3d[:,0],
    y=ball_3d[:,1],
    z=ball_3d[:,2],
    mode='markers',  # scatter mode
    marker=dict(
        size=4,  # marker size
        color='#e87518',  # orange
        opacity=1  # fully opaque
    )
)

# Create the figure and add both the trace and the grid to it
fig = go.Figure(data=[trace2,  trace3, trace4])

# Set the layout
fig.update_layout(
    width=1000,
    height=800,
    scene=dict(
        xaxis_title='',
        yaxis_title='',
        zaxis_title='',
        xaxis = dict(
            tickvals=list(range(-600, 600, 200))+[600],
            showticklabels=False   # Empty list to remove xtick labels
        ),
        yaxis = dict(
            tickvals=list(range(-600, 600, 200))+[600],
            showticklabels=False   # Empty list to remove ytick labels
        ),
        zaxis = dict(
            tickvals=list(range(0, 600, 200))+[600],
            showticklabels=False  # Empty list to remove ztick labels
        )
))
# Render the plot
plotly.offline.iplot(fig)

Question about the fixed_scale value

Hi, thanks for your contributions.
I am trying to use your calibration tool, but I found a problem in the final global registration step. There is an error in lines 110-114 of multiview_calib/multiview_calib/point_set_registration.py, i.e.,

def point_set_registration(src, dst, fixed_scale=None, verbose=True):
    ...
    if fixed_scale is not None:
        _, R, t, _ = procrustes_registration(_src*fixed_scale, _dst)
        scale = fixed_scale
    else:
        scale, R, t, _ = procrustes_registration(_src*fixed_scale, _dst)

Error messages:

File "/Users/wuchengpeng/HandGesture/Code/multiview_calib/multiview_calib/point_set_registration.py", line 114, in point_set_registration
scale, R, t, _ = procrustes_registration(_src*fixed_scale, _dst)
TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

I think the if-else statements may have a problem. Can you give some suggestions?
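
For what it's worth, a likely fix (my reading of the code, not confirmed by the maintainer) is to drop the multiplication by fixed_scale in the else branch:

if fixed_scale is not None:
    # keep the user-provided scale fixed and only estimate R and t
    _, R, t, _ = procrustes_registration(_src*fixed_scale, _dst)
    scale = fixed_scale
else:
    # no fixed scale given: estimate scale, R and t together
    scale, R, t, _ = procrustes_registration(_src, _dst)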

Citing your work in a paper

The multiview calibration code really works for me. I've customized some functions on my own to support further study.

So, thank you for your great work; I would like to cite it. Do you have any paper related to this work?

build_input is too slow

I have 6 cameras x 200k frames x 10 points for which to compute 3D points. triangulate_image_points.py may take 10 hours to finish. Can you speed up build_input with numpy functions?

space coordinates.

Thank you very much for your work.
I've tested the code on the "simple box" part. My objective is to obtain the max and min depth of the real scene after calibration, in order to generate a file that gathers this shape, so I'm very interested in the last part of the JSON, the Space field.

{
"Cameras": [
{
"CameraId": 4,
"ExtrinsicParameters": {
"Rotation": [
-0.9194681518368479,
-0.21118621900988643,
0.33163036449945144,
-0.3897800999539663,
0.6000722379215305,
-0.698559076210253,
-0.051476124928036215,
-0.771565739374264,
-0.6340634971248846
],
"Translation": [
-2195.9698549389072,
2582.5504911658027,
2490.1038987563416
]
},
"IntrinsicParameters": {
"Fx": 307.77099609375,
"Fy": 307.788330078125,
"Cx": 319.5019938151042,
"Cy": 184.57267252604166
}
},
{
"CameraId": 1,
"ExtrinsicParameters": {
"Rotation": [
-0.4439462674972459,
0.460895247104731,
-0.7684317033878365,
0.8959696690776023,
0.24005144953184204,
-0.3736491050044587,
0.012250047682980547,
-0.8543716245135546,
-0.5195181070548308
],
"Translation": [
2161.74275709583,
1844.4107072755498,
2436.118150240347
]
},
"IntrinsicParameters": {
"Fx": 307.72467041015625,
"Fy": 307.7620035807292,
"Cx": 320.15032958984375,
"Cy": 182.17032877604166
}
},
{
"CameraId": 2,
"ExtrinsicParameters": {
"Rotation": [
0.36895802400698796,
0.5464693605319308,
-0.7518252553091587,
0.9273944609865834,
-0.16273486022980177,
0.3368336072861343,
0.06172106813934608,
-0.821516039566976,
-0.5668349905236343
],
"Translation": [
2510.495174013393,
-1799.946577189888,
2503.553348475469
]
},
"IntrinsicParameters": {
"Fx": 304.98834228515625,
"Fy": 304.95933024088544,
"Cx": 318.70200602213544,
"Cy": 182.6723429361979
}
},
{
"CameraId": 3,
"ExtrinsicParameters": {
"Rotation": [
0.6863996729195561,
-0.5275711519231292,
0.5005238942092867,
-0.7271213831076989,
-0.48629312754448173,
0.4845755754581046,
-0.012246764612468042,
-0.6965541427351567,
-0.7173997093636277
],
"Translation": [
-2700.9686290823065,
-2593.2363175675873,
2573.2628666117216
]
},
"IntrinsicParameters": {
"Fx": 305.26633707682294,
"Fy": 305.1856689453125,
"Cx": 317.8646647135417,
"Cy": 183.9866739908854
}
}
],
"Space": {
"MaxU": 2860,
"MinU": -4200,
"MaxV": 3180,
"MinV": -2960,
"MaxW": 2100,
"MinW": 100,
"VoxelSizeInMM": 20
}

}

Question about the landmarks.json

Hi, Leonardo. Thanks for your explanations last time.
I am trying to use your calibration tool now. However, I am a little confused about the generation of "landmarks.json". From the tutorial, it seems I need to annotate thousands of points manually and align these points across multiple views, which keeps me from getting started. Can you give me some suggestions on how to obtain the "landmarks.json" file conveniently, or point me to some auxiliary tools? I would appreciate it very much.

Fix a bug in estimate_scale_point_sets()

Fix a bug in multiview_calib/point_set_registration.py estimate_scale_point_sets().

def estimate_scale_point_sets(src, dst, max_est=50000):
    
    idxs = np.arange(len(src))
    np.random.shuffle(idxs)
    
    # computes cross ratios between all pairs of points
    scales = []
    for i, (j,k) in enumerate(itertools.combinations(idxs, 2)):
        d1 = np.linalg.norm(src[j]-src[k])
        d2 = np.linalg.norm(dst[j]-dst[k])
        scales.append(d2/d1)
        
        if i>max_est:
            break
        
    return np.nanmedian(scales), np.nanstd(scales)

Sometimes x/0.0 produces inf or nan, and that makes np.nanstd return nan. Filtering the scales with numpy fixes that.

2023-05-30 18:53:54,160 [root] Concatenating relative poses for pair: (1, 7)
2023-05-30 18:53:54,160 [root] Relative scale to (1, 6) n_points=342: 1.129+-nan

def estimate_scale_point_sets(src, dst, max_est=50000):
    
    idxs = np.arange(len(src))
    np.random.shuffle(idxs)
    
    # computes cross ratios between all pairs of points
    idx_pairs = np.array(list(itertools.combinations(idxs, 2)))
    d1 = np.linalg.norm(src[idx_pairs[:,0]]-src[idx_pairs[:,1]], axis=1)
    d2 = np.linalg.norm(dst[idx_pairs[:,0]]-dst[idx_pairs[:,1]], axis=1)
    scales = d2/d1
    scales_clean = scales[(~np.isnan(scales)) & (~np.isinf(scales)) & (scales<max_est)]
    return np.median(scales_clean), np.std(scales_clean)

I should create a pull request, but I'm not good at git.

Question about the descriptions in docs/main.pdf

[screenshot: figure 4 from docs/main.pdf showing the example checkerboard]

First, in the last paragraph of page 3, you say the example checkerboard has 6 x 8 inner corners (also in the intrinsics estimation parameters "-ich 6 -ics 8"), but the caption of figure 4 states that the number of inner corners is 6 x 9, as shown in the image above. Can you give a clear answer as to which one is right?

Second, in equation (6) on page 5, the projection-function term in the objective function of the bundle adjustment is "Q(C_i, X_j)-x_j", which I think may be "Q(C_i, X_i)-X_j". I am not familiar with the BA method, so maybe you can re-check the equation.

Also, thanks for your immediate response about the issue yesterday.
