
pal_pepper

This is the code for the paper

Perception-Action-Learning System for Mobile Social-Service Robots using Deep Learning

This code was used at RoboCup@Home 2017, where team AUPAIR won 1st place in the Social Standard Platform League (SSPL) - https://www.robocup2017.org/file/awards/Awards_RoboCup_athome.pdf

Youtube link: https://goo.gl/Pxnf1n

If you find this code useful in your research, please cite (available after Feb 06, 2018):

@inproceedings{PALs2018,
  title={Perception-Action-Learning System for Mobile Social-Service Robots using Deep Learning},
  author={Lee, Beom-Jin and Choi, Jinyoung and Lee, Chung-Yeon and Park, Kyung-Wha and Choi, Sungjun and Han, Cheolho and Han, Dong-Sig and Baek, Christina and Emaase, Patrick and Zhang, Byoung-Tak},
  booktitle={AAAI},
  year={2018}
}

Upgraded version of perception module is on: https://github.com/gliese581gg/IPSRO

PAL System Ver 1.0 (2017.11.18) by Beom-Jin Lee

Use tmux!

Install tmux:

sudo apt-get update
sudo apt-get install -y python-software-properties software-properties-common
sudo add-apt-repository -y ppa:pi-rho/dev
sudo apt-get update
sudo apt-get install -y tmux=2.0-1~ppa1~t

Requirements

  • Ubuntu 14.04 (for ROS Indigo)
  • After cloning this repository, download the pretrained models:
-- download https://pjreddie.com/media/files/yolo.weights
-- put it in python_script/bin
-- download http://posefs1.perception.cs.cmu.edu/Users/ZheCao/pose_iter_440000.caffemodel
-- put it in python_script/pose_model/model/_trained_COCO
-- download http://posefs1.perception.cs.cmu.edu/Users/ZheCao/pose_iter_146000.caffemodel
-- put it in src/pose_model/model/_trained_MPI/
-- download https://www.dropbox.com/s/ae071mfm2qoyc8v/pose_model.pth?dl=0
-- put it in src/pose_model/
-- download http://cs.stanford.edu/people/jcjohns/densecap/densecap-pretrained-vgg16.t7.zip
-- unzip it and put the unzipped file in src/captioning_model/data/models/densecap/

Install the following from their websites.

  • NVIDIA driver 375.20
  • CUDA 8.0
  • CUDNN 5.1
  • Tensorflow r1.1

Install the following.

  • Torch from http://torch.ch/docs/getting-started.html (choose 'yes' when the installer asks about updating your PATH)

  • SpeechRecognition

    pip install SpeechRecognition
    
  • ROS indigo desktop full from http://blog.naver.com/gliese581gg/220645607537 or http://wiki.ros.org/indigo/Installation/Ubuntu

  • naoqi (add the line below to ~/.bashrc)

    export PYTHONPATH=${PYTHONPATH}:~/'YOUR_PATH'/naoqi/lib/python2.7/site-packages
    
  • sshpass

    apt-get install sshpass
    
  • others

    sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose
    
    pip install Cython
    
    sudo apt-get install libhdf5-dev libblas-dev liblapack-dev gfortran
    
    pip install h5py
    
    pip install keras
    
    darkflow (https://github.com/thtrieu/darkflow) (install option 3)
    
    apt-get install ros-indigo-navigation
    
    apt-get install ros-indigo-gmapping
    
    apt-get install ros-indigo-pepper-*
    
    luarocks install nn
    
    luarocks install image
    
    luarocks install lua-cjson
    
    luarocks install https://raw.githubusercontent.com/qassemoquab/stnbhwd/master/stnbhwd-scm-1.rockspec
    
    luarocks install https://raw.githubusercontent.com/jcjohnson/torch-rnn/master/torch-rnn-scm-1.rockspec
    
    luarocks install cutorch
    
    luarocks install cunn
    
    luarocks install cudnn
    
    luarocks install md5
    
    luarocks install --server=http://luarocks.org/dev torch-ros
    
    pip install http://download.pytorch.org/whl/cu80/torch-0.1.12.post2-cp27-none-linux_x86_64.whl 
    
    pip install torchvision
    
    pip install nltk
    
    (open python and) import nltk ; nltk.download('punkt')
    
    pip install pattern
    
    pip install google-cloud
    
    
  • tmux (already installed above; see the commands under "Use tmux!" at the top of this document)
    

Catkin Compile

Merge the included catkin_ws src into your own catkin workspace, then run catkin_make:

merge catkin_ws src
catkin_make

How to use PIO

Import PIO modules

import pepper_io

pio = pepper_io.pepper_io()
  • Note that only one pio instance should be created. Do not make multiple instances.
  • You must put your script in the included 'python_script' folder.

Before you run it

  • Run every command with sudo, or become root.
sudo -s
  • Launch modules using Tmux

 # 1. launch modules
cd {YOUR_PATH}/python_scripts
./run_pepper.sh {pepper's ip}

 #(example : ./run_pepper.sh 192.168.1.176)

 # 2. run your script in the rightmost tmux pane
python ~~.py --ip 192.168.1.176

  • Manually run each module
 #1. Start Driver
roslaunch pal_pepper pepper_start_jy.launch nao_ip:=192.168.1.~


 #2. Start navigation module
roslaunch pal_pepper pepper_navigation.launch map_file:=full/path/to/your_yaml_file
 # or edit the default map file in pepper_navigation.launch, then simply run:
roslaunch pal_pepper pepper_navigation.launch

 #In RViz, set the robot's initial location with 2D Pose Estimate, then send it somewhere with 2D Nav Goal.
 #Pepper will localize itself.

 #3. run deep learning modules
cd {YOUR_PATH}/python_scripts
python obj_detector.py --ip 192.168.1.~~
python reid_module.py
python pose_detector.py
cd captioning
th run_ros2.lua

Object class

  • See objs.msg for the supported fields. For example, int32 x is the x coordinate of the bounding box center.

    string class_string : object class
    int32 x : center x of bounding box
    int32 y : center y of bounding box
    int32 h : height of bounding box
    int32 w : width of bounding box
    float64 confidence : class score
    sensor_msgs/Image cropped : cropped image
    sensor_msgs/Image cropped_cloud : cropped point cloud
    geometry_msgs/Pose pose_wrt_robot : position of the object wrt the robot
    geometry_msgs/Pose pose_wrt_map : position of the object wrt the map
    geometry_msgs/Pose pose_wrt_odom : position of the object wrt the odometry
    int32 person_id : id of a person after person identification
    string person_name : (deprecated) name of a person
    float64 reid_score : confidence of re-identification of a person
    int32 isWaving : (deprecated) 1 if a person is waving his/her hands, 0 otherwise
    int32 isSitting : (deprecated) 1 if the person is sitting
    int32 isRwaving : (deprecated) 1 if the person is waving the right hand
    int32 isLwaving : (deprecated) 1 if the person is waving the left hand
    int32 isLying : (deprecated) 1 if the person is lying down
    int32 isLpointing : (deprecated) 1 if the person is pointing with the left hand
    int32 isRpointing : (deprecated) 1 if the person is pointing with the right hand
    string ucolor : (deprecated) color of the person's upper clothing
    string lcolor : (deprecated) color of the person's lower clothing
    float64[] joints : list of floats representing joint positions; index pairs map to joints as follows:
    
    0-1: nose, 2-3: neck, 4-5: r_shoulder, 6-7: r_elbow, 8-9: r_wrist, 10-11: l_shoulder, 12-13: l_elbow, 14-15: l_wrist, 16-17: r_pelvis, 18-19: r_knee, 20-21: r_ankle, 22-23: l_pelvis, 24-25: l_knee, 26-27: l_ankle, 28-29: r_eye, 30-31: l_eye, 32-33: r_ear, 34-35: l_ear
    
    string[] captions : captions generated by the captioning module
    string[] tags : useful tags for the object
    
    In the current version, tags contain the class of the object, 'waving', 'rwaving', 'lwaving', 'man', 'woman', 'sitting', 'lying', 'lpointing', 'rpointing', 'blue', 'green', 'red', 'white', 'black', and the person's name.
    

    Deprecated variables still work, but I highly recommend using 'tags' instead.

  • See obj_array.msg.

    Header header : contains the timestamp etc.; no need to modify
    objs[] objects : List of *objs.msg*
    int32 msg_idx : index of message (used for perception integration)
    sensor_msgs/Image scene_rgb : rgb image of whole scene
    sensor_msgs/Image scene_cloud : point cloud of whole scene
    
  • And the pose class, which is returned by pose_wrt_map, pose_wrt_robot and pose_wrt_odom.

    # geometry_msgs/Pose
    Point position
    Quaternion orientation
    

    position consists of

    float64 x
    float64 y
    float64 z
    

    orientation consists of

    float64 x
    float64 y
    float64 z
    float64 w
    

Example python script

def main():
    pio = pepper_io(ip='192.168.1.176')
    test_objects = pio.get_perception()
    header = test_objects.header
    objects = test_objects.objects
    for obj in objects:
        print obj.class_string # print the class of the object
        pose = obj.pose_wrt_map # pose wrt the map
        position_x = pose.position.x
        orientation_x = pose.orientation.x
        # obj.cropped is a ros image; convert it to a numpy image first
        cropped_image = pio.rosimg_to_numpyimg(obj.cropped) # numpy image
        tags = obj.tags

    return None
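
Since the tags field aggregates most of the deprecated per-field flags, filtering by tag is usually the simplest approach. A minimal sketch along the same lines as the script above (field access as documented in objs.msg):

# find waving people using the 'tags' field
objects = pio.get_perception(fil=['person']).objects
waving = [o for o in objects if 'waving' in o.tags]
for person in waving:
    print person.person_id, person.pose_wrt_map.position.x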

Functions

get perception

pio.get_perception(fil = None,reid=True,pose=True,captioning=True,time_limit = 3.0)
  • fil : list of classes to find; finds all classes if None (example : fil=['person','chair'])
  • reid, pose, captioning : if True, the output contains that information
  • time_limit : returns an empty objs_array if the latest information is older than this many seconds

Get integrated perception (objects, people's names, poses, captions).

Returns an objs_array instance.
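
A minimal usage sketch (assuming a pio instance created as in the example script above):

# look for people and chairs; skip captioning to save time
result = pio.get_perception(fil=['person', 'chair'], captioning=False)
for obj in result.objects:
    print obj.class_string, obj.confidence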

get object information from individual modules (deprecated)

pio.get_objects(fil=None, time_limit=3.0)
pio.get_people_identified(waving_only=False, time_limit=3.0)
pio.get_people_wavings(name=None, time_limit=3.0)
  • fil : list of classes to find; finds all classes if None (example : fil=['person','chair'])
  • waving_only : if True, only waving people are returned
  • name : list of strings; if not None, only the specified people are found
  • time_limit : returns an empty objs_array if the latest information is older than this many seconds

Get object information from the individual modules (object detection, identification, pose detection).

get captions of whole scene

pio.get_captions()

Get captions of the whole scene. Captions for people are automatically stored in obj.captions, so don't use this to extract people's captions.

save waypoints

pio.save_waypoint(filename)
  • filename: a file name without path. E.g.: tour_guid.txt

Save all waypoints to a file.

load waypoints

pio.load_waypoints(filename)
  • filename: a file name without path. E.g.: tour_guid.txt

Reset current waypoints and load waypoints from a file.

add a waypoint

pio.add_waypoint(name, location=None)
  • name: string
  • location: a tuple, list, or numpy array consisting of three floats for x, y, direction on the map. Defaults to the robot's current location.

add a waypoint.

go to a waypoint

pio.go_to_waypoint(name, wait=True, clear_costmap=False)
  • name: string
  • wait: if False, the function returns immediately; otherwise it blocks until the robot reaches the goal.
  • clear_costmap: if True, reset temporarily detected obstacles; otherwise use the previous costmap.

go to a waypoint.
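
A minimal end-to-end sketch combining the waypoint calls above (the waypoint name and filename are illustrative):

pio.add_waypoint('kitchen')             # remember the robot's current location
pio.save_waypoint('my_waypoints.txt')   # persist all waypoints to a file
pio.load_waypoints('my_waypoints.txt')  # reset and reload them later
pio.go_to_waypoint('kitchen', wait=True, clear_costmap=True)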

get location

pio.get_loc(p=np.array([0,0,0]), o=np.array([0,0,0,0]), source='CameraTopFrame', target='map')
  • p: position tuple, list, or numpy array consisting of three floats for x, y, z wrt the robot. z must always be 0.
  • o: orientation tuple, list, or numpy array consisting of four floats for x, y, z, w wrt the robot. This is a quaternion.
  • source: source frame
  • target: target frame

Convert a location in the source coordinate system to a location in the target coordinate system.

The orientation must be a quaternion (see also pio.yaw_to_quat and pio.quat_to_yaw).
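
For example, a sketch that converts a point one meter in front of the robot into map coordinates (the exact behavior of pio.yaw_to_quat and the return type of get_loc are my assumptions; they are only mentioned by name above):

import numpy as np

p = np.array([1.0, 0.0, 0.0])   # 1 m ahead of the robot; z must be 0
o = pio.yaw_to_quat(0.0)        # facing straight ahead, as a quaternion (assumed signature)
pose_in_map = pio.get_loc(p=p, o=o, source='CameraTopFrame', target='map')
print pose_in_map  # assumed to be a geometry_msgs/Pose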

global localization

pio.global_localization()

Stabilize the robot's position estimate by rotating in place several times.

automatic speech recognition

pio.init_speech_recognition(sensitivity=0.3)
  • sensitivity: 0 ~ 1

Initialize speech recognition. After this, speech is detected continuously.

See also

pio.set_sound_sensitivity(sensitivity=0.9)
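
A hedged sketch of the continuous mode, polling pio.speech_memory (described under Attribute below); the polling loop itself is illustrative:

import time

pio.init_speech_recognition(sensitivity=0.3)
while True:
    time.sleep(1.0)
    heard = pio.speech_memory  # most recently recognized speech
    if heard:
        print heard
        break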

manual speech recognition

pio.start_recording(reset=False, base_duration=3.0)
  • reset: start a new recording session if True, or add to an existing one if False
  • base_duration: minimum recording duration. Only used when reset is True.

Manually detect speech starting from now.
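
A minimal sketch of the manual mode, combined with find_ord (documented below); I am assuming that start_recording blocks until the utterance is captured and that find_ord returns a truthy value on a match:

pio.start_recording(reset=True, base_duration=3.0)  # record a fresh utterance
if pio.find_ord('hello'):  # search the latest recording (assumed truthy on match)
    pio.say('Hello to you too!')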

get rgb

pio.get_rgb()

Returns the camera image as a numpy array.

See also pio.get_depth() and pio.get_point_cloud().

The image size is always 320 (width) x 240 (height).
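
A quick sanity check (the 3-channel RGB layout is my assumption):

img = pio.get_rgb()
print img.shape  # expected: (240, 320, 3) -- height x width x channels (channel count assumed)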

get depth

pio.get_depth()

Returns the depth image as a numpy array.

get point cloud

pio.get_point_cloud()

Returns the point cloud as a numpy array.

get position from point cloud

pio.get_pos_wrt_robot(x,y,size=10,scan_len=50,scan='point')
  • x, y : pixel position of interest
  • size : if scan is 'point', the function returns the nearest point within the size-pixel area around (x, y)
  • scan_len : if scan is 'line', the function returns the nearest point on the vertical line from (x, y) to (x+scan_len, y)
  • scan : 'point' or 'line'

Get the position of a specific pixel in meters, scanning either the surrounding area or a vertical line.
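
For example, querying the 3D position under the image center (pixel coordinates refer to the 320x240 camera image above; the return format is my assumption):

# nearest point within a 10-pixel neighborhood of the image center
pos = pio.get_pos_wrt_robot(160, 120, size=10, scan='point')
print pos  # x, y, z in meters wrt the robot (format assumed)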

animate a gesture

pio.do_anim(command)
 #example : pio.do_anim('Gestures/Hey_1')
  • command: the name of the gesture as a string.

Animate a gesture. Available gestures are listed at http://doc.aldebaran.com/2-4/naoqi/motion/alanimationplayer-advanced.html#alanimationplayer-advanced.

change the posture of the robot

pio.do_pose(command)
  • command: the name of the posture as a string, one of Crouch, LyingBack, LyingBelly, Sit, SitRelax, Stand, StandInit, StandZero.

Change the posture of the robot.

move to a goal

pio.go_to_goal(x, y, theta, wait=True, clear_costmap=False)
  • x, y, theta: the goal position wrt the map; theta is in radians.
  • wait: if False, the function returns immediately; otherwise it blocks until the robot reaches the goal.
  • clear_costmap: if True, reset temporarily detected obstacles; otherwise use the previous costmap.

Move the robot to a location.

set velocity

pio.set_velocity(x, y, theta, duration=-1.)
  • x, y: velocities in the forward-backward and left-right directions.
  • theta: the angular velocity of rotation.
  • duration: the duration of the movement. NOTE: if not specified, the robot keeps moving in the given direction indefinitely. Collision risk!

Move the robot in a specified direction at a given velocity for a duration.
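
A minimal sketch; note the explicit duration so the robot stops on its own (the 0.2 value is illustrative, and pio.stop() is documented just below):

# creep forward at 0.2 for 2 seconds, then the robot stops by itself
pio.set_velocity(0.2, 0.0, 0.0, duration=2.0)

# without a duration the robot keeps moving, so stop it explicitly
pio.set_velocity(0.2, 0.0, 0.0)
pio.stop()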

stop

pio.stop()

Stop ongoing movement.

keyboard control

pio.activate_keyboard_contol()

Control the robot using a keyboard.

usage : type one of the following commands and press Enter.
		The robot will maintain its velocity until you give another command.
commands:
	 w : forward
	 s : stop
	 x : backward
	 a : strafe left
	 d : strafe right
	 q : turn left
	 e : turn right
	 say : say next input
	 waypoint : add current location to waypoints
	 save_waypoint : save all waypoints
	 add_waypoint : add current location to waypoints
	 go_waypoint : go to waypoint
	 add_reid_target : add reid target
	 save_reid_targets : save reid target
	 sr : speech recognition
	 c : exit

say something

pio.say(text)
  • text: string to say.

Say a text. Speech recognition is paused while the robot is speaking.

convert image

pio.rosimg_to_numpyimg(img_msg)
  • img_msg: ros image to convert.

Convert ros image to numpy image.

See also

pio.numpyimg_to_rosimg(npimg)

find a word

pio.find_ord(word, source=None)
  • word: the word to find.
  • source: the speech to search in. Uses the latest recording if not specified.

Find a word in recorded speech.

follow a person

pio.follow_person(target, target_dist=1.0, fail_dist_threshold=1.0, dist_strict=False,
                  timeout=60, stop_criterion='dist', use_reid=False, reid_name=None, stop_word='stop',
                  reid_strict=False, reid_add=False, score_threshold=-10, max_fail_count=20, short_mode_thr=1.5)
  • target: the target to follow, as an objs instance.
  • target_dist: following distance in meters.
  • timeout: fails if the stop criterion is not satisfied before the timeout.
  • stop_criterion: 'dist' for distance or 'speech' for a word.
  • use_reid: if False, follow the closest person by location; if True, follow the identified person.
  • reid_name: follow the person whose name is reid_name; otherwise the target person is enrolled under a new name.
  • stop_word: the word that makes the robot stop following when stop_criterion is 'speech'.

Follow a person until the stop criterion is satisfied.
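
A hedged sketch that follows the first detected person until they say 'stop' (assuming the perception call above returns at least one person):

people = pio.get_perception(fil=['person']).objects
if people:
    target = people[0]
    pio.follow_person(target, target_dist=1.0, timeout=60,
                      stop_criterion='speech', stop_word='stop')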

Important

To use Google Speech Recognition, you need a Google account.
Link: https://cloud.google.com/speech/docs/auth
Add the authorized JSON file to the speech_auth folder.
Unindent self.speech_client = speech.Client.from_service_account_json('speech_auth/xxx.json') in pepper_io.py

Attribute

pio.speech_memory

The most recently recognized speech, as a string.

pio.speech_hints

A list of strings used as hints for speech recognition.
