
[WIP] Merge develop/pr2-noetic branch #444

Closed
wants to merge 20 commits into from
Commits
20 commits
6072701
[respeaker_ros] Publish audio_info
iory Jun 11, 2022
3f08b74
[respeaker_ros] Enable speech to text for multi channel.
iory Jun 11, 2022
04f6e1a
[respeaker_ros] Add audio_info arg in sample
iory Jun 11, 2022
5bbed14
[respeaker_ros] Add publish_multichannel option for publishing multi …
iory Jun 11, 2022
3340177
[respeaker_ros] Restore audio input to use argment's audio
iory Jun 12, 2022
d662b2c
[respeaker_ros] Add publish_multichannel option to fix launch file er…
iory Jun 13, 2022
6efa2fe
[respeaker_ros] Add roslaunch_add_file_check to test launch file format
iory Jun 13, 2022
b85bea5
[respeaker_ros] Set audio_info topic name and check length of it.
iory Jun 13, 2022
8bad183
[respeaker_ros] Add comment for why we add publish_multichannel option.
iory Jun 13, 2022
bb8d365
[respeaker_ros] Remove publish_multichannel option and publish raw mu…
iory Jun 13, 2022
6ed3f11
[respeaker_ros] Publish spech_audio_raw
iory Jun 13, 2022
d56eed7
[respeaker_ros] Add comment to know more defails
iory Jun 13, 2022
129a581
[respeaker_ros] Publish speech audio raw
iory Jun 13, 2022
cd490fb
[respeaker_ros] Add parameters for respaker ros
iory Jun 13, 2022
d6b1476
[respeaker_ros] Fixed publishing audio topic's namespace
iory Jun 13, 2022
9bb766f
[respeaker_ros] Add parameters for speech_to_text
iory Jun 13, 2022
5b6a376
[respeaker_ros] Avoid AudioInfo import for backward compatibility
iory Jun 14, 2022
6ac8a40
[respeaker_ros] Fixed bytes calculation 'self.n_channe - 2' -> '(self…
iory Jun 14, 2022
defbba8
use sub and not delete
knorth55 Mar 1, 2023
1e494d8
fix typo in dialogflow_client.py
knorth55 Mar 25, 2023
@@ -271,7 +271,7 @@ def speak_result(self, result):
volume=self.volume)

# for japanese or utf-8 languages
if self.language == 'ja-JP' and sys.version <= 2:
if self.language == 'ja-JP' and sys.version_info.major <= 2:
msg.arg = result.fulfillment_text.encode('utf-8')
else:
msg.arg = result.fulfillment_text
2 changes: 2 additions & 0 deletions respeaker_ros/CMakeLists.txt
@@ -35,7 +35,9 @@ catkin_install_python(PROGRAMS ${PYTHON_SCRIPTS}

if(CATKIN_ENABLE_TESTING)
find_package(rostest REQUIRED)
find_package(roslaunch REQUIRED)
add_rostest(test/sample_respeaker.test
DEPENDENCIES ${PROJECT_NAME}_generate_virtualenv
)
roslaunch_add_file_check(launch/sample_respeaker.launch)
endif()
145 changes: 145 additions & 0 deletions respeaker_ros/README.md
@@ -92,6 +92,151 @@ A ROS Package for Respeaker Mic Array
a: 0.3"
```

## Parameters for respeaker_node.py

- ### Publishing topics

- `audio` (`audio_common_msgs/AudioData`)

Processed audio for ASR. 1 channel.

- `audio_info` (`audio_common_msgs/AudioInfo`)

Audio info with respect to `~audio`.

- `audio_raw` (`audio_common_msgs/AudioData`)

Raw microphone-array audio data with 4 channels. You may need to update the respeaker firmware.

If the firmware does not support it, this topic is not published.

- `audio_info_raw` (`audio_common_msgs/AudioInfo`)

Audio info with respect to `~audio_raw`.

If the firmware does not support it, this topic is not published.

- `speech_audio` (`audio_common_msgs/AudioData`)

Audio captured while a person is speaking, detected by the VAD function.

- `speech_audio_raw` (`audio_common_msgs/AudioData`)

4-channel audio captured while a person is speaking, detected by the VAD function.

If the firmware does not support it, this topic is not published.

- `audio_merged_playback` (`audio_common_msgs/AudioData`)

Audio that merges the microphone input with the speaker playback.

If the firmware does not support it, this topic is not published.

For more detail, please see https://wiki.seeedstudio.com/ReSpeaker_Mic_Array_v2.0/

- `~is_speeching` (`std_msgs/Bool`)

Whether someone is currently speaking, detected by the VAD function.

- `~sound_direction` (`std_msgs/Int32`)

Direction of sound.

- `~sound_localization` (`geometry_msgs/PoseStamped`)

Localized sound direction. The pose's position is placed in the estimated direction at a radius of `~doa_xy_offset`.
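The geometry behind the localization pose can be sketched as follows. This is an illustration only, assuming the node subtracts `~doa_yaw_offset` from the DoA angle and uses `~doa_xy_offset` as the radius; the exact convention in `respeaker_node.py` may differ, and `doa_to_position` is a hypothetical helper:

```python
import math

def doa_to_position(doa_deg, doa_yaw_offset=90.0, doa_xy_offset=0.0):
    """Project a DoA angle [deg] onto the XY plane at a given radius.

    A sketch of how the position of a localization pose could be
    derived: it lies in the estimated direction, doa_xy_offset away
    from the sensor frame.
    """
    yaw = math.radians(doa_deg - doa_yaw_offset)
    return (doa_xy_offset * math.cos(yaw),
            doa_xy_offset * math.sin(yaw),
            0.0)

# With the default doa_xy_offset of 0.0 the position stays at the origin;
# a nonzero radius places it on a circle around the sensor.
x, y, z = doa_to_position(90.0, doa_yaw_offset=90.0, doa_xy_offset=1.0)
```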

- ### Parameters

- `~update_rate` (`Double`, default: `10.0`)

Rate [Hz] at which status topics such as `~is_speeching`, `~sound_direction`, `~sound_localization`, `~speech_audio` and `~speech_audio_raw` are published.

- `~sensor_frame_id` (`String`, default: `respeaker_base`)

Frame ID used in the headers of published messages.

- `~doa_xy_offset` (`Double`, default: `0.0`)

Radius of the estimated sound direction, used to place the `~sound_localization` position.

- `~doa_yaw_offset` (`Double`, default: `90.0`)

Estimated DoA angle offset.

- `~speech_prefetch` (`Double`, default: `0.5`)

Duration [s] of audio kept in the prefetch buffer before speech is detected.

- `~speech_continuation` (`Double`, default: `0.5`)

If the time since speech last stopped is shorter than this value [s], someone is assumed to still be speaking.

- `~speech_max_duration` (`Double`, default: `7.0`)

- `~speech_min_duration` (`Double`, default: `0.1`)

If the detected speech duration lies within these bounds, `~speech_audio` and `~speech_audio_raw` are published.

- `~suppress_pyaudio_error` (`Bool`, default: `True`)

If `True`, suppress error messages from PyAudio.
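As a worked example of how `~speech_prefetch` translates into a buffer size: roughly `speech_prefetch * rate * bitdepth / 8` bytes are kept per channel. The sketch below assumes a 16 kHz, 16-bit stream; in practice the node reads these values from the device:

```python
def prefetch_bytes(speech_prefetch=0.5, rate=16000, bitdepth=16, channels=1):
    """Bytes of audio held in the prefetch buffer (illustrative only)."""
    # seconds * samples/sec * bytes/sample * channels
    return int(channels * speech_prefetch * rate * bitdepth / 8.0)

n = prefetch_bytes()  # 0.5 s of 16 kHz mono 16-bit audio
```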

## Parameters for speech_to_text.py

- ### Publishing topics

- `~speech_to_text` (`speech_recognition_msgs/SpeechRecognitionCandidates`)

Recognized text.

- ### Subscribing topics

- `audio` (`audio_common_msgs/AudioData`)

Input audio.

- ### Parameters

- `~audio_info` (`String`, default: ``)

Name of an `audio_common_msgs/AudioInfo` topic. If set, the `~sample_rate`, `~sample_width` and `~channels` parameters are obtained from that topic.

- `~sample_rate` (`Int`, default: `16000`)

Sampling rate.

- `~sample_width` (`Int`, default: `2`)

Sample width [bytes].

- `~channels` (`Int`, default: `1`)

Number of channels.

- `~target_channel` (`Int`, default: `0`)

Index of the channel used for recognition.

- `~language` (`String`, default: `ja-JP`)

Language for the speech-to-text service. English users can specify `en-US`.

- `~self_cancellation` (`Bool`, default: `True`)

If `True`, ignore voice input while the robot is speaking.

- `~tts_tolerance` (`String`, default: `1.0`)

Time [s] to keep assuming SPEAKING after the TTS service finishes.

- `~tts_action_names` (`List[String]`, default: `['sound_play']`)

If `~self_cancellation` is `True`, this value is used.

While any of these actions are active, the callback subscribed to `audio` does nothing.
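To illustrate how `~channels`, `~target_channel` and `~sample_width` fit together, here is a minimal sketch of de-interleaving one channel from raw PCM bytes. `extract_channel` is a hypothetical helper, not part of `speech_to_text.py`, and only handles 16-bit samples:

```python
import struct

def extract_channel(frame_bytes, channels=4, target_channel=0, sample_width=2):
    """Pick one channel out of interleaved little-endian PCM bytes."""
    assert sample_width == 2, "sketch only handles 16-bit samples"
    n_samples = len(frame_bytes) // sample_width
    samples = struct.unpack('<%dh' % n_samples, frame_bytes)
    # Samples are interleaved: ch0, ch1, ..., chN-1, ch0, ch1, ...
    mono = samples[target_channel::channels]
    return struct.pack('<%dh' % len(mono), *mono)

# Two frames of 4-channel audio; keep channel 1 only.
frame = struct.pack('<8h', 10, 11, 12, 13, 20, 21, 22, 23)
mono = extract_channel(frame, channels=4, target_channel=1)
```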

## Use cases

### Voice Recognition
6 changes: 5 additions & 1 deletion respeaker_ros/launch/sample_respeaker.launch
@@ -13,14 +13,17 @@
<arg name="language" default="en-US"/>
<!-- self cancellation -->
<arg name="self_cancellation" default="true"/>
<!-- audio info topic name -->
<arg name="audio_info" default="audio_info"/>

<node if="$(arg publish_tf)"
name="static_transformer" pkg="tf" type="static_transform_publisher"
args="0 0 0 0 0 0 1 map respeaker_base 100"/>

<node if="$(arg launch_respeaker)"
name="respeaker_node" pkg="respeaker_ros" type="respeaker_node.py"
respawn="true" respawn_delay="10" />
respawn="true" respawn_delay="10" >
</node>

<node if="$(arg launch_soundplay)"
name="sound_play" pkg="sound_play" type="soundplay_node.py"/>
@@ -30,6 +33,7 @@
<remap from="audio" to="$(arg audio)"/>
<remap from="speech_to_text" to="$(arg speech_to_text)"/>
<rosparam subst_value="true">
audio_info: $(arg audio_info)
language: $(arg language)
self_cancellation: $(arg self_cancellation)
tts_tolerance: 0.5
1 change: 1 addition & 0 deletions respeaker_ros/package.xml
@@ -17,6 +17,7 @@
<exec_depend>flac</exec_depend>
<exec_depend>geometry_msgs</exec_depend>
<exec_depend>std_msgs</exec_depend>
<exec_depend>sound_play</exec_depend>
<exec_depend>speech_recognition_msgs</exec_depend>
<exec_depend>tf</exec_depend>
<exec_depend condition="$ROS_PYTHON_VERSION == 2">python-numpy</exec_depend>
103 changes: 94 additions & 9 deletions respeaker_ros/scripts/respeaker_node.py
@@ -16,6 +16,13 @@
import sys
import time
from audio_common_msgs.msg import AudioData
enable_audio_info = True
try:
from audio_common_msgs.msg import AudioInfo
except Exception as e:
rospy.logwarn('audio_common_msgs/AudioInfo message is not exists.'
' AudioInfo message will not be published.')
enable_audio_info = False
from geometry_msgs.msg import PoseStamped
from std_msgs.msg import Bool, Int32, ColorRGBA
from dynamic_reconfigure.server import Server
@@ -265,7 +272,6 @@ def __init__(self, on_audio, channel=0, suppress_error=True):
if self.channels != 6:
rospy.logwarn("%d channel is found for respeaker" % self.channels)
rospy.logwarn("You may have to update firmware.")
self.channel = min(self.channels - 1, max(0, self.channel))

self.stream = self.pyaudio.open(
input=True, start=False,
@@ -295,9 +301,8 @@ def stream_callback(self, in_data, frame_count, time_info, status):
data = np.frombuffer(in_data, dtype=np.int16)
chunk_per_channel = int(len(data) / self.channels)
data = np.reshape(data, (chunk_per_channel, self.channels))
chan_data = data[:, self.channel]
# invoke callback
self.on_audio(chan_data.tobytes())
self.on_audio(data)
return None, pyaudio.paContinue

def start(self):
@@ -333,21 +338,83 @@ def __init__(self):
self.pub_doa_raw = rospy.Publisher("sound_direction", Int32, queue_size=1, latch=True)
self.pub_doa = rospy.Publisher("sound_localization", PoseStamped, queue_size=1, latch=True)
self.pub_audio = rospy.Publisher("audio", AudioData, queue_size=10)
if enable_audio_info is True:
self.pub_audio_info = rospy.Publisher("audio_info", AudioInfo,
queue_size=1, latch=True)
self.pub_audio_raw_info = rospy.Publisher("audio_info_raw", AudioInfo,
queue_size=1, latch=True)
self.pub_speech_audio = rospy.Publisher("speech_audio", AudioData, queue_size=10)
# init config
self.config = None
self.dyn_srv = Server(RespeakerConfig, self.on_config)
# start
self.respeaker_audio = RespeakerAudio(self.on_audio, suppress_error=suppress_pyaudio_error)
self.n_channel = self.respeaker_audio.channels

self.speech_prefetch_bytes = int(
self.speech_prefetch * self.respeaker_audio.rate * self.respeaker_audio.bitdepth / 8.0)
1
* self.speech_prefetch
* self.respeaker_audio.rate
* self.respeaker_audio.bitdepth / 8.0)
self.speech_prefetch_buffer = b""
self.respeaker_audio.start()
self.info_timer = rospy.Timer(rospy.Duration(1.0 / self.update_rate),
self.on_timer)
self.timer_led = None
self.sub_led = rospy.Subscriber("status_led", ColorRGBA, self.on_status_led)

# processed audio for ASR
if enable_audio_info is True:
info_msg = AudioInfo(
channels=1,
sample_rate=self.respeaker_audio.rate,
sample_format='S16LE',
bitrate=self.respeaker_audio.rate * self.respeaker_audio.bitdepth,
coding_format='WAVE')
self.pub_audio_info.publish(info_msg)

if self.n_channel > 1:
# The respeaker has 4 microphones.
# Multiple microphones can be used for
# beam forming (strengthening the sound in a specific direction)
# and sound localization (the respeaker outputs the azimuth
# direction, but the multichannel can estimate
# the elevation direction). etc.

# Channel 0: processed audio for ASR
# Channel 1: mic1 raw data
# Channel 2: mic2 raw data
# Channel 3: mic3 raw data
# Channel 4: mic4 raw data
# Channel 5: merged playback
# For more detail, please see
# https://wiki.seeedstudio.com/ReSpeaker_Mic_Array_v2.0/
# (self.n_channel - 2) = 4 channels are multiple microphones.
self.pub_audio_raw = rospy.Publisher("audio_raw", AudioData,
queue_size=10)
self.pub_audio_merged_playback = rospy.Publisher(
"audio_merged_playback", AudioData,
queue_size=10)
if enable_audio_info is True:
info_raw_msg = AudioInfo(
channels=self.n_channel - 2,
sample_rate=self.respeaker_audio.rate,
sample_format='S16LE',
bitrate=(self.respeaker_audio.rate *
self.respeaker_audio.bitdepth),
coding_format='WAVE')
self.pub_audio_raw_info.publish(info_raw_msg)

self.speech_audio_raw_buffer = b""
self.speech_raw_prefetch_buffer = b""
self.pub_speech_audio_raw = rospy.Publisher(
"speech_audio_raw", AudioData, queue_size=10)
self.speech_raw_prefetch_bytes = int(
(self.n_channel - 2)
* self.speech_prefetch
* self.respeaker_audio.rate
* self.respeaker_audio.bitdepth / 8.0)

def on_shutdown(self):
self.info_timer.shutdown()
try:
@@ -386,14 +453,30 @@ def on_status_led(self, msg):
oneshot=True)

def on_audio(self, data):
self.pub_audio.publish(AudioData(data=data))
# take processed audio for ASR.
processed_data = data[:, 0].tobytes()
self.pub_audio.publish(AudioData(data=processed_data))
if self.n_channel > 1:
raw_audio_data = data[:, 1:5].reshape(-1).tobytes()
self.pub_audio_raw.publish(
AudioData(data=raw_audio_data))
self.pub_audio_merged_playback.publish(
AudioData(data=data[:, 5].tobytes()))
if self.is_speeching:
if len(self.speech_audio_buffer) == 0:
self.speech_audio_buffer = self.speech_prefetch_buffer
self.speech_audio_buffer += data
if self.n_channel > 1:
self.speech_audio_raw_buffer = self.speech_raw_prefetch_buffer
self.speech_audio_buffer += processed_data
if self.n_channel > 1:
self.speech_audio_raw_buffer += raw_audio_data
else:
self.speech_prefetch_buffer += data
self.speech_prefetch_buffer += processed_data
self.speech_prefetch_buffer = self.speech_prefetch_buffer[-self.speech_prefetch_bytes:]
if self.n_channel > 1:
self.speech_raw_prefetch_buffer += raw_audio_data
self.speech_raw_prefetch_buffer = self.speech_raw_prefetch_buffer[
-self.speech_raw_prefetch_bytes:]

def on_timer(self, event):
stamp = event.current_real or rospy.Time.now()
@@ -433,13 +516,15 @@ def on_timer(self, event):
elif self.is_speeching:
buf = self.speech_audio_buffer
self.speech_audio_buffer = b""
buf_raw = self.speech_audio_raw_buffer
self.speech_audio_raw_buffer = b""
self.is_speeching = False
duration = 8.0 * len(buf) * self.respeaker_audio.bitwidth
duration = duration / self.respeaker_audio.rate / self.respeaker_audio.bitdepth
duration = duration / self.respeaker_audio.rate / self.respeaker_audio.bitdepth / self.n_channel
rospy.loginfo("Speech detected for %.3f seconds" % duration)
if self.speech_min_duration <= duration < self.speech_max_duration:

self.pub_speech_audio.publish(AudioData(data=buf))
self.pub_speech_audio_raw.publish(AudioData(data=buf_raw))


if __name__ == '__main__':