correct object: Recently Published Documents

TOTAL DOCUMENTS: 31 (FIVE YEARS: 9)
H-INDEX: 6 (FIVE YEARS: 1)

Author(s):  
Wilka Carvalho ◽  
Anthony Liang ◽  
Kimin Lee ◽  
Sungryull Sohn ◽  
Honglak Lee ◽  
...  

Learning how to execute complex tasks involving multiple objects in a 3D world is challenging when there is no ground-truth information about the objects or any demonstration to learn from. When an agent receives only a task-completion signal, it is difficult to learn the object representations that support learning the correct object interactions needed to complete the task. In this work, we formulate learning an attentive object-dynamics model as a classification problem, using random object-images to define incorrect labels for our object-dynamics model. We show empirically that this enables object-representation learning that captures an object's category (is it a toaster?), its properties (is it on?), and object relations (is something inside of it?). With this, our core learner (a relational RL agent) receives the dense training signal it needs to rapidly learn object-interaction tasks. We demonstrate results in the 3D AI2Thor simulated kitchen environment with a range of challenging food-preparation tasks. We compare our method's performance to several related approaches and against the performance of an oracle: an agent that is supplied with ground-truth information about objects in the scene. We find that our agent achieves performance closest to the oracle in terms of both learning speed and maximum success rate.
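The key idea here, treating dynamics learning as classification with random object-images as incorrect labels, can be illustrated with a minimal numpy sketch. All names, dimensions, and scoring choices below are illustrative assumptions, not the paper's actual architecture: the predicted next-state embedding is scored against the true object embedding and several random "incorrect" embeddings, and cross-entropy is taken with the true object as the correct label.

```python
import numpy as np

def contrastive_dynamics_loss(pred, positive, negatives):
    """Score the predicted next-state embedding against the true object
    embedding (index 0) and random object embeddings, then take
    cross-entropy with index 0 as the correct label."""
    candidates = np.vstack([positive[None, :], negatives])  # (1 + K, d)
    logits = candidates @ pred                              # dot-product similarity
    logits -= logits.max()                                  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])

d = 8
true_obj = np.ones(d)
random_objs = -np.ones((5, d))  # stand-ins for random object-images
good = contrastive_dynamics_loss(true_obj, true_obj, random_objs)   # accurate prediction
bad = contrastive_dynamics_loss(-true_obj, true_obj, random_objs)   # inaccurate prediction
```

A model whose prediction lands near the true object's embedding incurs a much lower loss than one whose prediction matches the negatives, which is the dense training signal the abstract refers to.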


Author(s):  
Van-Quang Nguyen ◽  
Masanori Suganuma ◽  
Takayuki Okatani

There is growing interest in the community in making an embodied AI agent perform a complicated task while interacting with an environment following natural-language directives. Recent studies have tackled the problem using ALFRED, a well-designed dataset for the task, but achieved only very low accuracy. This paper proposes a new method, which outperforms the previous methods by a large margin. It is based on a combination of several new ideas. One is a two-stage interpretation of the provided instructions. The method first selects and interprets an instruction without using visual information, yielding a tentative prediction of the action sequence. It then integrates the prediction with the visual information, yielding the final prediction of an action and an object. Because the class of the object to interact with is identified in the first stage, the method can accurately select the correct object from the input image. Moreover, our method considers multiple egocentric views of the environment and extracts essential information by applying hierarchical attention conditioned on the current instruction. This contributes to the accurate prediction of actions for navigation. A preliminary version of the method won the ALFRED Challenge 2020. The current version achieves a success rate of 4.45% in unseen environments with a single view, which is further improved to 8.37% with multiple views.
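The hierarchical attention over multiple egocentric views can be sketched in a few lines of numpy. The shapes, the dot-product scoring, and the toy embeddings are assumptions for illustration, not the paper's model: region features within each view are attended conditioned on the instruction embedding, and the resulting per-view summaries are attended again to produce one pooled feature.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_views(instruction, view_feats):
    """Two-level attention: over regions within each view, then across views,
    both conditioned on the instruction embedding."""
    # view_feats: (views, regions, d); instruction: (d,)
    region_attn = softmax(view_feats @ instruction, axis=-1)          # (V, R)
    view_summary = (region_attn[..., None] * view_feats).sum(axis=1)  # (V, d)
    view_attn = softmax(view_summary @ instruction)                   # (V,)
    pooled = (view_attn[:, None] * view_summary).sum(axis=0)          # (d,)
    return pooled, view_attn

d = 4
instruction = np.eye(d)[0]             # toy instruction embedding
front = np.tile(np.eye(d)[0], (3, 1))  # view whose regions match the instruction
left = np.tile(np.eye(d)[1], (3, 1))   # irrelevant view
pooled, view_attn = attend_views(instruction, np.stack([front, left]))
```

The view aligned with the instruction receives the larger attention weight, so the pooled feature is dominated by the relevant egocentric view.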


2021 ◽  
Author(s):  
Dora Kampis ◽  
Helle Lukowski Duplessy ◽  
Victoria Southgate

Adults and children sometimes commit ‘egocentric errors’, failing to ignore their own perspective when interpreting others’ communication. Training imitation-inhibition reduces these errors in adults, facilitating perspective-taking. This study tested whether imitation-inhibition training may also facilitate perspective-taking in 3- to 6-year-olds, an age at which the egocentric perspective may be particularly influential. Children participated in a 10-minute imitation-inhibition, imitation, or non-social-inhibition training (White; n=25 per condition; 33 female), and subsequently completed the communicative-perspective-taking Director task. Training had a significant effect (F(2, 71) = 3.268, p = .044, η2 = .084): on critical trials the imitation-inhibition group selected the correct object more often than the imitation and non-social-inhibition training groups. The imitation-inhibition training thus specifically enhanced the perspective-taking process, indicating that perspective-taking from childhood onwards involves managing self-other representations.


2021 ◽  
pp. 136700692110286
Author(s):  
Giovanna Morini ◽  
Rochelle S. Newman

Aims and objectives: The purpose of this study was to examine whether differences in language exposure (i.e., being raised in a bilingual versus a monolingual environment) influence young children’s ability to comprehend words when speech is heard in the presence of background noise. Methodology: Forty-four children (22 monolinguals and 22 bilinguals) between the ages of 29 and 31 months completed a preferential looking task where they saw picture-pairs of familiar objects (e.g., balloon and apple) on a screen and simultaneously heard sentences instructing them to locate one of the objects (e.g., look at the apple!). Speech was heard in quiet and in the presence of competing white noise. Data and analyses: Children’s eye-movements were coded off-line to identify the proportion of time they fixated on the correct object on the screen and performance across groups was compared using a 2 × 3 mixed analysis of variance. Findings: Bilingual toddlers performed worse than monolinguals during the task. This group difference in performance was particularly clear when the listening condition contained background noise. Originality: There are clear differences in how infants and adults process speech in noise. To date, developmental work on this topic has mainly been carried out with monolingual infants. This study is one of the first to examine how background noise might influence word identification in young bilingual children who are just starting to acquire their languages. Significance: High noise levels are often reported in daycares and classrooms where bilingual children are present. Therefore, this work has important implications for learning and education practices with young bilinguals.
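The dependent measure described above, the proportion of time fixating the correct object, can be computed from frame-by-frame gaze codes roughly as follows. The coding scheme and the sample trial are hypothetical, not taken from the study:

```python
def proportion_target_looking(frames):
    """frames: one code per video frame, 'T' = fixating target,
    'D' = fixating distractor, '-' = looking away.
    Returns time on target as a proportion of time on either object."""
    on_target = frames.count('T')
    on_objects = on_target + frames.count('D')
    return on_target / on_objects if on_objects else float('nan')

trial = "TTTDT-TTDD"  # hypothetical coded trial (10 frames)
prop = proportion_target_looking(trial)
```

Per-trial proportions like this, averaged within listening condition (quiet vs. noise) and language group, are what would enter the 2 × 3 mixed analysis of variance.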


2021 ◽  
Author(s):  
Umesh Patil ◽  
Sol Lago

We propose a retrieval interference-based explanation of a prediction advantage effect observed in Stone et al. (2021). They reported two dual-task eye-tracking experiments in which participants listened to instructions involving German possessive pronouns, e.g. ‘Click on his blue button’, and were asked to select the correct object from a set of objects displayed on screen. Participants’ eye movements showed predictive processing, such that the target object was fixated before its name was heard. Moreover, when the target and the antecedent of the pronoun matched in gender, predictions arose earlier than when the two genders mismatched — a prediction advantage. We propose that the prediction advantage arises due to similarity-based interference during antecedent retrieval, such that the overlap of gender features between the antecedent and possessum boosts the activation level of the latter and helps predict it faster. We report an ACT-R model supporting this hypothesis. Our model also provides a computational implementation of the idea that prediction can be thought of as memory retrieval. In addition, we provide a preliminary ACT-R model of how linguistic processes could drive changes in visual attention.
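The interference account rests on ACT-R's standard activation and latency equations, A_i = B_i + sum_j W_j * S_ji and T = F * exp(-A_i). A toy numeric sketch (the weights and associative strengths are made-up values, not the model's fitted parameters) shows how an extra feature overlap raises activation and shortens predicted retrieval time:

```python
import math

def activation(base, cue_weights, assoc_strengths):
    """ACT-R spreading activation: A_i = B_i + sum_j W_j * S_ji."""
    return base + sum(w * s for w, s in zip(cue_weights, assoc_strengths))

def retrieval_time(a, latency_factor=0.2):
    """ACT-R retrieval latency: T = F * exp(-A)."""
    return latency_factor * math.exp(-a)

W = [0.5, 0.5]  # weights for two retrieval cues (hypothetical values)
gender_match = activation(0.0, W, [1.0, 1.0])     # gender features overlap: both cues spread activation
gender_mismatch = activation(0.0, W, [1.0, 0.0])  # no overlap on the gender cue
```

Under these equations, the gender-match configuration yields higher activation and hence a faster predicted retrieval, mirroring the earlier predictions observed in the matching condition.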


Author(s):  
Rebecca L. Monk ◽  
Lauren Colbert ◽  
Gemma Darker ◽  
Jade Cowling ◽  
Bethany Jones ◽  
...  

Abstract Background Theory of mind (ToM), the ability to understand that others have knowledge and beliefs different from our own, has been the subject of extensive research suggesting that we are not always efficient at taking another’s perspective, known as visual perspective taking (VPT). A growing literature has explored the individual-level factors that may affect perspective taking (e.g. empathy and group membership). However, while emotion and (dis)liking are key aspects of everyday social interaction, research has not hitherto explored how these factors may impact ToM. Method A total of 164 participants took part in a modified director task (31 males (19%), M age = 20.65, SD age = 5.34), exploring how correct object selection may be affected by another’s emotion (director facial emotion: neutral, happy, or sad) and by knowledge of their (dis)likes (i.e. whether the director likes specific objects). Result When the director liked the target object or disliked the competitor object, accuracy rates were higher than when he disliked the target object or liked the competitor object. When the emotion shown by the director was incongruent with their stated (dis)liking of an object (e.g. happy when he disliked an object), accuracy rates were also higher. None of these effects were significant in the analysis of response time. These findings suggest that knowledge of liking may impact ToM use, as can emotional incongruency, perhaps by increasing the saliency of perspective differences between participant and director. Conclusion As well as contributing further to our understanding of real-life social interactions, these findings may have implications for ToM research, where it appears that more consideration of the target/director’s characteristics may be prudent.


2020 ◽  
Author(s):  
Cornelia Schulze ◽  
David Buttelmann

Interpreting a speaker’s communicative acts is a challenge children face constantly in everyday life. In doing so, they seem to understand direct communicative acts more easily than indirect communicative acts. The present study investigated which step in the processing of communicative acts might cause difficulties in understanding indirect communication. To assess the developmental trajectory of this phenomenon, we tested 3- and 5-year-old children (N=105) using eye tracking and an object-choice task. The children watched videos that showed puppets during their everyday activities (e.g., pet care). For every activity, the puppets were asked which of two objects (e.g., rabbit or dog) they would rather have. The puppets responded either directly (“I want the rabbit”) or indirectly (“I have a carrot”). Results showed that children chose the object intended by the puppets more often in the direct- than in the indirect-communication condition, and 5-year-olds chose correctly more often than 3-year-olds. However, even though children’s pupil size increased while hearing the utterances, we found no effect of communication type before children had already settled on the correct object during object selection by looking at it. Only after this point, that is, in children’s subsequent fixation patterns and reaction times, did differences between communication types occur. Thus, although children’s object-choice performance suggests that indirect communication is harder to understand than direct communication, the cognitive demands during processing of both communication types seem similar. We discuss theoretical implications of these findings for developmental pragmatics in terms of a dual-process account of communication comprehension.


2019 ◽  
Author(s):  
Sho Tsuji ◽  
Nobuyuki Jincho ◽  
Reiko Mazuka ◽  
Alejandrina Cristia

Is infants’ word learning boosted by nonhuman social agents? An on-screen virtual agent taught infants word–object associations in a setup where the presence of contingent and referential cues could be manipulated using gaze contingency. In the study, 12-month-old Japanese-learning children (N = 36) looked significantly more to the correct object when it was labeled after exposure to a contingent and referential display versus a noncontingent and nonreferential display. These results show that communicative cues can augment learning even for a nonhuman agent, a finding highly relevant for our understanding of the mechanisms through which the social environment supports language acquisition and for research on the use of interactive screen media.


2019 ◽  
Vol 286 (1894) ◽  
pp. 20182332 ◽  
Author(s):  
Sarah A. Jelbert ◽  
Rachael Miller ◽  
Martina Schiestl ◽  
Markus Boeckle ◽  
Lucy G. Cheke ◽  
...  

Humans use a variety of cues to infer an object's weight, including how easily objects can be moved. For example, if we observe an object being blown down the street by the wind, we can infer that it is light. Here, we tested whether New Caledonian crows make this type of inference. After training that only one type of object (either light or heavy) was rewarded when dropped into a food dispenser, birds observed pairs of novel objects (one light and one heavy) suspended from strings in front of an electric fan. The fan was either on—creating a breeze which buffeted the light, but not the heavy, object—or off, leaving both objects stationary. In subsequent test trials, birds could drop one, or both, of the novel objects into the food dispenser. Despite having no opportunity to handle these objects prior to testing, birds touched the correct object (light or heavy) first in 73% of experimental trials, and were at chance in control trials. Our results suggest that birds used pre-existing knowledge about the behaviour exhibited by differently weighted objects in the wind to infer their weight, using this information to guide their choices.


2018 ◽  
Author(s):  
Lihui Wang ◽  
Fariba Sharifian ◽  
Jonathan Napp ◽  
Carola Nath ◽  
Stefan Pollmann

Abstract The perception gained by retina implants (RI) is limited, which calls for a learning regime to improve patients’ visual perception. Here we simulated RI vision and investigated whether object recognition in RI patients can be improved and maintained through training. Importantly, we asked whether the trained object recognition can be generalized to a new task context and to new viewpoints of the trained objects. For this purpose, we adopted two training tasks: a naming task, where participants had to choose the correct label out of other distracting labels for the presented object, and a discrimination task, where participants had to choose the correct object out of other distracting objects to match the presented label. Our results showed that, regardless of task order, recognition performance improved in both tasks and lasted for at least a week. The improved object recognition, however, could be transferred only from the naming task to the discrimination task, not vice versa. Additionally, the trained object recognition could be transferred to new viewpoints of the trained objects only in the naming task, not in the discrimination task. Training with the naming task is therefore recommended for RI patients to achieve persistent and flexible visual perception.

