Alexa analyzes the semantic content of utterances according to the categories domain, intent, and slot. “Domain” describes the type of application — or “skill” — that the utterance should invoke; for instance, mapping skills should answer questions about geographic distance. “Intent” describes the particular function the skill should execute, such as measuring driving distance. And “slots” are variables that the function acts upon, such as point of origin and destination.
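As a minimal sketch of the three categories, the driving-distance example could be represented as the record below. The class name `Interpretation`, the domain and intent labels, the slot keys, and the sample utterance are all hypothetical and chosen for illustration; they are not Alexa's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Interpretation:
    """Illustrative container for one analyzed utterance (hypothetical names)."""
    domain: str                       # which skill should handle the utterance
    intent: str                       # which function of that skill to run
    slots: Dict[str, str] = field(default_factory=dict)  # arguments to the function


# Hypothetical analysis of "How far is it to drive from Seattle to Portland?"
parsed = Interpretation(
    domain="Maps",
    intent="GetDrivingDistance",
    slots={"origin": "Seattle", "destination": "Portland"},
)
```

Here the domain selects the mapping skill, the intent selects its driving-distance function, and the two slots supply the values that function acts upon.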
In the slot-filling paradigm, a user can refer back to slots that appeared earlier in the conversation, and the goal of the contextual understanding system is to resolve those referring expressions to the appropriate slots in the context. In large-scale multi-domain systems, this presents two challenges: scaling to a very large and potentially unbounded set of slot values, and dealing with diverse schemas.
Developers and researchers from Amazon Alexa Machine Learning present a neural network architecture that addresses the slot value scalability challenge by reformulating contextual interpretation as a decision about whether to carry over a slot from a set of possible candidates. To deal with heterogeneous schemas, the researchers introduce a simple data-driven method for transforming the candidate slots. Experiments show that the approach scales to multiple domains and provides competitive results over a strong baseline.
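The carryover formulation can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the `Slot` record, the `KEY_MAP` table, and in particular the `score` function, which is a hand-written recency heuristic standing in for the neural network and is not the researchers' model. In the described approach the key mapping is derived from data; here it is hard-coded. The sketch shows only the shape of the decision, which is a yes/no choice per candidate slot rather than a prediction over all possible slot values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Slot:
    domain: str
    key: str
    value: str
    turns_ago: int  # how many turns back the slot was mentioned


# Hypothetical mapping (source domain, source key, target domain) -> target key.
# Assumed here by hand; the described method derives such mappings from data.
KEY_MAP: Dict[Tuple[str, str, str], str] = {
    ("Weather", "city", "Maps"): "destination",
}


def transform(slot: Slot, target_domain: str) -> List[Slot]:
    """Re-express a context slot in the target domain's schema, if possible."""
    if slot.domain == target_domain:
        return [slot]
    new_key = KEY_MAP.get((slot.domain, slot.key, target_domain))
    if new_key is None:
        return []  # no known counterpart in the target schema
    return [Slot(target_domain, new_key, slot.value, slot.turns_ago)]


def score(candidate: Slot, current_keys: set) -> float:
    """Placeholder for the learned model: a simple recency heuristic."""
    if candidate.key in current_keys:
        return 0.0  # the current utterance already fills this slot
    return 1.0 / (1 + candidate.turns_ago)


def carry_over(context: List[Slot], target_domain: str,
               current_slots: Dict[str, str], threshold: float = 0.4) -> Dict[str, str]:
    """Make an independent carry / don't-carry decision for each candidate."""
    carried = dict(current_slots)
    current_keys = set(current_slots)
    for slot in context:
        for candidate in transform(slot, target_domain):
            if score(candidate, current_keys) >= threshold:
                carried.setdefault(candidate.key, candidate.value)
    return carried


# Hypothetical dialogue: "What's the weather in Seattle?" followed by
# "How far is it from Portland?" (the second utterance fills only `origin`).
context = [
    Slot("Weather", "city", "Seattle", turns_ago=1),
    Slot("Weather", "city", "Boston", turns_ago=5),  # stale mention
]
result = carry_over(context, "Maps", {"origin": "Portland"})
```

Because each candidate is judged on its own, the decision does not depend on a fixed vocabulary of slot values, and the key mapping lets a slot filled in one domain's schema be considered for another's.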