Component description / functionalities
Dataset-Multilingual Dialogues is generated starting from the original SODA dataset. In the original dataset, triples describing social interactions are extracted and contextualized into a short narrative, which is then used as a prompt to generate everyday conversations. We generated 12,000 synthetic dialogues per language in French, German, Italian and Spanish. This results in a multilingual dataset that mirrors the SODA style while being fully based on open-source generation methods.
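The narrative-to-prompt step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the prompt wording, the function names `build_prompt` and `total_dialogues`, and the example narrative are all hypothetical and are not taken from the actual generation pipeline. Only the language list and the per-language count come from the description.

```python
# Illustrative sketch only: the prompt template and helper names are
# assumptions, not the pipeline actually used to build the dataset.

# Languages and per-language count as stated in the description.
LANGUAGES = ["French", "German", "Italian", "Spanish"]
DIALOGUES_PER_LANGUAGE = 12_000


def build_prompt(narrative: str, language: str) -> str:
    """Turn a short narrative into a generation prompt for one target language.

    The wording of the instruction is hypothetical.
    """
    if language not in LANGUAGES:
        raise ValueError(f"Unsupported language: {language}")
    return (
        f"{narrative}\n"
        f"Write an everyday conversation in {language} "
        f"that follows from this narrative."
    )


def total_dialogues() -> int:
    """Total number of dialogues across all target languages."""
    return DIALOGUES_PER_LANGUAGE * len(LANGUAGES)


if __name__ == "__main__":
    # Hypothetical narrative, standing in for one derived from a triple.
    example = "Anna helped her neighbor carry groceries up the stairs."
    print(build_prompt(example, "Italian"))
    print(total_dialogues())
```

With four languages at 12,000 dialogues each, the dataset as described would contain 48,000 dialogues in total.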