Multi-Dialogues

Component description / functionalities

Dataset-Multilingual Dialogues is derived from the original SODA dataset. In SODA, triples describing social interactions are extracted and contextualized into a short narrative, which is then used as a prompt to generate everyday conversations. Following the same approach, we generated 12,000 synthetic dialogues per language in French, German, Italian and Spanish. The result is a multilingual dataset that mirrors the SODA style while being fully based on open-source generation methods.
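
The narrative-to-prompt step described above can be sketched roughly as follows. This is a minimal illustration only, not the pipeline actually used to build the dataset: the function name `build_prompt`, the `LANGUAGES` constant, and the prompt wording are all assumptions made for this example.

```python
# Illustrative sketch only: every name and the prompt wording are hypothetical,
# not taken from the actual generation pipeline.

# The four target languages named in the description above.
LANGUAGES = ("French", "German", "Italian", "Spanish")


def build_prompt(narrative: str, language: str) -> str:
    """Wrap a short narrative in an instruction asking for a dialogue."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    return (
        f"Narrative: {narrative}\n"
        f"Write an everyday conversation in {language} based on this narrative."
    )


def build_prompts(narrative: str) -> dict[str, str]:
    """Build one prompt per target language for the same narrative."""
    return {language: build_prompt(narrative, language) for language in LANGUAGES}


if __name__ == "__main__":
    # Made-up narrative, used only to show the call shape.
    example = "Anna helped her neighbour carry boxes and now wants to rest."
    for language, prompt in build_prompts(example).items():
        print(f"[{language}]\n{prompt}\n")
```

Each resulting prompt would then be passed to an open-source language model of choice to produce the dialogue; that step is left out here because the description does not specify the model or its settings.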

IPCEI CIS Reference Architecture

AI Layer
Data Layer

Open source license

ODC-BY-1.0

Keywords

multilingual-dialogue
synthetic-dialogue
conversational-dataset
chat-dataset
instruction-style-dialogue
dialogue-generation
LLM-generated-data
open-dataset
text-generation-dataset
multilingual-nlp
european-languages
italian-language
french-language
german-language
spanish-language
english-language
cross-lingual-data
language-adaptation
synthetic-data-generation
SODA-pipeline
commonsense-reasoning
social-commonsense
narrative-to-dialogue
GPT-generated-data
filtered-synthetic-data
LLM-curated-dataset
chat-model-training
conversational-ai-training
instruction-finetuning
dialogue-systems
chatbot-training-data
response-generation
LLM-fine-tuning
multilingual-chatbot
multi-speaker-dialogue
6-to-8-turn-conversations
structured-dialogue
cleaned-dataset
deduplicated-data
multi-turn-dialogue