Dataset for the Japanese task: pairs of dialogue tweets, given as tweet IDs only. The tweet text is not included.
The input data will be tweets randomly sampled from 2015. The pool of tweets (the target for extraction) consists of randomly sampled tweet pairs (mention-reply pairs) from 2014. The pool contains just over one million tweets, that is, roughly 500K pairs.
The following data will be provided by the organizers:
(1) Twitter data, distributed as tweet IDs, about 1M tweets in size
(2) Development data: input samples and output samples annotated with reference labels by ten annotators
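Because the pool is distributed as mention-reply pairs of tweet IDs without tweet text, a first processing step is usually to parse those ID pairs. The sketch below assumes a hypothetical file layout (one pair per line, post ID and reply ID separated by a tab); the actual distribution format is not described here, and the function names `load_id_pairs` and `count_unique_tweets` are illustrative only. IDs are kept as strings because they are opaque identifiers.

```python
from typing import Iterable, List, Tuple


def load_id_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse mention-reply tweet ID pairs.

    Hypothetical format (an assumption, not the official one):
    one pair per line, "<mention_id>\t<reply_id>".
    """
    pairs: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        # Each line must hold exactly two all-digit tweet IDs.
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            raise ValueError(f"malformed pair line: {line!r}")
        pairs.append((fields[0], fields[1]))
    return pairs


def count_unique_tweets(pairs: List[Tuple[str, str]]) -> int:
    """Count distinct tweet IDs across all pairs (two tweets per pair)."""
    return len({tid for pair in pairs for tid in pair})


if __name__ == "__main__":
    sample = ["100\t200", "", "300\t400"]
    pairs = load_id_pairs(sample)
    print(len(pairs), count_unique_tweets(pairs))
```

With disjoint pairs, the tweet count is twice the pair count, which matches the description of a pool of just over one million tweets forming roughly 500K pairs.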
2015-11-21
NTCIR-12 Task on Short Text Conversation
http://ntcir12.noahlab.com.hk/japanese/stc-jpn.htm