Bookmarks: NTCIR-12 Task on Short Text Conversation

2015-11-21

NTCIR-12 Task on Short Text Conversation

http://ntcir12.noahlab.com.hk/japanese/stc-jpn.htm

Dataset of Japanese task

The input data will be randomly sampled from tweets posted in 2015. The pool of tweets (the target for extraction) consists of randomly sampled tweet pairs (mention-reply pairs) from 2014. The pool contains just over one million tweets, that is, about 500K pairs.

The organizers will provide the following data:
(1) Twitter data (distributed as tweet IDs), 1M in size
(2) Development data: input samples and output samples annotated with reference labels. Each sample is labeled by ten annotators.

Note: the data consists only of pairs of tweet IDs for the conversational tweets. The tweet text itself is not included.
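
Because the distributed data holds only tweet-ID pairs and no text, the first step is reading the pairs before fetching the tweets themselves. Here is a minimal sketch of that step. The file layout is an assumption, not taken from the task page: it supposes one mention-reply pair per line, with the two IDs separated by a tab. The names `TweetPair` and `read_pairs` are hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class TweetPair(NamedTuple):
    """One mention-reply pair, identified by tweet IDs only (hypothetical type)."""
    mention_id: int
    reply_id: int


def read_pairs(lines: Iterable[str]) -> Iterator[TweetPair]:
    """Yield ID pairs from tab-separated lines (assumed layout: mention_id<TAB>reply_id)."""
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank or malformed lines rather than guessing at their meaning.
        if len(row) != 2:
            continue
        yield TweetPair(int(row[0]), int(row[1]))


# Made-up IDs standing in for a real pair file.
sample = io.StringIO("111\t222\n\n333\t444\n")
pairs = list(read_pairs(sample))
```

The tweet text would still have to be retrieved separately by ID. That step is left out here because it depends on how the Twitter data is accessed.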