Dataset 1: Mobile Text Dataset

Files

Download (624.0 MB)

Description

This zip file contains sentences mined from public web forums and blogs. Additional details about the dataset:

  • The data is split into training, development, and test sets based on the original domain name the text was mined from.
  • The sent_*.txt files are tab-delimited, with one sentence per line, each parsed from a particular post. Each line also contains the device name, forum software, device form factor (tablet or phone), and device input (touch or touch+key) associated with the post the sentence was obtained from.
  • The sets subdirectory contains the groupings used in Section 2 of the paper.
  • The 64K word list (used in the paper) and the 5K and 20K word lists used for the Forum-only models can be found here: https://digitalcommons.mtu.edu/mobiletext/3/
  • Various word lists used.
  • Posts and Email development and test sets.
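
As a rough illustration of how the tab-delimited sent_*.txt files described above might be read, here is a minimal Python sketch. The column order, the field names in ASSUMED_COLUMNS, and the UTF-8 encoding are assumptions made for illustration only; the description lists which fields are present but not their order, so check them against the actual files before relying on this.

```python
from typing import Dict, Iterator, Sequence

# ASSUMPTION: hypothetical column order and names. The dataset description
# lists these fields but does not specify their order in the file.
ASSUMED_COLUMNS = (
    "sentence",
    "device_name",
    "forum_software",
    "form_factor",   # "tablet" or "phone"
    "input_type",    # "touch" or "touch+key"
)


def read_sentences(
    path: str, columns: Sequence[str] = ASSUMED_COLUMNS
) -> Iterator[Dict[str, str]]:
    """Yield one dict per well-formed line of a tab-delimited sentence file.

    Lines whose field count does not match `columns` are skipped.
    """
    # ASSUMPTION: the files are UTF-8 encoded.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != len(columns):
                continue  # skip malformed or unexpected lines
            yield dict(zip(columns, fields))
```

If the real column order differs, pass the correct names via the columns argument instead of editing the function. For example, the phone-only sentences could then be selected by filtering on the form-factor field.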

For further details, please see the forthcoming paper.

Publication Date

2019

Keywords

mobile text, text mining, modeling text

Disciplines

Computer Sciences
