Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. 2017. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In ACL.
Ryan Kiros, Yukun Zhu, Ruslan R. Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-thought vectors. In Advances in Neural Information Processing Systems, pages 3294–3302.
Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In International Conference on Machine Learning, pages 1188–1196.
Hector J. Levesque, Ernest Davis, and Leora Morgenstern. 2011. The Winograd schema challenge. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, volume 46, page 47.
Lajanugen Logeswaran and Honglak Lee. 2018. An efficient framework for learning sentence representations. In International Conference on Learning Representations.
Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in translation: Contextualized word vectors. In NIPS.
Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In CoNLL.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119. Curran Associates, Inc.
Andriy Mnih and Geoffrey E. Hinton. 2009. A scalable hierarchical distributed language model. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1081–1088. Curran Associates, Inc.
Ankur P. Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. 2016. A decomposable attention model for natural language inference. In EMNLP.
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
Matthew Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power. 2017. Semi-supervised sequence tagging with bidirectional language models. In ACL.
Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018a. Deep contextualized word representations. In NAACL.
Matthew Peters, Mark Neumann, Luke Zettlemoyer, and Wen-tau Yih. 2018b. Dissecting contextual word embeddings: Architecture and representation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1499–1509.
Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding with unsupervised learning. Technical report, OpenAI.
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392.
Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Bidirectional attention flow for machine comprehension. In ICLR.
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642.
Fu Sun, Linyang Li, Xipeng Qiu, and Yang Liu. 2018. U-Net: Machine reading comprehension with unanswerable questions. arXiv preprint arXiv:1810.06638.
Wilson L. Taylor. 1953. Cloze procedure: A new tool for measuring readability. Journalism Bulletin, 30(4):415–433.
Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In CoNLL.
Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL '10, pages 384–394.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000–6010.
Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. 2008. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103. ACM.
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2018a. GLUE: A multi-task benchmark and analysis platform