Yusuke Matsubara

I'm Yusuke Matsubara. I work in the Decentralized Big Data Team at Riken in Tokyo, Japan.

I have worked on statistical modeling of natural language and its applications to document authoring support. Recent projects include mining less frequent patterns in text, semantic parsing of Japanese clinical texts, and quantitative analysis of Wikipedia contributors. My research interests also extend to other areas of computational linguistics.

In my spare time, I enjoy contributing to the free software and free content movements. If you are interested in those activities, see also my other online presence.

Publications

(refereed)

- Matsubara, Yusuke and Koiti Hasida. "K-repeating Substrings: a String-Algorithmic Approach to Privacy-Preserving Publishing of Textual Data". [slides] Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing (PACLIC 28). 2014.
- Matsubara, Yusuke and Jun'ichi Tsujii. "Large-vocabulary lexical choice with rich context features". International Journal of Computational Linguistics and Applications, vol. 2, no. 1-2, pp. 9--24. 2011.
- Wailok Tam, Koiti Hasida, Yusuke Matsubara, Eiji Aramaki, Mai Miyabe, Motoyuki Takaai, Hirosi Uozaki and Yo Sato. "Proper and Efficient Treatment of Anaphora and Long-Distance Dependency in Context-Free Grammar". Proceedings of the First Workshop on Natural Language Processing for Medical and Healthcare Fields. 2013.

(not refereed)

- Matsubara, Yusuke, Mizuki Morita and Koiti Hasida. "BARY at the NTCIR-11 MedNLP-2 Task for Complaints and Diagnosis Recognition". [slides] Proceedings of the NII Testbeds and Community for Information access Research (NTCIR-11). 2014.
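- Wai Lok Tam, Yusuke Matsubara, Koiti Hasida, Motoyuki Takaai, Eiji Aramaki and Hirosi Uozaki. "Linking a Grammar to an Ontology". The 19th Annual Meeting of the Association for Natural Language Processing (NLP2013). 2013.
- Matsubara, Yusuke, Yusuke Miyao and Jun'ichi Tsujii. "Context-Sensitive Lexical Choice from Large-Vocabulary Synonym Sets" (Original title: "大語彙の同義語集合からの文脈に応じた語彙選択"). The 16th Annual Meeting of the Association for Natural Language Processing (NLP2010). 2010. (in Japanese)
- Daisuke Okanohara, Yusuke Matsubara and Jun'ichi Tsujii. "Applying Hierarchical Tree Language Models to Speech Recognition" (Original title: "階層木言語モデルの音声認識への適用"). 2009 Spring Meeting of the Acoustical Society of Japan. 2009. (in Japanese)
- Matsubara, Yusuke, Jun Ogata and Masataka Goto. "Improvements on Podcast Speech Recognition: Language modeling with Web Keywords maintained by Mass Knowledge" (Original title: "ポッドキャスト音声認識の性能向上手法: 集合知によって更新されるWebキーワードを活用した言語モデリング"). IPSJ SIG Spoken Language Processing Technical Report 2008-SLP-71-6. 2008(46). pp. 39--44. Information Processing Society of Japan. May 2008. (in Japanese)
- Matsubara, Yusuke, Yusuke Miyao and Jun'ichi Tsujii. "N-gram Language Models with Overlapping Features" (Original title: "重複する素性を持つNグラム言語モデル"). The 14th Annual Meeting of the Association for Natural Language Processing (NLP2008). 2008. (in Japanese)
- Matsubara, Yusuke, Tomoyoshi Akiba and Jun'ichi Tsujii. "Word Segmentation of Spoken Japanese Based on the Minimum Description Length Principle" (Original title: "最小記述長原理に基づいた日本語話し言葉の単語分割"). The 13th Annual Meeting of the Association for Natural Language Processing (NLP2007). 2007. (in Japanese)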

Software

- Growthring - a Scala implementation of k-repeating substrings for efficiently identifying less frequent substrings.

Contact

Please feel free to e-mail me at:

yusuke (at) matsubara.name
