
Mark Johnson's research interests

Interdisciplinary research and training:

What I'm most excited about currently is research that lies at the intersection of Linguistics, Computer Science, Statistics and Neuroscience. Computational Linguistics and machine learning are good examples of topics that lie at this nexus. We're lucky at Brown to have received generous external support for research and training in these areas. I am Principal Investigator of an NSF IGERT training program in this area. I am also involved with the Brain Science initiative at Brown. We're always looking for good graduate students and post-docs as well as undergraduates (it's easier if you are a US citizen or permanent resident). If you're interested in being part of an exciting interdisciplinary team, email us!

Why computational linguistics?

My area of research is computational linguistics. Linguistic theory focusses primarily on the structures involved in natural language, but in my opinion the structures alone are just a small part of the story. Language is active and dynamic; the processes of language learning, comprehension and production are what really bring language to life. That is, I believe that modern generative linguistic theories of syntax, semantics and phonology are on the right track as far as they go, but that they are missing a large part of the story because they focus on static representations, rather than the processes which create and manipulate these representations. Put rather crassly: representations just sit there, processes actually do something.

There are many different ways to study these processes, but to me one of the fascinating challenges is to develop theories that are consistent with and build on the structures that standard linguistic theory provides. I also think that we want theories of these linguistic processes which are clear and explicit, in much the same way as certain generative approaches to linguistics formulate clear, explicit and precise grammar fragments in order to present and test their hypotheses. Manipulating information-bearing symbols is what computation is all about, so we want to understand the processes of language in computational terms.

Computational linguistics is a truly interdisciplinary subject. It is a scientific discipline with important industrial and engineering applications (just like some areas of physics or chemistry). Intellectually it draws primarily on linguistics and computer science, and these days it draws heavily on statistics. But it also has growing contacts with psycholinguistics (the experimental study of human linguistic behaviour), language acquisition (the experimental study of how humans learn language) and I think it should also have more contact with neurolinguistics (how language is realized in the brain).

Industrially, computational linguistics is currently booming (just look at the jobs for computational linguists advertised on the Web). As more text becomes available on-line, we need effective ways of searching, summarizing and translating it. The applied side of computational linguistics, called Natural Language Processing, develops methods for automatically manipulating and extracting information from natural language. Several major corporations have declared that natural language processing will be a key technology in 21st-century information processing; this, coupled with the exponential growth of the Web, has fuelled a similar expansion in job opportunities for computational linguists in natural language processing.

We have an active, rapidly growing group of Computational Linguists at Brown. BLLIP (the Brown Laboratory for Linguistic Information Processing) is a group of undergraduate and graduate students, post-docs and faculty that meets weekly to discuss current research: everyone is welcome to attend!

Research interests

Traditionally computational linguistics has worked with boolean "all or none" divisions, like the linguistic theory it was based on. But beginning with engineering work in speech recognition, it has become clear that there is a tremendous amount of valuable information in the statistical distribution of words and other linguistic constructs. While we still don't know exactly what information is available to a child learning their first language or a person understanding a sentence, it seems that both language learning and comprehension can reliably occur even though the information available is incomplete, uncertain or noisy. For example, by tracking whether a construction is more likely or less likely in a particular context, the comprehension process can be guided to the correct interpretation. Most of my recent papers are about using statistical information to guide interpretation in this sort of way.
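As a minimal sketch of what "tracking whether a construction is more likely or less likely in a particular context" could look like (it is not taken from any particular paper), consider deciding whether a prepositional phrase attaches to the verb or to the noun. The counts, the function names and the smoothing constant `alpha` below are all invented for illustration.

```python
from collections import Counter

# Hypothetical counts of (preposition, attachment site) pairs.
# These numbers are invented for illustration, not drawn from a real corpus.
COUNTS = Counter({
    ("with", "verb"): 30,
    ("with", "noun"): 10,
    ("of", "verb"): 1,
    ("of", "noun"): 59,
})

SITES = ("verb", "noun")


def attachment_probability(prep, site, counts=COUNTS, alpha=1.0):
    """Estimate P(site | prep) by relative frequency with add-alpha smoothing."""
    total = sum(counts[(prep, s)] for s in SITES)
    return (counts[(prep, site)] + alpha) / (total + alpha * len(SITES))


def prefer_attachment(prep, counts=COUNTS):
    """Return the attachment site that is more likely in the context of prep."""
    return max(SITES, key=lambda s: attachment_probability(prep, s, counts))
```

With these made-up counts, `prefer_attachment("with")` favours verb attachment and `prefer_attachment("of")` favours noun attachment. Because of the smoothing, a preposition that was never observed still receives a usable (uniform) estimate, which is one simple way of coping with incomplete or noisy information.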

The process of identifying the structure of a sentence is called parsing. My research attempts to answer questions such as: What kinds of parsing algorithms can construct the various kinds of linguistic representations posited by different linguistic theories? How does their performance vary on different constructions? Which kinds of parsing algorithms perform in a way that mimics human language understanding (and might actually be used by humans)?

As well as the "hard" constraints of the kind posited by conventional linguistic theory, human language exhibits a variety of "soft" statistical regularities. Even though the origin of these regularities is usually unknown (it may, for example, be pragmatic), it may be that the human language processing mechanism exploits them. My research in this area is directed towards discovering parsing algorithms that can exploit all of the "hard" and "soft" constraints in a systematic way.
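One standard way to combine the two kinds of constraint, shown here only as an illustrative sketch and not as the specific algorithms this research develops, is probabilistic CKY parsing. The grammar decides which analyses are possible at all (the "hard" part), and rule probabilities rank the analyses that remain (the "soft" part). The toy grammar, its probabilities and the function name `best_parse` are assumptions made up for this example.

```python
import math
from collections import defaultdict

# Toy grammar in Chomsky normal form; rules and probabilities are invented.
# A rule that is absent from these tables acts as a "hard" constraint:
# it can never be used. The probabilities act as "soft" preferences.
BINARY = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
}
LEXICAL = {
    ("NP", "she"): 0.3, ("NP", "dogs"): 0.3, ("NP", "telescopes"): 0.2,
    ("V", "sees"): 1.0, ("P", "with"): 1.0,
}


def best_parse(words):
    """Return (log probability, tree) for the most probable S analysis, or None."""
    n = len(words)
    # chart[(i, j)][label] holds the best (log probability, tree) for words[i:j].
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for (label, w), p in LEXICAL.items():
            if w == word:
                chart[(i, i + 1)][label] = (math.log(p), (label, word))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for (parent, (left, right)), p in BINARY.items():
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        lp_left, t_left = chart[(i, k)][left]
                        lp_right, t_right = chart[(k, j)][right]
                        score = math.log(p) + lp_left + lp_right
                        best = chart[(i, j)].get(parent)
                        if best is None or score > best[0]:
                            chart[(i, j)][parent] = (score, (parent, t_left, t_right))
    return chart[(0, n)].get("S")
```

For "she sees dogs with telescopes", the grammar permits two analyses, and under these invented numbers the one that attaches "with telescopes" to the verb phrase has the higher probability, so `best_parse` returns it. A string the grammar does not license at all, such as "dogs she", yields `None`, however the probabilities are set.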

Part of an attribute-value structure representation of the sentence "Who chases dogs?"