Sunday, July 14, 2013

Scientifically Approaching Language


Let's take a look at a quote from a journalistic piece in The American Scholar:
The science behind search may change how linguists view natural language. For one, linguists may move away from modeling language using formal grammars, Chomskyan or otherwise. “What you end up finding is that the things that people do with language very rarely fit into the formal grammar that you carved out from the outset,” says Mailhot. “The data are what they are and people do what they do and the best strategy is to make inferences based on what people do rather than carve something out ahead of time and shoehorn the data into it.” 
This hit home for Mailhot while briefly working on a project using data from Twitter, where “people just do the absolute most fascinating stuff with language.” Take the ironic construction X much—as in Jealous much? or Hypocritical much? “There are no grammars that will give you something out of this,” he says, “but you have to know that X much is a thing that people are doing.”
In a specific instance, Mailhot is right: as a scientist working on language, you do have to know that "X much?" is a thing that people are doing.  More generally, however, his comments are misinformed at best, and ignorant at worst.

Allow me to summarize how Mailhot's last comments come across to a linguist: "A theory that's been built up over half a century by some of the world's greatest minds that have ever considered how language works needs to be completely thrown out because I found some things it can't account for!"

This line of thinking is flawed in two ways: in the disconnect between model and system, and in how to scientifically deal with counterexamples.



First, the problem regarding models and systems.

Whether or not a model successfully predicts actual behavior is not, by itself, an indicator of how well the model reflects the actual system that it is trying to describe.

A good working model of how humans actually walk is not necessarily a good reflection of the system of walking and how it is represented in the brain. Similarly, a good working model of the system of walking and how it is represented in the brain is not necessarily a good reflection of how humans actually walk.

In the same way, a good working model of how humans actually use language is not necessarily a good reflection of the system of language and how it is represented in the brain. Similarly, a good working model of the system of language and how it is represented in the brain is not necessarily a good reflection of how humans actually use language.

Linguistic engineers approach language by looking at outwardly observable data as it actually occurs in the world, and move towards a stochastic, generalized model of usage.  This bottom-up-only, usage-based approach is especially helpful for improving the performance of computational models, like those used by search engines.  This kind of approach, in which the data *is* the theory, is thus very useful, but it does not necessarily inform us about the system.
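To make the engineering strategy concrete, here's a minimal sketch (my own toy illustration, in Python, with a made-up three-line "corpus"; nothing here comes from any actual search-engine system): count what people actually produce and turn the counts into probabilities, with no grammar carved out ahead of time.

```python
from collections import Counter

# A made-up toy corpus standing in for scraped usage data (e.g. tweets).
corpus = [
    "jealous much",
    "hypocritical much",
    "jealous of them",
]

# Count unigrams and bigrams straight from usage: the data *is* the theory.
unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def p_next(word, nxt):
    """Estimate P(nxt | word) purely from observed counts."""
    return bigrams[(word, nxt)] / unigrams[word] if unigrams[word] else 0.0

# "X much" falls out of the counts without anyone positing it in a grammar:
print(p_next("jealous", "much"))  # 0.5 in this toy corpus
```

A model like this will happily pick up "X much?" the moment people start producing it, which is exactly the virtue Mailhot is pointing at.  What it won't do, on its own, is tell you anything about the underlying system.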

Theoretical linguists approach language somewhat differently. It's not that we do not start with observable data.  Of course we do; otherwise how would we know where to start?  What's different is that we use both bottom-up and top-down approaches in our work.  We look at outwardly observable data, make (potentially huge) abstractions/inferences, and generate hypotheses about the basic principles that determine how the system works.  Theorists focus on the latter part, and use the data to support/modify the hypothesis.  Data plays a secondary role, informing the theoretical model.  Developing theoretical models is important for helping us understand why and how different languages are different/the same, and it helps us understand the basic principles of language.

Theoretical approaches, in the baby stages of theoretical study (which is where linguistics is in its scientific lifetime), have more limited applicability in computational models with direct real-world applications.  For this reason, theoretical linguistics is often not paid much attention by search engine engineers (perhaps rightfully!).  They prefer more directly useful computational models built on probabilistic language patterns (but computational models that take grammatical approaches do exist, and are very successful in their own way).

In order to form a good model of a system, we need more than just data and a collection of generalizations.  We need to understand the basic principles of the system, which we come to through creating and refining hypotheses with actual language data.



Second, the problem of how to scientifically approach counterexamples.

Let us start with the specific (purported) counterexamples at hand.  Mailhot seems to confidently believe that a grammatical model, Chomskyan or otherwise, *cannot* predict "X much?" or other such linguistic innovations.  But how did he come to this conclusion?  And what do we do if it is truly something that is unpredictable in any grammatical model?

It is possibly true that no one has attempted to use a grammatical approach to predict this behavior in published work, but that is very different from all grammatical accounts predicting it to be impossible.  I would put large amounts of money on the idea that a grammatical approach could account for this, without any substantial changes to the framework.  (In fact, through the magic of Facebook, I have seen theoretical linguists offer good first hypotheses of how it is already predicted to be possible in a Chomskyan grammatical approach.)

But let us suppose (though I believe this is not the case) that the model as it stands truly cannot account for "X much?" et alia.  In other words, let us suppose that Mailhot has found a *true* counterexample to all current "grammatical" approaches.  What conclusion can we draw?

Does this mean that the grammatical approach is just wrong?  Does it mean that the models we are using are truly and basically flawed?  Should we just throw it all away, because we have some counterexamples?

That kind of extreme reaction is not how theoretical science deals with problems.  A well-founded theory that makes lots of attested predictions is not dismissed every time a problem is found.  Instead, we make incremental changes to the theory: we revise it and increase its complexity so that the discovered problem is no longer a problem.  When those changes are on the right track, further observation shows that they have corrected other problems as well -- an indicator that we've got a successful theory.  If further observations reveal new (worse) problems, we change the theory in a different way.  Afterwards, the theory's empirical predictions better match the actual data, and we have a better understanding of the system, through a model.

This is called the scientific method.  It is literally how science is defined.

Eventually, we may make so many changes that the resulting model only vaguely resembles the one we started with. This is how science gets rid of models: making incremental changes until those changes have eventually created an entirely new model.  Starting over because of some empirical problems with a model that otherwise gets a lot right is unjustifiably rash.



I have often pondered why people feel so strongly that theoretical linguists must be wrong about how language works.

This is something I frequently encounter from people who don't actually study language, but have taken a linguistics course, or have taken a course in some other department in which modern, mainstream linguistic approaches are (briefly) discussed.

My guess is that people see a simplified version of the theory, and rightfully find problems with it.  Wrongfully, however, they assume that the simplified version of the theory is the theory, and that there is no other way the same ideas can be manifested.

So, I would suppose that Mailhot was exposed to an elementary/simplified grammatical approach (like the ones taught in introductory classes), and rightfully noticed that it has no actual way of dealing with things like "X much?".  The mistake was in then inferring that all grammatical approaches will similarly have no way of dealing with "X much?".

But consider the fact that the model of genetics that you learned in high school (Mendelian inheritance) can't explain even the most basic facts about how genetics actually works.  In fact, even pea color is not determined as simply as Mendel's model predicts.  Would you say that the model is totally broken?  No.  He had to start with the basic observations -- he made a theory and model of how inheritance works -- and he probably knew that it couldn't account for every empirical fact about inheritance.

He put his theory out there, though, because it was (in its message) basically right, and so that he or others could refine the model after making more observations.  Over time, our best model of inheritance has become more complicated -- so complicated that you can't teach it in an introductory class, and you've gotta start somewhere, so you start with the basic-but-flawed idea.

The same is true for linguistics.  You start with a theory whose general ideas (constituency, hierarchy, basic principles) are correct, while recognizing that it is incomplete, and while working to make it more complete.  That's how science works.

Tuesday, May 28, 2013

Beating Them There

For a good number of years now, I've lived in LA. The land of never-ending traffic. As a walker/biker, I often find myself walking/biking faster than the cars. Which is actually not that hard, but it's always made me very happy. (Except for the times that I'm one of the suckers in a car.)

Anyway, I remember one of the first times this happened to me, when I first moved to LA. I came home and told my boyfriend at the time about all the cars in traffic and how I walked from one place to another faster than them. I proudly said:

  1. I beat them there.

I remember that this is what I said because he laughed and asked how I could be so violent. And the syntactic ambiguity alarm went off and I filed this away for later laughs/analysis.

The analysis is actually not so complicated. And it will end up telling us cool things about the syntax-prosody interface.


In English, the PP "there" can replace a large variety of different kinds of PPs: those headed by "in"/"at"/"on"/"into"/"onto"/etc. (Though it can't replace all of them: e.g. those headed by "from".) Notably, "there" can replace PP Goals and PP Locations (Wikipedia does a decent job describing "Goal" and "Location"), leading to ambiguity. The "Goal" reading of (1) is "I beat them to that location" and the "Location" reading is "I beat them at that location".

QED. Nothing more to say. Right? Of course not.

Goal PPs are demonstrably lower in syntactic structure than Location PPs. First of all, if you have both kinds of PPs in a single clause, typical word order involves the Goal PP being closer to the verb. So in (2) and (3), "to campus" acts as a Goal PP, and "on a bike" acts as a Location PP:

  2. I beat them to campus on a bike.
  3. ?? I beat them on a bike to campus.

Secondly, constituency tests show that "beat them to campus" is a constituent to the exclusion of "in a car", (4), and "beat them" cannot be replaced by "do so" when there is a Goal PP, as in (5).

  4. I [beat them to campus] on a bike, and John did so in a car.
  5. * I [beat them] to campus on a bike, and John did so to the store in a car.

This is evidence that the Goal PP forms a smaller constituent with "beat them", to the exclusion of the Location PP "in a car". Thus the structure for a sentence like (2) is like (6):

  6. [ [verb object goal] location ]

Something like this is argued for in Larson's 1988 paper, "On the double object construction", in Linguistic Inquiry 19. In fact, more articulated analyses argue, on the basis of more complex evidence, that the structure is more like (7), with the object asymmetrically c-commanding the goal:

  7. [ [verb [ object goal ] ] location ]

So, back to (1). With the Goal reading, "there" is in a very low structural position (lower than the verb), and with the Location reading, it is rather high in the structure (higher than the verb).
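To see the structural difference at a glance, here's a toy encoding of the two parses as nested lists (my own illustration, in Python; the bracketings follow (7), not any code from the work cited above), with a helper that reports each word's depth of embedding:

```python
# VP of "I beat them there" under the two readings, bracketed as in (7).
goal_vp = ["beat", ["them", "there"]]      # Goal: [ beat [ them there ] ]
location_vp = [["beat", "them"], "there"]  # Location: [ [ beat them ] there ]

def depths(tree, d=0):
    """Map each word in a nested-list tree to its depth of embedding."""
    if isinstance(tree, str):
        return {tree: d}
    out = {}
    for child in tree:
        out.update(depths(child, d + 1))
    return out

print(depths(goal_vp))      # {'beat': 1, 'them': 2, 'there': 2}
print(depths(location_vp))  # {'beat': 2, 'them': 2, 'there': 1}
```

Goal "there" sits at the bottom of the tree; Location "there" attaches above the verb, matching the low/high description just given. Keep this in mind -- it will matter for the prosody below.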

Syntactic analysis of the ambiguity of (1) is now done.

QED. Nothing more to say. Right? Of course not.

The reason I've been thinking about (1) recently is not only that I still frequently encounter this situation, but also that I realized the sentence can actually be disambiguated with prosody. And moreover, with the appropriate theory of how prosody is influenced by syntax, we have further support for (7).

First, let's establish that there is a phenomenon called "default sentential stress". When all the information in a sentence is new (the speaker assumes the listener(s) don't have any background knowledge, and is not trying to create any implicatures), one word in the sentence will receive extra prominence: sentential stress. I've marked this stress with an acute accent (´):

  8. A naked man sang me several sóngs.

Note that (9) is a fine sentence, but the speaker assumes knowledge on the listener's part (e.g. that a naked man sang someone several songs) or is trying to create an implicature (e.g. that the speaker is the last person you'd expect to have a naked man sing several songs):

  9. A naked man sang mé several songs.

A very basic generalization that works rather well is that the rightmost word in an English sentence bears sentential stress -- this idea goes back at least to Chomsky and Halle's 1968 The Sound Pattern of English. If you want to know more about phrasal stress, see Zubizarreta and Vergnaud's 2006 chapter in The Blackwell Companion to Syntax.
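The SPE-style generalization is easy enough to state as code; here's a deliberately naive sketch (my own illustration, not anything from SPE itself):

```python
def spe_stress(sentence):
    """Naive SPE-style rule: the rightmost word bears sentential stress."""
    return sentence.split()[-1]

print(spe_stress("A naked man sang me several songs"))  # 'songs' -- correct
print(spe_stress("I beat them there"))  # 'there' -- correct on one reading only
```

The second call is where the trouble starts, as we'll see next.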

Now notice that the location of the sentential stress is different for the Goal and Location readings of (1). Namely, the Goal reading structure is pronounced as (10), and the Location reading structure is pronounced as (11).

  10. I beat them thére. (Goal)
  11. I béat them there. (Location)

How do we derive the distinction? If sentential stress is determined by word order, this is totally mysterious. In fact, the difference between (10) and (11) is structural, as we saw in (7). Actually, despite what I said earlier about "rightmost", many researchers would find the stress location of (11) to be expected -- because there is a generalization going back to Bresnan 1972 that pronominal/anaphoric phrases (such as the pronoun "there") typically avoid sentential stress. That is, maybe a better rule is "assign sentential stress to the rightmost non-pronoun word". However, researchers who employ such a rule would be hard pressed to explain why the pronoun "there" does in fact bear sentential stress in (10).

Sentential stress is more complex than word-order generalizations can predict.

Notice, however, that the generalization seems to be correct as far as the pronoun "them" is concerned. If we change "them" to a non-pronoun, this non-pronoun will bear the sentential stress in the Location reading. However, we still get a pattern where "there" bears sentential stress in the Goal reading but not in the Location reading:

  12. I beat the drivers thére. (Goal)
  13. I beat the drívers there. (Location)

So the generalization remains: Location "there" is prosodically different from Goal "there". This is actually predictable, if we assume that prosody is affected not by word order but by syntactic structure. Namely, theories such as those presented in Cinque 1993 (a Linguistic Inquiry article) and Zubizarreta 1998 (a Linguistic Inquiry monograph) say that the most deeply embedded constituent bears phrasal stress.

This is exactly what our theory in (7) predicts. In the Goal reading, "there" is the most deeply embedded word -- allowing it to bear sentential stress -- and in the Location reading "there" is too high to bear sentential stress.
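Here's the same idea as a toy rule (my own sketch, not Cinque's or Zubizarreta's actual formalization), reusing the nested-list parses from above: assign sentential stress to the most deeply embedded word, taking the rightmost one on ties.

```python
def deep_stress(tree):
    """Toy depth-based rule: the most deeply embedded word bears
    sentential stress (rightmost wins on ties)."""
    def walk(t, d=0):
        if isinstance(t, str):
            return [(t, d)]
        pairs = []
        for child in t:
            pairs.extend(walk(child, d + 1))
        return pairs
    pairs = walk(tree)
    deepest = max(d for _, d in pairs)
    return [w for w, d in pairs if d == deepest][-1]

print(deep_stress(["beat", ["them", "there"]]))  # 'there' -- Goal reading
print(deep_stress([["beat", "them"], "there"]))  # 'them'  -- Location reading
```

For the Location reading, the toy rule lands on the object: when that object is the pronoun "them", the independent ban on stressing pronouns pushes stress to "béat", as in (11), and when it is a full noun phrase it surfaces with stress, as in (13). (More on that in the parenthetical below.)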

(In the Location reading "them" would seem to be the most deeply embedded. Maybe the generalization on pronouns doesn't hold for "there" but it does hold for "them". Or, if you want to know about how the generalization about pronouns works out in this framework, I recommend Wagner's 2006 paper in the proceedings of SALT XVI.)


So. The syntax-phonology interface (namely the sentential stress assignment mechanism) cares about depth of embedding, and not word order. And this depth of embedding difference between Goal and Location PPs is corroborated by both constituency tests and prosodic evidence.

QED.

Wednesday, May 22, 2013

(Dis)allowing Infinitival Clause Complements

The other day, I misheard a friend, thinking she said (1):
  1. * We'll disagree to do it.
I remember this because (1) is ungrammatical and I was immediately struck by its ungrammaticality. In actuality, she had said (2):
  2.   We'll just agree to do it.
Setting aside the fact that (1) and (2) would seem to mean entirely different things, why should (1) be ungrammatical? It doesn't seem to be just something about meaning, since (4) and (5), which are both grammatical, can be interpreted in ways that are practically synonymous with (3):
  3. The FDA (*dis)approved the new drug to potentially treat that disease.
  4. The FDA (dis)approved of the new drug potentially treating that disease.
  5. The FDA did (not) approve the new drug to potentially treat that disease.
So then the big question is, what kind of structure is available for (2) that isn't available for (1)?

The structure in (2) is actually a sub-type of one of the most well-studied phenomena in generative syntax: raising and (obligatory) control. Let's take a moment to briefly discuss them.

A predicate (which can be a verb, adjective, preposition, noun, ...) can take a non-finite clausal complement:
  6. They were happy to grab some drinks.
  7. It is about to fall.
  8. Jack didn't persuade the intern to stay.
  9. Liz might want it to rain.
In each of the cases in (6)-(9), one word ("happy", "about", "persuade", "want") introduces the nonfinite clause as its complement. There are important differences between these, and those differences have been given names: (6) is a Subject Control predicate, (7) is a Raising-to-Subject predicate, (8) is an Object Control predicate, and (9) is a Raising-to-Object (or ECM) predicate. The exact nature of these differences could span books, so let's not focus on them here. All that matters is that there are four types of these predicates.
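Purely as bookkeeping (my own toy encoding, not a standard resource), the four-way taxonomy can be recorded as a lookup table keyed by the complement-taking word in each of (6)-(9):

```python
# Which type of nonfinite-complement predicate each of (6)-(9) exemplifies.
PREDICATE_TYPE = {
    "happy":    "Subject Control",      # (6)
    "about":    "Raising-to-Subject",   # (7)
    "persuade": "Object Control",       # (8)
    "want":     "Raising-to-Object",    # (9), also called ECM
}

print(PREDICATE_TYPE["persuade"])  # Object Control
```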

Back to the original data at hand. We've seen the "dis-" prefix interfere with letting the verb "agree" take its nonfinite clause complement. That is, "agree" can function as a Subject Control predicate, but "disagree" cannot. We've answered our first question.

Now another question arises: do all types of raising/control pattern this way with the "dis-" prefix? The answer I have come to (after looking through the dictionary for words starting with "dis-" and then consulting with myself and other speakers to determine grammaticality) is a clear and resounding "yes".  (10) and (12) are clearly Subject Control, (11) is ambiguous between Subject Control and Raising-to-Subject, (13)-(14) are clearly Raising-to-Subject, (15)-(16) are clearly Object Control, and (17)-(22) are clearly Raising-to-Object:
  10.   We would not agree to go.
    * We would disagree to go.
  11.   We would not like to go.
    * We would dislike to go.
  12.   We would not qualify to go.
    * We would disqualify to go.
  13.   We would not continue to go.
    * We would discontinue to go.
  14.   We would not prove to go.
    * We would disprove to go.
  15.   We would not encourage John to be in control.
    * We would discourage John to be in control.
  16.   We would not empower John to be in control.
    * We would disempower John to be in control.
  17.   We would not trust John to be in control.
    * We would distrust John to be in control.
  18.   We would not approve John to be in control.
    * We would disapprove John to be in control.
  19.   We would not allow John to be in control.
    * We would disallow John to be in control.
  20.   We would not believe John to be in control.
    * We would disbelieve John to be in control.
  21.   We would not like John to be in control.
    * We would dislike John to be in control.
  22.   We would not prove John to be in control.
    * We would disprove John to be in control.
Generalization: adding the morpheme "dis-" to a verb removes its ability to perform as a raising/control verb and license a non-finite clause complement. This is entirely surprising.

If you believe morphemes like "dis-" are added onto verbs outside of the syntax (e.g. in the lexicon, or some other morphological component), with no syntactic effect, then this is not only mysterious but underivable as a generalization. In other words, unless morphology interacts with syntax on a fundamental level (e.g. by having the ability to license certain types of complements), this would have to be an accident. It doesn't seem like an accident.
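One way to picture that fundamental interaction (a hypothetical sketch of my own, not a worked-out theory) is to let each verb carry a set of subcategorization frames, with "dis-" prefixation stripping the infinitival-complement frame:

```python
# Hypothetical subcategorization frames for a few verbs from (10)-(22).
# "inf" = licenses a nonfinite clause complement; "obj" = licenses an object.
LEXICON = {
    "agree": {"inf"},
    "like":  {"inf", "obj"},
    "trust": {"inf", "obj"},
    "prove": {"inf", "obj"},
}

def prefix_dis(verb):
    """Toy morphology-in-syntax rule: 'dis-' removes the verb's ability
    to license a nonfinite clause complement, per the generalization above."""
    return "dis" + verb, LEXICON.get(verb, set()) - {"inf"}

print(prefix_dis("agree"))  # ('disagree', set()) -- hence "*disagree to go"
print(prefix_dis("trust"))  # ('distrust', {'obj'}) -- "distrust John" still fine
```

Whether that stripping happens in the lexicon or in the syntax proper is exactly the question at issue.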

If it's not an accident, from what does this pattern derive? It's hard for me to say at the moment, but it would be worth investigating what other syntactic properties a morpheme like "dis-" has. The first property that came to my attention is the following: the verb "disqualify" is obligatorily transitive (in the active voice) whereas "qualify" is obligatorily intransitive:
  23. The judge disqualified *(the athlete).
  24. The judge qualified (*the athlete).
Note that "(dis)qualify" was one of the verbs we saw before in (12), but we didn't consider what would happen if there were a potential object:
  25. * We would not qualify John to go.
    * We would disqualify John to go.
Only "qualify" can act as a raising/control predicate, and only when it doesn't have an object. The other possibilities ("qualify" as a raising/control predicate with an object, and "disqualify" as a raising/control predicate with/without an object) are not possible.

Thus it might seem that the syntactic property of licensing objects, which "dis-" interacts with in (23)-(24), is also related to the syntactic property of being a raising/control predicate, as in (12) and (25).


That's as much thinking as I've done on the topic, so I leave this to you, faithful reader, to ponder further.