COCA flex (variable length) queries

 

LIST display: flex (variable length) queries

You can now run searches that have a variable number of "slots". For example, the search:

PUT (NOUN){3} away

would find strings with PUT at the beginning and away at the end, with zero to three words in between. If there are any words in between, at least one of them has to be a NOUN. In other words, it would run the following seven searches, one right after another, and would then display the results for all of the searches on one page.

     Searches (done one right after another)    Matching strings

1    PUT away                                   put away (no words in between)
2    PUT NOUN away                              put toys away
3    PUT * NOUN away                            put the toys away
4    PUT NOUN * away                            put toys far away
5    PUT * * NOUN away                          put the fun toys away
6    PUT * NOUN * away                          put the toys far away
7    PUT NOUN * * away                          put toys and crayons away
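The expansion in the table above can be sketched in a few lines of Python. This is only an illustration of the pattern shown in the table, not the corpus's actual implementation; the function name `expand_flex` and its parameters are hypothetical.

```python
def expand_flex(before, tag, n, after):
    """Expand `before (tag){n} after` into explicit fixed-length searches.

    before, after: lists of obligatory slots (e.g. ["PUT"], ["away"])
    tag:           the element inside the parentheses (e.g. "NOUN")
    n:             the number in {n}, i.e. the maximum gap length
    """
    searches = [before + after]  # zero words in the gap
    for length in range(1, n + 1):
        # Place the required tag in each position of the gap,
        # rightmost position first, to mirror the order of the table.
        for pos in range(length - 1, -1, -1):
            gap = ["*"] * length
            gap[pos] = tag
            searches.append(before + gap + after)
    return [" ".join(s) for s in searches]


for i, search in enumerate(expand_flex(["PUT"], "NOUN", 3, ["away"]), start=1):
    print(i, search)
```

Run as written, this prints the seven searches in the same order as the table, from PUT away through PUT NOUN * * away.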

In terms of search syntax, note that:

1. {n} indicates the maximum number of words (0 to n) that can be in this "variable length" string. Valid numbers are 1, 2, or 3 (in other words, the longest variable length string is three words).

2. If you don't indicate {n} -- for example, (NOUN) -- then the slot is at most one word, meaning that it will match either that one word or nothing.

3. Any "slot" without parentheses around it is obligatory. For example, put * away would not match put away, since * doesn't have parentheses around it.

4. You can't include multiple "flex" operators in a search. For example, they (VERB+){2} notice (NOUN){3} would not be possible.
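The four syntax rules above can be checked mechanically. The following is a minimal sketch, assuming a flex slot always has the form `(X)` or `(X){n}`; the function name `parse_flex` and the error messages are made up for illustration and are not part of the corpus interface.

```python
import re

# A flex slot: something in parentheses, optionally followed by {n}.
FLEX = re.compile(r"\(([^()]+)\)(?:\{(\d+)\})?")


def parse_flex(query):
    """Return (element, n) for the single flex slot in `query`, or None if there is none."""
    slots = FLEX.findall(query)
    if not slots:
        return None  # rule 3: every slot is obligatory
    if len(slots) > 1:
        raise ValueError("only one flex operator is allowed per search")  # rule 4
    element, n = slots[0]
    n = int(n) if n else 1  # rule 2: no {n} means at most one word
    if not 1 <= n <= 3:
        raise ValueError("{n} must be 1, 2, or 3")  # rule 1
    return element, n


print(parse_flex("PUT (NOUN){3} away"))
print(parse_flex("might (*) know"))
print(parse_flex("put * away"))
```

A query with two flex slots, such as they (VERB+){2} notice (NOUN){3}, raises `ValueError` in this sketch.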

The following are some additional searches. They produce interesting results in the one billion word COCA corpus, but the results in other corpora may not be as good. In each case, we show a few sample matching strings, and some strings that would not be generated by the search (and why not).

Sample search: might (*) know
  What WOULD be matched:
    might know
    might never know
  What would NOT be matched:
    might never really know (without {}, matches at most one word)

Sample search: was (really) interesting
  What WOULD be matched:
    was interesting (really is optional)
    was really interesting
  What would NOT be matched:
    was very interesting (not really)
    was not really interesting (too many words)

Sample search: BE (NEG) worried
  What WOULD be matched:
    is worried (NEG is optional)
    are n't worried
  What would NOT be matched:
    is really worried (not NEG)
    is n't so worried (two words, search is max of 1)

Sample search: made (*){3} money
  What WOULD be matched:
    made more money ({3} means 0-3 words)
    made a lot more money (max of 3 words)
  What would NOT be matched:
    made quite a bit of money (4 words; max of 3)

Sample search: take * (NOUN){2} away
  What WOULD be matched:
    take it away (it from *, which is not optional; no other words from {2}, since 0-2 words)
    take the money away (the from *, money (one slot) from {2})
    take even more money away (even from *, more money (two slots) from {2})
  What would NOT be matched:
    take away (* forces at least one word)
    take it quickly away (no NOUN)
    take even more easy money away (more easy money = 3 words)

Sample search: (VERB+){3} NOTICE_v
  What WOULD be matched:
    was noticing
    had never even noticed (VERB+ matches any verb, including do, be, have; VERB is only lexical verbs)
  What would NOT be matched:
    sometimes notice (no VERB+)
    had never even ever noticed (4 words; max of 3)
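For the wildcard-only examples above, such as might (*) know and made (*){3} money, the matching behavior can be imitated with an ordinary regular expression. This is a rough sketch under stated assumptions: it handles only literal word forms, the obligatory `*`, and the optional `(*)` / `(*){n}` slot, and it ignores lemmas, part-of-speech tags such as NOUN or VERB+, and case. The function name `flex_matches` is hypothetical.

```python
import re


def flex_matches(query, text):
    """Check whether `text` (space-separated words) matches a wildcard-only flex query."""
    parts = []
    for token in query.split():
        m = re.fullmatch(r"\(\*\)(?:\{([123])\})?", token)
        if m:
            n = int(m.group(1) or 1)              # no {n} means at most one word
            parts.append(r"(?:\S+ ){0,%d}" % n)   # zero to n optional words
        elif token == "*":
            parts.append(r"\S+ ")                 # exactly one obligatory word
        else:
            parts.append(re.escape(token) + " ")  # literal word form
    # Every word in the pattern consumes a trailing space, so add one to the text.
    return re.fullmatch("".join(parts), text + " ") is not None


print(flex_matches("might (*) know", "might never know"))
print(flex_matches("might (*) know", "might never really know"))
print(flex_matches("made (*){3} money", "made quite a bit of money"))
```

The first call returns True; the other two return False because the gap is longer than the slot allows.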

 Some additional notes:

1. Because a "flex search" can involve up to seven different searches (see above), there are some limits on the number of flex searches in a given 24-hour period. For those who do not have a premium or academic license, there is a limit of five flex searches in 24 hours. Those who do have a license can do up to 50 flex searches in a 24-hour period.

2. Again, because of the number of searches that are done in a flex search, it would take a long time to do these searches if all of the "slots" are high frequency. This can be a real limitation in very large corpora like NOW (19+ billion words) or iWeb (14 billion words). So a search like HAVE (ADJ){3} time probably won't work in those corpora -- HAVE and time are both too frequent. In a case like this, you will probably need to do these as a series of separate searches -- HAVE time, HAVE ADJ time, HAVE * ADJ time, etc. But again, this should not be a problem with a small corpus like the BNC.
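The figure of "up to seven" searches follows from the expansion table for {3}. Assuming the same pattern holds for smaller values of n (the page itself only confirms the case n = 3), a gap of k words has k possible positions for the required element, plus one search with an empty gap:

```latex
N(n) = 1 + \sum_{k=1}^{n} k = 1 + \frac{n(n+1)}{2},
\qquad N(1) = 2, \quad N(2) = 4, \quad N(3) = 7 .
```

Under that assumption, splitting HAVE (ADJ){3} time into separate searches means running up to seven fixed-length searches by hand.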

 

posted @ 2024-10-12 17:01  hrdom