Of course, the complete bigram table won’t fit into memory. If we keep only bigrams that
appear 100,000 or more times, that works out to a little over 250,000 entries, which does
fit. We can then estimate P(down | sit) as Count(sit down)/Count(sit). If a bigram does not
appear in the table, we fall back on the unigram probability of the word. We can define cPw,
the conditional probability of a word given the previous word, as: