Abstract: 
We consider Conditional random fields (CRFs) with patternbased potentials defined on a chain. In this model the energy of a string (labeling) (Formula presented.) is the sum of terms over intervals [i, j] where each term is nonzero only if the substring (Formula presented.) equals a prespecified pattern w. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.) where L is the combined length of input patterns, (Formula presented.) is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of Ye et al. (NIPS, 2009) whose complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.), where (Formula presented.) is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with nonpositive weights.
