Given a string S supposedly consisting only of two different symbols,

count the number of runs(a sequence of the same character), and the number of occurrences of the two symbols.

For example, the string "aaaabbbbbbabababbbbbbbbbbbbbbbbbb" may be rewritten

as "aaaa|bbbbbb|a|b|a|b|a|bbbbbbbbbbbbbbbbbb". has 8 runs and 7 'a's and 26 'b's.

def countruns(S): # returns number of runs, and (m,n) in string S m =n = 0 runs = 1 a = S[0] b = None pc = a # previous character. for s in S: if s == a: m += 1 if pc != a: runs += 1 pc = a elif b is None: b = s n += 1 pc = s runs += 1 elif s == b: n += 1 if pc != b: runs +=1 pc = b else: raise ValueError, "in countruns():more than two diff. symbols." if m > n: m,n = n,m return runs, (m, n) S = "aaaabbbbbbabababbbbbbbbbbbbbbbbbb" print countruns(S)

When the above program runs it outputs 8,(7,26).

Suppose you have very large strings(length >1000000)?

The specifications are hazy. But one of them is correctness.

For an application, visit The Non-parametric Runs Test.

Since this is a Python problem, using a built-in module is valid. The code below has an added benefit of not limiting the number of symbols to two. The for loop can be removed (vectorized) by using two groupby() calls instead of one. BTW, indentation doesn't work. :(

ReplyDeleteimport itertools

def countruns(S):

# returns number of runs, and (m,n) in string S

runs = 0

counts = {}

for k, v in itertools.groupby(S):

runs += 1

counts[k] = counts.get(k, 0) + sum(1 for i in v)

return runs, counts.values()

S = "aaaabbbbbbabababbbbbbbbbbbbbbbbbb"

print countruns(S)

Thanks to your elegant and shorter Python solutio, Ianalis. Itertools is one module which one needs to be more familiar with as it has quite powerful functions.

ReplyDelete