June 2009 Archives

Running Apple Mac OSX, and your system.log is filling up with

mdworker[473]: Unable to use font: no glyphs present.

/System/Library/Frameworks/ApplicationServices.framework /Frameworks/ATS.framework/Support/ATSServer[474]: Serious problems were found in font data while activating it.

/System/Library/Frameworks/ApplicationServices.framework /Frameworks/ATS.framework/Support/ATSServer[474]: You may encounter drawing or printing problems.

Well, it could be Spotlight trying to index a bad PDF file. To find the offending file with use lsof and the process id of the mdworker process

  lsof -p 473

In my case it was a PDF from the hadoop 20.0 release

Cascading and Coroutines

|0 comments |

Cascading looks quite interesting. Here is a python program that does something similar to the Technical Overview seen main in the python program.

    #!/usr/bin/env python
    # encoding: utf-8
    import sys

    def input(theFile, pipe):
        """
        pushes a file a line at a time to a coroutine pipe
        """
        for line in theFile:
            pipe.send(line)
        pipe.close()

    @coroutine
    def extract(expression, pipe, group = 0):
        """
        extract the group from a regex
        """
        import re
        r = re.compile(expression)
        while True:
            line = (yield)
            match = r.search(line)
            if match:
                pipe.send(match.group(0))

    @coroutine
    def sort(pipe):
        """
        sort the input on a pipe
        """
        import heapq
        heap = []
        try:
            while True:
                line = (yield)
                heapq.heappush(heap, line)
        except GeneratorExit:
            while heap:
                pipe.send(heapq.heappop(heap))

    @coroutine
    def group(groupPipe, pipe):
        """
        sends consectutive matching lines from pipe to groupPipe
        """
        cur = None
        g = None
        while True:
            line = (yield)
            if cur is None:
                g = groupPipe(pipe)
            elif cur != line:
                g.close()
                g = groupPipe(pipe)

            g.send(line)
            cur = line

    @coroutine
    def uniq(pipe):
        """
        implements uniq -c
        """
        lines = 0
        try:
            while True:
                line = (yield)
                lines += 1
        except GeneratorExit:
            pipe.send('%s\t%s' % (lines, line))

    @coroutine
    def output(theFile):
        while True:
            line = (yield)
            theFile.write(line + '\n')

    def main():
        input(sys.stdin,
            extract( r'^([^ ]+)',
                sort(
                    group( uniq,
                        output(sys.stdout)
                    )
                )
            )
        )

    if __name__ == '__main__':
        main()

You can achieve the same results with the unix command line:

cat  access.log | cut -d ' ' -f 1 | sort | uniq -c

Reading http://www.dabeaz.com/coroutines/ and thought this was a natural for a twitter client. Here is a pretty simple version that just prints the public timeline every 60 seconds. Next, up removing the time.sleep and scheduling the followStatus function as a task so I can follow more than one stream at a time.

    #!/usr/bin/env python
    # encoding: utf-8
    import time
    import twitter

    def coroutine(func):
        """
        A decorator function that takes care of starting a coroutine
        automatically on call.

        see: http://www.dabeaz.com/coroutines/
        """
        def start(*args,**kwargs):
            cr = func(*args,**kwargs)
            cr.next()
            return cr
        return start

    @coroutine
    def statusPrinter():
        """
        Just prints twitter status messages to the screen
        """
        while True:
             status = (yield)
             print status.id, status.user.name, status.text

    def followStatus(twitterGetter, target, timeout = 60):
        """
        Follows a twitter status message that takes a since_id
        """
        since_id = None
        while True:
            statuses = twitterGetter(since_id=since_id)
            if statuses:
                # pretty sure these are always in order
                since_id = statuses[0]
                for status in statuses:
                    target.send(status)
            # twitter caches for 60 seconds anyway
            time.sleep(timeout)

    def main():
        api = twitter.Api()
        followStatus(api.GetPublicTimeline, statusPrinter())

    if __name__ == '__main__':
        main()
 

About this Archive

This page is an archive of entries from June 2009 listed from newest to oldest.

July 2009 is the next archive.

Find recent content on the main index or look in the archives to find all content.