Selected coding projects
A standard and a set of Python libraries for distributing
large textual collections with fast random access, using
the Apache Arrow format.
Fast, animated, interactive online maps that scale easily to
billions, not millions, of points using WebGL and Apache Arrow.
Stable Random Projection
General-purpose, lightweight dimensionality reduction for
book- or article-length texts. A trick involving cryptographic hashes
makes it possible to use the same space for any language without a
pre-trained model or dictionary.
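
The hashing idea can be illustrated with a minimal sketch. This is not the project's actual code: the function names `hash_signs` and `project`, the choice of SHA-1, the 160-dimension default, and the log weighting of counts are all assumptions made for illustration. Each token's hash bits are read as a fixed row of +1/-1 values, so any string in any language gets a reproducible projection row without a stored dictionary.

```python
import hashlib
import math
from collections import Counter


def hash_signs(token: str, dims: int = 160) -> list[int]:
    """Derive a fixed row of +1/-1 values for a token from SHA-1 bits.

    Hypothetical helper: one SHA-1 digest yields 160 bits; a counter
    prefix is hashed to produce more bits if dims exceeds 160.
    """
    signs: list[int] = []
    counter = 0
    while len(signs) < dims:
        digest = hashlib.sha1(f"{counter}:{token}".encode("utf-8")).digest()
        for byte in digest:
            for bit in range(8):
                signs.append(1 if (byte >> bit) & 1 else -1)
        counter += 1
    return signs[:dims]


def project(tokens: list[str], dims: int = 160) -> list[float]:
    """Project a bag of tokens into a dims-dimensional vector.

    Each token contributes its sign row, weighted by log(1 + count)
    (the weighting is an assumption of this sketch).
    """
    vec = [0.0] * dims
    for token, count in Counter(tokens).items():
        weight = math.log1p(count)
        for i, sign in enumerate(hash_signs(token, dims)):
            vec[i] += weight * sign
    return vec


english = project("the cat sat on the mat".split())
japanese = project(["猫", "が", "マット", "の", "上", "に", "座っ", "た"])
print(len(english), len(japanese))
```

Because the rows depend only on the token's bytes, two people projecting the same text independently land on the same vector, and texts in different scripts share one output space.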
An R package for training and exploring word2vec models with a fluent
vocabulary that takes advantage of R's ability to add, subtract, and perform
other vector-space operations.
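
To show what vector arithmetic on word embeddings looks like, here is a small sketch in Python rather than R. It does not use the package's API: the three-dimensional vectors are hand-made toy values, not trained embeddings, and the helpers `cosine` and `nearest` are hypothetical names introduced for this example.

```python
import math

# Toy, hand-made vectors (assumption: not output from a trained model).
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(target: list[float], exclude: set[str]) -> str:
    """Return the word whose vector is most similar to target."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(candidates[w], target))


# king - man + woman, computed element by element
target = [
    k - m + w
    for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])
]
result = nearest(target, exclude={"king", "man", "woman"})
print(result)
```

With these toy values the nearest remaining word is `queen`; on a real model the result depends on the training corpus.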
Pandoc Svelte Components
An implementation of pandoc's
rich document model as Svelte components to allow the creation
of rich interactive documents from Markdown files.
Tools for tokenizing and visually exploring large textual collections
backed by an extremely fast MySQL architecture and served over the web
through an expressive API.
Document transformation scripts for writing talks and course lectures that
simultaneously generate their own slide decks and outlines with identifying terms,
to keep everything aligned.