python - Best database management system for nested sorting? -
i've got bunch of events need organize first location, time, organization of remaining attributes (duration, cost, description). problem there millions of events, when querying, need retrieve small section , should come out ordered, preferably third index (cost or duration).
eventually, application using database needs data ultra-fast , doing thousands of queries. unfortunately, we're bound traditional hard drive, data needs stored in order. won't updated (a few hundred writes per day, compared millions of reads per day).
we've tried mysql, indicies, takes 200ms locate portion of data need, because our hard drive has ton of seeks, if knows data is.
we've looked @ nosql solutions key-value stores (redis, couchdb), redis doesn't nesting , couchdb doesn't allow 'ordered sets' since stores in json.
what solutions there store based on 2 (or more) indicies? bonus points if has nice interface python!
without more exact description of problem can't much, i've solved problem using kd-trees, binary trees in k dimensions. allow really fast k-nearest neighbor searches (in case can query corpus of ~10 million documents latitude, longitude, , time in <1 ms.) real downside writing them annoying - keep performance have rebalance tree pretty frequently. if want give try, check out scipy.spatial.ckdtree module. assuming have scipy installed you'll , running in 10 minutes.
if you're looking more of off shelf database solution, consider postgis; let create spatial index on 2-4 dimensions. more reliable , (and more write-friendly) roll-your-own kd tree approach, @ expense of little bit of performance.
edit: i'm assuming here "location" mean geolocation (latitude, longitude). if it's discrete location "california" answer not helpful.
Comments
Post a Comment