[Discuss] No-SQL Database Recommendation?

John Hall johnhall2.0 at gmail.com
Sun Jan 11 21:12:37 EST 2015


Kent,

I'm not convinced you need a database. Writing your own simple persistence
layer can be really easy and reliable with Python.
If your critical queries have predictable parameters you could cache the
results periodically, or keep those results updated as data is recorded,
and record computations in a separate data series. You could use shelve to
persist a Python dictionary to disk for stored query results, and csv to
record the data itself. Perhaps write your data to a file series with the
datetime encoded in file and directory names (a rough sketch follows the
links below). Shelve might not work so well when you need to support
multiple processes; Redis is a good way to create a shared memory space
between several processes.
See:
https://docs.python.org/3.4/library/shelve.html
https://docs.python.org/3.4/library/csv.html
http://try.redis.io
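
Something like the following, just as a rough sketch; the directory layout,
file names, and cache keys here are made-up examples:

import csv
import shelve
from datetime import datetime, timezone
from pathlib import Path

DATA_DIR = Path("data")        # hypothetical layout: data/YYYY/MM/DD.csv
CACHE_FILE = "query_cache"     # shelve adds its own file extension(s)

def record_reading(sensor, value, when=None):
    """Append one reading to a CSV file named after the reading's date."""
    when = when or datetime.now(timezone.utc)
    path = DATA_DIR / "{0:%Y}/{0:%m}/{0:%d}.csv".format(when)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", newline="") as f:
        csv.writer(f).writerow([when.isoformat(), sensor, value])

def cache_result(key, value):
    """Persist a precomputed query result in a shelve dictionary."""
    with shelve.open(CACHE_FILE) as db:
        db[key] = value

def cached_result(key, default=None):
    """Look up a previously cached result."""
    with shelve.open(CACHE_FILE) as db:
        return db.get(key, default)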


A shared directory of CSV files can be a decent interface for users. If
your data consists of a time series of numbers, flat files can be very
reliable and user-friendly. It's easy for users to load the data into a
spreadsheet, SageMath, or R to work with on a workstation, and CSV files
are easy to migrate to a database with a script at a later time.

This sort of solution might only be practical if you can also predict and
store the result of the time-critical queries. How precisely are times
stored and queried? Is the time part of the input data, or is it the time
the data was reported? Is this a new solution or a replacement for
something already working?

If queries will always be by period then you could load the data for the
period using either a date-time range as part of the file and path names,
or names with serialized integers plus a file that maps dates/times to the
sequence (a sketch of the first approach is below). If, on the other hand,
you want to ask something like "what is the highest value recorded for CPU
core temp on a Tuesday among the workstations on the left side of the lab
in the last five months", then you'd best go with the database.
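
A sketch of the first approach, assuming the same hypothetical
data/YYYY/MM/DD.csv layout as above:

from datetime import date, timedelta
from pathlib import Path

DATA_DIR = Path("data")   # same hypothetical per-day layout as above

def files_for_period(start, end):
    """Yield the per-day CSV files covering start..end, skipping missing days."""
    day = start
    while day <= end:
        path = DATA_DIR / "{0:%Y}/{0:%m}/{0:%d}.csv".format(day)
        if path.exists():
            yield path
        day += timedelta(days=1)

# e.g. everything recorded in the last week:
paths = list(files_for_period(date.today() - timedelta(days=7), date.today()))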

I'm curious about the intended application, the OS distribution you are
using, the hardware, and the other loads on that hardware. I'm concerned
that your Mongo issues might be due to some system bottleneck that you
would run into with another solution as well. Either way, try using "nice"
to prioritize your processes and "ionice" to make sure background processes
and cron jobs don't dominate your disk I/O.

What is your interface for data collection/input? Is this automatically
being measured and reported by a hardware system? Do you have that set up
already? If you want to make this a web app I'd recommend Flask if you are
going to go with flat files, or Django if you want to have a database or
need to create accounts for users and manage access rights and sessions.
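
For the flat-file case a Flask app can stay very small. A sketch, with the
URL scheme and the per-day CSV layout just assumed for illustration:

import csv
from pathlib import Path

from flask import Flask, jsonify

app = Flask(__name__)
DATA_DIR = Path("data")   # same hypothetical per-day CSV layout as above

@app.route("/readings/<year>/<month>/<day>")
def readings(year, month, day):
    """Return one day's readings as JSON."""
    path = DATA_DIR / year / month / (day + ".csv")
    if not path.exists():
        return jsonify(rows=[]), 404
    with path.open(newline="") as f:
        rows = list(csv.reader(f))
    return jsonify(rows=rows)

if __name__ == "__main__":
    app.run()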

If you decide you need a database you don't have to write SQL if you don't
want to. I assume you've heard about SQLAlchemy and the Django Model
classes. These are both object-relational mappers that completely automate
database definition and let you just write Python. On the other hand,
learning them may be more complicated than learning just enough SQL and the
Python database API.
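
For instance, a minimal SQLAlchemy sketch (the table and column names are
made up): defining the class generates the CREATE TABLE for you, and
inserts and queries stay in Python.

from datetime import datetime

from sqlalchemy import Column, DateTime, Float, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Reading(Base):
    __tablename__ = "readings"
    id = Column(Integer, primary_key=True)
    sensor = Column(String, nullable=False)
    value = Column(Float, nullable=False)
    taken_at = Column(DateTime, nullable=False)

engine = create_engine("sqlite:///readings.db")
Base.metadata.create_all(engine)           # emits the CREATE TABLE statement

Session = sessionmaker(bind=engine)
session = Session()
session.add(Reading(sensor="cpu_temp", value=61.5, taken_at=datetime.utcnow()))
session.commit()

# e.g. the five highest cpu_temp readings, with no SQL written by hand
hottest = (session.query(Reading)
           .filter(Reading.sensor == "cpu_temp")
           .order_by(Reading.value.desc())
           .limit(5)
           .all())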
It seems that you may have an exaggerated sense of what is involved with
SQL. It is meant to be understandable to business people, and database
management systems let you leverage highly optimized low-level code and
efficient data structures and algorithms. You definitely don't need to
write any stored procedures. You can learn SQL by experimenting with a
command line client or play with it from IPython. The Python DB-API makes
it easy to execute a query you have in a string (usually triple-quoted in
your code, or loaded from a .sql file). You pass the query and a dict of
substitution parameters when you want to execute it, then process the
query results with a for-in loop as a list of objects/dicts. If you only
have one flat table it's just one "CREATE TABLE" query/command to make the
table. I'd start by playing with SQL in IPython using SQLite, then go on to
configuring PostgreSQL.  https://docs.python.org/2/library/sqlite3.html
To get just the first few rows with SQL, use a LIMIT clause on your queries:
http://www.postgresql.org/docs/9.4/static/queries-limit.html
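
A sketch of the sqlite3 version of that, passing a dict of substitution
parameters and using a LIMIT clause (the table and column names are made up):

import sqlite3

conn = sqlite3.connect("readings.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS readings (
        taken_at TEXT NOT NULL,
        sensor   TEXT NOT NULL,
        value    REAL NOT NULL
    )
""")

conn.execute(
    "INSERT INTO readings (taken_at, sensor, value) "
    "VALUES (:taken_at, :sensor, :value)",
    {"taken_at": "2015-01-11T21:00:00", "sensor": "cpu_temp", "value": 61.5},
)
conn.commit()

# "just the first few" via LIMIT, processed with a for-in loop
query = """
    SELECT taken_at, value
    FROM readings
    WHERE sensor = :sensor
    ORDER BY value DESC
    LIMIT 5
"""
for taken_at, value in conn.execute(query, {"sensor": "cpu_temp"}):
    print(taken_at, value)

conn.close()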

I agree with others that if you need a database PostgreSQL would be a fine
solution, and with Richard Pieri's original litmus tests for whether you
should consider a NoSQL database.
If system resources like RAM are short then consider staying with SQLite
and writing a single daemon process to handle writes to the database. That
will be fairly reliable and add much less overhead than PostgreSQL. You can
have multiple processes query the data, but you'd want just one process to
write. The same would apply to the CSV solution.
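
A sketch of that single-writer idea: one process owns the SQLite connection
for writes and pulls rows off a queue, while any number of other processes
only read (or just put rows on the queue). Names are made up:

import sqlite3
from multiprocessing import Process, Queue

def writer(queue, db_path="readings.db"):
    """The only process that writes; everything else queries read-only."""
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS readings
                    (taken_at TEXT, sensor TEXT, value REAL)""")
    while True:
        row = queue.get()
        if row is None:            # sentinel: shut down cleanly
            break
        conn.execute("INSERT INTO readings VALUES (?, ?, ?)", row)
        conn.commit()
    conn.close()

if __name__ == "__main__":
    queue = Queue()
    p = Process(target=writer, args=(queue,))
    p.start()

    # producers just put rows on the queue
    queue.put(("2015-01-11T21:00:00", "cpu_temp", 61.5))

    queue.put(None)
    p.join()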


Cheers + + + + + + +
John Hall


On Sun, Jan 11, 2015 at 5:58 PM, Kent Borg <kentborg at borg.org> wrote:

> On 01/11/2015 05:32 PM, Richard Pieri wrote:
>
>> If by "No-SQL" you mean "does not use SQL"
>>
>
> Just that I dislike SQL, I don't want to have to maintain a separate code
> base of stored procedures, I don't want to have to use Python to assemble
> SQL on the fly either.
>
> I would like something that has a nice Python API. But if I can't I can't.
> If the no-SQL part has to be discarded, so be it.
>
> I don't need relational features, but I don't mind if my database supports
> them. There is nothing in my data that needs a free-format key/value
> features. Maybe there is a nice Python interface to PostgreSQL that hides
> SQL bits, but I doubt it.
>
> I support the idea that irrational prejudices are irrational, but I also
> support the idea that aesthetically pleasing tools are good, make for
> better results, etc.
>
> -kb
>
>
> _______________________________________________
> Discuss mailing list
> Discuss at blu.org
> http://lists.blu.org/mailman/listinfo/discuss
>


