I've been using JSON files to store data since I began learning Node, but I'm now working on big projects so I need 0% chance of losing data.
I had a problem where Node crashed while writing to a file and I was left with half a JSON file, but I guess this can be solved by writing to a temp file first and then moving it over the original file.
I like how easy it is to just use JSON files as databases: I can read data, amend it, and save it in one line each, with no packages to install and learn. I try to avoid NPM packages where possible because I like learning how to do things myself; time isn't an issue, and I'm the only one working on these projects.
I also read a comment by someone who makes directories for arrays and individual files for objects instead of using one single JSON file per table.
If someone could enlighten me on why JSON files might not be a good idea for production, or if there are some things I should know before doing this, that would be really helpful.
You can use any single file, including a JSON file, like this:
Lock it somehow (look up PHP file locking; it may be as simple as adding a parameter to the file-open function, or switching to the locking variant of the function).
Read the data from the file and parse it into an internal data structure.
Optionally modify the data in the internal data structure.
If you modified the data, truncate the file to 0 length and write the new data to it.
Unlock the file as soon as you can; other requests may be waiting...
You can keep using the data in the internal structures to render the page; just remember it may be outdated as soon as you release the file lock, since another HTTP request can modify it.
Also, if you modify the data from a user's web form, remember that it may have been modified in between. For example: one user loads a page with user details for editing, another user deletes that record, and then the editor tries to save the changed details; they should probably get an error instead of re-creating the deleted user.
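One common way to guard against that edit-after-delete (or edit-after-edit) race is a version counter: the edit form carries the version it loaded, and the save is rejected if the record has since changed or disappeared. A minimal sketch, with made-up record and payload shapes:

```javascript
// Optimistic concurrency check: reject the save if the record was
// deleted or modified since the editor loaded it. The `records`
// object stands in for the parsed JSON data; names are illustrative.
function applyEdit(records, id, expectedVersion, changes) {
  const rec = records[id];
  if (!rec) throw new Error('record was deleted by another user');
  if (rec.version !== expectedVersion) {
    throw new Error('record was modified; reload and retry');
  }
  Object.assign(rec, changes);
  rec.version += 1; // bump so later stale saves are caught too
  return rec;
}
```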
Note: This is very inefficient. If you are building a site where you expect more than, say, 10 simultaneous users, you have to use a more sophisticated scheme, or just use an existing database... Also, you can't have too much data, because parsing JSON and generating the modified JSON takes time.
As long as you have just one user at a time, it will simply get slower as the amount of data grows. But as the user count increases, more users means both more requests and more data, so things get exponentially slower, and you very soon hit a limit where HTTP requests start to time out before the file becomes available to handle them...
At that point, do not try to hack it to make it faster; instead pick some existing database framework (SQL, NoSQL, or file-based). If you start hacking together your own, you just end up re-inventing the wheel, usually poorly :-). Well, unless it is just a programming exercise, but even then it might be better to learn the use of some existing framework instead.
I wrote an Object Document Mapper for use with JSON files called JSON ODM. This may be a bit late, but if it is still needed, it is open source under the MIT Licence.
It provides a query language and some GeoJSON tools.
I think your question really boils down to: When should I use a NoSQL approach vs. RDBMS? You settled on JSON early (a NoSQL-ish decision), perhaps because you've got Ajax consumers.
The answer of course to when to use NoSQL approaches vs. RDBMS's is basically about what type of data you're working with and what consumers you anticipate having. If your data is essentially relational (fairly flat hierarchies, no weird data types like images or audio, predictable relationships between the schemas that can be easily described in keys), and your consumers are anticipated to eventually include people who want to do Business Intelligence queries (ad hoc querying), then an RDBMS is the way to go. It's fairly easy to turn a query into a JSON representation, so it doesn't significantly burden your Ajax consumers -- it just adds a little transformation coding into your endpoints (REST/SOAP/whatever). Conversely, if your data is very hierarchical (deep schemas), contains weird data types like images, audio, video, etc., there are few relationships between entities, and you know that your end users will not be doing BI, then NoSQL/storing JSON may be appropriate.
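The "little transformation coding" an endpoint needs is usually just regrouping flat relational rows into the nested JSON shape an Ajax consumer expects. A sketch, with made-up event/attendee columns standing in for a real query result:

```javascript
// Turn flat SQL-style rows (one row per event/attendee pair, as a
// JOIN would produce) into nested JSON for an Ajax consumer.
// Column names here are illustrative, not from the post.
function rowsToJson(rows) {
  const events = new Map();
  for (const r of rows) {
    if (!events.has(r.event_id)) {
      events.set(r.event_id, { id: r.event_id, name: r.event_name, attendees: [] });
    }
    if (r.attendee) events.get(r.event_id).attendees.push(r.attendee);
  }
  return [...events.values()]; // ready for JSON.stringify / res.json
}
```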
Of course, even these general guidelines aren't rock solid. The reason Google developed Google File System, MapReduce (work which was used by Doug Cutting to build Hadoop at Yahoo) and later BigTable (a NoSQL oriented [schemaless] way of managing large scale data) was precisely because they had a lot of ad hoc BI requests, and they couldn't get relational approaches to scale up to the tera/peta/exa/zetta/yotta scales they were trying to manage. The only practical approach was to scale out, sacrificing some of the ad hoc query friendliness that an RDBMS provides, and substituting a simple algorithm (MapReduce) that could be coded fairly easily for any given query.
Given your schema above, my question would basically be: Why wouldn't you use an RDBMS? I don't see much of a reason not to. Our profession is supposed to be engineering oriented, not fashion oriented, so our instinct should be to pick the easiest solution that works, right? I mean, your endpoints may have to do a little translation if your consumers are Ajaxy, but your data looks very flat and it seems likely that business users are going to want to do all kinds of ad hoc querying on things like music events (Which event was most attended within 50 miles of our capital city last year?)
'Go not to the elves for counsel, for they will say both no and yes.' -- Frodo
I believe there are some considerations here that you may not have looked at yet. There are two broad concerns:
- Storage
- Search and Retrieval
Storage
There are plenty of opinions on whether to use a NoSQL or RDBMS store for your data. One of the things we found most useful is that we can easily define and store JSON objects without having to worry about defining their full structure, or the relationships between different types of objects, up front. Some of the other reasons to use a NoSQL db would be the ability to auto-shard data, location-based searches, and easy maintenance. There are many good NoSQL databases out there; my personal preference is MongoDB. However, if you have not used a NoSQL database before, there is a definite learning curve as you re-wire your mind. Most of us have been using RDBMSs for a while now, and it takes conscious effort to break out of that habit. You will also find yourself wanting to redo your data model as you proceed and gain a better understanding of the concepts. If the ability to refactor or remodel is not an option for your project, I would suggest sticking with what you already know best.
Search
If you intend to provide any kind of search that is usable, I strongly suggest that you use a dedicated text search engine such as SOLR to perform your searches. Text searches are slow, and if you have multiple shards they are even slower. SOLR supports blazing fast text searches, including weighted search params, location-based searches, and much more. SOLR, however, is not suited as the primary store of your data. This means you will have to create mechanisms for dual insert and update to both your primary database and your SOLR layer when adding or updating events. You will also have to keep the SOLR layer updated by removing any outdated/ended events.
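The dual-write mechanism can be as simple as a wrapper that always hits the primary store first, then the search layer. The injected functions below are placeholders, not a real MongoDB or SOLR client (a real SOLR update would be an HTTP POST to its `/update` handler):

```javascript
// Dual-write sketch: the primary database is the source of truth,
// and the search layer is updated right after it. saveToPrimary and
// indexInSearch are hypothetical injected functions for illustration.
function makeEventWriter(saveToPrimary, indexInSearch) {
  return async function saveEvent(event) {
    await saveToPrimary(event);  // write the source of truth first
    await indexInSearch(event);  // then keep SOLR in sync
    return event;
  };
}
```

Writing to the primary first means a search-layer failure leaves you with correct data that is merely missing from the index, which a periodic re-index can repair; the reverse ordering could leave search results pointing at data that was never saved.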
Although this does seem like a lot of extra work, you will thank yourself later for the foresight of using a full-text search engine. None of the NoSQL databases or RDBMSs come close to the performance and agility of SOLR/Lucene.