A lot is said and asked about GridFS. In this post we will take a look inside the GridFS feature of MongoDB.
What Exactly Is GridFS :
GridFS is a built-in feature of MongoDB that provides a virtual file system for storing files, associating them with MongoDB documents (BSON documents), and retrieving them when required.
Question : Why can’t we simply use BSON documents to store the file?
Size : A single BSON document is limited to 16 MB, so GridFS is mainly meant for storing files that are larger than that.
Sync : GridFS helps when you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities.
Platform Agnostic : Files are stored in the database, which avoids many limitations of local file systems, such as limits on the files a single directory can hold.
Chunk Read Access : Suppose you need just a part of a file rather than the complete file, for example a section of a video instead of the whole thing. GridFS can retrieve only the chunks of the file that are needed, so sections of a file can be read without loading the entire file into memory, which improves performance.
Failover and Backup : Because GridFS stores files in documents, it has built-in support for replication and backup.
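The chunk read access point above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not driver code: the function name chunks_for_range is hypothetical, and the 255 KB chunk size used in the usage lines is an assumption (the real value is stored with each file).

```python
def chunks_for_range(start, length, chunk_size):
    """Return the chunk sequence numbers that cover bytes [start, start + length)."""
    if start < 0 or length <= 0 or chunk_size <= 0:
        return []
    first = start // chunk_size                 # chunk holding the first requested byte
    last = (start + length - 1) // chunk_size   # chunk holding the last requested byte
    return list(range(first, last + 1))


# Usage: read 1 MB starting at the 10 MB mark of a large file.
ASSUMED_CHUNK_SIZE = 255 * 1024  # assumption, not read from a real file document
needed = chunks_for_range(10 * 1024 * 1024, 1024 * 1024, ASSUMED_CHUNK_SIZE)
print(needed)  # [40, 41, 42, 43, 44]
```

Only these five chunk documents would have to be fetched, no matter how large the whole file is.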
So, now that we have a brief idea of why we need GridFS, let’s dive into the internals:
GridFS Collection :
GridFS stores files in two collections:
- chunks stores the binary chunks.
- files stores the file’s metadata.
GridFS places the collections in a common bucket by prefixing each with the bucket name. By default, GridFS uses two collections with names prefixed by the fs bucket:
- fs.files
- fs.chunks
You can choose a bucket name other than fs, and you can create multiple buckets in a single database.
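As a small sketch of the naming rule, the helper below derives the two collection names from a bucket name. The function bucket_collections and the bucket name "videos" are made up for illustration; real drivers do this internally.

```python
def bucket_collections(bucket="fs"):
    """Return the collection names GridFS would use for the given bucket."""
    return {
        "files": f"{bucket}.files",    # one metadata document per file
        "chunks": f"{bucket}.chunks",  # the binary chunks of all files in the bucket
    }


print(bucket_collections())          # {'files': 'fs.files', 'chunks': 'fs.chunks'}
print(bucket_collections("videos"))  # {'files': 'videos.files', 'chunks': 'videos.chunks'}
```

In PyMongo, for example, the bucket name is passed when the bucket object is created, as in gridfs.GridFSBucket(db, bucket_name="videos").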
Each document in the chunks collection represents a distinct chunk of a file as stored in GridFS. Each chunk is identified by the unique ObjectId stored in its _id field. The default chunk size is 255 KB (256 KB in older MongoDB versions), and it can be changed.
The fields inside these two collections will give you a better idea:
In the chunks schema :
_id is the ObjectId assigned to every chunk. files_id is the _id of the parent document in fs.files, which holds the metadata of the file the chunk belongs to. n is the sequence number of the chunk within the file, starting at 0, and data contains the chunk’s actual binary payload.
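To make the chunk layout concrete, here is a minimal in-memory sketch that splits a byte string into chunk documents and puts them back together. It uses the field names from the GridFS specification (files_id, n, data). The helper names make_chunk_docs and reassemble are hypothetical, and a random hex string stands in for the ObjectId a real driver would generate.

```python
import uuid


def make_chunk_docs(files_id, payload, chunk_size):
    """Split payload into chunk documents shaped like those in fs.chunks."""
    docs = []
    for n, offset in enumerate(range(0, len(payload), chunk_size)):
        docs.append({
            "_id": uuid.uuid4().hex,  # stand-in for a real ObjectId
            "files_id": files_id,     # points at the parent document in fs.files
            "n": n,                   # sequence number of the chunk, starting at 0
            "data": payload[offset:offset + chunk_size],
        })
    return docs


def reassemble(docs):
    """Rebuild the original bytes by ordering the chunks on n."""
    return b"".join(d["data"] for d in sorted(docs, key=lambda d: d["n"]))


# Usage: a tiny chunk size keeps the example readable.
payload = bytes(range(10))
docs = make_chunk_docs("file-1", payload, chunk_size=4)
print([(d["n"], len(d["data"])) for d in docs])  # [(0, 4), (1, 4), (2, 2)]
```

Every chunk except the last has exactly chunk_size bytes, and the last one carries whatever remains.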
In the files schema :
A few options can be configured here, such as chunkSize and contentType (for storing the MIME type).
metadata holds any additional information about the file, and md5 holds a hash of the complete file, which is useful for confirming that the chunks were stored correctly.
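The sketch below builds a files document of this shape in plain Python and uses the stored hash to check a set of chunks. The helper names make_files_doc and verify_chunks are hypothetical, and the string id is a stand-in for an ObjectId. Newer versions of the GridFS specification mark md5 and contentType as deprecated, so treat this as an illustration of the idea rather than what a current driver writes.

```python
import hashlib
from datetime import datetime, timezone


def make_files_doc(files_id, filename, payload, chunk_size,
                   content_type=None, metadata=None):
    """Build a metadata document shaped like those in fs.files."""
    doc = {
        "_id": files_id,
        "length": len(payload),       # total file size in bytes
        "chunkSize": chunk_size,      # size of every chunk except possibly the last
        "uploadDate": datetime.now(timezone.utc),
        "md5": hashlib.md5(payload).hexdigest(),
        "filename": filename,
    }
    if content_type is not None:
        doc["contentType"] = content_type
    if metadata is not None:
        doc["metadata"] = metadata
    return doc


def verify_chunks(files_doc, chunk_payloads):
    """Check ordered chunk payloads against the stored length and md5."""
    data = b"".join(chunk_payloads)
    return (len(data) == files_doc["length"]
            and hashlib.md5(data).hexdigest() == files_doc["md5"])


# Usage with made-up values.
doc = make_files_doc("file-1", "hello.txt", b"hello gridfs", 6,
                     content_type="text/plain", metadata={"owner": "demo"})
print(verify_chunks(doc, [b"hello ", b"gridfs"]))  # True
print(verify_chunks(doc, [b"hello ", b"gridfz"]))  # False
```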
So this is how GridFS provides a strong, efficient, and configurable way to handle big files in MongoDB.