In this article I would like to show you in detail a personal project that I have been working on in order to deepen my knowledge of Apache NiFi, a technology widely used to implement ETL flows.

The main objective of the project is to implement a scalable processing flow that extracts the content of files of any type and size and then indexes it. As a result, we will be able to run full-text searches and quickly locate the files that contain specific content. More specifically, the goal is to search for a text term and get all the files that contain at least one occurrence of it in their content.

It is an interesting challenge, as it requires properly combining several specialized technologies for a very specific task.

In general terms, I would like to comment on the main challenges to be solved in this architecture:

- Large-scale file storage with high availability and fault tolerance.
- Extraction of the content of the files.

Specifically, the files are uploaded to the platform through a REST API and then temporarily stored on an SFTP server that acts as the entry point to the architecture. Once processed, they become part of the HDFS distributed file system, while their content and metadata are stored in a MongoDB collection that can be reviewed later.
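To make the search goal concrete, here is a minimal Python sketch of what "get all the files that contain at least one occurrence of a term" means. It is not the project's implementation: the `IndexedFile` record, its field names, the sample paths and the case-insensitive matching are all assumptions made for illustration, and an in-memory list stands in for the MongoDB collection.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedFile:
    """Hypothetical record kept for each processed file (field names are assumed)."""
    name: str
    hdfs_path: str   # assumed location of the original file in HDFS
    content: str     # text extracted from the file
    metadata: dict = field(default_factory=dict)


def search(records, term):
    """Return the names of files whose content contains `term` at least once.

    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    needle = term.casefold()
    return [r.name for r in records if needle in r.content.casefold()]


# In-memory stand-in for the collection of processed files.
records = [
    IndexedFile("report.pdf", "/data/report.pdf", "Quarterly revenue grew in Europe."),
    IndexedFile("notes.txt", "/data/notes.txt", "Remember to renew the SFTP key."),
]

print(search(records, "sftp"))  # ['notes.txt']
```

A linear scan like this only illustrates the expected result. At the scale the article targets, the lookup would more likely be delegated to an index maintained by the data store rather than scanning every document.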