Hello, everyone!

I have reached the end of the final coding period, and it’s time for the final evaluation. In the last weeks of the final coding phase, I worked on fixing the functions so that they now work successfully, and tests have been written for them. One thing I underestimated was the importance of processing and cleaning data. I also started paying more attention to code style during this journey.

As this is the final blog post, here is a summary of what happened during the whole program, along with the details of the project.

Data Retriever: Add support for more raw data formats:

The Data Retriever handles tabular and spatial data. The goal of the project was to enable the Data Retriever platform to ingest other forms of raw data. The project introduced support for the raw data formats XML, JSON, NetCDF, HDF, Excel, SQLite and GeoJSON.

  1. The first part of the project mainly consisted of adding functions for converting raw data sources. These raw data sources were XML, JSON, NetCDF, HDF, Excel, SQLite and GeoJSON. This included searching for raw data sources in the respective formats.
  2. The second part consisted of unit testing the functions for their ability to transform raw data into ready-to-ingest data. Scripts were also written to test these functions and to set out the rules for writing the data packages. The data packages specified the type of raw data to be converted.
  3. In the final part, all the scripts used for testing were moved to retriever-recipes.
  4. Documentation was added for these functions.
  5. We decided to leave out the NetCDF-to-CSV conversion function, as most NetCDF datasets have a more complex structure. Some of the tables contain complex arrays that need much more processing. To compensate for this exclusion, we worked on the HDF5 engine, which currently works without any issues.
  6. I made the necessary changes to the two Retriever wrapper repositories: Retriever.jl and rdataretriever.
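
To give a rough idea of what a raw-data conversion function of this kind does, here is a minimal sketch using only the Python standard library. It is not the code that went into the retriever: the function names `records_to_csv`, `json_to_csv` and `xml_to_csv`, the `row_tag` parameter and the sample data are all made up for illustration, and it assumes flat records with no nesting.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def records_to_csv(records, fieldnames):
    """Write a list of dicts as CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing keys become empty cells; keys not in fieldnames are dropped.
        writer.writerow({name: record.get(name, "") for name in fieldnames})
    return buffer.getvalue()


def json_to_csv(json_text, fieldnames):
    """Convert a JSON array of flat objects to CSV text (hypothetical helper)."""
    return records_to_csv(json.loads(json_text), fieldnames)


def xml_to_csv(xml_text, row_tag, fieldnames):
    """Convert repeated <row_tag> elements with simple children to CSV text."""
    root = ET.fromstring(xml_text)
    records = [
        {child.tag: (child.text or "").strip() for child in row}
        for row in root.iter(row_tag)
    ]
    return records_to_csv(records, fieldnames)


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the input.
    sample_json = '[{"id": 1, "name": "oak"}, {"id": 2, "name": "pine"}]'
    sample_xml = "<rows><row><id>1</id><name>oak</name></row></rows>"
    print(json_to_csv(sample_json, ["id", "name"]))
    print(xml_to_csv(sample_xml, "row", ["id", "name"]))
```

Real datasets are rarely this tidy, which is part of why formats with nested or array-valued fields need much more processing before they fit into a table.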

During the course of the project, I mainly contributed to two repositories: retriever and retriever-recipes. The list of commits made in the two repositories:

List of major pull requests made for the project:

I also wrote a number of blog posts about the work done and my experience during the program here

In the end, I would like to thank my mentors, Henry Senyondo, Apoorva Pandey and Ethan White, for giving me this opportunity to work with them. I would also like to thank Harshit Bansal, Ratin Kumar and Bo zheng, as their contributions helped me with some of the work. A special thanks to Henry Senyondo for his constant support and guidance. Without it, I might have lost the focus needed to complete this project.

During this program I learned a lot, including debugging code, processing and cleaning special data types such as HDF5 and GeoJSON, and writing tests. I also learned a lot about Docker and Travis CI. I’ve come to value open source more, since I had to use open source tools and libraries while working on the project.

I plan to continue contributing to this project, as well as to other open source projects, in the future. Thank you, everyone.