More information: Niketan's Blog | Niketan's YouTube Channel | Niketan's LinkedIn Profile | Niketan's Github | Spark Technology Center
1. DataPath system
DataPath is a data-centric, purely push-based research prototype database system. In DataPath, queries do not request data. Instead, data are automatically pushed onto processors, where they are processed by any interested computation. It has been tested on a multi-terabyte benchmark, showing that this basic design principle makes for a very lean and fast database system. My collaborator describes the system in a video.
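To make the push-based idea concrete, here is a minimal Python sketch. It is not DataPath code: the `PushEngine`, `SumQuery`, and `CountQuery` names, the chunk layout, and the `price` column are all hypothetical and chosen only for illustration. The point it tries to show is that the data source drives execution, and every registered computation sees each chunk as it goes by, so one pass over the data serves all queries.

```python
from typing import Callable, Dict, Iterable, List

Chunk = List[dict]  # hypothetical layout: a chunk is a list of row dictionaries


class PushEngine:
    """Pushes every chunk to all registered computations (illustrative only)."""

    def __init__(self) -> None:
        self._consumers: Dict[str, Callable[[Chunk], None]] = {}

    def register(self, name: str, consumer: Callable[[Chunk], None]) -> None:
        self._consumers[name] = consumer

    def run(self, chunks: Iterable[Chunk]) -> None:
        # The data source drives execution: queries never ask for data.
        for chunk in chunks:
            for consumer in self._consumers.values():
                consumer(chunk)


class SumQuery:
    """Running sum of one column."""

    def __init__(self, column: str) -> None:
        self.column = column
        self.total = 0

    def __call__(self, chunk: Chunk) -> None:
        self.total += sum(row[self.column] for row in chunk)


class CountQuery:
    """Running count of rows that satisfy a predicate."""

    def __init__(self, predicate: Callable[[dict], bool]) -> None:
        self.predicate = predicate
        self.count = 0

    def __call__(self, chunk: Chunk) -> None:
        self.count += sum(1 for row in chunk if self.predicate(row))


chunks = [
    [{"price": 10}, {"price": 25}],
    [{"price": 5}, {"price": 40}],
]
total_price = SumQuery("price")
expensive_rows = CountQuery(lambda row: row["price"] > 20)

engine = PushEngine()
engine.register("total_price", total_price)
engine.register("expensive_rows", expensive_rows)
engine.run(chunks)  # a single scan feeds both computations
```

After `engine.run(chunks)`, `total_price.total` and `expensive_rows.count` hold the results of both queries, computed from one shared scan.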
2. Online Aggregation for Large MapReduce Jobs:
In online aggregation, a database system processes a user's aggregation query in an online fashion. At all times during processing, the system gives the user an estimate of the final query result, with confidence bounds that become tighter over time. In this project, we built a system that performs online aggregation over a MapReduce environment for large-scale data analysis. Given the MapReduce paradigm's close relationship with cloud computing (one might expect a large fraction of MapReduce jobs to be run in the cloud), online aggregation is a very attractive technology. Since large-scale cloud computations are typically pay-as-you-go, a user can monitor the accuracy obtained in an online fashion, and then save money by killing the computation early once sufficient accuracy has been obtained. I gave a presentation about this project, which is available as a video.
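The idea of an estimate whose confidence bounds tighten over time can be illustrated with a small single-machine Python sketch. This is not the project's system: it assumes rows arrive in random order and uses a plain normal-approximation interval for a mean, and the function names, the synthetic data, and the `target_half_width` parameter are hypothetical. A real MapReduce setting has to deal with data arriving in blocks that are not a simple random sample, which this sketch ignores.

```python
import math
import random


def online_mean(stream, z=1.96, report_every=1000):
    """Yield (rows_seen, estimate, half_width) while consuming the stream.

    Uses Welford's running mean/variance and a normal-approximation
    confidence interval; valid only if rows arrive in random order.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            variance = m2 / (n - 1)
            yield n, mean, z * math.sqrt(variance / n)


def run_until_accurate(stream, target_half_width):
    """Stop consuming as soon as the interval is tight enough."""
    last = None
    for last in online_mean(stream):
        if last[2] <= target_half_width:
            break  # the user "kills the job" early
    return last


rng = random.Random(0)
data = [rng.gauss(100.0, 15.0) for _ in range(200_000)]  # synthetic values
rng.shuffle(data)

rows, estimate, half_width = run_until_accurate(iter(data), target_half_width=0.25)
print(f"stopped after {rows} of {len(data)} rows: {estimate:.2f} +/- {half_width:.2f}")
```

Because the interval half-width shrinks roughly as one over the square root of the number of rows seen, the loop can stop after a small fraction of the data once the requested accuracy is reached, which is the cost-saving argument made above.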
3. SystemML on Spark:
1. Table Analysis Tools (TAT) for Cloud (Summer 2008 Internship Project @ SQL Server Data Mining)
TAT Cloud is a set of canned data mining tasks that you can use without having SQL Server installed on your machine. It consists of encapsulations of some common data mining problems, such as detecting key influencers, forecasting, generating predictive scorecards, or doing market basket analysis. The tasks can be executed from a browser as well as from Excel 2007 (after installing the TAT add-in).
2. SpokenWeb (Summer 2011 Internship Project @ IBM Research Lab)
3. Embedded Web Server using VxWorks Real Time Operating System (@ ECIL Hyderabad as part of PG Diploma)
Acts as a standalone web server with a remote file system. Since it is booted via RS-232 (serial port), it does not require a hard disk.
4. Sure Serve (Server Monitoring Utility) (BE Final year Project @ Rediff)
Allows the administrator to monitor server performance based on specified parameters. It comprises modules (TCP, HTTP, Database and Application) that monitor the major functional areas of a typical web server. It plots the parameters in real time using Multi Router Traffic Grapher.
1. Usage Reporting of Hotmail data (Data warehouse)
Gathers data directly from product teams, transforms and loads it into a data warehouse for aggregation, and generates reports for partners.
2. ERM (Employee Resource Management) website
An Ajax-based web application through which MAQSoftware manages its employee timesheet details, project resource allocation, and report generation.
3. Crystal (for Swedish Sleep Institute)
The goal was to improve 14 existing legacy applications and migrate them to ASP.NET and SQL Server, so that they are accessible and housed under a single dashboard with single sign-on.
1. yadmt (Yet Another Data Mining Tool):
A tool to find the best classifier for your dataset using statistical tests suggested in the machine learning literature. The user does not need to know the inner workings of the classifiers or the statistical tests. The main goals of this tool are efficiency and load balancing. It is designed to work on a single server or on a cluster (possibly shared by multiple users), a setup that applies to most research labs.
Voca is a desktop app designed to run in the background, with minimal user interaction or interference, that allows users to issue voice commands.
Link: Demo, Innovative algorithm to improve speech recognition
For the entire list of my projects, visit my LinkedIn page.