Consider sorting n numbers. In the worst case, any algorithm will take O(n log  n) time to sort them. However, if the input is known to be uniformly distributed then we can sort in linear time. It would be cool to have an algorithm which performs optimally for "all" inputs without a priori knowledge of the distribution. That is, though in the worst case it takes O(n log n) time and if the input is uniformly distributed then it sorts in linear time. Such algorithms are what we call distribution sensitive algorithms. We investigated the development of distribution sensitive algorithms for the problem of finding maximal vectors.

 

This is work in collaboration with Sandeep Sen.

 

 

Most models of qualitative reasoning depend upon qualitative representations of quantity that make the necessary and relevant distinctions for the reasoning task at hand. Automatically generating such abstractions from numerical models has been pointed out to be a practically significant and potentially difficult problem. We work with a black box model that relates an output variable with known landmarks to a set of input variables for which the landmarks need to be determined. For most instances of problems of practical significance, the input space is too large to be exhaustively examined. We present a simple randomized scheme for discovering landmarks which performs surprisingly well in time that is only polylogarithmic in the input size.

This was work in collaboration with Praveen Paritosh and Reuben Thomas

Selected Past Projects

The goal of the project was to estimate various metrics like throughput, goodput, losses and delays between two communicating nodes in the internet from an intermediate node lying in the communication path passively (ie without introducing additional traffic in the network).

 

We designed and implemented algorithms for the purpose on the actual network to compare the estimated metrics with the actual.

 

This is work in collaboration with Rajeev Shourie. This was my project during my summer internship in IBM, IRL in 2001.

Distribution Sensitive Algorithms

Algorithms for Passive Network Monitoring

Sub-linear Algorithms for Landmark Discovery from Black Box Models

Given the adjacency information of n elements in a circular order, our goal is to reconstruct the order. If there are errors in the adjacency data, then the problem is NP-hard in the general case. However, if the adjacency information is derived from each

element reporting some numerical values as identifiers for itself and its front neighbor, then the reconstruction of the original order may be possible in polynomial time. This setting arises naturally in NMR data interpretation for determining protein structures.

 

This is work in collaboration with Ming-Yang Kao.

Research

I am working on  problems in motif finding from multiple sequences, NMR data interpretation for protein structure determination, a graph theoretical problem with application in inferring protein-protein interactions, and DNA Word Design.

For wireless/mobile applications, I am interested in developing incentive compatible algorithms that recognize that the individual machines are owned by autonomous self-interested (or selfish) agents and not a centralized agency. Hence, cooperation of the agents to a protocol cannot be taken for granted.

There has been interesting research in this direction using ideas from Game Theory but most of the work has required incorporation of payment mechanisms which are rather artificial.

This is work in collaboration with Fabian Bustamante.

My research interests center Design and Analysis of Algorithms. In particular, I am interested in novel techniques for analyzing algorithms that work well in practice but may not be amenable to worst-case or average case analysis. One such technique is Model Based Analysis of Algorithms where we assume the input to be generated by some probabilistic model rather than an adversary. I believe this technique  holds good promise for a number of problems arising in Data Mining, Bio-Informatics, Computational E-Commerce, Pattern Matching etc.

 

Research Interests

Current Research Projects

The goal here is to develop content based searching of musical information as opposed to standard meta-data based search. That is, instead of searching for musical files by some meta-data like the name of the composer or the title, we want to enable searching by the actual musical information. In particular we are working on Query by Humming systems where the user hums the tune of the song he/she is interested in.

 

We are developing algorithms which allow query into polyphonic musical data. This translates into a sequence alignment problem where the query sequence is to be aligned with multiple sequences.

 

This is work in collaboration with Bryan Pardo.

Bio-Informatics

Musical Information Retrieval

Incentive Compatible Algorithms for P2P Systems

NMR Data Interpretation

Manan Sanghi

This was is a part of the ASSET (Automated SynthesiS of Embedded sysTems) project in the Computer Science Department at IIT Delhi and aimed at development of an application framework for providing a common platform for facilitating the use of methodological approach developed by the ASSET team and integration of various tools developed during the execution of the project.

Deliverable: A GUI based analysis and visualization tool implemented in Java

This was work in collaboration with Ashish Gupta

Integrated Framework for Visualization and Analysis of Embedded Systems

Given their exponentially increased propagation speed, it is crucial to detect zero-day polymorphic worms in the early stages of infection.  We propose Hamsa, a network-based automated signature generation system for polymorphic worms which is fast, noise tolerant and provably attack resilient. Essentially, we propose a realistic model to analyze the invariant content of polymorphic worms which allows us to make analytical attack resilience guarantees for the signature generation algorithm. 

 

This is work in collaboration with Zhichun Li, Yan Chen and Ming-Yang Kao.

Digital copies of information goods are indistinguishable from the originals and can be created and distributed almost costlessly. Existing theory and practice fail to provide clear guidance on how digital information goods should be priced, an issue of increasing importance as the Internet provides the infrastructure for a major marketplace for electronic information.

We are developing pricing mechanisms which could yield revenue maximizing prices for such goods when the distributions of valuations is not known to the seller and the buyers are strategic in the sense that they will hide their true valuations if that would allow them to get the good at a lower price.

Online Sale of Digital Goods

Polymorphic Worm Signature Generation

For an updated compilation of my graduate research projects, click here

                                                                                                             -  January 25, 2006