|Topic||Publication||Summary / Abstract|
|An Introduction to Dynamic Data Quality Challenges||ACM Journal of Data and Information Quality (JDIQ) - 2017 January, Volume 8 Number 2, Pages 6:1-6:3, ACM Press (DOI: 10.1145/2998575)||We live in an evolving world. As time passes, data changes in content and structure, and thus becomes dynamic. Data quality, therefore, also becomes dynamic because it is an aggregate characteristic of data itself. Thus, our evolving world and the Internet of Things (IoT) present renewed challenges in data quality. IoT data is teeming with multivendor and multiprovider applications, devices, microservices, and automated processes built on social media, public and private datasets, digitized records, sensor logs, web logs, and much more. From intelligent traffic systems to smart healthcare devices, modern enterprises are inundated with a daily deluge of dynamic big data.|
|G* Studio: An Adventure in Graph Databases, Distributed Systems, and Software Development||ACM Inroads - 2016 June, Volume 7 Number 2, Pages 58-66, ACM Press (DOI: 10.1145/2896823)||The e-mail from the department chair was urgent. There were several graduate students with no classes to take. “Would somebody please run an independent study?” she asked. The semester was already a few days old. Alan had to strike fast. “I’m in,” he wrote, “I’ll put them to work on my graph database research.” With that, Alan and his new team, which would become known as the G-stars, began a two-semester adventure in graph databases, distributed systems, and software development that resulted in more than 8,000 lines of code over 520 Git commits. This is their story.|
|The G* Graph Database: Efficiently Managing Large Distributed Dynamic Graphs||DAPD - The Springer Journal of Distributed and Parallel Databases - Volume 33, Issue 4, pp 479-514||From sensor networks to transportation infrastructure to social networks, we are awash in data. Many of these real-world networks tend to be large ("big data") and dynamic, evolving over time. Their evolution can be modeled as a series of graphs. Traditional systems that store and analyze one graph at a time cannot effectively handle the complexity and subtlety inherent in dynamic graphs. Modern analytics require systems capable of storing and processing series of graphs. We present such a system. G* compresses dynamic graph data based on commonalities among the graphs in the series for deduplicated storage on multiple servers. In addition to the obvious space-saving advantage, large-scale graph processing tends to be I/O bound, so faster reads from and writes to stable storage enable faster results. Unlike traditional database and graph processing systems, G* executes complex queries on large graphs using distributed operators to process graph data in parallel. It speeds up queries on multiple graphs by processing graph commonalities only once and sharing the results across relevant graphs. This architecture not only provides scalability, but since G* is not limited to processing only what is available in RAM, its analysis capabilities are far greater than other systems which are limited to what they can hold in memory. This paper presents G*'s design and implementation principles along with evaluation results that document its unique benefits over traditional graph processing systems.|
|A Demonstration of Query-Oriented Distribution and Replication Techniques for Dynamic Graph Data||23rd International World Wide Web Conference (WWW 2014)||Evolving networks can be modeled as series of graphs that represent those networks at different points in time. Our G* system enables efficient storage and querying of these graph snapshots by taking advantage of their commonalities. In extending G* for scalable and robust operation, we found the classic challenges of data distribution and replication to be imbued with renewed significance. If multiple graph snapshots are commonly queried together, traditional techniques that distribute data over all servers or create identical data replicas result in inefficient query execution.|
|Efficient Top-K Closeness Centrality Search||30th IEEE International Conference on Data Engineering (ICDE 2014)||Many of today's applications can benefit from the discovery of the most central entities in real-world networks. This paper presents a new technique that efficiently finds the K most central entities in terms of closeness centrality. Instead of computing the centrality of each entity independently, our technique shares intermediate results between centrality computations. Since the cost of each centrality computation may vary substantially depending on the choice of the previous computation, our technique schedules centrality computations in a manner that minimizes the estimated completion time. This technique also updates, with negligible overhead, an upper bound on the centrality of every entity. Using this information, our technique proactively skips entities that cannot belong to the final result. This paper presents evaluation results for actual networks to demonstrate the benefits of our technique.|
|Scalable and Robust Management of Dynamic Graph Data||First International Workshop on Big Dynamic Distributed Data (BD3) at the 39th International Conference on Very Large Data Bases (BD3@VLDB 2013)||Most real-world networks evolve over time. This evolution can be modeled as a series of graphs that represent a network at different points in time. Our G* system enables efficient storage and querying of these graph snapshots by taking advantage of the commonalities among them. We are extending G* for highly scalable and robust operation. This paper shows that the classic challenges of data distribution and replication are imbued with renewed significance given continuously generated graph snapshots. Our data distribution technique adjusts the set of worker servers for storing each graph snapshot in a manner optimized for popular queries. Our data replication approach maintains each snapshot replica on a different number of workers, making available the most efficient replica configurations for different types of queries.|
|Quickly Finding the k Most Central Entities in Large Networks||New England Database Summit 2013||Many of today's applications can benefit from the discovery of the most central entities in real-world networks. Researchers have been developing techniques for finding the k most central entities in a network where the centrality of an entity is defined as the inverse of the average shortest path length from that entity to other entities. These previous techniques compute the centrality of each entity using a traditional single-source shortest path algorithm and then select k entities with the highest centrality values. Given a large network, however, these techniques incur high computational overhead. Our technique overcomes the above limitation. A key principle of our technique is to materialize intermediate results while a vertex's centrality is computed, and then reuse those results to speed up the computation of another vertex's centrality.|
|A Demonstration of the G* Graph Database System||The 29th International Conference on Data Engineering (ICDE 2013)||G* meets new challenges in managing multiple graphs while supporting fundamental graph querying capabilities by storing graphs on a large number of servers while compressing them based on their commonalities. It also allows users to easily express queries on graphs and efficiently execute those queries by sharing computations across graphs.|
|Computational Finance with Map-Reduce in Scala||The 18th International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2012)||This paper presents results of computational finance experiments using actor-based map-reduce in Scala. In general we observe superlinear speedup, super-efficiency, and evidence for a high degree of compute and I/O overlap end-to-end for different hardware platforms. These results should be of interest to academic researchers as well as industry practitioners.|
|A Game Design & Programming Concentration within the Computer Science Curriculum||Proceedings of the 36th SIGCSE Technical Symposium on Computer Science Education, ACM Press, Pages 545-550, 2005||This paper describes initiatives to develop a Game Concentration in the undergraduate Computer Science curriculum. These initiatives contemplate recommendations for existing courses as well as adoption of new courses.|
|Case Study: Oracle database development for the New York State Office of Mental Health||Proceedings of the Ninth Annual Institute on Mental Health Management Information||Summarizes the approach and development methods used to build an information system that reduced costs in time and money in its first year.|
|Core Concepts in Delphi||Series of articles for the Unofficial Newsletter of Delphi Users||Covers fundamental computer science and programming concepts and illustrates them in the object-oriented programming language Delphi.|
|Associate Professor||Compilers, Operating Systems, Graph and Relational Database Systems, Software Development||Marist College|
|Assistant Professor||Database Systems, Compilers, Operating Systems, Technology Entrepreneurship, Software Development||Marist College|
|Adjunct Professor||Compiler Design||Vassar College|
|Award Winner||2009 IBM Faculty Scholarship||IBM Scholars Program|
|Sr. Professional Lecturer||Compilers, Functional Programming in Erlang and Scala, Operating Systems, Software Development Best Practices||Marist College|
|Invited Speaker||American Culture in an IT-Driven Society||Beijing University of Science and Technology|
|Award Winner||2005 IBM Eclipse Innovation Grant||IBM Scholars Program|
|Invited Speaker||E-commerce Software Architecture and Implementation||College for Software Engineering, Graduate School of the Chinese Academy of Sciences|
|Professional Lecturer||E-commerce, Databases, Software Development, Compilers, Networking||Marist College|
|Adjunct Professor||Operating Systems||State University of New York at Westchester|
|Member||Curriculum Advisory Committee||State University of New York at Westchester|
|Guest Lecturer||Advanced Java Programming||Pace University|
|Adjunct Professor||Information and Data Management (Graduate)||Marist College|
|Adjunct Professor||Database Systems||State University of New York at Purchase|
|Adjunct Professor||Object-Oriented Programming in Java, Database Systems||Mount Saint Mary College|
|Language Study: Erlang||Undergraduate||Marist College|
|Introductory Programming with Games||Undergraduate||Marist College|
|Theory of Programming Languages||Undergraduate||Marist College|
|Operating Systems||Undergraduate/Graduate||Marist College|
|E-Commerce Development||Undergraduate||Marist College|
|Advanced Application Development||Undergraduate||Marist College|
|Compiler Design and Implementation||Undergraduate/Graduate||Marist College, Vassar College|
|Data Communications and Networks||Undergraduate/Graduate||Marist College|
|Operating Systems||Undergraduate||SUNY Westchester|
|Fundamentals of Database Systems||Undergraduate/Graduate||Mount Saint Mary College, SUNY Purchase, Marist College|
|Introduction to OOP in .Net||Undergraduate||Marist College|
|Language Study: ML||Undergraduate||Marist College|