Topic | Publication | Summary / Abstract
An Introduction to Dynamic Data Quality Challenges | ACM Journal of Data and Information Quality (JDIQ) - 2017 January, Volume 8 Number 2, Pages 6:1-6:3, ACM Press (DOI: 10.1145/2998575) | We live in an evolving world. As time passes, data changes in content and structure, and thus becomes dynamic. Data quality, therefore, also becomes dynamic because it is an aggregate characteristic of data itself. Thus, our evolving world and Internet of Things (IoT) presents renewed challenges in data quality. IoT data is teeming with multivendor and multiprovider applications, devices, microservices, and automated processes built on social media, public and private datasets, digitized records, sensor logs, web logs, and much more. From intelligent traffic systems to smart healthcare devices, modern enterprises are inundated with a daily deluge of dynamic big data.
G* Studio: An Adventure in Graph Databases, Distributed Systems, and Software Development | ACM Inroads - 2016 June, Volume 7 Number 2, Pages 58-66, ACM Press (DOI: 10.1145/2896823) | The e-mail from the department chair was urgent. There were several graduate students with no classes to take. “Would somebody please run an independent study?” she asked. The semester was already a few days old. Alan had to strike fast. “I’m in,” he wrote, “I’ll put them to work on my graph database research.” With that, Alan and his new team, which would become known as the G-stars, began a two-semester adventure in graph databases, distributed systems, and software development that resulted in more than 8,000 lines of code over 520 Git commits. This is their story.
The G* Graph Database: Efficiently Managing Large Distributed Dynamic Graphs | DAPD - The Springer Journal of Distributed and Parallel Databases - Volume 33, Issue 4, pp 479-514 | From sensor networks to transportation infrastructure to social networks, we are awash in data. Many of these real-world networks tend to be large ("big data") and dynamic, evolving over time. Their evolution can be modeled as a series of graphs. Traditional systems that store and analyze one graph at a time cannot effectively handle the complexity and subtlety inherent in dynamic graphs. Modern analytics require systems capable of storing and processing series of graphs. We present such a system.

G* compresses dynamic graph data based on commonalities among the graphs in the series for deduplicated storage on multiple servers. In addition to the obvious space-saving advantage, large-scale graph processing tends to be I/O bound, so faster reads from and writes to stable storage enable faster results. Unlike traditional database and graph processing systems, G* executes complex queries on large graphs using distributed operators to process graph data in parallel. It speeds up queries on multiple graphs by processing graph commonalities only once and sharing the results across relevant graphs. This architecture not only provides scalability, but since G* is not limited to processing only what is available in RAM, its analysis capabilities are far greater than other systems which are limited to what they can hold in memory.

This paper presents G*'s design and implementation principles along with evaluation results that document its unique benefits over traditional graph processing systems.

A Demonstration of Query-Oriented Distribution and Replication Techniques for Dynamic Graph Data | 23rd International World Wide Web Conference (WWW 2014) | Evolving networks can be modeled as series of graphs that represent those networks at different points in time. Our G* system enables efficient storage and querying of these graph snapshots by taking advantage of their commonalities. In extending G* for scalable and robust operation, we found the classic challenges of data distribution and replication to be imbued with renewed significance. If multiple graph snapshots are commonly queried together, traditional techniques that distribute data over all servers or create identical data replicas result in inefficient query execution.
Efficient Top-K Closeness Centrality Search | 30th IEEE International Conference on Data Engineering (ICDE 2014) | Many of today's applications can benefit from the discovery of the most central entities in real-world networks. This paper presents a new technique that efficiently finds the K most central entities in terms of closeness centrality. Instead of computing the centrality of each entity independently, our technique shares intermediate results between centrality computations. Since the cost of each centrality computation may vary substantially depending on the choice of the previous computation, our technique schedules centrality computations in a manner that minimizes the estimated completion time. This technique also updates, with negligible overhead, an upper bound on the centrality of every entity. Using this information, our technique proactively skips entities that cannot belong to the final result. This paper presents evaluation results for actual networks to demonstrate the benefits of our technique.
Scalable and Robust Management of Dynamic Graph Data | First International Workshop on Big Dynamic Distributed Data (BD3) at the 39th International Conference on Very Large Data Bases (BD3@VLDB 2013) | Most real-world networks evolve over time. This evolution can be modeled as a series of graphs that represent a network at different points in time. Our G* system enables efficient storage and querying of these graph snapshots by taking advantage of the commonalities among them. We are extending G* for highly scalable and robust operation. This paper shows that the classic challenges of data distribution and replication are imbued with renewed significance given continuously generated graph snapshots. Our data distribution technique adjusts the set of worker servers for storing each graph snapshot in a manner optimized for popular queries. Our data replication approach maintains each snapshot replica on a different number of workers, making available the most efficient replica configurations for different types of queries.
Quickly Finding the k Most Central Entities in Large Networks | New England Database Summit 2013 | Many of today's applications can benefit from the discovery of the most central entities in real-world networks. Researchers have been developing techniques for finding the k most central entities in a network where the centrality of an entity is defined as the inverse of the average shortest path length from that entity to other entities. These previous techniques compute the centrality of each entity using a traditional single-source shortest path algorithm and then select k entities with the highest centrality values. Given a large network, however, these techniques incur high computational overhead. Our technique overcomes the above limitation. A key principle of our technique is to materialize intermediate results while a vertex's centrality is computed, and then reuse those results to speed up the computation of another vertex's centrality.
A Demonstration of the G* Graph Database System | The 29th International Conference on Data Engineering (ICDE 2013) | G* meets new challenges in managing multiple graphs while supporting fundamental graph querying capabilities by storing graphs on a large number of servers while compressing them based on their commonalities. It also allows users to easily express queries on graphs and efficiently execute those queries by sharing computations across graphs.
Computational Finance with Map-Reduce in Scala | The 18th International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2012) | This paper presents results of computational finance experiments using actor-based map-reduce in Scala. In general we observe superlinear speedup, super-efficiency, and evidence for a high degree of compute and I/O overlap end-to-end for different hardware platforms. These results should be of interest to academic researchers as well as industry practitioners.
A Browser-based Operating Systems project - JavaScript adventures in Dinosaur Slaying | Inroads, the ACM SIGCSE Bulletin Volume 41, Number 4, Pages 71-75, ACM Press, December 2009. | This paper presents one educator's experience with a browser-based project for an upper-level/graduate Operating Systems course. The author explains the project goals, why the browser in general and JavaScript in particular are so well suited for this task, challenges and their solutions, the incremental assignments that ultimately result in a fairly complex OS simulation by the end of the semester, the response to the project, and some ideas about where to go next.
A Game Design & Programming Concentration within the Computer Science Curriculum | Proceedings of the 36th SIGCSE Technical Symposium on Computer Science Education, ACM Press, Pages 545-550, 2005. | This paper describes initiatives to develop a Game Concentration in the undergraduate Computer Science curriculum. These initiatives contemplate recommendations for existing courses as well as adoption of new courses.
Case Study: Oracle database development for the New York State Office of Mental Health | Proceedings of the Ninth Annual Institute on Mental Health Management Information | Summarizes the approach and development methods used to build an information system that reduced costs in time and money in its first year.
Core Concepts in Delphi | Series of articles for the Unofficial Newsletter of Delphi Users | Covers fundamental computer science and programming concepts and illustrates them in the object-oriented programming language Delphi.


Role | Subject | Institution
Associate Professor | Compilers, Operating Systems, Graph and Relational Database Systems, Software Development | Marist College
Assistant Professor | Database Systems, Compilers, Operating Systems, Technology Entrepreneurship, Software Development | Marist College
Adjunct Professor | Compiler Design | Vassar College
Award Winner | 2009 IBM Faculty Scholarship | IBM Scholars Program
Sr. Professional Lecturer | Compilers, Functional Programming in Erlang and Scala, Operating Systems, Software Development Best Practices | Marist College
Invited Speaker | American Culture in an IT-Driven Society | Beijing University of Science and Technology
Award Winner | 2005 IBM Eclipse Innovation Grant | IBM Scholars Program
Invited Speaker | E-commerce Software Architecture and Implementation | College for Software Engineering, Graduate School of the Chinese Academy of Sciences
Professional Lecturer | E-commerce, Databases, Software Development, Compilers, Networking | Marist College
Adjunct Professor | Operating Systems | State University of New York at Westchester
Member | Curriculum Advisory Committee | State University of New York at Westchester
Guest Lecturer | Advanced Java Programming | Pace University
Adjunct Professor | Information and Data Management (Graduate) | Marist College
Adjunct Professor | Database Systems | State University of New York at Purchase
Adjunct Professor | Object-Oriented Programming in Java, Database Systems | Mount Saint Mary College

Courses Developed

Topic | Level | Institution
Language Study: Erlang | Undergraduate | Marist College
Introductory Programming with Games | Undergraduate | Marist College
Theory of Programming Languages | Undergraduate | Marist College
Operating Systems | Undergraduate/Graduate | Marist College
E-Commerce Development | Undergraduate | Marist College
Advanced Application Development | Undergraduate | Marist College
Compiler Design and Implementation | Undergraduate/Graduate | Marist College, Vassar College
Data Communications and Networks | Undergraduate/Graduate | Marist College
Operating Systems | Undergraduate | SUNY Westchester
Fundamentals of Database Systems | Undergraduate/Graduate | Mount Saint Mary College, SUNY Purchase, Marist College
Introduction to OOP in .Net | Undergraduate | Marist College
Language Study: ML | Undergraduate | Marist College