Hoang Bui's Homepage

Data Management for Exascale Scientific Applications

My current research focuses on optimizing coupled scientific workflow applications on high-end computing platforms. My work enables in-situ/in-transit execution of user-defined data analysis operations as part of coupled simulation-analysis workflows, and uses data-aware task mapping and scheduling to reduce network data movement.

Publications:

  • Tong Jin, Fan Zhang, Qian Sun, Melissa Romanus, Hoang Bui, Manish Parashar,
    Towards autonomic data management for staging-based coupled scientific workflows,
    Journal of Parallel and Distributed Computing, 2020.
  • Fan Zhang, Tong Jin, Qian Sun, Melissa Romanus, Hoang Bui, Scott Klasky, Manish Parashar,
    In-memory staging and data-centric task placement for coupled scientific simulation workflows,
    Concurrency and Computation: Practice and Experience, 2017.
  • Melissa Romanus, Fan Zhang, Tong Jin, Qian Sun, Hoang Bui, Manish Parashar, Jong Choi, Saloman Janhunen, Robert Hager, Scott Klasky, et al.,
    Persistent data staging services for data intensive in-situ scientific workflows,
    Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing, 2016.
  • Qian Sun, Tong Jin, Melissa Romanus, Hoang Bui, Fan Zhang, Hongfeng Yu, Hemanth Kolla, Scott Klasky, Jacqueline Chen, Manish Parashar,
    Adaptive data placement for staging-based coupled scientific workflows,
    Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2015.
  • Tong Jin, Fan Zhang, Qian Sun, Hoang Bui, Norbert Podhorszki, Scott Klasky, Hemanth Kolla, Jackie Chen, Robert Hager, C.S. Chang, Manish Parashar,
    Exploring Data Staging Across Deep Memory Hierarchies for Coupled Data Intensive Simulation Workflows,
    In Proc. of the 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS'15), Hyderabad, India, May 2015.
  • Ciprian Docan, Fan Zhang, Tong Jin, Hoang Bui, Qian Sun, Julian Cummings, Norbert Podhorszki, Scott Klasky, Manish Parashar,
    ActiveSpaces: Exploring dynamic code deployment for extreme scale data processing,
    In Concurrency and Computation: Practice and Experience (CCPE), November 2014.
  • Qian Sun, Fan Zhang, Tong Jin, Hoang Bui, Kesheng Wu, Arie Shoshani, Hemanth Kolla, Scott Klasky, Jacqueline Chen, Manish Parashar,
    Scalable Run-time Data Indexing and Querying for Scientific Simulations,
    In Big Data Analytics: Challenges and Opportunities (BDAC-14) Workshop at Supercomputing Conference, New Orleans, Louisiana, U.S.A., November, 2014.
  • Solomon Lasluisa, Fan Zhang, Tong Jin, Ivan Rodero, Hoang Bui, Manish Parashar,
    In-situ feature-based objects tracking for data-intensive scientific and enterprise analytics workflows,
    In Cluster Computing, September 2014.
  • Tong Jin, Fan Zhang, Qian Sun, Hoang Bui, Norbert Podhorszki, Scott Klasky, Hemanth Kolla, Jackie Chen, Robert Hager, C.S. Chang, Manish Parashar,
    Leveraging Deep Memory Hierarchies for Data Staging in Coupled Data Intensive Simulation Workflows,
    In IEEE Cluster 2014, Madrid, Spain, September 2014.
  • T. Jin, F. Zhang, Q. Sun, H. Bui, M. Parashar, H. Yu, S. Klasky, N. Podhorszki, H. Abbasi,
    Using Cross-Layer Adaptations for Dynamic Data Management in Large Scale Coupled Scientific Workflows,
    ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), Denver, Colorado, U.S.A., November 2013.
  • Fan Zhang, Solomon Lasluisa, Tong Jin, Ivan Rodero, Hoang Bui and Manish Parashar,
    In-situ Feature-based Objects Tracking for Large-Scale Scientific Simulations,
    International Workshop on Data-Intensive Scalable Computing Systems at SC12, Salt Lake City, Utah, USA, November 2012.
  • F. Zhang, C. Docan, H. Bui, M. Parashar, S. Klasky,
    XpressSpace: A Programming Framework for Coupling PGAS Simulation Codes,
    Concurrency and Computation: Practice and Experience, 2013.

Distributed Storage

For my PhD dissertation, I developed ROARS (Rich Object ARchive System), a distributed storage system for scientific repositories. ROARS uses a cluster filesystem for data archiving and an SQL database for persistent metadata caching. ROARS is currently used for BXGrid, a biometric data repository at the University of Notre Dame.

Publications:

  • Hoang Bui, Peter Bui, Patrick Flynn and Douglas Thain,
    ROARS: A Robust Object Archival System for Data Intensive Scientific Computing,
    Distributed and Parallel Databases, Springer, August 2012.
  • Hoang Bui,
    A Rich Metadata Filesystem for Scientific Data,
    Ph.D. Thesis, University of Notre Dame, May 2012.
  • Hoang Bui, Peter Bui, Patrick Flynn and Douglas Thain,
    ROARS: A Scalable Repository for Data Intensive Scientific Computing,
    The Third International Workshop on Data Intensive Distributed Computing at ACM HPDC 2010.
  • Hoang Bui, Diane Wright, Clarence Helm, Rachel Witty, Patrick Flynn and Douglas Thain,
    Towards Long Term Data Quality in a Large Scale Biometrics Experiment,
    Managing Data Quality for Collaborative Science at ACM HPDC 2010, June, 2010.
  • Hoang Bui, Michael Kelly, Christopher Lyon, Mark Pasquier, Deborah Thomas, Patrick Flynn, and Douglas Thain,
    Experience with BXGrid: A Data Repository and Computing Grid for Biometrics Research,
    Journal of Cluster Computing, 12(4), pages 373, April, 2009.
  • Hoang Bui, Deborah Thomas, Michael Kelly, Christopher Lyon, Douglas Thain, and Patrick J. Flynn,
    Poster: BXGrid: A Data Repository and Workflow Abstraction for Biometrics Research,
    IEEE International Conference on e-Science, pages 394-395, December, 2008.
  • Christopher Moretti, Hoang Bui, Karen Hollingsworth, Brandon Rich, Patrick Flynn, and Douglas Thain,
    All-Pairs: An Abstraction for Data Intensive Computing on Campus Grids,
    IEEE Transactions on Parallel and Distributed Systems, 21(1), pages 33-46, January, 2010.

Image Processing

For my master's thesis, I worked on reconstructing objects by matching 2D scans of their fragments. My research advisors were Dr. Catherine V. Stringfellow and Prof. Richard P. Simpson.

Publication:

  • Hoang Bui, YuChun Peng, Catherine Stringfellow, Richard Simpson, and Jeffery Hood,
    Matching 2D Fragments of Objects,
    Proceedings of the Int'l Conference on Computers and Their Applications in Industry and Engineering (CAINE-2008), November, 2008, Honolulu, Hawaii, USA, pp. 150-156.

Wireless Web Caching

As an undergraduate student, I worked with Dr. Nelson Luiz Passos to better understand content delivery in wireless environments and to develop a new approach for more efficient integrated caching and content delivery.

Publications:

  • R. Zuck, A. Williams, B. Kair, H. Bui, C. Stringfellow, and N. L. Passos,
    Adjusting Web Caching Computers to Reduce Communication Channel Allocation,
    Proceedings of the ISCA 17th International Conference on Computer Applications in Industry and Engineering, November, 2004, Orlando, Florida, pp. 17-20.
  • R. Zuck, A. Williams, B. Kair, H. Bui, C. Stringfellow, and N. L. Passos,
    Network Centric Improvements to Resource Caching,
    Proceedings of the IEEE Consumer Communications and Networking Conference, January, 2005, Las Vegas, Nevada, pp. 202-205 (nominated for the Best Paper Award).

Loop Transformations

My very first research experience was with Dr. Nelson Luiz Passos. I researched new methods to transform uniform nested loops so that they can be executed in parallel at the instruction level.

Publications:

  • P. Xue, H. Bui, A. Joseph, and N. L. Passos,
    Modeling and Retiming Non-Uniform Acyclic Loops,
    Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2003), November, 2003, Marina Del Rey, CA, pp. 328-332.
  • P. Xue, H. Bui and N. L. Passos,
    Instruction Level Parallelism of Non-Uniform Acyclic Loops,
    Proceedings of the 15th Annual CCSC South Central Conference, in the Journal of Computing in Small Colleges, April 2004, Austin, TX, pp. 279-286.
  • K. P. Mayfield, A. Joseph, S. Black, H. Bui, and N. L. Passos,
    Scheduling Multi-Dimensional Loops in a Computer Cluster,
    Proceedings of the 19th International Conference on Computer Applications in Industry and Engineering, November, 2006, Las Vegas, NV, pp. 78-82.