

Past Projects


Astrostatistics

1) Two Methods to Remove Blaze Function from Echelle Spectrum Orders

Python implementation of the Alpha-shape Fitting to Spectrum algorithm (AFS) and the Alpha-shape and Lab Source Fitting to Spectrum algorithm (ALSFS) proposed by Xin Xu et al. (2019), whose paper is available here. Both algorithms can be used to flatten the continuum of a spectrum, an important step in spectroscopic data analysis.

Both algorithms use an alpha shape to approximate the shape of a spectrum and estimate its blaze function with local polynomial regression. The key difference between them is that AFS does not require a lab source spectrum, whereas ALSFS is preferred when such a reference is available. You can check out a working demo of both algorithms here.
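The idea behind AFS can be sketched in NumPy under simplifying assumptions. This is not the linked implementation: the upper convex hull of the spectrum stands in for the alpha-shape boundary (the large-radius limit of an alpha shape), a tricube-weighted local quadratic fit stands in for the local polynomial regression, and a single global quantile threshold `q` stands in for the way the published algorithm selects continuum pixels. The names `upper_hull`, `local_poly`, and `flatten_order` and their parameters are hypothetical.

```python
import numpy as np

def upper_hull(x, y):
    """Indices of the upper convex hull of (x, y), with x sorted ascending.
    Assumption: stands in for the alpha-shape boundary (large-radius limit)."""
    hull = []
    for i in range(len(x)):
        # drop the last hull point while it lies on or below the chord
        # from its predecessor to point i
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross < 0:
                break
            hull.pop()
        hull.append(i)
    return np.array(hull)

def local_poly(x_fit, y_fit, x_eval, frac=0.3, degree=2):
    """Tricube-weighted local polynomial regression evaluated at x_eval."""
    n = len(x_fit)
    k = max(degree + 2, int(np.ceil(frac * n)))
    out = np.empty(len(x_eval))
    for j, x0 in enumerate(x_eval):
        d = np.abs(x_fit - x0)
        h = np.sort(d)[min(k, n - 1)]  # bandwidth: distance to the k-th nearest point
        w = np.clip(1 - (d / h) ** 3, 0, None) ** 3  # tricube weights
        # np.polyfit multiplies residuals by w, so pass sqrt of the LS weights;
        # centring on x0 makes the constant term the fitted value at x0
        coeffs = np.polyfit(x_fit - x0, y_fit, degree, w=np.sqrt(w))
        out[j] = coeffs[-1]
    return out

def flatten_order(wave, flux, frac=0.3, q=0.5):
    """Divide one echelle order by an estimated blaze function (simplified AFS)."""
    hull = upper_hull(wave, flux)
    # step 1: primitive blaze estimate from the points on the upper boundary
    primitive = local_poly(wave[hull], flux[hull], wave, frac=frac)
    # step 2: keep pixels close to the primitive estimate (likely continuum)
    ratio = flux / primitive
    keep = ratio >= np.quantile(ratio, q)
    # step 3: refit on the selected pixels to get the final blaze estimate
    blaze = local_poly(wave[keep], flux[keep], wave, frac=frac)
    return flux / blaze
```

Using the convex hull keeps the sketch dependency-free; swapping in a true alpha shape (which can follow concave dips in the continuum) is the part the real algorithm adds.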

2) Quasar Variability

Quasars are highly variable astronomical sources, and many statistical metrics (variability indices) quantify how variable a quasar is. We computed eleven variability indices to compare the variability of radio-quiet and radio-loud quasars. Most of the indices are distributed similarly in the two populations. For two of them, however, the robust median statistic and the SB variability index, both the two-sample Kolmogorov-Smirnov test and the Anderson-Darling test reject the null hypothesis that the two samples come from the same population with p < 0.01, even though the distributions of these two metrics look alike in the following histograms. This result may point to physical differences in accretion behavior between these two populations of quasars.

[Histograms: Robust Median; SB Index]
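The two-sample comparison for a single variability index can be sketched with SciPy's `ks_2samp` and `anderson_ksamp`. The inputs below are synthetic draws, not our quasar measurements; `compare_populations` is a hypothetical helper, and the 1% threshold mirrors the text.

```python
import numpy as np
from scipy import stats

def compare_populations(index_a, index_b, alpha=0.01):
    """Run two-sample KS and Anderson-Darling tests on one variability index."""
    ks = stats.ks_2samp(index_a, index_b)
    # with the default method, SciPy's AD p-value is floored at 0.001 and
    # capped at 0.25 (a warning is issued when that happens)
    _, _, ad_p = stats.anderson_ksamp([index_a, index_b])
    return {
        "ks_pvalue": ks.pvalue,
        "ad_pvalue": ad_p,
        "reject_ks": ks.pvalue < alpha,
        "reject_ad": ad_p < alpha,
    }

# Hypothetical example: two synthetic "index" samples with shifted means
rng = np.random.default_rng(0)
radio_quiet = rng.normal(0.0, 1.0, 500)
radio_loud = rng.normal(0.5, 1.0, 500)
print(compare_populations(radio_quiet, radio_loud))
```

Running both tests is useful because the KS statistic is most sensitive near the center of the distributions, while Anderson-Darling gives more weight to the tails.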

Deep Learning and Medical Imaging

My recent medical imaging project involves building deep learning models to detect abnormalities in patients' X-ray images. I implemented multiple customized DenseNet and ResNet models and applied ensemble learning methods to improve predictive performance. On the test set of the Stanford MURA dataset, our ensemble models achieved an AUC of 0.92 and a kappa of 0.75, outperforming the baseline model (kappa 0.705) reported in the original 2018 MURA paper.
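As one hedged illustration of how an ensemble's scores and the two reported metrics fit together, the sketch below averages per-model abnormality probabilities (soft voting, our assumption, since the specific ensemble methods are not stated) and computes AUC and Cohen's kappa in plain NumPy. The function names and the toy numbers are hypothetical; no DenseNet or ResNet model is involved.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Average per-model abnormality probabilities (optionally weighted)."""
    return np.average(np.asarray(prob_list, dtype=float), axis=0, weights=weights)

def roc_auc(y_true, scores):
    """AUC = P(random positive scores higher than random negative); ties count 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (pos.size * neg.size)

def cohen_kappa(y_true, y_pred):
    """Binary Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    p_obs = np.mean(y_true == y_pred)
    p_chance = np.mean(y_true) * np.mean(y_pred) + np.mean(1 - y_true) * np.mean(1 - y_pred)
    return (p_obs - p_chance) / (1 - p_chance)

# Toy example: two hypothetical models that each misrank one study
y = np.array([0, 0, 1, 1])
model_a = [0.1, 0.6, 0.4, 0.9]
model_b = [0.2, 0.2, 0.8, 0.7]
ens = soft_vote([model_a, model_b])
print(roc_auc(y, model_a), roc_auc(y, ens))  # 0.75 1.0
print(cohen_kappa(y, ens >= 0.5))  # 1.0
```

Kappa needs hard labels, so it depends on the 0.5 threshold chosen here, while AUC is threshold-free.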

Sample saliency maps from our current deep learning models can be found here. We are currently training GAN models to perform image morphing in order to investigate the common characteristics of abnormal X-ray images.

Expository Writings in Probability/Statistics

1) Robust Estimation of Wasserstein Distance via Factored Couplings

The goal of this project is to provide a detailed exposition of a recent paper on statistical optimal transport, in which the authors propose a robust estimator of the Wasserstein distance between two probability distributions under high-dimensional sampling noise.

A challenging problem in optimal transport is estimating the Wasserstein distance between two probability distributions: the naive plug-in estimator suffers greatly from the curse of dimensionality and from sampling noise. One interesting feature of this paper is that, instead of adding an entropic penalty to alleviate the noise, the authors impose an intuitive structural assumption on the admissible couplings, which yields a more robust estimator of the 2-Wasserstein distance.
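The naive plug-in estimator can be sketched for two equal-size samples. With uniform weights, an optimal coupling between the two empirical distributions can be taken to be a permutation, so the estimate reduces to an assignment problem, solved here with SciPy's `linear_sum_assignment`. This shows only the baseline; the paper's factored-coupling estimator is not implemented, and `plugin_w2` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def plugin_w2(X, Y):
    """Plug-in 2-Wasserstein distance between the empirical distributions of
    two equal-size samples (rows of X and Y, uniform weights)."""
    if X.shape != Y.shape:
        raise ValueError("this sketch assumes equal sample sizes and dimensions")
    # squared Euclidean cost between every pair of sample points
    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    # for uniform weights an optimal coupling is a permutation matrix
    rows, cols = linear_sum_assignment(cost)
    return np.sqrt(cost[rows, cols].mean())

# Hypothetical check: translating a sample by a vector c moves it by exactly |c|
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
Y = X + np.array([1.0, 2.0, 2.0])
print(plugin_w2(X, Y))
```

Applied to two independent samples from the same distribution, this function returns a strictly positive value; that gap from the true distance of zero is the sampling bias the paragraph above describes.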

2) Hausdorff Dimension and Fractal Properties of Brownian Path

Abstract: Hausdorff dimension is a convenient tool for describing how much space a set occupies in d-dimensional Euclidean space. It is based on Hausdorff measure and has the advantage of being well-defined for every set. In this paper, we give an introduction to Hausdorff dimension and measure, along with multiple useful examples and techniques. In Section 4, we apply Hausdorff dimension to the study of the fractal properties of Brownian paths. The mass distribution principle, the energy method, and Frostman's lemma are introduced to determine the Hausdorff dimensions of the zero set, range, and graph of a Brownian path.
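For reference, a standard form of the definitions and lower-bound tools named in the abstract, together with the classical almost-sure dimensions for a standard Brownian motion $B$ in $\mathbb{R}^d$. The notation here is ours and may differ from the paper's.

```latex
% s-dimensional Hausdorff measure and Hausdorff dimension
\mathcal{H}^s_\delta(A) = \inf\Big\{ \sum_i |U_i|^s : A \subseteq \bigcup_i U_i,\ |U_i| \le \delta \Big\},
\qquad
\mathcal{H}^s(A) = \lim_{\delta \to 0} \mathcal{H}^s_\delta(A),
\qquad
\dim_H A = \inf\{ s \ge 0 : \mathcal{H}^s(A) = 0 \}.

% Mass distribution principle: if \mu(A) > 0 and \mu(V) \le C |V|^\alpha
% for every Borel V with |V| \le \delta, then
\mathcal{H}^\alpha(A) \ge \mu(A)/C > 0 \quad\Longrightarrow\quad \dim_H A \ge \alpha.

% Energy method: for a mass distribution \mu on A,
I_\alpha(\mu) = \iint \frac{d\mu(x)\, d\mu(y)}{|x - y|^\alpha} < \infty
\quad\Longrightarrow\quad \dim_H A \ge \alpha.

% Classical almost-sure results
\dim_H \{ t \in [0,1] : B(t) = 0 \} = \tfrac{1}{2} \ (d = 1), \qquad
\dim_H B([0,1]) = 2 \ (d \ge 2), \qquad
\dim_H \operatorname{graph}\big(B|_{[0,1]}\big) = \tfrac{3}{2} \ (d = 1).
```

The two principles give lower bounds; matching upper bounds typically come from explicit covers built from the Hölder continuity of Brownian paths.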

3) Convex Optimization on Fastest Mixing Markov Chain

I explored the problem of finding the transition probabilities on the edges of a graph that give the fastest-mixing Markov chain, as posed by Stephen Boyd, Persi Diaconis, and Lin Xiao. The novel part of this project is that I used weak duality to derive the optimal transition probability matrix P* for star graphs analytically (see Conjecture 5.2 of my project paper). My proof of this conjecture was inspired by the proof of a similar result for line graphs. In addition, I present a detailed derivation of the dual of the semidefinite program (see Theorem 4.1), which the original authors omitted.
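As a numerical sanity check rather than the weak-duality argument, the sketch below evaluates the objective of the fastest-mixing problem, the second largest eigenvalue modulus (SLEM) of a symmetric transition matrix, and grid-searches a star graph under the simplifying assumption that every edge carries the same probability p. The helpers `slem` and `star_transition_matrix` are hypothetical names.

```python
import numpy as np

def slem(P):
    """Second largest eigenvalue modulus of a symmetric stochastic matrix:
    max(lambda_2, -lambda_n), the quantity the FMMC problem minimizes."""
    eig = np.linalg.eigvalsh(P)  # ascending; eig[-1] is the eigenvalue 1
    return max(eig[-2], -eig[0])

def star_transition_matrix(m, p):
    """Symmetric transition matrix on a star with center 0 and m leaves,
    assuming the same probability p on every edge (needs 0 <= p <= 1/m)."""
    P = np.zeros((m + 1, m + 1))
    P[0, 1:] = p
    P[1:, 0] = p
    P[0, 0] = 1 - m * p  # self-loop at the center keeps rows summing to 1
    leaves = np.arange(1, m + 1)
    P[leaves, leaves] = 1 - p  # self-loop at each leaf
    return P

# Grid search over the feasible range of p for a star with m = 4 leaves
m = 4
grid = np.linspace(0.0, 1.0 / m, 101)
vals = [slem(star_transition_matrix(m, p)) for p in grid]
best_p = grid[int(np.argmin(vals))]
print(best_p, min(vals))
```

A brute-force check like this only covers the symmetric family it searches; a duality argument is what certifies optimality over all feasible P.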

Undergrad Theses in Math and Computer Science

1) The Chevalley-Warning Theorem: Its Proofs, Generalisations, and Applications

Abstract: The Chevalley-Warning Theorem states that if the degrees of a system of polynomials in n variables over a finite field sum to less than n, then the number of common zeros of these polynomials is a multiple of the characteristic of the field; in particular, if the system has one common zero, it has another. In this paper, we give an introduction to the Chevalley-Warning Theorem and walk through different proofs of it. We also discuss how mathematicians have recently generalized this theorem to produce more powerful results, and we analyze the proofs of four distinct generalizations of Chevalley-type theorems. Finally, we give a collection of applications of the Chevalley-Warning Theorem to different fields of mathematics, such as combinatorics, group theory, graph theory, and affine geometry.
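The counting statement can be checked by brute force on tiny examples over a prime field F_p = Z/pZ. In this sketch, polynomials are passed as Python functions evaluated on integer tuples and reduced mod p; `count_common_zeros` is a hypothetical helper, and the examples are ours.

```python
from itertools import product

def count_common_zeros(polys, n, p):
    """Brute-force count of points x in F_p^n where every polynomial vanishes mod p."""
    return sum(
        all(f(x) % p == 0 for f in polys)
        for x in product(range(p), repeat=n)
    )

# deg f = 2 < n = 3: the theorem predicts a count divisible by 3
f = lambda x: x[0] ** 2 + x[1] ** 2 + x[2] ** 2
print(count_common_zeros([f], n=3, p=3))  # 9

# deg g = 2 = n: the hypothesis fails, and only the trivial zero survives
g = lambda x: x[0] ** 2 + x[1] ** 2
print(count_common_zeros([g], n=2, p=3))  # 1

# a system over F_2 with degrees 1 + 2 = 3 < n = 4: the count must be even
h1 = lambda x: x[0] + x[1] + x[2] + x[3]
h2 = lambda x: x[0] * x[1] + x[2] * x[3]
print(count_common_zeros([h1, h2], n=4, p=2) % 2)  # 0
```

The second example shows that the degree condition cannot simply be dropped: over F_3, x^2 + y^2 has only the zero (0, 0), since -1 is not a square mod 3.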

2) 3D Reconstruction

In this project, we implemented 3DKinect, an easy-to-use software tool that streamlines the essential components of the 3D reconstruction pipeline. Our software has five major functionalities: (a) upload, visualize, capture, and save point cloud data; (b) view recorded RGB-D images and rotate and zoom the point cloud on screen with the mouse; (c) change the size and color of the points on screen; (d) edit and crop existing point clouds; and (e) stitch multiple point clouds together using the Iterative Closest Point (ICP) algorithm and demonstrate the registration process step by step.
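The registration step in (e) can be illustrated with a bare-bones point-to-point ICP in NumPy: match each source point to its nearest target point, solve for the best rigid motion with the SVD-based Kabsch method, apply it, and repeat. This is a hedged sketch, not 3DKinect's code; `icp` and `best_rigid_transform` are hypothetical names, matching is brute force, and there is no outlier rejection, so it only suits small, roughly pre-aligned clouds.

```python
import numpy as np

def best_rigid_transform(A, B):
    """Least-squares rotation R and translation t with R @ a + t ~ b
    for paired rows of A and B (Kabsch method via SVD)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    # flip the last axis if needed so R is a proper rotation, not a reflection
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cb - R @ ca

def icp(source, target, max_iter=50, tol=1e-12):
    """Point-to-point ICP; returns the accumulated R, t aligning source to target."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = source.copy()
    prev_err = np.inf
    for _ in range(max_iter):
        # brute-force nearest neighbours (a k-d tree would be used for real scans)
        d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        R, t = best_rigid_transform(current, target[nn])
        current = current @ R.T + t
        # compose: x -> R (R_total x + t_total) + t
        R_total, t_total = R @ R_total, R @ t_total + t
        err = d2.min(axis=1).mean()  # mean squared match distance before this step
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total

# Hypothetical example: recover a small known rotation about z plus a shift
rng = np.random.default_rng(42)
src = rng.uniform(-0.5, 0.5, size=(60, 3))
theta = np.deg2rad(3.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.01, -0.02, 0.01])
tgt = src @ R_true.T + t_true
R_est, t_est = icp(src, tgt)
```

ICP only converges locally, which is why a coarse initial alignment (or a small motion between captures, as here) matters in practice.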

You can also check out the poster for our project.

The following video is a short demo of 3D registration of a stationary chair using our 3DKinect software.