Publications
-
Optimization of molecules via deep reinforcement learning
Scientific Reports
We present a framework, which we call Molecule Deep Q-Networks (MolDQN), for molecule optimization by combining domain knowledge of chemistry and state-of-the-art reinforcement learning techniques (double Q-learning and randomized value functions). We directly define modifications on molecules, thereby ensuring 100% chemical validity. Further, we operate without pre-training on any dataset to avoid possible bias from the choice of that set. MolDQN achieves comparable or better performance against several other recently published algorithms for benchmark molecular optimization tasks. However, we also argue that many of these tasks are not representative of real optimization problems in drug discovery. Inspired by problems faced during medicinal chemistry lead optimization, we extend our model with multi-objective reinforcement learning, which maximizes drug-likeness while maintaining similarity to the original molecule. We further show the path through chemical space to achieve optimization for a molecule to understand how the model works.
-
Can exact conditions improve machine-learned density functionals?
The Journal of Chemical Physics
Historical methods of functional development in density functional theory have often been guided by analytic conditions that constrain the exact functional one is trying to approximate. Recently, machine-learned functionals have been created by interpolating the results from a small number of exactly solved systems to unsolved systems that are similar in nature. For a simple one-dimensional system, using an exact condition, we find improvements in the learning curves of a machine learning approximation to the non-interacting kinetic energy functional. We also find that the significance of the improvement depends on the nature of the interpolation manifold of the machine-learned functional.
-
Efficient prediction of 3D electron densities using machine learning
NeurIPS 2018 Workshop on Machine Learning for Molecules and Materials
The Kohn-Sham scheme of density functional theory is one of the most widely used methods to solve electronic structure problems for a vast variety of atomistic systems across different scientific fields. While the method is fast relative to other first principles methods and widely successful, the computational time needed is still not negligible, making it difficult to perform calculations for very large systems or over long time-scales. In this submission, we revisit a machine learning model capable of learning the electron density and the corresponding energy functional based on a set of training examples. It allows us to bypass solving the Kohn-Sham equations, providing a significant decrease in computation time. We specifically focus on the machine learning formulation of the Hohenberg-Kohn map and its decomposability. We give results and discuss challenges, limits and future directions.
-
Tensor Field Networks: Rotation- and Translation-Equivariant Neural Networks for 3D Point Clouds
arXiv:1802.08219
We introduce tensor field neural networks, which are locally equivariant to 3D rotations, translations, and permutations of points at every layer. 3D rotation equivariance removes the need for data augmentation to identify features in arbitrary orientations. Our network uses filters built from spherical harmonics; due to the mathematical consequences of this filter choice, each layer accepts as input (and guarantees as output) scalars, vectors, and higher-order tensors, in the geometric sense of these terms. We demonstrate the capabilities of tensor field networks with tasks in geometry, physics, and chemistry.
-
Bypassing the Kohn-Sham equations with machine learning
Nature Communications
Last year, at least 30,000 scientific papers used the Kohn-Sham scheme of density functional theory to solve electronic structure problems in a wide variety of scientific fields, ranging from materials science to biochemistry to astrophysics. Machine learning holds the promise of learning the kinetic energy functional via examples, by-passing the need to solve the Kohn-Sham equations. This should yield substantial savings in computer time, allowing either larger systems or longer time-scales to be tackled, but attempts to machine-learn this functional have been limited by the need to find its derivative. The present work overcomes this difficulty by directly learning the density-potential and energy-density maps for test systems and various molecules. Both improved accuracy and lower computational cost with this method are demonstrated by reproducing DFT energies for a range of molecular geometries generated during molecular dynamics simulations. Moreover, the methodology could be applied directly to quantum chemical calculations, allowing construction of density functionals of quantum-chemical accuracy.
-
Lazy stochastic principal component analysis
IEEE International Conference on Data Mining Workshop
Stochastic principal component analysis (SPCA) has become a popular dimensionality reduction strategy for large, high-dimensional datasets. We derive a simplified algorithm, called Lazy SPCA, which has reduced computational complexity and is better suited for large-scale distributed computation. We prove that SPCA and Lazy SPCA find the same approximations to the principal subspace, and that the pairwise distances between samples in the lower-dimensional space are invariant to whether SPCA is executed lazily or not. Empirical studies find downstream predictive performance to be identical for both methods, and superior to random projections, across a range of predictive models (linear regression, logistic lasso, and random forests). In our largest experiment with 4.6 million samples, Lazy SPCA reduced 43.7 hours of computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix multiplications, besides an operation on a small square matrix whose size depends only on the target dimensionality.
-
Pure density functional for strong correlations and the thermodynamic limit from machine learning
Phys. Rev. B
We use the density-matrix renormalization group, applied to a one-dimensional model of continuum Hamiltonians, to accurately solve chains of hydrogen atoms of various separations and numbers of atoms. We train and test a machine-learned approximation to F[n], the universal part of the electronic density functional, to within quantum chemical accuracy. We also develop a data-driven, atom-centered basis set for densities which greatly reduces the computational cost and accurately represents the physical information in the machine-learning calculation. Our calculation (a) bypasses the standard Kohn-Sham approach, avoiding the need to find orbitals, (b) includes the strong correlation of highly stretched bonds without any specific difficulty (unlike all standard DFT approximations), and (c) is so accurate that it can be used to find the energy in the thermodynamic limit to quantum chemical accuracy.
-
Understanding kernel ridge regression: Common behaviors from simple functions to density functionals
Int. J. Quant. Chem.
Accurate approximations to density functionals have recently been obtained via machine learning (ML). By applying ML to a simple function of one variable without any random sampling, we extract the qualitative dependence of errors on hyperparameters. We find universal features of the behavior in extreme limits, including both very small and very large length scales, and the noise-free limit. We show how such features arise in ML models of density functionals.
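As a rough illustration of the kind of experiment this abstract describes, the sketch below fits kernel ridge regression with a Gaussian kernel to a function of one variable sampled on an evenly spaced grid (no random sampling) and reports the maximum test error for a given length scale and regularization strength. The target function `cos(2*pi*x)`, the grid sizes, and the hyperparameter names `sigma` and `lam` are assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between two sets of 1D points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * sigma**2))

def krr_fit(x_train, y_train, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights."""
    K = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

def krr_predict(x_new, x_train, alpha, sigma):
    """Evaluate the fitted model at new points."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha

def max_error(sigma, lam, n_train=20, n_test=200):
    """Maximum absolute test error for one hyperparameter setting."""
    f = lambda x: np.cos(2.0 * np.pi * x)    # assumed toy target
    x_train = np.linspace(0.0, 1.0, n_train)  # evenly spaced, noise-free
    x_test = np.linspace(0.0, 1.0, n_test)
    alpha = krr_fit(x_train, f(x_train), sigma, lam)
    return np.max(np.abs(krr_predict(x_test, x_train, alpha, sigma) - f(x_test)))

err_moderate = max_error(sigma=0.1, lam=1e-6)   # length scale near the grid spacing
err_tiny = max_error(sigma=1e-3, lam=1e-6)      # length scale far below the grid spacing
```

With a length scale far below the grid spacing, the kernel matrix is close to the identity and the model predicts roughly zero between training points, which is one of the extreme-limit behaviors the abstract refers to.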
-
Understanding machine-learned density functionals
Int. J. Quant. Chem.
Kernel ridge regression is used to approximate the kinetic energy of non-interacting fermions in a one-dimensional box as a functional of their density. The properties of different kernels and methods of cross-validation are explored, and highly accurate energies are achieved. Accurate constrained optimal densities are found via a modified Euler-Lagrange constrained minimization of the total energy. A projected gradient descent algorithm is derived using local principal component analysis. Additionally, a sparse grid representation of the density can be used without degrading the performance of the methods. The implications for machine-learned density functional approximations are discussed.
-
Graded index photonic hole: Analytical and rigorous full wave solution
Physical Review B
We present a rigorous full wave approach to the omnidirectional photonic hole (PH), an optical system inspired by celestial phenomena and characterized by a radially graded refractive index $n(r) \sim 1/r^{\alpha/2}$. It is analytically demonstrated that light capture is effective for $\alpha \ge \alpha_c = 2$. Our analyses are corroborated by precise numerical simulations of steady-state and time-evolution behaviors.
Patents
-
Protecting devices from malicious files based on n-gram processing of sequential data
Issued US 15490797
Under one aspect, a method is provided for protecting a device from a malicious file. The method can be implemented by one or more data processors forming part of at least one computing device and can include extracting from the file, by at least one data processor, sequential data comprising discrete tokens. The method also can include generating, by at least one data processor, n-grams of the discrete tokens. The method also can include generating, by at least one data processor, a vector of weights based on respective frequencies of the n-grams. The method also can include determining, by at least one data processor and based on a statistical analysis of the vector of weights, that the file is likely to be malicious. The method also can include initiating, by at least one data processor and responsive to determining that the file is likely to be malicious, a corrective action.
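The n-gram and weight-vector steps named in this abstract can be sketched roughly as follows. The token stream, the fixed `vocabulary` of n-grams, and the choice of relative frequency as the weight are all hypothetical, chosen only to make the example concrete. The statistical analysis and corrective-action steps are left out because the abstract does not specify them.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_weights(tokens, n, vocabulary):
    """Relative frequency of each vocabulary n-gram in the sequence.

    Using relative frequency as the weight is an assumption for this sketch.
    """
    counts = Counter(ngrams(tokens, n))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[gram] / total for gram in vocabulary]

# Hypothetical sequential data extracted from a file (opcode-like tokens).
tokens = ["push", "mov", "call", "push", "mov", "ret"]
# Hypothetical fixed vocabulary that defines the vector positions.
vocabulary = [("push", "mov"), ("mov", "call"), ("call", "ret")]

weights = ngram_weights(tokens, 2, vocabulary)
```

A fixed vocabulary keeps the vector length the same for every file, so the vectors can be compared or passed to a downstream model.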
Projects
-
State Farm Distracted Driver Detection @ Kaggle.com
Using deep learning to detect drivers' distracted behaviors automatically from dashboard cameras.
- Rank 90th/1440 (top 7%).
- Because there are only 26 unique drivers in the training set, it is very easy to overfit. Two pre-trained models are used.
- Fine-tune the VGG-16 and VGG-19 networks with different cross-validation strategies.
- Ensemble the 4 best convolutional neural network models.
-
Facebook V: Predicting Check Ins @Kaggle.com
Identify the correct place for check-ins in an artificial world consisting of more than 100,000 places located in a 10 km by 10 km square.
- Rank 5th/1212.
- Write own framework code for easy and fast model selection and ensembling.
- Ensemble model of k-nearest neighbors, random forest, extra-trees, gradient boosting trees, naive Bayes, kernel density estimation.
- Detailed solution explanation:
https://www.kaggle.com/c/facebook-v-predicting-check-ins/forums/t/22112/5th-place-solution
-
Expedia Hotel Recommendations @ Kaggle.com
Contextualize customer data and predict the likelihood a user will stay at 100 different hotel groups.
- Top 2%. Rank 40th/1974.
-
Home Depot Product Search Relevance @Kaggle.com
Predict the relevance of search results from product title, description, search_term and attribute files.
- Top 2%. Rank 44th/2125.
- Impute important data (e.g. brand, material...) using a customized local dictionary built from existing data. Improve the percentage of data having a brand from 80.9% to 99.4%.
- Achieve professional and accurate spell correction by crawling Google. Correct 13% typos in search term.
- Feature engineering from natural language, including semantic analysis, word vectors (from spacy and local data TF-IDF, bag of words), string distance (cosine similarity, Dice distance, Jaccard distance), statistical distance, and co-occurrence. Inter-feature and intra-feature distributions are considered for the distance measures.
- Customize stratified cross-validation. Reduce variance by over half (~0.0040 to ~0.0017).
- Take advantage of different models: gradient boosting tree (xgboost), neural network (keras), random forest (sklearn), ridge regression (sklearn) and lasso regression (sklearn).
- Optimize each model with automatic parameter selection processes (hyperopt).
- Ensemble by stacking metafeatures and important raw features. Metafeatures are from the predictions of 15 models, and the important raw features are 959 features with correlation to the label greater than 0.05.
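Two of the string distances named in the feature list above can be sketched on word sets as follows. Comparing lowercased, whitespace-separated tokens is an assumption made for this example (the project may have compared characters or n-grams instead), and the query and product title are made up.

```python
def _token_set(text):
    """Lowercased whitespace tokens of a string, as a set."""
    return set(text.lower().split())

def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| on token sets; 0.0 when both are empty."""
    sa, sb = _token_set(a), _token_set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)

def dice_distance(a, b):
    """1 - 2|A & B| / (|A| + |B|) on token sets; 0.0 when both are empty."""
    sa, sb = _token_set(a), _token_set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - 2.0 * len(sa & sb) / (len(sa) + len(sb))

# Hypothetical search term and product title.
query = "angle bracket"
title = "Simpson Strong-Tie Angle Bracket"

features = {
    "jaccard": jaccard_distance(query, title),
    "dice": dice_distance(query, title),
}
```

Both distances are 0 for identical token sets and 1 for disjoint ones, so they can be used side by side as relevance features.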
-
Airbnb: New User Bookings @Kaggle.com
Predict users' first booking destinations from user profiles and web session logs.
- Rank 43rd/1463. Top 2.9%.
- Improve the accuracy of gradient boosting tree (xgboost) and random forest (sklearn) predictions by feature selection and engineering, mainly on age, timestamp and sessions data.
- Apply n-gram, tf-idf, NMF and PCA to extract features from web sessions data.
- Ensemble model by bagging and stacking.
-
Prudential Life Insurance Assessment: Classifying Risk @Kaggle.com
Developing a predictive model that accurately classifies risk 1 - 8 from over a hundred variables describing attributes of life insurance applicants.
- Rank 158th/2613. Top 6%.
- For this data set, xgboost performance is very sensitive to hyperparameters. Apply stacking to eliminate the sensitivity to parameters and reduce the risk of overfitting. Local cross-validation scores improve from ~0.61 to ~0.64.
- As an ordinal regression problem, improve the offset optimization by 3-fold cross-validation with back-and-forth scanning. Local score improves to ~0.688. Bag 5 models with different random seeds to improve stability.
Honors & Awards
-
Kaggle Master
Kaggle
A Kaggle competitor with consistent and stellar competition results.
Consistency: at least 2 Top 10% finishes in public competitions
Excellence: at least 1 of those finishes in the top 10 positions
-
The Regents’ Fellowship
University of California, Irvine
-
The Regents’ Fellowship
University of California, Irvine
-
Chinese National Scholarship
Ministry of Education of the People's Republic of China
Languages
-
English
-
Chinese
-
Shanghainese