My Story
I am an Environmental Scientist and Data Scientist with a decade of dedicated experience in the environmental sector. I am committed
to leveraging my knowledge of geosciences, GIS, data science, machine learning, and contract management to tackle
complex challenges.
Skills
Python, SQL, Regex, Markdown, Pandas, Matplotlib, Seaborn, BeautifulSoup, Scikit-Learn,
NLTK, Git, NumPy, virtual environments, Scala.
Machine Learning, KNN, Random Forest, Neural Networks, TensorFlow, Keras, Jupyter, Google Colab, Databricks, Cluster Computing,
ETL.
Esri products, ArcGIS Pro, ArcGIS Online, Remote Sensing, QGIS, contract management, report writing, editing,
team leadership.
(Photo by USGS)
Below are a few projects that showcase some of my skills in Data Science and GIS.
Reddit v OpenAI
Air Quality Parameters
SAT v ACT
Blossom Aquifer
Lipan Aquifer
Reddit vs OpenAI
For this project I simulated a task that a data scientist might be
asked to do: estimate what proportion of a website's users are
human and what proportion are bots.
First, I collected approximately 5,000 question-answer pairs from Reddit
using the Python Reddit API Wrapper (PRAW). I then fed the questions collected
from Reddit to ChatGPT through the OpenAI API and collected its responses.
Using the responses from Reddit and ChatGPT, I trained multiple natural language
processing classifiers to predict whether a response came from a human
or an AI.
I used pipelines and grid searches to find optimal hyperparameter values
for the various models. Overall I fit 10 classification models, including
Multinomial Naive Bayes, Logistic Regression, Bernoulli Naive Bayes, and Linear
Support Vector Classification with CountVectorizer preprocessing, and Multinomial
Naive Bayes, Logistic Regression, Gaussian Naive Bayes, K-Nearest Neighbors, and
Random Forest with TfidfVectorizer preprocessing.
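The pipeline and grid search approach described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the toy texts, the labels, and the parameter grid are all hypothetical stand-ins, and only one of the vectorizer and classifier combinations is shown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data: 1 = AI-style response, 0 = human-style response.
texts = [
    "As an AI language model, I can offer a general overview.",
    "Certainly! Here is a summary of the key points.",
    "I hope this helps. Let me know if you have questions.",
    "In summary, there are several factors to consider.",
    "lol no idea, my cat did the same thing last week",
    "honestly just google it, took me two minutes",
    "been there, it sucks. hang in there",
    "nah that never worked for me tbh",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Chain the vectorizer and classifier so both are refit inside each CV fold.
pipe = Pipeline([
    ("vec", CountVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Assumed search space; step names prefix the parameter names.
param_grid = {
    "vec__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# cv=2 only because the toy dataset is tiny; a real run would use more folds.
search = GridSearchCV(pipe, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

best_params = search.best_params_
prediction = search.predict(["Certainly! Here is an overview of the factors."])
```

Swapping `CountVectorizer` for `TfidfVectorizer`, or the classifier for another estimator, only changes the pipeline steps and the matching keys in `param_grid`.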
See some of the results below.
(More information on the project repo)
Air Toxicity
For this project, colleagues and I participated in a Kaggle competition calling for machine learning
models that could accurately predict the incidence of respiratory illnesses based on the concentrations
of numerous air quality parameters.
After cleaning the data and using various methods to impute missing values, we explored, tuned, and assessed the performance of numerous
machine learning methods.
The best-performing model in this case was RandomForestRegressor with mean-imputed values in the training dataset.
See below:
(More information on the project repo)
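As a rough sketch of the mean imputation plus RandomForestRegressor combination mentioned in this section, the example below fits that pairing on synthetic data. The data, the share of missing values, and the hyperparameters are assumptions made for illustration; none of them come from the competition dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for air quality features and an illness-rate target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# Blank out roughly 10% of the feature values to mimic missing readings.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan

# The imputer learns column means on the training rows only,
# then the same means are reused when predicting on held-out rows.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    RandomForestRegressor(n_estimators=100, random_state=0),
)
model.fit(X_missing[:150], y[:150])

held_out_predictions = model.predict(X_missing[150:])
r2 = model.score(X_missing[150:], y[150:])  # R^2 on the held-out rows
```

Putting the imputer inside the pipeline keeps the held-out rows from influencing the imputed means, which matters when comparing imputation strategies against each other.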
Effect of SAT and ACT on college admissions
Many colleges require applicants to submit ACT or SAT scores as part of their admission criteria.
In this project I examined whether states that favor one exam over the other have different rates of college acceptance.
Participation rates in each exam vary by state: most states have a majority of their 11th and 12th graders take one exam or the other.
A minority of states either use both tests approximately equally or have only a small number of their 11th- and 12th-grade students take one of the exams.
SAT participation rate by state has a -0.54 correlation with college acceptance.
ACT participation rate by state has a 0.56 correlation with college acceptance.
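A correlation of this kind can be computed with pandas, as in the sketch below. It assumes the Pearson coefficient, which the text does not state explicitly, and the column names and values are hypothetical placeholders, not the project's data.

```python
import pandas as pd

# Hypothetical state-level values for illustration only.
df = pd.DataFrame({
    "sat_participation": [0.05, 0.10, 0.60, 0.95, 1.00],
    "act_participation": [1.00, 0.90, 0.45, 0.20, 0.15],
    "college_acceptance": [0.70, 0.68, 0.64, 0.60, 0.61],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr(method="pearson")

sat_r = corr.loc["sat_participation", "college_acceptance"]
act_r = corr.loc["act_participation", "college_acceptance"]
```

With these made-up numbers `sat_r` comes out negative and `act_r` positive, the same signs as the reported values, but the magnitudes carry no meaning.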
(More information on the project repo)
Blossom Aquifer Brackish Groundwater Resources
Maps resulting from the characterization of the brackish groundwater resources of the Blossom Aquifer in northeast Texas.
Full report here
Lipan Aquifer Brackish Groundwater Resources
Maps resulting from the characterization of the brackish groundwater resources of the Lipan Aquifer.
Full report here
Published works
Alan G. Andrews, P.G. and Andrea Croskrey, P.G. Brackish Groundwater Production Zone Recommendations for the
Blossom Aquifer, Texas. Open File Report 19-01.
Mark C. Robinson, P.G., Matthew L. Webb, Jean Broce Perez, Alan G. Andrews, P.G. Brackish Groundwater in the
Lipan Aquifer Area, Texas. Report 384.
Andrews, A., Hunt, Brian B., and Smith, Brian A., 2013, Hydrological and geochemical characteristics in the Edwards
and Trinity hydrostratigraphic units using multiport monitor wells in the Balcones Fault Zone, Hays County,
central Texas: Geological Society of America Abstracts with Programs. Vol. 45, No. 3, p. 91.
Brian B. Hunt, P.G., Alan G. Andrews, Brian A. Smith, Ph.D., P.G., Hydraulic Conductivity Testing in the Edwards
and Trinity Aquifers Using Multiport Monitor Well Systems, Hays County, Central Texas. BSEACD Report of
Investigations.
Hunt, B.B., Smith, B.A., Andrews, A., Wierman, D.A., Broun, A.S., and Gary, M.O., 2015. Relay ramp structures and
their influence on groundwater flow in the Edwards and Trinity Aquifers, Hays and Travis Counties, Central
Texas, in Doctor, D.H., Land, L., and Stephenson, J.B., eds., Proceedings of the 14th Multidisciplinary Conference on
Sinkholes and the Engineering and Environmental Impacts of Karst: National Cave and Karst Research Institute (NCKRI)
Symposium 5, p. 189-200.
Smith, B., Hunt, B., Andrews, A., Watson, J., Gary, M., Wierman, D., and Broun, A. 2015. Hydrologic Influences of the
Blanco River on the Trinity and Edwards Aquifers, Central Texas, USA, in Hydrogeological and Environmental
Investigations in Karst Systems, (Eds) B. Andreo, F. Carrasco, J. Duran, P. Jimenez, and J. LaMoreaux, Environmental
Earth Sciences, Springer Berlin Heidelberg, Volume 1, pp. 153-161.
C. Andrews, G. A. Catania, J. L. Buttles, A. Andrews, M. Markowski, A physical model of ice sheet response to
changes in subglacial hydrology, AGU Fall Meeting, San Francisco, California, December, 2010.
MacGregor, J. A., Catania, G. A., Markowski, M., Andrews, A., Evolving ice fronts and surface speeds in the Amundsen
Sea Embayment between 1972-2010, AGU Fall Meeting, San Francisco, California, December, 2010.
Contact
Please feel free to connect with me on LinkedIn or send me a message using the form below.