My Story

I am an Environmental Scientist and Data Scientist with a decade of dedicated experience in the environmental sector. I am committed to leveraging my knowledge of geosciences, GIS, data science, machine learning, and contract management to tackle complex challenges.

Skills

Python, SQL, Regex, Markdown, Pandas, Matplotlib, Seaborn, BeautifulSoup, Scikit-Learn, NLTK, Git, NumPy, virtual environments, Scala.
Machine Learning, KNN, Random Forest, Neural Networks, TensorFlow, Keras, Jupyter, Google Colab, Databricks, cluster computing, ETL.
ESRI products, ArcGIS Pro, ArcGIS Online, Remote Sensing, QGIS, contract management, report writing, editing, team leader.

(Photo by USGS)

Below are a few projects that showcase some of my skills in Data Science and GIS.

Reddit vs OpenAI
Air Quality Parameters
SAT vs ACT
Blossom Aquifer
Lipan Aquifer

Reddit vs OpenAI

For this project I simulated a task a data scientist might be given: determining what proportion of a website's users are human and what proportion are bots.

First, I collected approximately 5,000 question-answer pairs from Reddit using the Python Reddit API Wrapper. I then fed the Reddit questions to ChatGPT through the OpenAI API and collected its responses. Using the responses from both Reddit and ChatGPT, I trained multiple natural language processing classifiers to predict whether a response came from a human or an AI.

I used pipelines and grid searches to find optimal hyperparameter values for the models. Overall I fit 10 classification models, including Multinomial Naive Bayes, Logistic Regression, Bernoulli Naive Bayes, and Linear Support Vector Classification with CountVectorizer preprocessing, and Multinomial Naive Bayes, Logistic Regression, Gaussian Naive Bayes, K-Nearest Neighbors, and Random Forest with TfidfVectorizer preprocessing.
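The pipeline and grid search approach can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the eight example texts, the labels, and the parameter grid are all made up for the example, and only one of the vectorizer and model combinations is shown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny made-up corpus standing in for the collected responses (hypothetical data).
texts = [
    "lol no idea, I just googled it",
    "honestly my dog ate the router once",
    "tbh it depends, ask your landlord",
    "idk man, worked for me last week",
    "As an AI language model, I can provide an overview.",
    "There are several factors to consider in this case.",
    "In summary, it is important to consult a professional.",
    "Certainly! Here is a step-by-step explanation.",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = human (Reddit), 1 = AI (ChatGPT)

# Vectorizer and classifier chained so both are tuned and cross-validated together.
pipe = Pipeline([
    ("vec", CountVectorizer()),
    ("clf", MultinomialNB()),
])

# Hypothetical grid; step-name prefixes route each parameter to the right step.
param_grid = {
    "vec__ngram_range": [(1, 1), (1, 2)],
    "vec__stop_words": [None, "english"],
    "clf__alpha": [0.1, 1.0],
}

# cv=2 only because the toy corpus is tiny; a real dataset would use more folds.
search = GridSearchCV(pipe, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(search.predict(["As an AI, I can provide several factors to consider."]))
```

Swapping `CountVectorizer` for `TfidfVectorizer`, or `MultinomialNB` for another estimator, only changes the pipeline steps and the matching grid keys.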

See some of the results below.

(More information on the project repo)

Air Toxicity

For this project, colleagues and I participated in a Kaggle competition calling for machine learning models that could accurately predict the incidence of respiratory illnesses based on the concentrations of numerous air quality parameters.

After cleaning the data and using various methods to impute missing values, we explored, tuned, and assessed the performance of numerous machine learning methods.

The best performing model in this case was RandomForestRegressor with mean-imputed values in the training dataset. See below:

(More information on the project repo)
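As a rough illustration of mean imputation followed by a random forest regressor, here is a minimal sketch. It is not the competition code: the data is synthetic, and the feature count, missing-value rate, and model settings are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)

# Synthetic stand-in for air quality features and an illness-rate target.
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Blank out roughly 10% of the feature values to mimic missing measurements.
mask = rng.random(X.shape) < 0.10
X[mask] = np.nan

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The imputer learns column means from the training split only,
# then applies those same means when transforming the test split.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    RandomForestRegressor(n_estimators=100, random_state=42),
)
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)
print(f"R^2 on held-out data: {r2:.2f}")
```

Keeping the imputer inside the pipeline avoids leaking test-set statistics into the training step.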

Effect of SAT and ACT on college admissions

Many colleges require applicants to submit ACT and SAT scores as criteria for admission. In this project I looked into whether states that opt for one exam over the other have different rates of college acceptance.

Participation rates in each exam vary by state. In most states, a majority of 11th and 12th graders take one exam or the other, so most states favor one exam. A minority of states either use both tests about equally or have only a small share of their 11th and 12th grade students take one of the exams. By state, SAT participation rate has a correlation of -0.54 with college acceptance, while ACT participation rate has a correlation of 0.56.
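For readers unfamiliar with how such a correlation is computed, here is a minimal sketch of the Pearson correlation coefficient. The five data points are invented for illustration and are not the project's data; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up participation and acceptance rates for five hypothetical states.
sat_participation = [0.95, 0.80, 0.60, 0.20, 0.05]
acceptance_rate = [0.60, 0.66, 0.64, 0.72, 0.75]

r = pearson_r(sat_participation, acceptance_rate)
print(round(r, 2))
```

A value near -1 or 1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship; correlation alone does not show that one variable causes the other.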

(More information on the project repo)

Blossom Aquifer Brackish Groundwater Resources

Maps resulting from the characterization of the brackish groundwater resources of the Blossom Aquifer in northeast Texas.

Full report here

Lipan Aquifer Brackish Groundwater Resources

Maps resulting from the characterization of the brackish groundwater resources of the Lipan Aquifer.

Full report here

Published works

Alan G. Andrews, P.G. and Andrea Croskrey, P.G. Brackish Groundwater Production Zone Recommendations for the Blossom Aquifer, Texas Open File Report 19-01

Mark C. Robinson, P.G., Matthew L. Webb, Jean Broce Perez, Alan G. Andrews, P.G. Brackish Groundwater in the Lipan Aquifer Area, Texas Report 384

Andrews, A., Hunt, Brian B., and Smith, Brian A., 2013, Hydrological and geochemical characteristics in the Edwards and Trinity hydrostratigraphic units using multiport monitor wells in the Balcones Fault Zone, Hays County, central Texas: Geological Society of America Abstracts with Programs. Vol. 45, No. 3, p. 91.

Brian B. Hunt, P.G., Alan G. Andrews, Brian A. Smith, Ph.D., P.G., Hydraulic Conductivity Testing in the Edwards and Trinity Aquifers Using Multiport Monitor Well Systems, Hays County, Central Texas. BSEACD Report of Investigations.

Hunt, B.B., Smith, B.A., Andrews, A., Wierman, D.A., Broun, A.S., and Gary, M.O., 2015. Relay ramp structures and their influence on groundwater flow in the Edwards and Trinity Aquifers, Hays and Travis Counties, Central Texas, in Doctor, D.H., Land, L., and Stephenson, J.B., eds., Proceedings of the 14th Multidisciplinary Conference on Sinkholes and the Engineering and Environmental Impacts of Karst: National Cave and Karst Research Institute (NCKRI) Symposium 5, p. 189-200.

Smith, B., Hunt, B., Andrews, A., Watson, J., Gary, M., Wierman, D., and Broun, A. 2015. Hydrologic Influences of the Blanco River on the Trinity and Edwards Aquifers, Central Texas, USA, in Hydrogeological and Environmental Investigations in Karst Systems, (Eds) B. Andreo, F. Carrasco, J. Duran, P. Jimenez, and J. LaMoreaux, Environmental Earth Sciences, Springer Berlin Heidelberg, Volume 1, pp. 153-161.

C. Andrews, G. A. Catania, J. L. Buttles, A. Andrews, M. Markowski, A physical model of ice sheet response to changes in subglacial hydrology, AGU Fall Meeting, San Francisco, California, December, 2010.

MacGregor, J. A., Catania, G. A., Markowski, M., Andrews, A., Evolving ice fronts and surface speeds in the Amundsen Sea Embayment between 1972-2010, AGU Fall Meeting, San Francisco, California, December, 2010.

Contact

Please feel free to connect with me on LinkedIn or send me a message using the form below.
