Recently I wrote an article, “Measuring Attention,” that discussed different principles and methods for logging your training progress, both for fitness and grappling.
AI has been a hot topic for a while now, and it doesn’t seem to be going away any time soon. Rather than being stuck within the confines of a specific app’s features, clunky spreadsheets, or inconvenient pen-and-paper notebooks, I decided to put some popular AI assistants to the test.
I prompted Grok (xAI), ChatGPT (OpenAI), and Gemini (Google) with my training methodology (systems, definitions, and references) to see how they compared at analyzing data and creating training plans. Links to the full chats are at the bottom of the article.
Regarding versions, standard / default settings were used except where otherwise noted (e.g. “think / reason” or “deep research” modes):
GPT-4o
Gemini Flash (experimental)
Grok (v3)
Evaluation Criteria (1-5, 3 = satisfactory):
Speed of Response: GPT 3, Gemini 5, Grok 4
Intuition / Intelligence: GPT 4, Gemini 3, Grok 4
Functionality / Ease of Use: GPT 5, Gemini 3, Grok 3
Analysis / Data Evaluation: GPT 5, Gemini 3, Grok 4
Fitness Programming: GPT 4, Gemini 3, Grok 5
Grappling Programming: GPT 5, Gemini 4, Grok 3
Total: GPT 26, Gemini 21, Grok 23
See “Caveats and Updates” regarding why I chose Grok as my daily driver!
Speed of Responses:
GPT was, by far, the slowest. Gemini and Grok were both pretty fast, though Grok slowed down a bit due to its verbose responses later on. Notably, GPT is probably the best-known and most-used AI platform, so I did notice that if you caught it at a time when the servers were not busy, its speed was comparable to the others. But when it was slow, it was really slow.
Intuition and Intelligence:
All of the platforms did a fine job when I started inputting my training systems and definitions (e.g. endurance, strength, development, rebound, etc.).
GPT was more intuitive when I prompted it with resources for inspiration (like FRC, Ollin, Westside, CrossFit, etc.). GPT also anticipated my programming preferences (i.e. gave me .yaml content for static site blogging), but it got a bit ahead of itself by including Python scripts that were cluttered and unsolicited. That behavior stopped once I didn’t respond to it, though.
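For anyone unfamiliar, that .yaml content is the front matter a static site generator reads at the top of a post. Here’s a rough sketch of what a session entry could look like; the field names are my own placeholders for illustration, not the exact output any of the assistants gave me:

```yaml
# Illustrative front matter for a training-session post.
# Field names are placeholders, not verbatim assistant output.
title: "Strength Session - Week 1"
date: 2025-03-01
tags: [strength, development]
pre_session:
  sleep_hours: 7.5      # pre-session inventory item
post_session:
  soreness: 2           # 1-5 scale, post-session inventory item
segments:
  - warmup: "hip and shoulder CARs"
  - main: "back squat 5x5"
  - support: "cooldown / breathing"
```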
Grok seemed to be a bit rigid with its responses and often repeated itself unnecessarily. Gemini was somewhere in the middle in this regard, interpreting information correctly and giving decent, relevant responses, but nothing particularly intuitive.
Functionality and Ease of Use:
GPT was able to pull my training archives directly from GitLab, while Grok and Gemini both told me I had to upload the files manually. When I asked GPT to create a review summary (see data analysis below), it didn’t catch all of the 100+ sessions, but it did pull 20+ of them for the review.
All three platforms pointed out when I missed items from pre- and post-session inventories (e.g. “you didn’t add sleep quality to your session…” or “this session is missing a support / cooldown segment, do you want to add one?”).
Grok did an old GPT trick (which doesn’t appear to be an issue for GPT in this experiment) of ramrodding me with its proposed content and solutions, which led to slower responses and detailed but distracting output.
Gemini automatically searched for the ADCC Open date when I told it that’s what I was training for, which was a nice touch! Gemini also gave me the cleanest markdown output for exporting sessions to static website blogs, which, for me, was nice as well.
Analysis and Data Evaluation:
Gemini’s analysis was, by far, the worst, offering only a brief narrative review. Grok’s review was the most detailed and had good statistical information (like average sleep hours and pointing out that “rebound” sessions correlated with lower soreness).
See the notes below regarding follow-up updates.
GPT’s review was clear and concise, offering similar information to Grok in a visually appealing format. It also included a markdown file without being prompted, picking up on my styling preferences.
Quality of Programming:
General Fitness:
For spitting out a full 8-week program in just a few seconds, GPT’s training sessions honestly weren’t bad, defaulting to one strength, one mobility, and one capacity session per week. Technically, it only programmed two one-week templates, so the individual workouts repeat through each 4-week block with instructions to add either volume or load accordingly.
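To make that layout concrete, here’s one way to read the structure GPT proposed; this sketch is my own paraphrase of the idea, not its verbatim output:

```yaml
# Illustrative reading of the 8-week layout: two one-week templates,
# each repeated through a 4-week block with a simple progression rule.
block_1:                 # weeks 1-4
  template: week_A       # strength, mobility, and capacity sessions
  progression: "add volume each week"
block_2:                 # weeks 5-8
  template: week_B
  progression: "add load each week"
```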
Gemini followed my methodology appropriately, but initially only gave suggestions for the sessions. When I prompted it to be more specific, it pumped out a few AMRAP and straight-set sessions, but was still pretty vague, with “light stretching or foam rolling” and “standard RBND work” as the session content.
Grok’s programming was the best, providing specific movements (like shoulder or thoracic CARs) as well as load recommendations. I have to say, I’m kind of impressed. If anything, Grok’s rigidity and complexity only reflected back to me that sometimes I try to over-complicate things and do too much in a session; but the robot did what I asked it to do!
Grappling:
The same trends above continued here. GPT’s response was detailed enough to be useful and still leave enough room for individual adaptation. Likewise, the output format was clear and concise. It did make some specific movement suggestions, but they were limited to “common submissions like kimuras and armbars” which was congruent with the prompt.
Grok did a good job sticking to the structure I recommended and provided specific technique drills (e.g. 10 reps of knee-cut pass), basically following the fitness template but with grappling movements. This isn’t really a knock on Grok (a lot of academies still train this way), but I’ve found that a set time frame focused on the quality of those reps matters more than an arbitrary rep count or junk volume. This is especially true since my prompt was for a “recreational blue belt getting ready for their first local tournament.”
Gemini was surprisingly good here, though I had talked to it before about my ADCC prep, so that may have helped. It followed the proposed structure like Grok, but recommended time-based intervals and principled movements like GPT. The only slight downside was that the output shared Grok’s clunky bullet points.
Summary:
Based on the total scores we have:
OpenAI ChatGPT: 26 points
xAI Grok: 23 points
Google Gemini: 21 points
Overall, all the platforms earned at least a “satisfactory” (3) rating from me in every scored category; nothing was terrible or frustrating, and they all more-or-less “worked.” If you have a particular affinity for, or more contextual history with, a given platform, then turning on some of the advanced features and prompting the AI more specifically would likely get you similar results across all of them.
GPT seemed like the most useful and accurate overall. The biggest caveat would be speed, though in the overall scheme of things we’re talking about waiting a minute or two rather than a few seconds for a response. What also stood out to me here was the clear and concise formatting of its responses, rather than walls of text and bullet points.
Grok was a solid contender and actually stole the show when it came to fitness programming. It was good enough that I may actually continue to use it for my online group training (with human oversight, of course!). The few complaints I had were about it ramrodding rigid structures, though as I mentioned above, that’s partly what I asked it to do.
Gemini worked well enough, but just seemed to lack anything special in terms of quality. It was, however, reliably the fastest to respond and offered some appealing insights.
Overall it’s better to “hit slow” rather than “miss fast.”
Public Chat Links for Review:
Caveats and Updates:
There’s a short memory limit on GPT-4o, which limits some of its useful functionality. However, it resets daily, so this may only be an issue for initial programming or heavy analysis. With “4o-mini” the results were much more similar to Grok and Gemini.
“Gemini 2.5 Pro” still couldn’t read or analyze my historical logs from either the GitLab repository or the corresponding GitLab Pages site (public website).
“Gemini Deep Research” again couldn’t access the external site and just gave a summary, albeit one with useful resources, definitions, and data-organization suggestions.
“Grok + Think + Deeper Search + Concise Output (setting)” got bogged down when I asked it to analyze my data and prompted me to try a different model.
“Grok + Think + Concise Output (setting)” was a huge upgrade. The concise output was much more readable, and the analysis and recommendations surpassed GPT’s. GPT’s output still “looked prettier” on screen, though, if all you’re wanting is an immediate answer rather than data to export. With the “Think” function turned on, Grok slowed down a bit, but responses were still under a minute, which I think is acceptable given their quality. “Think”, like GPT-4o, has a credit system that renews daily, but again, this is only intermittently relevant.
DeepSeek (r1): Turning on its advanced and search functions yielded comparable results to Grok and GPT.
Like what you’re reading? Support my work by:
Subscribing on Substack for FREE!
Joining the Onward Fitness Community on Thinkific.
Shopping from my affiliates, Redmond Real Salt and Animal Pak Supplements (15% off).