feat: Add floating recorder overlay #151
Add a default-off V2 floating recorder overlay that starts after the app has opened, controls recording through the existing recording service, and supports overlay-based rename flow when the rename-after-recording setting is enabled. Persist overlay and rename dialog positions, localize all overlay strings, and add disc-style save feedback with a continuous three-second rainbow-to-grey confirmation animation. Add focused tests for overlay permissions, settings decisions, geometry, save feedback colors, and preference persistence. Agentic harness: OpenCode with OpenAI GPT-5.5 (openai/gpt-5.5).
Style the floating rename overlay from the V2 dark-theme preference so dark mode uses a dark panel with white bold filename text, and light mode uses a white panel with black bold filename text. Add focused tests for rename overlay dark and light style selection. Agentic harness: OpenCode with OpenAI GPT-5.5 (openai/gpt-5.5).
I personally tested all the changes, and everything works for me. I will be using it extensively, so I will see whether there are any rough edges, but I think I have already done most of the polishing (the first commit is in fact a squash merge of 5 or 6 different commits done locally). Let me know if you have any feedback, I'll try to update ASAP!
Persist the floating recorder overlay diameter and restore it with display-aware clamping so the button keeps a user-selected size across app restarts. Add tested geometry helpers for size bounds, saved-size clamping, and proportional record-disc scaling, then wire two-finger pinch handling into the overlay service while suppressing tap and drag during resize. Agentic harness: OpenCode with OpenAI gpt-5.5.
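To make the size handling in this commit easier to picture, here is a minimal plain-Java sketch of the three geometry ideas it names: size bounds, clamping a saved diameter to the current display, and scaling the record disc proportionally. It is not the PR's code. The class name, method names, and the parameters `minPx`, `maxFraction`, and `discRatio` are hypothetical, and the pinch formula (scale by the ratio of finger spans) is an assumption about how such a resize is commonly done.

```java
/** Hypothetical helper for illustration; names and parameters are not taken from the PR. */
final class OverlaySizeSketch {
    private OverlaySizeSketch() {}

    /** Upper size bound: a fraction of the display's shorter side, never below minPx. */
    static int maxDiameterPx(int displayWidthPx, int displayHeightPx, int minPx, float maxFraction) {
        int shorterSide = Math.min(displayWidthPx, displayHeightPx);
        return Math.max(minPx, Math.round(shorterSide * maxFraction));
    }

    /** Clamps a saved or requested diameter into [minPx, maxPx]. */
    static int clampDiameterPx(int requestedPx, int minPx, int maxPx) {
        return Math.max(minPx, Math.min(requestedPx, maxPx));
    }

    /** Assumed pinch rule: scale the diameter by the ratio of the current to the initial finger span. */
    static int resizeByPinch(int currentPx, float startSpan, float currentSpan, int minPx, int maxPx) {
        if (startSpan <= 0f) {
            return clampDiameterPx(currentPx, minPx, maxPx);
        }
        int scaled = Math.round(currentPx * (currentSpan / startSpan));
        return clampDiameterPx(scaled, minPx, maxPx);
    }

    /** Keeps the inner record disc at a constant ratio of the overlay diameter. */
    static int recordDiscDiameterPx(int overlayDiameterPx, float discRatio) {
        return Math.round(overlayDiameterPx * discRatio);
    }
}
```

With these assumptions, a diameter saved on a large display would be passed through `clampDiameterPx` again on restore, using bounds computed from the display the app is currently running on.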
Add system SpeechRecognizer dictation to the floating rename overlay, with a large mic button that shows the persisted append/replace mode and a long-press popup to change it. Persist the rename speech mode, sanitize and normalize recognized text, and cap the visible filename to 251 characters so the hidden extension stays within the filename budget. Add focused tests for filename composition and preference persistence, plus Android package visibility and mic resources for recognition providers such as FUTO Voice Input when exposed as a RecognitionService. Agentic harness: OpenCode with OpenAI gpt-5.5.
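The filename composition described in this commit can be sketched roughly as follows, in plain Java. Only the 251-character cap and the append/replace distinction come from the commit message; the class name, method names, the whitespace normalization rule, and the choice to keep the current name when nothing was recognized are assumptions. Presumably the cap leaves room for a four-character extension such as `.m4a` within a 255-character filename limit, but that reading is also an assumption, not something the commit states.

```java
/** Hypothetical sketch of speech-based filename composition; not the PR's implementation. */
final class RenameSpeechSketch {
    /** Visible-name cap quoted in the commit message. */
    static final int MAX_VISIBLE_NAME_LENGTH = 251;

    private RenameSpeechSketch() {}

    /** Assumed normalization: trim and collapse runs of whitespace into single spaces. */
    static String normalize(String recognized) {
        if (recognized == null) {
            return "";
        }
        return recognized.trim().replaceAll("\\s+", " ");
    }

    /** Appends to or replaces the current visible name, then applies the cap. */
    static String compose(String currentName, String recognized, boolean append) {
        String spoken = normalize(recognized);
        if (spoken.isEmpty()) {
            return truncate(currentName); // nothing recognized: keep the current name
        }
        String combined = (append && !currentName.isEmpty())
                ? currentName + " " + spoken
                : spoken;
        return truncate(combined);
    }

    /** Cuts the name to the cap without splitting a surrogate pair. */
    static String truncate(String name) {
        if (name.length() <= MAX_VISIBLE_NAME_LENGTH) {
            return name;
        }
        int end = MAX_VISIBLE_NAME_LENGTH;
        if (Character.isHighSurrogate(name.charAt(end - 1))) {
            end--;
        }
        return name.substring(0, end).trim();
    }
}
```

For example, `compose("Meeting", "  with   Bob ", true)` yields `Meeting with Bob`, while the same call with `append` set to `false` yields `with Bob`.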
Replace the embedded SpeechRecognizer backend with a transparent RecognizerIntent proxy Activity so activity-based providers such as FUTO Voice Input can handle rename dictation like Chromium. Keep the existing large mic button, persisted long-press append/replace mode, and filename truncation while delegating recognition UI, permissions, and lifecycle to the recognizer app instead of managing low-level callbacks. Agentic harness: OpenCode with OpenAI gpt-5.5.
I added a few more quality-of-life features, such as resizing the overlay record button and a big microphone button to rename files by voice. It uses the system-defined voice transcriber; an open-source one that is multilingual and works well for this purpose, even without Google Play Services, is FUTO Voice Input using the Whisper models. I think I won't add any more features, because I am pretty satisfied with how this turned out. But I will test the feature extensively in the upcoming days, so maybe I will go back on my word ;-)
Ah, I just saw #152, where you add the possibility to add notes in audio files. What an amazing coincidence! I thought about doing this too, to give the voice transcription more leeway rather than adding it to the filename with its limited characters. But I did not want to introduce too many changes in the core with my PR; I conceived it more as a companion set of functions plugged over the core to provide overlay functionality. That's why I tried to separate the overlay features into distinct files as much as possible (it's not possible for everything, because we have to set permissions, add a new option in the settings, translate the displayed text, etc.). So once you merge #152, if you agree that's a better idea, I can make the voice transcription save the result as a comment inside the audio file instead of renaming the file (or allow the user to choose).
Make the floating rename overlay save action more prominent and turn the previous default-name shortcut into a non-saving reset action. The reset action now restores the original record name, clears inline feedback, and leaves the cursor at the end so mistakes can be corrected without closing the overlay. Agentic harness: OpenCode with OpenAI gpt-5.5.
Here is a video of the feature (sorry for the French in the settings menu): SVID_20260619_203305_1-high_3.mp4
And here is a debug build for those who want to try it already (I provide no guarantee whatsoever, only that there are no viruses to my knowledge, that my account is associated with my real identity, and that I have worked in OSS for a long time): https://github.com/lrq3000/AudioRecorder/releases/tag/v0.0.2
Ah, and I forgot to say that if the option to ask for filenames after recording is disabled in the settings, the record button overlay also won't ask to rename the file. So for the most minimalistic setup, you get an overlay button that you tap once to record and tap again to stop recording (and you can always rename later).
Avoid focusing the rename text field or opening the soft keyboard when the floating rename overlay appears. Keep speech append and replace working by updating the filename field directly, and keep reset from focusing the text field so GPS/navigation views remain unobstructed. Agentic harness: OpenCode with OpenAI gpt-5.5.
Thanks for adding the floating overlay. I'll review and test it later. I don't mind adding this feature.
I was thinking about the voice transcription feature, but it looks complicated to me. Implementing voice recognition might require using an API (which I don't want to do for now) or a library that might significantly increase the app size. Anyway, if you feel that you can implement the voice transcription feature, feel free to create a PR. I believe it is possible to combine the Record Notes feature and voice transcription. I tested Google's Recorder app with the voice transcription feature; it works, but with very few languages 🤷♂️
For the voice transcription, don't worry, I was mindful of this: it delegates to the app set as the default voice transcriber by calling a RecognizerIntent, just like Chromium for Android does when you tap the microphone icon in the URL bar. AudioRecorder does not handle anything about speech recognition; it's just an Intent call. I tested extensively today while driving and it worked wonderfully well; the voice transcription was particularly helpful for tagging filenames to find them easily later. I just noticed a minor issue when stopping a recording in-app with the overlay enabled (it spawns both the in-app rename dialog and the overlay dialog). I will fix this tomorrow when I have access to my computer. Most of the code adds the overlay, manages the request for the overlay permission, and guides the user to where the permission is granted in the Android settings.
Btw, I used FUTO Voice Input. It works for all languages supported by Whisper, so that's a lot; I personally tested English and French, and even combining both, and it works very well. Just make sure to go into the settings to download the biggest multilingual "slow" model. I have been using Whisper-based transcription since it came out; it's still not perfect, but it is accurate much more often than it is wrong. A big common factor that drastically reduces Whisper's transcription accuracy is background noise. There are new models that work incredibly well despite heavy background noise, but to my knowledge they have not been made into an easy Android app yet. Meanwhile, I have found that bringing the phone microphone closer to the mouth and articulating well results in a big accuracy jump. I did specific tests in front of various rollercoasters, and to my surprise this gave about 90% accuracy, even with complex jargon from my scientific discipline! Anyway, all that is to say that multilingual Whisper via FUTO Voice Input already works well enough to be perfectly usable for practical everyday voice transcription, imho, even though it runs locally. I use it every day to transcribe notes because I don't have Google Play Services on my old Huawei phone. It's just a bit slow, so it's good for a paragraph but not for a 10-page brainstorm, which is what I use AudioRecorder for: I record and transcribe offline later :-)
Prevent the floating rename overlay from opening after recordings stopped from the in-app record controls. Track whether the current stop request came from the overlay, and route post-recording rename UI through a shared policy so in-app stops show the app dialog while overlay stops show the overlay dialog. Agentic harness: OpenCode with OpenAI gpt-5.5.
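The routing rule in this commit (in-app stops show the app dialog, overlay stops show the overlay dialog, and nothing is shown when the rename-after-recording setting is off) can be sketched as a small pure function. This is an illustration in plain Java only: the enum names, the `surfaceFor` method, and its parameters are hypothetical and are not the actual API of `RecordingStoppedRenamePolicy`.

```java
/** Hypothetical sketch of a shared post-stop rename policy; names are illustrative. */
final class RenamePolicySketch {
    /** Where the stop request came from. */
    enum StopSource { IN_APP, OVERLAY }

    /** Which rename UI, if any, should be shown after the recording stops. */
    enum RenameSurface { NONE, APP_DIALOG, OVERLAY_PANEL }

    private RenamePolicySketch() {}

    static RenameSurface surfaceFor(StopSource source, boolean askToRenameAfterRecording) {
        if (!askToRenameAfterRecording) {
            return RenameSurface.NONE; // setting disabled: never prompt
        }
        return source == StopSource.OVERLAY
                ? RenameSurface.OVERLAY_PANEL
                : RenameSurface.APP_DIALOG;
    }
}
```

Because both the in-app screen and the overlay would consult the same function, exactly one surface is selected per stop event, which is what prevents the duplicate dialogs reported earlier in the thread.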
Add a third floating rename mic mode that appends recognized speech to the record description as a new line while leaving the rename dialog open. Persist the new mode through the existing rename speech mode preference and save notes through updateRecordDescription so the existing saveDescriptionToFile setting controls COMMENT tag embedding. Agentic harness: OpenCode with OpenAI gpt-5.5.
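As a rough picture of the three mic modes after this commit, the plain-Java sketch below applies recognized speech to a draft consisting of a filename and a description. The enum constants, the `Draft` class, and the `apply` method are hypothetical names chosen for illustration; the only behavior taken from the commit message is that description mode appends the transcript as a new line and leaves the filename alone.

```java
/** Hypothetical sketch of dispatching recognized speech by mic mode; not the PR's code. */
final class SpeechTargetSketch {
    enum Mode { APPEND_TO_FILENAME, REPLACE_FILENAME, APPEND_TO_DESCRIPTION }

    /** Immutable pair of pending edits shown in the rename overlay. */
    static final class Draft {
        final String filename;
        final String description;

        Draft(String filename, String description) {
            this.filename = filename;
            this.description = description;
        }
    }

    private SpeechTargetSketch() {}

    static Draft apply(Mode mode, Draft draft, String recognized) {
        String spoken = recognized == null ? "" : recognized.trim();
        if (spoken.isEmpty()) {
            return draft; // nothing recognized: leave both fields untouched
        }
        switch (mode) {
            case APPEND_TO_FILENAME:
                String name = draft.filename.isEmpty() ? spoken : draft.filename + " " + spoken;
                return new Draft(name, draft.description);
            case REPLACE_FILENAME:
                return new Draft(spoken, draft.description);
            case APPEND_TO_DESCRIPTION:
                // Description mode adds the transcript as a new line.
                String text = draft.description.isEmpty() ? spoken : draft.description + "\n" + spoken;
                return new Draft(draft.filename, text);
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }
}
```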
Add a compact one-line visible description field to the floating rename overlay, update it when speech is captured in description mode, and save pending filename and description changes together. Rename speech mode labels so filename and description targets are explicit in the mic long-press menu. Agentic harness: OpenCode with OpenAI gpt-5.5.
Use an overlay-specific short description hint and clear the default EditText minimum height, vertical padding, font padding, and scrollbar so the description draft field stays one visible line high. Add localized short hint strings for all supported resource folders and cover the compact field configuration with a focused unit test. Agentic harness: OpenCode with OpenAI gpt-5.5.
Speech transcripts used for overlay filename append or replace could still contain characters that are invalid on common desktop filesystems, causing non-portable generated filenames. Sanitize filename speech against cross-platform reserved characters, trailing Windows-invalid dots or spaces, and reserved Windows device names while preserving punctuation for audio-note transcripts. Agentic harness: OpenCode, OpenAI GPT-5.5.
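The three rules named in this commit (reserved characters, trailing dots or spaces, reserved Windows device names) could look roughly like the plain-Java sketch below. It is an illustration under assumptions, not the PR's cleaner: the class and method names are hypothetical, the character set is the usual Windows-invalid set plus control characters, and prefixing a reserved device name with an underscore is one possible way to neutralize it, chosen here only for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical cross-platform filename cleaner; not the PR's implementation. */
final class FilenameCleanerSketch {
    /** Device names that Windows reserves regardless of extension. */
    private static final Set<String> WINDOWS_RESERVED = new HashSet<>(Arrays.asList(
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"));

    private FilenameCleanerSketch() {}

    static String clean(String name) {
        // 1. Silently drop characters that are invalid on common desktop filesystems.
        String cleaned = name.replaceAll("[<>:\"/\\\\|?*\\p{Cntrl}]", "");
        // 2. Windows rejects names that end with a dot or a space.
        cleaned = cleaned.replaceAll("[. ]+$", "").trim();
        // 3. Neutralize reserved device names (assumed strategy: underscore prefix).
        int dot = cleaned.indexOf('.');
        String base = dot >= 0 ? cleaned.substring(0, dot) : cleaned;
        if (WINDOWS_RESERVED.contains(base.trim().toUpperCase(Locale.ROOT))) {
            cleaned = "_" + cleaned;
        }
        return cleaned;
    }
}
```

Under these assumptions, `clean("a/b:c?.")` returns `abc` and `clean("CON")` returns `_CON`. Transcripts destined for the description would skip this step, since the commit keeps punctuation for audio notes.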
Manual in-app and floating-overlay rename saves could still pass unsafe filename characters through different paths, even after speech input was sanitized. Add a shared V2 filename cleaner and call it from overlay save, overlay speech filename drafts, home active-record rename, and records-list rename so invalid characters are silently removed at save time. Agentic harness: OpenCode, OpenAI GPT-5.5.
Ok, I think this PR is now complete. I added support for inputting a description by voice in the overlay instead of renaming the file (either option can be selected by holding the microphone button), and I added filename sanitization to ensure filenames are compatible across OSes. Please let me know if you would like me to implement anything else @Dimowner, and thank you once again for this amazing piece of software you have made and maintained! Latest APK: https://github.com/lrq3000/AudioRecorder/releases/tag/v0.0.3
This PR adds an optional floating recorder overlay, so that recordings can be started and stopped (with saving) while using other apps, for example overlaid on a GPS map or a book reader.
This overlay is disabled by default.
When enabled, it follows the setting that controls whether to ask for a new name after a recording stops. If that setting is on, another transparent overlay appears to rename the file; this rename dialog overlay can also be moved, and its position is remembered between instantiations.
Tapping to stop always saves the file (so if you tap in quick succession without renaming, the file is still saved with the default name).
When tapping to stop a recording, the overlay displays a color wheel animation for about 3 seconds to signal clearly to the user that the file was saved, so this can be seen at a glance without having to focus on the screen.
This PR was AI-assisted with OpenCode using ChatGPT-5.5 (High reasoning) and with custom agentic coding instructions (minimize changes, literal programming, avoid redundant and bad patterns, etc).
It's a feature I have wanted for a long time, and I could finally build it using the latest models, although it burned through a lot of credits! Hopefully it can be useful to others.
Thank you so much for this incredible audio recording app. I still use it all the time for many purposes, but especially to record my own thoughts and convert them later using VibeTranscribe.
AI-generated summary of the code changes
User-Facing Changes
- … `Save` button and a lightweight `Reset` action.
- … `Reset` from focusing the text field or opening the keyboard.

Code Changes

- … `FloatingRecorderOverlayService` to render and manage the floating recorder overlay window.
- … `AudioRecordingService` so it can start and stop recordings while staying synchronized with recording state.
- … `PrefsV2`/`PrefsV2Impl`, including overlay size and rename speech mode.
- … `RenameSpeechMode` to model append versus replace behavior for speech-based rename input.
- … `FloatingRenameSpeechRecognitionActivity` as a transparent proxy activity for `RecognizerIntent.ACTION_RECOGNIZE_SPEECH`.
- … `AndroidManifest.xml` with overlay service/activity declarations and speech recognition query support.
- … `AudioRecordingService` to track whether a recording was started from the overlay and whether the stop request came from the overlay.
- … `RecordingStoppedRenamePolicy` to centralize post-stop rename surface selection.
- … `HomeViewModel` to use the shared rename policy so in-app rename dialogs are suppressed for overlay stop events.
- … `FloatingRecorderOverlayService` to use the shared rename policy so overlay rename panels are shown only for overlay stop events.

Implementation Motivation

- … `RecognizerIntent` through a transparent activity because Android’s speech UI is better handled by the platform activity contract than by a long-lived service-level `SpeechRecognizer`, so that AudioRecorder does not have to micro-manage the technical intricacies of processing speech recognition: we delegate to another app and just get back the textual transcription.
- … `HomeViewModel` and `FloatingRecorderOverlayService` consistent, preventing duplicate rename surfaces and making the expected behavior testable.

Unit tests run
All tests passed.