Search results

Showing 1 - 10 of 4,698 results for '"Fu, Jie"', query time: 0.90s

1

Academic journal

Comparison study on rock breaking characteristics of disc cutters under coupled static–dynamic loads and static loads

Authors: Lin, Laikuang, Xia, Yimin, Zhang, Xuhui, Yi, Liang, Fu, Jie

Source: Comptes Rendus. Mécanique, Vol 351, Iss G1, Pp 1-15 (2023)

Subject terms: TBM, Disc cutter, Coupled static–dynamic loads, Rock breaking efficiency, Cutting force, Materials of engineering and construction. Mechanics of materials, TA401-492

Description: Rock cutting methods for disc cutters with coupled static–dynamic loads and static loads are explored and compared through rock breaking experiments to improve the TBM excavation efficiency. Results indicate that the rock breaking characteristics, including rock debris, cutting force, and rock breaking efficiency, vary significantly with different cutting methods. The average size of rock fragmentation produced under coupled static–dynamic loads is 1.6 times larger than that under static loads. The cutting forces of the disc cutter under the coupled static–dynamic loads are larger than those under static loads when the cutting depth $(h)$ is lower than 4 mm, whereas the opposite holds when $h$ exceeds 4 mm. The specific energy of the disc cutter under the coupled static–dynamic load is approximately 1.5 times smaller than that under the static load, indicating that the cutting method with the coupled static–dynamic load can significantly improve the cutting performance. There is an optimal cutter spacing $(S)$ for the cutter under each cutting method. The optimal $S$ under the coupled static–dynamic loads is larger than that with static loads. This study provides new insights into improving the tunneling efficiency in high-strength rock conditions.

File description: electronic resource

Relation: https://comptes-rendus.academie-sciences.fr/mecanique/articles/10.5802/crmeca.163/; https://doaj.org/toc/1873-7234

Open access: https://doaj.org/article/4b94d1c261cb478ea75789a8ba6c1cc5

2

Report

A Closer Look into Mixture-of-Experts in Large Language Models

Authors: Lo, Ka Man, Huang, Zeyu, Qiu, Zihan, Wang, Zili, Fu, Jie

Subject terms: Computer Science - Computation and Language, Computer Science - Machine Learning

Description: Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance, especially for language tasks. By sparsely activating a subset of parameters for each token, MoE architecture could increase the model size without sacrificing computational efficiency, achieving a better trade-off between performance and training costs. However, the underlying mechanism of MoE still lacks further exploration, and its modularization degree remains questionable. In this paper, we make an initial attempt to understand the inner workings of MoE-based large language models. Concretely, we comprehensively study the parametric and behavioral features of three recent MoE-based models and reveal some intriguing observations, including (1) Neurons act like fine-grained experts. (2) The router of MoE usually selects experts with larger output norms. (3) The expert diversity increases as the layer increases, while the last layer is an outlier. Based on the observations, we also provide suggestions for a broad spectrum of MoE practitioners, such as router design and expert allocation. We hope this work could shed light on future research on the MoE framework and other modular architectures. Code is available at https://github.com/kamanphoebe/Look-into-MoEs.

Open access: http://arxiv.org/abs/2406.18219

3

Report

Unlocking Continual Learning Abilities in Language Models

Authors: Du, Wenyu, Cheng, Shuang, Luo, Tongxu, Qiu, Zihan, Huang, Zeyu, Cheung, Ka Chun, Cheng, Reynold, Fu, Jie

Subject terms: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language

Description: Language models (LMs) exhibit impressive performance and generalization capabilities. However, LMs struggle with the persistent challenge of catastrophic forgetting, which undermines their long-term sustainability in continual learning (CL). Existing approaches usually address the issue by incorporating old task data or task-wise inductive bias into LMs. However, old data and accurate task information are often unavailable or costly to collect, hindering the availability of current CL approaches for LMs. To address this limitation, we introduce MIGU (MagnItude-based Gradient Updating for continual learning), a rehearsal-free and task-label-free method that only updates the model parameters with large magnitudes of output in LMs' linear layers. MIGU is based on our observation that the L1-normalized magnitude distribution of the output in LMs' linear layers is different when the LM models deal with different task data. By imposing this simple constraint on the gradient update process, we can leverage the inherent behaviors of LMs, thereby unlocking their innate CL abilities. Our experiments demonstrate that MIGU is universally applicable to all three LM architectures (T5, RoBERTa, and Llama2), delivering state-of-the-art or on-par performance across continual finetuning and continual pre-training settings on four CL benchmarks. For example, MIGU brings a 15.2% average accuracy improvement over conventional parameter-efficient finetuning baselines in a 15-task CL benchmark. MIGU can also seamlessly integrate with all three existing CL types to further enhance performance. Code is available at https://github.com/wenyudu/MIGU.
Comment: preprint, 19 pages

Open access: http://arxiv.org/abs/2406.17245

4

Report

LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

Authors: Du, Jiangshu, Wang, Yibo, Zhao, Wenting, Deng, Zhongfen, Liu, Shuaiqi, Lou, Renze, Zou, Henry Peng, Venkit, Pranav Narayanan, Zhang, Nan, Srinath, Mukund, Zhang, Haoran Ranran, Gupta, Vipul, Li, Yinghui, Li, Tao, Wang, Fei, Liu, Qin, Liu, Tianlin, Gao, Pengzhi, Xia, Congying, Xing, Chen, Cheng, Jiayang, Wang, Zhaowei, Su, Ying, Shah, Raj Sanjay, Guo, Ruohao, Gu, Jing, Li, Haoran, Wei, Kangda, Wang, Zihao, Cheng, Lu, Ranathunga, Surangika, Fang, Meng, Fu, Jie, Liu, Fei, Huang, Ruihong, Blanco, Eduardo, Cao, Yixin, Zhang, Rui, Yu, Philip S., Yin, Wenpeng

Subject terms: Computer Science - Computation and Language

Description: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload? This study focuses on the topic of LLMs assisting NLP researchers, particularly examining the effectiveness of LLMs in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) each review comes with "deficiency" labels and corresponding explanations for individual segments, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) "LLMs as Reviewers", how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) "LLMs as Metareviewers", how effectively can LLMs identify potential issues, such as deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.

Open access: http://arxiv.org/abs/2406.16253

5

Report

Efficient Continual Pre-training by Mitigating the Stability Gap

Authors: Guo, Yiduo, Fu, Jie, Zhang, Huishuai, Zhao, Dongyan, Shen, Yikang

Subject terms: Computer Science - Computation and Language

Description: Continual pre-training has increasingly become the predominant approach for adapting Large Language Models (LLMs) to new domains. This process involves updating the pre-trained LLM with a corpus from a new domain, resulting in a shift in the training distribution. To study the behavior of LLMs during this shift, we measured the model's performance throughout the continual pre-training process. We observed a temporary performance drop at the beginning, followed by a recovery phase, a phenomenon known as the "stability gap," previously noted in vision models classifying new classes. To address this issue and enhance LLM performance within a fixed compute budget, we propose three effective strategies: (1) Continually pre-training the LLM on a subset with a proper size for multiple epochs, resulting in faster performance recovery than pre-training the LLM on a large corpus in a single epoch; (2) Pre-training the LLM only on a high-quality sub-corpus, which rapidly boosts domain performance; and (3) Using a data mixture similar to the pre-training data to reduce the distribution gap. We conduct various experiments on Llama-family models to validate the effectiveness of our strategies in both medical continual pre-training and instruction tuning. For example, our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget and enhance the average general task performance without causing forgetting. Furthermore, we apply our strategies to the Llama-3-8B model. The resulting model, Llama-3-Physician, achieves the best medical performance among current open-source models, and performs comparably to or even better than GPT-4 on several medical benchmarks. We release our models at https://huggingface.co/YiDuo1999/Llama-3-Physician-8B-Instruct.

Open access: http://arxiv.org/abs/2406.14833

6

Report

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Authors: Wang, Junjie, Zhang, Yin, Ji, Yatai, Zhang, Yuxiang, Jiang, Chunyang, Wang, Yubo, Zhu, Kang, Wang, Zekun, Wang, Tiezhen, Huang, Wenhao, Fu, Jie, Chen, Bei, Lin, Qunshu, Liu, Minghao, Zhang, Ge, Chen, Wenhu

Subject terms: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia

Description: Recent advancements in Large Multimodal Models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks. However, persistent challenges in perceptual and reasoning errors limit their efficacy, particularly in interpreting intricate visual data and deducing multimodal relationships. Addressing these issues, we introduce a novel dataset format, PIN (Paired and INterleaved multimodal documents), designed to significantly improve both the depth and breadth of multimodal training. The PIN format is built on three foundational principles: knowledge intensity, scalability, and support for diverse training modalities. This innovative format combines markdown files and comprehensive images to enrich training data with a dense knowledge structure and versatile training strategies. We present PIN-14M, an open-source dataset comprising 14 million samples derived from a diverse range of Chinese and English sources, tailored to include complex web and scientific content. This dataset is constructed meticulously to ensure data quality and ethical integrity, aiming to facilitate advanced training strategies and improve model robustness against common multimodal training pitfalls. Our initial results, forming the basis of this technical report, suggest significant potential for the PIN format in refining LMM performance, with plans for future expansions and detailed evaluations of its impact on model capabilities.

Open access: http://arxiv.org/abs/2406.13923

7

Report

GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory

Authors: Wu, Haoze, Qiu, Zihan, Wang, Zili, Zhao, Hang, Fu, Jie

Subject terms: Computer Science - Machine Learning, Computer Science - Artificial Intelligence

Description: Mixture-of-Experts (MoE) has been demonstrated as an efficient method to scale up models. By dynamically and sparsely selecting activated experts, MoE can effectively reduce computational costs. Despite the success, we observe that many tokens in the MoE models have uncertain routing results. These tokens have nearly equal scores for choosing each expert, and we demonstrate that this uncertainty can lead to incorrect selections. Inspired by the Global Workspace Theory (GWT), we propose a new fine-tuning method, GW-MoE, to address this issue. The core idea is to broadcast the uncertain tokens across experts during fine-tuning. Therefore, these tokens can acquire the necessary knowledge from any expert during inference and become less sensitive to the choice. GW-MoE does not introduce additional inference overhead. We validate that GW can mitigate the uncertainty problem and consistently improve performance across different tasks (text classification, question answering, summarization, code generation, and mathematical problem solving) and model sizes (650M and 8B parameters).

Open access: http://arxiv.org/abs/2406.12375

8

Report

Input Conditioned Graph Generation for Language Agents

Authors: Vierling, Lukas, Fu, Jie, Chen, Kai

Subject terms: Computer Science - Computation and Language, Computer Science - Artificial Intelligence

Description: Recent progress in Large Language Models (LLMs) and language agents has demonstrated significant promise for various future applications across multiple disciplines. While traditional approaches to language agents often rely on fixed, handcrafted designs, our research aims to develop both learnable and dynamic agents. Our method uses an existing framework that abstracts language agents as graphs. Within this graph framework, we aim to learn a model that can generate edges for every given input to the language agent. This allows us to generate edges that represent the flow of communication within the graph based on the given input, thereby adjusting the internal communication of a language agent. We learn to generate these edges using a pretrained LLM that is fine-tuned with reinforcement learning. This LLM can be fine-tuned on several datasets simultaneously, and we hypothesize that the model learns to adapt to these different domains during training, achieving good overall performance when encountering data from different domains during deployment. We demonstrate that our approach surpasses the previous static approach by nearly 6% accuracy on a combined dataset of MMLU and CMMLU, and by more than 10% when trained with a sparsity-inducing loss. It also performs better in additional experiments conducted with the MMLU and Mini Crossword Puzzles datasets. The code is available at https://github.com/lukasVierling/DynamicGPTSwarm.

Open access: http://arxiv.org/abs/2406.11555

9

Report

MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

Authors: Li, Lu, Zhang, Tianyu, Bu, Zhiqi, Wang, Suyuchen, He, Huan, Fu, Jie, Wu, Yonghui, Bian, Jiang, Chen, Yong, Bengio, Yoshua

Subject terms: Computer Science - Machine Learning

Description: Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during model merging. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP identifies a Pareto set of scaling coefficients for merging multiple models to reflect the trade-offs. The core component of MAP is approximating the evaluation metrics of the various tasks using a quadratic approximation surrogate model derived from a pre-selected set of scaling coefficients, enabling amortized inference. Experimental results on vision and natural language processing tasks show that MAP can accurately identify the Pareto front. To further reduce the required computation of MAP, we propose (1) a Bayesian adaptive sampling algorithm and (2) a nested merging scheme with multiple stages.

Open access: http://arxiv.org/abs/2406.07529

10

Report

VCR: Visual Caption Restoration

Authors: Zhang, Tianyu, Wang, Suyuchen, Li, Lu, Zhang, Ge, Taslakian, Perouz, Rajeswar, Sai, Fu, Jie, Liu, Bang, Bengio, Yoshua

Subject terms: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning

Description: We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.
Comment: 17 pages, 2 figures

Open access: http://arxiv.org/abs/2406.06462
